the issue about the + in column, can we support the string please?

2018-03-25 Thread 1427357...@qq.com
Hi all, I have a table like below:

+---+---------+-----------+
| id|     name|sharding_id|
+---+---------+-----------+
|  1|leader us|          1|
|  3|    mycat|          1|
+---+---------+-----------+

My schema is:

root
 |-- id: integer (nullable = false)
 |-- name: string (nullable = true)
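The preview is cut off before any reply, so the following is a hedged illustration only, not something quoted from the thread. It rebuilds the table above in Java and shows that + (Column.plus) is numeric addition, while concat() is the function Spark SQL provides for string columns; the class name, master setting, and chosen expressions are illustrative assumptions.

    import java.util.Arrays;

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.RowFactory;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructType;

    import static org.apache.spark.sql.functions.col;
    import static org.apache.spark.sql.functions.concat;
    import static org.apache.spark.sql.functions.lit;

    public class PlusOnStringColumn {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("plus-on-string-column")
            .master("local[*]")
            .getOrCreate();

        // Rebuild the small table from the post: (id, name, sharding_id).
        StructType schema = new StructType()
            .add("id", DataTypes.IntegerType, false)
            .add("name", DataTypes.StringType, true)
            .add("sharding_id", DataTypes.IntegerType, true);
        Dataset<Row> df = spark.createDataFrame(
            Arrays.asList(
                RowFactory.create(1, "leader us", 1),
                RowFactory.create(3, "mycat", 1)),
            schema);

        // "+" (Column.plus) is numeric addition, so it works on the integer columns...
        df.select(col("id").plus(col("sharding_id"))).show();

        // ...but it does not concatenate strings; concat() is what Spark SQL
        // provides for string columns.
        df.select(concat(col("name"), lit("!"))).show();

        spark.stop();
      }
    }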

Re:Re: how to use lit() in spark-java

2018-03-25 Thread 崔苗
It works, thanks. On 2018-03-23 21:33:41, Anil Langote wrote: You have to import functions: dataset.withColumn(columnName, functions.lit("constant")) Thank you Anil Langote Sent from my iPhone _ From: 崔苗 Sent: Friday,
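For reference, a minimal self-contained sketch of the suggestion above, assuming a local session; the input path ("people.json") and the new column name ("source") are placeholders, not values from the thread.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.functions;

    public class LitExample {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("lit-example")
            .master("local[*]")
            .getOrCreate();

        // Placeholder input; any Dataset works the same way.
        Dataset<Row> dataset = spark.read().json("people.json");

        // As suggested: import org.apache.spark.sql.functions and use
        // functions.lit() to wrap a constant as a Column.
        Dataset<Row> withConstant = dataset.withColumn("source", functions.lit("constant"));
        withConstant.show();

        spark.stop();
      }
    }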

Re:Re: how to use lit() in spark-java

2018-03-25 Thread 崔苗
It works, thanks. On 2018-03-23 21:47:52, "Anthony, Olufemi" wrote: You can use a static import to import it directly: import static org.apache.spark.sql.functions.lit; Femi From: 崔苗 Date: Friday, March 23, 2018 at 8:34 AM To:
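A minimal sketch of the static-import variant Femi describes; the input path and column name are again placeholders.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    // Static import, so lit() can be called without the functions. prefix.
    import static org.apache.spark.sql.functions.lit;

    public class LitStaticImportExample {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("lit-static-import")
            .master("local[*]")
            .getOrCreate();

        Dataset<Row> dataset = spark.read().json("people.json"); // placeholder input
        dataset.withColumn("source", lit("constant")).show();

        spark.stop();
      }
    }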

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-25 Thread Shmuel Blitz
Hi Rohit, Thanks for the analysis. I can use repartition on the slow task. But how can I tell which part of the code is responsible for the slow tasks? It would be great if you could further explain the rest of the output. Thanks in advance, Shmuel On Sun, Mar 25, 2018 at 12:46 PM, Rohit Karlupia

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-25 Thread Rohit Karlupia
Thanks Shmuel for trying out sparklens! A couple of things that I noticed: 1) 250 executors is probably overkill for this job. It would run in the same time with around 100. 2) Many of the stages that take a long time have only 200 tasks, whereas we have 750 cores available for the job. 200 is the default
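The preview is cut off after "200 is the default"; that presumably refers to spark.sql.shuffle.partitions, whose default value is 200 and which caps the task count of Spark SQL shuffle stages. A hedged sketch of raising it to match the 750 cores mentioned above (the application name is a placeholder):

    import org.apache.spark.sql.SparkSession;

    public class ShufflePartitionsExample {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("shuffle-partitions-example")  // placeholder name
            // Raise the default of 200 so shuffle stages can use all 750 cores.
            .config("spark.sql.shuffle.partitions", "750")
            .getOrCreate();

        // The setting can also be changed at runtime:
        spark.conf().set("spark.sql.shuffle.partitions", "750");

        spark.stop();
      }
    }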

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-25 Thread Shmuel Blitz
I ran it on a single job. SparkLens adds overhead to the job duration. I'm not ready to enable it by default on all our jobs. Attached is the output. Still trying to understand what exactly it means. On Sun, Mar 25, 2018 at 10:40 AM, Fawze Abujaber wrote: > Nice! > >

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-25 Thread Fawze Abujaber
Nice! Shmuel, were you able to run it at the cluster level or for a specific job? Did you configure it in spark-defaults.conf? On Sun, 25 Mar 2018 at 10:34 Shmuel Blitz wrote: > Just to let you know, I have managed to run SparkLens on our cluster. > > I switched to
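A hedged sketch of what "configure it in spark-defaults.conf" could look like: Sparklens runs as a SparkListener, so it can be registered through spark.extraListeners. The jar path below is a placeholder, and the exact listener class and package coordinates should be checked against the Sparklens project itself.

    # spark-defaults.conf (sketch; verify path and listener class against Sparklens)
    spark.jars              /path/to/sparklens.jar
    spark.extraListeners    com.qubole.sparklens.QuboleJobListener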

Re: Open sourcing Sparklens: Qubole's Spark Tuning Tool

2018-03-25 Thread Shmuel Blitz
Just to let you know, I have managed to run SparkLens on our cluster. I switched to the spark_1.6 branch, and also compiled against the specific Spark version we are using (cdh5.7.6). Now I need to figure out what the output means... :P Shmuel On Fri, Mar 23, 2018 at 7:24 PM, Fawze Abujaber