Re: pySpark driver memory limit

2017-11-08 Thread Nicolas Paris
On 06 Nov 2017 at 19:56, Nicolas Paris wrote: > Can anyone clarify the driver memory aspects of pySpark? > According to [1], spark.driver.memory limits JVM + python memory. > > In case: > spark.driver.memory=2G > Then does it mean the user won't be able to use more than 2G, whatever the
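For what it's worth, my reading of the docs (hedged, and version-dependent) is that spark.driver.memory sizes only the driver's JVM heap; the Python side of a PySpark driver is a separate OS process, so its memory is not capped by that setting. A tiny sketch of passing the setting at submit time — the helper function and the app path `app.py` are illustrative placeholders, not a Spark API:

```python
def submit_command(app="app.py", driver_mem="2g"):
    """Assemble a spark-submit invocation (illustrative helper).

    spark.driver.memory bounds only the driver JVM heap; the Python
    driver process runs outside the JVM and is not capped by it.
    """
    return [
        "spark-submit",
        "--conf", f"spark.driver.memory={driver_mem}",  # driver JVM heap only
        app,
    ]

cmd = submit_command(driver_mem="2g")
```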

Why the merge method in StructType is private

2017-11-08 Thread sathy
Hi, the file spark/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala has a merge method, but it is defined private to the sql package and used for Parquet schema merging. Is there a reason why this method cannot be public? If not this method, is there an equivalent available to merge
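Until/unless merge is made public, one workaround is to reimplement the union-of-fields logic outside Spark. A minimal pure-Python sketch of the idea — schemas modeled as name→type dicts; this mirrors the concept, not StructType.merge itself:

```python
def merge_schemas(a, b):
    """Merge two schemas represented as {field_name: type_name} dicts.

    Union of fields; a field present in both must have the same type,
    otherwise the merge fails (the same spirit as Parquet schema merging).
    """
    merged = dict(a)
    for name, typ in b.items():
        if name in merged and merged[name] != typ:
            raise ValueError(f"conflicting types for field {name!r}")
        merged[name] = typ
    return merged
```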

Re: spark job paused(active stages finished)

2017-11-08 Thread Margusja
You have to deal with failed jobs, for example with a try/catch in your code. Br Margus Roo > On 9 Nov 2017, at 05:37, bing...@iflytek.com wrote: > > Dear All, > I have a simple spark job, as below; all tasks in stage 2 (something failed, retried) have already finished, but the next stage never runs.
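The retry-around-an-action idea can be sketched in plain Python — the `action` callable stands in for a Spark action such as count(); the helper name and parameters are illustrative:

```python
import time

def run_with_retry(action, attempts=3, delay=0.0):
    """Call `action` until it succeeds, retrying on any exception.

    Re-raises the last exception after the final attempt, so a
    genuinely stuck job still surfaces instead of hanging silently.
    """
    for i in range(attempts):
        try:
            return action()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(delay)  # optional back-off between attempts
```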

Spark http: Not showing completed apps

2017-11-08 Thread purna pradeep
Hi, I'm using Spark standalone on AWS EC2, and I'm using the Spark REST API (http://:8080/json) to get completed apps, but the JSON shows completed apps as an empty array even though the job ran successfully.
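If the master really did record the run, the response should carry it under a completed-apps key. A small parser sketch — the "completedapps" key name is my assumption for the standalone master's /json payload, so verify it against your Spark version:

```python
import json

def completed_app_names(payload):
    """Extract completed application names from a standalone master
    /json response. The 'completedapps' key is an assumption; check
    the actual payload of your Spark version.
    """
    data = json.loads(payload)
    return [app.get("name") for app in data.get("completedapps", [])]
```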

Re: Programmatically get status of job (WAITING/RUNNING)

2017-11-08 Thread Davide.Mandrini
In this case, the only way to check the status is via REST calls to the Spark JSON API, accessible at http://:/json/ -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
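Building on that, a hedged sketch of classifying an application's status from the master's /json response — the key names ("activeapps", "completedapps") and state strings are my assumptions for a Spark 2.x standalone master, not a documented contract:

```python
import json

def job_status(payload, app_id):
    """Classify an application as WAITING/RUNNING/FINISHED/UNKNOWN
    from a standalone master /json response. Key and state names are
    assumptions; verify against your master's actual payload.
    """
    data = json.loads(payload)
    for app in data.get("activeapps", []):
        if app.get("id") == app_id:
            return app.get("state", "RUNNING")   # e.g. WAITING or RUNNING
    for app in data.get("completedapps", []):
        if app.get("id") == app_id:
            return "FINISHED"
    return "UNKNOWN"
```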

Re: Programmatically get status of job (WAITING/RUNNING)

2017-11-08 Thread bsikander
Thank you for the reply. I am currently not using SparkLauncher to launch my driver. Rather, I am using the old-fashioned spark-submit, and moving to SparkLauncher is not an option right now. Do I have any options there?

Re: [Spark Structured Streaming] Changing partitions of (flat)MapGroupsWithState

2017-11-08 Thread Michael Armbrust
The relevant config is spark.sql.shuffle.partitions. Note that once you start a query, this number is fixed. The config will only affect queries starting from an empty checkpoint. On Wed, Nov 8, 2017 at 7:34 AM, Teemu Heikkilä wrote: > I have spark structured streaming job
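The pinning behavior can be illustrated with a toy rule — purely illustrative, not Spark's actual code path: once a checkpoint exists for the query, the value recorded there wins over whatever the session conf says.

```python
def effective_shuffle_partitions(conf_value, checkpoint_value=None):
    """Toy model of how spark.sql.shuffle.partitions behaves for a
    Structured Streaming query: the value is pinned when the query
    first starts from an empty checkpoint; afterwards the
    checkpointed value takes precedence over the session conf.
    Illustrative only, not Spark's implementation.
    """
    return checkpoint_value if checkpoint_value is not None else conf_value
```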

[Spark Structured Streaming] Changing partitions of (flat)MapGroupsWithState

2017-11-08 Thread Teemu Heikkilä
I have a Spark Structured Streaming job and I'm crunching through a few terabytes of data. I'm using the file stream reader and it works flawlessly; I can adjust its partitioning with spark.default.parallelism. However, I'm doing sessionization on the data after loading it, and I'm currently

Measure executor idle time

2017-11-08 Thread samar kumar
Hi, is there a way to measure idle time for a Spark executor? Are any metrics or accumulators currently exposed for this? Regards, Samar
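One approach, in the absence of a built-in metric, is to approximate idle time from task start/end timestamps — e.g. scraped from the event log or the REST API. An illustrative interval-union sketch (function name and interval format are my own, not a Spark API):

```python
def executor_idle_time(window_start, window_end, task_intervals):
    """Approximate an executor's idle time over [window_start, window_end]
    as the window length minus the union of its task run intervals.

    task_intervals: iterable of (start, end) tuples in seconds;
    overlapping tasks (concurrent task slots) are counted once.
    """
    busy = 0.0
    last_end = window_start
    for start, end in sorted(task_intervals):
        start = max(start, last_end)   # skip the part already counted
        if end > start:
            busy += end - start
            last_end = end
    return (window_end - window_start) - busy
```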