Re: Can you view thread dumps on spark UI if job finished

2020-04-08 Thread Zahid Rahman
http://spark.apache.org/docs/latest/monitoring.html#spark-configuration-options says "Note that this information is only available for the duration of the application by default. To view the web UI after the fact, set spark.eventLog.enabled to true before starting the application. This configures
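For completeness, a minimal PySpark sketch of turning on event logging so the history server can replay the UI later (the log directory below is only an example path):

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("event-log-example")
        # Persist UI events so the history server can reconstruct the web UI
        # after the application finishes.
        .config("spark.eventLog.enabled", "true")
        # Example directory; it must exist and be readable by the history server.
        .config("spark.eventLog.dir", "file:///tmp/spark-events")
        .getOrCreate()
    )

The same two properties can also be set once in conf/spark-defaults.conf instead of in application code.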

Re: Can you view thread dumps on spark UI if job finished

2020-04-08 Thread Ruijing Li
Thanks Zahid. Yes, I am using the history server to see previous UIs. However, my question still remains about viewing old thread dumps, as I cannot see them in the old completed Spark UIs, only while the Spark context is running. On Wed, Apr 8, 2020 at 4:01 PM Zahid Rahman wrote: > Spark UI is only

Re: Can you view thread dumps on spark UI if job finished

2020-04-08 Thread Zahid Rahman
The Spark UI is only available while the SparkContext is running. However, you can get to the Spark UI after your application completes or crashes. To do this, Spark includes a tool called the Spark History Server that allows you to reconstruct the Spark UI. You can find up-to-date information on how
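As a rough sketch (the directory path is an assumption), the history server just needs to be pointed at the same event log directory and started:

    # conf/spark-defaults.conf on the machine running the history server
    spark.history.fs.logDirectory   file:///tmp/spark-events

    # from the Spark installation directory
    ./sbin/start-history-server.sh

By default the reconstructed UIs are then served on port 18080.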

Can you view thread dumps on spark UI if job finished

2020-04-08 Thread Ruijing Li
Hi all, as stated in the title, when I view the Spark UI of a completed Spark job, I see there are thread dump links in the executors tab, but clicking on them does nothing. Is it possible to see the thread dumps somehow even if the job finishes? On Spark 2.4.5. Thanks. -- Cheers, Ruijing

[Pyspark] - Spark uses all available memory; unrelated to size of dataframe

2020-04-08 Thread Daniel Stojanov
My setup: using PySpark; MongoDB to retrieve and store final results; Spark is in standalone cluster mode, on a single desktop. Spark v2.4.4. OpenJDK 8. My Spark application (using PySpark) uses all available system memory. This seems to be unrelated to the data being processed. I tested with
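Not a fix for the reported behaviour, just a sketch of how memory can be capped explicitly in standalone mode (the master URL and sizes are assumptions):

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("spark://localhost:7077")   # assumed standalone master URL
        .appName("memory-capped-app")
        # Upper bound on the JVM heap each executor may claim (illustrative sizes).
        .config("spark.executor.memory", "2g")
        .config("spark.driver.memory", "1g")
        # Limit total cores so fewer concurrent tasks compete for RAM on one desktop.
        .config("spark.cores.max", "4")
        .getOrCreate()
    )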

Re: Spark Streaming on Compact Kafka topic - consumers 1 message per partition per batch

2020-04-08 Thread Hrishikesh Mishra
It seems I found the issue. The actual problem is something related to backpressure. When I add these configs, *spark.streaming.kafka.maxRatePerPartition* or *spark.streaming.backpressure.initialRate* (the value of these configs is 100), it starts consuming one message per partition
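For reference, a sketch of how those rate-limit settings are typically applied to a DStreams application (the values of 100 are taken from the message; everything else is illustrative):

    from pyspark import SparkConf, SparkContext
    from pyspark.streaming import StreamingContext

    conf = (
        SparkConf()
        .setAppName("kafka-backpressure-example")
        # Hard cap: records read per Kafka partition per second.
        .set("spark.streaming.kafka.maxRatePerPartition", "100")
        # Initial rate used for the first batch when backpressure is enabled;
        # afterwards the controller adjusts the rate automatically.
        .set("spark.streaming.backpressure.enabled", "true")
        .set("spark.streaming.backpressure.initialRate", "100")
    )

    sc = SparkContext(conf=conf)
    ssc = StreamingContext(sc, 10)  # 10-second batches (illustrative)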

How to handle Null values in Array of struct elements in pyspark

2020-04-08 Thread anbutech
Hello all, we have data in a column of a PySpark dataframe that is an array of struct type with multiple nested fields. If the value is not blank, it saves the data in the same array of struct type to a Spark Delta table. Please advise on the below case: if the same column comes in as blank
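A minimal sketch of one way to normalise such a column before writing, using a hypothetical items column of type array<struct<k:string,v:int>> (Spark 2.4+ for the higher-order filter function):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("null-struct-array-example").getOrCreate()

    # Hypothetical schema: an array of structs with two nested fields.
    df = spark.createDataFrame(
        [(1, [("a", 10), None]), (2, None)],
        "id int, items array<struct<k:string,v:int>>",
    )

    cleaned = df.withColumn(
        "items",
        F.coalesce(
            # Drop null elements inside the array.
            F.expr("filter(items, x -> x IS NOT NULL)"),
            # Fall back to an empty array of the same element type when the whole
            # column is null/blank, so the Delta table schema stays consistent.
            F.expr("array()").cast("array<struct<k:string,v:int>>"),
        ),
    )
    cleaned.show(truncate=False)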