Re: How to debug Spark job

2018-09-08 Thread Marco Mistroni
Hi, might sound like dumb advice, but try to break your process apart. It sounds like you are doing ETL. Start basic with just E and T, and apply the changes that result in issues; if there is no problem, add the load step. Enable Spark logging so that you can post the error message to the list. I think you can have a look
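
Not from the thread itself, but a minimal sketch of what breaking the process apart might look like, assuming an existing SparkSession named spark; every connection option and path below is a placeholder:

    // Sketch only: isolate extract + transform before adding the load step.
    import org.apache.spark.sql.functions.col

    spark.sparkContext.setLogLevel("INFO")  // more detail in driver/executor logs

    val extracted = spark.read.format("jdbc")            // E
      .option("url", "jdbc:postgresql://host:5432/db")   // placeholder
      .option("dbtable", "my_table")                     // placeholder
      .option("user", "user")                            // placeholder
      .option("password", "pw")                          // placeholder
      .load()

    val transformed = extracted.filter(col("field1").isNotNull)  // T: example transform

    transformed.count()                       // forces E+T to run; errors surface here
    // transformed.write.parquet("/tmp/out")  // L: re-enable once E+T pass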

Re: [External Sender] How to debug Spark job

2018-09-08 Thread Sonal Goyal
You could also try to profile your program on the executor or driver by using jvisualvm or YourKit, to see if there is any memory/CPU optimization you could do.

Thanks,
Sonal
Nube Technologies
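
Not from the thread: attaching jvisualvm to executors usually means exposing JMX on the executor JVMs first. A sketch of the spark-submit configuration involved; the port and the disabled auth/SSL settings are assumptions and only sensible on a trusted network:

    spark-submit \
      --conf "spark.executor.extraJavaOptions=-Dcom.sun.management.jmxremote \
        -Dcom.sun.management.jmxremote.port=9999 \
        -Dcom.sun.management.jmxremote.authenticate=false \
        -Dcom.sun.management.jmxremote.ssl=false" \
      --class MyJob my-job.jar

With more than one executor per node a fixed port will clash; profiling a single executor, or the driver, avoids that.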

Re: [External Sender] How to debug Spark job

2018-09-07 Thread James Starks
Got the root cause eventually: it throws java.lang.OutOfMemoryError: Java heap space. Increasing --driver-memory temporarily fixes the problem. Thanks.
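
For reference, one way to pass a larger driver heap at submit time; the 8g value and the jar/class names are placeholders, not from the thread:

    spark-submit \
      --driver-memory 8g \
      --class MyJob \
      my-job.jar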

Re: [External Sender] How to debug Spark job

2018-09-07 Thread Femi Anthony
One way I would go about this would be to try running newdf.show(n, truncate = false) on a few rows before you try writing to Parquet, to force computation of newdf and see whether the hang occurs at that point or during the write. You may also try doing a newdf.count() as well.
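
A minimal sketch of that isolation step in Scala, assuming the newdf from the original post and a placeholder output path:

    // Force computation before the write to localize the hang.
    newdf.show(20, truncate = false)    // materializes a few rows
    println(s"rows: ${newdf.count()}")  // full scan; forces every partition

    // If the lines above finish quickly, the problem is in the write itself:
    newdf.write.mode("overwrite").parquet("/tmp/newdf_debug")  // placeholder path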

How to debug Spark job

2018-09-07 Thread James Starks
I have a Spark job that reads from a PostgreSQL (v9.5) table and writes the result to Parquet. The code flow is not complicated; basically:

    case class MyCaseClass(field1: String, field2: String)
    val df = spark.read.format("jdbc")...load()
    df.createOrReplaceTempView(...)
    val newdf =
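
Not part of the original post, but since the read goes through JDBC: a hedged sketch of the same flow with explicit read partitioning, which is often worth checking when a JDBC-to-Parquet job stalls on a single connection. All connection details, the bounds, and the id column are assumptions:

    // Hypothetical completion of the flow above; assumes SparkSession `spark`.
    case class MyCaseClass(field1: String, field2: String)

    val df = spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://host:5432/db")  // placeholder
      .option("dbtable", "my_table")                    // placeholder
      .option("user", "user")                           // placeholder
      .option("password", "pw")                         // placeholder
      .option("partitionColumn", "id")  // assumed numeric column to split on
      .option("lowerBound", "1")
      .option("upperBound", "1000000")
      .option("numPartitions", "8")     // parallel JDBC connections
      .load()

    df.createOrReplaceTempView("my_table_view")
    val newdf = spark.sql("select field1, field2 from my_table_view")
    newdf.write.mode("overwrite").parquet("/tmp/out")   // placeholder path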