Re: Spark job running for long time
Hi Yeikel,

I cannot copy anything from the system, but I have seen the explain output. It was doing a SortMergeJoin for all tables. There are 10 tables, all joined with left outer joins. Of the 10 tables, one is 50 MB and a second is 200 MB; the rest are big tables. The data is in Avro format, and I am using Spark 2.2.

I suspect a broadcast join could help, but I am not sure, because as far as I know broadcast only works for smaller tables of about 10 MB.

Thanks,
Rajat

On Wed, 17 Apr 2019, 23:53 Yeikel wrote:
> Can you share the output of df.explain()?
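For readers following the thread: the 10 MB figure matches the default of spark.sql.autoBroadcastJoinThreshold, which only governs automatic broadcasts. An explicit broadcast hint can request a broadcast for larger tables, provided they fit in driver and executor memory. The PySpark sketch below is an illustration under assumptions, not the poster's code: the table names, the 40000 MB size, and the helper names are hypothetical, and only the 50 MB and 200 MB sizes come from the message above.

```python
def pick_broadcast_candidates(table_sizes_mb, limit_mb):
    """Return the names of tables whose estimated size is at or below limit_mb."""
    return sorted(name for name, size in table_sizes_mb.items() if size <= limit_mb)


def left_join_with_broadcast(big_df, small_dfs, key):
    """Left outer join big_df with each small DataFrame, hinting a broadcast of the small side."""
    # Imported here so the helper above can be used without a Spark installation.
    from pyspark.sql.functions import broadcast

    result = big_df
    for small_df in small_dfs:
        # In a left outer join the right-hand side is the one that can be broadcast.
        result = result.join(broadcast(small_df), on=key, how="left_outer")
    return result


# Hypothetical table names; 40000 MB stands in for "the rest are big tables".
sizes_mb = {"dim_a": 50, "dim_b": 200, "fact_c": 40000}
candidates = pick_broadcast_candidates(sizes_mb, limit_mb=200)
print(candidates)
```

If the hint takes effect, calling explain() on the joined DataFrame should show a BroadcastHashJoin in place of a SortMergeJoin for those two joins. Raising spark.sql.autoBroadcastJoinThreshold above 200 MB is an alternative, though it applies to every join in the session rather than to the chosen tables.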
Re: Spark job running for long time
Can you share the output of df.explain()?

--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

To unsubscribe e-mail: user-unsubscr...@spark.apache.org
Re: Spark job running for long time
Hi,

Thanks for the response! We are doing 12 left outer joins. I also see that GC is colored red in the Spark UI, so GC seems to be taking time as well.

We have tried Kryo serialization and tried giving more memory to the executor as well as the driver, but it didn't help.

On Wed, 17 Apr 2019, 23:35 Yeikel wrote:
> We need more information about your job to be able to help you. Please share
> some snippets or the overall idea of what you are doing
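For reference, the settings mentioned above are usually passed as Spark properties. The sketch below only renders a set of properties as spark-submit arguments; the helper name and the 8g value are assumptions for illustration, not the configuration used in this job. The property names and the GC logging options are the ones described in the Spark tuning guide for Java 8.

```python
def to_submit_flags(conf):
    """Render a dict of Spark properties as spark-submit --conf arguments."""
    flags = []
    for key in sorted(conf):
        flags.extend(["--conf", "{}={}".format(key, conf[key])])
    return flags


conf = {
    # Switches RDD and shuffle serialization from Java serialization to Kryo.
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    # Assumed value; the thread does not say how much memory was given.
    "spark.executor.memory": "8g",
    # Writes GC activity to the executor logs so GC time can be inspected.
    "spark.executor.extraJavaOptions": "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps",
}
flags = to_submit_flags(conf)
print(" ".join(flags))
```

Note that DataFrame joins mostly shuffle rows in Spark SQL's own binary format, so Kryo may change little for a job like this one; the GC log output is likely the more useful of these settings for understanding the red GC time in the UI.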
Re: Spark job running for long time
We need more information about your job to be able to help you. Please share some snippets or the overall idea of what you are doing
Spark job running for long time
Hi All,

One of my containers has been running for a long time. The logs show "Thread 240 spilling sort data of 10.4 GB to disk", and this message appears every minute.

Thanks,
Rajat