Re: Spark job running for long time

2019-04-21 Thread rajat kumar
Hi Yeikel,

I cannot copy anything off the system,
but I have looked at the explain output.

It was using SortMergeJoin for every join.
There are 10 tables, all joined with left outer joins.

Of the 10 tables, one is 50 MB and another is 200 MB. The rest are
big tables.

Also, the data is in Avro format.

I am using Spark 2.2.

I suspect a broadcast join could help, but I am not sure, because automatic
broadcast only applies to small tables of up to about 10 MB (the default
spark.sql.autoBroadcastJoinThreshold).
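
For reference, a minimal PySpark sketch of the two usual ways to get a broadcast join for tables above the default threshold: an explicit broadcast() hint, or raising spark.sql.autoBroadcastJoinThreshold. The table names are hypothetical and the sizes are the ones quoted above. It assumes the small tables are on the right side of each left outer join and that their in-memory size (which can be larger than the Avro size on disk) fits on the driver and on every executor.

```python
MB = 1024 * 1024


def broadcast_candidates(table_sizes_mb, threshold_mb):
    """Return the names of tables whose estimated size is within threshold_mb."""
    return sorted(name for name, size in table_sizes_mb.items()
                  if size <= threshold_mb)


def left_join_with_broadcast(big_df, small_df, key):
    """Left outer join that hints Spark to broadcast the right-hand table."""
    # Imported lazily so the helper above also works without Spark installed.
    from pyspark.sql.functions import broadcast
    return big_df.join(broadcast(small_df), on=key, how="left_outer")


def raise_auto_broadcast_threshold(spark, threshold_mb):
    """Let the planner broadcast tables up to threshold_mb on its own."""
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold",
                   str(threshold_mb * MB))


# Sizes from this thread; the names "dim_a" and "dim_b" are made up.
sizes_mb = {"dim_a": 50, "dim_b": 200}
print(broadcast_candidates(sizes_mb, 10))    # default 10 MB threshold
print(broadcast_candidates(sizes_mb, 256))   # hypothetical raised threshold
```

With the default 10 MB threshold neither table qualifies, which would match the plan showing SortMergeJoin everywhere. The explicit hint is the more targeted option, since it does not change the planner's behaviour for the other joins.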

Thanks
Rajat

On Wed, 17 Apr 2019, 23:53 Yeikel wrote:
> Can you share the output of df.explain()?
>
>
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


Re: Spark job running for long time

2019-04-17 Thread Yeikel
Can you share the output of df.explain()?
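
If the plan cannot be copied off the system, a summary of the join operators is often enough. Below is a small sketch, assuming PySpark: show_plans prints the plans, and count_join_operators tallies physical join operators in plan text that has been saved or pasted. Both helper names are hypothetical, and the sample plan is a made-up fragment for illustration, not output from the job in this thread.

```python
# Physical join operators that can appear in a Spark SQL plan.
JOIN_OPERATORS = (
    "SortMergeJoin",
    "BroadcastHashJoin",
    "ShuffledHashJoin",
    "BroadcastNestedLoopJoin",
    "CartesianProduct",
)


def show_plans(df):
    """Print the logical and physical plans for a DataFrame."""
    df.explain(True)


def count_join_operators(plan_text):
    """Count how often each join operator name occurs in plan text."""
    counts = {}
    for name in JOIN_OPERATORS:
        n = plan_text.count(name)
        if n:
            counts[name] = n
    return counts


# Made-up two-line fragment in the style of a physical plan.
sample_plan = """\
*SortMergeJoin [id#0L], [id#4L], LeftOuter
*BroadcastHashJoin [id#0L], [id#9L], LeftOuter, BuildRight
"""
print(count_join_operators(sample_plan))
```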




Re: Spark job running for long time

2019-04-17 Thread rajat kumar
Hi,

Thanks for the response!

We are doing 12 left outer joins. Also, GC time is highlighted in red in the
Spark UI, so GC seems to be taking a significant amount of time.
We have tried Kryo serialization and tried giving more memory to the
executors as well as the driver, but it didn't help.
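
For context, a sketch of the kind of settings described above, assuming PySpark. The memory value and the helper names are placeholders, not the values used in this job. These options have to be set before the session starts. Note also that Kryo mainly affects RDD serialization, while DataFrame joins use Spark SQL's own row format, so it may have little effect on a job like this one.

```python
def tuning_pairs(executor_memory="8g"):
    """Build (key, value) Spark settings; the default memory is a placeholder."""
    return [
        ("spark.serializer", "org.apache.spark.serializer.KryoSerializer"),
        ("spark.executor.memory", executor_memory),
        # Log GC activity on executors so long pauses show up in their logs.
        ("spark.executor.extraJavaOptions",
         "-verbose:gc -XX:+PrintGCDetails"),
    ]


def build_session(app_name, pairs):
    """Create a SparkSession with the given settings applied at startup."""
    # Imported lazily so tuning_pairs can be inspected without Spark installed.
    from pyspark import SparkConf
    from pyspark.sql import SparkSession
    conf = SparkConf().setAppName(app_name).setAll(pairs)
    return SparkSession.builder.config(conf=conf).getOrCreate()


for key, value in tuning_pairs("8g"):
    print(key, "=", value)
```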





On Wed, 17 Apr 2019, 23:35 Yeikel wrote:
> We need more information about your job to be able to help you. Please
> share some snippets or the overall idea of what you are doing.


Re: Spark job running for long time

2019-04-17 Thread Yeikel
We need more information about your job to be able to help you. Please share
some snippets or the overall idea of what you are doing.




Spark job running for long time

2019-04-17 Thread rajat kumar
Hi All,

One of my containers has been running for a long time.
The logs show "Thread 240 spilling sort data of 10.4 GB to disk".
This message appears every minute.
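
A spill of this size means a single task is sorting more than 10 GB, which usually points to partitions that are too large or to skewed join keys. As a rough sketch, assuming PySpark and a hypothetical target of 128 MB per shuffle partition, the helper below estimates how many partitions a given volume of data would need. The result could then be applied through spark.sql.shuffle.partitions, which defaults to 200. This only helps if the data is spread over many keys. With heavy skew, all rows for a hot key still land in one task.

```python
import math

MB = 1024 * 1024
GB = 1024 * MB


def suggest_shuffle_partitions(shuffle_bytes, target_partition_bytes=128 * MB):
    """Number of partitions needed so each holds about target_partition_bytes."""
    return max(1, math.ceil(shuffle_bytes / target_partition_bytes))


def apply_shuffle_partitions(spark, num_partitions):
    """Set the partition count used for joins and aggregations in Spark SQL."""
    spark.conf.set("spark.sql.shuffle.partitions", str(num_partitions))


# The 10.4 GB handled by one task, split into ~128 MB pieces.
print(suggest_shuffle_partitions(int(10.4 * GB)))  # prints 84
```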


Thanks
Rajat