Re: Spark Job Hanging on Join

2016-02-23 Thread Dave Moyers
Congrats! Sent from my iPad > On Feb 23, 2016, at 2:43 AM, Mohannad Ali wrote: > > Hello Everyone, > > Thanks a lot for the help. We also managed to solve it but without resorting > to Spark 1.6. > > The problem we were having was caused by a really bad join condition: >

Re: Spark Job Hanging on Join

2016-02-23 Thread Alonso Isidoro Roman
Thanks for sharing the know-how, guys. Alonso Isidoro Roman. Mis citas preferidas (de hoy) : "Si depurar es el proceso de quitar los errores de software, entonces programar debe ser el proceso de introducirlos..." - Edsger Dijkstra My favorite quotes (today): "If debugging is the process of

Re: Spark Job Hanging on Join

2016-02-23 Thread Mohannad Ali
Hello Everyone, Thanks a lot for the help. We also managed to solve it, but without resorting to Spark 1.6. The problem we were having was caused by a really bad join condition: ON ((a.col1 = b.col1) or (a.col1 is null and b.col1 is null)) AND ((a.col2 = b.col2) or (a.col2 is null and b.col2 is
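The snippet above is cut off before it shows how the condition was rewritten, so the following is only an illustrative sketch, not the fix used in the thread. It models in plain Python (no Spark) why a condition of the form "equal, or both null" is expensive when evaluated pair by pair, and how treating it as a single null-safe equality lets the same rows be matched through a hash lookup. Spark SQL exposes a null-safe equality operator, `<=>`, with these semantics; whether it was used here is an assumption. The function names and sample rows are hypothetical.

```python
from collections import defaultdict


def null_safe_eq(x, y):
    """Null-safe equality: None matches None, otherwise ordinary equality."""
    if x is None or y is None:
        return x is None and y is None
    return x == y


def nested_loop_join(left, right, keys):
    """Evaluate the condition for every (left, right) pair: O(len(left) * len(right))."""
    return [
        (l, r)
        for l in left
        for r in right
        if all(null_safe_eq(l[k], r[k]) for k in keys)
    ]


def hash_join(left, right, keys):
    """Index the right side by its key tuple, then probe once per left row."""
    index = defaultdict(list)
    for r in right:
        index[tuple(r[k] for k in keys)].append(r)
    # None is hashable and equal to itself, so null keys land in the same bucket.
    return [(l, r) for l in left for r in index.get(tuple(l[k] for k in keys), [])]


# Hypothetical sample rows standing in for tables a and b.
a = [
    {"col1": 1, "col2": "x"},
    {"col1": None, "col2": "y"},
    {"col1": 2, "col2": None},
]
b = [
    {"col1": 1, "col2": "x"},
    {"col1": None, "col2": "y"},
    {"col1": 2, "col2": "z"},
]
keys = ["col1", "col2"]

pairs_slow = nested_loop_join(a, b, keys)
pairs_fast = hash_join(a, b, keys)
```

Both functions return the same matches; the difference is that the hash version touches each row once instead of comparing every pair, which is the kind of plan a join engine can only choose when it recognizes the condition as an equality on keys.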

Re: Spark Job Hanging on Join

2016-02-22 Thread Dave Moyers
Good article! Thanks for sharing! > On Feb 22, 2016, at 11:10 AM, Davies Liu wrote: > > This link may help: > https://forums.databricks.com/questions/6747/how-do-i-get-a-cartesian-product-of-a-huge-dataset.html > > Spark 1.6 improved CartesianProduct; you should

Re: Spark Job Hanging on Join

2016-02-22 Thread Davies Liu
This link may help: https://forums.databricks.com/questions/6747/how-do-i-get-a-cartesian-product-of-a-huge-dataset.html Spark 1.6 improved CartesianProduct; you should turn off auto broadcast and go with CartesianProduct in 1.6. On Mon, Feb 22, 2016 at 1:45 AM, Mohannad Ali

Re: Spark Job Hanging on Join

2016-02-22 Thread Mohannad Ali
Hello everyone, I'm working with Tamara and I wanted to give you guys an update on the issue: 1. Here is the output of .explain(): > Project >

Re: Spark Job Hanging on Join

2016-02-21 Thread Gourav Sengupta
Sorry, please add the following questions to the list above: Which Spark version are you using? Are you using RDDs or DataFrames? Is the code run locally, in Spark cluster mode, or on AWS EMR? Regards, Gourav Sengupta On Sun, Feb 21, 2016 at 7:37 PM, Gourav Sengupta

Re: Spark Job Hanging on Join

2016-02-21 Thread Gourav Sengupta
Hi Tamara, a few basic questions first. How many executors are you using? Is all the data getting cached in the same executor? How many partitions does the data have? How many fields are you trying to use in the join? If you need any help in finding answers to these questions please let me

Re: Spark Job Hanging on Join

2016-02-20 Thread Dave Moyers
Try this setting in your Spark defaults: spark.sql.autoBroadcastJoinThreshold=-1 I had a similar problem with joins hanging, and that resolved it for me. You might be able to pass that value from the driver as a --conf option, but I have not tried that and am not sure whether it will work. Sent
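As a sketch of the setting described above: a value of -1 disables automatic broadcast joins. The fragment below shows where the line would typically go. The file location follows the usual Spark layout and the application name in the comment is a placeholder, so treat both as assumptions rather than details from this thread.

```
# conf/spark-defaults.conf (key and value separated by whitespace)
spark.sql.autoBroadcastJoinThreshold  -1

# Per-job form using spark-submit's --conf flag (your_app.py is a placeholder):
#   spark-submit --conf spark.sql.autoBroadcastJoinThreshold=-1 your_app.py
```

Setting it in the defaults file affects every job launched from that installation, while the --conf form limits the change to a single submission.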

Re: Spark Job Hanging on Join

2016-02-19 Thread Michael Armbrust
Please include the output of running explain() when reporting performance issues with DataFrames. On Fri, Feb 19, 2016 at 9:31 AM, Tamara Mendt wrote: > Hi all, > > I am running a Spark job that gets stuck attempting to join two > dataframes. The dataframes are not very

Spark Job Hanging on Join

2016-02-19 Thread Tamara Mendt
Hi all, I am running a Spark job that gets stuck attempting to join two dataframes. The dataframes are not very large: one has about 2 million rows and the other a couple of thousand, and the resulting joined dataframe should be about the same size as the smaller one. I have tried triggering