Re: Fw: Significant performance difference for same spark job in scala vs pyspark

2016-05-06 Thread pratik gawande
ther parts will bring in overheads. So the performance difference is expected, but you could tune the application to reduce the gap. Also, because the Python RDD wraps a lot, the DAG you saw differs from the Scala one; that is also expected. Thanks, Saisai On Fri, May 6, 2016 at 12:47 PM, pratik
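
One source of the overhead described above is that Python RDD operations exchange records with the JVM in serialized (pickled) form. The sketch below is not Spark code and does not measure Spark itself; it uses only the Python standard library to show, under assumed conditions, why serializing records one at a time costs more than serializing them in batches. The record shape, the batch size of 100, and the function names are hypothetical and chosen only for illustration.

```python
import pickle

# Hypothetical records, standing in for rows a Python RDD might process.
records = [(i, "user-%d" % i) for i in range(1000)]


def serialize_per_record(rows):
    """Pickle every row separately, so each row carries its own pickle framing."""
    return [pickle.dumps(row) for row in rows]


def serialize_batched(rows, batch_size=100):
    """Pickle rows in chunks of batch_size, so framing is paid once per chunk."""
    return [
        pickle.dumps(rows[i:i + batch_size])
        for i in range(0, len(rows), batch_size)
    ]


per_record = serialize_per_record(records)
batched = serialize_batched(records)

per_record_bytes = sum(len(blob) for blob in per_record)
batched_bytes = sum(len(blob) for blob in batched)

# Both forms decode to the same data; only the serialization overhead differs.
decoded = [row for blob in batched for row in pickle.loads(blob)]

print("per-record:", len(per_record), "blobs,", per_record_bytes, "bytes")
print("batched:   ", len(batched), "blobs,", batched_bytes, "bytes")
```

The same reasoning is why tuning advice for PySpark jobs often points toward doing less work in per-record Python functions; how much that helps in a given job is something to measure rather than assume.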

Fw: Significant performance difference for same spark job in scala vs pyspark

2016-05-05 Thread pratik gawande
Hello, I am new to Spark. For one of my jobs I am seeing a significant performance difference when it is run in PySpark vs. Scala. Could you please let me know whether this is known and whether Scala is preferred over Python for writing Spark jobs? Also, the DAG visualization shows completely different DAGs for scala and