Re: Scala vs Python performance differences
I was interested in this as I had some Spark code in Python that was too slow and wanted to know whether Scala would fix it for me. So I re-wrote my code in Scala. In my particular case the Scala version was 10 times faster. But I think that is because I did an awful lot of computation in my own code rather than in a library like numpy. (I put a bit more detail here http://tttv-engineering.tumblr.com/post/108260351966/spark-python-vs-scala in case you are interested) So there's one data point, if only for the obvious data point comparing computations in Scala to computations in pure Python. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Scala-vs-Python-performance-differences-tp4247p21190.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Scala vs Python performance differences
Hey Phil, Thank you sharing this. The result didn't surprise me a lot, it's normal to do the prototype in Python, once it get stable and you really need the performance, then rewrite part of it in C or whole of it in another language does make sense, it will not cause you much time. Davies On Fri, Jan 16, 2015 at 7:38 AM, philpearl p...@tanktop.tv wrote: I was interested in this as I had some Spark code in Python that was too slow and wanted to know whether Scala would fix it for me. So I re-wrote my code in Scala. In my particular case the Scala version was 10 times faster. But I think that is because I did an awful lot of computation in my own code rather than in a library like numpy. (I put a bit more detail here http://tttv-engineering.tumblr.com/post/108260351966/spark-python-vs-scala in case you are interested) So there's one data point, if only for the obvious data point comparing computations in Scala to computations in pure Python. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Scala-vs-Python-performance-differences-tp4247p21190.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Scala vs Python performance differences
Jeremy, Did you complete this benchmark in a way that's shareable with those interested here? Andrew On Tue, Apr 15, 2014 at 2:50 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: I'd also be interested in seeing such a benchmark. On Tue, Apr 15, 2014 at 9:25 AM, Ian Ferreira ianferre...@hotmail.com wrote: This would be super useful. Thanks. On 4/15/14, 1:30 AM, Jeremy Freeman freeman.jer...@gmail.com wrote: Hi Andrew, I'm putting together some benchmarks for PySpark vs Scala. I'm focusing on ML algorithms, as I'm particularly curious about the relative performance of MLlib in Scala vs the Python MLlib API vs pure Python implementations. Will share real results as soon as I have them, but roughly, in our hands, that 40% number is ballpark correct, at least for some basic operations (e.g textFile, count, reduce). -- Jeremy - Jeremy Freeman, PhD Neuroscientist @thefreemanlab -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Scala-vs-Python-perfor mance-differences-tp4247p4261.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: Scala vs Python performance differences
I was about to ask this question. On Wed, Nov 12, 2014 at 3:42 PM, Andrew Ash and...@andrewash.com wrote: Jeremy, Did you complete this benchmark in a way that's shareable with those interested here? Andrew On Tue, Apr 15, 2014 at 2:50 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: I'd also be interested in seeing such a benchmark. On Tue, Apr 15, 2014 at 9:25 AM, Ian Ferreira ianferre...@hotmail.com wrote: This would be super useful. Thanks. On 4/15/14, 1:30 AM, Jeremy Freeman freeman.jer...@gmail.com wrote: Hi Andrew, I'm putting together some benchmarks for PySpark vs Scala. I'm focusing on ML algorithms, as I'm particularly curious about the relative performance of MLlib in Scala vs the Python MLlib API vs pure Python implementations. Will share real results as soon as I have them, but roughly, in our hands, that 40% number is ballpark correct, at least for some basic operations (e.g textFile, count, reduce). -- Jeremy - Jeremy Freeman, PhD Neuroscientist @thefreemanlab -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Scala-vs-Python-perfor mance-differences-tp4247p4261.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: Scala vs Python performance differences
This would be super useful. Thanks. On 4/15/14, 1:30 AM, Jeremy Freeman freeman.jer...@gmail.com wrote: Hi Andrew, I'm putting together some benchmarks for PySpark vs Scala. I'm focusing on ML algorithms, as I'm particularly curious about the relative performance of MLlib in Scala vs the Python MLlib API vs pure Python implementations. Will share real results as soon as I have them, but roughly, in our hands, that 40% number is ballpark correct, at least for some basic operations (e.g textFile, count, reduce). -- Jeremy - Jeremy Freeman, PhD Neuroscientist @thefreemanlab -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Scala-vs-Python-perfor mance-differences-tp4247p4261.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: Scala vs Python performance differences
Hi Andrew, I'm putting together some benchmarks for PySpark vs Scala. I'm focusing on ML algorithms, as I'm particularly curious about the relative performance of MLlib in Scala vs the Python MLlib API vs pure Python implementations. Will share real results as soon as I have them, but roughly, in our hands, that 40% number is ballpark correct, at least for some basic operations (e.g textFile, count, reduce). -- Jeremy - Jeremy Freeman, PhD Neuroscientist @thefreemanlab -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Scala-vs-Python-performance-differences-tp4247p4261.html Sent from the Apache Spark User List mailing list archive at Nabble.com.