Re: Scala vs Python performance differences

2015-01-16 Thread philpearl
I was interested in this as I had some Spark code in Python that was too slow
and wanted to know whether Scala would fix it for me.  So I re-wrote my code
in Scala.

In my particular case the Scala version was 10 times faster.  But I think
that is because I did an awful lot of computation in my own code rather than
in a library like numpy. (I put a bit more detail here:
http://tttv-engineering.tumblr.com/post/108260351966/spark-python-vs-scala
in case you are interested.)

So there's one data point, if only for the obvious case of comparing
computation in Scala with computation in pure Python.
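
To make the "own code versus library" distinction concrete, here is a minimal
sketch. It is not the code from the post; the function names and the data are
made up for illustration, and the C-implemented built-ins sum() and map() stand
in for a library like numpy. The first function does every step in the Python
interpreter, the second hands the loop to compiled code. Timings vary by
machine and Python version, so none are quoted here.

```python
import operator
import timeit


def sum_squares_loop(values):
    """Sum of squares with every step executed by the Python interpreter."""
    total = 0
    for v in values:
        total += v * v
    return total


def sum_squares_builtin(values):
    """Same result, but the loop runs inside the C-implemented sum() and map()."""
    return sum(map(operator.mul, values, values))


# Hypothetical input, chosen only to make the timing measurable.
data = list(range(100_000))
assert sum_squares_loop(data) == sum_squares_builtin(data)

loop_s = timeit.timeit(lambda: sum_squares_loop(data), number=20)
builtin_s = timeit.timeit(lambda: sum_squares_builtin(data), number=20)
print(f"explicit loop: {loop_s:.3f}s, built-ins: {builtin_s:.3f}s")
```

The more of the per-element work that stays in interpreted Python, the more a
rewrite in a compiled language is likely to help.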





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Scala-vs-Python-performance-differences-tp4247p21190.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Scala vs Python performance differences

2015-01-16 Thread Davies Liu
Hey Phil,

Thank you for sharing this. The result didn't surprise me much. It's normal to
do the prototype in Python; once it gets stable and you really need the
performance, rewriting part of it in C, or all of it in another language, does
make sense, and it will not cost you much time.

Davies




Re: Scala vs Python performance differences

2014-11-12 Thread Andrew Ash
Jeremy,

Did you complete this benchmark in a way that's shareable with those
interested here?

Andrew

On Tue, Apr 15, 2014 at 2:50 PM, Nicholas Chammas 
nicholas.cham...@gmail.com wrote:

 I'd also be interested in seeing such a benchmark.








Re: Scala vs Python performance differences

2014-11-12 Thread Samarth Mailinglist
I was about to ask this question.








Re: Scala vs Python performance differences

2014-04-15 Thread Ian Ferreira
This would be super useful. Thanks.





Re: Scala vs Python performance differences

2014-04-14 Thread Jeremy Freeman
Hi Andrew,

I'm putting together some benchmarks for PySpark vs Scala. I'm focusing on
ML algorithms, as I'm particularly curious about the relative performance of
MLlib in Scala vs the Python MLlib API vs pure Python implementations. 

Will share real results as soon as I have them, but roughly, in our hands,
that 40% number is ballpark correct, at least for some basic operations (e.g.
textFile, count, reduce).
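
For anyone who wants to time basic operations like these themselves, a small
harness along the following lines may be a starting point. It is only a sketch
and not the benchmark described above: time_operation and the operations table
are hypothetical names, and plain Python lists stand in for an RDD so that it
runs without a cluster. With Spark, the callables would instead wrap calls such
as sc.textFile(path).count().

```python
import statistics
import time
from functools import reduce


def time_operation(fn, repeats=5):
    """Run fn() `repeats` times; return (last result, median wall-clock seconds)."""
    durations = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        durations.append(time.perf_counter() - start)
    return result, statistics.median(durations)


# Local stand-in data; with Spark this would be an RDD from sc.textFile(path).
lines = [str(i) for i in range(50_000)]

# Hypothetical operation table mirroring the count and reduce cases.
operations = {
    "count": lambda: len(lines),
    "reduce": lambda: reduce(lambda a, b: a + b, map(int, lines)),
}

for name, fn in operations.items():
    result, seconds = time_operation(fn)
    print(f"{name}: result={result}, median={seconds:.6f}s")
```

Taking the median over several runs reduces the effect of one-off costs such as
warm-up on the first run.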

-- Jeremy

-
Jeremy Freeman, PhD
Neuroscientist
@thefreemanlab



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Scala-vs-Python-performance-differences-tp4247p4261.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.