Our experience matches Reynold's comments: pure-Python implementations are
generally slower than pure-Scala ones, and Scala implementations exposed to
Python fall in between (faster than pure Python, but still slower than pure
Scala). At first glance, it also seems that some of the implementations in
the paper may not have been optimal, regardless of Python vs Scala.

All that said, we have found it useful to implement some workflows purely in
Python, mainly when we want to use libraries like NumPy, SciPy, or
scikit-learn, or to incorporate existing Python code bases. In those cases
the flexibility is worth the drop in performance, at least for us! This
trade-off probably makes more sense for specialized routines than for core,
low-level algorithms.
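
For illustration only, here is a minimal sketch of the kind of pure-Python
routine this refers to: NumPy does the numeric work on each partition, and
only small summaries are merged. It is not taken from any real code base.
The function names (partition_moments, merge, mean_and_variance) are
hypothetical, and the "partitions" are plain lists so that the sketch runs
without a cluster.

```python
from functools import reduce

import numpy as np


def partition_moments(rows):
    """Summarise one partition as (row count, column sums, column sums of squares)."""
    block = np.array(list(rows), dtype=float)
    if block.size == 0:
        return  # an empty partition contributes nothing
    yield block.shape[0], block.sum(axis=0), (block ** 2).sum(axis=0)


def merge(a, b):
    """Combine two partition summaries element-wise."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]


def mean_and_variance(moments):
    """Turn merged moments into per-column mean and population variance."""
    n, total, total_sq = moments
    mean = total / n
    return mean, total_sq / n - mean ** 2


# Local stand-in for two partitions of two-column rows (made-up data).
partitions = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]]
partials = [m for part in partitions for m in partition_moments(iter(part))]
mean, var = mean_and_variance(reduce(merge, partials))

# Assumption: with PySpark, given an RDD `rdd` of numeric rows, the same
# functions would be wired up as
#     rdd.mapPartitions(partition_moments).reduce(merge)
```

The point of the pattern is that per-row Python overhead is replaced by one
vectorized NumPy call per partition, which is where the flexibility tends to
cost the least.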



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/A-Comparison-of-Platforms-for-Implementing-and-Running-Very-Large-Scale-Machine-Learning-Algorithms-tp7823p7825.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
