Our experience matches Reynold's comments: pure-Python implementations are generally slower than pure-Scala implementations, and also slower than Scala implementations exposed to Python (which beat pure Python but are still slower than pure Scala). At first glance, it also seems that some of the implementations in the paper might not have been optimal in themselves, regardless of Python vs. Scala.
All that said, we have found it useful to implement some workflows purely in Python, mainly when we want to exploit libraries like NumPy, SciPy, or scikit-learn, or to incorporate existing Python code bases. In those cases the flexibility is worth the drop in performance, at least for us! This trade-off might also make more sense for specialized routines than for core, low-level algorithms.

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/A-Comparison-of-Platforms-for-Implementing-and-Running-Very-Large-Scale-Machine-Learning-Algorithms-tp7823p7825.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.