On Wed, May 25, 2016 at 9:52 AM, Jörn Franke <jornfra...@gmail.com> wrote:
> Spark is more for machine learning working iteratively over the whole same
> dataset in memory. Additionally it has streaming and graph processing
> capabilities that can be used together.

Hi Jörn,

The first part is actually not true. Spark can handle data far greater than the aggregate memory available on a cluster. The more recent versions (1.3+) of Spark have external (spill-to-disk) implementations of almost all built-in operators, and while things may not be perfect, those external operators are becoming more and more robust with each version of Spark.
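
To make "external operator" concrete, here is a rough sketch of the general idea behind an external sort: keep only a bounded buffer in memory, spill sorted runs to disk, then stream-merge the runs. This is plain Python for illustration only, not Spark's actual implementation; the name `external_sort` and the `max_in_memory` parameter are made up for this example.

```python
import heapq
import os
import tempfile


def external_sort(values, max_in_memory=1000):
    """Yield ints from `values` in sorted order, buffering at most
    `max_in_memory` items in memory at a time (hypothetical helper)."""
    run_paths = []
    buffer = []

    def spill():
        # Sort the in-memory buffer and write it to disk as one sorted run.
        buffer.sort()
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{v}\n" for v in buffer)
        run_paths.append(path)
        buffer.clear()

    for v in values:
        buffer.append(v)
        if len(buffer) >= max_in_memory:
            spill()
    if buffer:
        spill()

    def read_run(path):
        # Stream one sorted run back from disk, one value at a time.
        with open(path) as f:
            for line in f:
                yield int(line)

    try:
        # heapq.merge is lazy: it holds roughly one item per run in memory.
        yield from heapq.merge(*(read_run(p) for p in run_paths))
    finally:
        # Clean up the spill files once the merge is finished.
        for p in run_paths:
            os.remove(p)
```

Calling `list(external_sort(data, max_in_memory=512))` gives the same result as `sorted(data)`, but the peak in-memory buffer is bounded by `max_in_memory` rather than by the size of the input. The same spill-and-merge pattern is what lets an operator work on data larger than available memory, at the cost of extra disk I/O.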