Re: What influences the space complexity of Spark operations?

2016-04-05 Thread Steve Johnston
Submitted: SPARK-14389 - OOM during BroadcastNestedLoopJoin.

Re: What influences the space complexity of Spark operations?

2016-04-01 Thread Michael Armbrust
ineitem.tbl').map(converter)
> lineitem = sqlContext.createDataFrame(lineitem, schema)
> lineitem.persist(StorageLevel.MEMORY_AND_DISK)
> repartitioned = lineitem.repartition(partition_count)
> joined = repartitioned.join(repartitioned)
> joined.show()
>
> *Questions*
>
> Generally, what

What influences the space complexity of Spark operations?

2016-03-31 Thread Steve Johnston
= lineitem.repartition(partition_count)
joined = repartitioned.join(repartitioned)
joined.show()

*Questions*

Generally, what influences the space complexity of Spark operations? Is it the case that a single partition of each operand’s data set + a single partition of the resulting data set all need to fit