Submitted: SPARK-14389 - OOM during BroadcastNestedLoopJoin.
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/What-influences-the-space-complexity-of-Spark-operations-tp16944p17029.html
Sent from the Apache Spark Developers List mailing list archive
> ineitem.tbl').map(converter)
> lineitem = sqlContext.createDataFrame(lineitem, schema)
> lineitem.persist(StorageLevel.MEMORY_AND_DISK)
> repartitioned = lineitem.repartition(partition_count)
> joined = repartitioned.join(repartitioned)
> joined.show()
>
>
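For context on why the quoted snippet is memory-hungry: `repartitioned.join(repartitioned)` with no join condition is a cross product, so the output row count grows quadratically with the input. A minimal sketch of that arithmetic (plain Python; the row count is an illustrative assumption, not a measurement from this thread):

```python
# Illustrative arithmetic only -- the row count below is a made-up assumption,
# not a measurement from the thread.
def cross_join_rows(n_left, n_right):
    """A join with no condition pairs every left row with every right row."""
    return n_left * n_right

rows = 6_000_000                        # hypothetical lineitem row count
out = cross_join_rows(rows, rows)
print(out)                              # 36_000_000_000_000 rows: quadratic blow-up
```

Even before any per-row overhead, an output that is the square of the input is what pushes a condition-less self-join toward OOM.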
> *Questions*
>
> Generally, what influences the space complexity of Spark operations? Is it
> the case that a single partition of each operand’s data set + a single
> partition of the resulting data set all need to fit
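One way to reason about the question above is to assume exactly that model: one partition of each join input plus one partition of the output resident at once. A hedged back-of-the-envelope sketch of that estimate (all sizes and counts are hypothetical; real accounting depends on Spark's internal row format and spill behavior):

```python
# Hypothetical model: peak memory ~= one partition of each join input
# plus one partition of the join output, all resident simultaneously.
# Every number here is an illustrative assumption, not a Spark measurement.
def partition_bytes(total_rows, row_bytes, num_partitions):
    """Approximate size of one evenly sized partition."""
    return total_rows * row_bytes // num_partitions

left = partition_bytes(6_000_000, 100, 200)     # one input partition
right = partition_bytes(6_000_000, 100, 200)    # same table, self-join

# A condition-less self-join pairs every row of the left partition with
# every row of the right partition, so one output partition is far larger.
out_rows = (6_000_000 // 200) * (6_000_000 // 200)
out = out_rows * 200                            # assume 200-byte joined rows

peak = left + right + out
print(peak)
```

Under this (assumed) model the output partition dominates the footprint by several orders of magnitude, which is consistent with the join, not the inputs, being what runs out of memory.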