[ https://issues.apache.org/jira/browse/SPARK-24986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-24986.
----------------------------------
    Resolution: Incomplete

> OOM in BufferHolder during writes to a stream
> ---------------------------------------------
>
>                 Key: SPARK-24986
>                 URL: https://issues.apache.org/jira/browse/SPARK-24986
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0, 2.2.0, 2.3.0
>            Reporter: Sanket Reddy
>            Priority: Major
>              Labels: bulk-closed
>
> We have seen an out-of-memory exception while running one of our production jobs. We expected the memory allocation to be managed by the unified memory manager at run time.
> The buffer grows during writes roughly as follows: if the row length is constant, the buffer does not grow; it keeps resetting and writing values into the existing allocation. If row lengths are variable and skewed, with very large values to be written, the buffer keeps growing until this failure occurs, and the estimator that requests the initial execution memory does not appear to account for that growth.
> Checking the available heap before growing the global buffer might be a viable option.
> java.lang.OutOfMemoryError: Java heap space
> at org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder.grow(BufferHolder.java:73)
> at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter.initialize(UnsafeArrayWriter.java:61)
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply_1$(Unknown Source)
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
> at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$generateResultProjection$1.apply(AggregationIterator.scala:232)
> at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$generateResultProjection$1.apply(AggregationIterator.scala:221)
> at org.apache.spark.sql.execution.aggregate.SortBasedAggregationIterator.next(SortBasedAggregationIterator.scala:159)
> at org.apache.spark.sql.execution.aggregate.SortBasedAggregationIterator.next(SortBasedAggregationIterator.scala:29)
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
> at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$GroupedIterator.takeDestructively(Iterator.scala:1075)
> at scala.collection.Iterator$GroupedIterator.go(Iterator.scala:1091)
> at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:1129)
> at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:1132)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
> at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:513)
> at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:329)
> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1966)
> at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:270)
> 18/06/11 21:18:41 ERROR SparkUncaughtExceptionHandler: [Container in shutdown] Uncaught exception in thread Thread[stdout writer for
> Python/bin/python3.6,5,main]
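To make the reporter's suggestion concrete, here is a minimal, hypothetical sketch of a write buffer that checks the apparent free heap before growing, instead of growing blindly and letting Arrays.copyOf throw the OOM. This is not Spark's actual BufferHolder code; the class GrowableBuffer, its fields, and the heap estimate via java.lang.Runtime are illustrative assumptions only.

{code:java}
// Toy sketch only: checks estimated free heap before doubling a byte buffer.
// Spark's real BufferHolder would instead need to coordinate with the
// unified/task memory manager rather than query the JVM runtime directly.
public class GrowableBuffer {
    private byte[] buffer;
    private int cursor = 0;

    public GrowableBuffer(int initialSize) {
        buffer = new byte[initialSize];
    }

    // Ensure 'neededSize' additional bytes fit, growing the buffer if required.
    public void grow(int neededSize) {
        long required = (long) cursor + neededSize;
        if (required <= buffer.length) {
            return; // enough room already, nothing to do
        }
        // Grow to at least double the current size, capped at the array limit.
        long newLength = Math.min(Math.max((long) buffer.length * 2, required),
                                  Integer.MAX_VALUE);
        if (newLength < required) {
            throw new IllegalStateException("Row too large for a single buffer");
        }

        // Estimate how much heap is still available to this JVM.
        Runtime rt = Runtime.getRuntime();
        long freeHeap = rt.maxMemory() - (rt.totalMemory() - rt.freeMemory());
        if (newLength > freeHeap) {
            // Fail early with a descriptive error (or spill) instead of letting
            // the allocation below die with a raw java.lang.OutOfMemoryError.
            throw new IllegalStateException(
                "Cannot grow buffer to " + newLength + " bytes; only about "
                + freeHeap + " bytes of heap appear to be available");
        }

        byte[] tmp = new byte[(int) newLength];
        System.arraycopy(buffer, 0, tmp, 0, cursor);
        buffer = tmp;
    }
}
{code}

The heap check here is only a heuristic (other threads can consume memory between the check and the allocation); a real fix would presumably have the buffer request execution memory through Spark's memory manager so skewed, very wide rows are accounted for like any other execution allocation.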