Re: Running TPCDSQueryBenchmark results in java.lang.OutOfMemoryError

Ted Yu Mon, 23 May 2016 09:17:27 -0700

Can you tell us the commit hash using which the test was run ?

For #2, if you can give full stack trace, that would be nice.


Thanks

On Mon, May 23, 2016 at 8:58 AM, Ovidiu-Cristian MARCU <
ovidiu-cristian.ma...@inria.fr> wrote:

> Hi
>
> 1) Using latest spark 2.0 I've managed to run TPCDSQueryBenchmark first 9
> queries and then it ends in the OutOfMemoryError [1].
>
> *What was the configuration used for running this benchmark? Can you
> explain the meaning of 4 shuffle partitions? Thanks!*
>
> On my local system I use:
> ./bin/spark-submit --class
> org.apache.spark.sql.execution.benchmark.TPCDSQueryBenchmark --master
> local[4] jars/spark-sql_2.11-2.0.0-SNAPSHOT-tests.jar
> configured with:
>       .set("spark.sql.parquet.compression.codec", "snappy")
>       .set("spark.sql.shuffle.partitions", "4")
>       .set("spark.driver.memory", "3g")
>       .set("spark.executor.memory", "3g")
>       .set("spark.sql.autoBroadcastJoinThreshold", (20 * 1024 * 1024
> ).toString)
>
> Scale factor of TPCDS is 5, data generated using notes from
> https://github.com/databricks/spark-sql-perf.
>
> 2) Running spark-sql-perf with: val experiment =
> tpcds.runExperiment(tpcds.runnable) on the same dataset reveals some
> exceptions:
>
> Running execution *q9-v1.4* iteration: 1, StandardRun=true
> java.lang.NullPointerException
> at
> org.apache.spark.sql.execution.ScalarSubquery.dataType(subquery.scala:45)
> at
> org.apache.spark.sql.catalyst.expressions.CaseWhenBase.dataType(conditionalExpressions.scala:103)
> at
> org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:165)
> at
> org.apache.spark.sql.execution.ProjectExec$$anonfun$output$1.apply(basicPhysicalOperators.scala:33)
> ... at
> org.apache.spark.sql.execution.ProjectExec.output(basicPhysicalOperators.scala:33)
> at
> org.apache.spark.sql.execution.WholeStageCodegenExec.output(WholeStageCodegenExec.scala:289)
> at
> org.apache.spark.sql.execution.DeserializeToObject$$anonfun$2.apply(objects.scala:61)
> at
> org.apache.spark.sql.execution.DeserializeToObject$$anonfun$2.apply(objects.scala:60)
> at
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774)
> at
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774)
>
> or
>
> Running execution q25-v1.4 iteration: 1, StandardRun=true
> java.lang.IllegalStateException: Task -1024 has already locked
> broadcast_755_piece0 for writing
> at
> org.apache.spark.storage.BlockInfoManager.lockForWriting(BlockInfoManager.scala:232)
> at
> org.apache.spark.storage.BlockManager.removeBlock(BlockManager.scala:1296)
>
> Best,
> Ovidiu
>
> [1]
> Exception in thread "broadcast-exchange-164" java.lang.OutOfMemoryError:
> Java heap space
> at
> org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.append(HashedRelation.scala:539)
> at
> org.apache.spark.sql.execution.joins.LongHashedRelation$.apply(HashedRelation.scala:803)
> at
> org.apache.spark.sql.execution.joins.HashedRelation$.apply(HashedRelation.scala:105)
> at
> org.apache.spark.sql.execution.joins.HashedRelationBroadcastMode.transform(HashedRelation.scala:816)
> at
> org.apache.spark.sql.execution.joins.HashedRelationBroadcastMode.transform(HashedRelation.scala:812)
> at
> org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:89)
> at
> org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:71)
> at
> org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.scala:94)
> at
> org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:71)
> at
> org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:71)
> at
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> at
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
>

Re: Running TPCDSQueryBenchmark results in java.lang.OutOfMemoryError

Reply via email to