Do you need more information?

> On 23 May 2016, at 19:16, Ovidiu-Cristian MARCU 
> <ovidiu-cristian.ma...@inria.fr> wrote:
> 
> Yes,
> 
> git log
> commit dafcb05c2ef8e09f45edfb7eabf58116c23975a0
> Author: Sameer Agarwal <sam...@databricks.com>
> Date:   Sun May 22 23:32:39 2016 -0700
> 
> For #2, see my comments in https://issues.apache.org/jira/browse/SPARK-15078
> 
>> On 23 May 2016, at 18:16, Ted Yu <yuzhih...@gmail.com> wrote:
>> 
>> Can you tell us which commit hash the test was run with?
>> 
>> For #2, if you can give the full stack trace, that would be nice.
>> 
>> Thanks
>> 
>> On Mon, May 23, 2016 at 8:58 AM, Ovidiu-Cristian MARCU
>> <ovidiu-cristian.ma...@inria.fr> wrote:
>> Hi,
>> 
>> 1) Using the latest Spark 2.0, I've managed to run the first 9 queries of
>> TPCDSQueryBenchmark, but then it fails with an OutOfMemoryError [1].
>> 
>> What configuration was used to run this benchmark? Can you explain the
>> choice of 4 shuffle partitions? Thanks!
>> 
>> On my local system I use:
>> 
>>   ./bin/spark-submit \
>>     --class org.apache.spark.sql.execution.benchmark.TPCDSQueryBenchmark \
>>     --master local[4] \
>>     jars/spark-sql_2.11-2.0.0-SNAPSHOT-tests.jar
>> 
>> configured with:
>> 
>>   // Parquet compression codec for the generated input data
>>   .set("spark.sql.parquet.compression.codec", "snappy")
>>   // number of partitions used for SQL shuffles (joins, aggregations)
>>   .set("spark.sql.shuffle.partitions", "4")
>>   .set("spark.driver.memory", "3g")
>>   .set("spark.executor.memory", "3g")
>>   // tables up to 20 MB are broadcast instead of shuffled in joins
>>   .set("spark.sql.autoBroadcastJoinThreshold", (20 * 1024 * 1024).toString)
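>> 
>> One thing I'm not sure about: in local mode the driver JVM is already
>> running when these .set(...) calls execute, so spark.driver.memory set
>> here may have no effect. A variant of my command passing the memory on
>> the command line instead (just a sketch of what I plan to try):
>> 
>>   ./bin/spark-submit \
>>     --class org.apache.spark.sql.execution.benchmark.TPCDSQueryBenchmark \
>>     --master local[4] \
>>     --driver-memory 3g \
>>     jars/spark-sql_2.11-2.0.0-SNAPSHOT-tests.jar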
>> 
>> The TPC-DS scale factor is 5; the data was generated following the notes
>> from https://github.com/databricks/spark-sql-perf.
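>> 
>> For reference, a minimal sketch of how I generated the tables, assuming
>> the Tables API described in the spark-sql-perf README (the paths and the
>> exact parameter list are assumptions and may differ across versions):
>> 
>>   import com.databricks.spark.sql.perf.tpcds.Tables
>> 
>>   // dsdgen is built locally from the TPC-DS kit; path is an assumption
>>   val tables = new Tables(sqlContext, "/path/to/tpcds-kit/tools", 5) // scale factor 5
>>   tables.genData("/path/to/tpcds-sf5", "parquet",
>>     true,   // overwrite
>>     false,  // partitionTables
>>     false,  // useDoubleForDecimal
>>     false,  // clusterByPartitionColumns
>>     false)  // filterOutNullPartitionValues
>>   tables.createTemporaryTables("/path/to/tpcds-sf5", "parquet")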
>> 
>> 2) Running spark-sql-perf on the same dataset reveals some exceptions.
>> I invoke it with val experiment = tpcds.runExperiment(tpcds.runnable);
>> a minimal sketch of the surrounding setup (assuming the standard TPCDS
>> class from spark-sql-perf; the waitForFinish helper is an assumption):
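>> 
>>   import com.databricks.spark.sql.perf.tpcds.TPCDS
>> 
>>   val tpcds = new TPCDS(sqlContext = sqlContext)
>>   // tpcds.runnable should be the subset of TPC-DS queries known to run on Spark SQL
>>   val experiment = tpcds.runExperiment(tpcds.runnable)
>>   experiment.waitForFinish(60 * 60) // block up to one hour
>> 
>> The exceptions: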
>> 
>> Running execution q9-v1.4 iteration: 1, StandardRun=true
>> java.lang.NullPointerException
>>      at org.apache.spark.sql.execution.ScalarSubquery.dataType(subquery.scala:45)
>>      at org.apache.spark.sql.catalyst.expressions.CaseWhenBase.dataType(conditionalExpressions.scala:103)
>>      at org.apache.spark.sql.catalyst.expressions.Alias.toAttribute(namedExpressions.scala:165)
>>      at org.apache.spark.sql.execution.ProjectExec$$anonfun$output$1.apply(basicPhysicalOperators.scala:33)
>> ...
>>      at org.apache.spark.sql.execution.ProjectExec.output(basicPhysicalOperators.scala:33)
>>      at org.apache.spark.sql.execution.WholeStageCodegenExec.output(WholeStageCodegenExec.scala:289)
>>      at org.apache.spark.sql.execution.DeserializeToObject$$anonfun$2.apply(objects.scala:61)
>>      at org.apache.spark.sql.execution.DeserializeToObject$$anonfun$2.apply(objects.scala:60)
>>      at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774)
>>      at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$23.apply(RDD.scala:774)
>> 
>> or
>> 
>> Running execution q25-v1.4 iteration: 1, StandardRun=true
>> java.lang.IllegalStateException: Task -1024 has already locked broadcast_755_piece0 for writing
>>      at org.apache.spark.storage.BlockInfoManager.lockForWriting(BlockInfoManager.scala:232)
>>      at org.apache.spark.storage.BlockManager.removeBlock(BlockManager.scala:1296)
>> 
>> Best,
>> Ovidiu
>> 
>> [1]
>> Exception in thread "broadcast-exchange-164" java.lang.OutOfMemoryError: Java heap space
>>      at org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.append(HashedRelation.scala:539)
>>      at org.apache.spark.sql.execution.joins.LongHashedRelation$.apply(HashedRelation.scala:803)
>>      at org.apache.spark.sql.execution.joins.HashedRelation$.apply(HashedRelation.scala:105)
>>      at org.apache.spark.sql.execution.joins.HashedRelationBroadcastMode.transform(HashedRelation.scala:816)
>>      at org.apache.spark.sql.execution.joins.HashedRelationBroadcastMode.transform(HashedRelation.scala:812)
>>      at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:89)
>>      at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:71)
>>      at org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.scala:94)
>>      at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:71)
>>      at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:71)
>>      at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>>      at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>>      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>      at java.lang.Thread.run(Thread.java:745)
>> 
> 
