[ https://issues.apache.org/jira/browse/SPARK-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14944166#comment-14944166 ]
Hans van den Bogert edited comment on SPARK-10474 at 10/5/15 10:39 PM: ----------------------------------------------------------------------- Had this patch against tag 1.5.1: https://gist.github.com/f110f64887f4739b7dd8 Output of the added println()'s are near the beginning in: {noformat} Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties To adjust logging level use sc.setLogLevel("INFO") Welcome to ____ __ / __/__ ___ _____/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 1.5.1 /_/ Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0) Type in expressions to have them evaluated. Type :help for more information. 15/10/06 00:16:00 WARN SparkConf: In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN). numCores:0 1048576 15/10/06 00:16:02 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set. I1006 00:16:02.414851 25123 sched.cpp:137] Version: 0.21.0 I1006 00:16:02.423246 25115 sched.cpp:234] New master detected at master@10.149.3.5:5050 I1006 00:16:02.423482 25115 sched.cpp:242] No credentials provided. Attempting to register without authentication I1006 00:16:02.427250 25121 sched.cpp:408] Framework registered with 20151006-001500-84120842-5050-6847-0000 15/10/06 00:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Spark context available as sc. 15/10/06 00:16:04 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies) 15/10/06 00:16:04 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies) 15/10/06 00:16:09 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0 15/10/06 00:16:09 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException 15/10/06 00:16:11 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies) 15/10/06 00:16:11 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies) 15/10/06 00:16:14 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable SQL context available as sqlContext. {noformat} was (Author: hbogert): Had this patch against tag 1.5.1: https://gist.github.com/f110f64887f4739b7dd8.git Output of the added println()'s are near the beginning in: {noformat} Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties To adjust logging level use sc.setLogLevel("INFO") Welcome to ____ __ / __/__ ___ _____/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 1.5.1 /_/ Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0) Type in expressions to have them evaluated. Type :help for more information. 15/10/06 00:16:00 WARN SparkConf: In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN). numCores:0 1048576 15/10/06 00:16:02 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set. I1006 00:16:02.414851 25123 sched.cpp:137] Version: 0.21.0 I1006 00:16:02.423246 25115 sched.cpp:234] New master detected at master@10.149.3.5:5050 I1006 00:16:02.423482 25115 sched.cpp:242] No credentials provided. Attempting to register without authentication I1006 00:16:02.427250 25121 sched.cpp:408] Framework registered with 20151006-001500-84120842-5050-6847-0000 15/10/06 00:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Spark context available as sc. 15/10/06 00:16:04 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies) 15/10/06 00:16:04 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies) 15/10/06 00:16:09 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0 15/10/06 00:16:09 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException 15/10/06 00:16:11 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies) 15/10/06 00:16:11 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies) 15/10/06 00:16:14 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable SQL context available as sqlContext. {noformat} > TungstenAggregation cannot acquire memory for pointer array after switching > to sort-based > ----------------------------------------------------------------------------------------- > > Key: SPARK-10474 > URL: https://issues.apache.org/jira/browse/SPARK-10474 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 1.5.0 > Reporter: Yi Zhou > Assignee: Andrew Or > Priority: Blocker > Fix For: 1.5.1, 1.6.0 > > > In aggregation case, a Lost task happened with below error. > {code} > java.io.IOException: Could not acquire 65536 bytes of memory > at > org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.initializeForWriting(UnsafeExternalSorter.java:169) > at > org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:220) > at > org.apache.spark.sql.execution.UnsafeKVExternalSorter.<init>(UnsafeKVExternalSorter.java:126) > at > org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.destructAndCreateExternalSorter(UnsafeFixedWidthAggregationMap.java:257) > at > org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.switchToSortBasedAggregation(TungstenAggregationIterator.scala:435) > at > org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:379) > at > org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.start(TungstenAggregationIterator.scala:622) > at > org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.org$apache$spark$sql$execution$aggregate$TungstenAggregate$$anonfun$$executePartition$1(TungstenAggregate.scala:110) > at > org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119) > at > org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:119) > at > org.apache.spark.rdd.MapPartitionsWithPreparationRDD.compute(MapPartitionsWithPreparationRDD.scala:64) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:264) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > at org.apache.spark.scheduler.Task.run(Task.scala:88) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > {code} > Key SQL Query > {code:sql} > INSERT INTO TABLE test_table > SELECT > ss.ss_customer_sk AS cid, > count(CASE WHEN i.i_class_id=1 THEN 1 ELSE NULL END) AS id1, > count(CASE WHEN i.i_class_id=3 THEN 1 ELSE NULL END) AS id3, > count(CASE WHEN i.i_class_id=5 THEN 1 ELSE NULL END) AS id5, > count(CASE WHEN i.i_class_id=7 THEN 1 ELSE NULL END) AS id7, > count(CASE WHEN i.i_class_id=9 THEN 1 ELSE NULL END) AS id9, > count(CASE WHEN i.i_class_id=11 THEN 1 ELSE NULL END) AS id11, > count(CASE WHEN i.i_class_id=13 THEN 1 ELSE NULL END) AS id13, > count(CASE WHEN i.i_class_id=15 THEN 1 ELSE NULL END) AS id15, > count(CASE WHEN i.i_class_id=2 THEN 1 ELSE NULL END) AS id2, > count(CASE WHEN i.i_class_id=4 THEN 1 ELSE NULL END) AS id4, > count(CASE WHEN i.i_class_id=6 THEN 1 ELSE NULL END) AS id6, > count(CASE WHEN i.i_class_id=8 THEN 1 ELSE NULL END) AS id8, > count(CASE WHEN i.i_class_id=10 THEN 1 ELSE NULL END) AS id10, > count(CASE WHEN i.i_class_id=14 THEN 1 ELSE NULL END) AS id14, > count(CASE WHEN i.i_class_id=16 THEN 1 ELSE NULL END) AS id16 > FROM store_sales ss > INNER JOIN item i ON ss.ss_item_sk = i.i_item_sk > WHERE i.i_category IN ('Books') > AND ss.ss_customer_sk IS NOT NULL > GROUP BY ss.ss_customer_sk > HAVING count(ss.ss_item_sk) > 5 > {code} > Note: > the store_sales is a big fact table and item is a small dimension table. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org