Hi,

Below is an error I got while using Spark 1.6.1 on AWS EMR 4.5, and I am trying to understand exactly what it is telling me. I see the exception, then what I assume is the plan being executed, then the resulting stack trace, followed by two "Caused by" stack traces, and then a driver stack trace followed by the same two "Caused by" traces repeated. I am used to errors that produce only one stack trace. Can someone explain why I am getting six stack traces (four unique ones)? Should I focus on one of these traces over the others? I believe there was a single originating error that then cascaded across the workers and the driver; I am just not sure which trace points to the original.

I also assume I am misreading this particular stack trace, because the first part confuses me. I thought that, because of lazy evaluation, persist calls do not actually trigger work and that only actions do, yet the initial stack trace appears to show a persist call executing the underlying plan.
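To make the question concrete, here is a stripped-down sketch of the pattern I thought I was following (the path, app name, and filter below are placeholders, not my actual job):

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="persist-laziness-sketch")
    sqlContext = SQLContext(sc)

    # Lazy: this should only build a plan, not run a job.
    part = sqlContext.read.parquet("s3://some-bucket/part.parquet")
    almonds = part.filter("p_name LIKE '%almond%'")

    # My understanding: this only marks the DataFrame for caching...
    almonds.persist()

    # ...and only an action like this one actually executes the plan
    # (populating the cache as a side effect).
    print(almonds.count())

The stack trace below seems to contradict this model, since the job appears to start inside the persist call itself.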
-Thank you.
-James

Stack Trace:

An error occurred while calling o236.persist.
: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange rangepartitioning(nation#9998 ASC,o_year#9999 DESC,5120), None
+- ConvertToSafe
+- TungstenAggregate(key=[nation#9998,o_year#9999], functions=[(sum(amount#10000),mode=Final,isDistinct=false)], output=[nation#9998,o_year#9999,sum_profit#9988])
+- TungstenExchange hashpartitioning(nation#9998,o_year#9999,5120), None
+- TungstenAggregate(key=[nation#9998,o_year#9999], functions=[(sum(amount#10000),mode=Partial,isDistinct=false)], output=[nation#9998,o_year#9999,sum#10255])
+- Project [n_name#196 AS nation#9998,year(o_orderdate#96) AS o_year#9999,CheckOverflow((cast(CheckOverflow((cast(l_extendedprice#344 as decimal(13,2)) * CheckOverflow((1.00 - cast(l_discount#345 as decimal(13,2))), DecimalType(13,2))), DecimalType(26,4)) as decimal(27,4)) - cast(CheckOverflow((ps_supplycost#283 * l_quantity#343), DecimalType(25,4)) as decimal(27,4))), DecimalType(27,4)) AS amount#10000]
+- BroadcastHashJoin [s_nationkey#522], [n_nationkey#195], BuildRight
:- Project [l_extendedprice#344,l_quantity#343,ps_supplycost#283,l_discount#345,s_nationkey#522,o_orderdate#96]
: +- SortMergeJoin [l_orderkey#339], [o_orderkey#92]
: :- Sort [l_orderkey#339 ASC], false, 0
: : +- TungstenExchange hashpartitioning(l_orderkey#339,5120), None
: : +- Project [l_orderkey#339,l_extendedprice#344,l_quantity#343,ps_supplycost#283,l_discount#345,s_nationkey#522]
: : +- SortMergeJoin [l_suppkey#341,l_partkey#340], [ps_suppkey#281,ps_partkey#280]
: : :- Sort [l_suppkey#341 ASC,l_partkey#340 ASC], false, 0
: : : +- TungstenExchange hashpartitioning(l_suppkey#341,l_partkey#340,5120), None
: : : +- Project [l_orderkey#339,l_suppkey#341,l_extendedprice#344,l_quantity#343,l_partkey#340,l_discount#345,s_nationkey#522]
: : : +- SortMergeJoin [l_suppkey#341], [s_suppkey#519]
: : : :- Sort [l_suppkey#341 ASC], false, 0
: : : : +- TungstenExchange hashpartitioning(l_suppkey#341,5120), None
: : : : +- Project [l_orderkey#339,l_suppkey#341,l_extendedprice#344,l_quantity#343,l_partkey#340,l_discount#345]
: : : : +- SortMergeJoin [p_partkey#600], [l_partkey#340]
: : : : :- Sort [p_partkey#600 ASC], false, 0
: : : : : +- TungstenExchange hashpartitioning(p_partkey#600,5120), None
: : : : : +- Project [p_partkey#600]
: : : : : +- Filter Contains(p_name#601, almond)
: : : : : +- InMemoryColumnarTableScan [p_partkey#600,p_name#601], [Contains(p_name#601, almond)], InMemoryRelation [p_partkey#600,p_name#601,p_mfgr#602,p_brand#603,p_type#604,p_size#605,p_container#606,p_retailprice#607,p_comment#608], true, 10000, StorageLevel(true, true, false, true, 1), Scan ParquetRelation[p_partkey#600,p_name#601,p_mfgr#602,p_brand#603,p_type#604,p_size#605,p_container#606,p_retailprice#607,p_comment#608] InputPaths: s3://aqapop/DataAlgebraData/tpch/1000G/5120/part.parquet, None
: : : : +- Sort [l_partkey#340 ASC], false, 0
: : : : +- TungstenExchange hashpartitioning(l_partkey#340,5120), None
: : : : +- InMemoryColumnarTableScan [l_orderkey#339,l_suppkey#341,l_extendedprice#344,l_quantity#343,l_partkey#340,l_discount#345], InMemoryRelation [l_orderkey#339,l_partkey#340,l_suppkey#341,l_linenumber#342,l_quantity#343,l_extendedprice#344,l_discount#345,l_tax#346,l_returnflag#347,l_linestatus#348,l_shipdate#349,l_commitdate#350,l_receiptdate#351,l_shipinstruct#352,l_shipmode#353,l_comment#354], true, 10000, StorageLevel(true, true, false, true, 1), Scan ParquetRelation[l_orderkey#339,l_partkey#340,l_suppkey#341,l_linenumber#342,l_quantity#343,l_extendedprice#344,l_discount#345,l_tax#346,l_returnflag#347,l_linestatus#348,l_shipdate#349,l_commitdate#350,l_receiptdate#351,l_shipinstruct#352,l_shipmode#353,l_comment#354] InputPaths: s3://aqapop/DataAlgebraData/tpch/1000G/5120/lineitem.parquet, None
: : : +- Sort [s_suppkey#519 ASC], false, 0
: : : +- TungstenExchange hashpartitioning(s_suppkey#519,5120), None
: : : +- InMemoryColumnarTableScan [s_nationkey#522,s_suppkey#519], InMemoryRelation [s_suppkey#519,s_name#520,s_address#521,s_nationkey#522,s_phone#523,s_acctbal#524,s_comment#525], true, 10000, StorageLevel(true, true, false, true, 1), Scan ParquetRelation[s_suppkey#519,s_name#520,s_address#521,s_nationkey#522,s_phone#523,s_acctbal#524,s_comment#525] InputPaths: s3://aqapop/DataAlgebraData/tpch/1000G/5120/supplier.parquet, None
: : +- Sort [ps_suppkey#281 ASC,ps_partkey#280 ASC], false, 0
: : +- TungstenExchange hashpartitioning(ps_suppkey#281,ps_partkey#280,5120), None
: : +- InMemoryColumnarTableScan [ps_supplycost#283,ps_partkey#280,ps_suppkey#281], InMemoryRelation [ps_partkey#280,ps_suppkey#281,ps_availqty#282,ps_supplycost#283,ps_comment#284], true, 10000, StorageLevel(true, true, false, true, 1), Scan ParquetRelation[ps_partkey#280,ps_suppkey#281,ps_availqty#282,ps_supplycost#283,ps_comment#284] InputPaths: s3://aqapop/DataAlgebraData/tpch/1000G/5120/partsupp.parquet, None
: +- Sort [o_orderkey#92 ASC], false, 0
: +- TungstenExchange hashpartitioning(o_orderkey#92,5120), None
: +- InMemoryColumnarTableScan [o_orderkey#92,o_orderdate#96], InMemoryRelation [o_orderkey#92,o_custkey#93,o_orderstatus#94,o_totalprice#95,o_orderdate#96,o_orderpriority#97,o_clerk#98,o_shippriority#99,o_comment#100], true, 10000, StorageLevel(true, true, false, true, 1), Scan ParquetRelation[o_orderkey#92,o_custkey#93,o_orderstatus#94,o_totalprice#95,o_orderdate#96,o_orderpriority#97,o_clerk#98,o_shippriority#99,o_comment#100] InputPaths: s3://aqapop/DataAlgebraData/tpch/1000G/5120/orders.parquet, None
+- InMemoryColumnarTableScan [n_name#196,n_nationkey#195], InMemoryRelation [n_nationkey#195,n_name#196,n_regionkey#197,n_comment#198], true, 10000, StorageLevel(true, true, false, true, 1), Scan ParquetRelation[n_nationkey#195,n_name#196,n_regionkey#197,n_comment#198] InputPaths: s3://aqapop/DataAlgebraData/tpch/1000G/5120/nation.parquet, None
    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)
    at org.apache.spark.sql.execution.Exchange.doExecute(Exchange.scala:247)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
    at org.apache.spark.sql.execution.ConvertToUnsafe.doExecute(rowFormatConverters.scala:38)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
    at org.apache.spark.sql.execution.Sort.doExecute(Sort.scala:64)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
    at org.apache.spark.sql.execution.columnar.InMemoryRelation.buildBuffers(InMemoryColumnarTableScan.scala:129)
    at org.apache.spark.sql.execution.columnar.InMemoryRelation.<init>(InMemoryColumnarTableScan.scala:118)
    at org.apache.spark.sql.execution.columnar.InMemoryRelation$.apply(InMemoryColumnarTableScan.scala:41)
    at org.apache.spark.sql.execution.CacheManager$$anonfun$cacheQuery$1.apply(CacheManager.scala:93)
    at org.apache.spark.sql.execution.CacheManager.writeLock(CacheManager.scala:60)
    at org.apache.spark.sql.execution.CacheManager.cacheQuery(CacheManager.scala:84)
    at org.apache.spark.sql.DataFrame.persist(DataFrame.scala:1601)
    at sun.reflect.GeneratedMethodAccessor89.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:209)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 832 in stage 277.0 failed 4 times, most recent failure: Lost task 832.3 in stage 277.0 (TID 1044111, ip-10-0-0-194.ec2.internal): org.apache.spark.storage.BlockFetchException: Failed to fetch block from 1 locations. Most recent failure cause:
    at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:595)
    at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:585)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.storage.BlockManager.doGetRemote(BlockManager.scala:585)
    at org.apache.spark.storage.BlockManager.getRemote(BlockManager.scala:570)
    at org.apache.spark.storage.BlockManager.get(BlockManager.scala:630)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:44)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Connection from ip-10-0-0-193.ec2.internal/10.0.0.193:58887 closed
    at org.apache.spark.network.client.TransportResponseHandler.channelUnregistered(TransportResponseHandler.java:124)
    at org.apache.spark.network.server.TransportChannelHandler.channelUnregistered(TransportChannelHandler.java:94)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144)
    at io.netty.channel.DefaultChannelPipeline.fireChannelUnregistered(DefaultChannelPipeline.java:739)
    at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:659)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    ... 1 more

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
    at org.apache.spark.RangePartitioner$.sketch(Partitioner.scala:264)
    at org.apache.spark.RangePartitioner.<init>(Partitioner.scala:126)
    at org.apache.spark.sql.execution.Exchange.prepareShuffleDependency(Exchange.scala:179)
    at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:254)
    at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:248)
    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
    ... 32 more
Caused by: org.apache.spark.storage.BlockFetchException: Failed to fetch block from 1 locations. Most recent failure cause:
    at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:595)
    at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:585)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.storage.BlockManager.doGetRemote(BlockManager.scala:585)
    at org.apache.spark.storage.BlockManager.getRemote(BlockManager.scala:570)
    at org.apache.spark.storage.BlockManager.get(BlockManager.scala:630)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:44)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    ... 1 more
Caused by: java.io.IOException: Connection from ip-10-0-0-193.ec2.internal/10.0.0.193:58887 closed
    at org.apache.spark.network.client.TransportResponseHandler.channelUnregistered(TransportResponseHandler.java:124)
    at org.apache.spark.network.server.TransportChannelHandler.channelUnregistered(TransportChannelHandler.java:94)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144)
    at io.netty.channel.DefaultChannelPipeline.fireChannelUnregistered(DefaultChannelPipeline.java:739)
    at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:659)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    ... 1 more