[jira] [Comment Edited] (SPARK-1867) Spark Documentation Error causes java.lang.IllegalStateException: unread block data
[ https://issues.apache.org/jira/browse/SPARK-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987190#comment-14987190 ]

melvin mendoza edited comment on SPARK-1867 at 11/3/15 12:33 PM:
-

[~srowen] I'm having a problem with Spark:

java.lang.IllegalStateException: unread block data
        at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Code snippet:

// imports implied by the snippet
import org.apache.log4j.Logger
import org.apache.phoenix.spark._
import org.apache.spark.{SparkConf, SparkContext}

def main(args: Array[String]) {
  val LOG = Logger.getLogger(this.getClass().getName() + "Testing")
  LOG.info("SAMPLE START")
  LOG.info("Testing")
  try {
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    val phoenixSpark = sc.phoenixTableAsRDD(
      "SAMPLE_TABLE",
      Seq("ID", "NAME"),
      zkUrl = Some("r3r31gateway.clustered.com:2181:/hbase-unsecure"))
    val name = phoenixSpark.map(f => f.toString())
    val sample = phoenixSpark.map(f => f.get("ID") + "," + f.get("NAME"))
    sample.foreach(println)
    LOG.info("SAMPLE TABLE: " + name.toString())
    sc.stop()
  } catch {
    case e: Exception =>
      e.printStackTrace()
      val msg = e.getMessage
      LOG.error("Phoenix Testing failure: errorMsg: " + msg)
  }
}

I'm using HDP 2.2.
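One common trigger for this trace is the executors deserializing the task without the same Phoenix client classes the driver serialized it with. A minimal sketch of shipping the client jar explicitly; the path below is a guess at an HDP layout, not something taken from this report:

{code}
import org.apache.phoenix.spark._
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("PhoenixSample")
  // Hypothetical HDP path: point this at the phoenix-client jar actually
  // installed on your cluster, or pass it via spark-submit --jars instead.
  .setJars(Seq("/usr/hdp/current/phoenix-client/phoenix-client.jar"))
val sc = new SparkContext(conf)

val rdd = sc.phoenixTableAsRDD(
  "SAMPLE_TABLE",
  Seq("ID", "NAME"),
  zkUrl = Some("r3r31gateway.clustered.com:2181:/hbase-unsecure"))
println(rdd.count())
{code}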
[jira] [Comment Edited] (SPARK-1867) Spark Documentation Error causes java.lang.IllegalStateException: unread block data
[ https://issues.apache.org/jira/browse/SPARK-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526627#comment-14526627 ]

Marc Reichman edited comment on SPARK-1867 at 5/4/15 1:33 PM:
--

I'm running into the same problem and have yet to see it resolved. My driver and workers are running 1.7.0_71, and I'm building with 1.7.0_71 as well (albeit on a Windows machine, but that shouldn't matter, I hope!). I'm using the Spark-provided 1.3.1-hadoop2.6 bundle at runtime and the Maven spark-core artifact to build. Both sets of Spark components appear to be built with 1.6.0_30.

I'm using the AccumuloInputFormat with the new-API hadoop RDD method. My Key.class and Value.class from Accumulo are on the KryoSerializer registration list.

I do NOT run into this issue with local execution, but I do run into it when submitting with spark-submit to YARN or the Spark master. The trace is similar to the previous comment, with the ObjectInputStream steps. It smells like an issue serializing either the driver class or a closure inside it.

I'm currently double-checking that all my versions are lined up as they should be. I'm holding off on building everything by hand with my 1.7.0_71 JDK, but I will probably try that later if I can't resolve it otherwise. No CDH is involved in any way; my hadoop build is the binary 2.6.0 from Apache.
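A minimal sketch of the setup described above, assuming Spark 1.3's registerKryoClasses and the mapreduce (new-API) AccumuloInputFormat; the app name is a placeholder and the Accumulo connection settings are omitted as site-specific:

{code}
import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat
import org.apache.accumulo.core.data.{Key, Value}
import org.apache.hadoop.conf.Configuration
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("AccumuloScan")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Register the Accumulo record types so Kryo handles them instead of
  // falling back to Java serialization.
  .registerKryoClasses(Array(classOf[Key], classOf[Value]))
val sc = new SparkContext(conf)

// Instance name, zookeepers, credentials, and table are configured on the
// Hadoop Configuration via AccumuloInputFormat's setters; omitted here.
val hadoopConf = new Configuration()
val rdd = sc.newAPIHadoopRDD(hadoopConf,
  classOf[AccumuloInputFormat],
  classOf[Key],
  classOf[Value])
println("rows: " + rdd.count())
{code}

Note that registering Key and Value only affects how record data is serialized; the task closure itself still goes through JavaSerializer, which is the code path in the trace above, so Kryo registration alone cannot fix a driver/executor class mismatch.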
[jira] [Comment Edited] (SPARK-1867) Spark Documentation Error causes java.lang.IllegalStateException: unread block data
[ https://issues.apache.org/jira/browse/SPARK-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520925#comment-14520925 ]

meiyoula edited comment on SPARK-1867 at 4/30/15 6:17 AM:
--

Hi all,
My cluster information is: /opt/jdk1.8.0_40, hadoop 2.6.0, hbase 1.0.0, zookeeper 3.5.0, spark 1.3 (all built by myself with JDK 8). These days I run the following command with the beeline shell:
{quote}
create table s1 (
  key1 string,
  c11 int,
  c12 string,
  c13 string,
  c14 string
)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties("hbase.columns.mapping" = ":key, info:c11, info:c12, info:c13, info:c14")
tblproperties("hbase.table.name" = "shb1");
{quote}
then run
{quote}
select * from s1;
{quote}
It always throws the exception:
{quote}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, vm115): java.lang.IllegalStateException: unread block data
        at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2424)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1383)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:69)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:95)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:193)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:74
{quote}
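To narrow this down outside of beeline, the same query can be issued from spark-shell through Spark 1.3's HiveContext. A sketch, assuming hive-site.xml and the hive-hbase-handler/HBase jars are on both the driver and executor classpaths:

{code}
import org.apache.spark.sql.hive.HiveContext

// `sc` is the SparkContext provided by spark-shell. If executors lack the
// HBase storage handler classes, the scan task fails during deserialization,
// which matches the JavaSerializer frames in the trace above.
val hiveContext = new HiveContext(sc)
hiveContext.sql("SELECT * FROM s1").collect().foreach(println)
{code}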
[jira] [Comment Edited] (SPARK-1867) Spark Documentation Error causes java.lang.IllegalStateException: unread block data
[ https://issues.apache.org/jira/browse/SPARK-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212852#comment-14212852 ]

Anson Abraham edited comment on SPARK-1867 at 11/17/14 10:40 PM:
-

Yes. I added 3 data nodes just for this, with 2 of them as my worker nodes and the other as my master, and I am still getting the issue. The jar files were all supplied by Cloudera, and everything was installed through Cloudera Manager parcels: I installed Spark standalone on CDH 5.2 as stated above, so the jars have to be the same. Just in case, I rsync'd them across the machines and am still hitting the issue. This all occurs when running through spark-shell, of course.

> Spark Documentation Error causes java.lang.IllegalStateException: unread
> block data
> ---
>
> Key: SPARK-1867
> URL: https://issues.apache.org/jira/browse/SPARK-1867
> Project: Spark
> Issue Type: Bug
> Reporter: sam
>
> I've employed two System Administrators on a contract basis (for quite a bit
> of money), and both contractors have independently hit the following
> exception. What we are doing is:
> 1. Installing Spark 0.9.1 according to the documentation on the website,
> along with CDH4 (and another cluster with CDH5) distros of hadoop/hdfs.
> 2. Building a fat jar with a Spark app with sbt, then trying to run it on the
> cluster.
> I've also included code snippets and sbt deps at the bottom.
> When I've Googled this, there seem to be two somewhat vague responses:
> a) Mismatching Spark versions on nodes/user code
> b) Need to add more jars to the SparkConf
> Now I know that (b) is not the problem, having successfully run the same code
> on other clusters while only including one jar (it's a fat jar).
> But I have no idea how to check for (a) - it appears Spark doesn't have any
> version checks or anything - it would be nice if it checked versions and
> threw a "mismatching version exception: you have user code using version X
> and node Y has version Z".
> I would be very grateful for advice on this.
> The exception:
> Exception in thread "main" org.apache.spark.SparkException: Job aborted: Task
> 0.0:1 failed 32 times (most recent failure: Exception failure:
> java.lang.IllegalStateException: unread block data)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
>         at scala.Option.foreach(Option.scala:236)
>         at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 14/05/16 18:05:31 INFO scheduler.TaskSetManager: Loss was due to
> java.lang.IllegalStateException: unread block data [duplicate 59]
> My code snippet:
> val conf = new SparkConf()
>   .setMaster(clusterMaster)
>   .setAppName(appName)
>   .setSparkHome(sparkHome)
>   .setJars(SparkContext.jarOfClass(this.getClass))
> println("count = " + new SparkContext(conf).textFile(someHdfsPath).count())
> My SBT de
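The reporter's point (a) can at least be probed by hand, since Spark does no such check itself. A diagnostic sketch (the app name and partition counts are arbitrary): ask every executor which JDK it runs and which jar its Spark classes were loaded from, and compare against the driver. Any disagreement across hosts is exactly the version skew suspected here.

{code}
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("version-probe"))

// Fan a trivial job across the cluster and report, per executor host, the
// JVM version and where the Spark classes came from.
val probes = sc.parallelize(1 to 100, 20).map { _ =>
  val host = java.net.InetAddress.getLocalHost.getHostName
  val jdk  = System.getProperty("java.version")
  // getCodeSource can be null under some classloaders; guard for that.
  val sparkJar = Option(classOf[SparkContext].getProtectionDomain.getCodeSource)
    .map(_.getLocation.toString).getOrElse("<unknown>")
  (host, jdk, sparkJar)
}.distinct().collect()

println("driver: JDK " + System.getProperty("java.version"))
probes.foreach { case (host, jdk, jar) =>
  println(host + ": JDK " + jdk + ", spark classes from " + jar)
}
{code}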