RE: spark 1.1.0 save data to hdfs failed
I modified my pom.xml according to the Spark pom.xml. It is working right now. Hadoop2 classes are no longer packaged into my jar. Thanks.

From: eyc...@hotmail.com
To: so...@cloudera.com
CC: user@spark.apache.org
Subject: RE: spark 1.1.0 save data to hdfs failed
Date: Sat, 24 Jan 2015 07:30:45 -0800

Thanks for the information. I changed the dependencies for the Spark jars as follows:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.1.0</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.10</artifactId>
  <version>1.1.0</version>
  <scope>provided</scope>
</dependency>

I don't know how these libraries are built, but I saw that Spark has Maven pom files, and I think these jars should be built from the corresponding pom files. Those pom files have dependencies on hadoop version 1.0.4, so I don't know where the hadoop2 jar comes from. What follows is a major fragment of my current dependency tree. I don't know how the hadoop2 classes get into my built jar.

==

[INFO] | \- org.apache.hadoop:hadoop-core:jar:1.2.1:provided
[INFO] | +- xmlenc:xmlenc:jar:0.52:provided
[INFO] | +- (com.sun.jersey:jersey-core:jar:1.8:provided - omitted for duplicate)
[INFO] | +- (com.sun.jersey:jersey-json:jar:1.8:provided - omitted for duplicate)
[INFO] | +- (com.sun.jersey:jersey-server:jar:1.8:provided - omitted for duplicate)
[INFO] | +- (commons-io:commons-io:jar:2.1:provided - omitted for conflict with 2.4)
[INFO] | +- (commons-codec:commons-codec:jar:1.4:compile - scope updated from provided; omitted for duplicate)
[INFO] | +- (org.apache.commons:commons-math:jar:2.1:provided - omitted for duplicate)
[INFO] | +- commons-configuration:commons-configuration:jar:1.6:provided
[INFO] | | +- (commons-collections:commons-collections:jar:3.2.1:provided - omitted for duplicate)
[INFO] | | +- (commons-lang:commons-lang:jar:2.4:provided - omitted for conflict with 2.6)
[INFO] | | +- (commons-logging:commons-logging:jar:1.1.1:provided - omitted for duplicate)
[INFO] | | +- commons-digester:commons-digester:jar:1.8:provided
[INFO] | | | +- commons-beanutils:commons-beanutils:jar:1.7.0:provided
[INFO] | | | | \- (commons-logging:commons-logging:jar:1.0.3:provided - omitted for conflict with 1.1.1)
[INFO] | | | \- (commons-logging:commons-logging:jar:1.1:provided - omitted for conflict with 1.1.1)
[INFO] | | \- commons-beanutils:commons-beanutils-core:jar:1.8.0:provided
[INFO] | | \- (commons-logging:commons-logging:jar:1.1.1:provided - omitted for duplicate)
[INFO] | +- (commons-net:commons-net:jar:1.4.1:provided - omitted for conflict with 2.2)
[INFO] | +- commons-el:commons-el:jar:1.0:provided
[INFO] | | \- (commons-logging:commons-logging:jar:1.0.3:provided - omitted for conflict with 1.1.1)
[INFO] | +- hsqldb:hsqldb:jar:1.8.0.10:provided
[INFO] | +- oro:oro:jar:2.0.8:provided
[INFO] | \- (org.codehaus.jackson:jackson-mapper-asl:jar:1.8.8:provided - omitted for conflict with 1.9.13)
[INFO] +- org.apache.spark:spark-core_2.10:jar:1.1.0:provided
[INFO] | +- (org.apache.hadoop:hadoop-client:jar:1.0.4:provided - omitted for conflict with 1.2.1)
[INFO] | +- net.java.dev.jets3t:jets3t:jar:0.7.1:provided
[INFO] | | +- (commons-codec:commons-codec:jar:1.3:provided - omitted for conflict with 1.4)
[INFO] | | \- (commons-httpclient:commons-httpclient:jar:3.1:provided - omitted for duplicate)
[INFO] | +- org.apache.curator:curator-recipes:jar:2.4.0:provided
[INFO] | | +- org.apache.curator:curator-framework:jar:2.4.0:provided
[INFO] | | | +- org.apache.curator:curator-client:jar:2.4.0:provided
[INFO] | | | | +- (org.slf4j:slf4j-api:jar:1.6.4:provided - omitted for conflict with 1.6.1)
[INFO] | | | | +- (org.apache.zookeeper:zookeeper:jar:3.4.5:provided - omitted for duplicate)
[INFO] | | | | \- (com.google.guava:guava:jar:14.0.1:provided - omitted for duplicate)
[INFO] | | | +- (org.apache.zookeeper:zookeeper:jar:3.4.5:provided - omitted for duplicate)
[INFO] | | | \- (com.google.guava:guava:jar:14.0.1:provided - omitted for duplicate)
[INFO] | | +- (org.apache.zookeeper:zookeeper:jar:3.4.5:provided - omitted for conflict with 3.4.6)
[INFO] | | \- (com.google.guava:guava:jar:14.0.1:provided - omitted for duplicate)
[INFO] | +- org.eclipse.jetty:jetty-plus:jar:8.1.14.v20131031:provided
[INFO] | | +- org.eclipse.jetty.orbit:javax.transaction:jar:1.1.1.v201105210645:provided
[INFO] | | +- org.eclipse.jetty:jetty-webapp:jar:8.1.14.v20131031:provided
[INFO] | | | +- org.eclipse.jetty:jetty-xml:jar:8.1.14.v20131031:provided
[INFO] | | | | \- (org.eclipse.jetty:jetty-util:jar:8.1.14.v20131031:provided - omitted for d
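For anyone verifying the same fix: listing the assembly's contents is a quick way to confirm the hadoop2-only classes are really gone. A one-liner of the kind I would use, where the jar path is a stand-in for your own artifact:

jar tf target/my-etl-job-1.0.jar | grep TaskAttemptContextImpl

An empty result means the Hadoop 2 implementation class is no longer being packaged.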
RE: spark 1.1.0 save data to hdfs failed
h 1.9.13)
[INFO] | +- org.codehaus.jackson:jackson-jaxrs:jar:1.8.8:provided
[INFO] | | +- (org.codehaus.jackson:jackson-core-asl:jar:1.8.8:provided - omitted for conflict with 1.9.13)
[INFO] | | \- (org.codehaus.jackson:jackson-mapper-asl:jar:1.8.8:provided - omitted for conflict with 1.9.13)
[INFO] | +- tomcat:jasper-compiler:jar:5.5.23:provided
[INFO] | +- tomcat:jasper-runtime:jar:5.5.23:provided
[INFO] | | \- (commons-el:commons-el:jar:1.0:provided - omitted for duplicate)
[INFO] | +- org.jamon:jamon-runtime:jar:2.3.1:provided
[INFO] | +- (com.google.protobuf:protobuf-java:jar:2.5.0:provided - omitted for conflict with 2.4.1)
[INFO] | +- com.sun.jersey:jersey-core:jar:1.8:provided
[INFO] | +- com.sun.jersey:jersey-json:jar:1.8:provided
[INFO] | | +- org.codehaus.jettison:jettison:jar:1.1:provided
[INFO] | | +- com.sun.xml.bind:jaxb-impl:jar:2.2.3-1:provided
[INFO] | | | \- (javax.xml.bind:jaxb-api:jar:2.2.2:provided - omitted for duplicate)
[INFO] | | +- (org.codehaus.jackson:jackson-core-asl:jar:1.7.1:provided - omitted for conflict with 1.9.13)
[INFO] | | +- (org.codehaus.jackson:jackson-mapper-asl:jar:1.7.1:provided - omitted for conflict with 1.9.13)
[INFO] | | +- (org.codehaus.jackson:jackson-jaxrs:jar:1.7.1:provided - omitted for conflict with 1.8.8)
[INFO] | | +- org.codehaus.jackson:jackson-xc:jar:1.7.1:provided
[INFO] | | | +- (org.codehaus.jackson:jackson-core-asl:jar:1.7.1:provided - omitted for conflict with 1.9.13)
[INFO] | | | \- (org.codehaus.jackson:jackson-mapper-asl:jar:1.7.1:provided - omitted for conflict with 1.9.13)
[INFO] | | \- (com.sun.jersey:jersey-core:jar:1.8:provided - omitted for duplicate)
[INFO] | +- com.sun.jersey:jersey-server:jar:1.8:provided
[INFO] | | +- asm:asm:jar:3.1:provided
[INFO] | | \- (com.sun.jersey:jersey-core:jar:1.8:provided - omitted for duplicate)
[INFO] | +- javax.xml.bind:jaxb-api:jar:2.2.2:provided
[INFO] | | \- javax.activation:activation:jar:1.1:provided
[INFO] | +- org.cloudera.htrace:htrace-core:jar:2.04:provided
[INFO] | | +- (com.google.guava:guava:jar:12.0.1:provided - omitted for conflict with 14.0)
[INFO] | | +- (commons-logging:commons-logging:jar:1.1.1:provided - omitted for duplicate)
[INFO] | | \- (org.mortbay.jetty:jetty-util:jar:6.1.26:provided - omitted for duplicate)
[INFO] | +- (org.apache.hadoop:hadoop-core:jar:1.2.1:provided - omitted for duplicate)
[INFO] | +- com.github.stephenc.findbugs:findbugs-annotations:jar:1.3.9-1:provided
[INFO] | \- junit:junit:jar:4.11:provided
[INFO] | \- org.hamcrest:hamcrest-core:jar:1.3:provided
[INFO] \- org.apache.hbase:hbase-client:jar:0.98.1-hadoop1:provided
[INFO] +- (org.apache.hbase:hbase-common:jar:0.98.1-hadoop1:provided - omitted for duplicate)
[INFO] +- (org.apache.hbase:hbase-protocol:jar:0.98.1-hadoop1:provided - omitted for duplicate)
[INFO] +- (commons-codec:commons-codec:jar:1.7:compile - scope updated from provided; omitted for duplicate)
[INFO] +- (commons-io:commons-io:jar:2.4:provided - omitted for duplicate)
[INFO] +- (commons-lang:commons-lang:jar:2.6:provided - omitted for duplicate)
[INFO] +- (commons-logging:commons-logging:jar:1.1.1:provided - omitted for duplicate)
[INFO] +- (com.google.guava:guava:jar:12.0.1:provided - omitted for conflict with 14.0)
[INFO] +- (com.google.protobuf:protobuf-java:jar:2.5.0:provided - omitted for conflict with 2.4.1)
[INFO] +- (org.apache.zookeeper:zookeeper:jar:3.4.6:provided - omitted for duplicate)
[INFO] +- (org.cloudera.htrace:htrace-core:jar:2.04:provided - omitted for duplicate)
[INFO] +- (org.codehaus.jackson:jackson-mapper-asl:jar:1.8.8:provided - omitted for conflict with 1.9.13)
[INFO] +- (org.apache.hadoop:hadoop-core:jar:1.2.1:provided - omitted for duplicate)
[INFO] +- (com.github.stephenc.findbugs:findbugs-annotations:jar:1.3.9-1:provided - omitted for duplicate)
[INFO] \- (junit:junit:jar:4.11:provided - omitted for duplicate)

> From: so...@cloudera.com
> Date: Sat, 24 Jan 2015 09:46:02 +0000
> Subject: Re: spark 1.1.0 save data to hdfs failed
> To: eyc...@hotmail.com
> CC: user@spark.apache.org
>
> Hadoop 2's artifact is hadoop-common rather than hadoop-core but I
> assume you looked for that too. To answer your earlier question, no,
> Spark works with both Hadoop 1 and Hadoop 2 and is source-compatible
> with both. It can't be binary-compatible with both at once though. The
> code you cite is correct; there is no bug there.
>
> Your first error definitely indicates you have the wrong version of
> Hadoop on the client side. It's not matching your HDFS version. And
> the second suggests you are mixing code compiled for different
> versions of Hadoop. I think you need to check what version of Hadoop
> your Spark is compiled for. For example I saw a reference to CDH 5.2
> which is Hadoop 2.5, but then you're sho
Re: spark 1.1.0 save data to hdfs failed
Hadoop 2's artifact is hadoop-common rather than hadoop-core, but I assume you looked for that too. To answer your earlier question: no, Spark works with both Hadoop 1 and Hadoop 2 and is source-compatible with both. It can't be binary-compatible with both at once, though. The code you cite is correct; there is no bug there.

Your first error definitely indicates you have the wrong version of Hadoop on the client side. It's not matching your HDFS version. And the second suggests you are mixing code compiled for different versions of Hadoop. I think you need to check what version of Hadoop your Spark is compiled for. For example, I saw a reference to CDH 5.2, which is Hadoop 2.5, but then you're showing that you are running an old Hadoop 1.x HDFS? There seem to be a number of possible incompatibilities here.

On Fri, Jan 23, 2015 at 11:38 PM, ey-chih chow wrote:
> Sorry, I still did not quite get your resolution. In my jar, there are the
> following three related classes:
>
> org/apache/hadoop/mapreduce/task/TaskAttemptContextImpl.class
> org/apache/hadoop/mapreduce/task/TaskAttemptContextImpl$DummyReporter.class
> org/apache/hadoop/mapreduce/TaskAttemptContext.class
>
> I think the first two come from hadoop2 and the third from hadoop1. I would
> like to get rid of the first two. I checked my source code. It does have a
> place using the class (or interface in hadoop2) TaskAttemptContext.
> Do you mean I should make a separate jar for this portion of code and build
> it with hadoop1 to get rid of the dependency? An alternative way is to copy
> the code in SparkHadoopMapReduceUtil.scala into my own source code and modify
> it to bypass the problem. Any comment on this? Thanks.
>
> From: eyc...@hotmail.com
> To: so...@cloudera.com
> CC: user@spark.apache.org
> Subject: RE: spark 1.1.0 save data to hdfs failed
> Date: Fri, 23 Jan 2015 11:17:36 -0800
>
> Thanks. I looked at the dependency tree. I did not see any dependent jar
> of hadoop-core from hadoop2. However, the jar built from maven has the
> class:
>
> org/apache/hadoop/mapreduce/task/TaskAttemptContextImpl.class
>
> Do you know why?
>
> Date: Fri, 23 Jan 2015 17:01:48 +0000
> Subject: RE: spark 1.1.0 save data to hdfs failed
> From: so...@cloudera.com
> To: eyc...@hotmail.com
>
> Are you receiving my replies? I have suggested a resolution. Look at the
> dependency tree next.
>
> On Jan 23, 2015 2:43 PM, "ey-chih chow" wrote:
>
> I looked into the source code of SparkHadoopMapReduceUtil.scala. I think it
> is broken in the following code:
>
> def newTaskAttemptContext(conf: Configuration, attemptId: TaskAttemptID): TaskAttemptContext = {
>   val klass = firstAvailableClass(
>     "org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl",  // hadoop2, hadoop2-yarn
>     "org.apache.hadoop.mapreduce.TaskAttemptContext")           // hadoop1
>   val ctor = klass.getDeclaredConstructor(classOf[Configuration], classOf[TaskAttemptID])
>   ctor.newInstance(conf, attemptId).asInstanceOf[TaskAttemptContext]
> }
>
> In other words, it is tied to hadoop2, hadoop2-yarn, and hadoop1. Any
> suggestion on how to resolve it?
>
> Thanks.
>
>> From: so...@cloudera.com
>> Date: Fri, 23 Jan 2015 14:01:45 +0000
>> Subject: Re: spark 1.1.0 save data to hdfs failed
>> To: eyc...@hotmail.com
>> CC: user@spark.apache.org
>>
>> These are all definitely symptoms of mixing incompatible versions of
>> libraries.
>>
>> I'm not suggesting you haven't excluded Spark / Hadoop, but, this is
>> not the only way Hadoop deps get into your app. See my suggestion
>> about investigating the dependency tree.
>>
>> On Fri, Jan 23, 2015 at 1:53 PM, ey-chih chow wrote:
>> > Thanks. But I think I already marked all the Spark and Hadoop deps as
>> > provided. Why isn't the cluster's version used?
>> >
>> > Anyway, as I mentioned in the previous message, after changing the
>> > hadoop-client to version 1.2.1 in my maven deps, I already got past that
>> > exception and on to another one, as indicated below. Any suggestion on
>> > this?
>> >
>> > =
>> >
>> > Exception in thread "main" java.lang.reflect.InvocationTargetException
>> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Del
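A quick way to check which Hadoop flavor actually wins on a given classpath is the same probe Spark uses internally. A minimal sketch that can be pasted into spark-shell; the class names are the ones discussed in this thread:

val hadoop2Visible =
  try { Class.forName("org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl"); true }
  catch { case _: ClassNotFoundException => false }
println(
  if (hadoop2Visible) "hadoop2-style TaskAttemptContextImpl is on the classpath"
  else "only the hadoop1-style TaskAttemptContext is visible")

If this reports the hadoop2 class on a cluster that runs a Hadoop 1.x HDFS, something is shipping hadoop2 classes in (or alongside) the application jar.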
RE: spark 1.1.0 save data to hdfs failed
Sorry, I still did not quite get your resolution. In my jar, there are the following three related classes:

org/apache/hadoop/mapreduce/task/TaskAttemptContextImpl.class
org/apache/hadoop/mapreduce/task/TaskAttemptContextImpl$DummyReporter.class
org/apache/hadoop/mapreduce/TaskAttemptContext.class

I think the first two come from hadoop2 and the third from hadoop1. I would like to get rid of the first two. I checked my source code. It does have a place using the class (or interface in hadoop2) TaskAttemptContext. Do you mean I should make a separate jar for this portion of code and build it with hadoop1 to get rid of the dependency? An alternative way is to copy the code in SparkHadoopMapReduceUtil.scala into my own source code and modify it to bypass the problem. Any comment on this? Thanks.

From: eyc...@hotmail.com
To: so...@cloudera.com
CC: user@spark.apache.org
Subject: RE: spark 1.1.0 save data to hdfs failed
Date: Fri, 23 Jan 2015 11:17:36 -0800

Thanks. I looked at the dependency tree. I did not see any dependent jar of hadoop-core from hadoop2. However, the jar built from maven has the class:

org/apache/hadoop/mapreduce/task/TaskAttemptContextImpl.class

Do you know why?

Date: Fri, 23 Jan 2015 17:01:48 +0000
Subject: RE: spark 1.1.0 save data to hdfs failed
From: so...@cloudera.com
To: eyc...@hotmail.com

Are you receiving my replies? I have suggested a resolution. Look at the dependency tree next.

On Jan 23, 2015 2:43 PM, "ey-chih chow" wrote:

I looked into the source code of SparkHadoopMapReduceUtil.scala. I think it is broken in the following code:

def newTaskAttemptContext(conf: Configuration, attemptId: TaskAttemptID): TaskAttemptContext = {
  val klass = firstAvailableClass(
    "org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl",  // hadoop2, hadoop2-yarn
    "org.apache.hadoop.mapreduce.TaskAttemptContext")           // hadoop1
  val ctor = klass.getDeclaredConstructor(classOf[Configuration], classOf[TaskAttemptID])
  ctor.newInstance(conf, attemptId).asInstanceOf[TaskAttemptContext]
}

In other words, it is tied to hadoop2, hadoop2-yarn, and hadoop1. Any suggestion on how to resolve it?

Thanks.

> From: so...@cloudera.com
> Date: Fri, 23 Jan 2015 14:01:45 +0000
> Subject: Re: spark 1.1.0 save data to hdfs failed
> To: eyc...@hotmail.com
> CC: user@spark.apache.org
>
> These are all definitely symptoms of mixing incompatible versions of
> libraries.
>
> I'm not suggesting you haven't excluded Spark / Hadoop, but, this is
> not the only way Hadoop deps get into your app. See my suggestion
> about investigating the dependency tree.
>
> On Fri, Jan 23, 2015 at 1:53 PM, ey-chih chow wrote:
> > Thanks. But I think I already marked all the Spark and Hadoop deps as
> > provided. Why isn't the cluster's version used?
> >
> > Anyway, as I mentioned in the previous message, after changing the
> > hadoop-client to version 1.2.1 in my maven deps, I already got past that
> > exception and on to another one, as indicated below. Any suggestion on this?
> >
> > =
> >
> > Exception in thread "main" java.lang.reflect.InvocationTargetException
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:606)
> > at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:40)
> > at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
> > Caused by: java.lang.IncompatibleClassChangeError: Implementing class
> > at java.lang.ClassLoader.defineClass1(Native Method)
> > at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
> > at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> > at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
> > at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
> > at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
> > at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> > at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> > at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> > at java.lang.Class.forName0(Native Method)
> > at java.lang.Class.forName(Class.java:191)
> > at org.apache.hadoop.mapreduce.Spar
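On the "copy the util into my own source" idea: a minimal sketch of what that bypass could look like, assuming the job only ever runs against Hadoop 1 (untested; in Hadoop 1, org.apache.hadoop.mapreduce.TaskAttemptContext is a concrete class with a (Configuration, TaskAttemptID) constructor):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.{TaskAttemptContext, TaskAttemptID}

// Build the context reflectively, but pin the lookup to the hadoop1 class so a
// stray hadoop2 TaskAttemptContextImpl on the classpath can never be chosen.
def hadoop1TaskAttemptContext(conf: Configuration, attemptId: TaskAttemptID): TaskAttemptContext = {
  val klass = Class.forName("org.apache.hadoop.mapreduce.TaskAttemptContext")
  val ctor = klass.getDeclaredConstructor(classOf[Configuration], classOf[TaskAttemptID])
  ctor.newInstance(conf, attemptId).asInstanceOf[TaskAttemptContext]
}

That said, the cleaner fix discussed elsewhere in the thread is to keep the hadoop2 classes out of the jar in the first place.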
RE: spark 1.1.0 save data to hdfs failed
Thanks. I looked at the dependency tree. I did not see any dependent jar of hadoop-core from hadoop2. However, the jar built from maven has the class:

org/apache/hadoop/mapreduce/task/TaskAttemptContextImpl.class

Do you know why?

Date: Fri, 23 Jan 2015 17:01:48 +0000
Subject: RE: spark 1.1.0 save data to hdfs failed
From: so...@cloudera.com
To: eyc...@hotmail.com

Are you receiving my replies? I have suggested a resolution. Look at the dependency tree next.

On Jan 23, 2015 2:43 PM, "ey-chih chow" wrote:

I looked into the source code of SparkHadoopMapReduceUtil.scala. I think it is broken in the following code:

def newTaskAttemptContext(conf: Configuration, attemptId: TaskAttemptID): TaskAttemptContext = {
  val klass = firstAvailableClass(
    "org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl",  // hadoop2, hadoop2-yarn
    "org.apache.hadoop.mapreduce.TaskAttemptContext")           // hadoop1
  val ctor = klass.getDeclaredConstructor(classOf[Configuration], classOf[TaskAttemptID])
  ctor.newInstance(conf, attemptId).asInstanceOf[TaskAttemptContext]
}

In other words, it is tied to hadoop2, hadoop2-yarn, and hadoop1. Any suggestion on how to resolve it?

Thanks.

> From: so...@cloudera.com
> Date: Fri, 23 Jan 2015 14:01:45 +0000
> Subject: Re: spark 1.1.0 save data to hdfs failed
> To: eyc...@hotmail.com
> CC: user@spark.apache.org
>
> These are all definitely symptoms of mixing incompatible versions of
> libraries.
>
> I'm not suggesting you haven't excluded Spark / Hadoop, but, this is
> not the only way Hadoop deps get into your app. See my suggestion
> about investigating the dependency tree.
>
> On Fri, Jan 23, 2015 at 1:53 PM, ey-chih chow wrote:
> > Thanks. But I think I already marked all the Spark and Hadoop deps as
> > provided. Why isn't the cluster's version used?
> >
> > Anyway, as I mentioned in the previous message, after changing the
> > hadoop-client to version 1.2.1 in my maven deps, I already got past that
> > exception and on to another one, as indicated below. Any suggestion on this?
> >
> > =
> >
> > Exception in thread "main" java.lang.reflect.InvocationTargetException
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:606)
> > at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:40)
> > at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
> > Caused by: java.lang.IncompatibleClassChangeError: Implementing class
> > at java.lang.ClassLoader.defineClass1(Native Method)
> > at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
> > at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> > at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
> > at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
> > at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
> > at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> > at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> > at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> > at java.lang.Class.forName0(Native Method)
> > at java.lang.Class.forName(Class.java:191)
> > at org.apache.hadoop.mapreduce.SparkHadoopMapReduceUtil$class.firstAvailableClass(SparkHadoopMapReduceUtil.scala:73)
> > at org.apache.hadoop.mapreduce.SparkHadoopMapReduceUtil$class.newTaskAttemptContext(SparkHadoopMapReduceUtil.scala:35)
> > at org.apache.spark.rdd.PairRDDFunctions.newTaskAttemptContext(PairRDDFunctions.scala:53)
> > at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:932)
> > at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:832)
> > at com.crowdstar.etl.ParseAndClean$.main(ParseAndClean.scala:103)
> > at com.crowdstar.etl.ParseAndClean.main(ParseAndClean.scala)
> >
> > ... 6 more
> >
RE: spark 1.1.0 save data to hdfs failed
I also think the code is not robust enough. First, since Spark works with hadoop1, why does the code try hadoop2 first? Also, the following code only handles ClassNotFoundException; it should handle all the relevant exceptions.

private def firstAvailableClass(first: String, second: String): Class[_] = {
  try {
    Class.forName(first)
  } catch {
    case e: ClassNotFoundException =>
      Class.forName(second)
  }
}

From: eyc...@hotmail.com
To: so...@cloudera.com
CC: user@spark.apache.org
Subject: RE: spark 1.1.0 save data to hdfs failed
Date: Fri, 23 Jan 2015 06:43:00 -0800

I looked into the source code of SparkHadoopMapReduceUtil.scala. I think it is broken in the following code:

def newTaskAttemptContext(conf: Configuration, attemptId: TaskAttemptID): TaskAttemptContext = {
  val klass = firstAvailableClass(
    "org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl",  // hadoop2, hadoop2-yarn
    "org.apache.hadoop.mapreduce.TaskAttemptContext")           // hadoop1
  val ctor = klass.getDeclaredConstructor(classOf[Configuration], classOf[TaskAttemptID])
  ctor.newInstance(conf, attemptId).asInstanceOf[TaskAttemptContext]
}

In other words, it is tied to hadoop2, hadoop2-yarn, and hadoop1. Any suggestion on how to resolve it?

Thanks.

> From: so...@cloudera.com
> Date: Fri, 23 Jan 2015 14:01:45 +0000
> Subject: Re: spark 1.1.0 save data to hdfs failed
> To: eyc...@hotmail.com
> CC: user@spark.apache.org
>
> These are all definitely symptoms of mixing incompatible versions of
> libraries.
>
> I'm not suggesting you haven't excluded Spark / Hadoop, but, this is
> not the only way Hadoop deps get into your app. See my suggestion
> about investigating the dependency tree.
>
> On Fri, Jan 23, 2015 at 1:53 PM, ey-chih chow wrote:
> > Thanks. But I think I already marked all the Spark and Hadoop deps as
> > provided. Why isn't the cluster's version used?
> >
> > Anyway, as I mentioned in the previous message, after changing the
> > hadoop-client to version 1.2.1 in my maven deps, I already got past that
> > exception and on to another one, as indicated below. Any suggestion on this?
> >
> > =
> >
> > Exception in thread "main" java.lang.reflect.InvocationTargetException
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:606)
> > at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:40)
> > at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
> > Caused by: java.lang.IncompatibleClassChangeError: Implementing class
> > at java.lang.ClassLoader.defineClass1(Native Method)
> > at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
> > at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> > at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
> > at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
> > at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
> > at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> > at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> > at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> > at java.lang.Class.forName0(Native Method)
> > at java.lang.Class.forName(Class.java:191)
> > at org.apache.hadoop.mapreduce.SparkHadoopMapReduceUtil$class.firstAvailableClass(SparkHadoopMapReduceUtil.scala:73)
> > at org.apache.hadoop.mapreduce.SparkHadoopMapReduceUtil$class.newTaskAttemptContext(SparkHadoopMapReduceUtil.scala:35)
> > at org.apache.spark.rdd.PairRDDFunctions.newTaskAttemptContext(PairRDDFunctions.scala:53)
> > at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:932)
> > at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:832)
> > at com.crowdstar.etl.ParseAndClean$.main(ParseAndClean.scala:103)
> > at com.crowdstar.etl.ParseAndClean.main(ParseAndClean.scala)
> >
> > ... 6 more
> >
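For what it's worth, a sketch of the broader handling being argued for here (hypothetical, not the actual Spark code): IncompatibleClassChangeError is a LinkageError, so catching LinkageError as well would let the lookup fall through to the hadoop1 class instead of crashing:

private def firstAvailableClass(first: String, second: String): Class[_] = {
  try {
    Class.forName(first)
  } catch {
    // ClassNotFoundException: the hadoop2 class is simply absent.
    // LinkageError: the class is present but linked against the wrong Hadoop,
    // which is the IncompatibleClassChangeError seen in this thread.
    case _: ClassNotFoundException | _: LinkageError =>
      Class.forName(second)
  }
}

Whether silently falling back is wise is another question; the mixed classpath that triggers the error would still be there.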
RE: spark 1.1.0 save data to hdfs failed
I looked into the source code of SparkHadoopMapReduceUtil.scala. I think it is broken in the following code:

def newTaskAttemptContext(conf: Configuration, attemptId: TaskAttemptID): TaskAttemptContext = {
  val klass = firstAvailableClass(
    "org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl",  // hadoop2, hadoop2-yarn
    "org.apache.hadoop.mapreduce.TaskAttemptContext")           // hadoop1
  val ctor = klass.getDeclaredConstructor(classOf[Configuration], classOf[TaskAttemptID])
  ctor.newInstance(conf, attemptId).asInstanceOf[TaskAttemptContext]
}

In other words, it is tied to hadoop2, hadoop2-yarn, and hadoop1. Any suggestion on how to resolve it?

Thanks.

> From: so...@cloudera.com
> Date: Fri, 23 Jan 2015 14:01:45 +0000
> Subject: Re: spark 1.1.0 save data to hdfs failed
> To: eyc...@hotmail.com
> CC: user@spark.apache.org
>
> These are all definitely symptoms of mixing incompatible versions of
> libraries.
>
> I'm not suggesting you haven't excluded Spark / Hadoop, but, this is
> not the only way Hadoop deps get into your app. See my suggestion
> about investigating the dependency tree.
>
> On Fri, Jan 23, 2015 at 1:53 PM, ey-chih chow wrote:
> > Thanks. But I think I already marked all the Spark and Hadoop deps as
> > provided. Why isn't the cluster's version used?
> >
> > Anyway, as I mentioned in the previous message, after changing the
> > hadoop-client to version 1.2.1 in my maven deps, I already got past that
> > exception and on to another one, as indicated below. Any suggestion on this?
> >
> > =
> >
> > Exception in thread "main" java.lang.reflect.InvocationTargetException
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:606)
> > at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:40)
> > at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
> > Caused by: java.lang.IncompatibleClassChangeError: Implementing class
> > at java.lang.ClassLoader.defineClass1(Native Method)
> > at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
> > at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> > at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
> > at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
> > at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
> > at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> > at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> > at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> > at java.lang.Class.forName0(Native Method)
> > at java.lang.Class.forName(Class.java:191)
> > at org.apache.hadoop.mapreduce.SparkHadoopMapReduceUtil$class.firstAvailableClass(SparkHadoopMapReduceUtil.scala:73)
> > at org.apache.hadoop.mapreduce.SparkHadoopMapReduceUtil$class.newTaskAttemptContext(SparkHadoopMapReduceUtil.scala:35)
> > at org.apache.spark.rdd.PairRDDFunctions.newTaskAttemptContext(PairRDDFunctions.scala:53)
> > at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:932)
> > at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:832)
> > at com.crowdstar.etl.ParseAndClean$.main(ParseAndClean.scala:103)
> > at com.crowdstar.etl.ParseAndClean.main(ParseAndClean.scala)
> >
> > ... 6 more
> >
Re: spark 1.1.0 save data to hdfs failed
These are all definitely symptoms of mixing incompatible versions of libraries.

I'm not suggesting you haven't excluded Spark / Hadoop, but this is not the only way Hadoop deps get into your app. See my suggestion about investigating the dependency tree.

On Fri, Jan 23, 2015 at 1:53 PM, ey-chih chow wrote:
> Thanks. But I think I already marked all the Spark and Hadoop deps as
> provided. Why isn't the cluster's version used?
>
> Anyway, as I mentioned in the previous message, after changing the
> hadoop-client to version 1.2.1 in my maven deps, I already got past that
> exception and on to another one, as indicated below. Any suggestion on this?
>
> =
>
> Exception in thread "main" java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:40)
> at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
> Caused by: java.lang.IncompatibleClassChangeError: Implementing class
> at java.lang.ClassLoader.defineClass1(Native Method)
> at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
> at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
> at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:191)
> at org.apache.hadoop.mapreduce.SparkHadoopMapReduceUtil$class.firstAvailableClass(SparkHadoopMapReduceUtil.scala:73)
> at org.apache.hadoop.mapreduce.SparkHadoopMapReduceUtil$class.newTaskAttemptContext(SparkHadoopMapReduceUtil.scala:35)
> at org.apache.spark.rdd.PairRDDFunctions.newTaskAttemptContext(PairRDDFunctions.scala:53)
> at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:932)
> at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:832)
> at com.crowdstar.etl.ParseAndClean$.main(ParseAndClean.scala:103)
> at com.crowdstar.etl.ParseAndClean.main(ParseAndClean.scala)
>
> ... 6 more
>
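On the dependency-tree suggestion: the output can be narrowed to just the Hadoop artifacts, which makes a stray dep much easier to spot. With the standard maven-dependency-plugin options:

mvn dependency:tree -Dverbose -Dincludes=org.apache.hadoop

-Dverbose also prints the conflict notes ("omitted for conflict with ...") that appear in the trees pasted at the top of this thread.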
RE: spark 1.1.0 save data to hdfs failed
Thanks. But I think I already marked all the Spark and Hadoop deps as provided. Why isn't the cluster's version used?

Anyway, as I mentioned in the previous message, after changing the hadoop-client to version 1.2.1 in my maven deps, I already got past that exception and on to another one, as indicated below. Any suggestion on this?

=

Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:40)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: java.lang.IncompatibleClassChangeError: Implementing class
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:191)
at org.apache.hadoop.mapreduce.SparkHadoopMapReduceUtil$class.firstAvailableClass(SparkHadoopMapReduceUtil.scala:73)
at org.apache.hadoop.mapreduce.SparkHadoopMapReduceUtil$class.newTaskAttemptContext(SparkHadoopMapReduceUtil.scala:35)
at org.apache.spark.rdd.PairRDDFunctions.newTaskAttemptContext(PairRDDFunctions.scala:53)
at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:932)
at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:832)
at com.crowdstar.etl.ParseAndClean$.main(ParseAndClean.scala:103)
at com.crowdstar.etl.ParseAndClean.main(ParseAndClean.scala)
... 6 more

> From: so...@cloudera.com
> Date: Fri, 23 Jan 2015 10:41:12 +0000
> Subject: Re: spark 1.1.0 save data to hdfs failed
> To: eyc...@hotmail.com
> CC: user@spark.apache.org
>
> So, you should not depend on Hadoop artifacts unless you use them
> directly. You should mark Hadoop and Spark deps as provided. Then the
> cluster's version is used at runtime with spark-submit. That's the
> usual way to do it, which works.
>
> If you need to embed Spark in your app and are running it outside the
> cluster for some reason, and you have to embed Hadoop and Spark code
> in your app, the version has to match. You should also use mvn
> dependency:tree to see all the dependencies coming in. There may be
> many sources of a Hadoop dep.
>
> On Fri, Jan 23, 2015 at 1:05 AM, ey-chih chow wrote:
> > Thanks. But after I replace the maven dependence from
> >
> > <dependency>
> >   <groupId>org.apache.hadoop</groupId>
> >   <artifactId>hadoop-client</artifactId>
> >   <version>2.5.0-cdh5.2.0</version>
> >   <scope>provided</scope>
> >   <exclusions>
> >     <exclusion>
> >       <groupId>org.mortbay.jetty</groupId>
> >       <artifactId>servlet-api</artifactId>
> >     </exclusion>
> >     <exclusion>
> >       <groupId>javax.servlet</groupId>
> >       <artifactId>servlet-api</artifactId>
> >     </exclusion>
> >     <exclusion>
> >       <groupId>io.netty</groupId>
> >       <artifactId>netty</artifactId>
> >     </exclusion>
> >   </exclusions>
> > </dependency>
> >
> > to
> >
> > <dependency>
> >   <groupId>org.apache.hadoop</groupId>
> >   <artifactId>hadoop-client</artifactId>
> >   <version>1.0.4</version>
> >   <scope>provided</scope>
> >   <exclusions>
> >     <exclusion>
> >       <groupId>org.mortbay.jetty</groupId>
> >       <artifactId>servlet-api
Re: spark 1.1.0 save data to hdfs failed
So, you should not depend on Hadoop artifacts unless you use them directly. You should mark Hadoop and Spark deps as provided. Then the cluster's version is used at runtime with spark-submit. That's the usual way to do it, which works.

If you need to embed Spark in your app and are running it outside the cluster for some reason, and you have to embed Hadoop and Spark code in your app, the version has to match. You should also use mvn dependency:tree to see all the dependencies coming in. There may be many sources of a Hadoop dep.

On Fri, Jan 23, 2015 at 1:05 AM, ey-chih chow wrote:
> Thanks. But after I replace the maven dependence from
>
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-client</artifactId>
>   <version>2.5.0-cdh5.2.0</version>
>   <scope>provided</scope>
>   <exclusions>
>     <exclusion>
>       <groupId>org.mortbay.jetty</groupId>
>       <artifactId>servlet-api</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>javax.servlet</groupId>
>       <artifactId>servlet-api</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>io.netty</groupId>
>       <artifactId>netty</artifactId>
>     </exclusion>
>   </exclusions>
> </dependency>
>
> to
>
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-client</artifactId>
>   <version>1.0.4</version>
>   <scope>provided</scope>
>   <exclusions>
>     <exclusion>
>       <groupId>org.mortbay.jetty</groupId>
>       <artifactId>servlet-api</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>javax.servlet</groupId>
>       <artifactId>servlet-api</artifactId>
>     </exclusion>
>     <exclusion>
>       <groupId>io.netty</groupId>
>       <artifactId>netty</artifactId>
>     </exclusion>
>   </exclusions>
> </dependency>
>
> the warning message still shows up in the namenode log. Is there anything
> else I need to do?
>
> Thanks.
>
> Ey-Chih Chow
>
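To make the "provided" workflow concrete: with Spark and Hadoop marked provided, the application jar carries only its own code and its genuinely third-party deps, and submission looks something like this (class, master URL, and jar names are stand-ins):

spark-submit --class com.example.MyEtlJob --master spark://master-host:7077 my-etl-job-1.0.jar

At that point the job sees whatever Hadoop the cluster's Spark was built against, which is exactly the behavior described above.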
RE: spark 1.1.0 save data to hdfs failed
After I changed the dependency to the following:

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>1.2.1</version>
  <exclusions>
    <exclusion>
      <groupId>org.mortbay.jetty</groupId>
      <artifactId>servlet-api</artifactId>
    </exclusion>
    <exclusion>
      <groupId>javax.servlet</groupId>
      <artifactId>servlet-api</artifactId>
    </exclusion>
    <exclusion>
      <groupId>io.netty</groupId>
      <artifactId>netty</artifactId>
    </exclusion>
  </exclusions>
</dependency>

I got the following error. Any idea on this? Thanks.

===

Caused by: java.lang.IncompatibleClassChangeError: Implementing class
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:191)
at org.apache.hadoop.mapreduce.SparkHadoopMapReduceUtil$class.firstAvailableClass(SparkHadoopMapReduceUtil.scala:73)
at org.apache.hadoop.mapreduce.SparkHadoopMapReduceUtil$class.newTaskAttemptContext(SparkHadoopMapReduceUtil.scala:35)
at org.apache.spark.rdd.PairRDDFunctions.newTaskAttemptContext(PairRDDFunctions.scala:53)
at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:932)
at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:832)
at com.crowdstar.etl.ParseAndClean$.main(ParseAndClean.scala:103)
at com.crowdstar.etl.ParseAndClean.main(ParseAndClean.scala)
... 6 more

From: eyc...@hotmail.com
To: so...@cloudera.com
CC: yuzhih...@gmail.com; user@spark.apache.org
Subject: RE: spark 1.1.0 save data to hdfs failed
Date: Thu, 22 Jan 2015 17:05:26 -0800

Thanks. But after I replace the maven dependence from

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.5.0-cdh5.2.0</version>
  <scope>provided</scope>
  <exclusions>
    <exclusion>
      <groupId>org.mortbay.jetty</groupId>
      <artifactId>servlet-api</artifactId>
    </exclusion>
    <exclusion>
      <groupId>javax.servlet</groupId>
      <artifactId>servlet-api</artifactId>
    </exclusion>
    <exclusion>
      <groupId>io.netty</groupId>
      <artifactId>netty</artifactId>
    </exclusion>
  </exclusions>
</dependency>

to

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>1.0.4</version>
  <scope>provided</scope>
  <exclusions>
    <exclusion>
      <groupId>org.mortbay.jetty</groupId>
      <artifactId>servlet-api</artifactId>
    </exclusion>
    <exclusion>
      <groupId>javax.servlet</groupId>
      <artifactId>servlet-api</artifactId>
    </exclusion>
    <exclusion>
      <groupId>io.netty</groupId>
      <artifactId>netty</artifactId>
    </exclusion>
  </exclusions>
</dependency>

the warning message still shows up in the namenode log. Is there anything else I need to do?

Thanks.

Ey-Chih Chow

> From: so...@cloudera.com
> Date: Thu, 22 Jan 2015 22:34:22 +0000
> Subject: Re: spark 1.1.0 save data to hdfs failed
> To: eyc...@hotmail.com
> CC: yuzhih...@gmail.com; user@spark.apache.org
>
> It means your client app is using Hadoop 2.x and your HDFS is Hadoop 1.x.
>
> On Thu, Jan 22, 2015 at 10:32 PM, ey-chih chow wrote:
> > I looked into the namenode log and found this message:
> >
> > 2015-01-22 22:18:39,441 WARN org.apache.hadoop.ipc.Server: Incorrect header
> > or version mismatch from 10.33.140.233:53776 got version 9 expected version 4
> >
> > What should I do to fix this?
> >
> > Thanks.
> >
> > Ey-Chih
> >
> > ____
> > From: eyc...@hotmail.com
> > To: yuzhih...@gmail.com
> > CC: user@spark.apache.o
RE: spark 1.1.0 save data to hdfs failed
Thanks. But after I replace the maven dependence from

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.5.0-cdh5.2.0</version>
  <scope>provided</scope>
  <exclusions>
    <exclusion>
      <groupId>org.mortbay.jetty</groupId>
      <artifactId>servlet-api</artifactId>
    </exclusion>
    <exclusion>
      <groupId>javax.servlet</groupId>
      <artifactId>servlet-api</artifactId>
    </exclusion>
    <exclusion>
      <groupId>io.netty</groupId>
      <artifactId>netty</artifactId>
    </exclusion>
  </exclusions>
</dependency>

to

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>1.0.4</version>
  <scope>provided</scope>
  <exclusions>
    <exclusion>
      <groupId>org.mortbay.jetty</groupId>
      <artifactId>servlet-api</artifactId>
    </exclusion>
    <exclusion>
      <groupId>javax.servlet</groupId>
      <artifactId>servlet-api</artifactId>
    </exclusion>
    <exclusion>
      <groupId>io.netty</groupId>
      <artifactId>netty</artifactId>
    </exclusion>
  </exclusions>
</dependency>

the warning message still shows up in the namenode log. Is there anything else I need to do?

Thanks.

Ey-Chih Chow

> From: so...@cloudera.com
> Date: Thu, 22 Jan 2015 22:34:22 +0000
> Subject: Re: spark 1.1.0 save data to hdfs failed
> To: eyc...@hotmail.com
> CC: yuzhih...@gmail.com; user@spark.apache.org
>
> It means your client app is using Hadoop 2.x and your HDFS is Hadoop 1.x.
>
> On Thu, Jan 22, 2015 at 10:32 PM, ey-chih chow wrote:
> > I looked into the namenode log and found this message:
> >
> > 2015-01-22 22:18:39,441 WARN org.apache.hadoop.ipc.Server: Incorrect header
> > or version mismatch from 10.33.140.233:53776 got version 9 expected version 4
> >
> > What should I do to fix this?
> >
> > Thanks.
> >
> > Ey-Chih
> >
> > ____
> > From: eyc...@hotmail.com
> > To: yuzhih...@gmail.com
> > CC: user@spark.apache.org
> > Subject: RE: spark 1.1.0 save data to hdfs failed
> > Date: Wed, 21 Jan 2015 23:12:56 -0800
> >
> > The hdfs release should be hadoop 1.0.4.
> >
> > Ey-Chih Chow
> >
> > Date: Wed, 21 Jan 2015 16:56:25 -0800
> > Subject: Re: spark 1.1.0 save data to hdfs failed
> > From: yuzhih...@gmail.com
> > To: eyc...@hotmail.com
> > CC: user@spark.apache.org
> >
> > What hdfs release are you using?
> >
> > Can you check the namenode log around the time of the error below to see
> > if there is some clue?
> >
> > Cheers
> >
> > On Wed, Jan 21, 2015 at 4:51 PM, ey-chih chow wrote:
> >
> > Hi,
> >
> > I used the following fragment of a Scala program to save data to hdfs:
> >
> > contextAwareEvents
> >   .map(e => (new AvroKey(e), null))
> >   .saveAsNewAPIHadoopFile("hdfs://" + masterHostname + ":9000/ETL/output/" + dateDir,
> >     classOf[AvroKey[GenericRecord]],
> >     classOf[NullWritable],
> >     classOf[AvroKeyOutputFormat[GenericRecord]],
> >     job.getConfiguration)
> >
> > But it failed with the following error messages. Is there anyone who can
> > help? Thanks.
> >
> > Ey-Chih Chow
> >
> > =
> >
> > Exception in thread "main" java.lang.reflect.InvocationTargetException
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:606)
> > at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:40)
> > at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
> > Caused by: java.io.IOException: Failed on local exception: java.io.EOFException;
> > Host Details : local host is: "ip-10-33-140-157/10.33.140.157"; destination
> > host is: "ec2-54-203-58-2.us-west-2.compute.amazonaws.com":9000;
> > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
Re: spark 1.1.0 save data to hdfs failed
It means your client app is using Hadoop 2.x and your HDFS is Hadoop 1.x. (A Hadoop 1 IPC server expects protocol version 4, while a Hadoop 2 client sends version 9, which is exactly the mismatch in that log line.)

On Thu, Jan 22, 2015 at 10:32 PM, ey-chih chow wrote:
> I looked into the namenode log and found this message:
>
> 2015-01-22 22:18:39,441 WARN org.apache.hadoop.ipc.Server: Incorrect header
> or version mismatch from 10.33.140.233:53776 got version 9 expected version 4
>
> What should I do to fix this?
>
> Thanks.
>
> Ey-Chih
>
> From: eyc...@hotmail.com
> To: yuzhih...@gmail.com
> CC: user@spark.apache.org
> Subject: RE: spark 1.1.0 save data to hdfs failed
> Date: Wed, 21 Jan 2015 23:12:56 -0800
>
> The hdfs release should be hadoop 1.0.4.
>
> Ey-Chih Chow
>
> ________
> Date: Wed, 21 Jan 2015 16:56:25 -0800
> Subject: Re: spark 1.1.0 save data to hdfs failed
> From: yuzhih...@gmail.com
> To: eyc...@hotmail.com
> CC: user@spark.apache.org
>
> What hdfs release are you using?
>
> Can you check the namenode log around the time of the error below to see if
> there is some clue?
>
> Cheers
>
> On Wed, Jan 21, 2015 at 4:51 PM, ey-chih chow wrote:
>
> Hi,
>
> I used the following fragment of a Scala program to save data to hdfs:
>
> contextAwareEvents
>   .map(e => (new AvroKey(e), null))
>   .saveAsNewAPIHadoopFile("hdfs://" + masterHostname + ":9000/ETL/output/" + dateDir,
>     classOf[AvroKey[GenericRecord]],
>     classOf[NullWritable],
>     classOf[AvroKeyOutputFormat[GenericRecord]],
>     job.getConfiguration)
>
> But it failed with the following error messages. Is there anyone who can
> help? Thanks.
>
> Ey-Chih Chow
>
> =
>
> Exception in thread "main" java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:40)
> at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
> Caused by: java.io.IOException: Failed on local exception: java.io.EOFException;
> Host Details : local host is: "ip-10-33-140-157/10.33.140.157"; destination
> host is: "ec2-54-203-58-2.us-west-2.compute.amazonaws.com":9000;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
> at org.apache.hadoop.ipc.Client.call(Client.java:1415)
> at org.apache.hadoop.ipc.Client.call(Client.java:1364)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:744)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy15.getFileInfo(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1925)
> at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1079)
> at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1075)
> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1075)
> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1400)
> at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:145)
> at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:900)
> at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:832)
> at com.crowdstar.etl.ParseAndClean$.main(Par
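For reference, a quick way to pin down the server side of such a mismatch is the standard Hadoop CLI on the cluster:

hadoop version

and then compare that against the Hadoop line that both the Spark build and the application jar were compiled for.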
RE: spark 1.1.0 save data to hdfs failed
I looked into the namenode log and found this message:

2015-01-22 22:18:39,441 WARN org.apache.hadoop.ipc.Server: Incorrect header or version mismatch from 10.33.140.233:53776 got version 9 expected version 4

What should I do to fix this?

Thanks.

Ey-Chih

From: eyc...@hotmail.com
To: yuzhih...@gmail.com
CC: user@spark.apache.org
Subject: RE: spark 1.1.0 save data to hdfs failed
Date: Wed, 21 Jan 2015 23:12:56 -0800

The hdfs release should be hadoop 1.0.4.

Ey-Chih Chow

Date: Wed, 21 Jan 2015 16:56:25 -0800
Subject: Re: spark 1.1.0 save data to hdfs failed
From: yuzhih...@gmail.com
To: eyc...@hotmail.com
CC: user@spark.apache.org

What hdfs release are you using?

Can you check the namenode log around the time of the error below to see if there is some clue?

Cheers

On Wed, Jan 21, 2015 at 4:51 PM, ey-chih chow wrote:

Hi,

I used the following fragment of a Scala program to save data to hdfs:

contextAwareEvents
  .map(e => (new AvroKey(e), null))
  .saveAsNewAPIHadoopFile("hdfs://" + masterHostname + ":9000/ETL/output/" + dateDir,
    classOf[AvroKey[GenericRecord]],
    classOf[NullWritable],
    classOf[AvroKeyOutputFormat[GenericRecord]],
    job.getConfiguration)

But it failed with the following error messages. Is there anyone who can help? Thanks.

Ey-Chih Chow

=

Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:40)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: "ip-10-33-140-157/10.33.140.157"; destination host is: "ec2-54-203-58-2.us-west-2.compute.amazonaws.com":9000;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
at org.apache.hadoop.ipc.Client.call(Client.java:1415)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:744)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy15.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1925)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1079)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1075)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1075)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1400)
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:145)
at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:900)
at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:832)
at com.crowdstar.etl.ParseAndClean$.main(ParseAndClean.scala:101)
at com.crowdstar.etl.ParseAndClean.main(ParseAndClean.scala)
... 6 more
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1055)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:950)

===
RE: spark 1.1.0 save data to hdfs failed
The hdfs release should be hadoop 1.0.4.

Ey-Chih Chow

Date: Wed, 21 Jan 2015 16:56:25 -0800
Subject: Re: spark 1.1.0 save data to hdfs failed
From: yuzhih...@gmail.com
To: eyc...@hotmail.com
CC: user@spark.apache.org

What hdfs release are you using?

Can you check the namenode log around the time of the error below to see if there is some clue?

Cheers

On Wed, Jan 21, 2015 at 4:51 PM, ey-chih chow wrote:

Hi,

I used the following fragment of a Scala program to save data to hdfs:

contextAwareEvents
  .map(e => (new AvroKey(e), null))
  .saveAsNewAPIHadoopFile("hdfs://" + masterHostname + ":9000/ETL/output/" + dateDir,
    classOf[AvroKey[GenericRecord]],
    classOf[NullWritable],
    classOf[AvroKeyOutputFormat[GenericRecord]],
    job.getConfiguration)

But it failed with the following error messages. Is there anyone who can help? Thanks.

Ey-Chih Chow

=

Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:40)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: "ip-10-33-140-157/10.33.140.157"; destination host is: "ec2-54-203-58-2.us-west-2.compute.amazonaws.com":9000;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
at org.apache.hadoop.ipc.Client.call(Client.java:1415)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:744)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy15.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1925)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1079)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1075)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1075)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1400)
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:145)
at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:900)
at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:832)
at com.crowdstar.etl.ParseAndClean$.main(ParseAndClean.scala:101)
at com.crowdstar.etl.ParseAndClean.main(ParseAndClean.scala)
... 6 more
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1055)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:950)

===
Re: spark 1.1.0 save data to hdfs failed
What hdfs release are you using?

Can you check the namenode log around the time of the error below to see if there is some clue?

Cheers

On Wed, Jan 21, 2015 at 4:51 PM, ey-chih chow wrote:
> Hi,
>
> I used the following fragment of a Scala program to save data to hdfs:
>
> contextAwareEvents
>   .map(e => (new AvroKey(e), null))
>   .saveAsNewAPIHadoopFile("hdfs://" + masterHostname + ":9000/ETL/output/" + dateDir,
>     classOf[AvroKey[GenericRecord]],
>     classOf[NullWritable],
>     classOf[AvroKeyOutputFormat[GenericRecord]],
>     job.getConfiguration)
>
> But it failed with the following error messages. Is there anyone who can
> help? Thanks.
>
> Ey-Chih Chow
>
> =
>
> Exception in thread "main" java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:40)
> at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
> Caused by: java.io.IOException: Failed on local exception: java.io.EOFException;
> Host Details : local host is: "ip-10-33-140-157/10.33.140.157"; destination
> host is: "ec2-54-203-58-2.us-west-2.compute.amazonaws.com":9000;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
> at org.apache.hadoop.ipc.Client.call(Client.java:1415)
> at org.apache.hadoop.ipc.Client.call(Client.java:1364)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:744)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy15.getFileInfo(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1925)
> at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1079)
> at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1075)
> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1075)
> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1400)
> at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:145)
> at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:900)
> at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:832)
> at com.crowdstar.etl.ParseAndClean$.main(ParseAndClean.scala:101)
> at com.crowdstar.etl.ParseAndClean.main(ParseAndClean.scala)
> ... 6 more
> Caused by: java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:392)
> at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1055)
> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:950)
>
> ===
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-1-1-0-save-data-to-hdfs-failed-tp21305.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
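Finally, for anyone reconstructing the original write path: a self-contained sketch of the snippet above with the imports filled in (assumes spark-core and the avro-mapred artifact matching your Hadoop line on the classpath; contextAwareEvents, masterHostname, and dateDir are stand-ins from the original post, and NullWritable.get() replaces the bare null):

import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyOutputFormat
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.SparkContext._  // brings in saveAsNewAPIHadoopFile via PairRDDFunctions
import org.apache.spark.rdd.RDD

def saveEvents(contextAwareEvents: RDD[GenericRecord],
               masterHostname: String,
               dateDir: String): Unit = {
  val job = new Job()  // Hadoop 1 style; on Hadoop 2 this would be Job.getInstance()
  contextAwareEvents
    .map(e => (new AvroKey(e), NullWritable.get()))
    .saveAsNewAPIHadoopFile(
      "hdfs://" + masterHostname + ":9000/ETL/output/" + dateDir,
      classOf[AvroKey[GenericRecord]],
      classOf[NullWritable],
      classOf[AvroKeyOutputFormat[GenericRecord]],
      job.getConfiguration)
}

As the thread shows, whether this runs or dies with the errors above depends entirely on the Hadoop version the surrounding classpath was built for, not on the snippet itself.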