Re: spark 2.4.3 build fails using java 8 and scala 2.11 with NumberFormatException: Not a version: 9
After blowing away my m2 repo cache, I was able to build just fine... I don't know why, but now it works :-)

On Sun, May 19, 2019 at 10:22 PM Bulldog20630405 wrote:
> i am trying to build spark 2.4.3 with the following env:
> - fedora 29
> - 1.8.0_202
> - spark 2.4.3
> - scala 2.11.12
> - maven 3.5.4
> - hadoop 2.6.5
>
> according to the documentation this can be done with the following commands:
> export TERM=xterm-color
> ./build/mvn -Pyarn -DskipTests clean package
>
> however i get the following error (it seems to me that somehow it thinks i am using java 9):
> [ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.2:compile (scala-compile-first) on project spark-tags_2.11: Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:3.2.2:compile failed.: CompileFailed -> [Help 1]
> error: java.lang.NumberFormatException: Not a version: 9
> ...
spark 2.4.3 build fails using java 8 and scala 2.11 with NumberFormatException: Not a version: 9
i am trying to build spark 2.4.3 with the following env:

- fedora 29
- 1.8.0_202
- spark 2.4.3
- scala 2.11.12
- maven 3.5.4
- hadoop 2.6.5

according to the documentation this can be done with the following commands:

export TERM=xterm-color
./build/mvn -Pyarn -DskipTests clean package

however i get the following error (it seems to me that somehow it thinks i am using java 9):
(note: my real goal is to build spark for hadoop 3; however, i need to understand why the default build is failing first)

[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.2:compile (scala-compile-first) on project spark-tags_2.11: Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:3.2.2:compile failed.: CompileFailed -> [Help 1]

[INFO] --- scala-maven-plugin:3.2.2:compile (scala-compile-first) @ spark-tags_2.11 ---
[INFO] Using zinc server for incremental compilation
[info] 'compiler-interface' not yet compiled for Scala 2.11.12. Compiling...
error: java.lang.NumberFormatException: Not a version: 9
  at scala.util.PropertiesTrait$class.parts$1(Properties.scala:184)
  at scala.util.PropertiesTrait$class.isJavaAtLeast(Properties.scala:187)
  at scala.util.Properties$.isJavaAtLeast(Properties.scala:17)
  at scala.tools.util.PathResolverBase$Calculated$.javaBootClasspath(PathResolver.scala:276)
  at scala.tools.util.PathResolverBase$Calculated$.basis(PathResolver.scala:283)
  at scala.tools.util.PathResolverBase$Calculated$.containers$lzycompute(PathResolver.scala:293)
  at scala.tools.util.PathResolverBase$Calculated$.containers(PathResolver.scala:293)
  at scala.tools.util.PathResolverBase.containers(PathResolver.scala:309)
  at scala.tools.util.PathResolver.computeResult(PathResolver.scala:341)
  at scala.tools.util.PathResolver.computeResult(PathResolver.scala:332)
  at scala.tools.util.PathResolverBase.result(PathResolver.scala:314)
  at scala.tools.nsc.backend.JavaPlatform$class.classPath(JavaPlatform.scala:28)
  at scala.tools.nsc.Global$GlobalPlatform.classPath(Global.scala:115)
  at scala.tools.nsc.Global.scala$tools$nsc$Global$$recursiveClassPath(Global.scala:131)
  at scala.tools.nsc.Global$GlobalMirror.rootLoader(Global.scala:64)
  at scala.reflect.internal.Mirrors$Roots$RootClass.<init>(Mirrors.scala:307)
  at scala.reflect.internal.Mirrors$Roots.RootClass$lzycompute(Mirrors.scala:321)
  at scala.reflect.internal.Mirrors$Roots.RootClass(Mirrors.scala:321)
  at scala.reflect.internal.Mirrors$Roots$EmptyPackageClass.<init>(Mirrors.scala:330)
  at scala.reflect.internal.Mirrors$Roots.EmptyPackageClass$lzycompute(Mirrors.scala:336)
  at scala.reflect.internal.Mirrors$Roots.EmptyPackageClass(Mirrors.scala:336)
  at scala.reflect.internal.Mirrors$Roots.EmptyPackageClass(Mirrors.scala:276)
  at scala.reflect.internal.Mirrors$RootsBase.init(Mirrors.scala:250)
  at scala.tools.nsc.Global.rootMirror$lzycompute(Global.scala:73)
  at scala.tools.nsc.Global.rootMirror(Global.scala:71)
  at scala.tools.nsc.Global.rootMirror(Global.scala:39)
  at scala.reflect.internal.Definitions$DefinitionsClass.ObjectClass$lzycompute(Definitions.scala:257)
  at scala.reflect.internal.Definitions$DefinitionsClass.ObjectClass(Definitions.scala:257)
  at scala.reflect.internal.Definitions$DefinitionsClass.init(Definitions.scala:1390)
  at scala.tools.nsc.Global$Run.<init>(Global.scala:1242)
  at scala.tools.nsc.Driver.doCompile(Driver.scala:31)
  at scala.tools.nsc.MainClass.doCompile(Main.scala:23)
  at scala.tools.nsc.Driver.process(Driver.scala:51)
  at scala.tools.nsc.Main.process(Main.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at sbt.compiler.RawCompiler.apply(RawCompiler.scala:33)
  at sbt.compiler.AnalyzingCompiler$$anonfun$compileSources$1$$anonfun$apply$2.apply(AnalyzingCompiler.scala:159)
  at sbt.compiler.AnalyzingCompiler$$anonfun$compileSources$1$$anonfun$apply$2.apply(AnalyzingCompiler.scala:155)
  at sbt.IO$.withTemporaryDirectory(IO.scala:358)
  at sbt.compiler.AnalyzingCompiler$$anonfun$compileSources$1.apply(AnalyzingCompiler.scala:155)
  at sbt.compiler.AnalyzingCompiler$$anonfun$compileSources$1.apply(AnalyzingCompiler.scala:152)
  at sbt.IO$.withTemporaryDirectory(IO.scala:358)
  at sbt.compiler.AnalyzingCompiler$.compileSources(AnalyzingCompiler.scala:152)
  at sbt.compiler.IC$.compileInterfaceJar(IncrementalCompiler.scala:58)
  at com.typesafe.zinc.Compiler$.compilerInterface(Compiler.scala:154)
  at com.typesafe.zinc.Compiler$.create(Compiler.scala:55)
  at com.typesafe.zinc.Compiler$$anonfun$apply$1.apply(Compiler.scala:42)
  at com.typesafe.zinc.Compiler$$anonfun$apply$1.apply(Compiler.scala:42)
  at com.typesafe.zinc.Cache.get(Cache.scala:41)
  at com.typesafe.zinc.Compiler$.apply(Compiler.scala:42)
  at com.typesafe.zinc.Main$.run(Main.scala:96)
  at com.typesafe.zinc.Nailgun$.zinc(Nailgun.scala:95)
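For reference, the "Not a version: 9" message comes from the version check at the top of the stack, scala.util.Properties.isJavaAtLeast. In older scala-library releases that method only understood dotted "1.x" version strings, so any JVM (or zinc server / cached artifact produced under a JVM) whose java.specification.version is a bare "9" fails the parse. A minimal sketch of that legacy logic, reconstructed from memory of the pre-Java-9 behaviour rather than copied from the Scala source:

// Legacy-style check (assumption: mirrors the old parts/isJavaAtLeast logic):
// a version string must contain a '.', so JDK 9's bare "9" is rejected.
object LegacyJavaVersionCheck {
  private def parts(s: String): (Int, Int) = {
    val i = s.indexOf('.')
    if (i < 0) throw new NumberFormatException("Not a version: " + s)
    (s.substring(0, i).toInt, s.substring(i + 1).toInt)
  }

  /** Is `specVersion` (java.specification.version) at least `required`? */
  def isJavaAtLeast(specVersion: String, required: String): Boolean = {
    val (r1, r2) = parts(required)     // e.g. "1.8" -> (1, 8)
    val (s1, s2) = parts(specVersion)  // "9" has no '.' -> throws
    s1 > r1 || (s1 == r1 && s2 >= r2)
  }
}

// LegacyJavaVersionCheck.isJavaAtLeast("1.8", "1.8")  // true
// LegacyJavaVersionCheck.isJavaAtLeast("9", "1.8")    // NumberFormatException: Not a version: 9

This would also be consistent with the fix reported above: wiping the m2 cache forces zinc to rebuild the compiler-interface from scratch instead of reusing artifacts that may have been produced under a newer JDK.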
Re: NumberFormatException while reading and split the file
Response to the 1st approach: When you do spark.read.text("/xyz/a/b/filename") it returns a DataFrame, and applying the rdd method gives you an RDD[Row]; so when you use map, your function gets a Row as its parameter, i.e. ip in your code. Therefore you must use the Row methods to access its members. The error message says it clearly: "error: value split is not a member of org.apache.spark.sql.Row". There is no split method on Row, so the compile fails.

Response to the 2nd approach: There is something fishy there. The if condition ip(0).isEmpty() should catch the case where the first field is an empty string, so when it is not actually empty, ip(0).toInt shouldn't fail. But you also need to make sure ip(0) is not just some random string that can't be converted to Int.
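For reference, a minimal sketch of the fix described above (the path and field layout come from the question; getString(0), the -1 split limit, and the null handling are assumptions, not the poster's code). The key move is pulling the line text out of each Row with getString(0) rather than calling toString() on the Row, since Row.toString wraps the value in brackets and would mangle the first field:

import org.apache.spark.sql.{Row, SparkSession}

val spark = SparkSession.builder.appName("RowSplitSketch").getOrCreate()

val anotherRDD = spark.read.text("/xyz/a/b/filename").rdd
  .map(_.getString(0))        // Row -> the underlying line of text
  .map(_.split("\\|", -1))    // limit -1 keeps trailing empty fields
  .map { f =>
    // Note: null.asInstanceOf[Int] evaluates to 0, not null; storing a
    // plain null (typed as Any) is what puts a real NULL into the Row.
    val first: Any = if (f(0).trim.isEmpty) null else f(0).trim.toInt
    Row(first, f(1), f(2), f(3), f(4), f(5))
  }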
NumberFormatException while reading and split the file
1st approach: error: value split is not a member of org.apache.spark.sql.Row

val newRdd = spark.read.text("/xyz/a/b/filename").rdd
val anotherRDD = newRdd.map(ip => ip.split("\\|"))
  .map(ip => Row(if (ip(0).isEmpty()) { null.asInstanceOf[Int] } else ip(0).toInt,
                 ip(1), ip(2), ip(3), ip(4), ip(5)))

I'm getting the error in the line ip.split("\\|"):
value split is not a member of org.apache.spark.sql.Row

2nd approach: error: java.lang.NumberFormatException: For input string: ""

val newRdd = spark.read.text("/xyz/a/b/filename").rdd
val anotherRDD = newRdd.map(ip => ip.toString().split("\\|"))
  .map(ip => Row(if (ip(0).isEmpty()) { null.asInstanceOf[Int] } else ip(0).toInt,
                 ip(1), ip(2), ip(3), ip(4), ip(5)))
anotherRDD.collect().foreach(println)

In this case I'm getting the error java.lang.NumberFormatException: For input string: ""
Re: NumberFormatException: For input string: "0.00000"
It seems not an issue in Spark. Does "CSVParser" work fine without Spark, with the same data? BTW, it seems there is something wrong with your email address, so I am sending this again.

On 20 Sep 2016 8:32 a.m., "Hyukjin Kwon" wrote:
> It seems not an issue in Spark. Does "CSVParser" work fine without Spark, with the same data?
> ...
Re: NumberFormatException: For input string: "0.00000"
It seems not an issue in Spark. Does "CSVParser" work fine without Spark, with the same data?

On 20 Sep 2016 2:15 a.m., "Mohamed ismail" wrote:
> Hi all
>
> I am trying to read:
>
> sc.textFile(DataFile).mapPartitions(lines => {
>   val parser = new CSVParser(",")
>   lines.map(line => parseLineToTuple(line, parser))
> })
> ...
> java.lang.NumberFormatException: For input string: "0.00000"
>
> Has anyone faced such issues? Is there a solution?
>
> Thanks,
> Mohamed
NumberFormatException: For input string: "0.00000"
Hi all

I am trying to read:

sc.textFile(DataFile).mapPartitions(lines => {
  val parser = new CSVParser(",")
  lines.map(line => parseLineToTuple(line, parser))
})

Data looks like:

android phone,0,0,0,,0,0,0,0,0,0,0,5,0,0,0,5,0,0.0,0.0,0.00000,0.0,0.0,0,0,0,0,0,0,0,0.0,0,0,0
ios phone,0,-1,0,,0,0,0,0,0,0,1,0,0,0,0,1,0,0.0,0.0,0.00000,0.0,0.0,0,0,0,0,0,0,0,0.0,0,0,0

org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 23055.0 failed 4 times, most recent failure: Lost task 1.3 in stage 23055.0 (TID 191607, ):
java.lang.NumberFormatException: For input string: "0.00000"

Has anyone faced such issues? Is there a solution?

Thanks,
Mohamed
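For reference: the failing token "0.00000" is a decimal string, so if parseLineToTuple (not shown in the thread) hands it to toInt / Integer.parseInt for an integer column, this exception is exactly what Java throws. A hedged sketch of a defensive parse, with hypothetical helper names:

import scala.util.Try

// Parse numeric fields defensively so one odd value (a decimal where an
// int was expected, or an empty string) does not fail the whole stage.
def toIntSafe(s: String): Option[Int] = Try(s.trim.toInt).toOption
def toDoubleSafe(s: String): Option[Double] = Try(s.trim.toDouble).toOption

// An int-expected column that sometimes carries decimals:
val v = toIntSafe("0.00000").orElse(toDoubleSafe("0.00000").map(_.toInt))
// v == Some(0) instead of java.lang.NumberFormatException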
Re: SPARKonYARN failing on CDH 5.3.0 : container cannot be fetched because of NumberFormatException
I am using the pre-built spark-1.2.0-bin-hadoop2.4 from [1] to submit spark applications to yarn; I cannot find a pre-built spark for the CDH-5.x versions. So, in my case the org.apache.hadoop.yarn.util.ConverterUtils class is coming from the spark-assembly-1.1.0-hadoop2.4.0.jar, which is part of the pre-built spark and hence causing this issue.

How / where can I get spark 1.2.0 built for CDH-5.3.0? I checked in the maven repo etc. with no luck.

[1] https://spark.apache.org/downloads.html

On Fri, Jan 9, 2015 at 1:12 AM, Marcelo Vanzin van...@cloudera.com wrote:
> Just to add to Sandy's comment, check your client configuration (generally in /etc/spark/conf). If you're using CM, you may need to run the Deploy Client Configuration command on the cluster to update the configs to match the new version of CDH.

On Thu, Jan 8, 2015 at 11:38 AM, Sandy Ryza sandy.r...@cloudera.com wrote:
> Hi Mukesh,
> Those line numbers in ConverterUtils in the stack trace don't appear to line up with CDH 5.3: https://github.com/cloudera/hadoop-common/blob/cdh5-2.5.0_5.3.0/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ConverterUtils.java
> Is it possible you're still including the old jars on the classpath in some way?
> -Sandy

--
Thanks & Regards,
Mukesh Jha me.mukesh@gmail.com
Re: SPARKonYARN failing on CDH 5.3.0 : container cannot be fetched because of NumberFormatException
Again, this is probably not the place for CDH-specific questions, and this one is already answered at http://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/CDH-5-3-0-container-cannot-be-fetched-because-of/m-p/23497#M478

On Fri, Jan 9, 2015 at 9:23 AM, Mukesh Jha me.mukesh@gmail.com wrote:
> I am using the pre-built spark-1.2.0-bin-hadoop2.4 from [1] to submit spark applications to yarn; I cannot find a pre-built spark for the CDH-5.x versions. So, in my case the org.apache.hadoop.yarn.util.ConverterUtils class is coming from the spark-assembly-1.1.0-hadoop2.4.0.jar, which is part of the pre-built spark and hence causing this issue. How / where can I get spark 1.2.0 built for CDH-5.3.0? I checked in the maven repo etc. with no luck.
> [1] https://spark.apache.org/downloads.html
SPARKonYARN failing on CDH 5.3.0 : container cannot be fetched because of NumberFormatException
Hi Experts,

I am running spark inside a YARN job. The spark-streaming job is running fine in CDH-5.0.0, but after the upgrade to 5.3.0 it cannot fetch containers, with the below errors. Looks like the container id is incorrect and a string is present in a place where it's expecting a number.

java.lang.IllegalArgumentException: Invalid ContainerId: container_e01_1420481081140_0006_01_01
Caused by: java.lang.NumberFormatException: For input string: "e01"

Is this a bug?? Did you face something similar, and any ideas how to fix this?

15/01/08 09:50:28 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
15/01/08 09:50:29 ERROR yarn.ApplicationMaster: Uncaught exception:
java.lang.IllegalArgumentException: Invalid ContainerId: container_e01_1420481081140_0006_01_01
  at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:182)
  at org.apache.spark.deploy.yarn.YarnRMClientImpl.getAttemptId(YarnRMClientImpl.scala:79)
  at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:79)
  at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:515)
  at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:60)
  at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:59)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
  at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:59)
  at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:513)
  at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: java.lang.NumberFormatException: For input string: "e01"
  at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
  at java.lang.Long.parseLong(Long.java:441)
  at java.lang.Long.parseLong(Long.java:483)
  at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationAttemptId(ConverterUtils.java:137)
  at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:177)
  ... 11 more
15/01/08 09:50:29 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: Invalid ContainerId: container_e01_1420481081140_0006_01_01)

--
Thanks & Regards,
Mukesh Jha me.mukesh@gmail.com
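For reference, a hedged sketch of what the old parser trips on. Newer YARN container ids carry an epoch token (the "e01"), added for work-preserving ResourceManager restart, while the ConverterUtils bundled in a hadoop2.4-era spark-assembly assumes the old layout, where the second token is the numeric cluster timestamp. The layout comments and the both-layouts parse below are illustrative, not copied from Hadoop source:

// Old-style id: container_<clusterTimestamp>_<appId>_<attemptId>_<containerId>
// New-style id: container_e<epoch>_<clusterTimestamp>_<appId>_<attemptId>_<containerId>
val id = "container_e01_1420481081140_0006_01_01"
val tokens = id.split("_")

// What the hadoop2.4-era code effectively does with the new id:
// java.lang.Long.parseLong(tokens(1))  // tokens(1) == "e01" -> NumberFormatException

// A parser aware of both layouts skips the epoch token first:
val rest = if (tokens(1).startsWith("e")) tokens.drop(2) else tokens.drop(1)
val clusterTimestamp = rest(0).toLong   // 1420481081140L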
Re: SPARKonYARN failing on CDH 5.3.0 : container cannot be fetched because of NumberFormatException
Hi Mukesh,

Those line numbers in ConverterUtils in the stack trace don't appear to line up with CDH 5.3: https://github.com/cloudera/hadoop-common/blob/cdh5-2.5.0_5.3.0/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ConverterUtils.java

Is it possible you're still including the old jars on the classpath in some way?

-Sandy

On Thu, Jan 8, 2015 at 3:38 AM, Mukesh Jha me.mukesh@gmail.com wrote:
> Hi Experts,
> I am running spark inside a YARN job. The spark-streaming job is running fine in CDH-5.0.0, but after the upgrade to 5.3.0 it cannot fetch containers, with the below errors.
> java.lang.IllegalArgumentException: Invalid ContainerId: container_e01_1420481081140_0006_01_01
> Caused by: java.lang.NumberFormatException: For input string: "e01"
> ...
Re: SPARKonYARN failing on CDH 5.3.0 : container cannot be fetched because of NumberFormatException
Just to add to Sandy's comment, check your client configuration (generally in /etc/spark/conf). If you're using CM, you may need to run the Deploy Client Configuration command on the cluster to update the configs to match the new version of CDH.

-- Marcelo

On Thu, Jan 8, 2015 at 11:38 AM, Sandy Ryza sandy.r...@cloudera.com wrote:
> Hi Mukesh,
> Those line numbers in ConverterUtils in the stack trace don't appear to line up with CDH 5.3: https://github.com/cloudera/hadoop-common/blob/cdh5-2.5.0_5.3.0/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/ConverterUtils.java
> Is it possible you're still including the old jars on the classpath in some way?
> -Sandy
Re: NumberFormatException
wow, really weird. My intuition is the same as everyone else's, some unprintable character. Here are a couple more debugging tricks I've used in the past:

// set up accumulators to catch the bad rows as a side-effect
val nBadRows = sc.accumulator(0)
val nGoodRows = sc.accumulator(0)
val badRows = sc.accumulableCollection(scala.collection.mutable.Set[String]())

// flatMap so that you can skip the bad rows
datastream.flatMap { str =>
  try {
    val strArray = str.trim().split(",")
    val result = (strArray(0).toInt, strArray(1).toInt)
    nGoodRows += 1
    Some(result)
  } catch {
    case e: NumberFormatException =>
      nBadRows += 1
      badRows += str
      None
  }
}.saveAsTextFile(...)

if (badRows.value.nonEmpty) {
  println("***** BAD ROWS *****")
  badRows.value.foreach { str =>
    // look at a bit more info from each string: print out the length and each character one by one
    println(str)
    println(str.length)
    str.foreach(println)
    println()
  }
}

// If it is some data corruption that you just have to live with, you might leave the flatMap / try
// in even when you're running it for real. But then you might want to add a little check that there
// aren't too many bad rows. Note that the accumulator[Set] will run out of mem if there are really
// a ton of bad rows, in which case you might switch to a reservoir sample.
val badFrac = nBadRows.value / (nGoodRows.value + nBadRows.value.toDouble)
println(s"${nBadRows.value} bad rows; ${nGoodRows.value} good rows; ($badFrac) bad fraction")
if (badFrac > maxAllowedBadRows) {
  throw new RuntimeException("too many bad rows! " + badFrac)
}

On Mon, Dec 15, 2014 at 3:49 PM, yu yuz1...@iastate.edu wrote:
> Hello, everyone
> I know 'NumberFormatException' is due to the reason that a String can not be parsed properly, but I really can not find any mistakes in my code. I hope someone may kindly help me.
> ...
NumberFormatException
Hello, everyone

I know 'NumberFormatException' is due to the reason that a String can not be parsed properly, but I really can not find any mistakes in my code. I hope someone may kindly help me.

My hdfs file is as follows:

8,22
3,11
40,10
49,47
48,29
24,28
50,30
33,56
4,20
30,38
...

So each line contains an integer + "," + an integer + "\n"

My code is as follows:

object StreamMonitor {
  def main(args: Array[String]): Unit = {
    val myFunc = (str: String) => {
      val strArray = str.trim().split(",")
      (strArray(0).toInt, strArray(1).toInt)
    }
    val conf = new SparkConf().setAppName("StreamMonitor");
    val ssc = new StreamingContext(conf, Seconds(30));
    val datastream = ssc.textFileStream("/user/yu/streaminput");
    val newstream = datastream.map(myFunc)
    newstream.saveAsTextFiles("output/", "");
    ssc.start()
    ssc.awaitTermination()
  }
}

The exception info is:

14/12/15 15:35:03 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, h3): java.lang.NumberFormatException: For input string: "8"
  java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
  java.lang.Integer.parseInt(Integer.java:492)
  java.lang.Integer.parseInt(Integer.java:527)
  scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
  scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
  StreamMonitor$$anonfun$1.apply(StreamMonitor.scala:9)
  StreamMonitor$$anonfun$1.apply(StreamMonitor.scala:7)
  scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
  scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
  org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:984)
  org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:974)
  org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
  org.apache.spark.scheduler.Task.run(Task.scala:54)
  org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
  java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  java.lang.Thread.run(Thread.java:745)

So based on the above info, "8" is the first number in the file and I think it should be parsed to an integer without any problems. I know it may be a very stupid question and the answer may be very easy, but I really can not find the reason. I am thankful to anyone who helps!
Re: NumberFormatException
That certainly looks surprising. Are you sure there are no unprintable characters in the file?

On Mon, Dec 15, 2014 at 9:49 PM, yu yuz1...@iastate.edu wrote:
> The exception info is:
> 14/12/15 15:35:03 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, h3): java.lang.NumberFormatException: For input string: "8"
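For reference, a small sketch of why an unprintable character is the usual suspect here: a UTF-8 BOM (\uFEFF) at the start of the file, for example, is invisible both in an editor and inside the exception message, so the error appears to say that a perfectly clean "8" failed to parse. (The BOM is just an illustration; any non-printing character behaves the same way.)

val s = "\uFEFF8"  // a UTF-8 BOM followed by '8'; prints as just "8"

println(s)                                     // 8
println(s.length)                              // 2 -- first hint something is hiding
println(s.codePoints().toArray.mkString(","))  // 65279,56

// s.toInt  // java.lang.NumberFormatException: For input string: "8" (BOM invisible)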
Re: NumberFormatException
Hi Yu,

Try this:

val data = csv.map(line => line.split(",").map(elem => elem.trim))  // lines into rows
data.map(rec => (rec(0).toInt, rec(1).toInt))                       // to convert into integers

On 16 December 2014 at 10:49, yu [via Apache Spark User List] wrote:
> Hello, everyone
> I know 'NumberFormatException' is due to the reason that a String can not be parsed properly, but I really can not find any mistakes in my code. I hope someone may kindly help me.
> ...

--
Regards,
Harihar Nahak
BigData Developer, Wynyard
Re: NumberFormatException
There could be some other character like a space or ^M etc. You could try the following and see the actual row:

val newstream = datastream.map(row => {
  try {
    val strArray = row.trim().split(",")
    (strArray(0).toInt, strArray(1).toInt)
    // Instead try this:
    // (strArray(0).trim().toInt, strArray(1).trim().toInt)
  } catch {
    case e: Exception =>
      println("W000t!! Exception!! => " + e + "\n The line was: " + row)
      (0, 0)
  }
})

Thanks
Best Regards

On Tue, Dec 16, 2014 at 3:19 AM, yu yuz1...@iastate.edu wrote:
> Hello, everyone
> I know 'NumberFormatException' is due to the reason that a String can not be parsed properly, but I really can not find any mistakes in my code. I hope someone may kindly help me.
> ...