Re: Continue reading dataframe from file despite errors
Thanks Suresh, it worked nicely.

From: Suresh Thalamati <suresh.thalam...@gmail.com>
Sent: Tuesday, September 12, 2017 2:59:29 PM
To: jeff saremi
Cc: user@spark.apache.org
Subject: Re: Continue reading dataframe from file despite errors

Try the CSV option("mode", "dropmalformed"); that might skip the error records.

On Sep 12, 2017, at 2:33 PM, jeff saremi <jeffsar...@hotmail.com> wrote:

Should have added some of the exception to be clear:

17/09/12 14:14:17 ERROR TaskSetManager: Task 0 in stage 15.0 failed 1 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 15.0 failed 1 times, most recent failure: Lost task 0.0 in stage 15.0 (TID 15, localhost, executor driver): java.lang.NumberFormatException: For input string: "south carolina"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:580)
    at java.lang.Integer.parseInt(Integer.java:615)
    at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
    at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
    at org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:250)

From: jeff saremi <jeffsar...@hotmail.com>
Sent: Tuesday, September 12, 2017 2:32:03 PM
To: user@spark.apache.org
Subject: Continue reading dataframe from file despite errors

I'm using a statement like the following to load my dataframe from a text file. Upon encountering the first error, the whole thing throws an exception and processing stops. I'd like to continue loading even if that results in zero rows in my dataframe. How can I do that?

Thanks

spark.read.schema(SomeSchema).option("sep", "\t").format("csv").load("somepath")
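For reference, a minimal sketch of the read as it presumably ended up working. The two-column schema is a hypothetical stand-in for SomeSchema, which the thread never shows; only the "mode" option comes from Suresh's suggestion.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

    val spark = SparkSession.builder().appName("csv-dropmalformed").getOrCreate()

    // Hypothetical stand-in for SomeSchema; the integer column is the kind of
    // field a value like "south carolina" would have failed to cast into.
    val someSchema = StructType(Seq(
      StructField("state", StringType, nullable = true),
      StructField("population", IntegerType, nullable = true)
    ))

    val df = spark.read
      .schema(someSchema)
      .option("sep", "\t")
      .option("mode", "DROPMALFORMED") // silently skip rows that do not match the schema
      .format("csv")
      .load("somepath")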
Re: Continue reading dataframe from file despite errors
Try the CSV option("mode", "dropmalformed"); that might skip the error records.

> On Sep 12, 2017, at 2:33 PM, jeff saremi wrote:
>
> should have added some of the exception to be clear:
>
> 17/09/12 14:14:17 ERROR TaskSetManager: Task 0 in stage 15.0 failed 1 times; aborting job
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 15.0 failed 1 times, most recent failure: Lost task 0.0 in stage 15.0 (TID 15, localhost, executor driver): java.lang.NumberFormatException: For input string: "south carolina"
>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>     at java.lang.Integer.parseInt(Integer.java:580)
>     at java.lang.Integer.parseInt(Integer.java:615)
>     at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
>     at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
>     at org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:250)
>
> From: jeff saremi
> Sent: Tuesday, September 12, 2017 2:32:03 PM
> To: user@spark.apache.org
> Subject: Continue reading dataframe from file despite errors
>
> I'm using a statement like the following to load my dataframe from a text file. Upon encountering the first error, the whole thing throws an exception and processing stops. I'd like to continue loading even if that results in zero rows in my dataframe. How can I do that?
>
> Thanks
>
> spark.read.schema(SomeSchema).option("sep", "\t").format("csv").load("somepath")
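An alternative to dropping the bad rows, sketched here under the assumption of Spark 2.2 or later: PERMISSIVE mode (the default) keeps malformed rows and nulls out the fields that fail to parse, and columnNameOfCorruptRecord can preserve each bad row's raw text for inspection. The extra column and its name below are illustrative, not from the thread.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

    val spark = SparkSession.builder().appName("csv-permissive").getOrCreate()

    // Hypothetical schema as in the earlier sketch, plus a string column to
    // receive the raw text of any row that fails to parse.
    val schemaWithCorrupt = StructType(Seq(
      StructField("state", StringType, nullable = true),
      StructField("population", IntegerType, nullable = true),
      StructField("_corrupt_record", StringType, nullable = true)
    ))

    val df = spark.read
      .schema(schemaWithCorrupt)
      .option("sep", "\t")
      .option("mode", "PERMISSIVE")
      .option("columnNameOfCorruptRecord", "_corrupt_record")
      .format("csv")
      .load("somepath")

    // Malformed rows survive the load; their raw line lands in _corrupt_record.
    df.filter("_corrupt_record IS NOT NULL").show(truncate = false)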
Re: Continue reading dataframe from file despite errors
Should have added some of the exception to be clear:

17/09/12 14:14:17 ERROR TaskSetManager: Task 0 in stage 15.0 failed 1 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 15.0 failed 1 times, most recent failure: Lost task 0.0 in stage 15.0 (TID 15, localhost, executor driver): java.lang.NumberFormatException: For input string: "south carolina"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:580)
    at java.lang.Integer.parseInt(Integer.java:615)
    at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
    at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
    at org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:250)

From: jeff saremi
Sent: Tuesday, September 12, 2017 2:32:03 PM
To: user@spark.apache.org
Subject: Continue reading dataframe from file despite errors

I'm using a statement like the following to load my dataframe from a text file. Upon encountering the first error, the whole thing throws an exception and processing stops. I'd like to continue loading even if that results in zero rows in my dataframe. How can I do that?

Thanks

spark.read.schema(SomeSchema).option("sep", "\t").format("csv").load("somepath")
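For locating the offending lines before deciding how to handle them, a diagnostic sketch that is not suggested anywhere in the thread: read the file as plain text and test the field that the schema expects to be an integer. The field index 1 is a guess, since the real SomeSchema is not shown.

    import scala.util.Try
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("csv-diagnose").getOrCreate()

    // Assumes, as in the read statement above, a tab-separated file; keep lines
    // whose hypothetical integer field does not actually parse as an Int.
    val badLines = spark.read.textFile("somepath").filter { line =>
      val fields = line.split("\t", -1)
      fields.length > 1 && Try(fields(1).trim.toInt).isFailure
    }
    badLines.show(truncate = false)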