Try the CSV option("mode", "DROPMALFORMED"); that might skip the error records.
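
A minimal sketch of what that would look like applied to your load statement (same placeholder schema and path as in your message) -- DROPMALFORMED tells the CSV reader to discard records it can't parse or cast to the declared schema instead of failing the job:

    val df = spark.read
      .schema(SomeSchema)                       // same user-defined schema as before
      .option("sep", "\t")                      // tab-separated input
      .option("mode", "DROPMALFORMED")          // drop records that fail parsing or casting
      .format("csv")
      .load("somepath")                         // placeholder path from the original post

That should also cover the type-cast failure in your trace (the string "south carolina" in a numeric column), though it's worth comparing df.count() against the raw line count afterwards to see how much was silently dropped.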
> On Sep 12, 2017, at 2:33 PM, jeff saremi <jeffsar...@hotmail.com> wrote:
>
> should have added some of the exception to be clear:
>
> 17/09/12 14:14:17 ERROR TaskSetManager: Task 0 in stage 15.0 failed 1 times; aborting job
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 15.0 failed 1 times, most recent failure: Lost task 0.0 in stage 15.0 (TID 15, localhost, executor driver): java.lang.NumberFormatException: For input string: "south carolina"
>         at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>         at java.lang.Integer.parseInt(Integer.java:580)
>         at java.lang.Integer.parseInt(Integer.java:615)
>         at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
>         at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
>         at org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:250)
>
> From: jeff saremi <jeffsar...@hotmail.com>
> Sent: Tuesday, September 12, 2017 2:32:03 PM
> To: user@spark.apache.org
> Subject: Continue reading dataframe from file despite errors
>
> I'm using a statement like the following to load my dataframe from some text file.
> Upon encountering the first error, the whole thing throws an exception and processing stops.
> I'd like to continue loading even if that results in zero rows in my dataframe. How can I do that?
> thanks
>
> spark.read.schema(SomeSchema).option("sep", "\t").format("csv").load("somepath")
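
If you'd rather keep the bad rows than drop them, the CSV reader can also run in PERMISSIVE mode and stash the raw text of a malformed record in a designated string column. A rough sketch, assuming a hypothetical schema variable SomeSchemaWithCorruptCol (your SomeSchema plus a nullable string field "_corrupt_record") and a Spark version whose CSV source honors columnNameOfCorruptRecord:

    val df = spark.read
      .schema(SomeSchemaWithCorruptCol)                          // SomeSchema plus a string column "_corrupt_record"
      .option("sep", "\t")
      .option("mode", "PERMISSIVE")                              // keep every row; fields that fail to cast come back null
      .option("columnNameOfCorruptRecord", "_corrupt_record")    // raw text of malformed lines lands here
      .format("csv")
      .load("somepath")                                          // placeholder path from the original post

You can then filter on "_corrupt_record" to inspect the rows that would otherwise be dropped (depending on the Spark version you may need to cache the dataframe before querying that column on its own).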