Try the CSV option("mode", "DROPMALFORMED"); that might skip the malformed
records.
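
A minimal sketch of what that could look like, reusing the SomeSchema and path
from the snippet quoted below:

    val df = spark.read
      .schema(SomeSchema)                  // user-supplied schema from the original snippet
      .option("sep", "\t")
      .option("mode", "DROPMALFORMED")     // drop rows that cannot be parsed against the schema
      .format("csv")
      .load("somepath")

For reference, the CSV reader's other documented values for the mode option are
PERMISSIVE (the default) and FAILFAST.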


> On Sep 12, 2017, at 2:33 PM, jeff saremi <jeffsar...@hotmail.com> wrote:
> 
> I should have added some of the exception to be clear:
> 
> 17/09/12 14:14:17 ERROR TaskSetManager: Task 0 in stage 15.0 failed 1 times; 
> aborting job
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 15.0 failed 1 times, most recent failure: Lost task 0.0 in stage 15.0 
> (TID 15, localhost, executor driver): java.lang.NumberFormatException: For 
> input string: "south carolina"
>         at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>         at java.lang.Integer.parseInt(Integer.java:580)
>         at java.lang.Integer.parseInt(Integer.java:615)
>         at 
> scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
>         at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
>         at 
> org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:250)
> 
> From: jeff saremi <jeffsar...@hotmail.com>
> Sent: Tuesday, September 12, 2017 2:32:03 PM
> To: user@spark.apache.org
> Subject: Continue reading dataframe from file despite errors
>  
> I'm using a statement like the following to load my dataframe from a text 
> file.
> Upon encountering the first error, the whole thing throws an exception and 
> processing stops.
> I'd like to continue loading even if that results in zero rows in my 
> dataframe. How can I do that?
> thanks
> 
> spark.read.schema(SomeSchema).option("sep", "\t").format("csv").load("somepath")
