Re: Continue reading dataframe from file despite errors
Thanks Suresh, it worked nicely.

From: Suresh Thalamati <suresh.thalam...@gmail.com>
Sent: Tuesday, September 12, 2017 2:59:29 PM
To: jeff saremi
Cc: user@spark.apache.org
Subject: Re: Continue reading dataframe from file despite errors

Try the CSV option("mode", "dropmalformed"); that might skip the error records.

On Sep 12, 2017, at 2:33 PM, jeff saremi <jeffsar...@hotmail.com> wrote:

Should have added some of the exception to be clear:

17/09/12 14:14:17 ERROR TaskSetManager: Task 0 in stage 15.0 failed 1 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 15.0 failed 1 times, most recent failure: Lost task 0.0 in stage 15.0 (TID 15, localhost, executor driver): java.lang.NumberFormatException: For input string: "south carolina"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:580)
    at java.lang.Integer.parseInt(Integer.java:615)
    at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
    at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
    at org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:250)

From: jeff saremi <jeffsar...@hotmail.com>
Sent: Tuesday, September 12, 2017 2:32:03 PM
To: user@spark.apache.org
Subject: Continue reading dataframe from file despite errors

I'm using a statement like the following to load my dataframe from a text file. Upon encountering the first error, the whole thing throws an exception and processing stops. I'd like to continue loading even if that results in zero rows in my dataframe. How can I do that?

Thanks

spark.read.schema(SomeSchema).option("sep", "\t").format("csv").load("somepath")
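For reference, a minimal sketch of the read as it presumably ended up working. The two-column schema is a hypothetical stand-in for SomeSchema, which the thread never shows; only the "mode" option comes from Suresh's suggestion.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

    val spark = SparkSession.builder().appName("csv-dropmalformed").getOrCreate()

    // Hypothetical stand-in for SomeSchema; the integer column is the kind of
    // field a value like "south carolina" would have failed to cast into.
    val someSchema = StructType(Seq(
      StructField("state", StringType, nullable = true),
      StructField("population", IntegerType, nullable = true)
    ))

    val df = spark.read
      .schema(someSchema)
      .option("sep", "\t")
      .option("mode", "DROPMALFORMED") // silently skip rows that do not match the schema
      .format("csv")
      .load("somepath")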
Re: Continue reading dataframe from file despite errors
Try the CSV option("mode", "dropmalformed"); that might skip the error records.

> On Sep 12, 2017, at 2:33 PM, jeff saremi wrote:
>
> should have added some of the exception to be clear:
>
> 17/09/12 14:14:17 ERROR TaskSetManager: Task 0 in stage 15.0 failed 1 times; aborting job
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 15.0 failed 1 times, most recent failure: Lost task 0.0 in stage 15.0 (TID 15, localhost, executor driver): java.lang.NumberFormatException: For input string: "south carolina"
>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>     at java.lang.Integer.parseInt(Integer.java:580)
>     at java.lang.Integer.parseInt(Integer.java:615)
>     at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
>     at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
>     at org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:250)
>
> From: jeff saremi
> Sent: Tuesday, September 12, 2017 2:32:03 PM
> To: user@spark.apache.org
> Subject: Continue reading dataframe from file despite errors
>
> I'm using a statement like the following to load my dataframe from a text file. Upon encountering the first error, the whole thing throws an exception and processing stops. I'd like to continue loading even if that results in zero rows in my dataframe. How can I do that?
>
> Thanks
>
> spark.read.schema(SomeSchema).option("sep", "\t").format("csv").load("somepath")
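An alternative to dropping the bad rows, sketched here under the assumption of Spark 2.2 or later: PERMISSIVE mode (the default) keeps malformed rows and nulls out the fields that fail to parse, and columnNameOfCorruptRecord can preserve each bad row's raw text for inspection. The extra column and its name below are illustrative, not from the thread.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

    val spark = SparkSession.builder().appName("csv-permissive").getOrCreate()

    // Hypothetical schema as in the earlier sketch, plus a string column to
    // receive the raw text of any row that fails to parse.
    val schemaWithCorrupt = StructType(Seq(
      StructField("state", StringType, nullable = true),
      StructField("population", IntegerType, nullable = true),
      StructField("_corrupt_record", StringType, nullable = true)
    ))

    val df = spark.read
      .schema(schemaWithCorrupt)
      .option("sep", "\t")
      .option("mode", "PERMISSIVE")
      .option("columnNameOfCorruptRecord", "_corrupt_record")
      .format("csv")
      .load("somepath")

    // Malformed rows survive the load; their raw line lands in _corrupt_record.
    df.filter("_corrupt_record IS NOT NULL").show(truncate = false)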
Re: Continue reading dataframe from file despite errors
Should have added some of the exception to be clear:

17/09/12 14:14:17 ERROR TaskSetManager: Task 0 in stage 15.0 failed 1 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 15.0 failed 1 times, most recent failure: Lost task 0.0 in stage 15.0 (TID 15, localhost, executor driver): java.lang.NumberFormatException: For input string: "south carolina"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:580)
    at java.lang.Integer.parseInt(Integer.java:615)
    at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
    at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
    at org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:250)

From: jeff saremi
Sent: Tuesday, September 12, 2017 2:32:03 PM
To: user@spark.apache.org
Subject: Continue reading dataframe from file despite errors

I'm using a statement like the following to load my dataframe from a text file. Upon encountering the first error, the whole thing throws an exception and processing stops. I'd like to continue loading even if that results in zero rows in my dataframe. How can I do that?

Thanks

spark.read.schema(SomeSchema).option("sep", "\t").format("csv").load("somepath")
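For locating the offending lines before deciding how to handle them, a diagnostic sketch that is not suggested anywhere in the thread: read the file as plain text and test the field that the schema expects to be an integer. The field index 1 is a guess, since the real SomeSchema is not shown.

    import scala.util.Try
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("csv-diagnose").getOrCreate()

    // Assumes, as in the read statement above, a tab-separated file; keep lines
    // whose hypothetical integer field does not actually parse as an Int.
    val badLines = spark.read.textFile("somepath").filter { line =>
      val fields = line.split("\t", -1)
      fields.length > 1 && Try(fields(1).trim.toInt).isFailure
    }
    badLines.show(truncate = false)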