[ https://issues.apache.org/jira/browse/SPARK-21263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16074109#comment-16074109 ]

Hossein Falaki commented on SPARK-21263:
----------------------------------------

[~sowen] note that the user specified the mode as PERMISSIVE. In this mode the CSV
data source tries to ignore errors and return some result. If the mode is
FAILFAST, it should throw an exception. I see the permissiveness of the
different modes as follows:

{code}
PERMISSIVE > DROPMALFORMED > FAILFAST
{code}
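To make the ordering above concrete, here is a minimal Java sketch of how each mode could treat a single malformed numeric field such as "10u000". It assumes strict parsing via {{Double.parseDouble}} and assumes, following the Spark documentation's description of PERMISSIVE, that a malformed value becomes null while the row is kept. The class and method names ({{ParseModeSketch}}, {{parseTotalBills}}) are hypothetical; this is not Spark's actual parser.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not Spark code): per-mode handling of one malformed numeric field.
class ParseModeSketch {

    enum Mode { PERMISSIVE, DROPMALFORMED, FAILFAST }

    // Parses each TotalBill value strictly as a double.
    // PERMISSIVE keeps the row with a null value (assumed semantics),
    // DROPMALFORMED skips the row, FAILFAST rethrows the NumberFormatException.
    static List<Double> parseTotalBills(List<String> raw, Mode mode) {
        List<Double> out = new ArrayList<>();
        for (String s : raw) {
            try {
                out.add(Double.parseDouble(s));
            } catch (NumberFormatException e) {
                switch (mode) {
                    case PERMISSIVE:
                        out.add(null); // keep the row, null out the bad value
                        break;
                    case DROPMALFORMED:
                        break; // drop the row entirely
                    case FAILFAST:
                    default:
                        throw e; // surface the error immediately
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> bills = Arrays.asList("10u000", "30000", "40000");
        System.out.println(parseTotalBills(bills, Mode.PERMISSIVE));    // [null, 30000.0, 40000.0]
        System.out.println(parseTotalBills(bills, Mode.DROPMALFORMED)); // [30000.0, 40000.0]
        try {
            parseTotalBills(bills, Mode.FAILFAST);
        } catch (NumberFormatException e) {
            System.out.println("FAILFAST: " + e.getMessage());
        }
    }
}
{code}

Under these assumptions the same input string is handled by the mode alone, regardless of the target numeric type.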

Here we have different behavior for {{IntegerType}} vs. {{DoubleType}}: the
invalid integer throws, while the invalid double is silently parsed to 10. That
needs to be fixed so that the behavior is consistent.
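The Schema2 result in the report ({{TotalBill: 10}} for "10u000") looks like a lenient, prefix-based parse. As an assumption not confirmed in this thread, the sketch below contrasts {{java.text.NumberFormat.parse}}, which stops at the first character it cannot use and ignores the rest, with strict {{Double.parseDouble}} and {{Integer.parseInt}}, which reject the whole string. The class and method names are hypothetical.

{code:java}
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

// Hypothetical demo: lenient prefix parsing vs. strict parsing of "10u000".
class LenientParseDemo {

    // Lenient: NumberFormat.parse(String) consumes the longest numeric prefix
    // and silently ignores trailing characters ("10u000" -> 10).
    static double lenientDouble(String s) throws ParseException {
        return NumberFormat.getInstance(Locale.US).parse(s).doubleValue();
    }

    // Strict: any trailing garbage causes a NumberFormatException.
    static double strictDouble(String s) {
        return Double.parseDouble(s);
    }

    // Strict integer parsing, which matches the IntegerType result in the report.
    static int strictInt(String s) {
        return Integer.parseInt(s);
    }

    public static void main(String[] args) throws ParseException {
        String bad = "10u000";
        System.out.println(lenientDouble(bad)); // 10.0
        try {
            strictDouble(bad);
        } catch (NumberFormatException e) {
            System.out.println("double: " + e.getMessage());
        }
        try {
            strictInt(bad);
        } catch (NumberFormatException e) {
            System.out.println("int: " + e.getMessage());
        }
    }
}
{code}

If the double path does rely on such a fallback, removing it (or checking that the whole string was consumed, e.g. with {{NumberFormat.parse(String, ParsePosition)}} and comparing the final index to the string length) would make {{DoubleType}} fail the same way {{IntegerType}} does.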

> NumberFormatException is not thrown while converting an invalid string to 
> float/double
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-21263
>                 URL: https://issues.apache.org/jira/browse/SPARK-21263
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 2.1.1
>            Reporter: Navya Krishnappa
>
> When reading the data below with a user-defined schema, no exception is
> thrown. Details:
> *Data:* 
> 'PatientID','PatientName','TotalBill'
> '1000','Patient1','10u000'
> '1001','Patient2','30000'
> '1002','Patient3','40000'
> '1003','Patient4','50000'
> '1004','Patient5','60000'
> *Source code*:
> {code:java}
> // inferSchema has no effect when an explicit schema is supplied
> Dataset<Row> dataset = sparkSession.read().schema(schema)
>     .option("inferSchema", "true")
>     .option("delimiter", ",")
>     .option("quote", "\"")
>     .option("mode", "PERMISSIVE")
>     .csv(sourceFile);
>
> // Collecting the rows triggers parsing of every record
> dataset.collectAsList();
> {code}
> *Schema1*: 
> [StructField(PatientID,IntegerType,true), 
> StructField(PatientName,StringType,true), 
> StructField(TotalBill,IntegerType,true)]
> *Result*: Throws NumberFormatException
> Caused by: java.lang.NumberFormatException: For input string: "10u000"
> *Schema2*: 
> [StructField(PatientID,IntegerType,true), 
> StructField(PatientName,StringType,true), 
> StructField(TotalBill,DoubleType,true)]
> *Actual Result*: 
> "PatientID": 1000,
> "NumberOfVisits": "400",
> "TotalBill": 10,
> *Expected Result*: Should throw NumberFormatException for input string 
> "10u000"



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
