[ https://issues.apache.org/jira/browse/SPARK-21263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16074109#comment-16074109 ]
Hossein Falaki commented on SPARK-21263:
----------------------------------------

[~sowen] note that the user specified the mode to be PERMISSIVE. In this mode the CSV data source will try to ignore errors and return some result. If the mode is FAILFAST, it should throw an exception. I see the permissiveness of the different modes as follows:
{code}
PERMISSIVE > DROPMALFORMED > FAILFAST
{code}
Here we have different behavior for {{IntegerType}} vs. {{DoubleType}}. That needs to be fixed, and the behavior should be consistent.

> NumberFormatException is not thrown while converting an invalid string to float/double
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-21263
>                 URL: https://issues.apache.org/jira/browse/SPARK-21263
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 2.1.1
>            Reporter: Navya Krishnappa
>
> When reading the data below with a user-defined schema, no exception is thrown. Details:
> *Data:*
> 'PatientID','PatientName','TotalBill'
> '1000','Patient1','10u000'
> '1001','Patient2','30000'
> '1002','Patient3','40000'
> '1003','Patient4','50000'
> '1004','Patient5','60000'
> *Source code*:
> Dataset dataset = sparkSession.read().schema(schema)
>     .option(INFER_SCHEMA, "true")
>     .option(DELIMITER, ",")
>     .option(QUOTE, "\"")
>     .option(MODE, Mode.PERMISSIVE)
>     .csv(sourceFile);
> When we collect the dataset data:
> dataset.collectAsList();
> *Schema1*:
> [StructField(PatientID,IntegerType,true),
> StructField(PatientName,StringType,true),
> StructField(TotalBill,IntegerType,true)]
> *Result*: Throws NumberFormatException
> Caused by: java.lang.NumberFormatException: For input string: "10u000"
> *Schema2*:
> [StructField(PatientID,IntegerType,true),
> StructField(PatientName,StringType,true),
> StructField(TotalBill,DoubleType,true)]
> *Actual Result*:
> "PatientID": 1000,
> "NumberOfVisits": "400",
> "TotalBill": 10,
> *Expected Result*: Should throw
> NumberFormatException for input string "10u000"
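
The differing results for {{IntegerType}} and {{DoubleType}} in the report are what one would see if one column were parsed strictly and the other leniently. The sketch below reproduces that contrast in plain Java, without Spark. It is an illustration under assumptions: the class and method names are hypothetical, and this thread does not confirm which parsing calls the CSV data source actually uses. {{Integer.parseInt}} and {{Double.parseDouble}} reject the whole string, while {{NumberFormat.parse(String)}} consumes the leading digits and stops at the first character it cannot read.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

// Hypothetical demo class; none of these names come from Spark.
class LenientParseDemo {

    // Strict: the entire string must be a valid integer, otherwise null.
    static Integer parseStrictInt(String s) {
        try {
            return Integer.parseInt(s);
        } catch (NumberFormatException e) {
            return null; // malformed field
        }
    }

    // Strict: the entire string must be a valid double, otherwise null.
    static Double parseStrictDouble(String s) {
        try {
            return Double.parseDouble(s);
        } catch (NumberFormatException e) {
            return null; // malformed field
        }
    }

    // Lenient: NumberFormat.parse(String) reads from the start of the string
    // and may stop before the end, so a numeric prefix is enough to succeed.
    static Double parseLenientDouble(String s) {
        try {
            return NumberFormat.getInstance(Locale.US).parse(s).doubleValue();
        } catch (ParseException e) {
            return null; // not even a numeric prefix
        }
    }

    public static void main(String[] args) {
        String field = "10u000"; // the malformed TotalBill value from the report
        System.out.println(parseStrictInt(field));     // null
        System.out.println(parseStrictDouble(field));  // null
        System.out.println(parseLenientDouble(field)); // 10.0
    }
}
```

The lenient result of 10.0 matches the {{"TotalBill": 10}} shown under *Actual Result*. Whatever the real cause, a parser that must consume the full field for every numeric type would give the consistent behavior asked for above.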