[ https://issues.apache.org/jira/browse/SPARK-21263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Navya Krishnappa updated SPARK-21263:
-------------------------------------
    Description:
When reading the data below with a user-defined schema, no exception is thrown.

Data
'PatientID','PatientName','TotalBill'
'1000','Patient1','10u000'
'1001','Patient2','30000'
'1002','Patient3','40000'
'1003','Patient4','50000'
'1004','Patient5','60000'

Source code:
Dataset dataset = sparkSession.read().schema(schema)
    .option(INFER_SCHEMA, "true")
    .option(DELIMITER, ",")
    .option(QUOTE, "\"")
    .option(MODE, Mode.PERMISSIVE)
    .csv(sourceFile);

When we collect the dataset data:
dataset.collectAsList();

Schema1:
[StructField(PatientID,IntegerType,true),
StructField(PatientName,StringType,true),
StructField(TotalBill,IntegerType,true)]
*Result*: Throws NumberFormatException
Caused by: java.lang.NumberFormatException: For input string: "10u000"

Schema2:
[StructField(PatientID,IntegerType,true),
StructField(PatientName,StringType,true),
StructField(TotalBill,DoubleType,true)]
*Actual Result*:
"PatientID": 1000,
"NumberOfVisits": "400",
"TotalBill": 10,
*Expected Result*: Should throw NumberFormatException for input string "10u000"

  was:
When reading the data below with a user-defined schema, no exception is thrown.
Data
'PatientID','PatientName','TotalBill'
'1000','Patient1','10u000'
'1001','Patient2','30000'
'1002','Patient3','40000'
'1003','Patient4','50000'
'1004','Patient5','60000'

Schema1:
[StructField(PatientID,IntegerType,true),
StructField(PatientName,StringType,true),
StructField(TotalBill,IntegerType,true)]
Result: Throws NumberFormatException
Caused by: java.lang.NumberFormatException: For input string: "10u000"

Schema2:
[StructField(PatientID,IntegerType,true),
StructField(PatientName,StringType,true),
StructField(TotalBill,DoubleType,true)]
Actual Result:
"PatientID": 1000,
"NumberOfVisits": "400",
"TotalBill": 10,
Expected Result: Should throw NumberFormatException for input string "10u000"


> NumberFormatException is not thrown while converting an invalid string to float/double
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-21263
>                 URL: https://issues.apache.org/jira/browse/SPARK-21263
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 2.1.1
>            Reporter: Navya Krishnappa
>
> When reading the data below with a user-defined schema, no exception is thrown.
> Data
> 'PatientID','PatientName','TotalBill'
> '1000','Patient1','10u000'
> '1001','Patient2','30000'
> '1002','Patient3','40000'
> '1003','Patient4','50000'
> '1004','Patient5','60000'
>
> Source code:
> Dataset dataset = sparkSession.read().schema(schema)
>     .option(INFER_SCHEMA, "true")
>     .option(DELIMITER, ",")
>     .option(QUOTE, "\"")
>     .option(MODE, Mode.PERMISSIVE)
>     .csv(sourceFile);
>
> When we collect the dataset data:
> dataset.collectAsList();
>
> Schema1:
> [StructField(PatientID,IntegerType,true),
> StructField(PatientName,StringType,true),
> StructField(TotalBill,IntegerType,true)]
> *Result*: Throws NumberFormatException
> Caused by: java.lang.NumberFormatException: For input string: "10u000"
>
> Schema2:
> [StructField(PatientID,IntegerType,true),
> StructField(PatientName,StringType,true),
> StructField(TotalBill,DoubleType,true)]
> *Actual Result*:
> "PatientID": 1000,
> "NumberOfVisits": "400",
> "TotalBill": 10,
> *Expected Result*: Should throw NumberFormatException for input string "10u000"



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
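
The difference between Schema1 and Schema2 is consistent with strict versus lenient number parsing. The following is a minimal sketch in plain Java with no Spark involved. Double.parseDouble rejects "10u000" outright, while java.text.NumberFormat.parse reads the leading numeric prefix and stops at the first character it cannot use, which yields 10, the same value reported for TotalBill under Schema2. Whether the CSV reader in Spark 2.1.1 actually takes such a lenient path for DoubleType is an assumption here, not something the report states. The class name LenientParseDemo and its two helper methods are hypothetical and exist only for illustration.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

public class LenientParseDemo {

    // Strict parse: the entire string must be a valid double,
    // otherwise Double.parseDouble throws NumberFormatException.
    public static boolean strictParseFails(String s) {
        try {
            Double.parseDouble(s);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    // Lenient parse: NumberFormat.parse(String) parses from the start of
    // the string and may stop before the end, ignoring trailing text.
    public static double lenientParse(String s) {
        try {
            return NumberFormat.getInstance(Locale.US).parse(s).doubleValue();
        } catch (ParseException e) {
            // Raised only when not even a numeric prefix can be parsed.
            throw new IllegalArgumentException("Not a number: " + s, e);
        }
    }

    public static void main(String[] args) {
        String bad = "10u000"; // the malformed TotalBill value from the report
        System.out.println(strictParseFails(bad)); // prints "true"
        System.out.println(lenientParse(bad));     // prints "10.0"
    }
}
```

If the reader does fall back to a lenient parser of this kind, the behavior the report asks for would amount to rejecting any value whose text is not consumed in full, for example by checking the end index of a java.text.ParsePosition after parsing.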