[ https://issues.apache.org/jira/browse/SPARK-20387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16249508#comment-16249508 ]
Sean Owen commented on SPARK-20387:
-----------------------------------

That's not the same example. I believe the underlying number parsing from the JDK will, in permissive mode, parse as much of the string as it can as a number and ignore the rest. I think that's consistent, then. [~hyukjin.kwon]? There also appears to be a problem with your input: an extra blank column.

> Permissive mode is not replacing corrupt record with null
> ---------------------------------------------------------
>
>                 Key: SPARK-20387
>                 URL: https://issues.apache.org/jira/browse/SPARK-20387
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 2.1.0
>            Reporter: Navya Krishnappa
>
> When reading the below mentioned time value by specifying "mode" as
> PERMISSIVE.
> Source File:
> String,int,f1,bool1
> abc,23111,23.07738,true
> abc,23111,23.07738,true
> abc,23111,true,true
> Source code1:
> Dataset dataset = getSqlContext().read()
> .option(PARSER_LIB, "commons")
> .option(INFER_SCHEMA, "true")
> .option(DELIMITER, ",")
> .option(QUOTE, "\"")
> .option(MODE, Mode.PERMISSIVE)
> .csv(sourceFile);
> dataset.collect();
> Result: Error is thrown
> stack trace:
> ERROR Executor: Exception in task 0.0 in stage 15.0 (TID 15)
> java.lang.IllegalArgumentException: For input string: "23.07738"
> at scala.collection.immutable.StringLike$class.parseBoolean(StringLike.scala:290)
> at scala.collection.immutable.StringLike$class.toBoolean(StringLike.scala:260)
> at scala.collection.immutable.StringOps.toBoolean(StringOps.scala:29)
> at org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:270)
> at org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:125)
> at org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:94)
> at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$2.apply(CSVFileFormat.scala:167)
> at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$2.apply(CSVFileFormat.scala:166)
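
To illustrate the comment about JDK number parsing accepting whatever it can and ignoring the rest, here is a minimal sketch using java.text.NumberFormat. It assumes that this is the kind of JDK parsing being referred to; the thread does not confirm which code path Spark's CSV reader takes for a given column type. The class name LenientNumberParsing and the helpers parseLenient and parseStrict are hypothetical names made up for this example.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

public class LenientNumberParsing {

    // Lenient: NumberFormat reads a number from the start of the string
    // and stops at the first character it cannot use. Trailing text is
    // silently ignored. Returns null if no number can be read at all.
    static Number parseLenient(String s) {
        return NumberFormat.getInstance(Locale.US).parse(s, new ParsePosition(0));
    }

    // Strict (hypothetical helper): accept the value only if the parser
    // consumed the whole string; otherwise report failure with null.
    static Number parseStrict(String s) {
        ParsePosition pos = new ParsePosition(0);
        Number n = NumberFormat.getInstance(Locale.US).parse(s, pos);
        return (n != null && pos.getIndex() == s.length()) ? n : null;
    }

    public static void main(String[] args) {
        System.out.println(parseLenient("23111abc")); // 23111 (the "abc" suffix is dropped)
        System.out.println(parseStrict("23111abc"));  // null (trailing text rejected)
        System.out.println(parseLenient("23.07738")); // 23.07738
        System.out.println(parseLenient("true"));     // null (no leading number)
    }
}
```

If a lenient path like this is in play, a field with a numeric prefix would yield a value instead of being treated as malformed, which may explain why such input is not replaced with null in permissive mode.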