[jira] [Assigned] (SPARK-37176) JsonSource's infer should have the same exception handle logic as JacksonParser's parse logic

2021-11-02 Thread Max Gekk (Jira)


 [ https://issues.apache.org/jira/browse/SPARK-37176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk reassigned SPARK-37176:


Assignee: Xianjin YE

> JsonSource's infer should have the same exception handle logic as 
> JacksonParser's parse logic
> -
>
> Key: SPARK-37176
> URL: https://issues.apache.org/jira/browse/SPARK-37176
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.3, 3.1.2, 3.2.0
>Reporter: Xianjin YE
>Assignee: Xianjin YE
>Priority: Minor
>
> JacksonParser's exception handling logic differs from that of
> org.apache.spark.sql.catalyst.json.JsonInferSchema#infer; the difference
> can be seen below:
> {code:scala}
> // JacksonParser's parse
> try {
>   Utils.tryWithResource(createParser(factory, record)) { parser =>
>     // a null first token is equivalent to testing for input.trim.isEmpty
>     // but it works on any token stream and not just strings
>     parser.nextToken() match {
>       case null => None
>       case _ => rootConverter.apply(parser) match {
>         case null => throw QueryExecutionErrors.rootConverterReturnNullError()
>         case rows => rows.toSeq
>       }
>     }
>   }
> } catch {
>   case e: SparkUpgradeException => throw e
>   case e @ (_: RuntimeException | _: JsonProcessingException | _: MalformedInputException) =>
>     // JSON parser currently doesn't support partial results for corrupted records.
>     // For such records, all fields other than the field configured by
>     // `columnNameOfCorruptRecord` are set to `null`.
>     throw BadRecordException(() => recordLiteral(record), () => None, e)
>   case e: CharConversionException if options.encoding.isEmpty =>
>     val msg =
>       """JSON parser cannot handle a character in its input.
>         |Specifying encoding as an input option explicitly might help to resolve the issue.
>         |""".stripMargin + e.getMessage
>     val wrappedCharException = new CharConversionException(msg)
>     wrappedCharException.initCause(e)
>     throw BadRecordException(() => recordLiteral(record), () => None, wrappedCharException)
>   case PartialResultException(row, cause) =>
>     throw BadRecordException(
>       record = () => recordLiteral(record),
>       partialResult = () => Some(row),
>       cause)
> }
> {code}
> vs.
> {code:scala}
> // JsonInferSchema's infer logic
> val mergedTypesFromPartitions = json.mapPartitions { iter =>
>   val factory = options.buildJsonFactory()
>   iter.flatMap { row =>
>     try {
>       Utils.tryWithResource(createParser(factory, row)) { parser =>
>         parser.nextToken()
>         Some(inferField(parser))
>       }
>     } catch {
>       case e @ (_: RuntimeException | _: JsonProcessingException) => parseMode match {
>         case PermissiveMode =>
>           Some(StructType(Seq(StructField(columnNameOfCorruptRecord, StringType))))
>         case DropMalformedMode =>
>           None
>         case FailFastMode =>
>           throw QueryExecutionErrors.malformedRecordsDetectedInSchemaInferenceError(e)
>       }
>     }
>   }.reduceOption(typeMerger).toIterator
> }
> {code}
> They should share the same exception handling logic; otherwise the
> inconsistency may confuse users.
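
Independent of Spark's internals, the core of the proposal is to route both the parse path and the infer path through a single shared handler so their treatment of malformed records cannot drift apart. Below is a minimal, language-agnostic sketch of that pattern in Python; all names here (`handle_parse_failure`, `BadRecordError`, the stand-in `parse`/`infer` bodies) are illustrative and are not Spark APIs.

```python
# Sketch: parse() and infer() delegate to one shared handler, so
# corrupt-record behavior stays consistent across both code paths.

class BadRecordError(Exception):
    pass

def handle_parse_failure(record, exc, mode):
    """Single source of truth for malformed-record handling."""
    if mode == "PERMISSIVE":
        return {"_corrupt_record": record}   # quarantine in corrupt column
    if mode == "DROPMALFORMED":
        return None                          # silently drop the record
    if mode == "FAILFAST":
        raise BadRecordError(str(exc))
    raise ValueError(f"unknown mode: {mode}")

def parse(record, mode):
    try:
        return {"value": int(record)}        # stand-in for real parsing
    except ValueError as exc:
        return handle_parse_failure(record, exc, mode)

def infer(record, mode):
    try:
        int(record)
        return "int"                         # stand-in for schema inference
    except ValueError as exc:
        result = handle_parse_failure(record, exc, mode)
        return None if result is None else "_corrupt_record"
```

With one handler, a record that is quarantined, dropped, or fatal during parsing behaves the same way during schema inference, which is exactly the consistency this issue asks for.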



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37176) JsonSource's infer should have the same exception handle logic as JacksonParser's parse logic

2021-11-01 Thread Apache Spark (Jira)


 [ https://issues.apache.org/jira/browse/SPARK-37176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37176:


Assignee: (was: Apache Spark)




[jira] [Assigned] (SPARK-37176) JsonSource's infer should have the same exception handle logic as JacksonParser's parse logic

2021-11-01 Thread Apache Spark (Jira)


 [ https://issues.apache.org/jira/browse/SPARK-37176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37176:


Assignee: Apache Spark
