[GitHub] [spark] HyukjinKwon commented on a change in pull request #33706: [SPARK-36477][SQL] Inferring schema from JSON file shall respect ignoreCorruptFiles and handle IOE

GitBox Wed, 11 Aug 2021 03:18:36 -0700


HyukjinKwon commented on a change in pull request #33706:
URL: https://github.com/apache/spark/pull/33706#discussion_r686696280




##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala
##########
@@ -68,13 +71,19 @@ private[sql] class JsonInferSchema(options: JSONOptions) extends Serializable {
             Some(inferField(parser))
           }
         } catch {
-          case  e @ (_: RuntimeException | _: JsonProcessingException) => parseMode match {
-            case PermissiveMode =>
-              Some(StructType(Seq(StructField(columnNameOfCorruptRecord, StringType))))
-            case DropMalformedMode =>
+          case e @ (_: RuntimeException | _: JsonProcessingException | _: IOException) =>
+            if (ignoreCorruptFiles) {
+              logWarning(s"Skipped the corrupted file: $row", e)

Review comment:
       Also, should we maybe exclude `RuntimeException` for now? I think we intentionally throw `RuntimeException` in some places, for example:
   
   ```
       case token =>
         // We cannot parse this token based on the given data type. So, we throw a
         // RuntimeException and this exception will be caught by `parse` method.
         throw QueryExecutionErrors.failToParseValueForDataTypeError(parser, token, dataType)
   ```
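
To illustrate the suggestion, here is a rough, self-contained sketch of how the catch block could keep `RuntimeException` on the parse-mode path and reserve the `ignoreCorruptFiles` skip for real I/O failures. It is not the code from this PR: `InferSketch`, `Mode`, `MalformedRecord`, and `inferOne` are hypothetical names invented for the example, and the schema is just a string. `MalformedRecord` stands in for Jackson 2.x's `JsonProcessingException`, which itself extends `IOException`, so the order of the cases matters.

```scala
import java.io.IOException

object InferSketch {
  // Hypothetical stand-ins for the parse modes; not Spark's real classes.
  sealed trait Mode
  case object Permissive extends Mode
  case object DropMalformed extends Mode
  case object FailFast extends Mode

  // Hypothetical stand-in for a parser exception that is also an IOException,
  // as Jackson 2.x's JsonProcessingException is.
  final class MalformedRecord(msg: String) extends IOException(msg)

  // Returns Some(schema description), or None when the record is skipped.
  def inferOne(record: String, mode: Mode, ignoreCorruptFiles: Boolean)(
      parse: String => String): Option[String] = {
    try {
      Some(parse(record))
    } catch {
      // Parse failures, including intentionally thrown RuntimeExceptions,
      // keep following the parse mode. This case must come before the
      // IOException case because MalformedRecord is itself an IOException.
      case e @ (_: RuntimeException | _: MalformedRecord) =>
        mode match {
          case Permissive => Some("_corrupt_record: string")
          case DropMalformed => None
          case FailFast =>
            throw new IllegalStateException("Malformed record in FAILFAST mode", e)
        }
      // Only other I/O failures are treated as file corruption and skipped.
      case _: IOException if ignoreCorruptFiles =>
        None
    }
  }
}
```

With this split, a `RuntimeException` thrown on purpose by the parser still produces the corrupt-record column in permissive mode even when `ignoreCorruptFiles` is on, while a plain `IOException` is skipped only when `ignoreCorruptFiles` is enabled and propagates otherwise.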




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



