[ https://issues.apache.org/jira/browse/SPARK-44940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ivan Sadikov updated SPARK-44940:
---------------------------------
    Description: 
Follow-up on https://issues.apache.org/jira/browse/SPARK-40646.

I found that JSON parsing is significantly slower due to exception creation in control flow. Also, some fields are not parsed correctly, and an exception is thrown in certain cases:
{code:java}
Caused by: java.lang.ClassCastException: org.apache.spark.sql.catalyst.util.GenericArrayData cannot be cast to org.apache.spark.sql.catalyst.InternalRow
	at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getStruct(rows.scala:51)
	at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getStruct$(rows.scala:51)
	at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getStruct(rows.scala:195)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:590)
	... 39 more
{code}

  was:Follow-up on https://issues.apache.org/jira/browse/SPARK-40646.

> Improve performance of JSON parsing when partial results are enabled
> --------------------------------------------------------------------
>
>                 Key: SPARK-44940
>                 URL: https://issues.apache.org/jira/browse/SPARK-44940
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.0, 3.5.0, 4.0.0
>            Reporter: Ivan Sadikov
>            Priority: Major
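
To illustrate the "exception creation in control flow" concern in general terms, here is a minimal sketch. It is not Spark's parser code and not the fix for this issue; the class and method names (PartialResultsSketch, parseOrThrow, parseOrEmpty) are hypothetical, and the field parser is deliberately simplified to non-negative integers of at most 9 digits. It contrasts two ways of producing a partial row: catching an exception for every bad field, versus returning an empty result so that no exception (and no stack trace) is created on the failure path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;

// Hypothetical illustration only; none of these names come from Spark.
class PartialResultsSketch {

    // Signals a bad field by throwing. Each failure allocates an
    // exception object and captures a stack trace.
    static int parseOrThrow(String raw) {
        return Integer.parseInt(raw.trim());
    }

    // Signals a bad field with an empty result instead of an exception.
    // Simplified: accepts only 1 to 9 ASCII digits (no sign, no overflow).
    static OptionalInt parseOrEmpty(String raw) {
        String s = raw.trim();
        if (s.isEmpty() || s.length() > 9) {
            return OptionalInt.empty();
        }
        int value = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return OptionalInt.empty();
            }
            value = value * 10 + (c - '0');
        }
        return OptionalInt.of(value);
    }

    // Partial row built with exceptions as control flow:
    // fields that fail to parse become null.
    static List<Integer> parseRowWithExceptions(String[] fields) {
        List<Integer> row = new ArrayList<>();
        for (String field : fields) {
            try {
                row.add(parseOrThrow(field));
            } catch (NumberFormatException e) {
                row.add(null);
            }
        }
        return row;
    }

    // Same partial row, but the failure path creates no exception.
    static List<Integer> parseRowWithoutExceptions(String[] fields) {
        List<Integer> row = new ArrayList<>();
        for (String field : fields) {
            OptionalInt parsed = parseOrEmpty(field);
            row.add(parsed.isPresent() ? parsed.getAsInt() : null);
        }
        return row;
    }

    public static void main(String[] args) {
        String[] fields = {"1", "oops", "3"};
        // Both calls print [1, null, 3]
        System.out.println(parseRowWithExceptions(fields));
        System.out.println(parseRowWithoutExceptions(fields));
    }
}
```

Both variants return the same row for this input; the difference is only in how much work is done each time a field fails to parse, which matters when malformed fields are frequent.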