Josh Rosen created SPARK-30338:
----------------------------------

             Summary: Avoid unnecessary InternalRow copies in ParquetRowConverter
                 Key: SPARK-30338
                 URL: https://issues.apache.org/jira/browse/SPARK-30338
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Josh Rosen
            Assignee: Josh Rosen

ParquetRowConverter calls {{InternalRow.copy()}} in cases where the copy is unnecessary; this can severely harm performance when reading deeply nested Parquet data.

It looks like this copying was originally added to handle arrays and maps of structs, where the copy must be kept. It can be omitted for the more common case of structs nested directly in structs.
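
The distinction between the two cases can be illustrated with a toy model. The sketch below is plain Python, not Spark code: {{MutableRow}}, {{read_array_of_structs}} and {{read_struct_in_struct}} are hypothetical names, and it assumes (the issue does not say so explicitly) that a converter reuses one mutable row buffer per struct and that each top-level record is consumed before the next one is read.

```python
class MutableRow:
    """Toy stand-in for a reusable, mutable row buffer (hypothetical; not Spark's InternalRow)."""

    def __init__(self, values):
        self.values = list(values)

    def copy(self):
        # Shallow copy: a new row object holding the current field values.
        return MutableRow(self.values)


def read_array_of_structs(elements, copy_each):
    """Convert every array element through ONE shared buffer and collect the rows."""
    buffer = MutableRow([None])
    collected = []
    for value in elements:
        buffer.values[0] = value  # the converter overwrites the shared buffer
        collected.append(buffer.copy() if copy_each else buffer)
    # All elements are read only after the whole array has been assembled.
    return [row.values[0] for row in collected]


def read_struct_in_struct(records):
    """A parent row holds a reference to its single child buffer, with no copy."""
    child = MutableRow([None])
    parent = MutableRow([child])
    seen = []
    for value in records:
        child.values[0] = value  # one child value per parent record
        # Assumption: the record is consumed before the next one overwrites the buffer.
        seen.append(parent.values[0].values[0])
    return seen


without_copy = read_array_of_structs([1, 2, 3], copy_each=False)
with_copy = read_array_of_structs([1, 2, 3], copy_each=True)
nested = read_struct_in_struct([1, 2, 3])
```

In this model, several array elements are alive at once within a single record, so without a copy they all alias the same buffer and end up showing the last value written. A struct nested directly in a struct holds only one child value per record, so sharing the buffer is safe under the stated assumption.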