Thang Long Vu created SPARK-47731:
-------------------------------------

Summary: Fix the 2b+ rows in a single rowgroup for row_index in Parquet reader
Key: SPARK-47731
URL: https://issues.apache.org/jira/browse/SPARK-47731
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.5.0, 4.0.0
Reporter: Thang Long Vu
The Parquet reader in Spark has a bug. When a file holds more than 2 billion (2b+) rows in a single row group, the row index overflows the `Integer` range, whose maximum is 2,147,483,647. Because of this, Delta Parquet readers cannot expose the row_index field as a metadata field. Fixing the bug would make 2b+ rows in a single row group work correctly. It would also make it safe for Delta Parquet readers to use the row_index field for any functionality that depends on it.
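The following is a minimal sketch of how this kind of overflow can happen. It assumes that the reader tracks the position within a row group as a 32-bit `int` and adds it to the row group's starting row index. This is not Spark's actual Parquet reader code, and the class and method names are hypothetical. Once the in-group position passes `Integer.MAX_VALUE`, the `int` wraps to a negative value and so does the computed row index. Keeping the position as a `long` avoids this.

```java
// Hypothetical illustration of the int overflow described in SPARK-47731.
// NOT Spark's real implementation; all names here are invented for the sketch.
final class RowIndexOverflowSketch {

    // Buggy variant: the position within the row group is held in an int.
    // The narrowing cast models an int counter that has been incremented past
    // Integer.MAX_VALUE and silently wrapped around to a negative value.
    static long rowIndexWithIntCounter(long rowGroupStart, long rowsReadInGroup) {
        int position = (int) rowsReadInGroup;
        return rowGroupStart + position;
    }

    // Fixed variant: the position stays a long end to end, so a single
    // row group can hold more than 2^31 - 1 rows without overflow.
    static long rowIndexWithLongCounter(long rowGroupStart, long rowsReadInGroup) {
        return rowGroupStart + rowsReadInGroup;
    }

    public static void main(String[] args) {
        long rowGroupStart = 0L;
        long rowsRead = (long) Integer.MAX_VALUE + 1L; // 2,147,483,648 rows into the group
        System.out.println(rowIndexWithIntCounter(rowGroupStart, rowsRead));  // -2147483648
        System.out.println(rowIndexWithLongCounter(rowGroupStart, rowsRead)); // 2147483648
    }
}
```

For row groups with fewer than 2^31 rows, both variants return the same index. This is why the bug only surfaces with very large row groups.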