[jira] [Created] (SPARK-47731) Fix the 2b+ rows in a single rowgroup for row_index in Parquet reader

Thang Long Vu (Jira) Thu, 04 Apr 2024 08:30:34 -0700

Thang Long Vu created SPARK-47731:
-------------------------------------

             Summary: Fix the 2b+ rows in a single rowgroup for row_index in 
Parquet reader
                 Key: SPARK-47731
                 URL: https://issues.apache.org/jira/browse/SPARK-47731
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.5.0, 4.0.0
            Reporter: Thang Long Vu



The Parquet reader in Spark has a bug: when a file contains more than 2 
billion rows in a single row group, the row index overflows the `Integer` 
range. This prevents Delta Parquet readers from exposing the row_index field 
as a metadata field.

 

It would be great to have this fix so that files with 2b+ rows in a single 
row group can be read, and so that the row_index field can be used safely in 
the Delta Parquet readers by any functionality that depends on it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
