[jira] [Commented] (PARQUET-2297) Encrypted files should not be checked for delta encoding problem
[ https://issues.apache.org/jira/browse/PARQUET-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720815#comment-17720815 ]

ASF GitHub Bot commented on PARQUET-2297:

Fokko commented on code in PR #1089:
URL: https://github.com/apache/parquet-mr/pull/1089#discussion_r1188231603

## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetRecordReader.java:
@@ -173,7 +173,10 @@ private void initializeInternalReader(ParquetInputSplit split, Configuration con
}
}

-if (!reader.getRowGroups().isEmpty()) {
+if (!reader.getRowGroups().isEmpty() &&
+ // Encrypted files (parquet-mr 1.12+) can't have the delta encoding problem (resolved in parquet-mr 1.8)

Review Comment:
Thanks for the explanation, I'm fine with leaving out a unit test. Just curious whether it would be easy to modify existing tests so that they hit this code path.

> Encrypted files should not be checked for delta encoding problem
>
> Key: PARQUET-2297
> URL: https://issues.apache.org/jira/browse/PARQUET-2297
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Affects Versions: 1.13.0
> Reporter: Gidon Gershinsky
> Assignee: Gidon Gershinsky
> Priority: Major
> Fix For: 1.14.0, 1.13.1
>
> The delta encoding problem (https://issues.apache.org/jira/browse/PARQUET-246) has been fixed in writers since parquet-mr 1.8. That fix also added a `checkDeltaByteArrayProblem` method to readers, which iterates over all columns and checks older files for this problem.
> This check now triggers an unrelated exception when reading encrypted files in the following situation: reading an unencrypted column without having the keys for the encrypted columns (see https://issues.apache.org/jira/browse/PARQUET-2193). This happens in Spark with nested columns (files with regular columns are fine).
> Possible solution: don't call the `checkDeltaByteArrayProblem` method for encrypted files, because such files can only be written by parquet-mr 1.12 and newer, where the delta encoding problem is already fixed.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
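The proposed guard (run the legacy check only when the file has row groups and is not encrypted) can be sketched as a small predicate. This is a hypothetical, self-contained model, not the parquet-mr code: the diff excerpt in this thread is cut off after the comment line, so the exact condition merged in PR #1089 is not shown. The `EncryptionType` enum and the `shouldCheckDeltaByteArrayProblem` helper are illustrative names; the enum values loosely follow the plaintext-footer and encrypted-footer modes of Parquet modular encryption.

```java
/**
 * Hypothetical model of the reader-side guard discussed in PARQUET-2297.
 * Names are illustrative stand-ins, not the parquet-mr API.
 */
class DeltaCheckGuard {

  /** Illustrative encryption states; both non-UNENCRYPTED modes imply a 1.12+ writer. */
  enum EncryptionType { UNENCRYPTED, PLAINTEXT_FOOTER, ENCRYPTED_FOOTER }

  /**
   * Returns true if the legacy PARQUET-246 delta-encoding check should run.
   * Encrypted files can only come from parquet-mr 1.12+, long after the
   * writer-side fix in 1.8, so the check is skipped for them. Skipping it also
   * avoids touching encrypted column metadata the reader may have no keys for.
   */
  static boolean shouldCheckDeltaByteArrayProblem(int rowGroupCount, EncryptionType encryption) {
    return rowGroupCount > 0 && encryption == EncryptionType.UNENCRYPTED;
  }

  public static void main(String[] args) {
    // Print the decision for a file with one row group in each encryption state.
    for (EncryptionType type : EncryptionType.values()) {
      System.out.println(type + " -> " + shouldCheckDeltaByteArrayProblem(1, type));
    }
  }
}
```

Keeping the decision in one side-effect-free predicate is also one way to address the testing question raised in this thread: the predicate can be unit-tested without building encrypted files.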
[jira] [Commented] (PARQUET-2297) Encrypted files should not be checked for delta encoding problem
[ https://issues.apache.org/jira/browse/PARQUET-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720816#comment-17720816 ]

ASF GitHub Bot commented on PARQUET-2297:

Fokko merged PR #1092:
URL: https://github.com/apache/parquet-mr/pull/1092
[jira] [Commented] (PARQUET-2297) Encrypted files should not be checked for delta encoding problem
[ https://issues.apache.org/jira/browse/PARQUET-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720775#comment-17720775 ]

ASF GitHub Bot commented on PARQUET-2297:

ggershinsky merged PR #1089:
URL: https://github.com/apache/parquet-mr/pull/1089
[jira] [Commented] (PARQUET-2297) Encrypted files should not be checked for delta encoding problem
[ https://issues.apache.org/jira/browse/PARQUET-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720392#comment-17720392 ]

ASF GitHub Bot commented on PARQUET-2297:

ggershinsky opened a new pull request, #1092:
URL: https://github.com/apache/parquet-mr/pull/1092

https://issues.apache.org/jira/browse/PARQUET-2297
For branch 1.13.x
[jira] [Commented] (PARQUET-2297) Encrypted files should not be checked for delta encoding problem
[ https://issues.apache.org/jira/browse/PARQUET-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720293#comment-17720293 ]

ASF GitHub Bot commented on PARQUET-2297:

ggershinsky commented on code in PR #1089:
URL: https://github.com/apache/parquet-mr/pull/1089#discussion_r1186829851

## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetRecordReader.java:
@@ -173,7 +173,10 @@ private void initializeInternalReader(ParquetInputSplit split, Configuration con
}
}

-if (!reader.getRowGroups().isEmpty()) {
+if (!reader.getRowGroups().isEmpty() &&
+ // Encrypted files (parquet-mr 1.12+) can't have the delta encoding problem (resolved in parquet-mr 1.8)

Review Comment:
- With the delta encoding problem: basically impossible to reproduce :), since it was resolved in 1.8.
- Without this problem: I've had a look at the existing unit tests; unfortunately, none can be used as a basis for testing this particular situation. This would require building a new unit test from scratch. However, given that a) the patch is small and straightforward, and b) Spark has stopped using this Parquet read path, building a full unit test may be overkill. But if you have a different opinion, please let me know.
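The argument in this comment and in the issue description rests on writer versions: the delta encoding problem only affects files from writers older than parquet-mr 1.8, while encryption needs a 1.12+ writer, so the two version ranges never overlap. The sketch below models that argument under stated assumptions: plain numeric `major.minor.patch` version strings, with 1.8.0 and 1.12.0 taken as the boundary releases. The class and method names are hypothetical, and this is not how parquet-mr parses a file's `created_by` field.

```java
/**
 * Hypothetical sketch of the version argument behind PARQUET-2297.
 * Assumes purely numeric dotted versions; real created_by strings
 * (e.g. with "-SNAPSHOT" suffixes) would need more careful parsing.
 */
class WriterVersionReasoning {

  /** Compares dotted numeric versions such as "1.8.0" and "1.12.3". */
  static int compareVersions(String a, String b) {
    String[] pa = a.split("\\.");
    String[] pb = b.split("\\.");
    int n = Math.max(pa.length, pb.length);
    for (int i = 0; i < n; i++) {
      // Missing components count as 0, so "1.8" equals "1.8.0".
      int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
      int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
      if (x != y) {
        return Integer.compare(x, y);
      }
    }
    return 0;
  }

  /** Assumption: writers before 1.8.0 may produce the PARQUET-246 problem. */
  static boolean mayHaveDeltaProblem(String writerVersion) {
    return compareVersions(writerVersion, "1.8.0") < 0;
  }

  /** Assumption: only 1.12.0 and newer writers can produce encrypted files. */
  static boolean canWriteEncrypted(String writerVersion) {
    return compareVersions(writerVersion, "1.12.0") >= 0;
  }

  public static void main(String[] args) {
    for (String v : new String[] {"1.7.0", "1.8.0", "1.12.0", "1.13.0"}) {
      System.out.println(v + ": deltaProblemPossible=" + mayHaveDeltaProblem(v)
          + ", encryptionPossible=" + canWriteEncrypted(v));
    }
  }
}
```

For each sample version, `main` reports at most one of the two flags as true, which is the non-overlap that lets an encrypted file skip the legacy check.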
[jira] [Commented] (PARQUET-2297) Encrypted files should not be checked for delta encoding problem
[ https://issues.apache.org/jira/browse/PARQUET-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720143#comment-17720143 ]

ASF GitHub Bot commented on PARQUET-2297:

Fokko commented on code in PR #1089:
URL: https://github.com/apache/parquet-mr/pull/1089#discussion_r1186655384

## parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetRecordReader.java:
@@ -173,7 +173,10 @@ private void initializeInternalReader(ParquetInputSplit split, Configuration con
}
}

-if (!reader.getRowGroups().isEmpty()) {
+if (!reader.getRowGroups().isEmpty() &&
+ // Encrypted files (parquet-mr 1.12+) can't have the delta encoding problem (resolved in parquet-mr 1.8)

Review Comment:
Could we add a test for this?
[jira] [Commented] (PARQUET-2297) Encrypted files should not be checked for delta encoding problem
[ https://issues.apache.org/jira/browse/PARQUET-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719638#comment-17719638 ]

ASF GitHub Bot commented on PARQUET-2297:

ggershinsky commented on PR #1089:
URL: https://github.com/apache/parquet-mr/pull/1089#issuecomment-1535733579

SGTM, I'll send a PR to the parquet-1.13.x branch too.
[jira] [Commented] (PARQUET-2297) Encrypted files should not be checked for delta encoding problem
[ https://issues.apache.org/jira/browse/PARQUET-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719593#comment-17719593 ]

ASF GitHub Bot commented on PARQUET-2297:

wgtmac commented on PR #1089:
URL: https://github.com/apache/parquet-mr/pull/1089#issuecomment-1535610371

Should we include this fix in the next 1.13.1 release (https://lists.apache.org/thread/1mjvdcmwqjcblmfkfgpd9ob2yodx7tom)? @ggershinsky
[jira] [Commented] (PARQUET-2297) Encrypted files should not be checked for delta encoding problem
[ https://issues.apache.org/jira/browse/PARQUET-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719170#comment-17719170 ]

ASF GitHub Bot commented on PARQUET-2297:

ggershinsky opened a new pull request, #1089:
URL: https://github.com/apache/parquet-mr/pull/1089

https://issues.apache.org/jira/browse/PARQUET-2297