[jira] [Commented] (PARQUET-2297) Encrypted files should not be checked for delta encoding problem

2023-05-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720815#comment-17720815
 ] 

ASF GitHub Bot commented on PARQUET-2297:
-

Fokko commented on code in PR #1089:
URL: https://github.com/apache/parquet-mr/pull/1089#discussion_r1188231603


##
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetRecordReader.java:
##
@@ -173,7 +173,10 @@ private void initializeInternalReader(ParquetInputSplit split, Configuration con
   }
 }
 
-if (!reader.getRowGroups().isEmpty()) {
+if (!reader.getRowGroups().isEmpty() &&
+  // Encrypted files (parquet-mr 1.12+) can't have the delta encoding problem (resolved in parquet-mr 1.8)

Review Comment:
   Thanks for the explanation, I'm fine with leaving out a unit test. Just 
curious if it would be easy to modify existing tests to make sure that we hit 
the code.





> Encrypted files should not be checked for delta encoding problem
> ----------------------------------------------------------------
>
>                 Key: PARQUET-2297
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2297
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>    Affects Versions: 1.13.0
>            Reporter: Gidon Gershinsky
>            Assignee: Gidon Gershinsky
>            Priority: Major
>             Fix For: 1.14.0, 1.13.1
>
>
> The delta encoding problem (https://issues.apache.org/jira/browse/PARQUET-246)
> has been fixed in writers since parquet-mr 1.8. That fix also added a
> `checkDeltaByteArrayProblem` method to readers, which runs over all columns
> and checks for this problem in older files.
> This check now triggers an unrelated exception when reading encrypted files in
> the following situation: reading an unencrypted column without having the
> keys for the encrypted columns (see
> https://issues.apache.org/jira/browse/PARQUET-2193). This happens in Spark
> with nested columns (files with regular columns are OK).
> Possible solution: don't call the `checkDeltaByteArrayProblem` method for
> encrypted files, because these files can be written only with
> parquet-mr 1.12 and newer, where the delta encoding problem is already fixed.
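
The proposed solution can be sketched as a guard in front of the reader-side check. This is a minimal, self-contained illustration and not the actual patch: `DeltaCheckGuard`, `EncryptionState`, and `shouldCheckDeltaByteArrayProblem` are hypothetical names invented here to model the decision, whereas the real change in PR #1089 works against parquet-mr's own reader and metadata classes.

```java
// Hypothetical model of the guard; none of these types are parquet-mr classes.
class DeltaCheckGuard {

  /** Hypothetical stand-in for how a file's encryption state might be reported. */
  enum EncryptionState { UNENCRYPTED, ENCRYPTED }

  /**
   * Decides whether the legacy delta-encoding check should run.
   *
   * @param rowGroupCount number of row groups in the file
   * @param state         whether the file was written with encryption
   */
  static boolean shouldCheckDeltaByteArrayProblem(int rowGroupCount, EncryptionState state) {
    // No row groups means there is no column data to inspect.
    if (rowGroupCount == 0) {
      return false;
    }
    // Encrypted files can only be written by parquet-mr 1.12 and newer,
    // where the delta encoding problem (fixed in 1.8) cannot occur,
    // so the check is skipped for them.
    return state == EncryptionState.UNENCRYPTED;
  }

  public static void main(String[] args) {
    System.out.println(shouldCheckDeltaByteArrayProblem(3, EncryptionState.UNENCRYPTED)); // true
    System.out.println(shouldCheckDeltaByteArrayProblem(3, EncryptionState.ENCRYPTED));   // false
    System.out.println(shouldCheckDeltaByteArrayProblem(0, EncryptionState.UNENCRYPTED)); // false
  }
}
```

Skipping the check, rather than catching the exception it raises, means the reader never touches columns for which it has no keys.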



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PARQUET-2297) Encrypted files should not be checked for delta encoding problem

2023-05-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720816#comment-17720816
 ] 

ASF GitHub Bot commented on PARQUET-2297:
-

Fokko merged PR #1092:
URL: https://github.com/apache/parquet-mr/pull/1092






[jira] [Commented] (PARQUET-2297) Encrypted files should not be checked for delta encoding problem

2023-05-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720775#comment-17720775
 ] 

ASF GitHub Bot commented on PARQUET-2297:
-

ggershinsky merged PR #1089:
URL: https://github.com/apache/parquet-mr/pull/1089






[jira] [Commented] (PARQUET-2297) Encrypted files should not be checked for delta encoding problem

2023-05-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720392#comment-17720392
 ] 

ASF GitHub Bot commented on PARQUET-2297:
-

ggershinsky opened a new pull request, #1092:
URL: https://github.com/apache/parquet-mr/pull/1092

   https://issues.apache.org/jira/browse/PARQUET-2297
   
   For branch 1.13.x






[jira] [Commented] (PARQUET-2297) Encrypted files should not be checked for delta encoding problem

2023-05-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720293#comment-17720293
 ] 

ASF GitHub Bot commented on PARQUET-2297:
-

ggershinsky commented on code in PR #1089:
URL: https://github.com/apache/parquet-mr/pull/1089#discussion_r1186829851


##
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetRecordReader.java:
##
@@ -173,7 +173,10 @@ private void initializeInternalReader(ParquetInputSplit split, Configuration con
   }
 }
 
-if (!reader.getRowGroups().isEmpty()) {
+if (!reader.getRowGroups().isEmpty() &&
+  // Encrypted files (parquet-mr 1.12+) can't have the delta encoding problem (resolved in parquet-mr 1.8)

Review Comment:
   - With the delta encoding problem: basically impossible to reproduce :),
since it was resolved in 1.8.
   - Without this problem: I've looked at the existing unit tests;
unfortunately, none can serve as a basis for adding a test for this
particular situation, so it would require building a new unit test from
scratch. However, given that a) the patch is small and straightforward and
b) Spark has stopped using this Parquet read path, building a full unit test
may be overkill. If you have a different opinion, please let me know.







[jira] [Commented] (PARQUET-2297) Encrypted files should not be checked for delta encoding problem

2023-05-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17720143#comment-17720143
 ] 

ASF GitHub Bot commented on PARQUET-2297:
-

Fokko commented on code in PR #1089:
URL: https://github.com/apache/parquet-mr/pull/1089#discussion_r1186655384


##
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetRecordReader.java:
##
@@ -173,7 +173,10 @@ private void initializeInternalReader(ParquetInputSplit split, Configuration con
   }
 }
 
-if (!reader.getRowGroups().isEmpty()) {
+if (!reader.getRowGroups().isEmpty() &&
+  // Encrypted files (parquet-mr 1.12+) can't have the delta encoding problem (resolved in parquet-mr 1.8)

Review Comment:
   Could we add a test for this?







[jira] [Commented] (PARQUET-2297) Encrypted files should not be checked for delta encoding problem

2023-05-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719638#comment-17719638
 ] 

ASF GitHub Bot commented on PARQUET-2297:
-

ggershinsky commented on PR #1089:
URL: https://github.com/apache/parquet-mr/pull/1089#issuecomment-1535733579

   SGTM, I'll send a PR to the parquet-1.13.x branch too






[jira] [Commented] (PARQUET-2297) Encrypted files should not be checked for delta encoding problem

2023-05-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719593#comment-17719593
 ] 

ASF GitHub Bot commented on PARQUET-2297:
-

wgtmac commented on PR #1089:
URL: https://github.com/apache/parquet-mr/pull/1089#issuecomment-1535610371

   Should we include this fix in the upcoming 1.13.1 release
(https://lists.apache.org/thread/1mjvdcmwqjcblmfkfgpd9ob2yodx7tom)?
@ggershinsky






[jira] [Commented] (PARQUET-2297) Encrypted files should not be checked for delta encoding problem

2023-05-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719170#comment-17719170
 ] 

ASF GitHub Bot commented on PARQUET-2297:
-

ggershinsky opened a new pull request, #1089:
URL: https://github.com/apache/parquet-mr/pull/1089

   https://issues.apache.org/jira/browse/PARQUET-2297



