[jira] [Updated] (PARQUET-2120) parquet-cli dictionary command fails on pages without dictionary encoding

2022-05-19 Thread Gidon Gershinsky (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gidon Gershinsky updated PARQUET-2120:
--
Fix Version/s: 1.12.3

> parquet-cli dictionary command fails on pages without dictionary encoding
> -------------------------------------------------------------------------
>
>                 Key: PARQUET-2120
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2120
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cli
>    Affects Versions: 1.12.2
>            Reporter: Willi Raschkowski
>            Priority: Minor
>             Fix For: 1.12.3
>
>
> parquet-cli's {{dictionary}} command fails with an NPE if a page does not 
> have dictionary encoding:
> {code}
> $ parquet dictionary --column col a-b-c.snappy.parquet
> Unknown error
> java.lang.NullPointerException: Cannot invoke "org.apache.parquet.column.page.DictionaryPage.getEncoding()" because "page" is null
>   at org.apache.parquet.cli.commands.ShowDictionaryCommand.run(ShowDictionaryCommand.java:78)
>   at org.apache.parquet.cli.Main.run(Main.java:155)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.parquet.cli.Main.main(Main.java:185)
>
> $ parquet meta a-b-c.snappy.parquet
> ...
> Row group 0:  count: 1  46.00 B records  start: 4  total: 46 B
>
>      type    encodings  count  avg size  nulls  min / max
> col  BINARY  S   _      1      46.00 B   0      "a" / "a"
>
> Row group 1:  count: 200  0.34 B records  start: 50  total: 69 B
>
>      type    encodings  count  avg size  nulls  min / max
> col  BINARY  S _ R      200    0.34 B    0      "b" / "c"
> {code}
> (Note the missing {{R}} / dictionary encoding on that first page.)
> Someone familiar with Parquet might guess from the NPE that there's no 
> dictionary encoding. But for files that mix pages with and without dictionary 
> encoding (like above), the command will fail before getting to pages that 
> actually have dictionaries.
> The problem is that [this 
> line|https://github.com/apache/parquet-mr/blob/300200eb72b9f16df36d9a68cf762683234aeb08/parquet-cli/src/main/java/org/apache/parquet/cli/commands/ShowDictionaryCommand.java#L76]
>  assumes {{readDictionaryPage}} always returns a page and does not handle the 
> case where it returns {{null}}.
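
The failure mode described above can be illustrated with a small, self-contained sketch. It is not the parquet-cli source and not the actual fix: FakeDictionaryPage and FakeRowGroup are hypothetical stand-ins for the Parquet classes, and the encoding label and entry count are made-up values. The only behaviour taken from the report is that readDictionaryPage may return null for a row group without dictionary encoding. The sketch checks for null and moves on, so later row groups that do have dictionaries are still reported.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DictionaryNullGuardSketch {

    /** Hypothetical stand-in for a dictionary page; not the Parquet class. */
    static final class FakeDictionaryPage {
        final String encoding;
        final int entries;

        FakeDictionaryPage(String encoding, int entries) {
            this.encoding = encoding;
            this.entries = entries;
        }
    }

    /** Hypothetical stand-in for one row group's column chunk. */
    static final class FakeRowGroup {
        // null models a column chunk that is not dictionary-encoded
        private final FakeDictionaryPage dictionary;

        FakeRowGroup(FakeDictionaryPage dictionary) {
            this.dictionary = dictionary;
        }

        /** Returns null when there is no dictionary, as described in the report. */
        FakeDictionaryPage readDictionaryPage() {
            return dictionary;
        }
    }

    /** Builds one summary line per row group, skipping those without a dictionary. */
    static List<String> describeDictionaries(List<FakeRowGroup> rowGroups) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < rowGroups.size(); i++) {
            FakeDictionaryPage page = rowGroups.get(i).readDictionaryPage();
            if (page == null) {
                // Guard: report the gap and continue instead of dereferencing null.
                lines.add("Row group " + i + ": no dictionary");
                continue;
            }
            lines.add("Row group " + i + ": dictionary encoding=" + page.encoding
                    + ", entries=" + page.entries);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Mixed file: first row group without a dictionary, second with one
        // (illustrative values only).
        List<FakeRowGroup> file = Arrays.asList(
                new FakeRowGroup(null),
                new FakeRowGroup(new FakeDictionaryPage("PLAIN", 2)));
        describeDictionaries(file).forEach(System.out::println);
    }
}
```

Whether the real command should skip such row groups silently, print a notice as in this sketch, or exit with a clearer error message is a design choice left to the fix.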



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (PARQUET-2120) parquet-cli dictionary command fails on pages without dictionary encoding

2022-02-12 Thread Willi Raschkowski (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Willi Raschkowski updated PARQUET-2120:
---
Description: 
parquet-cli's {{dictionary}} command fails with an NPE if a page does not have 
dictionary encoding:

{code}
$ parquet dictionary --column col a-b-c.snappy.parquet
Unknown error
java.lang.NullPointerException: Cannot invoke "org.apache.parquet.column.page.DictionaryPage.getEncoding()" because "page" is null
    at org.apache.parquet.cli.commands.ShowDictionaryCommand.run(ShowDictionaryCommand.java:78)
    at org.apache.parquet.cli.Main.run(Main.java:155)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.parquet.cli.Main.main(Main.java:185)

$ parquet meta a-b-c.snappy.parquet
...
Row group 0:  count: 1  46.00 B records  start: 4  total: 46 B

     type    encodings  count  avg size  nulls  min / max
col  BINARY  S   _      1      46.00 B   0      "a" / "a"

Row group 1:  count: 200  0.34 B records  start: 50  total: 69 B

     type    encodings  count  avg size  nulls  min / max
col  BINARY  S _ R      200    0.34 B    0      "b" / "c"
{code}
(Note the missing {{R}} / dictionary encoding on that first page.)

Someone familiar with Parquet might guess from the NPE that there's no 
dictionary encoding. But for files that mix pages with and without dictionary 
encoding (like above), the command will fail before getting to pages that 
actually have dictionaries.

The problem is that [this 
line|https://github.com/apache/parquet-mr/blob/300200eb72b9f16df36d9a68cf762683234aeb08/parquet-cli/src/main/java/org/apache/parquet/cli/commands/ShowDictionaryCommand.java#L76]
 assumes {{readDictionaryPage}} always returns a page and does not handle the 
case where it returns {{null}}.

  was:
parquet-cli's {{dictionary}} command fails with an NPE if a page does not have 
dictionary encoding:

{code}
$ parquet dictionary --column col a-b-c.snappy.parquet
Unknown error
java.lang.NullPointerException: Cannot invoke "org.apache.parquet.column.page.DictionaryPage.getEncoding()" because "page" is null
    at org.apache.parquet.cli.commands.ShowDictionaryCommand.run(ShowDictionaryCommand.java:78)
    at org.apache.parquet.cli.Main.run(Main.java:155)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.parquet.cli.Main.main(Main.java:185)

$ parquet meta a-b-c.snappy.parquet
...
Row group 0:  count: 1  46.00 B records  start: 4  total: 46 B

     type    encodings  count  avg size  nulls  min / max
col  BINARY  S   _      1      46.00 B   0      "a" / "a"

Row group 1:  count: 200  0.34 B records  start: 50  total: 69 B

     type    encodings  count  avg size  nulls  min / max
col  BINARY  S _ R      200    0.34 B    0      "b" / "c"
{code}
(Note the missing {{R}} / dictionary encoding on that first page.)

The problem is that [this 
line|https://github.com/apache/parquet-mr/blob/300200eb72b9f16df36d9a68cf762683234aeb08/parquet-cli/src/main/java/org/apache/parquet/cli/commands/ShowDictionaryCommand.java#L76]
 assumes {{readDictionaryPage}} always returns a page and does not handle the 
case where it returns {{null}}.



[jira] [Updated] (PARQUET-2120) parquet-cli dictionary command fails on pages without dictionary encoding

2022-02-12 Thread Willi Raschkowski (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Willi Raschkowski updated PARQUET-2120:
---
Summary: parquet-cli dictionary command fails on pages without dictionary 
encoding  (was: parquet-cli dictionary fails on pages without dictionary 
encoding)




--
This message was sent by Atlassian Jira
(v8.20.1#820001)