[jira] [Commented] (PARQUET-1061) parquet dictionary filter does not work.

2018-01-15 Thread Jorge Machado (JIRATEST)

[ 
https://issues-test.apache.org/jira/browse/PARQUET-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16265591#comment-16265591
 ] 

Jorge Machado commented on PARQUET-1061:


Hi guys,

I'm trying to read a Parquet file in parallel outside of Hadoop. Spark uses the ParquetInputSplit class.

I would like to use it too, but I'm wondering how to get the rowGroupOffsets[]. Is this the start position of every single block?

 

Thanks
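A minimal sketch of one way to compute per-split offsets. It assumes that rowGroupOffsets holds the starting byte offsets of the row groups assigned to a split. It also assumes that a row group belongs to the byte-range split containing its midpoint, which is, as far as we know, the rule parquet-mr's range metadata filter uses. RowGroup and rowGroupOffsetsForSplit are hypothetical names. In parquet-mr, the starting position would come from the footer's BlockMetaData (getStartingPos()).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowGroupOffsetsSketch {
    // Hypothetical stand-in for a row group's footer metadata: where it starts
    // in the file and how many (compressed) bytes it spans.
    record RowGroup(long startingPos, long compressedSize) {}

    // Returns the starting offsets of the row groups whose midpoint falls in
    // [splitStart, splitEnd), so each row group lands in exactly one split
    // even when it straddles a split boundary.
    static long[] rowGroupOffsetsForSplit(List<RowGroup> rowGroups, long splitStart, long splitEnd) {
        List<Long> offsets = new ArrayList<>();
        for (RowGroup rg : rowGroups) {
            long midpoint = rg.startingPos() + rg.compressedSize() / 2;
            if (midpoint >= splitStart && midpoint < splitEnd) {
                offsets.add(rg.startingPos());
            }
        }
        return offsets.stream().mapToLong(Long::longValue).toArray();
    }

    public static void main(String[] args) {
        // Three hypothetical 100-byte row groups after the 4-byte "PAR1" header.
        List<RowGroup> rowGroups = List.of(
            new RowGroup(4, 100),    // midpoint 54
            new RowGroup(104, 100),  // midpoint 154
            new RowGroup(204, 100)); // midpoint 254
        // Two 150-byte splits.
        System.out.println(Arrays.toString(rowGroupOffsetsForSplit(rowGroups, 0, 150)));   // [4]
        System.out.println(Arrays.toString(rowGroupOffsetsForSplit(rowGroups, 150, 300))); // [104, 204]
    }
}
```

Each reader would then receive only its own split's offsets. The midpoint rule avoids reading a boundary-straddling row group twice.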

 

> parquet dictionary filter does not work.
> 
>
> Key: PARQUET-1061
> URL: https://issues-test.apache.org/jira/browse/PARQUET-1061
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Affects Versions: 1.9.0
> Environment: Hive 2.2.0 + Parquet-mr 1.9.0/master
> Reporter: Junjie Chen
> Priority: Major
>
> When performing a selective query, we observed that the dictionary filter was not
> applied. Please see the following code snippet:
> if (rowGroupOffsets != null) {
>   // verify a row group was found for each offset
>   List<BlockMetaData> blocks = reader.getFooter().getBlocks();
>   if (blocks.size() != rowGroupOffsets.length) {
>     throw new IllegalStateException(
>         "All of the offsets in the split should be found in the file."
>         + " expected: " + Arrays.toString(rowGroupOffsets)
>         + " found: " + blocks);
>   }
> } else {
>   // Why are data filters applied only when rowGroupOffsets is null?
>   // apply data filters
>   reader.filterRowGroups(getFilter(configuration));
> }
> I can enable the filter by moving the else-block code into the second-level if.
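One reading of the proposed change is to run the offset check first and then apply the data filters on both paths. It is sketched here as a dependency-free model rather than parquet-mr code. Order matters: filtering before the check would make blocks.size() differ from rowGroupOffsets.length whenever a row group is dropped. SplitFilterSketch and selectRowGroups are hypothetical, and so is the LongPredicate that stands in for the statistics/dictionary checks. A file is modeled as the list of its row groups' starting offsets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.LongPredicate;

public class SplitFilterSketch {
    // Hypothetical model: mightMatch returns false when a row group provably
    // cannot match the query and may be skipped.
    static List<Long> selectRowGroups(List<Long> fileRowGroups, long[] rowGroupOffsets,
                                      LongPredicate mightMatch) {
        List<Long> blocks = new ArrayList<>(fileRowGroups);
        if (rowGroupOffsets != null) {
            // Stand-in for a reader that loaded only the split's row groups.
            List<Long> wanted = new ArrayList<>();
            for (long offset : rowGroupOffsets) {
                wanted.add(offset);
            }
            blocks.retainAll(wanted);
            // verify a row group was found for each offset, before any filtering
            if (blocks.size() != rowGroupOffsets.length) {
                throw new IllegalStateException(
                    "All of the offsets in the split should be found in the file."
                    + " expected: " + Arrays.toString(rowGroupOffsets)
                    + " found: " + blocks);
            }
        }
        // apply data filters on both paths, after the offset check
        blocks.removeIf(offset -> !mightMatch.test(offset));
        return blocks;
    }

    public static void main(String[] args) {
        List<Long> file = List.of(4L, 104L, 204L);
        // Pretend the filter proves the row group at offset 104 cannot match.
        LongPredicate mightMatch = offset -> offset != 104L;
        System.out.println(selectRowGroups(file, new long[] {4L, 104L}, mightMatch)); // [4]
        System.out.println(selectRowGroups(file, null, mightMatch));                  // [4, 204]
    }
}
```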



--
This message was sent by Atlassian JIRA
(v7.6.0#76001)


[jira] [Commented] (PARQUET-1061) parquet dictionary filter does not work.

2017-08-02 Thread Junjie Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111918#comment-16111918
 ] 

Junjie Chen commented on PARQUET-1061:
--

Hi [~blue_impala_48d6]

Could you please try it out? 




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PARQUET-1061) parquet dictionary filter does not work.

2017-07-18 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091799#comment-16091799
 ] 

Ryan Blue commented on PARQUET-1061:


Did you set the property to enable dictionary filtering? 
https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/filter2/compat/RowGroupFilter.java#L95
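The question implies that dictionary filtering has to be switched on explicitly. Here is a minimal model of how filter levels could be chosen from configuration flags. The property names are taken from parquet-mr's ParquetInputFormat. The defaults (statistics on, dictionary off) are an assumption for 1.9.0. A Map stands in for Hadoop's Configuration to keep the sketch dependency-free. In a real job the flag would be set with conf.setBoolean("parquet.filter.dictionary.enabled", true). According to the snippet in the issue, it only matters on the code path that calls reader.filterRowGroups.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class FilterLevelsSketch {
    enum FilterLevel { STATISTICS, DICTIONARY }

    // Property names as used by parquet-mr's ParquetInputFormat.
    static final String STATS_FILTERING_ENABLED = "parquet.filter.statistics.enabled";
    static final String DICTIONARY_FILTERING_ENABLED = "parquet.filter.dictionary.enabled";

    // ASSUMPTION: 1.9.0 defaults, statistics filtering on, dictionary filtering off.
    static final boolean STATS_DEFAULT = true;
    static final boolean DICTIONARY_DEFAULT = false;

    // Models how the enabled row-group filter levels follow from the flags.
    static List<FilterLevel> enabledLevels(Map<String, String> conf) {
        List<FilterLevel> levels = new ArrayList<>();
        if (flag(conf, STATS_FILTERING_ENABLED, STATS_DEFAULT)) {
            levels.add(FilterLevel.STATISTICS);
        }
        if (flag(conf, DICTIONARY_FILTERING_ENABLED, DICTIONARY_DEFAULT)) {
            levels.add(FilterLevel.DICTIONARY);
        }
        return levels;
    }

    // Reads a boolean flag, falling back to the default when it is unset.
    private static boolean flag(Map<String, String> conf, String key, boolean defaultValue) {
        String value = conf.get(key);
        return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
    }

    public static void main(String[] args) {
        System.out.println(enabledLevels(Map.of()));                                     // [STATISTICS]
        System.out.println(enabledLevels(Map.of(DICTIONARY_FILTERING_ENABLED, "true"))); // [STATISTICS, DICTIONARY]
    }
}
```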



[jira] [Commented] (PARQUET-1061) parquet dictionary filter does not work.

2017-07-18 Thread Junjie Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091310#comment-16091310
 ] 

Junjie Chen commented on PARQUET-1061:
--

Hi [~blue_impala_48d6]
Could you please help take a look?
