[jira] [Commented] (PARQUET-1061) parquet dictionary filter does not work.
[ https://issues-test.apache.org/jira/browse/PARQUET-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16265591#comment-16265591 ]

Jorge Machado commented on PARQUET-1061:

Hi guys,
I'm trying to read a Parquet file in parallel outside of Hadoop. Spark uses the class ParquetInputSplit. I would like to use it too, but I'm wondering how to get the rowGroupOffsets[]. Is this the start position of every single block?
Thanks

> parquet dictionary filter does not work.
>
> Key: PARQUET-1061
> URL: https://issues-test.apache.org/jira/browse/PARQUET-1061
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Affects Versions: 1.9.0
> Environment: Hive 2.2.0 + Parquet-mr 1.9.0/master
> Reporter: Junjie Chen
> Priority: Major
>
> When performing a selective query, we observed that the dictionary filter was not
> applied. Please see the following code snippet.
>
> if (rowGroupOffsets != null) {
>   // verify a row group was found for each offset
>   List<BlockMetaData> blocks = reader.getFooter().getBlocks();
>   if (blocks.size() != rowGroupOffsets.length) {
>     throw new IllegalStateException(
>         "All of the offsets in the split should be found in the file."
>         + " expected: " + Arrays.toString(rowGroupOffsets)
>         + " found: " + blocks);
>   }
> } else {
>   *Why is the data filter applied only when the row group offsets are null?*
>   // apply data filters
>   reader.filterRowGroups(getFilter(configuration));
> }
>
> I can enable the filter after moving the else-block code into the second-level if.

--
This message was sent by Atlassian JIRA (v7.6.0#76001)
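To make the control flow in the quoted snippet concrete, here is a minimal, self-contained sketch of it. This is a toy model, not parquet-mr code: `RowGroupSelectionSketch`, `selectRowGroups`, the `long` offsets standing in for row groups, and the `LongPredicate` standing in for the configured filter are all hypothetical. It only illustrates that, as the snippet is written, the filter runs in the `else` branch, so a split that carries row group offsets is never filtered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.LongPredicate;

// Toy model of the branch quoted in the issue; every name here is hypothetical.
class RowGroupSelectionSketch {

  // blocks:          starting offsets of the row groups found in the file
  // rowGroupOffsets: offsets carried by the split, or null if none were supplied
  // filter:          stand-in for the configured row group filter (true = keep)
  static List<Long> selectRowGroups(
      List<Long> blocks, long[] rowGroupOffsets, LongPredicate filter) {
    if (rowGroupOffsets != null) {
      // verify a row group was found for each offset
      if (blocks.size() != rowGroupOffsets.length) {
        throw new IllegalStateException(
            "All of the offsets in the split should be found in the file."
                + " expected: " + Arrays.toString(rowGroupOffsets)
                + " found: " + blocks);
      }
      // note: the filter is never consulted on this path
      return new ArrayList<>(blocks);
    } else {
      // apply data filters
      List<Long> kept = new ArrayList<>();
      for (long offset : blocks) {
        if (filter.test(offset)) {
          kept.add(offset);
        }
      }
      return kept;
    }
  }

  public static void main(String[] args) {
    List<Long> blocks = Arrays.asList(4L, 1024L, 2048L);
    LongPredicate keepFirstOnly = offset -> offset == 4L;

    // Offsets supplied: all row groups are kept, the filter is skipped.
    System.out.println(
        selectRowGroups(blocks, new long[] {4L, 1024L, 2048L}, keepFirstOnly)); // [4, 1024, 2048]

    // No offsets supplied: the filter is applied.
    System.out.println(selectRowGroups(blocks, null, keepFirstOnly)); // [4]
  }
}
```

In this model, the change described in the issue would amount to running the filter on the first path as well; whether that is the right fix for parquet-mr itself is a question for the thread, not something this sketch settles.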
[jira] [Commented] (PARQUET-1061) parquet dictionary filter does not work.
[ https://issues.apache.org/jira/browse/PARQUET-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111918#comment-16111918 ]

Junjie Chen commented on PARQUET-1061:

Hi [~blue_impala_48d6]
Could you please try it out?

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PARQUET-1061) parquet dictionary filter does not work.
[ https://issues.apache.org/jira/browse/PARQUET-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091799#comment-16091799 ]

Ryan Blue commented on PARQUET-1061:

Did you set the property to enable dictionary filtering?
https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/filter2/compat/RowGroupFilter.java#L95
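For readers following along, here is a minimal sketch of what setting such a property might look like. It assumes the key is `parquet.filter.dictionary.enabled`, which is what the linked code appears to read; please verify it against the parquet-mr version in use. The `Properties` object is only a stand-in for a Hadoop `Configuration` (where the analogous call would be `setBoolean`), and `DictionaryFilterConfigSketch` with its helper methods is hypothetical. Treating the setting as off when unset is also an assumption of this sketch.

```java
import java.util.Properties;

// Stand-in sketch: java.util.Properties plays the role of a Hadoop Configuration.
class DictionaryFilterConfigSketch {

  // Assumed key name; verify against ParquetInputFormat in your parquet-mr version.
  static final String DICTIONARY_FILTERING_ENABLED = "parquet.filter.dictionary.enabled";

  // Turns dictionary filtering on in the given configuration stand-in.
  static void enableDictionaryFiltering(Properties conf) {
    conf.setProperty(DICTIONARY_FILTERING_ENABLED, "true");
  }

  // Reads the flag, treating a missing key as "off" (an assumption of this sketch).
  static boolean isDictionaryFilteringEnabled(Properties conf) {
    return Boolean.parseBoolean(conf.getProperty(DICTIONARY_FILTERING_ENABLED, "false"));
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    System.out.println(isDictionaryFilteringEnabled(conf)); // false
    enableDictionaryFiltering(conf);
    System.out.println(isDictionaryFilteringEnabled(conf)); // true
  }
}
```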
[jira] [Commented] (PARQUET-1061) parquet dictionary filter does not work.
[ https://issues.apache.org/jira/browse/PARQUET-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16091310#comment-16091310 ]

Junjie Chen commented on PARQUET-1061:

Hi [~blue_impala_48d6]
Could you please help take a look?