[ https://issues.apache.org/jira/browse/PARQUET-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17031399#comment-17031399 ]
Gabor Szadovszky edited comment on PARQUET-1787 at 2/6/20 9:26 AM:
-------------------------------------------------------------------

I'm working on a general concept that allows configuration to be set for specific columns. See PARQUET-1784 for details. What do you think of setting the configuration mentioned here as follows?
{code:java}
import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();
conf.setBoolean("parquet.bloom.filter.enabled", false); // Might not be required, as this is the default
conf.setBoolean("parquet.bloom.filter.enabled#content", true); // Might not be necessary, as setting the expected NDV already enables it
conf.setBoolean("parquet.bloom.filter.enabled#line", true); // Might not be necessary, as setting the expected NDV already enables it
conf.setLong("parquet.bloom.filter.expected.ndv#content", 1000);
conf.setLong("parquet.bloom.filter.expected.ndv#line", 200);
{code}
This requires a bit more typing, but it is clearer and less error-prone.
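To make the proposed per-column keys concrete, here is a minimal sketch of how a writer might resolve them. It is not the parquet-mr implementation: a plain `Map` stands in for Hadoop's `Configuration` so it runs without Hadoop, and the class and method names (`ColumnConfigSketch`, `bloomFilterEnabled`, `expectedNdv`) are hypothetical. The precedence rule (an explicit `enabled#column` wins, otherwise a per-column NDV implies enabled, otherwise the global flag applies) is an assumption based on the comments in the proposal.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a Map<String, String> stands in for Hadoop's Configuration.
final class ColumnConfigSketch {
  static final String ENABLED = "parquet.bloom.filter.enabled";
  static final String EXPECTED_NDV = "parquet.bloom.filter.expected.ndv";

  // Looks up "key#column" first and falls back to the global "key".
  static String lookup(Map<String, String> conf, String key, String column) {
    String specific = conf.get(key + "#" + column);
    return specific != null ? specific : conf.get(key);
  }

  // Assumed precedence: explicit per-column flag, then per-column NDV, then global flag.
  static boolean bloomFilterEnabled(Map<String, String> conf, String column) {
    String specific = conf.get(ENABLED + "#" + column);
    if (specific != null) {
      return Boolean.parseBoolean(specific);
    }
    if (conf.containsKey(EXPECTED_NDV + "#" + column)) {
      return true; // setting an expected NDV implies the filter is wanted
    }
    return Boolean.parseBoolean(conf.getOrDefault(ENABLED, "false")); // default: disabled
  }

  // Parses the value as a number (Long.parseLong), never as a property name.
  static Long expectedNdv(Map<String, String> conf, String column) {
    String value = lookup(conf, EXPECTED_NDV, column);
    return value == null ? null : Long.parseLong(value.trim());
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(ENABLED, "false");
    conf.put(EXPECTED_NDV + "#content", "1000");
    conf.put(EXPECTED_NDV + "#line", "200");

    for (String column : new String[] {"content", "line", "other"}) {
      System.out.println(column + ": enabled=" + bloomFilterEnabled(conf, column)
          + ", ndv=" + expectedNdv(conf, column));
    }
  }
}
```

With this setup, `content` and `line` resolve to enabled filters with NDVs of 1000 and 200, while any other column stays disabled. Keying each value to one column avoids the positional coupling between two comma-separated lists.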
> Expected distinct numbers is not parsed correctly
> -------------------------------------------------
>
> Key: PARQUET-1787
> URL: https://issues.apache.org/jira/browse/PARQUET-1787
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Reporter: Walid Gara
> Priority: Critical
> Labels: pull-request-available
>
> In the bloom filter feature, when I pass the expected distinct numbers as
> below, I get null values instead of 1000 and 200.
> {code:java}
> import org.apache.hadoop.conf.Configuration;
>
> Configuration conf = new Configuration();
> conf.set("parquet.bloom.filter.column.names", "content,line");
> conf.set("parquet.bloom.filter.expected.ndv", "1000,200");
> {code}
>
> The issue comes from reading the expected distinct numbers through
> [Long.getLong(expectedNDVs[i])|https://github.com/apache/parquet-mr/blob/a737141a571e3cb6cee2c252dc4406e26e6c1177/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetOutputFormat.java#L251],
> which looks up a system property with that name (e.g. "1000") instead of
> parsing the string itself, so it returns null.
>
> It can be fixed by parsing the string with Long.parseLong(expectedNDVs[i]).
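The difference between the two calls can be reproduced with the JDK alone. The sketch below assumes the configured value is split on commas, as in the reporter's example; the class and method names are hypothetical, and the `trim()` in the fixed version is an extra convenience, not part of the proposed fix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo contrasting Long.getLong and Long.parseLong on "1000,200".
final class NdvParsingDemo {
  // Buggy: Long.getLong(name) returns the value of the *system property* called
  // name. No property named "1000" or "200" exists, so each entry becomes null.
  static List<Long> parseWithGetLong(String csv) {
    List<Long> result = new ArrayList<>();
    for (String s : csv.split(",")) {
      result.add(Long.getLong(s));
    }
    return result;
  }

  // Fixed: Long.parseLong converts the string itself into a long.
  static List<Long> parseWithParseLong(String csv) {
    List<Long> result = new ArrayList<>();
    for (String s : csv.split(",")) {
      result.add(Long.parseLong(s.trim()));
    }
    return result;
  }

  public static void main(String[] args) {
    String expectedNdv = "1000,200"; // value from the report
    System.out.println(parseWithGetLong(expectedNdv));   // prints [null, null]
    System.out.println(parseWithParseLong(expectedNdv)); // prints [1000, 200]
  }
}
```

Note that `Long.parseLong` throws `NumberFormatException` on malformed input, so a misconfigured value fails loudly instead of silently turning into null.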