[ 
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351374#comment-14351374
 ] 

Alex Levenson commented on PARQUET-41:
--------------------------------------

If the data for the bloom filter is going to be stored in the parquet-format 
thrift statistics schema, I think it should be a well-defined format that is 
not specific to Java (for example, the format should not be whatever comes 
out of an ObjectOutputStream).
I think it's fine that the two runtimes support different features, but so far 
we have maintained binary compatibility between the two.

Even if we didn't care about interoperability, we would still need a 
well-defined binary format for the bloom filter that is durable over time, 
unlike plain java.io serialization.
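To illustrate what "well-defined" could mean in practice, here is a minimal Java sketch. It assumes a hypothetical layout: a little-endian int32 hash-function count, a little-endian int32 bit count, then the bit array as little-endian 64-bit words. It also assumes 64-bit FNV-1a as the hash. Neither choice comes from this thread, and this is not the format parquet-format specifies. The point is that every byte is pinned down, so a non-JVM reader could decode it. ObjectOutputStream output does not give you that guarantee. The class name SimpleBloomFilter and its methods are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical Bloom filter with an explicit, language-neutral byte layout
 * (an assumption for illustration, not a format defined by parquet-format):
 *   int32 numHashFunctions              (little-endian)
 *   int32 numBits                       (little-endian)
 *   int64 words[ceil(numBits / 64)]     (little-endian)
 */
final class SimpleBloomFilter {
    private final int numHashFunctions;
    private final int numBits;
    private final long[] words;

    SimpleBloomFilter(int numBits, int numHashFunctions) {
        if (numBits <= 0 || numHashFunctions <= 0) {
            throw new IllegalArgumentException("numBits and numHashFunctions must be positive");
        }
        this.numBits = numBits;
        this.numHashFunctions = numHashFunctions;
        this.words = new long[(numBits + 63) / 64];
    }

    // 64-bit FNV-1a over the UTF-8 bytes: fully specified, so any language can reproduce it.
    private static long fnv1a64(String value) {
        long hash = 0xcbf29ce484222325L;
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff);
            hash *= 0x100000001b3L;
        }
        return hash;
    }

    // Double hashing: bit i is (h1 + i * h2) mod numBits, where h1/h2 are the low/high 32 bits.
    private int bitIndex(long hash, int i) {
        long h1 = hash & 0xffffffffL;
        long h2 = hash >>> 32;
        return (int) Math.floorMod(h1 + i * h2, (long) numBits);
    }

    void put(String value) {
        long hash = fnv1a64(value);
        for (int i = 0; i < numHashFunctions; i++) {
            int idx = bitIndex(hash, i);
            words[idx >>> 6] |= 1L << (idx & 63);
        }
    }

    boolean mightContain(String value) {
        long hash = fnv1a64(value);
        for (int i = 0; i < numHashFunctions; i++) {
            int idx = bitIndex(hash, i);
            if ((words[idx >>> 6] & (1L << (idx & 63))) == 0) {
                return false; // definitely absent
            }
        }
        return true; // possibly present (false positives are allowed)
    }

    // Writes the header and bit array in the fixed little-endian layout described above.
    byte[] toBytes() {
        ByteBuffer buf = ByteBuffer.allocate(8 + 8 * words.length).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(numHashFunctions).putInt(numBits);
        for (long w : words) {
            buf.putLong(w);
        }
        return buf.array();
    }

    // Reads the same layout back; rejects input whose bit array does not match the header.
    static SimpleBloomFilter fromBytes(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int k = buf.getInt();
        int m = buf.getInt();
        SimpleBloomFilter filter = new SimpleBloomFilter(m, k);
        if (buf.remaining() != 8 * filter.words.length) {
            throw new IllegalArgumentException("bit array length does not match header");
        }
        for (int i = 0; i < filter.words.length; i++) {
            filter.words[i] = buf.getLong();
        }
        return filter;
    }
}
```

Round-tripping through toBytes() and fromBytes() reproduces the same bytes on any JVM version. Nothing in the output depends on Java class metadata or serialVersionUID.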

> Add bloom filters to parquet statistics
> ---------------------------------------
>
>                 Key: PARQUET-41
>                 URL: https://issues.apache.org/jira/browse/PARQUET-41
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-mr
>            Reporter: Alex Levenson
>            Assignee: ferdinand xu
>              Labels: filter2
>
> For row groups with no dictionary, we could still produce a bloom filter. 
> This could be very useful in filtering entire row groups.
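A bloom filter never produces false negatives. That is why it helps with row-group filtering: a "not present" answer lets a reader skip the whole row group without decoding it. Below is a minimal Java sketch of that pruning decision for an equality predicate. It assumes one hypothetical RowGroupBloom per row group for the filtered column. The class, its fixed 4096-bit size, and the two positions derived from String.hashCode are illustrative only. String.hashCode is exactly the kind of Java-specific choice the comment above argues against for anything persisted.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/**
 * Hypothetical per-row-group, per-column bloom filter, used only to decide
 * whether a row group can be skipped for an equality predicate (column == value).
 */
final class RowGroupBloom {
    private static final int NUM_BITS = 4096; // assumed fixed size for illustration
    private final BitSet bits = new BitSet(NUM_BITS);

    // Two bit positions per value derived from String.hashCode(); illustrative only.
    private static int[] positions(String value) {
        int h1 = value.hashCode();
        // Multiply and rotate so the second position also depends on the high bits of h1.
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 16);
        return new int[] {Math.floorMod(h1, NUM_BITS), Math.floorMod(h2, NUM_BITS)};
    }

    void add(String value) {
        for (int p : positions(value)) {
            bits.set(p);
        }
    }

    boolean mightContain(String value) {
        for (int p : positions(value)) {
            if (!bits.get(p)) {
                return false; // no false negatives: the value is definitely not in this row group
            }
        }
        return true; // possibly present, so the row group must be read
    }

    // Indices of row groups that cannot be ruled out for the predicate column == needle.
    static List<Integer> rowGroupsToRead(List<RowGroupBloom> filters, String needle) {
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < filters.size(); i++) {
            if (filters.get(i).mightContain(needle)) {
                keep.add(i);
            }
        }
        return keep;
    }
}
```

A row group whose filter has no bits set is always skipped. A row group that merely shares bits with the searched value may still be read, which costs I/O but never correctness.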



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
