[ https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14351374#comment-14351374 ]
Alex Levenson commented on PARQUET-41:
--------------------------------------
If the data for the bloom filter is going to be stored in the parquet-format
Thrift statistics schema, I think it should be in a well-defined format that
is not specific to Java (for example, the format should not be whatever comes
out of an ObjectOutputStream).
I think it's fine that the two runtimes support different features, but so far
we have maintained binary compatibility between the two.
Even if we didn't care about interoperability, we would still need a
well-defined binary format for the bloom filter that is durable over time,
unlike plain java.io serialization.
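
To illustrate the kind of language-neutral layout being asked for, here is a
minimal sketch in Java. Everything in it is an assumption made for
illustration: the class name BloomFilterCodec, the two-field header, and the
little-endian byte order are hypothetical and are not a format defined by
parquet-format. The point is only that every byte is specified explicitly, so
a non-Java reader could parse it without java.io serialization.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical codec: NOT a format defined by parquet-format.
// Assumed layout, all integers little-endian:
//   bytes 0..3  int32  number of hash functions
//   bytes 4..7  int32  length of the bitset in bytes
//   bytes 8..   raw bitset bytes
final class BloomFilterCodec {
    private static final int HEADER_BYTES = 8;

    private BloomFilterCodec() {}

    // Writes the header followed by the bitset, with an explicit byte order.
    static byte[] encode(int numHashFunctions, byte[] bitset) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_BYTES + bitset.length)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(numHashFunctions);
        buf.putInt(bitset.length);
        buf.put(bitset);
        return buf.array();
    }

    // Reads the hash-function count from the first header field.
    static int decodeNumHashFunctions(byte[] encoded) {
        checkHeader(encoded);
        return ByteBuffer.wrap(encoded).order(ByteOrder.LITTLE_ENDIAN).getInt(0);
    }

    // Returns a copy of the bitset, validating the declared length first.
    static byte[] decodeBitset(byte[] encoded) {
        checkHeader(encoded);
        int length = ByteBuffer.wrap(encoded).order(ByteOrder.LITTLE_ENDIAN).getInt(4);
        if (length < 0 || length != encoded.length - HEADER_BYTES) {
            throw new IllegalArgumentException("declared bitset length does not match payload");
        }
        return Arrays.copyOfRange(encoded, HEADER_BYTES, HEADER_BYTES + length);
    }

    private static void checkHeader(byte[] encoded) {
        if (encoded.length < HEADER_BYTES) {
            throw new IllegalArgumentException("input shorter than header");
        }
    }
}
```

Because the layout is written down byte by byte, it does not depend on Java
class names or on a serialVersionUID, which is what makes ObjectOutputStream
output fragile across languages and over time.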
> Add bloom filters to parquet statistics
> ---------------------------------------
>
> Key: PARQUET-41
> URL: https://issues.apache.org/jira/browse/PARQUET-41
> Project: Parquet
> Issue Type: New Feature
> Components: parquet-mr
> Reporter: Alex Levenson
> Assignee: ferdinand xu
> Labels: filter2
>
> For row groups with no dictionary, we could still produce a bloom filter.
> This could be very useful in filtering entire row groups.