[ https://issues.apache.org/jira/browse/PARQUET-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312865#comment-14312865 ]
Alex Levenson commented on PARQUET-36:
--------------------------------------
in() and notIn() sound useful -- one easy way to benchmark might be to create a
UserDefinedPredicate that does this, see
https://github.com/apache/incubator-parquet-mr/pull/73
specifically:
https://github.com/apache/incubator-parquet-mr/pull/73/files#diff-bfffec155b40f1bbaf730a59a7e0a506R169
for an example. That PR should be merged soon, but you can base off of that
branch. That way you won't have to change any Parquet APIs; you can just
implement a UDP backed by a fastutil set to get a sense of the performance
improvement.
Now that I think of it though, the UDP interface will box every primitive value
when it visits it, so it may not be a great way to compare performance.
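A rough sketch of what such a set-backed predicate could look like is below.
It is only an illustration under stated assumptions: SketchPredicate is a
hypothetical stand-in that mirrors the keep / canDrop / inverseCanDrop shape of
UserDefinedPredicate, with plain min and max arguments in place of Parquet's
statistics object, and java.util.HashSet stands in for a fastutil set so the
example needs no extra dependencies. The boxing concern is visible in
countKept, where every primitive long is boxed before keep() is called.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the shape of Parquet's UserDefinedPredicate.
// min and max stand in for the column statistics of a group of records.
abstract class SketchPredicate<T extends Comparable<T>> {
    abstract boolean keep(T value);
    abstract boolean canDrop(T min, T max);
    abstract boolean inverseCanDrop(T min, T max);
}

// in(): keeps a record when its value is a member of the set.
final class InSetPredicate<T extends Comparable<T>> extends SketchPredicate<T> {
    private final Set<T> values;

    InSetPredicate(Collection<T> values) {
        // HashSet is an assumption here; a fastutil set could be swapped in.
        this.values = new HashSet<>(values);
    }

    @Override
    boolean keep(T value) {
        return values.contains(value);
    }

    // The group can be dropped when no set member lies inside [min, max].
    @Override
    boolean canDrop(T min, T max) {
        for (T v : values) {
            if (v.compareTo(min) >= 0 && v.compareTo(max) <= 0) {
                return false;
            }
        }
        return true;
    }

    // notIn(): conservative. With only min and max known, every value is
    // certainly in the set only when min == max and that value is a member.
    @Override
    boolean inverseCanDrop(T min, T max) {
        return min.compareTo(max) == 0 && values.contains(min);
    }
}

final class InSetPredicateDemo {
    // Each primitive long is autoboxed to Long on the call to keep(),
    // which is the per-value overhead mentioned above.
    static int countKept(long[] column, SketchPredicate<Long> predicate) {
        int kept = 0;
        for (long v : column) {
            if (predicate.keep(v)) {
                kept++;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<Long> wanted = new HashSet<>();
        wanted.add(3L);
        wanted.add(7L);
        wanted.add(42L);
        InSetPredicate<Long> in = new InSetPredicate<>(wanted);
        long[] column = {1L, 3L, 7L, 7L, 9L};
        System.out.println("kept: " + countKept(column, in));
    }
}
```

A notIn() variant would negate keep() and swap the two drop checks. A real
benchmark would need to separate the cost of the set lookup from the cost of
boxing, which this sketch does not attempt.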
> FilteringPrimitiveConverter should support dictionaries
> -------------------------------------------------------
>
> Key: PARQUET-36
> URL: https://issues.apache.org/jira/browse/PARQUET-36
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Reporter: Alex Levenson
> Priority: Minor
> Labels: filter2
>
> If the delegated PrimitiveConverter supports dictionaries, then
> FilteringPrimitiveConverter should too.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)