[
https://issues.apache.org/jira/browse/PARQUET-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224897#comment-14224897
]
Ryan Blue commented on PARQUET-98:
----------------------------------
Could you try out the problem data with VisualVM or another profiling tool to
see if there's anything funny going on? It doesn't look like your code is the
problem. If anything, I would have guessed the first version runs more slowly
because it converts a String to Binary each time it runs.
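The per-invocation conversion Ryan mentions can be sketched without the Parquet classes (Binary and ColumnPredicates belong to the Parquet API; the byte[]-based stand-in below is an assumption, used only to keep the sketch self-contained). Hoisting the conversion out of the lambda means the target bytes are computed once per filter, not once per record:

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.function.Predicate;

public class HoistedPredicate {

    // Mirrors the first filter: the String is converted to bytes
    // on every record the predicate is applied to.
    static Predicate<byte[]> perCall(String value) {
        return input -> Arrays.equals(
                value.getBytes(StandardCharsets.UTF_8), input);
    }

    // Hoisted variant: the conversion happens once, when the
    // predicate is built, and the lambda captures the result.
    static Predicate<byte[]> hoisted(String value) {
        byte[] target = value.getBytes(StandardCharsets.UTF_8);
        return input -> Arrays.equals(target, input);
    }

    public static void main(String[] args) {
        byte[] record = "foo".getBytes(StandardCharsets.UTF_8);
        System.out.println(perCall("foo").test(record));  // true
        System.out.println(hoisted("foo").test(record));  // true
        System.out.println(hoisted("bar").test(record));  // false
    }
}
{code}

Under that reasoning the first version would allocate more per record, which is why the reported result (the FilterPredicate version being the slower one) is surprising and worth profiling.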
> filter2 API performance regression
> ----------------------------------
>
> Key: PARQUET-98
> URL: https://issues.apache.org/jira/browse/PARQUET-98
> Project: Parquet
> Issue Type: Bug
> Reporter: Viktor Szathmary
>
> The new filter API seems to be much slower (or perhaps I'm using it wrong :)
> Code using an UnboundRecordFilter:
> {code:java}
> ColumnRecordFilter.column(column,
>     ColumnPredicates.applyFunctionToBinary(
>         input -> Binary.fromString(value).equals(input)));
> {code}
> vs. code using FilterPredicate:
> {code:java}
> eq(binaryColumn(column), Binary.fromString(value));
> {code}
> The latter runs about twice as slowly on the same Parquet file (built using
> 1.6.0rc2).
> Note: the reader is constructed using
> {code:java}
> ParquetReader.builder(new ProtoReadSupport(), path).withFilter(filter).build()
> {code}
> The approach based on the new filter API seems to create a whole lot more
> garbage (perhaps because it reconstructs all the rows?).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)