guilload commented on a change in pull request #1326:
URL: https://github.com/apache/iceberg/pull/1326#discussion_r473435656
##########
File path:
mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergInputFormat.java
##########
@@ -51,6 +58,17 @@
forwardConfigSettings(job);
+ //Convert Hive filter to Iceberg filter
+ String hiveFilter = job.get(TableScanDesc.FILTER_EXPR_CONF_STR);
Review comment:
I came to the same conclusion when trying to implement projection
pushdown in the storage handler.
Unfortunately, as @cmathiesen stated, the job config is not yet populated
with the projected columns and the filter expression at the time the storage
handler "hooks" such as `configureJobConf` are called. So the right entry
points for implementing PPD are `getSplits` and `getRecordReader`.
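For reference, a minimal sketch of the conversion step in those entry
points, assuming Hive's `SerializationUtilities` / `ConvertAstToSearchArg`
helpers and a converter along the lines of `HiveIcebergFilterFactory` from
this PR (treat the wiring as illustrative, not the final implementation):
```java
import org.apache.hadoop.hive.ql.exec.SerializationUtilities;
import org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg;
import org.apache.hadoop.hive.ql.io.sarg.SearchArgument;
import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
import org.apache.hadoop.hive.ql.plan.TableScanDesc;
import org.apache.hadoop.mapred.JobConf;
import org.apache.iceberg.expressions.Expression;

// Returns the Iceberg filter derived from the Hive filter expression in the
// job conf, or null when no filter was pushed down for this scan.
static Expression icebergFilter(JobConf job) {
  String hiveFilter = job.get(TableScanDesc.FILTER_EXPR_CONF_STR);
  if (hiveFilter == null) {
    return null; // nothing was pushed down
  }
  // Deserialize the serialized Hive expression tree
  ExprNodeGenericFuncDesc exprNodeDesc = SerializationUtilities.deserializeExpression(hiveFilter);
  // Hive AST -> SearchArgument -> Iceberg Expression
  SearchArgument sarg = ConvertAstToSearchArg.create(job, exprNodeDesc);
  return HiveIcebergFilterFactory.generateFilterExpression(sarg);
}
```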
However, there's another catch: the `JobConf` objects passed to `getSplits`
and `getRecordReader` are actually not the same, so the filter expression set
in `getSplits` (L#53 in `HiveIcebergInputFormat`) is no longer available when
`getRecordReader` is subsequently called.
Since the storage handler doesn't decompose the filter expression, Hive
applies the whole filter anyway, so this can't be caught in the test suite.
Still, we need to set the filter expression in both `getSplits` and
`getRecordReader` to get to a complete PPD implementation, roughly as
sketched below.
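A minimal sketch of that fix, assuming the hypothetical `icebergFilter`
helper above and the existing `InputFormatConfig.FILTER_EXPRESSION` /
`SerializationUtil.serializeToBase64` utilities from the mr module; the point
is only that the filter is set in both entry points, since each receives its
own `JobConf` instance:
```java
import java.io.IOException;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
import org.apache.iceberg.data.Record;
import org.apache.iceberg.expressions.Expression;
import org.apache.iceberg.mr.InputFormatConfig;
import org.apache.iceberg.mr.mapred.Container;
import org.apache.iceberg.util.SerializationUtil;

@Override
public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException {
  forwardConfigSettings(job);
  setIcebergFilter(job); // this JobConf is not the one getRecordReader sees
  return super.getSplits(job, numSplits);
}

@Override
public RecordReader<Void, Container<Record>> getRecordReader(InputSplit split, JobConf job,
                                                             Reporter reporter) throws IOException {
  setIcebergFilter(job); // set the filter again on the fresh JobConf
  return super.getRecordReader(split, job, reporter);
}

// Serializes the converted filter into the conf key the Iceberg input format reads
private static void setIcebergFilter(JobConf job) {
  Expression filter = icebergFilter(job); // hypothetical helper sketched above
  if (filter != null) {
    job.set(InputFormatConfig.FILTER_EXPRESSION, SerializationUtil.serializeToBase64(filter));
  }
}
```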