guilload commented on a change in pull request #1326:
URL: https://github.com/apache/iceberg/pull/1326#discussion_r473435656
##########
File path:
mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergInputFormat.java
##########
@@ -51,6 +58,17 @@
forwardConfigSettings(job);
+ //Convert Hive filter to Iceberg filter
+ String hiveFilter = job.get(TableScanDesc.FILTER_EXPR_CONF_STR);
Review comment:
I came to the same conclusion when trying to implement projection
pushdown in the storage handler.
Unfortunately, as @cmathiesen stated, the job config is not yet populated
with the projected columns and the filter expression at the time the storage
handler "hooks" such as `configureJobConf` are called. So the right entry
points for implementing PPD are `getSplits` and `getRecordReader`.
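For reference, a minimal sketch of the conversion step in those entry
points, assuming Hive's `SerializationUtilities` / `ConvertAstToSearchArg`
helpers and a converter along the lines of `HiveIcebergFilterFactory` from
this PR (treat the wiring as illustrative, not the final implementation):
```java
import org.apache.hadoop.hive.ql.exec.SerializationUtilities;
import org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg;
import org.apache.hadoop.hive.ql.io.sarg.SearchArgument;
import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
import org.apache.hadoop.hive.ql.plan.TableScanDesc;
import org.apache.hadoop.mapred.JobConf;
import org.apache.iceberg.expressions.Expression;

// Returns the Iceberg filter derived from the Hive filter expression in the
// job conf, or null when no filter was pushed down for this scan.
static Expression icebergFilter(JobConf job) {
  String hiveFilter = job.get(TableScanDesc.FILTER_EXPR_CONF_STR);
  if (hiveFilter == null) {
    return null; // nothing was pushed down
  }
  // Deserialize the serialized Hive expression tree
  ExprNodeGenericFuncDesc exprNodeDesc = SerializationUtilities.deserializeExpression(hiveFilter);
  // Hive AST -> SearchArgument -> Iceberg Expression
  SearchArgument sarg = ConvertAstToSearchArg.create(job, exprNodeDesc);
  return HiveIcebergFilterFactory.generateFilterExpression(sarg);
}
```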
However, there's another catch: the `JobConf` objects passed to `getSplits`
and `getRecordReader` are actually not the same, so the filter expression set
in `getSplits` (L#53 in `HiveIcebergInputFormat`) is no longer available when
`getRecordReader` is subsequently called.
Since the storage handler doesn't decompose the filter expression, Hive
applies the whole filter anyway, so this can't be caught in the test suite.
Still, we need to set the filter expression in both `getSplits` and
`getRecordReader` to get to a complete PPD implementation, roughly as
sketched below.
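A minimal sketch of that fix, assuming the hypothetical `icebergFilter`
helper above and the existing `InputFormatConfig.FILTER_EXPRESSION` /
`SerializationUtil.serializeToBase64` utilities from the mr module; the point
is only that the filter is set in both entry points, since each receives its
own `JobConf` instance:
```java
import java.io.IOException;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
import org.apache.iceberg.data.Record;
import org.apache.iceberg.expressions.Expression;
import org.apache.iceberg.mr.InputFormatConfig;
import org.apache.iceberg.mr.mapred.Container;
import org.apache.iceberg.util.SerializationUtil;

@Override
public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException {
  forwardConfigSettings(job);
  setIcebergFilter(job); // this JobConf is not the one getRecordReader sees
  return super.getSplits(job, numSplits);
}

@Override
public RecordReader<Void, Container<Record>> getRecordReader(InputSplit split, JobConf job,
                                                             Reporter reporter) throws IOException {
  setIcebergFilter(job); // set the filter again on the fresh JobConf
  return super.getRecordReader(split, job, reporter);
}

// Serializes the converted filter into the conf key the Iceberg input format reads
private static void setIcebergFilter(JobConf job) {
  Expression filter = icebergFilter(job); // hypothetical helper sketched above
  if (filter != null) {
    job.set(InputFormatConfig.FILTER_EXPRESSION, SerializationUtil.serializeToBase64(filter));
  }
}
```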