Pavan Lanka created ORC-1027:
--------------------------------
Summary: Filter processing to allow filter injections that cannot
be represented via SArgs
Key: ORC-1027
URL: https://issues.apache.org/jira/browse/ORC-1027
Project: ORC
Issue Type: Improvement
Components: Java
Affects Versions: 1.7.0, 1.8.0
Reporter: Pavan Lanka
Assignee: Pavan Lanka
Currently, in the ORCRecordReader, the filter logic that performs LazyIO receives
the following inputs:
* SearchArgument as passed by the client using
`Reader.Options.getSearchArgument`
* Input filter as passed by the client using `Reader.Options.getFilterCallback`
The SearchArgument is particularly convenient because it allows easy integration
with existing engines such as Spark without requiring any code changes in the
engine. However, this push down is limited to what can be represented via
SearchArguments. For example, any predicate that uses a function cannot be
pushed down.
{quote}SELECT * FROM table WHERE lower(f1) IN ... OR f2 IN ... OR f3 IN ...
{quote}
For the above query, none of the filters are pushed down to ORC from the engine,
because we have no means of representing functions and the multiple predicates
are combined with OR.
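To make the effect of the OR concrete, here is a minimal row-level sketch in
plain Java. It does not use any ORC API; the class name, the sample rows and the
IN lists are all made up for illustration. It only shows that pushing down the
representable branches alone would drop rows that the full predicate keeps.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical illustration class; nothing here is part of ORC.
final class OrPushDownSketch {
    // Made-up rows keyed by column name; f1..f3 mirror the query above.
    static final List<Map<String, String>> ROWS = List.of(
        Map.of("f1", "ABC", "f2", "x", "f3", "p"),
        Map.of("f1", "zzz", "f2", "y", "f3", "p"),
        Map.of("f1", "zzz", "f2", "x", "f3", "q"),
        Map.of("f1", "zzz", "f2", "x", "f3", "p"));

    // lower(f1) IN ('abc'): needs a function call, so it has no SearchArgument form.
    static final Predicate<Map<String, String>> F1 =
        r -> Set.of("abc").contains(r.get("f1").toLowerCase(Locale.ROOT));
    // f2 IN ('y') and f3 IN ('q'): plain column predicates.
    static final Predicate<Map<String, String>> F2 = r -> Set.of("y").contains(r.get("f2"));
    static final Predicate<Map<String, String>> F3 = r -> Set.of("q").contains(r.get("f3"));

    // The predicate the query actually asks for.
    static final Predicate<Map<String, String>> FULL = F1.or(F2).or(F3);
    // What would remain if only the representable branches were pushed down.
    static final Predicate<Map<String, String>> REPRESENTABLE_ONLY = F2.or(F3);

    static long count(Predicate<Map<String, String>> p) {
        return ROWS.stream().filter(p).count();
    }

    public static void main(String[] args) {
        System.out.println(count(FULL));               // prints 3
        System.out.println(count(REPRESENTABLE_ONLY)); // prints 2
    }
}
```

The first row matches only through `lower(f1)`, so a storage-layer filter built
from the other two branches would discard it incorrectly. With AND instead of
OR, pushing down a subset of the branches would have been safe.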
An additional input mechanism is requested for supplying filters, one that is
pluggable and does not require direct changes in the clients. We are proposing
the use of **ServiceLoader** to dynamically determine the desired filters for a
given fully qualified file path.
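A rough sketch of what such a lookup could look like with `java.util.ServiceLoader`
follows. The interface `PathFilterProvider`, the class `PluginFilterLookup` and
the row type `Map<String, Object>` are assumptions made for this example only;
they are not the interface proposed in this issue, and a real ORC filter would
work on batches rather than single rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.ServiceLoader;
import java.util.function.Predicate;

// Hypothetical SPI: an implementation decides, per file path, whether it has a filter.
interface PathFilterProvider {
    Optional<Predicate<Map<String, Object>>> filterFor(String filePath);
}

// Hypothetical helper; not an ORC class.
final class PluginFilterLookup {
    // Collects the filters that the given providers supply for this path.
    static List<Predicate<Map<String, Object>>> find(
            Iterable<PathFilterProvider> providers, String filePath) {
        List<Predicate<Map<String, Object>>> found = new ArrayList<>();
        for (PathFilterProvider provider : providers) {
            provider.filterFor(filePath).ifPresent(found::add);
        }
        return found;
    }

    // Discovers providers on the classpath. ServiceLoader finds implementations
    // listed in META-INF/services/<fully qualified name of the interface>.
    static List<Predicate<Map<String, Object>>> discover(String filePath) {
        return find(ServiceLoader.load(PathFilterProvider.class), filePath);
    }
}
```

Because providers are found through the classpath registration file, a filter
can be added by shipping a jar, without touching the engine or the client code.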
This filter, if one is determined, is combined using AND with the other
available filters. It is understood that the plugin filter cannot differentiate
between multiple aliases for the same table.
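The AND combination can be sketched as a simple reduction over whichever
filters are present. This is a simplified row-level model with a hypothetical
class name `FilterConjunction`; it is not how the ORC reader is implemented,
only an illustration of the intended semantics.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper; not an ORC class.
final class FilterConjunction {
    // ANDs all filters that are present. With no filters, every row passes.
    static <T> Predicate<T> allOf(List<Predicate<T>> present) {
        return present.stream().reduce(row -> true, Predicate::and);
    }
}
```

A row is kept only if every supplied filter accepts it, so a plugin filter can
narrow the result but never widen it. Since the plugin filter is chosen from the
file path alone, two aliases of the same table would receive the same filter in
this model.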
This generic capability will allow us to represent complex filters that
currently cannot be pushed down to the storage layer from the existing engines,
allowing us to reap the benefits of LazyIO in many more cases.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)