[
https://issues.apache.org/jira/browse/IMPALA-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726171#comment-16726171
]
Tim Armstrong commented on IMPALA-8011:
---------------------------------------
[~skye] had a prototype patch to add virtual columns like this a few years ago.
The implementation idea was to treat the virtual column like a partition key
column and add it to the "template tuple" in the scanner.
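
As a rough illustration of that idea (a conceptual Python sketch, not Impala's
scanner code; the function names, path, and partition value are made up), the
file name is constant for every row read from one file, just like a partition
key, so it can be filled in once per file and copied into each output row:

```python
# Conceptual sketch only: names and data here are hypothetical, not Impala APIs.

def make_template_tuple(partition_values, file_path):
    """Build the per-file constants: partition keys plus the file name."""
    template = dict(partition_values)
    # Assumption: the virtual column is exposed as INPUT__FILE__NAME.
    template["INPUT__FILE__NAME"] = file_path
    return template


def materialize_rows(template, file_rows):
    """Copy the per-file constants into every row read from the file."""
    for row in file_rows:
        out = dict(template)  # start from the constant slots
        out.update(row)       # then add the columns stored in the file
        yield out


template = make_template_tuple(
    {"day": "2018-12-20"},
    "/warehouse/names/day=2018-12-20/ABC.parq",
)
rows = list(materialize_rows(template, [{"first_name": "Alice"},
                                        {"first_name": "Bob"}]))
```

Because the value is known before any row is read, a predicate on it could in
principle be evaluated once per file rather than once per row.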
> Allow filtering on virtual column for file name
> -----------------------------------------------
>
> Key: IMPALA-8011
> URL: https://issues.apache.org/jira/browse/IMPALA-8011
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Peter Ebert
> Priority: Major
> Labels: built-in-function
>
> An additional performance enhancement would be to be able to filter on file
> names using a virtual column. It would work somewhat like the current
> optimization of sorting data and skipping files based on metadata, but
> instead you put something in the file name to indicate whether its contents
> should be filtered.
> For example, say you were writing first names and then searching for them.
> During the writing phase you put the first letter of each first name into
> the file name, so if I'm storing Alice, Bob, and Cathy, my file name is "ABC".
> Then, when searching for David, a query could filter on whether
> INPUT__FILE__NAME contains "D" and skip reading the file.
> Another use: if you had a daily partition and put the timestamp into the
> file name, you could limit the search to only the last hour even though the
> partition is daily. This leaves you free to sort by another column, making
> searches on both faster.
>
> This requires IMPALA-801
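
The first-letter example in the description can be sketched in Python as
follows. This only illustrates the pruning logic, assuming the hypothetical
naming convention above (the base file name lists the initials stored in the
file); the paths and function names are invented and this is not how Impala
evaluates predicates:

```python
import os

# Hypothetical sketch of file pruning by name; not Impala code.

def file_may_contain(path, first_name):
    """Return True if the file name says the file could hold first_name."""
    # Assumption: base name without extension lists stored initials, e.g. "ABC".
    initials = os.path.splitext(os.path.basename(path))[0]
    return first_name[0].upper() in initials


def files_to_scan(paths, first_name):
    """Keep only the files whose name does not rule out first_name."""
    return [p for p in paths if file_may_contain(p, first_name)]


paths = ["/data/names/ABC.parq", "/data/names/DEF.parq"]
to_scan = files_to_scan(paths, "David")  # ABC.parq is skipped without being read
```

The check can only rule files out: a file that passes still has to be scanned
and filtered row by row.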