[ 
https://issues.apache.org/jira/browse/IMPALA-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726171#comment-16726171
 ] 

Tim Armstrong commented on IMPALA-8011:
---------------------------------------

[~skye] had a prototype patch to add virtual columns like this a few years ago. 
The implementation idea was to treat the virtual column like a partition key 
column and add it to the "template tuple" in the scanner.
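
To make the "template tuple" idea concrete, here is a rough sketch in Python. 
It is not Impala's scanner code (which is C++), and ScanRange, 
build_template_tuple and scan are hypothetical names used only for 
illustration. The assumption is that values which are constant for a whole 
file, such as partition keys and the file name, are filled in once per file and 
then copied into every row the scanner produces.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List


@dataclass
class ScanRange:
    """Hypothetical stand-in for one file handed to a scanner."""
    path: str
    partition_values: Dict[str, Any]
    rows: List[Dict[str, Any]] = field(default_factory=list)


def build_template_tuple(scan_range: ScanRange) -> Dict[str, Any]:
    # Values that are constant for every row of the file are set once here.
    template = dict(scan_range.partition_values)
    # Assumption: the virtual column is materialized like a partition key.
    template["INPUT__FILE__NAME"] = scan_range.path
    return template


def scan(scan_range: ScanRange) -> Iterator[Dict[str, Any]]:
    template = build_template_tuple(scan_range)
    for row in scan_range.rows:
        out = dict(template)  # start from the per-file constants
        out.update(row)       # then add the columns read from the file
        yield out
```

With this shape, a predicate on INPUT__FILE__NAME can be evaluated like a 
predicate on any other slot of the output row.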

> Allow filtering on virtual column for file name
> -----------------------------------------------
>
>                 Key: IMPALA-8011
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8011
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Peter Ebert
>            Priority: Major
>              Labels: built-in-function
>
> An additional performance enhancement would be the ability to filter on file 
> names using a virtual column.  It would be somewhat like the current 
> optimization of sorting data and skipping files based on metadata, but instead 
> you put something in the file name to indicate whether its contents should be 
> filtered out.
> For example, say you were writing first names and then searching for them. 
> During the writing phase you put the first letter of each first name into 
> the file name, so if I'm storing Alice, Bob, and Cathy, my file name is "ABC". 
> Then, when searching for David, a query could filter on whether 
> INPUT__FILE__NAME contains "D" and skip reading the file.
> Another use: if you had a daily partition and put the timestamp into the 
> file name, you could limit the search to only the last hour even though your 
> partition is daily. This leaves you free to sort the data by another column, 
> making searches on both faster.
>  
> This requires IMPALA-801
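
The first-letter example in the description can be sketched as follows. This 
is only an illustration of the pruning idea in Python, not how Impala would 
implement it; prune_files and initial_filter are hypothetical helpers, and the 
file paths are made up.

```python
import os
from typing import Callable, Iterable, List


def prune_files(paths: Iterable[str], keep: Callable[[str], bool]) -> List[str]:
    """Return only the files whose base name passes the predicate."""
    return [p for p in paths if keep(os.path.basename(p))]


def initial_filter(first_name: str) -> Callable[[str], bool]:
    """Build a predicate: does the file name advertise this name's initial?"""
    initial = first_name[0].upper()

    def keep(file_name: str) -> bool:
        # Ignore the extension so letters in it cannot cause false matches.
        stem = os.path.splitext(file_name)[0]
        return initial in stem

    return keep


files = ["/t/day=1/ABC.parq", "/t/day=1/DEF.parq"]
to_read = prune_files(files, initial_filter("David"))
```

Here the file named "ABC" is dropped before any of its data is read, because 
its name does not contain "D".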



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)