[ https://issues.apache.org/jira/browse/IMPALA-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17533755#comment-17533755 ]

Michael Olschimke commented on IMPALA-801:
------------------------------------------

A virtual column for the file name is the best option for propagating the technical
timeline in a data warehouse scenario (e.g. the "LoadDateTS" in a hybrid Data Vault
architecture). Many other systems (Snowflake, Apache Drill, etc.) support such a
virtual column, which can then be used to extract the timestamp of the data
delivery from the fully qualified file name using a regex.
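A minimal Python sketch of that extraction step, assuming a hypothetical naming convention in which the loader embeds the delivery timestamp in each file name as `YYYYMMDDTHHMMSS`. The `load_date_ts` helper and the example path are illustrative only, not part of Impala or any of the systems named; the input is the kind of value a per-row file-name column (such as Hive's `INPUT__FILE__NAME`) would supply.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical convention: files are named like ".../sales_20140210T061500.parquet",
# i.e. the delivery timestamp appears as YYYYMMDDTHHMMSS somewhere in the base name.
DELIVERY_TS = re.compile(r"(\d{8}T\d{6})")


def load_date_ts(file_name: str) -> Optional[datetime]:
    """Return the delivery timestamp encoded in a fully qualified file name, or None."""
    # Look only at the last path component so a timestamp-like directory name
    # elsewhere in the path is not picked up by mistake.
    base_name = file_name.rsplit("/", 1)[-1]
    match = DELIVERY_TS.search(base_name)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y%m%dT%H%M%S")


# Example: the value a file-name column might return for one row.
print(load_date_ts(
    "hdfs://nn:8020/warehouse/sales/dt=20140210/sales_20140210T061500.parquet"
))
```

The regex and format string would need to match whatever convention the loading process actually uses; files that do not follow it yield `None` rather than a guessed timestamp.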

Would love to see this in Impala.

> Add function or virtual column for file name
> --------------------------------------------
>
>                 Key: IMPALA-801
>                 URL: https://issues.apache.org/jira/browse/IMPALA-801
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Catalog
>    Affects Versions: Impala 1.2.3
>            Reporter: Udai Kiran Potluri
>            Assignee: Zoltán Borók-Nagy
>            Priority: Minor
>              Labels: built-in-function, ramp-up
>
> Hive can list the data files in a table. For example, the following query lists all
> the data files for the table or partition:
> {noformat}
> select INPUT__FILE__NAME, count(*) from <table_name> where dt='20140210' 
> group by INPUT__FILE__NAME;
> {noformat}
> This has two advantages over the existing "show files" functionality:
> * The output can be used in arbitrary SQL statements.
> * You can see which record came from which file.
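To make the semantics of the quoted Hive query concrete, here is a rough Python model: every record is paired with the name of the file it was read from (which is what a per-row file-name column provides), and rows are filtered by partition and counted per file. The `(file_name, record)` pairing and the `rows_per_file` helper are hypothetical illustrations, not Hive or Impala code.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def rows_per_file(rows: Iterable[Tuple[str, Dict[str, str]]], dt: str) -> Counter:
    """Mimic: SELECT INPUT__FILE__NAME, count(*) FROM t WHERE dt = ? GROUP BY INPUT__FILE__NAME."""
    counts: Counter = Counter()
    for file_name, record in rows:
        if record.get("dt") == dt:      # WHERE dt = '<dt>'
            counts[file_name] += 1      # GROUP BY file name, count(*)
    return counts


# Each record carries the file it came from, so per-file aggregation is trivial.
rows = [
    ("part-00000", {"dt": "20140210"}),
    ("part-00000", {"dt": "20140210"}),
    ("part-00001", {"dt": "20140210"}),
    ("part-00002", {"dt": "20140211"}),
]
print(rows_per_file(rows, "20140210"))
```

Because the file name is available per row, it can feed any expression or aggregate, which is the advantage over a fixed `show files` listing described above.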



--
This message was sent by Atlassian Jira
(v8.20.7#820007)