[jira] [Commented] (IMPALA-801) Add function or virtual column for file name

2022-05-09 Thread Michael Olschimke (Jira)


[ https://issues.apache.org/jira/browse/IMPALA-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533755#comment-17533755 ]

Michael Olschimke commented on IMPALA-801:
--

A virtual file name column is the best option for carrying the technical timeline
forward in a data warehouse scenario (e.g. the "LoadDateTS" in a hybrid Data Vault
architecture). Other systems, such as Snowflake and Apache Drill, support this kind
of virtual column, which can then be used to extract the timestamp of the data
delivery from the fully qualified file name using a regex.

Would love to see this in Impala.
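A minimal Python sketch of the regex step described above. It assumes a hypothetical naming convention in which each delivered file carries its load timestamp as `_YYYYMMDDTHHMMSS` just before the extension; the convention, the example path, and the names `LOAD_TS_PATTERN` and `load_date_ts` are illustrative and not taken from the issue.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical naming convention (an assumption, not from the issue):
#   <entity>_<YYYYMMDDTHHMMSS>.<extension>, e.g. customer_20220509T143000.csv
LOAD_TS_PATTERN = re.compile(r"_(\d{8}T\d{6})\.[^/]+$")


def load_date_ts(fully_qualified_name: str) -> Optional[datetime]:
    """Extract the delivery timestamp (LoadDateTS) from a fully qualified file name.

    Returns None when the name does not follow the assumed convention.
    """
    match = LOAD_TS_PATTERN.search(fully_qualified_name)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y%m%dT%H%M%S")


if __name__ == "__main__":
    # In practice the name would come from the engine's file name virtual column.
    path = "hdfs://namenode/landing/crm/customer_20220509T143000.csv"
    print(load_date_ts(path))  # 2022-05-09 14:30:00
```

Returning `None` for non-matching names keeps the lookup safe to apply to every row; a loading job could then flag those rows instead of failing.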

> Add function or virtual column for file name
> 
>
> Key: IMPALA-801
> URL: https://issues.apache.org/jira/browse/IMPALA-801
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Catalog
>Affects Versions: Impala 1.2.3
>Reporter: Udai Kiran Potluri
>Assignee: Zoltán Borók-Nagy
>Priority: Minor
>  Labels: built-in-function, ramp-up
>
> Hive can list the data files in a table. For example, the following query lists all
> the data files for the table or partition:
> {noformat}
> select INPUT__FILE__NAME, count(*) from  where dt='20140210' 
> group by INPUT__FILE__NAME;
> {noformat}
> This has two advantages over the existing "show files" functionality:
> * The output can be used in arbitrary SQL statements.
> * You can see which record came from which file.
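To make these two advantages concrete, the following Python sketch mimics what a per-row file name column provides: every record is paired with the path of the file it was read from, and the rows are then counted per file, much like the `GROUP BY INPUT__FILE__NAME` query above. It is only an illustration under stated assumptions: the partition is modeled as a local directory of CSV files, and `rows_with_file_name` and `count_rows_per_file` are hypothetical helpers, not Hive or Impala APIs.

```python
import csv
import os
import tempfile
from collections import Counter
from typing import Iterator, List, Tuple


def rows_with_file_name(directory: str) -> Iterator[Tuple[str, List[str]]]:
    """Yield (file_path, row) pairs, mimicking a per-row file name virtual column."""
    for entry in sorted(os.listdir(directory)):
        path = os.path.join(directory, entry)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        with open(path, newline="") as handle:
            for row in csv.reader(handle):
                yield path, row


def count_rows_per_file(directory: str) -> Counter:
    """Similar in spirit to: SELECT file_name, count(*) ... GROUP BY file_name."""
    return Counter(path for path, _ in rows_with_file_name(directory))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as partition_dir:
        # Two hypothetical data files in one partition (e.g. dt='20140210').
        samples = (("part-0.csv", [["a", 1], ["b", 2]]), ("part-1.csv", [["c", 3]]))
        for name, rows in samples:
            with open(os.path.join(partition_dir, name), "w", newline="") as out:
                csv.writer(out).writerows(rows)
        for path, n in sorted(count_rows_per_file(partition_dir).items()):
            print(os.path.basename(path), n)  # part-0.csv 2, then part-1.csv 1
```

Because the file path travels with each row, the same pairs can feed any further processing (filtering, joining, lineage checks), which is the point of exposing it as a column rather than only through a separate listing command.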



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-478) Create a Hash builtin function

2019-05-21 Thread Michael Olschimke (JIRA)


[ https://issues.apache.org/jira/browse/IMPALA-478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844990#comment-16844990 ]

Michael Olschimke commented on IMPALA-478:
--

Hi Tim, please get in touch with me at molschi...@scalefree.com

> Create a Hash builtin function
> --
>
> Key: IMPALA-478
> URL: https://issues.apache.org/jira/browse/IMPALA-478
> Project: IMPALA
>  Issue Type: Task
>Affects Versions: Impala 1.0
>Reporter: Nong Li
>Assignee: Matthew Jacobs
>Priority: Minor
> Fix For: Product Backlog
>
>
> It would be good to add a Hash() builtin.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Commented] (IMPALA-478) Create a Hash builtin function

2019-05-20 Thread Michael Olschimke (JIRA)


[ https://issues.apache.org/jira/browse/IMPALA-478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16843986#comment-16843986 ]

Michael Olschimke commented on IMPALA-478:
--

Hi Nong, Matthew,

We have implemented such functionality in an Impala UDF written in C++. Please take a
look at [https://github.com/ScalefreeCOM/impala-crypto-udf]. It is essentially a
wrapper around [https://cryptopp.com/].

The project grew out of the need to implement a GDPR-compliant data lake on a
Cloudera cluster for a customer in a regulated industry. The commonly used hash
and encryption functions needed for this purpose (MD5, SHA, AES) are already
implemented. Our goal is to make most, if not all, of the CryptoPP functionality
available in the UDF in the near future (in 2019). The current milestone focuses
on the hash and encryption functions.
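For readers without access to the UDF, the hashing part can be sketched with Python's standard `hashlib` and `hmac` modules. This is not the API of impala-crypto-udf; `hash_hex` and `pseudonymize` are hypothetical helpers. For pseudonymizing personal data, the sketch uses a keyed HMAC-SHA256 rather than a bare digest, because an unkeyed MD5 or SHA hash of a low-entropy value such as an e-mail address can be recovered by hashing candidate values and comparing.

```python
import hashlib
import hmac


def hash_hex(value: str, algorithm: str = "sha256") -> str:
    """Unkeyed hex digest (e.g. "md5", "sha1", "sha256") of a UTF-8 string."""
    return hashlib.new(algorithm, value.encode("utf-8")).hexdigest()


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Keyed HMAC-SHA256 of a normalized personal identifier.

    Normalization (trim + lower-case) is an assumption so that trivially
    different spellings of the same identifier map to the same pseudonym.
    """
    normalized = value.strip().lower()
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    print(hash_hex("abc", "md5"))  # 900150983cd24fb0d6963f7d28e17f72
    # The key is a placeholder; a real deployment would load it from a secret store.
    print(pseudonymize("Alice@Example.com", b"placeholder-key"))
```

Keeping the key outside the data lake means the pseudonyms stay stable for joins while the mapping back to the original values requires access to that key.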

Hope this helps

Mike

> Create a Hash builtin function
> --
>
> Key: IMPALA-478
> URL: https://issues.apache.org/jira/browse/IMPALA-478
> Project: IMPALA
>  Issue Type: Task
>Affects Versions: Impala 1.0
>Reporter: Nong Li
>Assignee: Matthew Jacobs
>Priority: Minor
> Fix For: Product Backlog
>
>
> It would be good to add a Hash() builtin.


