[jira] [Commented] (IMPALA-801) Add function or virtual column for file name
[ https://issues.apache.org/jira/browse/IMPALA-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533755#comment-17533755 ]

Michael Olschimke commented on IMPALA-801:
------------------------------------------

A virtual file name column is the best way to carry the technical timeline forward in a data warehouse scenario (e.g. the "LoadDateTS" in a hybrid Data Vault architecture). Most other systems (Snowflake, Apache Drill, etc.) support such a virtual column, which can then be used to extract the timestamp of the data delivery from the fully qualified file name with a regex. Would love to see this in Impala.

> Add function or virtual column for file name
> --------------------------------------------
>
>                 Key: IMPALA-801
>                 URL: https://issues.apache.org/jira/browse/IMPALA-801
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Catalog
>    Affects Versions: Impala 1.2.3
>            Reporter: Udai Kiran Potluri
>            Assignee: Zoltán Borók-Nagy
>            Priority: Minor
>              Labels: built-in-function, ramp-up
>
> Hive can list the data files in a table. For example, the following query lists all
> the data files for the table or partition:
> {noformat}
> select INPUT__FILE__NAME, count(*) from where dt='20140210'
> group by INPUT__FILE__NAME;
> {noformat}
> This has two advantages over the existing "show files" functionality:
> * The output can be used in arbitrary SQL statements.
> * You can see which record came from which file.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
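The regex extraction described in the comment on IMPALA-801 could look roughly like the following Python sketch. The path layout, the timestamp format embedded in the file name (YYYYMMDD, a literal "T", then HHMMSS), and the function name load_date_from_path are assumptions made for this illustration; none of them come from Impala or from the issue itself.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed naming convention (hypothetical): the delivery timestamp is embedded
# in the file name as YYYYMMDD'T'HHMMSS, e.g. customers_20140210T063000.parq
_TS_PATTERN = re.compile(r"(\d{8}T\d{6})")


def load_date_from_path(path: str) -> Optional[datetime]:
    """Return the delivery timestamp found in the file name, or None."""
    file_name = path.rsplit("/", 1)[-1]  # only inspect the last path segment
    match = _TS_PATTERN.search(file_name)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%dT%H%M%S")
    except ValueError:
        # The digits matched the pattern but do not form a valid date/time.
        return None


if __name__ == "__main__":
    example = "hdfs://nameservice/warehouse/sales/dt=20140210/customers_20140210T063000.parq"
    print(load_date_from_path(example))
```

Only the last path segment is searched, so partition directories such as dt=20140210 do not interfere with the match. In a query engine the same pattern would be applied to the virtual file name column with the engine's own regex function.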
[jira] [Commented] (IMPALA-478) Create a Hash builtin function
[ https://issues.apache.org/jira/browse/IMPALA-478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844990#comment-16844990 ]

Michael Olschimke commented on IMPALA-478:
------------------------------------------

Hi Tim, please get in touch with me at molschi...@scalefree.com

> Create a Hash builtin function
> ------------------------------
>
>                 Key: IMPALA-478
>                 URL: https://issues.apache.org/jira/browse/IMPALA-478
>             Project: IMPALA
>          Issue Type: Task
>    Affects Versions: Impala 1.0
>            Reporter: Nong Li
>            Assignee: Matthew Jacobs
>            Priority: Minor
>             Fix For: Product Backlog
>
> It would be good to add a Hash() builtin.
[jira] [Commented] (IMPALA-478) Create a Hash builtin function
[ https://issues.apache.org/jira/browse/IMPALA-478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16843986#comment-16843986 ]

Michael Olschimke commented on IMPALA-478:
------------------------------------------

Hi Nong, Matthew,

we have implemented such functionality in an Impala UDF written in C++. Please take a look at [https://github.com/ScalefreeCOM/impala-crypto-udf]. It is essentially a wrapper around [https://cryptopp.com/].

The project came out of the need to implement a GDPR-compliant data lake on a Cloudera cluster for a customer in a regulated industry. The commonly used hash and encryption functions needed for this purpose (MD5, SHA, AES) are already implemented.

Our goal is to implement most, if not all, of the functionality from CryptoPP in the UDF in the near future (in 2019). The current milestone focuses on the hash and encryption functions.

Hope this helps

Mike
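To illustrate what MD5 and SHA digests of a column value look like, here is a small Python sketch using the standard hashlib module. It is not the interface of the impala-crypto-udf project or of any Impala builtin; the helper name hash_hex, the sample value, and the choice of SHA-256 as the SHA variant are assumptions for this example.

```python
import hashlib


def hash_hex(value: str, algorithm: str = "sha256") -> str:
    """Return the hex digest of a UTF-8 encoded string (hypothetical helper)."""
    digest = hashlib.new(algorithm)  # raises ValueError for unknown algorithms
    digest.update(value.encode("utf-8"))
    return digest.hexdigest()


if __name__ == "__main__":
    # Print one digest per algorithm for a made-up business key.
    for algo in ("md5", "sha256"):
        print(algo, hash_hex("customer-4711", algo))
```

The same input always produces the same digest, which is the property that makes such functions usable as deterministic keys or pseudonyms.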