[jira] [Comment Edited] (HIVE-23176) Remove SELECT REGEX Column Feature

2020-04-22 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083236#comment-17083236
 ] 

David Mollitor edited comment on HIVE-23176 at 4/22/20, 1:54 PM:
-

[~kgyrtkirk] Thanks for the feedback.

This feature is not standard.

 I discussed the motivation here:

[http://mail-archives.apache.org/mod_mbox/hive-dev/202004.mbox/%3CCAPCi2CmUSVUPkMRgxUQBs6QFosj4Yjr7w51n0_teAqBcZvZHSw%40mail.gmail.com%3E]

There are two primary concerns:
 * If Hive is going to support UTF-8 in the same way other major vendors do, 
then there are almost no restrictions on which characters can appear in an 
object identifier, so it is not possible to simply "detect" a regex, and it is 
therefore ambiguous whether a user meant a regex or a complex table name.
 * This feature accidentally added a bunch of weird edge cases where object 
identifier parsing takes different code paths
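
A minimal Java sketch of the first concern, not Hive's actual code: the class, 
the helper names, and the sample column list are all hypothetical. It shows the 
same back-ticked text selecting different columns depending on whether it is 
read as a regex or as a literal name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class RegexColumnAmbiguity {

    // Hypothetical helper: read the identifier as a regex and keep every
    // column whose whole name matches it.
    static List<String> matchAsRegex(String identifier, List<String> columns) {
        Pattern p = Pattern.compile(identifier, Pattern.CASE_INSENSITIVE);
        return columns.stream()
                .filter(c -> p.matcher(c).matches())
                .collect(Collectors.toList());
    }

    // Hypothetical helper: read the identifier as a literal column name.
    static List<String> matchAsLiteral(String identifier, List<String> columns) {
        return columns.stream()
                .filter(c -> c.equalsIgnoreCase(identifier))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Assumed table: once arbitrary characters are allowed in
        // back-ticked names, "a.c" and "a+c" are legal column names.
        List<String> columns = Arrays.asList("a.c", "abc", "a+c");
        String identifier = "a.c"; // as in SELECT `a.c` FROM t

        System.out.println(matchAsRegex(identifier, columns));   // [a.c, abc, a+c]
        System.out.println(matchAsLiteral(identifier, columns)); // [a.c]
    }
}
```

Nothing in the identifier text itself says which of the two readings the user 
intended.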

This feature could be interesting, but since it is not part of the SQL 
standard, it is a Hive-only shortcut that can cause interoperability problems, 
and it is not currently implemented well. It should not be reflected in the 
actual grammar of the SQL parser. To implement such a feature, it would make 
sense that it be:

(EDIT: based on discussions)
 * Extends the standard SQL grammar instead of overloading the existing


was (Author: belugabehr):
[~kgyrtkirk] Thanks for the feedback.

This feature is not standard.

 I discussed the motivation here:

[http://mail-archives.apache.org/mod_mbox/hive-dev/202004.mbox/%3CCAPCi2CmUSVUPkMRgxUQBs6QFosj4Yjr7w51n0_teAqBcZvZHSw%40mail.gmail.com%3E]

There are two primary concerns:
 * If Hive is going to support UTF-8 in the same way other major vendors do, 
then there are almost no restrictions on which characters can appear in an 
object identifier, so it is not possible to simply "detect" a regex, and it is 
therefore ambiguous whether a user meant a regex or a complex table name.
 * This feature accidentally added a bunch of weird edge cases where object 
identifier parsing takes different code paths

This feature could be interesting, but since it is not part of the SQL 
standard, it is a Hive-only shortcut that can cause interoperability problems, 
and it is not currently implemented well. It should not be reflected in the 
actual grammar of the SQL parser. To implement such a feature, it would make 
sense that it be:
 * Not part of the grammar
 * Configurable (enabled/disabled), and applied in the Java parser code when it 
interprets the literal object identifiers supplied in the SQL statement
 * Applicable only to back-ticked object identifiers that are ASCII-only

> Remove SELECT REGEX Column Feature
> --
>
> Key: HIVE-23176
> URL: https://issues.apache.org/jira/browse/HIVE-23176
> Project: Hive
> Issue Type: Improvement
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Major
> Labels: backwards-incompatible
> Attachments: HIVE-23176.1.patch, HIVE-23176.2.patch, 
> HIVE-23176.3.patch, HIVE-23176.4.patch, HIVE-23176.4.patch, HIVE-23176.4.patch
>
>
> Remove the Hive feature: REGEX Column.
>  
> Hive has this interesting feature for doing REGEX to SELECT multiple columns. 
>  This needs to go.  It is not SQL standard and as currently implemented, it 
> is impossible to determine if a column identifier is a REGEX or the actual 
> name of the column.  If a column name is enclosed in back ticks then any 
> UTF-8 character is a valid table name.
>  
> [https://dev.mysql.com/doc/refman/8.0/en/identifiers.html]
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-23176) Remove SELECT REGEX Column Feature

2020-04-14 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083236#comment-17083236
 ] 

David Mollitor edited comment on HIVE-23176 at 4/14/20, 1:42 PM:
-

[~kgyrtkirk] Thanks for the feedback.

This feature is not standard.

 I discussed the motivation here:

[http://mail-archives.apache.org/mod_mbox/hive-dev/202004.mbox/%3CCAPCi2CmUSVUPkMRgxUQBs6QFosj4Yjr7w51n0_teAqBcZvZHSw%40mail.gmail.com%3E]

There are two primary concerns:
 * If Hive is going to support UTF-8 in the same way other major vendors do, 
then there are almost no restrictions on which characters can appear in an 
object identifier, so it is not possible to simply "detect" a regex, and it is 
therefore ambiguous whether a user meant a regex or a complex table name.
 * This feature accidentally added a bunch of weird edge cases where object 
identifier parsing takes different code paths

This feature could be interesting, but since it is not part of the SQL 
standard, it is a Hive-only shortcut that can cause interoperability problems, 
and it is not currently implemented well. It should not be reflected in the 
actual grammar of the SQL parser. To implement such a feature, it would make 
sense that it be:
 * Not part of the grammar
 * Configurable (enabled/disabled), and applied in the Java parser code when it 
interprets the literal object identifiers supplied in the SQL statement
 * Applicable only to back-ticked object identifiers that are ASCII-only
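
The three conditions above could be sketched in Java roughly as follows. This 
is only an illustration under stated assumptions: the class, the method names, 
and the boolean flag are hypothetical and do not correspond to existing Hive 
code or to an existing configuration property.

```java
public class RegexIdentifierPolicy {

    // True when every character of the identifier is in the 7-bit ASCII range.
    static boolean isAsciiOnly(String s) {
        return s.chars().allMatch(c -> c < 128);
    }

    // Hypothetical post-parse check: the grammar hands over a plain
    // identifier, and this code alone decides whether it may be read as a
    // regex. featureEnabled stands in for an assumed on/off setting.
    static boolean mayTreatAsRegex(String identifier,
                                   boolean wasBackTicked,
                                   boolean featureEnabled) {
        return featureEnabled && wasBackTicked && isAsciiOnly(identifier);
    }

    public static void main(String[] args) {
        // Enabled, back-ticked, ASCII-only: eligible for regex handling.
        System.out.println(mayTreatAsRegex("col.*", true, true));        // true
        // Contains a non-ASCII character: always a literal name.
        System.out.println(mayTreatAsRegex("col_\u00e9.*", true, true)); // false
        // Feature switched off: always a literal name.
        System.out.println(mayTreatAsRegex("col.*", true, false));       // false
    }
}
```

Keeping the decision in one method like this means the grammar never needs a 
separate rule for regex columns.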


was (Author: belugabehr):
[~kgyrtkirk] Thanks for the feedback.

This feature is not standard.

 I discussed the motivation here:

[http://mail-archives.apache.org/mod_mbox/hive-dev/202004.mbox/%3CCAPCi2CmUSVUPkMRgxUQBs6QFosj4Yjr7w51n0_teAqBcZvZHSw%40mail.gmail.com%3E]

There are two primary concerns:
* If Hive is going to support UTF-8 in the same way other major vendors do, 
then there are almost no restrictions on which characters can appear in an 
object identifier, so it is not possible to simply "detect" a regex, and it is 
therefore ambiguous whether a user meant a regex or a complex table name.
* This feature accidentally added a bunch of weird edge cases where object 
identifier parsing takes different code paths

This feature could be interesting, but since it is not part of the SQL 
standard, it is a Hive-only shortcut that can cause interoperability problems, 
and it is not currently implemented well.  It should not be reflected in the 
actual grammar of the SQL parser.  To implement such a feature, it would 
make sense that it be:

* Not part of the grammar
* Configurable (enabled/disabled)
* Applicable only to back-ticked object identifiers that are ASCII-only

> Remove SELECT REGEX Column Feature
> --
>
> Key: HIVE-23176
> URL: https://issues.apache.org/jira/browse/HIVE-23176
> Project: Hive
> Issue Type: Improvement
> Reporter: David Mollitor
> Assignee: David Mollitor
> Priority: Major
> Attachments: HIVE-23176.1.patch, HIVE-23176.2.patch, 
> HIVE-23176.3.patch, HIVE-23176.4.patch, HIVE-23176.4.patch, HIVE-23176.4.patch
>
>
> Remove the Hive feature: REGEX Column.
>  
> Hive has this interesting feature for doing REGEX to SELECT multiple columns. 
>  This needs to go.  It is not SQL standard and as currently implemented, it 
> is impossible to determine if a column identifier is a REGEX or the actual 
> name of the column.  If a column name is enclosed in back ticks then any 
> UTF-8 character is a valid table name.
>  
> [https://dev.mysql.com/doc/refman/8.0/en/identifiers.html]
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select]


