[jira] [Commented] (HIVE-3745) Hive does improper "=" based string comparisons for strings with trailing whitespaces
[ https://issues.apache.org/jira/browse/HIVE-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793658#comment-13793658 ] N Campbell commented on HIVE-3745: -- The ISO-SQL standard is very clear as to what a vendor may choose to do re blank padding semantics. Similarly, how operations such as min, max, distinct etc operate on a variable length character type. Persons simply comparing to another RDBMS need to compare what ISO states, where it allows 'vendor implementation' and to see what a given vendor claims. For example, if you were to use Postgres and other vendors derived from it. You will find various differences with respect to length( char (n) ) vs varchar(n) group by min distinct/union To some persons trailing spaces are of no interest and they may assume that one general string type will 'ignore' spaces. Others may state to their business application that trailing spaces are significant. That is distinct from what a given standard states or perhaps what a vendor chooses to implement irrespective of any given standard. It would help either way if the Hive QL documentation could be improved to state intent of a given construct/feature. > Hive does improper "=" based string comparisons for strings with trailing > whitespaces > - > > Key: HIVE-3745 > URL: https://issues.apache.org/jira/browse/HIVE-3745 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 0.9.0 >Reporter: Harsh J >Assignee: Kevin Wilfong > > Compared to other systems such as DB2, MySQL, etc., which disregard trailing > whitespaces in a string used when comparing two strings with the "{{=}}" > relational operator, Hive does not do this. > For example, note the following line from the MySQL manual: > http://dev.mysql.com/doc/refman/5.1/en/char.html > {quote} > All MySQL collations are of type PADSPACE. This means that all CHAR and > VARCHAR values in MySQL are compared without regard to any trailing spaces. > {quote} > Hive still is whitespace sensitive and regards trailing spaces of a string as > worthy elements when comparing. Ideally {{LIKE}} should consider this > strongly, but {{=}} should not. > Is there a specific reason behind this difference of implementation in Hive's > SQL? -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HIVE-3745) Hive does improper "=" based string comparisons for strings with trailing whitespaces
[ https://issues.apache.org/jira/browse/HIVE-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13766654#comment-13766654 ] Eugene Koifman commented on HIVE-3745: -- Making this configurable may be difficult: suppose you have a join with ON clause comparing 2 strings and the join is a hash join. This setting would affect how hash table is built/probed (depending on how one chooses to implement this) and any other operator that relies on hashing. Predicate pushdown and index building/usage could be affected. I've see DBs that ignore trailing white space and some that do not. I've also seen some that are case sensitive and some that are not. (Unfortunately I don't have a complete list, SQL server is insensitive, for example) > Hive does improper "=" based string comparisons for strings with trailing > whitespaces > - > > Key: HIVE-3745 > URL: https://issues.apache.org/jira/browse/HIVE-3745 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 0.9.0 >Reporter: Harsh J >Assignee: Kevin Wilfong > > Compared to other systems such as DB2, MySQL, etc., which disregard trailing > whitespaces in a string used when comparing two strings with the "{{=}}" > relational operator, Hive does not do this. > For example, note the following line from the MySQL manual: > http://dev.mysql.com/doc/refman/5.1/en/char.html > {quote} > All MySQL collations are of type PADSPACE. This means that all CHAR and > VARCHAR values in MySQL are compared without regard to any trailing spaces. > {quote} > Hive still is whitespace sensitive and regards trailing spaces of a string as > worthy elements when comparing. Ideally {{LIKE}} should consider this > strongly, but {{=}} should not. > Is there a specific reason behind this difference of implementation in Hive's > SQL? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3745) Hive does improper "=" based string comparisons for strings with trailing whitespaces
[ https://issues.apache.org/jira/browse/HIVE-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13766629#comment-13766629 ] Dave Winters commented on HIVE-3745: This issue is causing real-world problems. We hit this with our product in building a SQL-H connector. Facebook has patched this internally. Why is it not out in the community or a Hive conf parm that is settable? > Hive does improper "=" based string comparisons for strings with trailing > whitespaces > - > > Key: HIVE-3745 > URL: https://issues.apache.org/jira/browse/HIVE-3745 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 0.9.0 >Reporter: Harsh J >Assignee: Gang Tim Liu > > Compared to other systems such as DB2, MySQL, etc., which disregard trailing > whitespaces in a string used when comparing two strings with the "{{=}}" > relational operator, Hive does not do this. > For example, note the following line from the MySQL manual: > http://dev.mysql.com/doc/refman/5.1/en/char.html > {quote} > All MySQL collations are of type PADSPACE. This means that all CHAR and > VARCHAR values in MySQL are compared without regard to any trailing spaces. > {quote} > Hive still is whitespace sensitive and regards trailing spaces of a string as > worthy elements when comparing. Ideally {{LIKE}} should consider this > strongly, but {{=}} should not. > Is there a specific reason behind this difference of implementation in Hive's > SQL? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3745) Hive does improper "=" based string comparisons for strings with trailing whitespaces
[ https://issues.apache.org/jira/browse/HIVE-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706519#comment-13706519 ] Jason Dere commented on HIVE-3745: -- Would it make more sense to support the SQL comparison semantics using new char data types, so that we don't break existing behavior for strings? I've created HIVE-4844. > Hive does improper "=" based string comparisons for strings with trailing > whitespaces > - > > Key: HIVE-3745 > URL: https://issues.apache.org/jira/browse/HIVE-3745 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 0.9.0 >Reporter: Harsh J >Assignee: Gang Tim Liu > > Compared to other systems such as DB2, MySQL, etc., which disregard trailing > whitespaces in a string used when comparing two strings with the "{{=}}" > relational operator, Hive does not do this. > For example, note the following line from the MySQL manual: > http://dev.mysql.com/doc/refman/5.1/en/char.html > {quote} > All MySQL collations are of type PADSPACE. This means that all CHAR and > VARCHAR values in MySQL are compared without regard to any trailing spaces. > {quote} > Hive still is whitespace sensitive and regards trailing spaces of a string as > worthy elements when comparing. Ideally {{LIKE}} should consider this > strongly, but {{=}} should not. > Is there a specific reason behind this difference of implementation in Hive's > SQL? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3745) Hive does improper "=" based string comparisons for strings with trailing whitespaces
[ https://issues.apache.org/jira/browse/HIVE-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510258#comment-13510258 ] Harsh J commented on HIVE-3745: --- My colleague, [~esteban], pointed out that the SQL92 standard (http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt, 8.2 ) qualifies this thought and that Hive does have a bug. > Hive does improper "=" based string comparisons for strings with trailing > whitespaces > - > > Key: HIVE-3745 > URL: https://issues.apache.org/jira/browse/HIVE-3745 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 0.9.0 >Reporter: Harsh J > > Compared to other systems such as DB2, MySQL, etc., which disregard trailing > whitespaces in a string used when comparing two strings with the "{{=}}" > relational operator, Hive does not do this. > For example, note the following line from the MySQL manual: > http://dev.mysql.com/doc/refman/5.1/en/char.html > {quote} > All MySQL collations are of type PADSPACE. This means that all CHAR and > VARCHAR values in MySQL are compared without regard to any trailing spaces. > {quote} > Hive still is whitespace sensitive and regards trailing spaces of a string as > worthy elements when comparing. Ideally {{LIKE}} should consider this > strongly, but {{=}} should not. > Is there a specific reason behind this difference of implementation in Hive's > SQL? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira