[ https://issues.apache.org/jira/browse/SPARK-24469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16503586#comment-16503586 ]
Eric Maynard commented on SPARK-24469:
--------------------------------------

bq. SELECT UPPER(text)....GROUP BY UPPER(text)
bq. introduces invalid values into the output set

Can you elaborate on this?

> Support collations in Spark SQL
> -------------------------------
>
>                 Key: SPARK-24469
>                 URL: https://issues.apache.org/jira/browse/SPARK-24469
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Alexander Shkapsky
>            Priority: Major
>
> One of our use cases is to support case-insensitive comparison in operations,
> including aggregation and text comparison filters. Another use case is to
> sort via a collator. Support for collations throughout the query processor
> appears to be the proper way to support these needs.
> Language-based workarounds (for the aggregation case) are insufficient:
> # SELECT UPPER(text)....GROUP BY UPPER(text)
> introduces invalid values into the output set
> # SELECT MIN(text)...GROUP BY UPPER(text)
> results in poor performance in our case, in part due to use of sort-based
> aggregate
> Examples of collation support in RDBMS:
> * [PostgreSQL|https://www.postgresql.org/docs/10/static/collation.html]
> * [MySQL|https://dev.mysql.com/doc/refman/8.0/en/charset.html]
> * [Oracle|https://docs.oracle.com/en/database/oracle/oracle-database/18/nlspg/linguistic-sorting-and-matching.html]
> * [SQL Server|https://docs.microsoft.com/en-us/sql/relational-databases/collations/collation-and-unicode-support?view=sql-server-2017]
> * [DB2|https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5.0/com.ibm.db2.luw.admin.nls.doc/com.ibm.db2.luw.admin.nls.doc-gentopic2.html]
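
For readers following the thread, the two aggregation workarounds quoted from the issue description can be mimicked in plain Python (no Spark involved). This is only a sketch under assumptions: the sample `rows`, the helper names, and the added count column are hypothetical, and reading "invalid values" as "values that never occur in the input column" is one possible interpretation, not something the reporter has confirmed.

```python
from collections import defaultdict

# Hypothetical sample column; not taken from the issue.
rows = ["Spark", "spark", "SQL", "sql", "Sql"]


def group_by_upper(values):
    """Mimics SELECT UPPER(text), COUNT(*) ... GROUP BY UPPER(text)."""
    counts = defaultdict(int)
    for v in values:
        counts[v.upper()] += 1  # the output key is the upper-cased form
    return dict(counts)


def group_by_upper_keep_min(values):
    """Mimics SELECT MIN(text), COUNT(*) ... GROUP BY UPPER(text)."""
    groups = defaultdict(list)
    for v in values:
        groups[v.upper()].append(v)  # group case-insensitively
    # Emit one original spelling per group (the smallest by code point).
    return {min(g): len(g) for g in groups.values()}


w1 = group_by_upper(rows)
w2 = group_by_upper_keep_min(rows)

# Keys produced by workaround 1 that never appeared in the input.
not_in_input = [k for k in w1 if k not in rows]
```

With this sample data, the first workaround emits the key `SPARK`, which is not present in `rows`, while the second only emits spellings that occur in the input. The sketch says nothing about the performance concern raised for the second workaround.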