[jira] [Commented] (SPARK-17647) SQL LIKE does not handle backslashes correctly
[ https://issues.apache.org/jira/browse/SPARK-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264452#comment-17264452 ]

Noah Kawasaki commented on SPARK-17647:
---

I can also confirm that this issue is not fully resolved. As [~swiegleb] has shown, escape characters are not fully supported. I have tested Spark versions 2.1, 2.2, 2.3, 2.4, and 3.0, and they all exhibit the issue:
{code:java}
# These do not return the expected backslash
SET spark.sql.parser.escapedStringLiterals=false;
SELECT '\\';
> \ (should return \\)
SELECT 'hi\hi';
> hihi (should return hi\hi)
# These are correctly escaped
SELECT '\"';
> "
SELECT '\'';
> '
{code}
If I switch the setting:
{code:java}
# These now work
SET spark.sql.parser.escapedStringLiterals=true;
SELECT '\\';
> \\
SELECT 'hi\hi';
> hi\hi
# These are now not correctly escaped
SELECT '\"';
> \" (should return ")
SELECT '\'';
> \' (should return ')
{code}
So basically we have to choose:
SET spark.sql.parser.escapedStringLiterals=false; if we want quotes and other special characters correctly escaped, but not backslashes
SET spark.sql.parser.escapedStringLiterals=true; if we want backslashes preserved, but not other special characters escaped

> SQL LIKE does not handle backslashes correctly
> --
>
> Key: SPARK-17647
> URL: https://issues.apache.org/jira/browse/SPARK-17647
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Xiangrui Meng
> Assignee: Xiangrui Meng
> Priority: Major
> Labels: correctness
> Fix For: 2.1.1, 2.2.0
>
>
> Try the following in SQL shell:
> {code}
> select '' like '%\\%';
> {code}
> It returned false, which is wrong.
> cc: [~yhuai] [~joshrosen]
> A false-negative considered previously:
> {code}
> select '' rlike '.*.*';
> {code}
> It returned true, which is correct if we assume that the pattern is treated
> as a Java string but not raw string.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
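The two behaviors Noah lists can be reproduced with a small Java model of string-literal unescaping. This is a hypothetical sketch: `LiteralModes` and `unescape` are illustrative names, not Spark's parser code. It assumes that with `escapedStringLiterals=false` a backslash escapes the following character, and that `\%` and `\_` keep their backslash so LIKE can still see them. With `true`, the literal body is kept verbatim.

```java
// Simplified, illustrative model of SQL string-literal unescaping under the two
// settings of spark.sql.parser.escapedStringLiterals. NOT Spark's parser code;
// it only reproduces the outcomes reported in the comment above.
final class LiteralModes {

    /** Returns the value of a literal body (the text between the quotes). */
    static String unescape(String body, boolean escapedStringLiterals) {
        if (escapedStringLiterals) {
            // "true": the body is taken verbatim, backslashes included.
            return body;
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c != '\\' || i + 1 == body.length()) {
                out.append(c); // ordinary character (or a lone trailing backslash)
                continue;
            }
            char next = body.charAt(++i);
            switch (next) {
                case 'n': out.append('\n'); break;
                case 't': out.append('\t'); break;
                // Assumption: \% and \_ keep their backslash for LIKE.
                case '%': case '_': out.append('\\').append(next); break;
                // \\, \', \" and any other escaped character yield the character itself.
                default: out.append(next);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Literal bodies from Noah's examples: \\   hi\hi   \"   \'
        String[] bodies = {"\\\\", "hi\\hi", "\\\"", "\\'"};
        for (String b : bodies) {
            System.out.printf("%-8s false -> %-6s true -> %s%n",
                    b, unescape(b, false), unescape(b, true));
        }
    }
}
```

Running `main` prints one row per literal body. No single setting gives the expected result for all four bodies.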
[jira] [Commented] (SPARK-17647) SQL LIKE does not handle backslashes correctly
[ https://issues.apache.org/jira/browse/SPARK-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877599#comment-16877599 ]

Sascha Wiegleb commented on SPARK-17647:

I have tested this behavior with Spark 2.4.2, and it seems that the bug is still there. With
{code:java}
spark.sql.parser.escapedStringLiterals=true
{code}
backslashes work, but escaping " and ' fails with:
{code:java}
the escape character is not allowed to precede '\"'
{code}
I have tested the behavior with the following special characters and their escaping:
{code:java}
_ % \ ' "
{code}
So neither mode covers the full spectrum of escaped special characters.
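The error Sascha quotes appears to be raised when the LIKE pattern is translated into a regular expression, after the literal has already been parsed. The sketch below shows such a translation in Java. It assumes the rules visible in this thread: `_` becomes `.`, `%` becomes `.*`, and a backslash may only escape `_`, `%` or another backslash. `LikeToRegex`, `toRegex` and `like` are hypothetical names, and this is not Spark's source.

```java
import java.util.regex.Pattern;

// Illustrative LIKE-to-regex translation following the rules described in this
// thread. Hypothetical names; not copied from Spark.
final class LikeToRegex {

    static String toRegex(String pattern) {
        // (?s) turns on DOTALL so that '.' also matches line terminators.
        StringBuilder out = new StringBuilder("(?s)");
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\\') {
                if (i + 1 == pattern.length()) {
                    throw new IllegalArgumentException(
                            "it is not allowed to end with the escape character");
                }
                char next = pattern.charAt(++i);
                if (next == '_' || next == '%' || next == '\\') {
                    // Escaped wildcard or backslash: match it literally.
                    out.append(Pattern.quote(String.valueOf(next)));
                } else {
                    // Wording borrowed from the error message quoted above.
                    throw new IllegalArgumentException(
                            "the escape character is not allowed to precede '" + next + "'");
                }
            } else if (c == '_') {
                out.append('.');
            } else if (c == '%') {
                out.append(".*");
            } else {
                out.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return out.toString();
    }

    static boolean like(String input, String pattern) {
        return Pattern.matches(toRegex(pattern), input);
    }

    public static void main(String[] args) {
        System.out.println(toRegex("%\\%"));   // pattern text %\%  -> (?s).*\Q%\E
        System.out.println(toRegex("%\\\\%")); // pattern text %\\% -> (?s).*\Q\\E.*
        try {
            toRegex("\\\"");                   // pattern text \"
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this model, a literal such as '\"' kept verbatim by escapedStringLiterals=true reaches the translator with its backslash intact and trips the check. `toRegex("%\\%")` yields the same `(?s).*\Q%\E` regex Takeshi reports for master.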
[jira] [Commented] (SPARK-17647) SQL LIKE does not handle backslashes correctly
[ https://issues.apache.org/jira/browse/SPARK-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16298456#comment-16298456 ]

Wenchen Fan commented on SPARK-17647:
-

{{spark.sql("select '' like '%\\%'").show}} is actually the SQL statement {{select '\\' like '%\%'}} after Java string escaping.
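Wenchen's point concerns the first escaping layer, the host-language string literal. The Java sketch below counts the backslashes that actually reach the SQL parser. The class name is hypothetical, and `'a'` is a placeholder left-hand value.

```java
// Illustrative only: shows how many backslashes survive the Java/Scala
// string-literal layer before the SQL parser ever sees the text.
final class EscapingLayers {

    // Counts backslash characters actually present in the string at runtime.
    static long backslashes(String s) {
        return s.chars().filter(ch -> ch == '\\').count();
    }

    public static void main(String[] args) {
        // Written with two backslashes in source...
        String sql = "select 'a' like '%\\%'";
        System.out.println(sql);               // select 'a' like '%\%'
        System.out.println(backslashes(sql));  // 1
        // ...so handing the parser the text '%\\%' needs four in source.
        String sql2 = "select 'a' like '%\\\\%'";
        System.out.println(sql2);              // select 'a' like '%\\%'
        System.out.println(backslashes(sql2)); // 2
    }
}
```

The SQL literal parser and the LIKE translation then each consume another level of escaping, so every layer halves the visible backslashes.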
[jira] [Commented] (SPARK-17647) SQL LIKE does not handle backslashes correctly
[ https://issues.apache.org/jira/browse/SPARK-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16296234#comment-16296234 ]

Takeshi Yamamuro commented on SPARK-17647:
--

Perhaps it would be okay to set spark.sql.parser.escapedStringLiterals to true?
[jira] [Commented] (SPARK-17647) SQL LIKE does not handle backslashes correctly
[ https://issues.apache.org/jira/browse/SPARK-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295992#comment-16295992 ]

Takeshi Yamamuro commented on SPARK-17647:
--

It seems master still handles '%\\%' as `(?s).*\Q%\E` instead of `(?s).*\Q\\E.*`.
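The two regexes Takeshi quotes can be compared directly with `java.util.regex.Pattern`. This sketch tests each against an input consisting of a single backslash. The class name and the input value are illustrative assumptions.

```java
import java.util.regex.Pattern;

// Compares the regex master produces with the one Takeshi expects,
// using a one-character input containing only a backslash.
final class QuotedRegexCheck {
    public static void main(String[] args) {
        String input = "\\";                  // a single backslash character
        String actual = "(?s).*\\Q%\\E";      // regex text (?s).*\Q%\E
        String expected = "(?s).*\\Q\\\\E.*"; // regex text (?s).*\Q\\E.*
        // false: the actual regex requires the input to end with '%'
        System.out.println(Pattern.matches(actual, input));
        // true: the expected regex matches any input containing a backslash
        System.out.println(Pattern.matches(expected, input));
        // Pattern.quote produces the \Q...\E form seen in both regexes
        System.out.println(Pattern.quote("\\").equals("\\Q\\\\E")); // true
    }
}
```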
[jira] [Commented] (SPARK-17647) SQL LIKE does not handle backslashes correctly
[ https://issues.apache.org/jira/browse/SPARK-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295976#comment-16295976 ]

Takeshi Yamamuro commented on SPARK-17647:
--

I'm looking into the code and will make a follow-up PR.
[jira] [Commented] (SPARK-17647) SQL LIKE does not handle backslashes correctly
[ https://issues.apache.org/jira/browse/SPARK-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295261#comment-16295261 ]

Dong Jiang commented on SPARK-17647:

Are we sure this issue is resolved? I tested the following in spark-shell 2.2.0:
{code}
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_25)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.sql("select '' like '%\\%'").show
+----------+
|\ LIKE %\%|
+----------+
|     false|
+----------+
{code}
Same in spark-sql:
{code}
spark-sql> select '' like '%\\%';
false
Time taken: 2.296 seconds, Fetched 1 row(s)
{code}
[jira] [Commented] (SPARK-17647) SQL LIKE does not handle backslashes correctly
[ https://issues.apache.org/jira/browse/SPARK-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15971675#comment-15971675 ]

Apache Spark commented on SPARK-17647:
--

User 'felixcheung' has created a pull request for this issue:
https://github.com/apache/spark/pull/17663
[jira] [Commented] (SPARK-17647) SQL LIKE does not handle backslashes correctly
[ https://issues.apache.org/jira/browse/SPARK-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15736107#comment-15736107 ]

Jakob Odersky commented on SPARK-17647:
---

I rebased the PR and resolved the conflict. However, the incompatibility with the SQL ANTLR parser remains. I discuss it in my last [two comments|https://github.com/apache/spark/pull/15398#issuecomment-255917940] and propose a few solutions. Any feedback is welcome!
[jira] [Commented] (SPARK-17647) SQL LIKE does not handle backslashes correctly
[ https://issues.apache.org/jira/browse/SPARK-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15734678#comment-15734678 ]

Xiangrui Meng commented on SPARK-17647:
---

[~r...@databricks.com] [~yhuai] I think this is a critical correctness bug, which should be fixed in 2.1. Thoughts?
[jira] [Commented] (SPARK-17647) SQL LIKE does not handle backslashes correctly
[ https://issues.apache.org/jira/browse/SPARK-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556767#comment-15556767 ]

Apache Spark commented on SPARK-17647:
--

User 'jodersky' has created a pull request for this issue:
https://github.com/apache/spark/pull/15398
[jira] [Commented] (SPARK-17647) SQL LIKE does not handle backslashes correctly
[ https://issues.apache.org/jira/browse/SPARK-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553517#comment-15553517 ]

Jakob Odersky commented on SPARK-17647:
---

Xiao pointed me to this issue; I can take a look at it.
[jira] [Commented] (SPARK-17647) SQL LIKE does not handle backslashes correctly
[ https://issues.apache.org/jira/browse/SPARK-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553424#comment-15553424 ]

Xiao Li commented on SPARK-17647:
-

Sure, let me check it with my teammates. Thanks!
[jira] [Commented] (SPARK-17647) SQL LIKE does not handle backslashes correctly
[ https://issues.apache.org/jira/browse/SPARK-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553357#comment-15553357 ]

Yin Huai commented on SPARK-17647:
--

[~smilegator] Does anyone on your side have time to take a look at it?
[jira] [Commented] (SPARK-17647) SQL LIKE does not handle backslashes correctly
[ https://issues.apache.org/jira/browse/SPARK-17647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523623#comment-15523623 ]

Xiangrui Meng commented on SPARK-17647:
---

Thanks [~joshrosen]! I updated the JIRA description. The LIKE escaping behaviors in MySQL and PostgreSQL are documented here:
* MySQL: http://dev.mysql.com/doc/refman/5.7/en/string-comparison-functions.html#operator_like
* PostgreSQL: https://www.postgresql.org/docs/8.3/static/functions-matching.html

In particular, MySQL:
{noformat}
Exception: At the end of the pattern string, backslash can be specified as “\\”. At the end of the string, backslash stands for itself because there is nothing following to escape. Suppose that a table contains the following values:
{noformat}
That explains why MySQL returns true for both `\\` like `` and `\\` like `\\`.
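The MySQL exception quoted above can be expressed as a variant of a LIKE-to-regex translation in which a backslash at the very end of the pattern matches a literal backslash. This is a hypothetical Java sketch (`MySqlStyleLike` is an illustrative name). Treating every other escaped character as a literal is an assumption. It is not stated in the thread and not taken from MySQL's implementation.

```java
import java.util.regex.Pattern;

// Sketch of a LIKE translation following the quoted MySQL exception:
// a backslash at the very end of the pattern stands for itself.
// Assumption: any other escaped character is matched literally.
final class MySqlStyleLike {

    static String toRegex(String pattern) {
        StringBuilder out = new StringBuilder("(?s)"); // '.' also matches newlines
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\\') {
                // Trailing backslash: nothing follows, so it matches a literal '\'.
                char literal = (i + 1 < pattern.length()) ? pattern.charAt(++i) : '\\';
                out.append(Pattern.quote(String.valueOf(literal)));
            } else if (c == '_') {
                out.append('.');
            } else if (c == '%') {
                out.append(".*");
            } else {
                out.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return out.toString();
    }

    static boolean like(String input, String pattern) {
        return Pattern.matches(toRegex(pattern), input);
    }

    public static void main(String[] args) {
        String backslash = "\\";
        System.out.println(like(backslash, "\\"));   // pattern \  (trailing): true
        System.out.println(like(backslash, "\\\\")); // pattern \\ (escaped):  true
        System.out.println(like("a\\", "%\\"));      // pattern %\ : true
        System.out.println(like("a%", "%\\"));       // pattern %\ : false
    }
}
```

Under this rule, a one-backslash pattern and an escaped-backslash pattern both match a single backslash. Spark's stricter check instead rejects a pattern that ends with the escape character.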