mvzink commented on code in PR #1735:
URL:
https://github.com/apache/datafusion-sqlparser-rs/pull/1735#discussion_r1966627492
##########
tests/sqlparser_mysql.rs:
##########
@@ -2530,6 +2530,16 @@ fn parse_rlike_and_regexp() {
}
}
+#[test]
+fn parse_like_with_escape() {
+ mysql().verified_only_select(r#"SELECT * FROM customer WHERE name LIKE
'a\%c'"#);
Review Comment:
Yeah, it took me a bit to realize what was going on when I encountered this.
I'll try to explain it better in comments and commit messages.
These tests demonstrate that the backslash is not being stripped. Here is
this same test without the fix:
```
thread 'parse_like_with_escape' panicked at src/test_utils.rs:164:13:
assertion `left == right` failed
left: "SELECT * FROM customer WHERE name LIKE 'a\\%c'"
right: "SELECT * FROM customer WHERE name LIKE 'a%c'"
```
The final 2 asserts on the big "Testing:" string in the
`parse_escaped_string_with_unescape` test in `tests/sqlparser_common.rs` may
make it bit clearer than this mysql-only test, because you can see the 2
different escaping behaviors side by side.
In the current behavior (which is correct for all escaping dialects except
MySQL afaik), `'\%'` and `'\_'` are turned into`'%'` and `'_'`. This is the
same as the 'default' escape rule: if there's no escaping rule for a given
character, the backslash is simply stripped and the escaped character left in
place. This 'default' is visible in `parse_escaped_string_with_unescape` with
`'\h \ '` becoming `'h '`: there is no specific escape rule for `h` or ` `.
The specific (and weird) escape rule that MySQL has for `%` and `_` is that
they are left alone but the backslash is *not* stripped. The reason for this is
so that they can be escaped in `LIKE` patterns without double-escaping them
like you have to do in, e.g. Snowflake (as described in the Snowflake docs and
the comment on `Dialect::ignores_like_wildcard_escapes`). But MySQL chose to
make it a special case not just for LIKE patterns, but for all string literals.
MySQL:
```
mysql> select '\_', hex('\\'), hex('_'), hex('\_'), hex('h'), hex('\h');
+----+-----------+----------+-----------+----------+-----------+
| \_ | hex('\\') | hex('_') | hex('\_') | hex('h') | hex('\h') |
+----+-----------+----------+-----------+----------+-----------+
| \_ | 5C | 5F | 5C5F | 68 | 68 |
+----+-----------+----------+-----------+----------+-----------+
1 row in set (0.00 sec)
mysql> select '_' LIKE '\%', '_' LIKE '\_', '\_' LIKE '\_';
+---------------+---------------+----------------+
| '_' LIKE '\%' | '_' LIKE '\_' | '\_' LIKE '\_' |
+---------------+---------------+----------------+
| 0 | 1 | 0 |
+---------------+---------------+----------------+
1 row in set (0.00 sec)
```
(Note in the last LIKE expression, there is nothing to match the unstripped
backslash character in the scrutinee, so it doesn't match.)
As far as I can tell from the Snowflake docs (BigQuery and Clickhouse are
less clear), to express the equivalent LIKE expressions you would have to
double-escape the backslashes, because this special case for `_` and `%` don't
exist:
```
select '_' LIKE '\\%', '_' LIKE '\\_', '\\_' LIKE '\\_';
```
(And, of course, to make things more fun, double-escaping also works in
MySQL...)
Like I said, I'll see what I can do to explain this better in
comments/tests, but let me know if this explanation makes sense.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]