[ https://issues.apache.org/jira/browse/SPARK-29034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-29034:
----------------------------------
    Affects Version/s:     (was: 3.0.0)
                           3.1.0

> String Constants with C-style Escapes
> -------------------------------------
>
>                 Key: SPARK-29034
>                 URL: https://issues.apache.org/jira/browse/SPARK-29034
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> PostgreSQL also accepts "escape" string constants, which are an extension to
> the SQL standard. An escape string constant is specified by writing the
> letter {{E}} (upper or lower case) just before the opening single quote,
> e.g., {{E'foo'}}. (When continuing an escape string constant across lines,
> write {{E}} only before the first opening quote.) Within an escape string, a
> backslash character ({{\}}) begins a C-like _backslash escape_ sequence, in
> which the combination of backslash and following character(s) represent a
> special byte value, as shown in [Table
> 4-1|https://www.postgresql.org/docs/9.3/sql-syntax-lexical.html#SQL-BACKSLASH-TABLE].
>
> *Table 4-1. Backslash Escape Sequences*
> ||Backslash Escape Sequence||Interpretation||
> |{{\b}}|backspace|
> |{{\f}}|form feed|
> |{{\n}}|newline|
> |{{\r}}|carriage return|
> |{{\t}}|tab|
> |{{\}}{{o}}, {{\}}{{oo}}, {{\}}{{ooo}} ({{o}} = 0 - 7)|octal byte value|
> |{{\x}}{{h}}, {{\x}}{{hh}} ({{h}} = 0 - 9, A - F)|hexadecimal byte value|
> |{{\u}}{{xxxx}}, {{\U}}{{xxxxxxxx}} ({{x}} = 0 - 9, A - F)|16 or 32-bit hexadecimal Unicode character value|
>
> Any other character following a backslash is taken literally. Thus, to
> include a backslash character, write two backslashes ({{\\}}). Also, a single
> quote can be included in an escape string by writing {{\'}}, in addition to
> the normal way of {{''}}.
>
> It is your responsibility that the byte sequences you create, especially when
> using the octal or hexadecimal escapes, compose valid characters in the
> server character set encoding. When the server encoding is UTF-8, then the
> Unicode escapes or the alternative Unicode escape syntax, explained in
> [Section
> 4.1.2.3|https://www.postgresql.org/docs/9.3/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-UESCAPE],
> should be used instead. (The alternative would be doing the UTF-8 encoding
> by hand and writing out the bytes, which would be very cumbersome.)
>
> The Unicode escape syntax works fully only when the server encoding is
> {{UTF8}}. When other server encodings are used, only code points in the ASCII
> range (up to {{\u007F}}) can be specified. Both the 4-digit and the 8-digit
> form can be used to specify UTF-16 surrogate pairs to compose characters with
> code points larger than U+FFFF, although the availability of the 8-digit form
> technically makes this unnecessary. (When surrogate pairs are used when the
> server encoding is {{UTF8}}, they are first combined into a single code point
> that is then encoded in UTF-8.)
>
> [https://www.postgresql.org/docs/11/sql-syntax-lexical.html#SQL-BACKSLASH-TABLE]
>
> Example:
> {code:sql}
> postgres=# SET bytea_output TO escape;
> SET
> postgres=# SELECT E'Th\\000omas'::bytea;
>    bytea
> ------------
>  Th\000omas
> (1 row)
>
> postgres=# SELECT 'Th\\000omas'::bytea;
>     bytea
> -------------
>  Th\\000omas
> (1 row)
> {code}
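
The escape rules quoted in Table 4-1 can be sketched as a small decoder. The following Python is only an illustration of the table, not Spark's or PostgreSQL's actual lexer: the names {{decode_escape_string}}, {{SIMPLE_ESCAPES}} and {{ESCAPE_RE}} are hypothetical, the input is assumed to be the text between the quotes of an {{E'...'}} constant, octal and hexadecimal byte values are simply mapped to the character with that code, and surrogate pairs and the doubled-quote form {{''}} are not handled.

```python
import re

# Single-character escapes from Table 4-1.
SIMPLE_ESCAPES = {"b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t"}

# One alternative per row of the table. The final alternative covers
# "any other character following a backslash is taken literally".
ESCAPE_RE = re.compile(
    r"\\(?:"
    r"(?P<oct>[0-7]{1,3})"           # \o, \oo, \ooo
    r"|x(?P<hex>[0-9A-Fa-f]{1,2})"   # \xh, \xhh
    r"|u(?P<u4>[0-9A-Fa-f]{4})"      # \uxxxx
    r"|U(?P<u8>[0-9A-Fa-f]{8})"      # \Uxxxxxxxx
    r"|(?P<other>.)"                 # \b, \n, \\, \', ...
    r")",
    re.DOTALL,
)


def decode_escape_string(body: str) -> str:
    """Decode the body of an escape string (the text between the quotes).

    Simplification (an assumption of this sketch): an incomplete \\u or \\U
    sequence falls through to the "taken literally" rule instead of being
    reported as an error.
    """

    def replace(match: re.Match) -> str:
        if match.group("oct") is not None:
            return chr(int(match.group("oct"), 8))
        for name in ("hex", "u4", "u8"):
            digits = match.group(name)
            if digits is not None:
                return chr(int(digits, 16))
        other = match.group("other")
        # Known one-letter escapes map to control characters;
        # everything else (including \\ and \') is kept as-is.
        return SIMPLE_ESCAPES.get(other, other)

    return ESCAPE_RE.sub(replace, body)
```

Applied to the body of the first query in the example, {{decode_escape_string(r"Th\\000omas")}} collapses the doubled backslash into one, leaving the text {{Th\000omas}}, which is what the {{bytea}} cast then receives.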