[ 
https://issues.apache.org/jira/browse/SPARK-29034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-29034:
----------------------------------
    Affects Version/s:     (was: 3.0.0)
                       3.1.0

> String Constants with C-style Escapes
> -------------------------------------
>
>                 Key: SPARK-29034
>                 URL: https://issues.apache.org/jira/browse/SPARK-29034
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> PostgreSQL also accepts "escape" string constants, which are an extension to 
> the SQL standard. An escape string constant is specified by writing the 
> letter {{E}} (upper or lower case) just before the opening single quote, 
> e.g., {{E'foo'}}. (When continuing an escape string constant across lines, 
> write {{E}} only before the first opening quote.) Within an escape string, a 
> backslash character ({{\}}) begins a C-like _backslash escape_ sequence, in 
> which the combination of backslash and following character(s) represents a 
> special byte value, as shown in 
> [Table 4-1|https://www.postgresql.org/docs/9.3/sql-syntax-lexical.html#SQL-BACKSLASH-TABLE].
> *Table 4-1. Backslash Escape Sequences*
> ||Backslash Escape Sequence||Interpretation||
> |{{\b}}|backspace|
> |{{\f}}|form feed|
> |{{\n}}|newline|
> |{{\r}}|carriage return|
> |{{\t}}|tab|
> |{{\}}{{o}}, {{\}}{{oo}}, {{\}}{{ooo}} ({{o}} = 0 - 7)|octal byte value|
> |{{\x}}{{h}}, {{\x}}{{hh}} ({{h}} = 0 - 9, A - F)|hexadecimal byte value|
> |{{\u}}{{xxxx}}, {{\U}}{{xxxxxxxx}} ({{x}} = 0 - 9, A - F)|16 or 32-bit hexadecimal Unicode character value|
> Any other character following a backslash is taken literally. Thus, to 
> include a backslash character, write two backslashes ({{\\}}). Also, a single 
> quote can be included in an escape string by writing {{\'}}, in addition to 
> the normal way of {{''}}.
> You are responsible for ensuring that the byte sequences you create, 
> especially with the octal or hexadecimal escapes, compose valid characters in 
> the server character set encoding. When the server encoding is UTF-8, use the 
> Unicode escapes or the alternative Unicode escape syntax, explained in 
> [Section 4.1.2.3|https://www.postgresql.org/docs/9.3/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-UESCAPE], 
> instead. (The alternative would be doing the UTF-8 encoding by hand and 
> writing out the bytes, which would be very cumbersome.)
> The Unicode escape syntax works fully only when the server encoding is 
> {{UTF8}}. When other server encodings are used, only code points in the ASCII 
> range (up to {{\u007F}}) can be specified. Both the 4-digit and the 8-digit 
> form can be used to specify UTF-16 surrogate pairs to compose characters with 
> code points larger than U+FFFF, although the availability of the 8-digit form 
> technically makes this unnecessary. (When surrogate pairs are used when the 
> server encoding is {{UTF8}}, they are first combined into a single code point 
> that is then encoded in UTF-8.)
>  
>  
> [https://www.postgresql.org/docs/11/sql-syntax-lexical.html#SQL-BACKSLASH-TABLE]
>  
> Example:
> {code:sql}
> postgres=# SET bytea_output TO escape;
> SET
> postgres=# SELECT E'Th\\000omas'::bytea;
>    bytea
> ------------
>  Th\000omas
> (1 row)
> postgres=# SELECT 'Th\\000omas'::bytea;
>     bytea
> -------------
>  Th\\000omas
> (1 row)
> {code}
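
For readers mapping the rules in Table 4-1 above onto a lexer, here is a minimal Python sketch of the escape processing. It is neither PostgreSQL's scanner nor a proposed Spark implementation; the names decode_escape_string, ESCAPE, and SIMPLE are hypothetical. It assumes a UTF8 server encoding, takes only the text between the quotes of an E'...' constant, truncates octal values above 377 to one byte (an assumption), and raises ValueError for malformed \u or \U escapes and unpaired surrogates.

```python
import re

# Single-character escapes from Table 4-1.
SIMPLE = {"b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t"}

# One alternative per table row; "other" catches any remaining character.
ESCAPE = re.compile(
    r"\\(?:(?P<oct>[0-7]{1,3})"
    r"|x(?P<hex>[0-9A-Fa-f]{1,2})"
    r"|u(?P<u4>[0-9A-Fa-f]{4})"
    r"|U(?P<u8>[0-9A-Fa-f]{8})"
    r"|(?P<other>.))",
    re.DOTALL,
)


def decode_escape_string(body: str) -> bytes:
    """Decode the text between the quotes of E'...' (hypothetical helper)."""
    out = bytearray()
    high = None  # high surrogate waiting for its low half
    pos = 0
    while pos < len(body):
        m = ESCAPE.match(body, pos)
        code = None  # code point from \u or \U, if this token is one
        if m is None:
            if body[pos] == "\\":
                raise ValueError("dangling backslash")
            chunk = body[pos].encode("utf-8")
            # '' is the standard spelling of one single quote.
            pos += 2 if body.startswith("''", pos) else 1
        else:
            pos = m.end()
            if m["oct"] is not None:
                # Assumption: values above 0o377 are truncated to one byte.
                chunk = bytes([int(m["oct"], 8) & 0xFF])
            elif m["hex"] is not None:
                chunk = bytes([int(m["hex"], 16)])
            elif m["u4"] or m["u8"]:
                code = int(m["u4"] or m["u8"], 16)
            elif m["other"] in "uU":
                raise ValueError("invalid Unicode escape")
            else:
                # Known letters map to control characters; anything else
                # (including \\ and \') is taken literally.
                chunk = SIMPLE.get(m["other"], m["other"]).encode("utf-8")

        if code is None:
            if high is not None:
                raise ValueError("unpaired high surrogate")
        elif 0xD800 <= code <= 0xDBFF:
            if high is not None:
                raise ValueError("unpaired high surrogate")
            high = code
            continue  # wait for the low half before emitting anything
        else:
            if 0xDC00 <= code <= 0xDFFF:
                if high is None:
                    raise ValueError("unpaired low surrogate")
                # Combine the pair into one code point, then encode once.
                code = 0x10000 + ((high - 0xD800) << 10) + (code - 0xDC00)
                high = None
            elif high is not None:
                raise ValueError("unpaired high surrogate")
            chunk = chr(code).encode("utf-8")
        out += chunk
    if high is not None:
        raise ValueError("unpaired high surrogate")
    return bytes(out)
```

Applied to the body of the first query in the example, decode_escape_string(r"Th\\000omas") yields the bytes Th\000omas; the ::bytea cast then reads \000 as an octal escape of its own, which is why the two queries print different results.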



