gustavodemorais commented on code in PR #28111:
URL: https://github.com/apache/flink/pull/28111#discussion_r3188663480
##########
docs/data/sql_functions.yml:
##########
@@ -805,6 +805,22 @@ conversion:
call("TYPEOF", input)
call("TYPEOF", input, force_serializable)
description: Returns the string representation of the input expression's
data type. By default, the returned string is a summary string that might omit
certain details for readability. If force_serializable is set to TRUE, the
string represents a full data type that could be persisted in a catalog. Note
that especially anonymous, inline data types have no serializable string
representation. In this case, NULL is returned.
+ - sql: IS_VALID_UTF8(bytes)
+ table: BYTES.isValidUtf8()
+ description: |
+ Returns `TRUE` if the input is well-formed UTF-8, `FALSE` otherwise.
Specifically rejects: truncated multi-byte sequences (missing continuation
bytes), "overlong" encodings (using more bytes than necessary for the code
point), code points above the Unicode maximum U+10FFFF, and UTF-16 surrogate
values U+D800-U+DFFF (which have no UTF-8 representation). Returns `NULL` if
the input is `NULL`.
+
+ Useful for routing records with invalid UTF-8 to a dead-letter sink:
`WHERE IS_VALID_UTF8(payload)` keeps clean rows; `WHERE NOT
IS_VALID_UTF8(payload)` selects the rejects.
Review Comment:
fixed
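
For reference, the four rejection classes named in the description can be tried out with a minimal Python sketch. This is not the Flink implementation: it assumes Python's strict `utf-8` decoder enforces the same well-formedness rules, and the names `is_valid_utf8` and `samples` are hypothetical, chosen only for illustration.

```python
def is_valid_utf8(data):
    """Hypothetical stand-in for IS_VALID_UTF8, not Flink's code.

    Returns None for None input (mirroring the documented NULL behavior),
    otherwise True if `data` decodes as strict UTF-8 and False if not.
    """
    if data is None:
        return None
    try:
        # Assumption: Python's strict decoder rejects truncated sequences,
        # overlong forms, code points above U+10FFFF, and surrogates.
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# One sample per case listed in the description, plus two valid inputs.
samples = {
    "ascii": b"hello",
    "highest code point U+10FFFF": b"\xf4\x8f\xbf\xbf",
    "truncated (missing continuation byte)": b"\xe2\x82",
    "overlong encoding of '/'": b"\xc0\xaf",
    "above U+10FFFF": b"\xf4\x90\x80\x80",
    "UTF-16 surrogate U+D800": b"\xed\xa0\x80",
}

for label, raw in samples.items():
    print(f"{label}: {is_valid_utf8(raw)}")
```

Note that in SQL both `WHERE IS_VALID_UTF8(payload)` and `WHERE NOT IS_VALID_UTF8(payload)` skip rows where the result is `NULL`, so `NULL` payloads would need separate handling if they should reach either sink.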