gustavodemorais commented on code in PR #28111:
URL: https://github.com/apache/flink/pull/28111#discussion_r3187345932


##########
flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/api/internal/BaseExpressions.java:
##########
@@ -1493,6 +1495,28 @@ public OutType inetNtoa() {
         return toApiSpecificExpression(unresolvedCall(INET_NTOA, toExpr()));
     }
 
+    /**
+     * Returns {@code true} if the input bytes form a well-formed UTF-8 sequence, {@code false}
+     * otherwise. Returns {@code null} if the input is {@code null}.
+     *
+     * <p>Specifically rejects: truncated multi-byte sequences (missing continuation bytes),
+     * "overlong" encodings (using more bytes than necessary for the code point), code points above
+     * the Unicode maximum U+10FFFF, and UTF-16 surrogate values U+D800-U+DFFF (which have no UTF-8
+     * representation).
+     */
+    public OutType isValidUtf8() {

Review Comment:
   Hey David, isValidUtf8() doesn't run the check itself. It only builds a 
small piece of the SQL plan that says "validate UTF-8 here". The actual 
true/false is computed later, for every row, on the cluster. So the method has 
to return something you can keep chaining onto (.and(...), .filter(...), etc.), 
and that's what OutType is. It's the same return type every other Table API 
expression method uses, like isNull() or like().
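
   To make the "builds a plan node, evaluates later" point concrete, here is a 
rough, self-contained sketch. It does not use Flink at all: LazyExprSketch, 
BaseExpr, Expr, column(), wrap(), not() and strictUtf8() are hypothetical 
stand-ins invented for illustration, and the JDK's strict UTF-8 decoder is 
only assumed to approximate the rules listed in the Javadoc above. It is not 
how the PR implements the function.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical illustration only; none of these types are Flink API.
public final class LazyExprSketch {

    /** Shared methods. OutType is the concrete type callers keep chaining on. */
    abstract static class BaseExpr<OutType> {
        final String plan;                      // textual stand-in for a plan node
        final Function<byte[], Object> perRow;  // deferred work, run once per row later

        BaseExpr(String plan, Function<byte[], Object> perRow) {
            this.plan = plan;
            this.perRow = perRow;
        }

        /** Lets the base class hand back the API-specific expression type. */
        abstract OutType wrap(String plan, Function<byte[], Object> perRow);

        /** Builds a node only; no bytes are inspected when this is called. */
        OutType isValidUtf8() {
            return wrap("IS_VALID_UTF8(" + plan + ")", row -> {
                Object in = perRow.apply(row);
                return in == null ? null : strictUtf8((byte[]) in);
            });
        }
    }

    /** Concrete expression type, so isValidUtf8() returns something chainable. */
    static final class Expr extends BaseExpr<Expr> {
        Expr(String plan, Function<byte[], Object> perRow) {
            super(plan, perRow);
        }

        @Override
        Expr wrap(String plan, Function<byte[], Object> perRow) {
            return new Expr(plan, perRow);
        }

        /** One more chainable step; null input stays null. */
        Expr not() {
            return new Expr("NOT(" + plan + ")", row -> {
                Object in = perRow.apply(row);
                return in == null ? null : !((Boolean) in);
            });
        }
    }

    /** A column reference: at evaluation time it simply yields the row's bytes. */
    static Expr column(String name) {
        return new Expr(name, row -> row);
    }

    /** Strict decode with the JDK decoder; malformed input is reported, not replaced. */
    static boolean strictUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Building the expression does no validation yet.
        Expr invalid = column("payload").isValidUtf8().not();
        System.out.println(invalid.plan);

        // Only now, with a concrete row, is the check actually executed.
        byte[] overlongNul = {(byte) 0xC0, (byte) 0x80};
        System.out.println(invalid.perRow.apply(overlongNul));
    }
}
```

   In this sketch the call column("payload").isValidUtf8() returns an Expr, so 
the next call can be chained directly; the boolean only appears once perRow 
is applied to an actual row, which mirrors the split between plan building 
and per-row execution described above.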


