From ongdisheng:

Attention is currently required from: Ian Maxon.

ongdisheng has posted comments on this change by Ian Maxon. ( 
https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/21007?usp=email )

Change subject: [ASTERIXDB-2877][EXT] Fix multi-byte/emoji character corruption 
in CSV output
......................................................................


Patch Set 2: Code-Review+1

(1 comment)

File 
asterixdb/asterix-om/src/main/java/org/apache/asterix/dataflow/data/nontagged/printers/PrintTools.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/21007/comment/5f5bd699_4bc48b43?usp=email :
PS2, Line 322: char quote
> right now the quote, escape and delimiters are weird. […]
+1 on compile time checking

I traced through the code and found that compile-time validation already 
seems to exist in `WriterValidationUtil.validateCSV()`, in 
`asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/WriterValidationUtil.java`.


Currently, there are two code paths that eventually call `writeUTF8StringAsCSV()`:
1. HTTP SELECT queries with CSV output: These use 
`CSVPrinterFactoryProvider.INSTANCE` which is initialized with an empty 
configuration, so delimiter, quote and escape always default to ASCII values (, 
" ").
2. COPY TO statements: These allow users to specify custom delimiter, quote and 
escape values. The compile-time validation happens in 
`WriterValidationUtil.validateCSV()` which calls `validateDelimiter()`, 
`validateQuote()` and `validateEscape()`.

Perhaps we can move the current runtime ASCII validation logic out of 
`writeUTF8StringAsCSV()` and into the compile-time validators 
`validateDelimiter()`, `validateQuote()` and `validateEscape()`.
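
A rough sketch of what such a compile-time check could look like. This is 
not the existing `WriterValidationUtil` code: the class name 
`CsvCharValidationSketch`, the helper `requireSingleAscii()` and its 
signature are hypothetical, and `IllegalArgumentException` only stands in 
for whatever exception type the real validators raise.

```java
public final class CsvCharValidationSketch {

    private CsvCharValidationSketch() {
    }

    // Hypothetical helper: returns the single ASCII char held by `value`,
    // or throws if the value is null, not exactly one char long, or non-ASCII.
    // A character outside the BMP (e.g. an emoji) occupies two Java chars,
    // so it is rejected by the length check.
    public static char requireSingleAscii(String paramName, String value) {
        if (value == null || value.length() != 1) {
            throw new IllegalArgumentException(paramName + " must be exactly one character");
        }
        char c = value.charAt(0);
        if (c > 0x7F) {
            throw new IllegalArgumentException(paramName + " must be an ASCII character");
        }
        return c;
    }

    public static void main(String[] args) {
        // An ASCII delimiter passes and is returned as a char.
        System.out.println(requireSingleAscii("delimiter", ","));
        try {
            // A non-ASCII quote (left double quotation mark) is rejected up front.
            requireSingleAscii("quote", "\u201C");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If each of the three validators ran a check along these lines, a COPY TO 
statement with a non-ASCII delimiter, quote or escape would fail before 
execution, and the printer would not need to re-check these values at 
runtime.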



--
To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/21007?usp=email
To unsubscribe, or for help writing mail filters, visit 
https://asterix-gerrit.ics.uci.edu/settings?usp=email

Gerrit-MessageType: comment
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: I434142a9b9cd2d1fc941b1e1f350e97403a8a3e1
Gerrit-Change-Number: 21007
Gerrit-PatchSet: 2
Gerrit-Owner: Ian Maxon <[email protected]>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Jenkins <[email protected]>
Gerrit-Reviewer: ongdisheng
Gerrit-CC: Hussain Towaileb <[email protected]>
Gerrit-CC: Murtadha Hubail <[email protected]>
Gerrit-Attention: Ian Maxon <[email protected]>
Gerrit-Comment-Date: Fri, 20 Mar 2026 06:52:51 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: Yes
Comment-In-Reply-To: Ian Maxon <[email protected]>
