From ongdisheng:

Attention is currently required from: Ian Maxon.

ongdisheng has posted comments on this change by Ian Maxon. ( 
https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/21007?usp=email )

Change subject: [ASTERIXDB-2877][EXT] Fix multi-byte/emoji character corruption 
in CSV output
......................................................................


Patch Set 3: Code-Review+1

(1 comment)

File 
asterixdb/asterix-om/src/main/java/org/apache/asterix/dataflow/data/nontagged/printers/PrintTools.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/21007/comment/01f4e89d_e553f647?usp=email
 :
PS2, Line 322: char quote
> Oh, very nice detective work. […]
LGTM, seems like I don't have +2 permissions.

I just noticed something probably worth mentioning. The current compile-time 
validation in `WriterValidationUtil.unitByteCondition()` uses AND:
```
if (param != null && param.length() > 1 && param.getBytes().length != 1)
```

However, characters with `length() == 1` that encode to multiple bytes, such 
as `¢` or `中`, pass this validation. Example:
- User query: COPY (...) WITH {"delimiter":"¢", "quote":"中"}
- Character `¢`: length()=1, getBytes().length=2 (assuming a UTF-8 default charset)
- `param.length() > 1` is false, so the whole condition evaluates to false
- No compilation error is thrown and the query executes with a non-ASCII delimiter
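
To make the case above easy to reproduce, here is a minimal standalone sketch of the AND-based condition. It is not the actual `WriterValidationUtil` code: the class and method names are hypothetical, and it passes UTF-8 explicitly, whereas the quoted condition calls `getBytes()` with the platform default charset.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of AsterixDB.
final class UnitByteAndCheckDemo {

    // Mirrors the AND-based condition quoted above; true means "reject".
    // Assumption: UTF-8 is used explicitly so the result does not depend
    // on the platform default charset.
    static boolean rejectedByAnd(String param) {
        return param != null && param.length() > 1
                && param.getBytes(StandardCharsets.UTF_8).length != 1;
    }

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // "\u00A2" is the cent sign, "\u4E2D" is the CJK character from the example.
        for (String s : new String[] { ",", "\u00A2", "\u4E2D", "ab" }) {
            System.out.printf("'%s' length()=%d utf8Bytes=%d rejected=%b%n",
                    s, s.length(), utf8Length(s), rejectedByAnd(s));
        }
    }
}
```

Under these assumptions only `"ab"` is rejected; the single-character, multi-byte values slip through because `length() > 1` is false for them.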

I think changing the inner AND to OR would probably help to enforce ASCII-only (single-byte) values:
```
if (param != null && (param.length() > 1 || param.getBytes().length != 1))
```
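
A minimal standalone sketch of how the OR-based condition would behave, in case it helps the discussion. The class and method names are hypothetical, and UTF-8 is passed explicitly here as an assumption; the condition as proposed relies on the platform default charset.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of AsterixDB.
final class UnitByteOrCheckDemo {

    // Mirrors the proposed OR-based condition; true means "reject".
    // Assumption: explicit UTF-8 instead of the platform default charset.
    static boolean rejectedByOr(String param) {
        return param != null && (param.length() > 1
                || param.getBytes(StandardCharsets.UTF_8).length != 1);
    }

    public static void main(String[] args) {
        // "\u00A2" is the cent sign, "\u4E2D" is the CJK character from the example.
        for (String s : new String[] { ",", "\u00A2", "\u4E2D", "ab", "" }) {
            System.out.printf("'%s' length()=%d rejected=%b%n",
                    s, s.length(), rejectedByOr(s));
        }
    }
}
```

With these inputs, only `","` is accepted. One side effect that may be worth double-checking: an empty string has zero bytes, so the OR form rejects it, while the AND form lets it pass.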

Feel free to let me know what you think on this :D



--
To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/21007?usp=email
To unsubscribe, or for help writing mail filters, visit 
https://asterix-gerrit.ics.uci.edu/settings?usp=email

Gerrit-MessageType: comment
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: I434142a9b9cd2d1fc941b1e1f350e97403a8a3e1
Gerrit-Change-Number: 21007
Gerrit-PatchSet: 3
Gerrit-Owner: Ian Maxon <[email protected]>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Jenkins <[email protected]>
Gerrit-Reviewer: ongdisheng
Gerrit-CC: Hussain Towaileb <[email protected]>
Gerrit-CC: Murtadha Hubail <[email protected]>
Gerrit-Attention: Ian Maxon <[email protected]>
Gerrit-Comment-Date: Mon, 23 Mar 2026 13:58:01 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: Yes
Comment-In-Reply-To: ongdisheng
Comment-In-Reply-To: Ian Maxon <[email protected]>
