>From Ali Alsuliman <[email protected]>:

Attention is currently required from: Shahrzad Shirazi, Michael Blow, Hussain 
Towaileb.
Ali Alsuliman has posted comments on this change. ( 
https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20146 )

Change subject: [NO ISSUE][COMP] Add None as quote option for CSV in external 
collections
......................................................................


Patch Set 12:

(5 comments)

File 
asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/input/record/reader/stream/QuotedLineRecordReader.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20146/comment/736a4b49_2ef32b0a
PS12, Line 62: this.quoteCheckNeeded = this.quote != CSVUtils.NULL_CHAR;
Should not this be driven by checking against "NONE" value in the config? i.e. 
if the value is not "NONE", then quoteCheckNeeded is true (even if the quote is 
specified as '\0'; I am not sure if we ever enforce that a quote cannot be 
'\0').
I understand that most likely no one is going to use '\0' as a quote char, but 
having it like this means we are not differentiating between quote as NONE and 
quote as '\0'.


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20146/comment/023780ea_285e046e
PS12, Line 93: public boolean hasNext() throws IOException {
Shouldn't we just call super.next() when quoteCheckNeeded = false? Meaning if 
the user is saying NONE for quote, then we should not bother with dealing with 
a quote and just use line_record_reader that just reads until it finds 
end-of-line and then gives it to the csv_field_reader.


File 
asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/ExternalDataUtils.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20146/comment/6baabce1_37250995
PS12, Line 698: public static void validateChar
This is a common path for other parameters too. We should only do this for the 
"quote" parameter so that other parameters continue to be checked


File 
hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/file/FieldCursorForDelimitedDataParser.java:

https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20146/comment/c46d00a5_32abe7ed
PS12, Line 215: if (ch == '\n' && !startedQuote) {
Isn't this going to fail when we encounter '\n' at the end of the line since 
!startedQuote is always going to be true since we never mark startedQuote for 
NONE option? I believe your test does not trigger this path (this nextRecord() 
method, I mean).


https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20146/comment/6ffdede6_1ecc2574
PS12, Line 311: if (!quoteCheckNeeded || !startedQuote) {
I didn't quite follow the flow for NONE option where the quoteCheckNeeded is 
false, meaning how is this solving the issue when the quote is '\0'?
Isn't the flow still the same where we go through the usual quote handling down 
below (except that the quote now is '\0' instead of ")?



--
To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20146
To unsubscribe, or for help writing mail filters, visit 
https://asterix-gerrit.ics.uci.edu/settings

Gerrit-Project: asterixdb
Gerrit-Branch: ionic
Gerrit-Change-Id: I3812d2d1306282e9f02e8b77e1f79ac6b203cabe
Gerrit-Change-Number: 20146
Gerrit-PatchSet: 12
Gerrit-Owner: Shahrzad Shirazi <[email protected]>
Gerrit-Reviewer: Ali Alsuliman <[email protected]>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Hussain Towaileb <[email protected]>
Gerrit-Reviewer: Jenkins <[email protected]>
Gerrit-Reviewer: Michael Blow <[email protected]>
Gerrit-Reviewer: Murtadha Hubail <[email protected]>
Gerrit-Attention: Shahrzad Shirazi <[email protected]>
Gerrit-Attention: Michael Blow <[email protected]>
Gerrit-Attention: Hussain Towaileb <[email protected]>
Gerrit-Comment-Date: Fri, 01 Aug 2025 21:09:39 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No
Gerrit-MessageType: comment

Reply via email to