>From Ali Alsuliman <[email protected]>: Attention is currently required from: Shahrzad Shirazi, Michael Blow, Hussain Towaileb. Ali Alsuliman has posted comments on this change. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20146 )
Change subject: [NO ISSUE][COMP] Add None as quote option for CSV in external collections ...................................................................... Patch Set 12: (5 comments) File asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/input/record/reader/stream/QuotedLineRecordReader.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20146/comment/736a4b49_2ef32b0a PS12, Line 62: this.quoteCheckNeeded = this.quote != CSVUtils.NULL_CHAR; Should not this be driven by checking against "NONE" value in the config? i.e. if the value is not "NONE", then quoteCheckNeeded is true (even if the quote is specified as '\0'; I am not sure if we ever enforce that a quote cannot be '\0'). I understand that most likely no one is going to use '\0' as a quote char, but having it like this means we are not differentiating between quote as NONE and quote as '\0'. https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20146/comment/023780ea_285e046e PS12, Line 93: public boolean hasNext() throws IOException { Shouldn't we just call super.next() when quoteCheckNeeded = false? Meaning if the user is saying NONE for quote, then we should not bother with dealing with a quote and just use line_record_reader that just reads until it finds end-of-line and then gives it to the csv_field_reader. File asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/util/ExternalDataUtils.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20146/comment/6baabce1_37250995 PS12, Line 698: public static void validateChar This is a common path for other parameters too. We should only do this for the "quote" parameter so that other parameters continue to be checked File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/file/FieldCursorForDelimitedDataParser.java: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20146/comment/c46d00a5_32abe7ed PS12, Line 215: if (ch == '\n' && !startedQuote) { Isn't this going to fail when we encounter '\n' at the end of the line since !startedQuote is always going to be true since we never mark startedQuote for NONE option? I believe your test does not trigger this path (this nextRecord() method, I mean). https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20146/comment/6ffdede6_1ecc2574 PS12, Line 311: if (!quoteCheckNeeded || !startedQuote) { I didn't quite follow the flow for NONE option where the quoteCheckNeeded is false, meaning how is this solving the issue when the quote is '\0'? Isn't the flow still the same where we go through the usual quote handling down below (except that the quote now is '\0' instead of ")? -- To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20146 To unsubscribe, or for help writing mail filters, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-Project: asterixdb Gerrit-Branch: ionic Gerrit-Change-Id: I3812d2d1306282e9f02e8b77e1f79ac6b203cabe Gerrit-Change-Number: 20146 Gerrit-PatchSet: 12 Gerrit-Owner: Shahrzad Shirazi <[email protected]> Gerrit-Reviewer: Ali Alsuliman <[email protected]> Gerrit-Reviewer: Anon. E. Moose #1000171 Gerrit-Reviewer: Hussain Towaileb <[email protected]> Gerrit-Reviewer: Jenkins <[email protected]> Gerrit-Reviewer: Michael Blow <[email protected]> Gerrit-Reviewer: Murtadha Hubail <[email protected]> Gerrit-Attention: Shahrzad Shirazi <[email protected]> Gerrit-Attention: Michael Blow <[email protected]> Gerrit-Attention: Hussain Towaileb <[email protected]> Gerrit-Comment-Date: Fri, 01 Aug 2025 21:09:39 +0000 Gerrit-HasComments: Yes Gerrit-Has-Labels: No Gerrit-MessageType: comment
