This is an automated email from the ASF dual-hosted git repository.
liaoxin pushed a commit to branch branch-2.0
in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/branch-2.0 by this push:
new 5b8dda44684 [fix](csv reader) fix csv parser incorrect if enclosing
line_delimiter (#38347) (#38446)
5b8dda44684 is described below
commit 5b8dda44684832df0b5fc02d205e3e1fc5646130
Author: hui lai <[email protected]>
AuthorDate: Mon Jul 29 14:55:31 2024 +0800
[fix](csv reader) fix csv parser incorrect if enclosing line_delimiter
(#38347) (#38446)
The CSV reader parses data incorrectly when an enclosed field contains the
line_delimiter. For example, if line_delimiter is \n and enclose is ', and the data is as follows:
```
'aaaaaaaaaaaa
bbbb'
```
it will be parsed as two columns, `'aaaaaaaaaaaa` and `bbbb'`, rather
than as one column:
```
'aaaaaaaaaaaa
bbbb'
```
This happens because the CSV reader does not reset the result when no
matching enclose is found within the current `output_buf_read`, so the
line is truncated at the wrong position.
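The intended behavior can be illustrated with a minimal, self-contained sketch. This is not the Doris implementation: `find_line_end` is a hypothetical helper that scans a single buffer, and it assumes a single-character enclose and line delimiter with no escape handling. It only shows the rule the fix enforces: while an enclose is still open at the end of the buffer, no line end may be reported.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper for illustration only; not part of Doris.
// Returns the index of the first line delimiter that lies outside an
// enclosed field, or std::string_view::npos if the buffer ends first
// (including the case where an enclose is still open).
inline std::size_t find_line_end(std::string_view buf, char enclose,
                                 char line_delimiter) {
    bool in_enclose = false;
    for (std::size_t i = 0; i < buf.size(); ++i) {
        const char c = buf[i];
        if (c == enclose) {
            // Toggle on every enclose character (no escaping in this sketch).
            in_enclose = !in_enclose;
        } else if (c == line_delimiter && !in_enclose) {
            // Only a delimiter outside an enclosed field ends the line.
            return i;
        }
    }
    // No usable delimiter in this buffer: the caller must read more data
    // instead of truncating at a delimiter seen inside the enclosure.
    return std::string_view::npos;
}
```

In this sketch, returning `npos` for a buffer that ends inside an enclosed field plays the same role as resetting `_result` to `nullptr` in the patch: the caller is told that no complete line is available yet.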
Co-authored-by: Xin Liao <[email protected]>
---
be/src/vec/exec/format/file_reader/new_plain_text_line_reader.cpp | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/be/src/vec/exec/format/file_reader/new_plain_text_line_reader.cpp b/be/src/vec/exec/format/file_reader/new_plain_text_line_reader.cpp
index 8dce6e589af..75350890aee 100644
--- a/be/src/vec/exec/format/file_reader/new_plain_text_line_reader.cpp
+++ b/be/src/vec/exec/format/file_reader/new_plain_text_line_reader.cpp
@@ -160,6 +160,11 @@ void EncloseCsvLineReaderContext::_on_pre_match_enclose(const uint8_t* start, si
if (_idx != _total_len) {
len = update_reading_bound(start);
} else {
+ // It needs to set the result to nullptr for matching enclose may not be read
+ // after reading the output buf.
+ // Therefore, if the result is not set to nullptr,
+ // the parser will consider reading a line as there is a line delimiter.
+ _result = nullptr;
break;
}
} while (true);