pitrou commented on code in PR #45498:
URL: https://github.com/apache/arrow/pull/45498#discussion_r1952194887
##########
cpp/src/arrow/csv/parser.cc:
##########
@@ -171,12 +186,26 @@ class ResizableValueDescWriter : public
ValueDescWriter<ResizableValueDescWriter
// faster CSV parsing code.
class PresizedValueDescWriter : public
ValueDescWriter<PresizedValueDescWriter> {
public:
+ // The number of offsets being written will be `1 + num_rows * num_cols`,
+ // however we allow for one extraneous write in case of excessive columns,
+ // hence `2 + num_rows * num_cols` (see explanation in PushValue below).
PresizedValueDescWriter(MemoryPool* pool, int32_t num_rows, int32_t num_cols)
- : ValueDescWriter(pool, /*values_capacity=*/1 + num_rows * num_cols) {}
+ : ValueDescWriter(pool, /*values_capacity=*/2 + num_rows * num_cols) {}
void PushValue(ParsedValueDesc v) {
DCHECK_LT(values_size_, values_capacity_);
- values_[values_size_++] = v;
+ values_[values_size_] = v;
+ // We must take care not to write past the buffer's end if the line being
+ // parsed has more than `num_cols` columns. The obvious solution of setting
Review Comment:
> (as a side note, this is probably a good argument for C++ exceptions
> generally, since they don't incur any cost in the happy case - unlike explicit
> error returns)

And now that remark gives me an idea for solving this without any
performance cost. It's gonna be hackish though :(
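
For readers following the hunk above, here is a minimal sketch of the "one extraneous write" sizing described in the diff comment (`2 + num_rows * num_cols`). It is an illustration only: `CappedWriter`, `ValueDesc` and `at()` are hypothetical names, the capped increment is an assumed way of staying inside the buffer, and none of this is taken from Arrow's code or is necessarily the idea alluded to in the comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a parsed value descriptor (not Arrow's ParsedValueDesc).
struct ValueDesc {
  uint32_t offset;
  bool quoted;
};

// Hypothetical presized writer. The buffer holds the nominal
// `1 + num_rows * num_cols` values plus one spare slot that absorbs
// writes caused by rows with too many columns.
class CappedWriter {
 public:
  CappedWriter(int32_t num_rows, int32_t num_cols)
      : values_(2 + static_cast<size_t>(num_rows) * static_cast<size_t>(num_cols)),
        size_(0) {}

  void PushValue(ValueDesc v) {
    assert(size_ < values_.size());
    // Always write, with no error branch in the hot path.
    values_[size_] = v;
    // Assumed capping scheme: advance until the index reaches the spare
    // (last) slot, then stay there. Excess writes overwrite only the spare
    // slot, never a valid value and never memory past the buffer.
    size_ += (size_ != values_.size() - 1);
  }

  // In the nominal case this is the number of values written.
  size_t size() const { return size_; }
  size_t capacity() const { return values_.size(); }
  const ValueDesc& at(size_t i) const { return values_[i]; }

 private:
  std::vector<ValueDesc> values_;
  size_t size_;
};
```

In this sketch the writer itself never reports the overflow; the caller is assumed to detect the column-count mismatch separately, for example by counting columns per row.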