Peter Rozsa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22049 )

Change subject: IMPALA-10319: Support arbitrary encodings on Text files
......................................................................


Patch Set 23:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/22049/21/be/src/util/char-codec.cc
File be/src/util/char-codec.cc:

http://gerrit.cloudera.org:8080/#/c/22049/21/be/src/util/char-codec.cc@61
PS21, Line 61: scopedMemTracker
> let's leave it like e.g. MemTracker
I meant the variable name could follow the snake-case, as other variable names.


http://gerrit.cloudera.org:8080/#/c/22049/21/be/src/util/char-codec.cc@103
PS21, Line 103: prefix
> These are really different entities with different scope, but maybe I didn'
partial_symbol can be used to build up the symbol from the new buffer, as it's 
only appended by new characters and tries to convert to a valid one. I had the 
following idea, it's possible that I miss something, in this case, please 
correct me:
```Status CharCodec::HandlePrefix(uint8_t** buf_start, uint8_t* buf_end,
                              std::string* result_prefix) {
  if (!partial_symbol_.empty()) {
    DCHECK_LT(partial_symbol_.size(), MAX_SYMBOL);
    while (partial_symbol_.size() < MAX_SYMBOL && *buf_start < buf_end) {
      partial_symbol_.push_back(**buf_start);
      ++(*buf_start);
      try {
        *result_prefix = boost::locale::conv::to_utf<char>(
            reinterpret_cast<const char*>(partial_symbol_.data()),
            reinterpret_cast<const char*>(partial_symbol_.data() + 
partial_symbol_.size()),
            encoding_, boost::locale::conv::stop);

        partial_symbol_.clear();
        return Status::OK();
      } catch (boost::locale::conv::conversion_error&) {
        continue;
      } catch (const std::exception& e) {
        return Status(TErrorCode::CHARSET_CONVERSION_ERROR, e.what());
      }
    }
    return Status(TErrorCode::CHARSET_CONVERSION_ERROR, "Unable to decode 
buffer");
  }```



--
To view, visit http://gerrit.cloudera.org:8080/22049
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I787cd01caa52a19d6645519a6cedabe0a5253a65
Gerrit-Change-Number: 22049
Gerrit-PatchSet: 23
Gerrit-Owner: Mihaly Szjatinya <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Mihaly Szjatinya <[email protected]>
Gerrit-Reviewer: Peter Rozsa <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Wed, 28 May 2025 15:14:22 +0000
Gerrit-HasComments: Yes

Reply via email to