Peter Rozsa has posted comments on this change. ( http://gerrit.cloudera.org:8080/22049 )
Change subject: IMPALA-10319: Support arbitrary encodings on Text files ...................................................................... Patch Set 23: (2 comments) http://gerrit.cloudera.org:8080/#/c/22049/21/be/src/util/char-codec.cc File be/src/util/char-codec.cc: http://gerrit.cloudera.org:8080/#/c/22049/21/be/src/util/char-codec.cc@61 PS21, Line 61: scopedMemTracker > let's leave it like e.g. MemTracker I meant the variable name could follow the snake-case, as other variable names. http://gerrit.cloudera.org:8080/#/c/22049/21/be/src/util/char-codec.cc@103 PS21, Line 103: prefix > These are really different entities with different scope, but maybe I didn' partial_symbol can be used to build up the symbol from the new buffer, as it's only appended by new characters and tries to convert to a valid one. I had the following idea, it's possible that I miss something, in this case, please correct me: ```Status CharCodec::HandlePrefix(uint8_t** buf_start, uint8_t* buf_end, std::string* result_prefix) { if (!partial_symbol_.empty()) { DCHECK_LT(partial_symbol_.size(), MAX_SYMBOL); while (partial_symbol_.size() < MAX_SYMBOL && *buf_start < buf_end) { partial_symbol_.push_back(**buf_start); ++(*buf_start); try { *result_prefix = boost::locale::conv::to_utf<char>( reinterpret_cast<const char*>(partial_symbol_.data()), reinterpret_cast<const char*>(partial_symbol_.data() + partial_symbol_.size()), encoding_, boost::locale::conv::stop); partial_symbol_.clear(); return Status::OK(); } catch (boost::locale::conv::conversion_error&) { continue; } catch (const std::exception& e) { return Status(TErrorCode::CHARSET_CONVERSION_ERROR, e.what()); } } return Status(TErrorCode::CHARSET_CONVERSION_ERROR, "Unable to decode buffer"); }``` -- To view, visit http://gerrit.cloudera.org:8080/22049 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I787cd01caa52a19d6645519a6cedabe0a5253a65 Gerrit-Change-Number: 22049 Gerrit-PatchSet: 23 Gerrit-Owner: Mihaly Szjatinya <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Mihaly Szjatinya <[email protected]> Gerrit-Reviewer: Peter Rozsa <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Wed, 28 May 2025 15:14:22 +0000 Gerrit-HasComments: Yes
