Quanlong Huang has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16688 )
Change subject: IMPALA-10145,IMPALA-10299: Apply unicode decoding bug fixes to thrift-0.11.0
......................................................................

IMPALA-10145,IMPALA-10299: Apply unicode decoding bug fixes to thrift-0.11.0

After bumping the Thrift version that impala-shell depends on to 0.11.0,
we hit bugs in decoding malformed UTF-8 characters that crash
impala-shell or cause it to hang forever. Before the bump, impala-shell
was able to print incomplete UTF-8 characters as replacement symbols,
e.g.

impala-shell> select substr("引擎", 1, 4);
引�
impala-shell> select unhex("aa");
�

The cause is that Thrift changed its internal string representation from
bytes to unicode after 0.10 (THRIFT-3503) to support Python 3, following
the "unicode sandwich" rule, namely "bytes on the outside, unicode on
the inside, encode/decode at the edges". However, no error handling
method is specified for the decoding step, so decoding malformed input
fails.

We need the patches for THRIFT-2087 and THRIFT-5303 to improve
robustness. THRIFT-5303 is enough to resolve the issue we hit, since we
mostly use the _fast_decode code path. THRIFT-2087 is backported as well
in case we use the normal decoding code path somewhere.

Tests:
- Verified that the issue is resolved after bumping the Thrift version
  that impala-shell depends on to 0.11.0-p4.
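
For illustration only, the difference between strict decoding and decoding
with a replacement error handler can be sketched in plain Python. This is
not the code from either Thrift patch (THRIFT-5303 touches the C
extension); the helper name decode_lenient is hypothetical, and the sketch
assumes the desired behavior is simply to substitute U+FFFD for
undecodable bytes, matching the impala-shell output shown above.

```python
# Illustrative sketch only; not the actual Thrift patch.

def decode_lenient(data: bytes) -> str:
    """Hypothetical helper: decode UTF-8, replacing malformed bytes with U+FFFD."""
    return data.decode("utf-8", errors="replace")


# substr("引擎", 1, 4) yields the first 4 bytes: all of "引" plus one byte of "擎".
truncated = "引擎".encode("utf-8")[:4]
# unhex("aa") yields a single byte that is not a valid UTF-8 start byte.
lone_byte = bytes.fromhex("aa")

# Strict decoding (the default error handler) raises on malformed input.
try:
    truncated.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(strict_failed)                # True
print(decode_lenient(truncated))    # 引 followed by U+FFFD
print(decode_lenient(lone_byte))    # U+FFFD
```

With errors="replace", both malformed inputs decode to printable strings
instead of raising, which is the behavior impala-shell showed before the
Thrift bump.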

Change-Id: Id16b04248f2db3033bef3ab26b7ba8205768c9af
Reviewed-on: http://gerrit.cloudera.org:8080/16688
Reviewed-by: Csaba Ringhofer <csringho...@cloudera.com>
Tested-by: Quanlong Huang <huangquanl...@gmail.com>
---
M buildall.sh
A source/thrift/thrift-0.11.0-patches/0003-THRIFT-2087-Python-compiler-replace-non-utf-8-char-w.patch
A source/thrift/thrift-0.11.0-patches/0004-THRIFT-5303-Fix-missing-error-handling-in-using-PyUn.patch
3 files changed, 55 insertions(+), 1 deletion(-)

Approvals:
  Csaba Ringhofer: Looks good to me, approved
  Quanlong Huang: Verified

--
To view, visit http://gerrit.cloudera.org:8080/16688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Id16b04248f2db3033bef3ab26b7ba8205768c9af
Gerrit-Change-Number: 16688
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>