[ https://issues.apache.org/jira/browse/IMPALA-10145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190672#comment-17190672 ]
Adam Tamas commented on IMPALA-10145:
-------------------------------------

A possible solution for both cases would be to switch the Thrift transmission type from string to binary.

> UnicodeDecodeError in Thrift 0.11.0 generated files
> ---------------------------------------------------
>
>                 Key: IMPALA-10145
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10145
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Adam Tamas
>            Priority: Major
>
> If the query results contain a string with bytes that cannot be decoded as
> UTF-8, fetching fails with a UnicodeDecodeError when the Thrift 0.11.0
> generated Python files are in use.
> Depending on which protocol impala-shell is using, the error occurs in
> different places.
> Examples for the hs2-http and hs2 protocols:
> {code:java}
> [localhost:28000] default> select unhex('aa');
> Query: select unhex('aa')
> Query submitted at: 2020-09-04 12:41:14 (Coordinator: http://tadam-OptiPlex-7070:25000)
> Query progress can be monitored at: http://tadam-OptiPlex-7070:25000/query_plan?query_id=d041ab999f597fec:46a8b51800000000
> Caught exception 'utf8' codec can't decode byte 0xaa in position 0: invalid start byte, type=<type 'exceptions.UnicodeDecodeError'> in FetchResults.
> Unknown Exception : 'utf8' codec can't decode byte 0xaa in position 0: invalid start byte
> Traceback (most recent call last):
>   File "/home/tadam/imp/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/impala_shell.py", line 1183, in _execute_stmt
>     for rows in rows_fetched:
>   File "/home/tadam/imp/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/lib/impala_client.py", line 781, in fetch
>     resp = self._do_hs2_rpc(FetchResults)
>   File "/home/tadam/imp/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/lib/impala_client.py", line 942, in _do_hs2_rpc
>     return rpc()
>   File "/home/tadam/imp/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/lib/impala_client.py", line 778, in FetchResults
>     return self.imp_service.FetchResults(req)
>   File "/home/tadam/imp/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/gen-py/TCLIService/TCLIService.py", line 717, in FetchResults
>     return self.recv_FetchResults()
>   File "/home/tadam/imp/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/gen-py/TCLIService/TCLIService.py", line 736, in recv_FetchResults
>     result.read(iprot)
>   File "/home/tadam/imp/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/gen-py/TCLIService/TCLIService.py", line 3593, in read
>     self.success.read(iprot)
>   File "/home/tadam/imp/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/gen-py/TCLIService/ttypes.py", line 5888, in read
>     self.results.read(iprot)
>   File "/home/tadam/imp/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/gen-py/TCLIService/ttypes.py", line 2670, in read
>     _elem115.read(iprot)
>   File "/home/tadam/imp/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/gen-py/TCLIService/ttypes.py", line 2556, in read
>     self.stringVal.read(iprot)
>   File "/home/tadam/imp/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/gen-py/TCLIService/ttypes.py", line 2352, in read
>     _elem95 = iprot.readString().decode('utf-8') if sys.version_info[0] == 2 else iprot.readString()
>   File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
>     return codecs.utf_8_decode(input, errors, True)
> UnicodeDecodeError: 'utf8' codec can't decode byte 0xaa in position 0: invalid start byte
> [Not connected] >
> {code}
> {code:java}
> [localhost:21050] default> select unhex('aa');
> Query: select unhex('aa')
> Query submitted at: 2020-09-04 12:42:22 (Coordinator: http://tadam-OptiPlex-7070:25000)
> Query progress can be monitored at: http://tadam-OptiPlex-7070:25000/query_plan?query_id=3a481e2a0581ea7c:a6e1901800000000
> Caught exception 'utf8' codec can't decode byte 0xaa in position 0: invalid start byte, type=<type 'exceptions.UnicodeDecodeError'> in FetchResults.
> Unknown Exception : 'utf8' codec can't decode byte 0xaa in position 0: invalid start byte
> Traceback (most recent call last):
>   File "/home/tadam/imp/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/impala_shell.py", line 1183, in _execute_stmt
>     for rows in rows_fetched:
>   File "/home/tadam/imp/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/lib/impala_client.py", line 781, in fetch
>     resp = self._do_hs2_rpc(FetchResults)
>   File "/home/tadam/imp/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/lib/impala_client.py", line 942, in _do_hs2_rpc
>     return rpc()
>   File "/home/tadam/imp/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/lib/impala_client.py", line 778, in FetchResults
>     return self.imp_service.FetchResults(req)
>   File "/home/tadam/imp/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/gen-py/TCLIService/TCLIService.py", line 717, in FetchResults
>     return self.recv_FetchResults()
>   File "/home/tadam/imp/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/gen-py/TCLIService/TCLIService.py", line 736, in recv_FetchResults
>     result.read(iprot)
>   File "/home/tadam/imp/impala/shell/build/impala-shell-4.0.0-SNAPSHOT/gen-py/TCLIService/TCLIService.py", line 3583, in read
>     iprot._fast_decode(self, iprot, [self.__class__, self.thrift_spec])
> UnicodeDecodeError: 'utf8' codec can't decode byte 0xaa in position 0: invalid start byte
> {code}
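
The failure in both tracebacks comes down to a strict UTF-8 decode of the single byte 0xaa. A minimal Python 3 sketch of that failure, and of what handling the value as raw bytes could look like on the client side, follows. It is an illustration under assumptions, not the actual fix: `decode_for_display` is a hypothetical helper that exists in neither impala-shell nor Thrift, and substituting U+FFFD for undecodable bytes is only one possible display policy.

```python
# Hypothetical helper for illustration; not part of impala-shell or Thrift.
def decode_for_display(raw: bytes) -> str:
    """Decode bytes received as a binary value without ever raising.

    Undecodable bytes are replaced with U+FFFD (an assumed display policy).
    """
    return raw.decode("utf-8", errors="replace")


# The same single byte that `select unhex('aa')` returns.
raw = bytes.fromhex("aa")

# A strict decode, as the generated string-reading code does, raises.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Keeping the value as bytes and decoding leniently avoids the exception.
shown = decode_for_display(raw)
```

With a binary field the generated code would hand back bytes unchanged, so the decision on how to render them moves to the shell instead of failing inside FetchResults.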