[ 
https://issues.apache.org/jira/browse/IMPALA-11325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-11325.
------------------------------------
    Fix Version/s: Impala 4.2.0
         Assignee: Joe McDonnell
       Resolution: Fixed

> Impala-shell hits UnicodeDecodeError when outputting Unicode via --output_file
> ------------------------------------------------------------------------------
>
>                 Key: IMPALA-11325
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11325
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Clients
>    Affects Versions: Impala 4.2.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Blocker
>             Fix For: Impala 4.2.0
>
>
> When running impala-shell and trying to output Unicode to a file via 
> --output_file, it fails:
> {noformat}
> ishell -B -q "select '引'" --output_file=joetest3.txt
> /home/joe/view2/Impala/shell/option_parser.py:359: UnicodeWarning: Unicode 
> equal comparison failed to convert both arguments to Unicode - interpreting 
> them as being unequal
>   if '--live_progress' in sys.argv and '--disable_live_progress' in sys.argv:
> /home/joe/view2/Impala/shell/option_parser.py:363: UnicodeWarning: Unicode 
> equal comparison failed to convert both arguments to Unicode - interpreting 
> them as being unequal
>   if '--strict_hs2_protocol' in sys.argv:
> /home/joe/view2/Impala/shell/option_parser.py:369: UnicodeWarning: Unicode 
> equal comparison failed to convert both arguments to Unicode - interpreting 
> them as being unequal
>   if '--verbose' in sys.argv and '--quiet' in sys.argv:
> Starting Impala Shell with no authentication using Python 2.7.16
> Warning: live_progress only applies to interactive shell sessions, and is 
> being skipped for now.
> Opened TCP connection to localhost:21050
> Connected to localhost:21050
> Server version: impalad version 4.1.0-SNAPSHOT DEBUG (build 
> 4236c307b971881a3b1d85068db5b053a9c34cfa)
> Query: select '引'
> Query submitted at: 2022-05-31 08:31:50 (Coordinator: 
> http://joemcdonnell:25000)
> Query progress can be monitored at: 
> http://joemcdonnell:25000/query_plan?query_id=2347462fe8a18544:bbeedc1800000000
> UnicodeDecodeError : 'ascii' codec can't decode byte 0xe5 in position 0: 
> ordinal not in range(128) 
> Please check for columns containing binary data to find the possible source 
> of the error.
> Could not execute command: select '引'{noformat}
> This is specific to file output. This same query works if outputting to the 
> console.
> This line seems to be the problem:
> {noformat}
>         with open(self.filename, 'ab') as out_file:
>           # Note that instances of this class do not persist, so it's fine to
>           # close the file handle after each write.
>           out_file.write(formatted_data.encode('utf-8'))  # file opened in binary mode <--------
>           out_file.write(b'\n')
> {noformat}
> [https://github.com/apache/impala/blob/master/shell/shell_output.py#L115]
> It seems to work if we remove the .encode('utf-8').
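
A likely explanation, given the Python 2.7.16 banner in the log: under Python 2, calling .encode('utf-8') on a value that is already a byte string first decodes it implicitly with the default 'ascii' codec, which fails on the first byte of '引' (0xe5). The sketch below is only an illustration of one way to make the write tolerate both text and bytes. The helper names to_utf8_bytes and append_line are hypothetical and are not taken from shell_output.py, and this is not necessarily the fix that was committed.

```python
# -*- coding: utf-8 -*-
# Hypothetical helpers for illustration; not the actual Impala shell code.


def to_utf8_bytes(data):
    """Return UTF-8 bytes for either text or already-encoded bytes."""
    if isinstance(data, bytes):
        # Already encoded (this is the Python 2 `str` case): pass through.
        # Calling .encode() here under Python 2 would trigger an implicit
        # ASCII decode and raise UnicodeDecodeError on non-ASCII bytes.
        return data
    # Text (Python 2 `unicode`, Python 3 `str`): encode explicitly.
    return data.encode('utf-8')


def append_line(filename, formatted_data):
    """Append one line to a file opened in binary mode."""
    with open(filename, 'ab') as out_file:
        out_file.write(to_utf8_bytes(formatted_data))
        out_file.write(b'\n')


# The same implicit step, made explicit: decoding the UTF-8 bytes of
# U+5F15 as ASCII fails at position 0, as in the reported error.
raw = u'\u5f15'.encode('utf-8')
try:
    raw.decode('ascii')
    implicit_decode_failed = False
except UnicodeDecodeError:
    implicit_decode_failed = True
```

With a helper like this, the call site would not need to know whether formatted_data arrives as text or as bytes, which matters if the same code has to run under both Python 2 and Python 3.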



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
