[ 
https://issues.apache.org/jira/browse/THRIFT-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904387#comment-16904387
 ] 

Jarry Shaw edited comment on THRIFT-4677 at 8/10/19 9:56 AM:
-------------------------------------------------------------

Updates:

The failed query command was:

{code:sql}
osquery> select * from scheduled_tasks;
+--------------------------------------------------------------------------------+-------------------------------------------------------------------+---------------------------------------------------------------------------------+---------+-------+--------+---------------+---------------+------------------+---------------+
| name                                                                           | action                                                            | path                                                                            | enabled | state | hidden | last_run_time | next_run_time | last_run_message | last_run_code |
+--------------------------------------------------------------------------------+-------------------------------------------------------------------+---------------------------------------------------------------------------------+---------+-------+--------+---------------+---------------+------------------+---------------+
| OneDrive Standalone Update Task-S-1-5-21-3869350761-1086610007-2961631759-1000 |  %localappdata%\Microsoft\OneDrive\OneDriveStandaloneUpdater.exe  | \OneDrive Standalone Update Task-S-1-5-21-3869350761-1086610007-2961631759-1000 | 1       | ready | 1      | 943891200     | 1565518827    | 任务尚未运行。      | 267011        |
+--------------------------------------------------------------------------------+-------------------------------------------------------------------+---------------------------------------------------------------------------------+---------+-------+--------+---------------+---------------+------------------+---------------+
{code}

And the bytes that failed to decode were:

{code:python}
b'\xc8\xce\xce\xf1\xc9\xd0\xce\xb4\xd4\xcb\xd0\xd0\xa1\xa3'
{code}

which were actually encoded with cp936, i.e. GBK (Python treats cp936 as an alias of gbk; GB2312 is a subset of GBK, so it decodes these bytes to the same text):

{code:python}
>>> b'\xc8\xce\xce\xf1\xc9\xd0\xce\xb4\xd4\xcb\xd0\xd0\xa1\xa3'.decode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc8 in position 0: invalid continuation byte
>>> b'\xc8\xce\xce\xf1\xc9\xd0\xce\xb4\xd4\xcb\xd0\xd0\xa1\xa3'.decode('cp936')
'任务尚未运行。'
{code}
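
To make the relationship between the three codec names concrete, here is a minimal sketch using only the standard library. The helper name {{encodable}} and the sample character are illustrative assumptions, not part of Thrift or osquery-python.

```python
import codecs

# The bytes from the failing last_run_message field.
raw = b'\xc8\xce\xce\xf1\xc9\xd0\xce\xb4\xd4\xcb\xd0\xd0\xa1\xa3'

# In CPython, 'cp936' is registered as an alias of the 'gbk' codec.
same_codec = codecs.lookup('cp936').name == codecs.lookup('gbk').name

# All three names decode this particular byte string to the same text.
decoded = {name: raw.decode(name) for name in ('cp936', 'gbk', 'gb2312')}


def encodable(text, codec):
    """Hypothetical helper: report whether `text` can be encoded with `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# GB2312 is only a subset of GBK: this character exists in GBK but not GB2312.
gbk_only = encodable('镕', 'gbk') and not encodable('镕', 'gb2312')
```

So the three names are interchangeable for this message, but decoding with {{gbk}} (or {{cp936}}) is the safer choice on a Simplified Chinese Windows system because it covers more characters.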



> UnicodeDecodeError in Python3
> -----------------------------
>
>                 Key: THRIFT-4677
>                 URL: https://issues.apache.org/jira/browse/THRIFT-4677
>             Project: Thrift
>          Issue Type: Bug
>          Components: Python - Library
>         Environment: Operating System: Windows 10 Pro (Simplified Chinese)
> Python Interpreter: Python 3.6.6
> {{osquery}} Version: 3.3.0
> {{osquery-python}} Version: 3.0.5
>  
>            Reporter: Jarry Shaw
>            Priority: Major
>         Attachments: compat.py, osquery_all_mp.py
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> This issue occurred when using 
> [osquery-python|https://github.com/osquery/osquery-python] (the Python binding of 
> [osquery|https://osquery.io/] by Facebook).
> When querying, a {{UnicodeDecodeError}} is raised with the error message "{{'utf-8' 
> codec can't decode byte 0xc3 in position 0: invalid continuation byte}}" from 
> {{thrift.compat.binary_to_str}}, because the {{bin_val}} parameter is actually 
> encoded in "{{gbk}}", not UTF-8.
> Possible approaches are:
>  * add a parameter that lets the user specify the encoding
>  * get the system encoding through {{locale.getpreferredencoding()}}
>  * call {{bin_val.decode}} with the {{errors='replace'}} or {{errors='ignore'}} 
> parameter
>  * introduce {{chardet}} to detect the encoding
> The attachment is my hack solution to this issue (though not perfect).
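
The first three approaches listed above can be combined in one function. What follows is only a rough sketch under stated assumptions: {{binary_to_str_lenient}} is a hypothetical name, it is not the attached {{compat.py}}, and it is not how {{thrift.compat.binary_to_str}} is actually implemented.

```python
import locale


def binary_to_str_lenient(bin_val, encoding=None, errors='replace'):
    """Hypothetical lenient decoder (an assumption, not Thrift's API).

    Tries, in order: the caller-supplied encoding, UTF-8, and the
    system's preferred encoding. If every strict attempt fails, it
    falls back to UTF-8 with the given error handler.
    """
    candidates = []
    if encoding:
        candidates.append(encoding)
    candidates.append('utf-8')
    candidates.append(locale.getpreferredencoding(False))

    for codec in candidates:
        try:
            return bin_val.decode(codec)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown codec: try the next candidate.
            continue

    # Last resort: never raise, mark undecodable bytes instead.
    return bin_val.decode('utf-8', errors=errors)


raw = b'\xc8\xce\xce\xf1\xc9\xd0\xce\xb4\xd4\xcb\xd0\xd0\xa1\xa3'
message = binary_to_str_lenient(raw, encoding='gbk')
```

One caveat of this design: a single-byte preferred encoding such as cp1252 accepts most byte sequences, so the locale step can return mojibake instead of failing. An explicit {{encoding}} argument is the most reliable of the three options.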



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
