[ 
https://issues.apache.org/jira/browse/THRIFT-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904382#comment-16904382
 ] 

Jarry Shaw edited comment on THRIFT-4677 at 8/10/19 9:14 AM:
-------------------------------------------------------------

Sorry for the late reply. I reported this quite a long time ago and only 
recently tried to reproduce the bug.

Here is the exception traceback:

{code:python}
Traceback (most recent call last):
  File "C:\Users\fakepath\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\fakepath\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 44, in mapstar
    return list(map(*args))
  File "C:\Users\fakepath\Desktop\osquery_all_mp.py", line 54, in query
    query = instance.client.query(f'SELECT * FROM {table};')
  File "C:\Users\fakepath\AppData\Local\Programs\Python\Python37\lib\site-packages\osquery\extensions\ExtensionManager.py", line 182, in query
    return self.recv_query()
  File "C:\Users\fakepath\AppData\Local\Programs\Python\Python37\lib\site-packages\osquery\extensions\ExtensionManager.py", line 201, in recv_query
    result.read(iprot)
  File "C:\Users\fakepath\AppData\Local\Programs\Python\Python37\lib\site-packages\osquery\extensions\ExtensionManager.py", line 981, in read
    self.success.read(iprot)
  File "C:\Users\fakepath\AppData\Local\Programs\Python\Python37\lib\site-packages\osquery\extensions\ttypes.py", line 339, in read
    _val12 = iprot.readString().decode('utf-8') if sys.version_info[0] == 2 else iprot.readString()
  File "C:\Users\fakepath\AppData\Local\Programs\Python\Python37\lib\site-packages\thrift\protocol\TProtocol.py", line 184, in readString
    return binary_to_str(self.readBinary())
  File "C:\Users\fakepath\AppData\Local\Programs\Python\Python37\lib\site-packages\thrift\compat.py", line 37, in binary_to_str
    return bin_val.decode('utf8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc8 in position 0: invalid continuation byte
{code}

Environment:

* Windows 10 Pro (Simplified Chinese)
* osquery v3.3.0
* osquery-python v3.0.6 (Python binding)
* thrift v0.11.0

And the Python system locale information:

{code:python}
>>> import locale
>>> locale.getpreferredencoding()
'cp936'
{code}
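
For illustration, the same error message can be reproduced without osquery or 
Thrift. The sketch below assumes a two-byte payload that starts with 0xc8 and 
is valid GBK (cp936); the byte values and the helper name {{try_decode}} are 
hypothetical and are not taken from the actual osquery response.

```python
# Hypothetical payload: two bytes that form one valid GBK (cp936) character.
# 0xc8 looks like a UTF-8 two-byte lead byte, but 0xcb is not a valid
# UTF-8 continuation byte, so strict UTF-8 decoding fails at position 0.
raw = b"\xc8\xcb"


def try_decode(data, encoding):
    """Return the decoded text, or the UnicodeDecodeError if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc


utf8_result = try_decode(raw, "utf-8")  # a UnicodeDecodeError instance
gbk_result = try_decode(raw, "gbk")     # a one-character str

print(utf8_result)
print(repr(gbk_result))
```

This matches the traceback: the bytes are fine as GBK, and only the hard-coded 
UTF-8 decode rejects them.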

Sorry, I'm not familiar with Thrift's implementation, so I don't really know 
how this bug should be fixed.
However, you can find the source code I'm using in the attachment.

 [^osquery_all_mp.py] 


> UnicodeDecodeError in Python3
> -----------------------------
>
>                 Key: THRIFT-4677
>                 URL: https://issues.apache.org/jira/browse/THRIFT-4677
>             Project: Thrift
>          Issue Type: Bug
>          Components: Python - Library
>         Environment: Operating System: Windows 10 Pro (Simplified Chinese)
> Python Interpreter: Python 3.6.6
> {{osquery}} Version: 3.3.0
> {{osquery-python}} Version: 3.0.5
>  
>            Reporter: Jarry Shaw
>            Priority: Major
>         Attachments: compat.py, osquery_all_mp.py
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> This issue occurred when using 
> [osquery-python|https://github.com/osquery/osquery-python] (the Python 
> binding of [osquery|https://osquery.io/] by Facebook).
> When querying, {{UnicodeDecodeError}} is raised from 
> {{thrift.compat.binary_to_str}} with the error message "{{'utf-8' codec 
> can't decode byte 0xc3 in position 0: invalid continuation byte}}", because 
> the {{bin_val}} argument is actually encoded as "{{gbk}}", not UTF-8.
> Possible approaches are:
>  * add a parameter that lets the user specify the encoding
>  * get the system encoding through {{locale.getpreferredencoding()}}
>  * call {{bin_val.decode}} with the {{errors='replace'}} or 
> {{errors='ignore'}} parameter
>  * introduce {{chardet}} to detect the encoding
> The attachment is my hack solution to this issue (though not perfect).
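
A minimal sketch of how the first three approaches listed above could be 
combined. The function name {{tolerant_binary_to_str}} and its 
{{fallback_encoding}} parameter are hypothetical; this is neither Thrift's 
actual API nor the attached {{compat.py}}.

```python
import locale


def tolerant_binary_to_str(bin_val, fallback_encoding=None):
    """Decode bytes as UTF-8, then a fallback encoding, then lossily.

    Hypothetical helper for illustration only.
    """
    try:
        # Normal case: the peer sent well-formed UTF-8.
        return bin_val.decode("utf-8")
    except UnicodeDecodeError:
        pass

    # Caller-supplied encoding first, otherwise the system's preferred one
    # (for example 'cp936' on a Simplified Chinese Windows installation).
    encoding = fallback_encoding or locale.getpreferredencoding(False)
    try:
        return bin_val.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        # Last resort: keep going, marking undecodable bytes with U+FFFD.
        return bin_val.decode("utf-8", errors="replace")


print(tolerant_binary_to_str(b"abc"))
print(repr(tolerant_binary_to_str(b"\xc8\xcb", "gbk")))
```

Note that falling back to the locale encoding is a guess: bytes that happen to 
be decodable in that encoding may still be decoded into the wrong text, so an 
explicit encoding passed by the caller is the more predictable option.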



