Re: [PR] feat(python): support latin1/utf16 string encoding in python [fury]

via GitHub Mon, 06 Jan 2025 19:05:12 -0800


penguin-wwy commented on PR #1997:
URL: https://github.com/apache/fury/pull/1997#issuecomment-2574303154


   > @penguin-wwy with this PR, string serialization is 2x+ faster than 
pickle for non-latin1 strings, but slower for latin1 strings. It seems we need 
to implement a special `_PyUnicode_FromUCS1` instead of just using 
`PyUnicode_DecodeLatin1/PyUnicode_FromASCII/PyUnicode_FromKindAndData`. I still 
don't know why `PyUnicode_FromKindAndData` is slow. It should be just one 
memory copy, and therefore fast.
   
   For latin1 strings, pickle also implements an optimized fast path, and the 
interface it calls is almost identical to that of fury.
   
   `PyUnicode_DecodeLatin1` calls `_PyUnicode_FromUCS1` directly. 
`_PyUnicode_FromUCS1` first traverses the buffer to check whether any character 
exceeds 127, so that it can set the is_ascii flag on the PyUnicodeObject. 
   `PyUnicode_FromASCII` assumes that all characters are ASCII and skips that 
traversal. 
   `PyUnicode_FromKindAndData` behaves the same as `PyUnicode_DecodeLatin1` 
when the kind is `PyUnicode_1BYTE_KIND`.
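
   As a rough illustration of the equivalence described above, the two public 
entry points can be called through `ctypes` and compared. This is only a sketch 
for CPython, not how fury invokes them: the helper names `decode_latin1` and 
`from_kind_and_data` are hypothetical, and the value 1 for 
`PyUnicode_1BYTE_KIND` is taken from CPython's headers.

```python
import ctypes

api = ctypes.pythonapi  # CPython only

# PyObject *PyUnicode_DecodeLatin1(const char *s, Py_ssize_t size, const char *errors)
api.PyUnicode_DecodeLatin1.argtypes = [ctypes.c_char_p, ctypes.c_ssize_t, ctypes.c_char_p]
api.PyUnicode_DecodeLatin1.restype = ctypes.py_object

# PyObject *PyUnicode_FromKindAndData(int kind, const void *buffer, Py_ssize_t size)
api.PyUnicode_FromKindAndData.argtypes = [ctypes.c_int, ctypes.c_char_p, ctypes.c_ssize_t]
api.PyUnicode_FromKindAndData.restype = ctypes.py_object

PyUnicode_1BYTE_KIND = 1  # assumed from CPython's unicodeobject.h


def decode_latin1(data: bytes) -> str:
    """Hypothetical helper: build a str from latin1 bytes via PyUnicode_DecodeLatin1."""
    return api.PyUnicode_DecodeLatin1(data, len(data), None)


def from_kind_and_data(data: bytes) -> str:
    """Hypothetical helper: build a str from 1-byte-kind data."""
    return api.PyUnicode_FromKindAndData(PyUnicode_1BYTE_KIND, data, len(data))


sample = "caf\u00e9".encode("latin-1")
# Both paths yield the same string, and the result is not flagged as ASCII
# because one byte exceeds 127.
print(decode_latin1(sample) == from_kind_and_data(sample))
print(decode_latin1(sample).isascii(), decode_latin1(b"abc").isascii())
```

   Both helpers produce equal strings for the same input, which matches the 
point that the two functions share the same scan-then-copy path for 1-byte data.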


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

