penguin-wwy commented on PR #1997: URL: https://github.com/apache/fury/pull/1997#issuecomment-2574303154
> @penguin-wwy with this PR, string serialization is 2x+ faster than pickle for non-latin1 strings, but it is slower for latin1 strings. It seems we need to implement a special `_PyUnicode_FromUCS1` instead of just using `PyUnicode_DecodeLatin1`/`PyUnicode_FromASCII`/`PyUnicode_FromKindAndData`. I still don't know why `PyUnicode_FromKindAndData` is slow. It should be just one memory copy, and fast.

For latin1 strings, pickle also implements an optimized fast path, and the interface it calls is almost identical to the one fury calls.

- `PyUnicode_DecodeLatin1` directly calls `_PyUnicode_FromUCS1`.
- `_PyUnicode_FromUCS1` traverses the input to check whether any character exceeds 127, so that it can set the `is_ascii` flag in the `PyUnicodeObject`.
- `PyUnicode_FromASCII` assumes all characters are ASCII and does not perform the traversal.
- `PyUnicode_FromKindAndData` behaves the same as `PyUnicode_DecodeLatin1` under `PyUnicode_1BYTE_KIND`.
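The behavior described above can be observed from Python, as a rough sketch rather than a benchmark. It assumes CPython, where `ctypes.pythonapi` exposes the public C API and `str.isascii()` reflects the ASCII flag on the string object. The helper names `decode_latin1` and `from_ucs1` are hypothetical wrappers for illustration and are not part of fury or pickle.

```python
import ctypes

# Bind two public CPython C-API functions via ctypes (CPython only).
_decode_latin1 = ctypes.pythonapi.PyUnicode_DecodeLatin1
_decode_latin1.argtypes = [ctypes.c_char_p, ctypes.c_ssize_t, ctypes.c_char_p]
_decode_latin1.restype = ctypes.py_object

_from_kind_and_data = ctypes.pythonapi.PyUnicode_FromKindAndData
_from_kind_and_data.argtypes = [ctypes.c_int, ctypes.c_char_p, ctypes.c_ssize_t]
_from_kind_and_data.restype = ctypes.py_object

# Assumption: the C enum constant PyUnicode_1BYTE_KIND has the value 1.
PyUnicode_1BYTE_KIND = 1


def decode_latin1(data: bytes) -> str:
    """Hypothetical wrapper: build a str from Latin-1 bytes."""
    return _decode_latin1(data, len(data), None)  # errors=NULL


def from_ucs1(data: bytes) -> str:
    """Hypothetical wrapper: build a str from a 1-byte-kind buffer."""
    return _from_kind_and_data(PyUnicode_1BYTE_KIND, data, len(data))


if __name__ == "__main__":
    for raw in (b"hello", b"caf\xe9"):
        a, b = decode_latin1(raw), from_ucs1(raw)
        # Both paths yield the same string; the ASCII flag depends on
        # whether any byte in the input exceeds 127.
        print(repr(a), a == b, a.isascii(), b.isascii())
```

Both entry points return equal strings and agree on the ASCII flag, which is consistent with them sharing the same scan over the input. Timing them against each other would need a separate measurement.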
