Re: python3, regular expression and bytes text

Eko palypse Sat, 12 Oct 2019 12:57:06 -0700

Thank you very much for your answer.

> You have to be able to match bytes, not strings.


Could you elaborate on this? Sorry, I'm not a native English speaker.
The buffer I receive is a bytes-like object.
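Here is a small sketch of what "match bytes, not strings" means in practice. It is not from the thread, and the sample text is made up. In Python 3 the `re` module accepts either a `str` pattern with `str` input or a `bytes` pattern with bytes-like input, but it will not mix the two. In bytes mode the pattern works on individual bytes, so a character that UTF-8 encodes as several bytes is not treated as one unit.

```python
import re

# Stand-in for the received buffer: UTF-8 encoded text (sample data only)
buf = "café au lait".encode("utf-8")

# A bytes pattern (rb"...") can be used directly on a bytes-like buffer
m_bytes = re.search(rb"caf", buf)
print(m_bytes.group())  # b'caf'

# A str pattern cannot be used on bytes; the re module raises TypeError
try:
    re.search(r"caf", buf)
    mixed_ok = True
except TypeError as exc:
    mixed_ok = False
    print("TypeError:", exc)

# In bytes mode "." matches exactly one byte, but "é" is two bytes in UTF-8
# (b"\xc3\xa9"), so this match stops halfway through the character
m_partial = re.search(rb"caf.", buf)
print(m_partial.group())  # b'caf\xc3'

# After decoding, "." matches the whole character
m_text = re.search(r"caf.", buf.decode("utf-8"))
print(m_text.group())  # café
```

The same issue affects classes such as `\w`. In bytes mode, `\w` only covers ASCII word characters, so it does not match the bytes of "é" either.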

> I don't think you'll be able to 100% reliably match bytes in this way.
> You're asking it to make analysis of multiple bytes and to interpret
> them according to which character they would represent if decoded from
> UTF-8.
> 
> My recommendation: Even if your buffer is multiple gigabytes, just
> decode it anyway. Maybe you can decode your buffer in chunks, but
> otherwise, just bite the bullet and do the decode. You may be
> pleasantly surprised at how little you suffer as a result; Python is
> quite decent at memory management, and even if you DO get pushed into
> the swapper by this, it's still likely to be faster than trying to
> code around all the possible problems that come from mismatching your
> text search.
> 
> ChrisA
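If the buffer fits comfortably in memory, the simplest form of Chris's advice is `re.findall(pattern, bytes(buf).decode("utf-8"))`. The sketch below shows one possible way to decode in chunks instead. `find_all_decoded` and its `chunk_size` parameter are hypothetical names, and it assumes UTF-8 input and that no match spans a line break. It therefore searches only complete lines and carries any partial line over to the next chunk. The incremental decoder from the standard `codecs` module takes care of a multi-byte character that gets cut in half at a chunk boundary.

```python
import codecs
import re

def find_all_decoded(buf, pattern, chunk_size=1 << 20):
    """Decode a UTF-8 bytes-like buffer chunk by chunk and return all matches.

    Hypothetical helper. Assumption: no match spans a line break, so only
    complete lines are searched and a trailing partial line is carried over.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    regex = re.compile(pattern)
    view = memoryview(buf)  # slicing a memoryview avoids copying the buffer
    results = []
    pending = ""
    for start in range(0, len(view), chunk_size):
        # The decoder holds back an incomplete multi-byte character at the
        # end of a chunk until the rest of it arrives in the next chunk
        pending += decoder.decode(view[start:start + chunk_size])
        # Search everything up to the last newline; keep the rest for later
        complete, sep, pending = pending.rpartition("\n")
        results.extend(m.group() for m in regex.finditer(complete + sep))
    # final=True flushes the decoder and raises if the buffer ends mid-character
    pending += decoder.decode(b"", final=True)
    results.extend(m.group() for m in regex.finditer(pending))
    return results
```

For example, `find_all_decoded(data, r"\w+", chunk_size=5)` returns the same words as `re.findall(r"\w+", data.decode("utf-8"))`, even when the 5-byte chunks split characters such as "é" or "ç". Patterns that need to see across lines would require a different carry-over strategy.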

That's what I was afraid of. 
It would be nice if the "world" could commit itself to one standard,
but I'm afraid that won't happen in my lifetime. :-(

Thx
Eren
-- 
https://mail.python.org/mailman/listinfo/python-list
