On 2022-05-07 19:35, Marco Sulla wrote:
> On Sat, 7 May 2022 at 19:02, MRAB <pyt...@mrabarnett.plus.com> wrote:
>
> On 2022-05-07 17:28, Marco Sulla wrote:
> > On Sat, 7 May 2022 at 16:08, Barry <ba...@barrys-emacs.org> wrote:
> >> You need to handle the file in bin mode and do the handling of line
> >> endings and encodings yourself. It’s not that hard for the cases you wanted.
> >
> >>>> "\n".encode("utf-16")
> > b'\xff\xfe\n\x00'
> >>>> "".encode("utf-16")
> > b'\xff\xfe'
> >>>> "a\nb".encode("utf-16")
> > b'\xff\xfea\x00\n\x00b\x00'
> >>>> "\n".encode("utf-16").lstrip("".encode("utf-16"))
> > b'\n\x00'
> >
> > Can I use the last trick to get the encoding of a LF or a CR in any
> > encoding?
>
> In the case of UTF-16, it's 2 bytes per code unit, but those 2 bytes
> could be little-endian or big-endian.
>
> As you didn't specify which you wanted, it defaulted to little-endian
> and added a BOM (U+FEFF).
>
> If you specify which endianness you want with "utf-16le" or "utf-16be",
> it won't add the BOM:
>
>  >>> # Little-endian.
>  >>> "\n".encode("utf-16le")
> b'\n\x00'
>  >>> # Big-endian.
>  >>> "\n".encode("utf-16be")
> b'\x00\n'

> Well, ok, but I need a generic method to get LF and CR for any
> encoding a user can input.
> Do you think that
>
> "\n".encode(encoding).lstrip("".encode(encoding))
>
> is good for any encoding?

'.lstrip' is the wrong method to use because it treats its argument as a set of characters, so it might strip off too many characters. A better choice is '.removeprefix'.
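
For example (a quick sketch at the interpreter; bytes.removeprefix() needs
Python 3.9 or later):

>>> "\xff".encode("utf-16")
b'\xff\xfe\xff\x00'
>>> # .lstrip treats b'\xff\xfe' as a *set* of byte values, so here it
>>> # also eats the first byte of the '\xff' code unit:
>>> "\xff".encode("utf-16").lstrip("".encode("utf-16"))
b'\x00'
>>> # .removeprefix removes only the exact BOM prefix:
>>> "\xff".encode("utf-16").removeprefix("".encode("utf-16"))
b'\xff\x00'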

> Furthermore, is there a way to get the encoding of an opened file object?

How was the file opened?


If it was opened as a text file, use the '.encoding' attribute (which just tells you what encoding was specified when it was opened, and you'd be assuming that it's the correct one).
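
For instance (the file name is made up; this only echoes back whatever
encoding open() was given, or the locale default it fell back to):

>>> f = open("example.txt", encoding="utf-16")
>>> f.encoding
'utf-16'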


If it was opened as a binary file, all you know is that it contains bytes, and determining the encoding (assuming that it is a text file) is down to heuristics (i.e. guesswork).
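
The most you can usually do up front is check for a BOM; anything past that
is guessing (or a third-party detector). A rough sketch (the helper name is
mine, not stdlib):

import codecs

def sniff_bom(raw):
    """Guess an encoding from a leading BOM, if any. Purely a heuristic:
    returning None just means 'no BOM found', not 'not text'."""
    # Check the longer BOMs first so UTF-32-LE isn't mistaken for
    # UTF-16-LE (both start with b'\xff\xfe').
    for bom, name in [
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
        (codecs.BOM_UTF8, "utf-8-sig"),
    ]:
        if raw.startswith(bom):
            # These codec names consume the BOM themselves when decoding.
            return name
    return None

You'd feed it the first 4 bytes of the file, e.g. sniff_bom(f.read(4)) for a
file opened in binary mode.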

--
https://mail.python.org/mailman/listinfo/python-list
