[Python-ideas] Re: TextIOWrapper support for null-terminated lines

2020-10-25 Thread Chris Angelico
On Mon, Oct 26, 2020 at 10:47 AM Cameron Simpson  wrote:
>
> On 26Oct2020 09:45, Chris Angelico  wrote:
> >On Mon, Oct 26, 2020 at 8:44 AM Cameron Simpson  wrote:
> >> On 24Oct2020 13:37, Dan Sommers <2qdxy4rzwzuui...@potatochowder.com> wrote:
> >> >Spaces in filenames are just as bad, and much more common:
> >>
> >> But much easier to handle in simple text listings, which are newline 
> >> delimited.
> >> You're really running into a horrible behaviour from xargs, which is one
> >> reason why GNU parallel exists.
> >
> >I don't consider the behaviour horrible, and xargs isn't the only
> >thing to do this - other tools can be put into zero-termination mode
> >too.
>
> I'm not talking about -print0 and -0, which I merely dislike as a hack
> to accommodate badly named filenames, but xargs' non-0 behaviour, which
> splits on whitespace. Instead of newlines. That pissed me off enough to
> write my own.
>

Ohh, I see what you mean. Yeah, newlines would be a better default for
a lot of situations. Can't be changed now.

ChrisA
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/OCQ6OTLSWBPOHHVBZFK3Z35RIOSK35PO/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: TextIOWrapper support for null-terminated lines

2020-10-25 Thread Cameron Simpson
On 26Oct2020 09:45, Chris Angelico  wrote:
>On Mon, Oct 26, 2020 at 8:44 AM Cameron Simpson  wrote:
>> On 24Oct2020 13:37, Dan Sommers <2qdxy4rzwzuui...@potatochowder.com> wrote:
>> >Spaces in filenames are just as bad, and much more common:
>>
>> But much easier to handle in simple text listings, which are newline 
>> delimited.
>> You're really running into a horrible behaviour from xargs, which is one
>> reason why GNU parallel exists.
>
>I don't consider the behaviour horrible, and xargs isn't the only
>thing to do this - other tools can be put into zero-termination mode
>too.

I'm not talking about -print0 and -0, which I merely dislike as a hack
to accommodate badly named filenames, but xargs' non-0 behaviour, which
splits on whitespace. Instead of newlines. That pissed me off enough to
write my own.
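The thread doesn't show Cameron's replacement tool. The following is a hypothetical sketch of the idea: a minimal xargs-like runner in Python that treats only the newline as an argument separator. `batched_argv`, `run_lines` and the batch size of 100 are made up for this illustration. Unlike real xargs, it applies no command-line length limit and no quote processing.

```python
import subprocess
from typing import Iterable, Iterator


def batched_argv(cmd: list[str], lines: Iterable[str],
                 max_args: int = 100) -> Iterator[list[str]]:
    """Yield argv lists: `cmd` plus up to `max_args` input lines each.

    Only the newline ends an argument, so embedded spaces
    (e.g. "./foo bar") survive intact.
    """
    batch: list[str] = []
    for line in lines:
        arg = line.rstrip("\n")   # strip only the line terminator
        if not arg:
            continue              # ignore blank lines
        batch.append(arg)
        if len(batch) == max_args:
            yield cmd + batch
            batch = []
    if batch:
        yield cmd + batch


def run_lines(cmd: list[str], lines: Iterable[str]) -> int:
    """Run `cmd` over newline-delimited `lines`; return 1 if any run failed."""
    status = 0
    for argv in batched_argv(cmd, lines):
        if subprocess.run(argv).returncode != 0:
            status = 1
    return status
```

For example, `run_lines(["ls", "-l"], sys.stdin)` would handle `find . -name 'foo bar' -print` output correctly. It would still break on names that contain a newline.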

[...]
>If you actually DO need to read null-terminated records from a file
>that's too big for memory, it's probably worth just rolling your own
>buffering, reading a chunk at a time and splitting off the interesting
>parts. It's not hugely difficult, and it's a good exercise to do now
>and then.

Aye. That's what my cs.buffer.CornuCopyBuffer class does for me:

https://pypi.org/project/cs.buffer/

aimed particularly at parsing binary data easily (it takes any iterable
of bytes, and has a few factories to start from a file etc).

Parsing a NUL terminated string from binary data isn't too bad given
such a thing.
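When the data is already in a `bytes` object, the core step is small: find the next NUL with `bytes.index()` and slice up to it. This is a minimal sketch that does not use cs.buffer. `read_cstring` and the record layout below are hypothetical, not cs.buffer's API.

```python
import struct


def read_cstring(data: bytes, offset: int = 0) -> tuple[bytes, int]:
    """Parse a NUL-terminated string starting at `offset` in `data`.

    Returns the string (without its NUL) and the offset just past the NUL.
    Raises ValueError if there is no terminator.
    """
    end = data.index(b"\0", offset)  # ValueError if no NUL follows
    return data[offset:end], end + 1


# Hypothetical record: two NUL-terminated strings, then a big-endian u16.
record = b"name\0value\0" + struct.pack(">H", 42)
key, pos = read_cstring(record)
value, pos = read_cstring(record, pos)
(count,) = struct.unpack_from(">H", record, pos)
```

Returning the new offset means you can chain calls with `struct.unpack_from()` without copying the rest of the buffer.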

Cheers,
Cameron Simpson 
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/2G5RBMJYUWKFC7R5CO2VKODKJ2GZPA2H/


[Python-ideas] Re: TextIOWrapper support for null-terminated lines

2020-10-25 Thread Random832
On Sun, Oct 25, 2020, at 18:45, Chris Angelico wrote:
> If you actually DO need to read null-terminated records from a file
> that's too big for memory, it's probably worth just rolling your own
> buffering, reading a chunk at a time and splitting off the interesting
> parts. It's not hugely difficult, and it's a good exercise to do now
> and then. And yes, I can see the temptation to get Python to do it,
> but unfortunately, newline support is such a weird mess of
> cross-platform support that I don't think it needs to be made more
> complicated :)

Maybe a getdelim method that ignores all the newline support complexity and 
just reads until it reaches the specified character? It would make sense on 
binary files too.

The problem with rolling your own buffering is that there's not really a good 
way to put back the unused data after the delimiter if you're mixing this 
processing with something else. You'd have to do it a character at a time, 
which would be very inefficient in pure python.
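For binary files, the put-back problem has a partial workaround. `io.BufferedReader.peek()` returns the already-buffered bytes without advancing the position. A reader can therefore consume exactly through the delimiter and leave the rest in the buffer for other code. This is a minimal sketch that assumes a single-byte delimiter and a file opened with mode "rb" (which gives a BufferedReader). `getdelim` here is a hypothetical helper, not an existing `io` method, and it does not help with `TextIOWrapper`.

```python
import io


def getdelim(f: io.BufferedReader, delim: bytes = b"\0") -> bytes:
    """Read up to and including the next `delim`, leaving later bytes unread.

    Returns b"" at EOF; a final record may lack the delimiter.
    """
    if len(delim) != 1:
        raise ValueError("this sketch supports single-byte delimiters only")
    parts = []
    while True:
        # peek() returns buffered bytes (doing at most one raw read if the
        # buffer is empty) without moving the file position.
        buffered = f.peek()
        if not buffered:
            break  # EOF
        idx = buffered.find(delim)
        if idx >= 0:
            # Consume exactly through the delimiter; the rest stays buffered.
            parts.append(f.read(idx + 1))
            break
        parts.append(f.read(len(buffered)))
    return b"".join(parts)
```

Each step reads a whole buffer at a time, not one character at a time, and after the call the file position sits just past the delimiter.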
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/CXZWUKIIJNGP7EDXG7P3CHZKF3XW2P6P/


[Python-ideas] Re: TextIOWrapper support for null-terminated lines

2020-10-25 Thread Chris Angelico
On Mon, Oct 26, 2020 at 8:44 AM Cameron Simpson  wrote:
>
> On 24Oct2020 13:37, Dan Sommers <2qdxy4rzwzuui...@potatochowder.com> wrote:
> >On 2020-10-24 at 12:29:01 -0400,
> >Brian Allen Vanderburg II via Python-ideas  wrote:
> >
> >> ... Find can output its filenames in null-terminated lines since it
> >> is possible to have newlines in a filename (yuck) ...
> >
> >Spaces in filenames are just as bad, and much more common:
>
> But much easier to handle in simple text listings, which are newline 
> delimited.
>
> You're really running into a horrible behaviour from xargs, which is one
> reason why GNU parallel exists.
>

I don't consider the behaviour horrible, and xargs isn't the only
thing to do this - other tools can be put into zero-termination mode
too.

But it's pretty rare to consume huge amounts of data in this way
(normally it'll just be a list of file names), so what I would do is
simply read the entire thing, then split on "\0". It's not like
reading a gigabyte of log file, where you really want to work line by
line and not read in more than you need; it's easily going to fit into
memory.
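Here is a minimal sketch of the slurp-and-split approach. It works on bytes because POSIX filenames are byte strings, and `os.fsdecode()` can convert each one to a str afterwards. The helper names are illustrative.

```python
from typing import BinaryIO


def split_nul_records(data: bytes) -> list[bytes]:
    """Split NUL-terminated records, e.g. the output of `find -print0`."""
    records = data.split(b"\0")
    # Every record is terminated, so the final element is the empty string
    # after the last NUL; drop it rather than treating it as a name.
    if records and records[-1] == b"":
        records.pop()
    return records


def read_nul_records(f: BinaryIO) -> list[bytes]:
    """Read a whole binary stream into memory, then split it."""
    return split_nul_records(f.read())
```

For `find . -print0 | python script.py`, call `read_nul_records(sys.stdin.buffer)`.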

If you actually DO need to read null-terminated records from a file
that's too big for memory, it's probably worth just rolling your own
buffering, reading a chunk at a time and splitting off the interesting
parts. It's not hugely difficult, and it's a good exercise to do now
and then. And yes, I can see the temptation to get Python to do it,
but unfortunately, newline support is such a weird mess of
cross-platform support that I don't think it needs to be made more
complicated :)
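The following is a minimal sketch of the chunked approach described above. It assumes a binary stream, so no newline translation is involved. `iter_records` and `chunk_size` are illustrative names.

```python
import io
from typing import BinaryIO, Iterator


def iter_records(f: BinaryIO, delim: bytes = b"\0",
                 chunk_size: int = 65536) -> Iterator[bytes]:
    """Yield delimiter-terminated records from a binary stream, chunk by chunk."""
    pending = b""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Split off every complete record; the tail may be a partial record.
        *complete, pending = pending.split(delim)
        yield from complete
    if pending:
        # Trailing data with no terminator: treat it as a final record.
        yield pending


demo = list(iter_records(io.BytesIO(b"a\0bb\0ccc\0"), chunk_size=2))
```

The generator reads ahead, so it effectively owns the stream. A record much larger than `chunk_size` is also re-scanned on every read, and a more careful version would search only the newly added bytes.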

ChrisA
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/5VGYDJ4RZRWQWHBMSQZUD5QJUHVF2J66/


[Python-ideas] Re: TextIOWrapper support for null-terminated lines

2020-10-25 Thread Cameron Simpson
On 24Oct2020 13:37, Dan Sommers <2qdxy4rzwzuui...@potatochowder.com> wrote:
>On 2020-10-24 at 12:29:01 -0400,
>Brian Allen Vanderburg II via Python-ideas  wrote:
>
>> ... Find can output its filenames in null-terminated lines since it
>> is possible to have newlines in a filename (yuck) ...
>
>Spaces in filenames are just as bad, and much more common:

But much easier to handle in simple text listings, which are newline delimited.

You're really running into a horrible behaviour from xargs, which is one 
reason why GNU parallel exists.

Cheers,
Cameron Simpson 
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/6EO37LQLQWTZDJQA3FRD4FQSC7IOHKYU/


[Python-ideas] Re: TextIOWrapper support for null-terminated lines

2020-10-24 Thread 2QdxY4RzWzUUiLuE
On 2020-10-24 at 12:29:01 -0400,
Brian Allen Vanderburg II via Python-ideas  wrote:

> ... Find can output its filenames in null-terminated lines since it
> is possible to have newlines in a filename (yuck) ...

Spaces in filenames are just as bad, and much more common:

$ touch 'foo bar'
$ find . -name 'foo bar'
./foo bar
$ find . -name 'foo bar' -print | xargs ls -l
ls: cannot access './foo': No such file or directory
ls: cannot access 'bar': No such file or directory
$ find . -name 'foo bar' -print0 | xargs -0 ls -l
-rw-r--r-- 1 dan dan 0 Oct 24 13:31 './foo bar'
$ rm 'foo bar'
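Here is the same comparison in Python, splitting one listing three ways. This is only an illustration. Real xargs also interprets quotes and backslashes, which `str.split()` does not.

```python
# Two awkward but legal filenames: one with a space, one with a newline.
names = ["./foo bar", "./two\nlines"]

newline_listing = "".join(n + "\n" for n in names)  # roughly what find -print emits
nul_listing = "".join(n + "\0" for n in names)      # what find -print0 emits

# Whitespace splitting, roughly xargs' default: both names are mangled.
by_whitespace = newline_listing.split()
# Newline splitting keeps "./foo bar" but still breaks the newline name.
by_newline = newline_listing.splitlines()
# NUL splitting round-trips both; drop the empty string after the final NUL.
by_nul = nul_listing.split("\0")[:-1]

print(by_whitespace)  # ['./foo', 'bar', './two', 'lines']
print(by_newline)     # ['./foo bar', './two', 'lines']
print(by_nul)         # ['./foo bar', './two\nlines']
```

Only the NUL split is lossless, because NUL is one of the two bytes (with "/") that cannot appear in a POSIX filename component.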
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/F5UX5CL7YQIHEX3MP5R4GUVHIXCS5VQP/