Yes on systems language. But also, why read IP packets as a bytes iterator?
Reading the whole 64k into a bytes object is trivial memory, and then it's
random access anyway.
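For instance, once the whole packet sits in a bytes object, any header field is a constant-time read. A minimal sketch (the field offset follows RFC 791; the sample packet and the `total_length` name are just for illustration):

```python
import struct

# Minimal sketch: with the whole packet in a bytes object, every header
# field is random access -- no next() calls, no advancing.
def total_length(packet: bytes) -> int:
    # Bytes 2-3 of the IPv4 header hold the total length, big-endian.
    (length,) = struct.unpack_from("!H", packet, 2)
    return length

# A hand-built 20-byte IPv4 header: version/IHL 0x45, total length 40.
sample = bytes([0x45, 0x00, 0x00, 0x28]) + bytes(16)
print(total_length(sample))  # -> 40
```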

I was trying to construct a scenario where this could be faster. Maybe the
network is VERY slow and the whole buffer isn't available at once. But then
.advance() still has to block on the bytes anyway. I suppose you can invent
a story where the first 32k bytes of each packet arrive quickly on the
network, then the other 32k bytes are slow.

It feels like you have to fine-tune a very unusual scenario to produce even
a minor benefit.
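And skipping ahead in a generic iterator is already a one-liner today, via the "consume" recipe from the itertools docs, which is roughly what the proposed .advance() would do (the `advance` name here is mine, mirroring the proposal):

```python
from itertools import islice

# The "consume" recipe from the itertools documentation: discard
# exactly n items from an iterator without materializing a list.
def advance(it, n):
    next(islice(it, n, n), None)

it = iter(range(100))
advance(it, 28)      # throw away 28 items, like the hypothetical .advance(28)
print(next(it))      # -> 28
```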

On Wed, Oct 7, 2020, 7:06 PM Caleb Donovick <donov...@cs.stanford.edu>
wrote:

> > This is BARELY more plausible as a real-world case.  Throwing away 28
> bytes with a bunch of next() calls is completely trivial in time.  A case
> where some implementation could conceivably save measurable time would
> require skipping 100s of thousands or millions of next() calls... and
> probably calls that actually took some work to compute to matter, even
> there.
>
> Seeing as an IP packet could in theory be as large as 64K,
> `stream.advance(total_length - 32 - len(header))` could be skipping ~65,000
> next calls. (Although in practice packets are very unlikely to exceed 1,500
> bytes because of the ethernet standard.)  In any event, avoiding ~1500 next
> calls per packet is hardly insignificant if you want to process more than a
> handful of packets.
>
> Now a better objection to my example is that this sort of code does not
> belong in Python and should be in a systems language, an argument I would
> agree with.
>
>
> -- Caleb Donovick
>
>
> On Wed, Oct 7, 2020 at 3:38 PM David Mertz <me...@gnosis.cx> wrote:
>
>> On Wed, Oct 7, 2020 at 6:24 PM Caleb Donovick <donov...@cs.stanford.edu>
>> wrote:
>>
>>> Itertools.count was an example (hence the use of "e.g.") of an iterator
>>> which can be efficiently
>>> advanced without producing intermediate state. Clearly anyone can
>>> advance it manually.
>>> My point is that an iterator may have an efficient way to calculate its
>>> state some point in the future
>>> without needing to calculate the intermediate state.
>>>
>>
>> Yes, this is technically an example.  But this doesn't get us any closer
>> to a real-world use case.  If you want an iterator that counts from N, the
>> spelling `count(N)` exists now.  If you want to start counting N
>> elements later than wherever you are now, I guess do:
>>
>> new_count = count(next(old_count) + N)
>>
>> For example the fibonacci sequence has a closed
>>> form formula for the nth element and hence could be advanced efficiently.
>>>
>>
>> Sure.  And even more relevantly, if you want the Nth Fibonacci you can
>> write a closed-form function `nth_fib()` to get it.  These are toy examples
>> where *theoretically* a new magic method could be used, but they're not
>> close to a use case that would motivate changing the language.
>>
>> ```
>>> def get_tcp_headers(stream: Iterator[Byte]):
>>>     while stream:
>>>         # Move to the total length field of the IP header
>>>         stream.advance(2)
>>>         # record the total length (two bytes)
>>>         total_length = ...
>>>         # skip the rest of IP header
>>>         stream.advance(28)
>>>         # record the TCP header
>>>         header = ...
>>>         yield header
>>>         stream.advance(total_length - 32 - len(header))
>>> ```
>>>
>>
>> This is BARELY more plausible as a real-world case.  Throwing away 28
>> bytes with a bunch of next() calls is completely trivial in time.  A case
>> where some implementation could conceivably save measurable time would
>> require skipping 100s of thousands or millions of next() calls... and
>> probably calls that actually took some work to compute to matter, even
>> there.
>>
>> What you'd need to motivate the new API is a case where you might skip a
>> million items in an iterator, and yet the million-and-first item is
>> computable without computing all the others.  Ideally something where each
>> of those million calls does something more than just copy a byte from a
>> kernel buffer.
>>
>> I don't know that such a use case does not exist, but nothing comes to my
>> mind, and no one has suggested one in this thread.  Otherwise,
>> itertools.islice() completely covers the situation already.
>>
>> --
>> The dead increasingly dominate and strangle both the living and the
>> not-yet born.  Vampiric capital and undead corporate persons abuse
>> the lives and control the thoughts of homo faber. Ideas, once born,
>> become abortifacients against new conceptions.
>>
>
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/ERVDXDBMAPZYVCB36NS2SUEINURVUZBA/
Code of Conduct: http://python.org/psf/codeofconduct/
