Re: Optimizing list processing

Terry Reedy Thu, 12 Dec 2013 10:43:01 -0800

On 12/12/2013 7:08 AM, Steven D'Aprano wrote:

Please don't focus on the algorithm I gave. Focus on the fact that I
could write it like this:


if some condition to do with the computer's available memory:
      make modifications in place
else:
      make a copy of the data containing the modifications

if only I knew what that condition was and how to test for it.

In other words, you want a magic formula that depends on just about
everything: the (static) memory in the machine, the other programs
running, the memory otherwise used by the current program, the number
of items, the nature of the items, and possibly the memory
fragmentation state.
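
To illustrate just two of those factors (the number of items and the nature of the items), here is a rough sketch of estimating how much memory one list holds. The helper name approx_list_bytes is made up for illustration, the count is shallow (nested containers are not followed), and it says nothing about the other factors such as free RAM, other programs, or fragmentation.

```python
import sys


def approx_list_bytes(items):
    """Rough, shallow estimate of the bytes held by a list and its items.

    Hypothetical helper for illustration only: it does not follow
    references inside the items, so nested containers are undercounted.
    """
    # getsizeof on the list covers the list header and pointer array only.
    total = sys.getsizeof(items)
    # Add each distinct item once, so shared objects are not double counted.
    seen = set()
    for obj in items:
        if id(obj) not in seen:
            seen.add(id(obj))
            total += sys.getsizeof(obj)
    return total


if __name__ == "__main__":
    print(approx_list_bytes(list(range(1000))))
    print(approx_list_bytes([str(i) * 10 for i in range(1000)]))
```

Even this much depends on the Python build and version, which is part of why a general formula is hard to write down.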

You may be right in this particular case, I don't know, I haven't tried
it, but there are cases where list comps are significantly faster than
generator expressions.


Stefan noted that neither is needed if zip produces the items wanted.

I'm not terribly fussed about micro-optimizations here. I'm concerned
about *macro-optimizing* the case where creating two (or more) lists
forces the computer to page memory in and out of virtual memory. Saving a
few milliseconds? I don't care. Saving 5 or 6 seconds? That I care about.

Why would you spend an hour or more of your time to save 5 seconds?
Theoretical interest or a practical problem with enough runs to turn
seconds into hours or days? A non-trivial practical answer will require
more info about the practical context, including the distribution of
machine memory sizes and of problem sizes.

I don't want to pre-allocate the list comp. I want to avoid having to
have two LARGE lists in memory at the same time, one containing the
decorated values, and one not. I know that this is a good optimization
for large enough lists. On my computer, ten million items is sufficient
to demonstrate the optimization, and with sufficient testing I could
determine a rough cut-off value, below which list comps are more
efficient and above which in-place modifications are better.

But I don't know how to generalize that cut-off value. If I buy extra
RAM, the cut-off value will shift. If I run it on another computer, it
will shift.

The simplest answer is to always avoid catastrophe by always modifying
the one list. For 'short' lists, the time difference will be relatively
small.
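
A minimal sketch of what "always modifying the one list" could look like for a decorate/sort/undecorate pass. This is an assumption about the kind of processing being discussed, not the algorithm from the earlier message, and the function name is made up. The index is included in each tuple only to break ties without comparing the items themselves.

```python
def sort_by_key_in_place(data, key):
    """Decorate, sort, and undecorate a list without building a second list."""
    # Decorate: overwrite each slot with (key, original index, item).
    for i, item in enumerate(data):
        data[i] = (key(item), i, item)
    # Sort the same list object in place.
    data.sort()
    # Undecorate: overwrite each slot with the original item again.
    for i, decorated in enumerate(data):
        data[i] = decorated[-1]


if __name__ == "__main__":
    words = ["pear", "fig", "banana"]
    sort_by_key_in_place(words, len)
    print(words)
```

Only one list exists at any time, although each decorated tuple is an extra small object while the sort is running.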

If the fraction of short lists is large enough to make the cumulative
time difference large enough, and the list object sizes are more or less
constant or at least bounded, a possible answer is to pick a minimum
memory size, get a machine with that size, and work out a list size
small enough to avoid thrashing.
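
That second approach might be sketched as follows. CUTOFF_ITEMS is a hypothetical number that would have to be measured on the chosen minimum machine; nothing here computes it, and the function name is likewise made up.

```python
# Hypothetical cut-off, to be determined by testing on the minimum
# supported machine. The value below is a placeholder, not a measurement.
CUTOFF_ITEMS = 1_000_000


def process(data, func, cutoff=CUTOFF_ITEMS):
    """Apply func to every item, copying short lists and mutating long ones."""
    if len(data) < cutoff:
        # Short list: build a new list and leave the original untouched.
        return [func(item) for item in data]
    # Long list: overwrite in place so only one large list exists.
    for i, item in enumerate(data):
        data[i] = func(item)
    return data


if __name__ == "__main__":
    print(process([1, 2, 3], lambda x: x * 2))
```

Note that the two branches differ in whether the caller's list is modified, so the caller has to use the return value and not rely on the original afterwards.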


--
Terry Jan Reedy

--
https://mail.python.org/mailman/listinfo/python-list
