On 12/12/2013 7:08 AM, Steven D'Aprano wrote:
Please don't focus on the algorithm I gave. Focus on the fact that I
could write it like this:
if some condition to do with the computer's available memory:
make modifications in place
else:
make a copy of the data containing the modifications
if only I knew what that condition was and how to test for it.
In other words, you want a magic formula that depends on just about
everything: the (static) memory in the machine, the other programs
running, the memory otherwise used by the current program, the number
items, the nature of the items, and possibly the memory fragmentation state.
You may be right in this particular case, I don't know I haven't tried
it, but there are cases where list comps are significantly faster than
generator expressions.
Stefan noted that neither is needed if zip produces the items wanted.
I'm not terribly fussed about micro-optimizations here. I'm concerned
about *macro-optimizing* the case where creating two (or more) lists
forces the computer to page memory in and out of virtual memory. Saving a
few milliseconds? I don't care. Saving 5 or 6 seconds? That I care about.
Why would you spend an hour or more of your time to save 5 seconds?
Theoretical interest or a practical problem with enough runs to turns
seconds into hours or days? A non-trivial practical answer will require
more info about the practical context, including the distribution of
machine memory sizes and of problem sizes.
I don't want to pre-allocate the list comp. I want to avoid having to
have two LARGE lists in memory at the same time, one containing the
decorated values, and one not. I know that this is a good optimization
for large enough lists. On my computer, ten million items is sufficient
to demonstrate the optimization, and with sufficient testing I could
determine a rough cut-off value, below which list comps are more
efficient and above which in-place modifications are better.
But I don't know how to generalize that cut-off value. If I buy extra
RAM, the cut-off value will shift. If I run it on another computer, it
will shift.
The simplest answer is to always avoid catastrophe by always modifying
the one list. For 'short' lists, the time difference will be relatively
small.
If the fraction of short lists is large enough to make the cumulative
time difference large enough, and the list object sizes are more or less
constant or at least bounded, a possible answer is to pick a minimum
memory size, get a machine with that size, and work out a list size
small enough to avoid thrashing.
--
Terry Jan Reedy
--
https://mail.python.org/mailman/listinfo/python-list