On 12/11/2013 6:54 PM, Steven D'Aprano wrote:
I have some code which produces a list from an iterable using at least
one temporary list, using a Decorate-Sort-Undecorate idiom.

It is a non-standard version thereof, as DSU usually means to decorate with a key that gets discarded.

A couple of examples of input and expected output would have been good ;-).

The algorithm looks something like this (simplified):

table = sorted([(x, i) for i,x in enumerate(iterable)])

This makes two temporaries when only one is needed, and shows why we added generator expressions.

table = sorted((x, i) for i,x in enumerate(iterable))
is equivalent to the table; table.sort lines below.

The following (untested) should be faster as it avoids tuple unpacking and repacking.

from itertools import count
table = sorted(t for t in zip(iterable, count))

> table = [i for x,i in table]

Did your original un-simplified use zip instead enumerate? Table now has the original index of the items in iterable sorted by the items value. The use for this is not obvious.

The problem here is that for large iterables, say 10 million items or so,
this is *painfully* slow, as my system has to page memory like mad to fit
two large lists into memory at once. So I came up with an in-place
version that saves (approximately) two-thirds of the memory needed.

With 1/3 saved by using a genex, it saves 1/2 of the remainder.

table = [(x, i) for i,x in enumerate(iterable)]
table.sorted()

(You meant table.sort().) This is an expansion of sorted(genex). It might be slightly faster as list comp may be faster than list(equivalent genex).

for x, i in table:
     table[i] = x

I cannot understand what you are aiming for here. Besides the fact that this does not work, it keeps x values rather than i indexes as before.

> table = [i for x,i in table]

done in place is

for j, (x,i) in enumerate(table):
  table[j] = i

I expect the list comp be faster than in-place as long as the result list can be allocated and held in memory without paging. (This of course depends on system memory and other memory uses.) List iterators have a __length_hint__ method giving the length of the underlying list, so the list comp result list can potentially be allocated once and then filled in by enumeration and replacement, but in C rather than Python code.

--
Terry Jan Reedy

--
https://mail.python.org/mailman/listinfo/python-list

Reply via email to