Re: Optimisation Hints (dict processing and strings)
OPQ wrote:

>>> for (2):
>>> for k in hash.keys()[:]: # Note : There may be a lot of keys here
>>>    if len(hash[k])<2:
>>>       del hash[k]
>>
>> - use the dict.iter* methods to prevent building a list in memory. You
>> shouldn't use these values directly to delete the entry as this could
>> break the iterator:
>>
>> for key in [k for (k, v) in hash.iteritems() if len(v) < 2]:
>>     del hash[key]
>
> I'm gonna try, but I think that would be overkill: a whole list has to
> be computed!

Yes, but it is smaller than the list returned by hash.keys(), so it
should be a win over what you were doing originally. Plus it avoids a
lookup (hash[k]), which may improve the speed also.

BTW I have long assumed that iterating key/value pairs of a dict using
iteritems() is faster than iterating with keys() followed by a lookup,
since the former method should be able to avoid actually hashing the key
and looking it up. I finally wrote a test, and my assumption seems to be
correct; using iteritems() is about 1/3 faster for simple keys. Here is a
simple test:

##
d = dict((i, i) for i in range(1))

def withItems(d):
    for k, v in d.iteritems():
        pass

def withKeys(d):
    for k in d:
        d[k]

from timeit import Timer
for fn in [withItems, withKeys]:
    name = fn.__name__
    timer = Timer('%s(d)' % name, 'from __main__ import d, %s' % name)
    print name, timer.timeit(1000)
##

I get

withItems 0.980311184801
withKeys 1.37672944466

Kent
--
http://mail.python.org/mailman/listinfo/python-list
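Kent's test above is Python 2 (iteritems(), print statement). A rough
Python 3 equivalent, where items() returns a lightweight view, can be
written the same way; the dict size (10,000 entries) and the repetition
count here are assumptions for illustration:

```python
from timeit import Timer

# Test dict; the size (10_000 entries) is an assumption for illustration.
d = {i: i for i in range(10_000)}

def with_items(d):
    # Unpack key/value pairs straight off the items() view.
    for k, v in d.items():
        pass

def with_keys(d):
    # Iterate keys, then pay for a separate hash-and-lookup per key.
    for k in d:
        d[k]

for fn in (with_items, with_keys):
    timer = Timer(lambda: fn(d))          # Timer accepts a callable
    print(fn.__name__, timer.timeit(number=100))
```

The relative gap is the interesting part, not the absolute timings, which
depend entirely on the machine.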
Re: Optimisation Hints (dict processing and strings)
OPQ wrote:

>> - Try if it isn't faster to iterate using items instead of iterating
>> over keys
>
> items are huge lists of numbers. keys are simple small strings. And
> even if it is faster, how can I find the key back, in order to delete
> it ?
>
> for v in hashh.items():
>     if len(v)<2:
>         del ???

To elaborate on the memory requirements of .keys() vs .items():

.keys() creates a new list of n objects. The objects are additional
references to the existing keys.

.items() also creates a new list of n objects. These objects are tuples
of references, one to the key and one to the value. Only references are
used, so it doesn't matter how large the value actually is. Whether the
tuples are created for the items() call or already exist depends on the
implementation of the dictionary. Trying to infer this by using
sys.getrefcount got me inconclusive results.

> I'm gonna try, but I think that would be overkill: a whole list has to
> be computed!
>
> Maybe with genexps ..
>
> for key in (k for (k,v) in hash.iteritems() if len(v)<2)

Using only iterators has problems:

    for k, v in hash.iteritems():
        if len(v) < 2:
            del hash[k]

You are changing hash while you iterate over it; this very often breaks
the iterator.

If you are memory bound, maybe a database like SQLite is really the way
to go. Or you could write the keys to a temporary file in the loop and
then write a second loop that reads the keys and deletes them from hash.

Daniel
--
http://mail.python.org/mailman/listinfo/python-list
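In modern Python the breakage Daniel warns about surfaces as an explicit
RuntimeError, so the two-pass pattern (collect the doomed keys first,
then delete) remains the idiomatic fix. A minimal sketch with a toy dict
(the data is made up):

```python
data = {"a": [1], "b": [1, 2], "c": [3]}

# Deleting while iterating raises RuntimeError in modern Python.
try:
    for k, v in data.items():
        if len(v) < 2:
            del data[k]
except RuntimeError:
    pass  # "dictionary changed size during iteration"

# Safe two-pass version: materialize only the keys to delete.
doomed = [k for k, v in data.items() if len(v) < 2]
for k in doomed:
    del data[k]

print(data)  # → {'b': [1, 2]}
```

The list of doomed keys is usually much smaller than keys(), which is
exactly the point made above.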
Re: Optimisation Hints (dict processing and strings)
On 29 Mar 2005 23:41:15 -0800, [EMAIL PROTECTED] (OPQ) wrote:
[...]
>>> for k in hash.keys()[:]: # Note : There may be a lot of keys here
>>>    if len(hash[k])<2:
>>>       del hash[k]
>>
>> - Try if it isn't faster to iterate using items instead of iterating
>> over keys
>
> items are huge lists of numbers. keys are simple small strings. And
> even if it is faster, how can I find the key back, in order to delete
> it ?

items() returns a list of (k,v) tuples, so you can unpack them into the
for assignment target, e.g.,

    for k, v in hashh.items():
        if len(v) < 2:
            del hashh[k]

> for v in hashh.items():
>     if len(v)<2:
>         del ???

but to avoid a big temporary items() list, perhaps (untested)

    for k in [k for k, v in hashh.iteritems() if len(v) < 2]:
        del hashh[k]

UIAM that should build a temporary safe list only of the keys selected
for deletion.

Regards,
Bengt Richter
--
http://mail.python.org/mailman/listinfo/python-list
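An alternative to Bengt's delete-in-place pattern is to rebuild the dict
keeping only the surviving entries. Whether this beats in-place deletion
depends on how many entries survive; a sketch with made-up data:

```python
hashh = {"ab": [1, 2, 3], "cd": [4], "ef": [5, 6]}

# Rebuilding copies only references to the (possibly huge) value lists;
# the lists themselves are never duplicated.
hashh = {k: v for k, v in hashh.items() if len(v) >= 2}

print(hashh)  # → {'ab': [1, 2, 3], 'ef': [5, 6]}
```

Peak memory briefly holds both dicts, but each dict entry is just a pair
of references, so the overhead is proportional to the number of keys, not
to the size of the value lists.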
Re: Optimisation Hints (dict processing and strings)
I posted a question about string concatenation just last week. There is
plenty of discussion on it if you just search the group.
--
http://mail.python.org/mailman/listinfo/python-list
Re: Optimisation Hints (dict processing and strings)
For the record, I'm not on premature optimisation anymore. The code is
working. I just want to save hours of computing, without relying too much
on C extensions. Nevertheless, thanks for tips, clarifications and
explanations.

>> longone = longone + char # where len(char)== 1
>>
>> I know that string concatenation is time consuming, but a small test
>> on timeit seems to show that packing and creating an array for those 2
>> elements is equally time consuming
>
> - use cStringIO instead
> - or append all chars to a list and do "".join(listvar)

Yes, but as Peter said, using a list won't be useful for a single
operation. And I have to compute the string at each step of the loop. I
think listvar won't be useful. But I'm going to try cStringIO.

>> for (2):
>> for k in hash.keys()[:]: # Note : There may be a lot of keys here
>>    if len(hash[k])<2:
>>       del hash[k]
>
> - Try if it isn't faster to iterate using items instead of iterating
> over keys

items are huge lists of numbers. keys are simple small strings. And even
if it is faster, how can I find the key back, in order to delete it ?

for v in hashh.items():
    if len(v)<2:
        del ???

> - use the dict.iter* methods to prevent building a list in memory. You
> shouldn't use these values directly to delete the entry as this could
> break the iterator:
>
> for key in [k for (k, v) in hash.iteritems() if len(v) < 2]:
>     del hash[key]

I'm gonna try, but I think that would be overkill: a whole list has to be
computed! Maybe with genexps ..

for key in (k for (k,v) in hash.iteritems() if len(v)<2)

> This of course builds a list of keys to delete, which could also be
> large.

Yes. Which tells me to use genexps.

> - also: hash.keys()[:] is not necessary, hash.keys() is already a copy

Yes I got that one, but too late! Thanks!
--
http://mail.python.org/mailman/listinfo/python-list
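For the cStringIO route OPQ plans to try, the modern equivalent is
io.StringIO; a sketch (the one-character appends here are made up — in
the real code each char comes from the loop):

```python
import io

chars = "abcdef"  # stand-in for the characters produced per loop step

# Accumulate in a StringIO buffer instead of rebinding a str each step.
buf = io.StringIO()
for ch in chars:
    buf.write(ch)
longone = buf.getvalue()

# Equivalent list-append version, usually at least as fast.
parts = []
for ch in chars:
    parts.append(ch)
assert longone == "".join(parts) == "abcdef"
```

Both avoid creating a fresh string object per appended character; the
full string is only materialized once at the end.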
Re: Optimisation Hints (dict processing and strings)
OPQ wrote:
> (2)- in a dict mapping a key to a list of ints, remove every entry
> where the list of ints has length < 2
>
> So far, my attempts are
> for (2):
> for k in hash.keys()[:]: # Note : There may be a lot of keys here
>    if len(hash[k])<2:
>       del hash[k]
>
> Here again, I think the hash.keys duplication can be time *and* memory
> consuming. But still better than (I suppose - no time it yet)
> hash=dict([(k,v) for (k,v) in hash if len(v)>1])

Firstly, don't call it "hash"; you are shadowing a built-in function --
as well as giving clues to your background :-)

Secondly, you are *triplicating* the keys. Mapping.keys() returns a list.
You can safely iterate over that to change the contents of the mapping.
Read the docs: "a.keys() -- a copy of a's list of keys".
Mapping.keys()[:] gives you a copy of the copy.

Cheers,
John
--
http://mail.python.org/mailman/listinfo/python-list
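John's shadowing point can be demonstrated in a few lines (a toy sketch;
the function name opq_style is made up): once a local name hash is bound
to a dict, calling the built-in through that name fails.

```python
def opq_style():
    hash = {"spam": [1, 2]}   # shadows the built-in inside this scope
    try:
        hash("spam")          # TypeError: 'dict' object is not callable
    except TypeError:
        return "shadowed"
    return "ok"

print(opq_style())  # → shadowed
```

Outside that scope the built-in is untouched, which is why shadowing bugs
tend to surface far from where the misleading name was introduced.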
RE: Optimisation Hints (dict processing and strings)
Thanks for the correction. I didn't pause to think as I wrote that...

-Peter

> -----Original Message-----
> From: Aaron Bingham [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, March 29, 2005 11:24
> To: Peter Hansen
> Cc: python-list@python.org
> Subject: Re: Optimisation Hints (dict processing and strings)
>
> Peter Hansen <[EMAIL PROTECTED]> writes:
>
> > You've misunderstood the comments about this area.
> > String concatenation is *not* "time consuming".
> > *Repeated* concatenations *will become very time
> > consuming*, along an exponential curve. That's what the discussions
> > about O(n^2) are referring to.
>
> For the record, O(n^2) is /not/ exponential, but polynomial. A function
> with exponential complexity would have, e.g., O(A^n) for some constant
> A, which would be /much/ worse than the behavior of repeated string
> concatenation.
>
> Regards,
>
> --
> Aaron Bingham
> Software Engineer
> Cenix BioScience GmbH

--
http://mail.python.org/mailman/listinfo/python-list
Re: Optimisation Hints (dict processing and strings)
Peter Hansen <[EMAIL PROTECTED]> writes:

> You've misunderstood the comments about this area.
> String concatenation is *not* "time consuming".
> *Repeated* concatenations *will become very time
> consuming*, along an exponential curve. That's
> what the discussions about O(n^2) are referring
> to.

For the record, O(n^2) is /not/ exponential, but polynomial. A function
with exponential complexity would have, e.g., O(A^n) for some constant A,
which would be /much/ worse than the behavior of repeated string
concatenation.

Regards,

--
Aaron Bingham
Software Engineer
Cenix BioScience GmbH
--
http://mail.python.org/mailman/listinfo/python-list
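Aaron's distinction is easy to check numerically: a quadratic n**2 stays
tiny next to an exponential 2**n even for small n.

```python
# Quadratic vs exponential growth for a few small n.
for n in (10, 20, 30):
    print(n, n ** 2, 2 ** n)
# 10 100 1024
# 20 400 1048576
# 30 900 1073741824
```

At n = 30 the quadratic term is still under a thousand while the
exponential one has passed a billion, which is why conflating the two
classes overstates the cost of repeated concatenation.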
Re: Optimisation Hints (dict processing and strings)
OPQ wrote:

> for (1):
> longone = longone + char # where len(char)== 1
>
> I know that string concatenation is time consuming, but a small test
> on timeit seems to show that packing and creating an array for those 2
> elements is equally time consuming

- use cStringIO instead
- or append all chars to a list and do "".join(listvar)

> for (2):
> for k in hash.keys()[:]: # Note : There may be a lot of keys here
>    if len(hash[k])<2:
>       del hash[k]
>
> Here again, I think the hash.keys duplication can be time *and* memory
> consuming. But still better than (I suppose - no time it yet)
> hash=dict([(k,v) for (k,v) in hash if len(v)>1])

- Try if it isn't faster to iterate using items instead of iterating
  over keys
- use the dict.iter* methods to prevent building a list in memory. You
  shouldn't use these values directly to delete the entry as this could
  break the iterator:

      for key in [k for (k, v) in hash.iteritems() if len(v) < 2]:
          del hash[key]

  This of course builds a list of keys to delete, which could also be
  large.
- also: hash.keys()[:] is not necessary, hash.keys() is already a copy

Daniel
--
http://mail.python.org/mailman/listinfo/python-list
Re: Optimisation Hints (dict processing and strings)
OPQ wrote:

> I'd be happy to have you share some thoughts about ultimate
> optimisations on those 2 topics:
>
> (1)- adding one character at the end of a string (may be long)
>
> longone = longone + char # where len(char)== 1
>
> I know that string concatenation is time consuming, but a small test
> on timeit seems to show that packing and creating an array for those 2
> elements is equally time consuming

You've misunderstood the comments about this area. String concatenation
is *not* "time consuming". *Repeated* concatenations *will become very
time consuming*, along an exponential curve. That's what the discussions
about O(n^2) are referring to.

A single string concatenation is definitely going to be faster than any
amount of converting to other data structures and such. Basically, to
concatenate two strings, Python simply adds their sizes together and
allocates a new memory area of sufficient size, then copies the bytes
from each string to the new area. Not much overhead there, for a single
concatenation.

The problem comes when you try to scale this. At some point, the overhead
of allocating and deallocating all the intermediate strings becomes much
larger than other approaches would take. For example, by building a list
with references to all the strings, then using join(), you save on the
intermediate steps. Join performs the length-summation and memory
allocation steps only once, and copies each individual string into the
final area only once. Much faster, provided the overhead of growing a
list as you append() string references is small enough. And it is, since
growing a list is not O(n^2) in Python (though I don't recall exactly
what it is).

The key with this stuff is to realize that neither approach is inherently
faster: it depends on how much concatenation you are doing. It could be
that concatenating up to four strings is faster than using the
append/join technique, or it could be that it's faster up to twenty
strings.
When you're not talking hundreds of operations, and when you're not
talking about actual *measured* performance problems (in other words,
when we would call it "premature optimization"), focus on the readability
of the code and on keeping it simple and clean. Worry about the
correctness of the code, and leave optimization worries till later. (But
understand this "big O" stuff well enough to be able to avoid the real
problem areas, when possible.)

-Peter
--
http://mail.python.org/mailman/listinfo/python-list
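Peter's scaling argument can be checked with timeit; a sketch follows.
One caveat: CPython's in-place optimization for concatenating onto a
uniquely-referenced string can mask the quadratic behavior in simple
benchmarks, so the absolute numbers vary by machine and interpreter.

```python
from timeit import timeit

def by_concat(n):
    s = ""
    for _ in range(n):
        s = s + "x"  # in principle, a new string object each step
    return s

def by_join(n):
    parts = []
    for _ in range(n):
        parts.append("x")
    return "".join(parts)  # one final size computation and one copy

# Both approaches produce the same string.
assert by_concat(1_000) == by_join(1_000)

for n in (1_000, 10_000):
    print(n,
          timeit(lambda: by_concat(n), number=10),
          timeit(lambda: by_join(n), number=10))
```

If the concatenation timings grow much faster than linearly with n, you
are seeing the O(n^2) behavior the thread discusses; if they stay close
to the join timings, the interpreter's in-place optimization kicked in.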