Re: preferring [] or () in list of error codes?
Albert van der Horst writes:

> But I greatly prefer a set
>
> "
> for i in {point1,point2,point3}:
>     statements
> "

Agreed, for the reasons you cite. I think this idiom can be expected to become more common and hopefully displace using a tuple literal or list literal, as the set literal syntax becomes more reliably available on arbitrary installed Python versions.

> [Yes I know { } doesn't denote a set. I tried it. I don't know how to
> denote a set ... ]

Try it in Python 3 and be prepared to be pleased
<http://docs.python.org/3.0/whatsnew/3.0.html#new-syntax>.

-- 
 \        “Too many Indians spoil the golden egg.” —Sir Joh |
  `\                                        Bjelke-Petersen |
_o__)                                                       |
Ben Finney
-- 
http://mail.python.org/mailman/listinfo/python-list
Re: preferring [] or () in list of error codes?
In article ,
>
>But practicality beats purity -- there are many scenarios where we make
>compromises in our meaning in order to get correct, efficient code. E.g.
>we use floats, despite them being a poor substitute for the abstract Real
>numbers we mean.
>
>In addition, using a tuple or a list in this context:
>
>if e.message.code in (25401,25402,25408):
>
>is so idiomatic, that using a set in its place would be distracting.
>Rather than efficiently communicating the programmer's intention, it
>would raise in my mind the question "that's strange, why are they using a
>set there instead of a tuple?".

As a newbie I'm really expecting a set here. The only reason my mind goes in the direction of 3 items is that it makes no sense in combination with ``in''. That makes this idiom one that should be killed.

"
point1 = (0,1,0)
point2 = (1,0,0)
point3 = (0,0,1)

for i in (point1,point2, point3):
"
??? I don't think so.
At least I would do
"
for i in [point1,point2,point3]:
    statements
"
But I greatly prefer a set
"
for i in {point1,point2,point3}:
    statements
"

Because a set is unordered, this would convey to the compiler that it may evaluate the three statements concurrently. For a list I expect the guarantee that the statements are evaluated in order. For a tuple I don't know what to expect. That alone is sufficient reason not to use it here.

[Yes I know { } doesn't denote a set. I tried it. I don't know how to denote a set ... ]

>--
>Steven

Regards,
Albert.

-- 
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
alb...@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
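For anyone reading along on Python 3, where the brace syntax does denote a set, here is a minimal sketch of the three spellings discussed above. The point names come from the post; `visit` is a hypothetical helper added only for illustration. As far as I know, a set literal only leaves the iteration order unspecified; the loop body still runs sequentially, once per element, so it does not by itself license concurrent evaluation.

```python
point1 = (0, 1, 0)
point2 = (1, 0, 0)
point3 = (0, 0, 1)


def visit(points):
    """Hypothetical helper: collect the points in whatever order the container yields them."""
    seen = []
    for p in points:
        seen.append(p)
    return seen


as_tuple = visit((point1, point2, point3))
as_list = visit([point1, point2, point3])
as_set = visit({point1, point2, point3})  # set literal, Python 3 syntax

# Tuple and list both preserve the order in which the points were written.
print(as_tuple == as_list)

# The set yields the same three points, but in no guaranteed order.
print(sorted(as_set) == sorted(as_list))
```

Sorting before comparing is the design choice that matters here: any code consuming the set version must not depend on which point comes first.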
Re: preferring [] or () in list of error codes?
Mel writes:

> The immutability makes it easier to talk about the semantic meanings.
> After you do
> > event_timestamp = (2009, 06, 04, 05, 02, 03)
> there's nothing that can happen to the tuple to invalidate
> > (year, month, day, hour, minute, second) = event_timestamp
> even though, as you say, there's nothing in the tuple to inform anybody
> about the year, month, day, ... interpretation.

Also note that the stdlib ‘collections.namedtuple’ implementation
<http://docs.python.org/library/collections.html#collections.namedtuple>
essentially acknowledges this: the names are assigned in advance to index positions, tying a specific semantic meaning to each position.

-- 
 \     “Prediction is very difficult, especially of the future.” |
  `\                                                 —Niels Bohr |
_o__)                                                            |
Ben Finney
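A minimal sketch of the namedtuple point, assuming a made-up type name `Timestamp`; the field names are the ones used in the unpacking quoted above. The values are written without leading zeros, since Python 3 rejects literals such as `06`.

```python
from collections import namedtuple

# "Timestamp" is a hypothetical name chosen for this sketch.
Timestamp = namedtuple("Timestamp", "year month day hour minute second")

event_timestamp = Timestamp(2009, 6, 4, 5, 2, 3)

# Access by name and access by position refer to the same slot.
print(event_timestamp.month, event_timestamp[1])

# Plain tuple unpacking still works, because a namedtuple is a tuple subclass.
year, month, day, hour, minute, second = event_timestamp
print(isinstance(event_timestamp, tuple))
```

The names are fixed when the type is created, which is exactly the "meaning tied to position" idea, only made explicit.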
Re: preferring [] or () in list of error codes?
Gunter Henriksen wrote:
[ ... ]
> I guess to me, fundamentally, the interpretation of
> tuple as a sequence whose elements have semantic meaning
> implicitly defined by position is a relatively abstract
> interpretation whose value is dubious relative to the
> value of immutability, since it seems like a shortcut
> which sacrifices explicitness for the sake of brevity.

The immutability makes it easier to talk about the semantic meanings. After you do

> event_timestamp = (2009, 06, 04, 05, 02, 03)

there's nothing that can happen to the tuple to invalidate

> (year, month, day, hour, minute, second) = event_timestamp

even though, as you say, there's nothing in the tuple to inform anybody about the year, month, day, ... interpretation. And of course there's nothing in a C struct object that isn't in the equivalent Python tuple. The difference is that the C compiler has arranged all the outside code that uses the struct object to use it in the correct way. The only object I've found in Python that truly replaces a struct object in C is a dict with string keys -- or an object that uses such a dict as its __dict__.

Mel.
Re: preferring [] or () in list of error codes?
> > >event_timestamp = (2009, 06, 04, 05, 02, 03)
> > >(year, month, day, hour, minute, second) = event_timestamp
> >
> > [...]
>
> The point of each position having a different semantic meaning is that
> tuple unpacking works as above. You need to know the meaning of each
> position in order to unpack it to separate names, as above.
>
> So two tuples that differ only in the sequence of their items are
> different in meaning. This is unlike a list, where the sequence of items
> does *not* affect the semantic meaning of each item.

I do not feel the above is significantly different enough from

    event_timestamp = [2009, 06, 04, 05, 02, 03]
    (year, month, day, hour, minute, second) = event_timestamp

    event_timestamp = (2009, 06, 04, 05, 02, 03)
    (year, month, day, hour, minute, second) = event_timestamp

    event_timestamp = [2009, 06, 04, 05, 02, 03]
    [year, month, day, hour, minute, second] = event_timestamp

to suggest tuples are really adding significant value in this case, especially when I can do something like

    event_timestamp = (2009, 06, 04, 05, 02, 03)
    (year, month, day, hour, second, minute) = event_timestamp

and not have any indication I have done the wrong thing.

I guess to me, fundamentally, the interpretation of tuple as a sequence whose elements have semantic meaning implicitly defined by position is a relatively abstract interpretation whose value is dubious relative to the value of immutability, since it seems like a shortcut which sacrifices explicitness for the sake of brevity.

I would feel differently if it seemed unusual to find good Python code which iterates through the elements of a tuple as a variable length homogeneous ordered collection. But then I would be wishing for immutable lists...
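The two observations in this message can be checked directly. This is only a sketch, with variable names invented for illustration and the values written without leading zeros so that it runs on Python 3.

```python
stamp_tuple = (2009, 6, 4, 5, 2, 3)
stamp_list = [2009, 6, 4, 5, 2, 3]

# Unpacking behaves the same whether the source is a tuple or a list,
# and whether the target is written with parentheses or brackets.
(year_t, month_t, day_t, hour_t, minute_t, second_t) = stamp_tuple
[year_l, month_l, day_l, hour_l, minute_l, second_l] = stamp_list
print((year_t, month_t, day_t) == (year_l, month_l, day_l))

# Swapping two target names raises no error; the values are silently mislabeled.
(year, month, day, hour, second, minute) = stamp_tuple
print(minute, second)
```

After the swapped unpacking, `minute` holds the value that was stored in the seconds position, and nothing in the language flags it.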
Re: preferring [] or () in list of error codes?
Gunter Henriksen writes: > > Try, then, this tuple: > > > >event_timestamp = (2009, 06, 04, 05, 02, 03) > >(year, month, day, hour, minute, second) = event_timestamp > > I totally agree about anything to do with immutability, I think the > relative ordering of the elements in this example may be orthogonal to > the concept of a tuple as an object whose elements have a semantic > meaning implicitly defined by location in the sequence... in other > words knowing that element i+1 is in some sense ordinally smaller than > element i does not give me much information about what element i+1 > actually is. The point of each position having a different semantic meaning is that tuple unpacking works as above. You need to know the meaning of each position in order to unpack it to separate names, as above. So two tuples that differ only in the sequence of their items are different in meaning. This is unlike a list, where the sequence of items does *not* affect the semantic meaning of each item. Note that I'm well aware that the language doesn't impose this as a hard restriction; but that says more about Python's “consenting adults” philosophy than anything else. -- \ “I went to a general store. They wouldn't let me buy anything | `\ specifically.” —Steven Wright | _o__) | Ben Finney -- http://mail.python.org/mailman/listinfo/python-list
Re: preferring [] or () in list of error codes?
> Try, then, this tuple:
>
>event_timestamp = (2009, 06, 04, 05, 02, 03)
>(year, month, day, hour, minute, second) = event_timestamp
>
> A list would be wrong for this value, because each position in the
> sequence has a specific meaning beyond its mere sequential position. Yet
> it also matters to the reader that these items are in a specific
> sequence, since that's a fairly standard ordering for those items.
>
> In this case, a tuple is superior to a list because it correctly conveys
> the semantic meaning of the overall value: the items must retain their
> sequential order to have the intended meaning, and to alter any one of
> them is conceptually to create a new timestamp value.

I totally agree about anything to do with immutability. I think the relative ordering of the elements in this example may be orthogonal to the concept of a tuple as an object whose elements have a semantic meaning implicitly defined by location in the sequence... in other words knowing that element i+1 is in some sense ordinally smaller than element i does not give me much information about what element i+1 actually is.

To me a timestamp could be (date, time), or (days, seconds, microseconds) (as in datetime.timedelta()), so it is not clear to me that using a tuple as something where the semantic meaning of the element at position i should be readily apparent would be the best approach for timestamps, or enough to distinguish list and tuple (in other words I am not suggesting a dict or class).

In the case of something like (x, y) or (real, imag), or (longitude, latitude), or any case where there is common agreement and understanding, such that using names is arguably superfluous... I think in those cases the concept makes sense of a tuple as a sequence of attributes whose elements have a semantic meaning implicitly defined by position in the sequence.
My feeling is the number of cases where tuples are better than lists for that is small relative to the number of cases where tuple adds value as an immutable list. I do not mean to be suggesting that a tuple should only ever be used or thought of as a "frozenlist" though. -- http://mail.python.org/mailman/listinfo/python-list
Re: preferring [] or () in list of error codes?
Gunter Henriksen writes: > I think I would have difficulty holding a position that this should > not be a class (or equivalent via namedtuple()) or a dict. It seems to > me like a case could be made that there are far more situations where > it makes sense to use tuples as immutable sequences than as objects > whose attributes are named implicitly by an index. This dodge_city > definitely does not seem to me like a good candidate for a plain > tuple. It's a fair cop. (I only meant that for this example a tuple was superior to a list, but you're right that a dict would be better than either.) Try, then, this tuple: event_timestamp = (2009, 06, 04, 05, 02, 03) (year, month, day, hour, minute, second) = event_timestamp A list would be wrong for this value, because each position in the sequence has a specific meaning beyond its mere sequential position. Yet it also matters to the reader that these items are in a specific sequence, since that's a fairly standard ordering for those items. In this case, a tuple is superior to a list because it correctly conveys the semantic meaning of the overall value: the items must retain their sequential order to have the intended meaning, and to alter any one of them is conceptually to create a new timestamp value. -- \ “[The RIAA] have the patience to keep stomping. They're playing | `\ whack-a-mole with an infinite supply of tokens.” —kennon, | _o__) http://kuro5hin.org/ | Ben Finney -- http://mail.python.org/mailman/listinfo/python-list
Re: preferring [] or () in list of error codes?
> [In this tuple] >dodge_city = (1781, 1870, 1823) >(population, feet_above_sea_level, establishment_year) = dodge_city > each index in the sequence implies something very > different about each value. The semantic meaning > of each index is *more* than just the position in > the sequence; it matters *for interpreting that > component*, and that component would not mean the > same thing in a different index position. A tuple > is the right choice, for that reason. I think I would have difficulty holding a position that this should not be a class (or equivalent via namedtuple()) or a dict. It seems to me like a case could be made that there are far more situations where it makes sense to use tuples as immutable sequences than as objects whose attributes are named implicitly by an index. This dodge_city definitely does not seem to me like a good candidate for a plain tuple. -- http://mail.python.org/mailman/listinfo/python-list
Re: preferring [] or () in list of error codes?
On Tue, 09 Jun 2009 05:02:33 -0300, Steven D'Aprano wrote:

> [...] As tuples are defined in Python, they quack like immutable lists, they
> walk like immutable lists, and they swim like immutable lists. Why
> shouldn't we treat them as immutable lists?
>
> Phillip Eby states that "Lists are intended to be homogeneous sequences,
> while tuples are heterogeneous data structures." (Notice the subtle shift
> there: lists are "intended", while tuples "are". But in fact, there's
> nothing to stop you from putting homogeneous data into a tuple, so Eby is
> wrong to say that tuples *are* heterogeneous.)
>
> Perhaps Eby intends lists to be homogeneous, perhaps Guido does too, but
> this is Python, where we vigorously defend the right to shoot ourselves
> in the foot. We strongly discourage class creators from trying to enforce
> their intentions by using private attributes, and even when we allow such
> a thing, the nature of Python is that nothing is truly private. Why
> should homogeneity and heterogeneity of lists and tuples be sacrosanct?
> Nothing stops me from putting heterogeneous data into a list, or
> homogeneous data into a tuple, and there doesn't appear to be any
> ill-effects from doing so. Why lose sleep over the alleged lack of
> purity?

Yes - but in the past the distinction was very much stronger. I think that tuples didn't have *any* method until Python 2.0 -- so, even if someone could consider a tuple a "read-only list", the illusion disappeared as soon as she tried to write anything more complex than a[i]. Maybe tuples could quack like immutable lists, but they could not swim nor walk...

With time, tuples gained more and more methods and are now very similar to lists - they even have an index() method (undocumented but obvious) which is absurd in the original context.
Think of tuples as used in relational databases: there is no way in SQL to express the condition "search for this along all values in this tuple", because it usually doesn't make any sense at all (and probably, if it does make sense in a certain case, it's because the database is badly designed.) But *now*, you can express that operation in Python. So I'd say that *now*, the distinction between an "homogeneous container" vs "heterogeneous data structure" has vanished a lot, and it's hard to convince people that tuples aren't just immutable lists. That is, *I* would have used a list in this case: for delay in (0.01, 0.1, 0.5, 1, 2, 5, 10, 30, 60): do_something(delay) but I cannot find a *concrete* reason to support the assertion "list is better". So, for practical purposes, tuples act now as if they were immutable lists -- one should be aware of the different memory allocation strategies, but I see no other relevant differences. -- Gabriel Genellina -- http://mail.python.org/mailman/listinfo/python-list
Re: preferring [] or () in list of error codes?
On Tue, 09 Jun 2009 04:57:48 -0700, samwyse wrote:

> Time to test things! I'm going to compare three things using Python
> 3.0:
> X={...}\nS=lambda x: x in X
> S=lambda x: x in {...}
> S=lambda x: x in (...)
> where the ... is replaced by lists of integers of various lengths.
> Here's the test bed:

[snip]

Hmmm... I think your test-bed is unnecessarily complicated, making it difficult to see what is going on. Here's my version, with lists included for completeness. Each test prints the best of five trials of one million repetitions of ten successful searches, then does the same thing again for unsuccessful searches.

    from timeit import Timer

    def test(size):
        global s, l, t, targets
        print("Testing search with size %d" % size)
        rng = range(size)
        s, l, t = set(rng), list(rng), tuple(rng)
        # Calculate a (more or less) evenly distributed set of ten
        # targets to search for, including both end points.
        targets = [i*size//9 for i in range(9)] + [size-1]
        assert len(targets) == 10
        setup = "from __main__ import targets, %s"
        body = "for i in targets: i in %s"
        # Run a series of successful searches.
        for name in "s l t".split():
            obj = globals()[name]
            secs = min(Timer(body % name, setup % name).repeat(repeat=5))
            print("Successful search in %s: %f s" % (type(obj), secs))
        # Also run unsuccessful tests.
        targets = [size+x for x in targets]
        for name in "s l t".split():
            obj = globals()[name]
            secs = min(Timer(body % name, setup % name).repeat(repeat=5))
            print("Unsuccessful search in %s: %f s" % (type(obj), secs))

Results are:

>>> test(1)
Testing search with size 1
Successful search in <class 'set'>: 1.949509 s
Successful search in <class 'list'>: 1.838387 s
Successful search in <class 'tuple'>: 1.876309 s
Unsuccessful search in <class 'set'>: 1.998207 s
Unsuccessful search in <class 'list'>: 2.148660 s
Unsuccessful search in <class 'tuple'>: 2.137041 s
>>> test(10)
Testing search with size 10
Successful search in <class 'set'>: 1.943664 s
Successful search in <class 'list'>: 3.659786 s
Successful search in <class 'tuple'>: 3.569164 s
Unsuccessful search in <class 'set'>: 1.935553 s
Unsuccessful search in <class 'list'>: 5.833665 s
Unsuccessful search in <class 'tuple'>: 5.573177 s
>>> test(100)
Testing search with size 100
Successful search in <class 'set'>: 1.907839 s
Successful search in <class 'list'>: 21.704032 s
Successful search in <class 'tuple'>: 21.391875 s
Unsuccessful search in <class 'set'>: 1.916241 s
Unsuccessful search in <class 'list'>: 41.178029 s
Unsuccessful search in <class 'tuple'>: 41.856226 s
>>> test(1000)
Testing search with size 1000
Successful search in <class 'set'>: 2.256150 s
Successful search in <class 'list'>: 189.991579 s
Successful search in <class 'tuple'>: 187.349630 s
Unsuccessful search in <class 'set'>: 1.869202 s
Unsuccessful search in <class 'list'>: 398.451284 s
Unsuccessful search in <class 'tuple'>: 388.544178 s

As expected, lists and tuples are equally fast (or slow if you prefer). Successful searches are about twice as fast as unsuccessful ones, and performance suffers as the size of the list/tuple increases. However, sets are nearly as fast no matter the size of the set, or whether the search is successful or unsuccessful.

> You will note that testing against a list constant is just as fast as
> testing against a set. This was surprising for me; apparently the
> __contains__ operator turns a tuple into a set.

I doubt that very much.

-- 
Steven
Re: preferring [] or () in list of error codes?
m...@pixar.com wrote:
> John Machin wrote:
>> T=lambda x:x in(25401,25402,25408);import dis;dis.dis(L);dis.dis(T)
>
> I've learned a lot from this thread, but this is the niftiest bit
> I've picked up... thanks!

If you are doing a lot of dissing, starting with

    from dis import dis

saves subsequent typing.

tjr
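A small sketch of the tip, for anyone who wants to try it. The definition of `L` is cut off in the quote above, so the list version below is my assumption about what it was; `T` is copied from the quote. The listings differ between Python versions, so no expected output is shown.

```python
from dis import dis
import io

# L is assumed (its definition is truncated in the quoted one-liner); T is as quoted.
L = lambda x: x in [25401, 25402, 25408]
T = lambda x: x in (25401, 25402, 25408)

# dis() takes a file argument (Python 3.4 and later), so the listings can be
# captured as text instead of going straight to stdout.
buf_L, buf_T = io.StringIO(), io.StringIO()
dis(L, file=buf_L)
dis(T, file=buf_T)
listing_L, listing_T = buf_L.getvalue(), buf_T.getvalue()

print(listing_L)
print(listing_T)
```

Comparing the two listings side by side is the quickest way to see whether the interpreter in use treats the list and tuple spellings differently.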
Re: preferring [] or () in list of error codes?
Steven D'Aprano wrote:
>> James Tauber explains this at
>> <http://jtauber.com/blog/2006/04/15/python_tuples_are_not_just_constant_lists/>.
>
> He doesn't really explain anything though, he merely states it as
> revealed wisdom. The closest he comes to an explanation is to declare
> that in tuples "the index in a tuple has an implied semantic. The point
> of a tuple is that the i-th slot means something specific. In other
> words, it's a index-based (rather than name based) datastructure." But he
> gives no reason for why we should accept that as true for tuples but not
> lists.
>
> It may be that that's precisely the motivation Guido had when he
> introduced tuples into Python, but why should we not overload tuples with
> more meanings than Guido (hypothetically) imagined? In other words, why
> *shouldn't* we treat tuples as immutable lists, if that helps us solve a
> problem effectively?

I believe that we should overload tuples with *less* specific meaning than originally. In 3.0, tuples have *all* the general sequence operations and methods, including .index() and .count(). This was not true in 2.5 (don't know about 2.6), which is why tuples are not yet documented as having those two methods (reported in http://bugs.python.org/issue4966 ). Operationally, they are now general immutable sequences. Period.

Terry Jan Reedy
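A quick illustration of the two methods mentioned above, using the error codes from the start of the thread with one repeated on purpose. The variable name is made up for this sketch.

```python
# The thread's error codes, with 25401 repeated so that count() is informative.
codes = (25401, 25402, 25408, 25401)

# General sequence methods available on tuples in Python 3.
position = codes.index(25402)    # position of the first match
occurrences = codes.count(25401)  # number of occurrences

print(position, occurrences)

# Mutating methods are still absent, which is what keeps the tuple immutable.
print(hasattr(codes, "append"))
```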
Re: preferring [] or () in list of error codes?
John Machin wrote: > T=lambda x:x in(25401,25402,25408);import dis;dis.dis(L);dis.dis(T) I've learned a lot from this thread, but this is the niftiest bit I've picked up... thanks! -- Mark Harrison Pixar Animation Studios -- http://mail.python.org/mailman/listinfo/python-list
Re: preferring [] or () in list of error codes?
On Jun 9, 8:20 am, samwyse wrote: > On Jun 9, 12:30 am, Emile van Sebille wrote: > > > On 6/8/2009 8:43 PM Ben Finney said... > > > The fact that literal set syntax is a relative newcomer is the primary > > > reason for that, I'd wager. > > > Well, no. It really is more, "that's odd... why use set?" > > Until I ran some timing tests this morning, I'd have said that sets > could determine membership faster than a list, but that's apparently > not true, See my reply to that post. I believe your tests were flawed. > assuming that the list has less than 8K members. Above 16K > members, sets are much faster than lists. I'm not sure where the > break is, or even why there's a break. The break comes from the compiler, not the objects themselves. Carl Banks -- http://mail.python.org/mailman/listinfo/python-list
Re: preferring [] or () in list of error codes?
On Jun 9, 4:57 am, samwyse wrote:
> On Jun 8, 8:57 pm, samwyse wrote:
>
> > I conclude that using constructors is generally a bad idea, since the
> > compiler doesn't know if you're calling the builtin or something with
> > an overloaded name. I presume that the compiler will eventually
> > optimize the second example to match the last, but both of them use
> > the BUILD_SET opcode. I expect that this can be expensive for long
> > lists, so I don't think that it's a good idea to use set constants
> > inside loops. Instead it should be assigned to a global or class
> > variable.
>
> Time to test things! I'm going to compare three things using Python
> 3.0:
> X={...}\nS=lambda x: x in X
> S=lambda x: x in {...}
> S=lambda x: x in (...)
> where the ... is replaced by lists of integers of various lengths.
> Here's the test bed:
>
> from random import seed, sample
> from timeit import Timer
> maxint = 2**31-1
> values = list(map(lambda n: 2**n-1, range(1,16)))
> def append_numbers(k, setup):
>     seed(1968740928)
>     for i in sample(range(maxint), k):
>         setup.append(str(i))
>         setup.append(',')
> print('==', 'separate set constant')
> for n in values[::2]:
>     print('===', n, 'values')
>     setup = ['X={']
>     append_numbers(n, setup)
>     setup.append('}\nS=lambda x: x in X')
>     t = Timer('S(88632719)', ''.join(setup))
>     print(t.repeat())
> print('==', 'in-line set constant')
> for n in values[:4]:
>     print('===', n, 'values')
>     setup = ['S=lambda x: x in {']
>     append_numbers(n, setup)
>     setup.append('}')
>     t = Timer('S(88632719)', ''.join(setup))
>     print(t.repeat())
> print('==', 'in-line list constant')
> for n in values:
>     print('===', n, 'values')
>     setup = ['S=lambda x: x in (']
>     append_numbers(n, setup)
>     setup.append(')')
>     t = Timer('S(88632719)', ''.join(setup))
>     print(t.repeat())

It looks like you are evaluating the list/set/tuple every pass, and then, for lists and tuples, always indexing the first item.

> And here are the results. There's something interesting at the very
> end.

[snip results showing virtually identical performance for list, set, and tuple]

> You will note that testing against a list constant is just as fast as
> testing against a set. This was surprising for me; apparently the
> __contains__ operator turns a tuple into a set.

Given the way you wrote the test, this is hardly surprising. I would expect "item in list" to have comparable execution time to "item in set" if item is always the first element in list.

Furthermore, the Python compiler appears to be optimizing this specific case to always use a precompiled set. Well, almost always.

> You will also note
> that performance falls off drastically for the last set of values.
> I'm not sure what happens there; I guess I'll file a bug report.

Please don't; it's not a bug.

The slowdown is because at sizes above a certain threshold the Python compiler doesn't try to precompile in-line lists, sets, and tuples. The last case was above that limit.

Carl Banks
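One way to look at the precompilation described above is to compile the expression and inspect the constants stored in the resulting code object. This is only a sketch: `consts_of` is a hypothetical helper, and what ends up in `co_consts` is a CPython implementation detail that varies between versions, so no expected output is shown. On the CPython builds I am aware of, a short literal container on the right of `in` tends to appear as a single pre-built constant.

```python
def consts_of(source):
    """Hypothetical helper: compile one expression and return its stored constants."""
    return compile(source, "<sketch>", "eval").co_consts


tuple_expr = "x in (25401, 25402, 25408)"
list_expr = "x in [25401, 25402, 25408]"
set_expr = "x in {25401, 25402, 25408}"

# Print whatever this interpreter chose to store for each spelling.
for expr in (tuple_expr, list_expr, set_expr):
    print(expr, "->", consts_of(expr))

# The membership result itself is the same for all three spellings.
results = [eval(expr, {"x": 25402}) for expr in (tuple_expr, list_expr, set_expr)]
print(results)
```

If the whole container shows up as one constant, it was built once at compile time rather than on every evaluation, which is the effect Carl describes for small literals.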
Re: preferring [] or () in list of error codes?
On Jun 9, 12:30 am, Emile van Sebille wrote: > On 6/8/2009 8:43 PM Ben Finney said... > > The fact that literal set syntax is a relative newcomer is the primary > > reason for that, I'd wager. > > Well, no. It really is more, "that's odd... why use set?" Until I ran some timing tests this morning, I'd have said that sets could determine membership faster than a list, but that's apparently not true, assuming that the list has less than 8K members. Above 16K members, sets are much faster than lists. I'm not sure where the break is, or even why there's a break. -- http://mail.python.org/mailman/listinfo/python-list
Re: preferring [] or () in list of error codes?
Steven D'Aprano writes:

> On Tue, 09 Jun 2009 09:43:45 +1000, Ben Finney wrote:
>
> > Use a list when the semantic meaning of an item doesn't depend on
> > all the other items: it's “only” a collection of values.
> >
> > Your list of message codes is a good example: if a value appears at
> > index 3, that doesn't make it mean something different from the same
> > value appearing at index 2.
>
> That advice would seem to imply that lists shouldn't be ordered.

No such implication. Order is important in a list, it just doesn't change the semantic meaning of the value.

> If a list of values has an order, it implies that "first place" (index
> 0) is different from "second place", by virtue of the positions they
> appear in the list. The lists:
>
> presidential_candidates_sorted_by_votes = ['Obama', 'McCain']
> presidential_candidates_sorted_by_votes = ['McCain', 'Obama']
>
> have very different meanings.

But the semantic meaning of each value is unchanged: each is still a presidential candidate's surname. The additional semantic meaning of putting it in a list is no more than the position in the sequence. A list is the right choice, for that reason.

Whereas, for example, in this tuple:

    dodge_city = (1781, 1870, 1823)
    (population, feet_above_sea_level, establishment_year) = dodge_city

each index in the sequence implies something very different about each value. The semantic meaning of each index is *more* than just the position in the sequence; it matters *for interpreting that component*, and that component would not mean the same thing in a different index position. A tuple is the right choice, for that reason.

-- 
 \      “Are you pondering what I'm pondering?” “Umm, I think so, Don |
  `\         Cerebro, but, umm, why would Sophia Loren do a musical?” |
_o__)                                           —_Pinky and The Brain_ |
Ben Finney
Re: preferring [] or () in list of error codes?
On Jun 8, 8:57 pm, samwyse wrote:
> I conclude that using constructors is generally a bad idea, since the
> compiler doesn't know if you're calling the builtin or something with
> an overloaded name. I presume that the compiler will eventually
> optimize the second example to match the last, but both of them use
> the BUILD_SET opcode. I expect that this can be expensive for long
> lists, so I don't think that it's a good idea to use set constants
> inside loops. Instead it should be assigned to a global or class
> variable.

Time to test things! I'm going to compare three things using Python 3.0:

    X={...}\nS=lambda x: x in X
    S=lambda x: x in {...}
    S=lambda x: x in (...)

where the ... is replaced by lists of integers of various lengths. Here's the test bed:

    from random import seed, sample
    from timeit import Timer
    maxint = 2**31-1
    values = list(map(lambda n: 2**n-1, range(1,16)))
    def append_numbers(k, setup):
        seed(1968740928)
        for i in sample(range(maxint), k):
            setup.append(str(i))
            setup.append(',')
    print('==', 'separate set constant')
    for n in values[::2]:
        print('===', n, 'values')
        setup = ['X={']
        append_numbers(n, setup)
        setup.append('}\nS=lambda x: x in X')
        t = Timer('S(88632719)', ''.join(setup))
        print(t.repeat())
    print('==', 'in-line set constant')
    for n in values[:4]:
        print('===', n, 'values')
        setup = ['S=lambda x: x in {']
        append_numbers(n, setup)
        setup.append('}')
        t = Timer('S(88632719)', ''.join(setup))
        print(t.repeat())
    print('==', 'in-line list constant')
    for n in values:
        print('===', n, 'values')
        setup = ['S=lambda x: x in (']
        append_numbers(n, setup)
        setup.append(')')
        t = Timer('S(88632719)', ''.join(setup))
        print(t.repeat())

And here are the results. There's something interesting at the very end.

== separate set constant
=== 1 values
[0.26937306277753176, 0.26113626173158877, 0.2692190487889]
=== 7 values
[0.26583266867716426, 0.27223543774418268, 0.27681646689732919]
=== 31 values
[0.25089725090758752, 0.25562690230182894, 0.25844625504079444]
=== 127 values
[0.32404313956103392, 0.33048948958596691, 0.34487930728626104]
=== 511 values
[0.27574566041214732, 0.26991838348169983, 0.28309016928129083]
=== 2047 values
[0.27826162263639631, 0.27337357122204065, 0.26888752620793976]
=== 8191 values
[0.27479134917985437, 0.27955955295994261, 0.27740676538498654]
=== 32767 values
[0.26189725230441319, 0.25949247739587022, 0.2537356004743625]
== in-line set constant
=== 1 values
[0.43579086168772818, 0.4231755711968983, 0.42178740594125852]
=== 3 values
[0.54712875519095228, 0.55325048295244272, 0.54346991028189251]
=== 7 values
[1.1897654590178366, 1.1763383335032813, 1.2009900699669931]
=== 15 values
[1.7661906750718313, 1.7585005915556291, 1.7405896559478933]
== in-line list constant
=== 1 values
[0.23651385860493335, 0.24746972031361381, 0.23778469051234197]
=== 3 values
[0.23710750947396875, 0.23205630883254713, 0.23345592805789295]
=== 7 values
[0.24607764394636789, 0.23551903943099006, 0.24241377046524093]
=== 15 values
[0.2279376289444599, 0.22491908887861456, 0.24076747184349045]
=== 31 values
[0.22860084172708994, 0.233022074034551, 0.23138639128715965]
=== 63 values
[0.23671639831319169, 0.23404259479906031, 0.22269394573891077]
=== 127 values
[0.22754176857673158, 0.22818151468971593, 0.22711154629987718]
=== 255 values
[0.23503126794047802, 0.24493699618247788, 0.26690207833677349]
=== 511 values
[0.24518255811842238, 0.23878118587697728, 0.22844830837438934]
=== 1023 values
[0.23285585179122137, 0.24067220833932623, 0.23807439213642922]
=== 2047 values
[0.24206484343680756, 0.24352201187581102, 0.24366253252857462]
=== 4095 values
[0.24624526301527183, 0.23692145230748807, 0.23829956041899081]
=== 8191 values
[0.22246514570986164, 0.22435309515595137, 0.011456761]
=== 16383 values
[194.29462683106374, 193.21789529116128, 193.25843228678508]
=== 32767 values

You will note that testing against a list constant is just as fast as testing against a set. This was surprising for me; apparently the __contains__ operator turns a tuple into a set. You will also note that performance falls off drastically for the last set of values. I'm not sure what happens there; I guess I'll file a bug report.
Re: preferring [] or () in list of error codes?
On Jun 8, 10:06 pm, Chris Rebert wrote: > On Mon, Jun 8, 2009 at 6:57 PM, samwyse wrote: > > On Jun 8, 7:37 pm, Carl Banks wrote: > >> On Jun 8, 4:43 pm, Ben Finney wrote: > >> > m...@pixar.com writes: > >> > > Is there any reason to prefer one or the other of these statements? > > >> > > if e.message.code in [25401,25402,25408]: > >> > > if e.message.code in (25401,25402,25408): > > >> If you want to go strictly by the book, I would say he ought to be > >> using a set since his collection of numbers has no meaningful order > >> nor does it make sense to list any item twice. > > > As the length of the list increases, the increased speeds of looking > > something up makes using a set makes more sense. But what's the best > > way to express this? Here are a few more comparisons (using Python > > 3.0)... > > S=lambda x:x in set((25401,25402,25408)) > dis(S) > > 1 0 LOAD_FAST 0 (x) > > 3 LOAD_GLOBAL 0 (set) > > 6 LOAD_CONST 3 ((25401, 25402, 25408)) > > 9 CALL_FUNCTION 1 > > 12 COMPARE_OP 6 (in) > > 15 RETURN_VALUE > S=lambda x:x in{25401,25402,25408} > dis(S) > > 1 0 LOAD_FAST 0 (x) > > 3 LOAD_CONST 0 (25401) > > 6 LOAD_CONST 1 (25402) > > 9 LOAD_CONST 2 (25408) > > 12 BUILD_SET 3 > > 15 COMPARE_OP 6 (in) > > 18 RETURN_VALUE > S=lambda x:x in{(25401,25402,25408)} > dis(S) > > 1 0 LOAD_FAST 0 (x) > > 3 LOAD_CONST 3 ((25401, 25402, 25408)) > > 6 BUILD_SET 1 > > 9 COMPARE_OP 6 (in) > > 12 RETURN_VALUE > > > I conclude that using constructors is generally a bad idea, since the > > compiler doesn't know if you're calling the builtin or something with > > an overloaded name. I presume that the compiler will eventually > > optimize the second example to match the last, but both of them use > > the BUILD_SET opcode. I expect that this can be expensive for long > > Erm, unless I misunderstand you somehow, the second example will and > should *never* match the last. 
> The set {25401,25402,25408}, containing 3 integer elements, is quite > distinct from the set {(25401,25402,25408)}, containing one element > and that element is a tuple. > set(X) != {X}; set([X]) = {X} D'oh! I was thinking about how you can initialize a set from an iterator and for some reason thought that you could do the same with a set constant. -- http://mail.python.org/mailman/listinfo/python-list
Re: preferring [] or () in list of error codes?
On Tue, 09 Jun 2009 09:43:45 +1000, Ben Finney wrote:

> Use a list when the semantic meaning of an item doesn't depend on all
> the other items: it's “only” a collection of values.
>
> Your list of message codes is a good example: if a value appears at
> index 3, that doesn't make it mean something different from the same
> value appearing at index 2.

That advice would seem to imply that lists shouldn't be ordered. If a list of values has an order, it implies that "first place" (index 0) is different from "second place", by virtue of their positions in the list. The lists:

presidential_candidates_sorted_by_votes = ['Obama', 'McCain']
presidential_candidates_sorted_by_votes = ['McCain', 'Obama']

have very different meanings. Prohibiting the use of lists in the context of ordered data is surely an unfortunate consequence of your advice.

> James Tauber explains this at
> http://jtauber.com/blog/2006/04/15/python_tuples_are_not_just_constant_lists/.

He doesn't really explain anything though, he merely states it as revealed wisdom. The closest he comes to an explanation is to declare that in tuples "the index in a tuple has an implied semantic. The point of a tuple is that the i-th slot means something specific. In other words, it's a index-based (rather than name based) datastructure." But he gives no reason for why we should accept that as true for tuples but not lists.

It may be that that's precisely the motivation Guido had when he introduced tuples into Python, but why should we not overload tuples with more meanings than Guido (hypothetically) imagined? In other words, why *shouldn't* we treat tuples as immutable lists, if that helps us solve a problem effectively?

To put it another way, I think the question of whether or not tuples are immutable lists has the answer Mu. Sometimes they are, sometimes they're not.
I have no problem with the title of the quoted blog post -- that tuples are not *just* constant lists -- but I do dispute that there is any reason for declaring that tuples must not be used as constant lists. As tuples are defined in Python, they quack like immutable lists, they walk like immutable lists, and they swim like immutable lists. Why shouldn't we treat them as immutable lists?

Phillip Eby states that "Lists are intended to be homogeneous sequences, while tuples are heterogeneous data structures." (Notice the subtle shift there: lists are "intended", while tuples "are". But in fact, there's nothing to stop you from putting homogeneous data into a tuple, so Eby is wrong to say that tuples *are* heterogeneous.)

Perhaps Eby intends lists to be homogeneous, perhaps Guido does too, but this is Python, where we vigorously defend the right to shoot ourselves in the foot. We strongly discourage class creators from trying to enforce their intentions by using private attributes, and even when we allow such a thing, the nature of Python is that nothing is truly private. Why should homogeneity and heterogeneity of lists and tuples be sacrosanct? Nothing stops me from putting heterogeneous data into a list, or homogeneous data into a tuple, and there don't appear to be any ill effects from doing so. Why lose sleep over the alleged lack of purity?

--
Steven
--
http://mail.python.org/mailman/listinfo/python-list
Re: preferring [] or () in list of error codes?
On 6/8/2009 8:43 PM Ben Finney said...

> Steven D'Aprano writes:
>
>> In addition, using a tuple or a list in this context:
>>
>> if e.message.code in (25401,25402,25408):
>>
>> is so idiomatic, that using a set in its place would be distracting.
>
> I think a list in that context is fine, and that's the idiom I see far
> more often than a tuple.
>
>> Rather than efficiently communicating the programmer's intention, it
>> would raise in my mind the question "that's strange, why are they
>> using a set there instead of a tuple?".
>
> The fact that literal set syntax is a relative newcomer is the primary
> reason for that, I'd wager.

Well, no. It really is more, "that's odd... why use set?"

Emile
--
http://mail.python.org/mailman/listinfo/python-list
Re: preferring [] or () in list of error codes?
Steven D'Aprano writes: > In addition, using a tuple or a list in this context: > > if e.message.code in (25401,25402,25408): > > is so idiomatic, that using a set in it's place would be distracting. I think a list in that context is fine, and that's the idiom I see far more often than a tuple. > Rather that efficiently communicating the programmer's intention, it > would raise in my mind the question "that's strange, why are they > using a set there instead of a tuple?". The fact that literal set syntax is a relative newcomer is the primary reason for that, I'd wager. -- \ “If you are unable to leave your room, expose yourself in the | `\window.” —instructions in case of fire, hotel, Finland | _o__) | Ben Finney -- http://mail.python.org/mailman/listinfo/python-list
Re: preferring [] or () in list of error codes?
On Tue, 09 Jun 2009 11:02:54 +1000, Ben Finney wrote:

> Carl Banks writes:
>
>> If you want to go strictly by the book, I would say he ought to be
>> using a set since his collection of numbers has no meaningful order nor
>> does it make sense to list any item twice.
>
> Yes, a set would be best for this specific situation.
>
>> I don't think it's very important, however, to stick to rules like that
>> for objects that don't live for more than a single line of code.
>
> It's important to the extent that it's important to express one's
> *meaning*. Program code should be written primarily as a means of
> communicating with other programmers, and only incidentally for the
> computer to execute.

But practicality beats purity -- there are many scenarios where we make compromises in our meaning in order to get correct, efficient code. E.g. we use floats, despite them being a poor substitute for the abstract Real numbers we mean.

In addition, using a tuple or a list in this context:

if e.message.code in (25401,25402,25408):

is so idiomatic that using a set in its place would be distracting. Rather than efficiently communicating the programmer's intention, it would raise in my mind the question "that's strange, why are they using a set there instead of a tuple?".

--
Steven
--
http://mail.python.org/mailman/listinfo/python-list
Re: preferring [] or () in list of error codes?
On Mon, Jun 8, 2009 at 6:57 PM, samwyse wrote: > On Jun 8, 7:37 pm, Carl Banks wrote: >> On Jun 8, 4:43 pm, Ben Finney wrote: >> > m...@pixar.com writes: >> > > Is there any reason to prefer one or the other of these statements? >> >> > > if e.message.code in [25401,25402,25408]: >> > > if e.message.code in (25401,25402,25408): >> >> If you want to go strictly by the book, I would say he ought to be >> using a set since his collection of numbers has no meaningful order >> nor does it make sense to list any item twice. > > As the length of the list increases, the increased speeds of looking > something up makes using a set makes more sense. But what's the best > way to express this? Here are a few more comparisons (using Python > 3.0)... > S=lambda x:x in set((25401,25402,25408)) dis(S) > 1 0 LOAD_FAST 0 (x) > 3 LOAD_GLOBAL 0 (set) > 6 LOAD_CONST 3 ((25401, 25402, 25408)) > 9 CALL_FUNCTION 1 > 12 COMPARE_OP 6 (in) > 15 RETURN_VALUE S=lambda x:x in{25401,25402,25408} dis(S) > 1 0 LOAD_FAST 0 (x) > 3 LOAD_CONST 0 (25401) > 6 LOAD_CONST 1 (25402) > 9 LOAD_CONST 2 (25408) > 12 BUILD_SET 3 > 15 COMPARE_OP 6 (in) > 18 RETURN_VALUE S=lambda x:x in{(25401,25402,25408)} dis(S) > 1 0 LOAD_FAST 0 (x) > 3 LOAD_CONST 3 ((25401, 25402, 25408)) > 6 BUILD_SET 1 > 9 COMPARE_OP 6 (in) > 12 RETURN_VALUE > > I conclude that using constructors is generally a bad idea, since the > compiler doesn't know if you're calling the builtin or something with > an overloaded name. I presume that the compiler will eventually > optimize the second example to match the last, but both of them use > the BUILD_SET opcode. I expect that this can be expensive for long Erm, unless I misunderstand you somehow, the second example will and should *never* match the last. The set {25401,25402,25408}, containing 3 integer elements, is quite distinct from the set {(25401,25402,25408)}, containing one element and that element is a tuple. 
set(X) != {X}; set([X]) = {X} Cheers, Chris -- http://blog.rebertia.com -- http://mail.python.org/mailman/listinfo/python-list
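The distinction is easy to check in an interpreter. A minimal sketch, assuming an interpreter that supports set literals (Python 2.7 or 3.x); the variable names are only illustrative:

```python
codes = (25401, 25402, 25408)

from_iterable = set(codes)               # iterates the tuple: three integers
literal_of_tuple = {codes}               # one element: the tuple itself
literal_of_ints = {25401, 25402, 25408}  # three integers

print(len(from_iterable), len(literal_of_tuple))
print(from_iterable == literal_of_ints)    # set(X) built from the items of X
print(set([codes]) == literal_of_tuple)    # set([X]) is the same as {X}
print(25401 in literal_of_tuple)           # the integer is not a member of {X}
```

So a membership test written against {(25401, 25402, 25408)} would only ever match the whole tuple, never an individual code.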
Re: preferring [] or () in list of error codes?
On Jun 8, 2009, at 9:28 PM, Carl Banks wrote:

> On Jun 8, 6:02 pm, Ben Finney wrote:
>> Carl Banks writes:
>>> If you want to go strictly by the book, I would say he ought to be
>>> using a set since his collection of numbers has no meaningful order
>>> nor does it make sense to list any item twice.
>>
>> Yes, a set would be best for this specific situation.
>>
>>> I don't think it's very important, however, to stick to rules like
>>> that for objects that don't live for more than a single line of code.
>>
>> It's important to the extent that it's important to express one's
>> *meaning*. Program code should be written primarily as a means of
>> communicating with other programmers, and only incidentally for the
>> computer to execute.
>
> Which is precisely why it's not very important for an object that
> exists for one line. No programmer is ever going to be confused about
> the meaning of this:
>
> if a in (1,2,3):

Actually, I might be -- I think of a tuple first as a single thing, as opposed to a list or map, which I see first as a collection of other things.

Charles Yeomans
--
http://mail.python.org/mailman/listinfo/python-list
Re: preferring [] or () in list of error codes?
On Jun 8, 7:37 pm, Carl Banks wrote:
> On Jun 8, 4:43 pm, Ben Finney wrote:
> > m...@pixar.com writes:
> > > Is there any reason to prefer one or the other of these statements?
> > >
> > > if e.message.code in [25401,25402,25408]:
> > > if e.message.code in (25401,25402,25408):
>
> If you want to go strictly by the book, I would say he ought to be
> using a set since his collection of numbers has no meaningful order
> nor does it make sense to list any item twice.

As the length of the list increases, the increased speed of looking something up makes using a set more sensible. But what's the best way to express this? Here are a few more comparisons (using Python 3.0)...

>>> S=lambda x:x in set((25401,25402,25408))
>>> dis(S)
  1           0 LOAD_FAST                0 (x)
              3 LOAD_GLOBAL              0 (set)
              6 LOAD_CONST               3 ((25401, 25402, 25408))
              9 CALL_FUNCTION            1
             12 COMPARE_OP               6 (in)
             15 RETURN_VALUE
>>> S=lambda x:x in{25401,25402,25408}
>>> dis(S)
  1           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               0 (25401)
              6 LOAD_CONST               1 (25402)
              9 LOAD_CONST               2 (25408)
             12 BUILD_SET                3
             15 COMPARE_OP               6 (in)
             18 RETURN_VALUE
>>> S=lambda x:x in{(25401,25402,25408)}
>>> dis(S)
  1           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               3 ((25401, 25402, 25408))
              6 BUILD_SET                1
              9 COMPARE_OP               6 (in)
             12 RETURN_VALUE

I conclude that using constructors is generally a bad idea, since the compiler doesn't know if you're calling the builtin or something with an overloaded name. I presume that the compiler will eventually optimize the second example to match the last, but both of them use the BUILD_SET opcode. I expect that this can be expensive for long lists, so I don't think that it's a good idea to use set constants inside loops. Instead it should be assigned to a global or class variable.
--
http://mail.python.org/mailman/listinfo/python-list
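The suggestion to keep the set out of the loop could look something like the sketch below. The names RETRYABLE_CODES and is_retryable are hypothetical, invented for illustration; nothing in the thread says what the codes mean. A frozenset is used so the shared value cannot be modified by accident.

```python
# Built once at import time instead of on every evaluation of the test.
# RETRYABLE_CODES is a hypothetical name chosen for this sketch.
RETRYABLE_CODES = frozenset({25401, 25402, 25408})

def is_retryable(code):
    """Return True if `code` is one of the listed error codes."""
    return code in RETRYABLE_CODES

for code in (25401, 12345, 25408):
    print(code, is_retryable(code))
```

According to the What's New notes for Python 3.2, CPython from that release onward stores a set literal of constants used directly in an `in` test as a prebuilt frozenset, so the BUILD_SET cost described here may not arise on newer interpreters. Naming the constant can still be worthwhile for readability.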
Re: preferring [] or () in list of error codes?
On Jun 8, 6:02 pm, Ben Finney wrote:
> Carl Banks writes:
> > If you want to go strictly by the book, I would say he ought to be
> > using a set since his collection of numbers has no meaningful order
> > nor does it make sense to list any item twice.
>
> Yes, a set would be best for this specific situation.
>
> > I don't think it's very important, however, to stick to rules like
> > that for objects that don't live for more than a single line of code.
>
> It's important to the extent that it's important to express one's
> *meaning*. Program code should be written primarily as a means of
> communicating with other programmers, and only incidentally for the
> computer to execute.

Which is precisely why it's not very important for an object that exists for one line. No programmer is ever going to be confused about the meaning of this:

if a in (1,2,3):

Carl Banks
--
http://mail.python.org/mailman/listinfo/python-list
Re: preferring [] or () in list of error codes?
Carl Banks writes: > If you want to go strictly by the book, I would say he ought to be > using a set since his collection of numbers has no meaningful order > nor does it make sense to list any item twice. Yes, a set would be best for this specific situation. > I don't think it's very important, however, to stick to rules like > that for objects that don't live for more than a single line of code. It's important to the extent that it's important to express one's *meaning*. Program code should be written primarily as a means of communicating with other programmers, and only incidentally for the computer to execute. -- \“Laurie got offended that I used the word ‘puke’. But to me, | `\ that's what her dinner tasted like.” —Jack Handey | _o__) | Ben Finney -- http://mail.python.org/mailman/listinfo/python-list
Re: preferring [] or () in list of error codes?
On Jun 8, 4:43 pm, Ben Finney wrote: > m...@pixar.com writes: > > Is there any reason to prefer one or the other of these statements? > > > if e.message.code in [25401,25402,25408]: > > if e.message.code in (25401,25402,25408): > > > I'm currently using [], but only coz I think it's prettier > > than (). > > Use a list when the semantic meaning of an item doesn't depend on all > the other items: it's “only” a collection of values. > > Your list of message codes is a good example: if a value appears at > index 3, that doesn't make it mean something different from the same > value appearing at index 2. > > Use a tuple when the semantic meaning of the items are bound together, > and it makes more sense to speak of all the items as a single structured > value. If you want to go strictly by the book, I would say he ought to be using a set since his collection of numbers has no meaningful order nor does it make sense to list any item twice. I don't think it's very important, however, to stick to rules like that for objects that don't live for more than a single line of code. Carl Banks -- http://mail.python.org/mailman/listinfo/python-list
Re: preferring [] or () in list of error codes?
m...@pixar.com writes: > Is there any reason to prefer one or the other of these statements? > > if e.message.code in [25401,25402,25408]: > if e.message.code in (25401,25402,25408): > > I'm currently using [], but only coz I think it's prettier > than (). Use a list when the semantic meaning of an item doesn't depend on all the other items: it's “only” a collection of values. Your list of message codes is a good example: if a value appears at index 3, that doesn't make it mean something different from the same value appearing at index 2. Use a tuple when the semantic meaning of the items are bound together, and it makes more sense to speak of all the items as a single structured value. The classic examples are point coordinates and timestamps: rather than a collection of values, it makes more sense to think of each coordinate set or timestamp as a single complex value. The value 7 appearing at index 2 would have a completely different meaning from the value 7 appearing at index 3. James Tauber explains this at http://jtauber.com/blog/2006/04/15/python_tuples_are_not_just_constant_lists/>. -- \ “Pinky, are you pondering what I'm pondering?” “Well, I think | `\ so, Brain, but pantyhose are so uncomfortable in the | _o__) summertime.” —_Pinky and The Brain_ | Ben Finney -- http://mail.python.org/mailman/listinfo/python-list
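A small sketch of the distinction being drawn, using the message codes from the question and an (x, y, z) point; all variable names are illustrative assumptions:

```python
# A list as a plain collection: position carries no meaning.
error_codes = [25401, 25402, 25408]
reordered_codes = [25408, 25401, 25402]
code = 25402
# The membership test gives the same answer for either ordering.
print(code in error_codes, code in reordered_codes)

# A tuple as one structured value: each slot has a fixed meaning (x, y, z).
point = (0, 1, 0)
x, y, z = point
swapped = (y, x, z)
# Moving a value to another slot yields a different point.
print(point == swapped)
```

Nothing in the language enforces this convention; the sketch only shows where reordering is harmless and where it changes the meaning.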
Re: preferring [] or () in list of error codes?
pixar.com> writes:
>
> Is there any reason to prefer one or the other of these statements?
>
> if e.message.code in [25401,25402,25408]:
> if e.message.code in (25401,25402,25408):
>

From the viewpoint of relative execution speed, in the above case if it matters at all it matters only on Python 2.4 AFAICT:

| >>> L=lambda x:x in[25401,25402,25408]; T=lambda x:x in(25401,25402,25408);import dis;dis.dis(L);dis.dis(T)
  1           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               1 (25401)
              6 LOAD_CONST               2 (25402)
              9 LOAD_CONST               3 (25408)
             12 BUILD_LIST               3
             15 COMPARE_OP               6 (in)
             18 RETURN_VALUE
  1           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               4 ((25401, 25402, 25408))
              6 COMPARE_OP               6 (in)
              9 RETURN_VALUE

Earlier versions build the list or tuple at run time (as for the list above); later versions detect that the list can't be mutated and generate the same code for both the list and tuple.

However there are limits to the analysis that can be performed e.g. if the list is passed to a function, pursuit halts at the county line:

[Python 2.6.2]
| >>> F=lambda y,z:y in z;L=lambda x:F(x,[25401,25402,25408]); T=lambda x:F(x,(25401,25402,25408));import dis;dis.dis(L);dis.dis(T)
  1           0 LOAD_GLOBAL              0 (F)
              3 LOAD_FAST                0 (x)
              6 LOAD_CONST               0 (25401)
              9 LOAD_CONST               1 (25402)
             12 LOAD_CONST               2 (25408)
             15 BUILD_LIST               3
             18 CALL_FUNCTION            2
             21 RETURN_VALUE
  1           0 LOAD_GLOBAL              0 (F)
              3 LOAD_FAST                0 (x)
              6 LOAD_CONST               3 ((25401, 25402, 25408))
              9 CALL_FUNCTION            2
             12 RETURN_VALUE

So in general anywhere I had a "list constant" I'd make it a tuple -- I'm not aware of any way that performance gets worse by doing that, and it can get better.

Background: I'm supporting packages that run on 2.1 to 2.6 in one case and 2.4 to 2.6 in the other; every little unobtrusive tweak helps :-)

HTH,
John
--
http://mail.python.org/mailman/listinfo/python-list
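On a current interpreter the same comparison can be repeated with dis.get_instructions, which is available from Python 3.4. This is only a sketch: the helper opnames is a made-up name, and opcode names and offsets differ between CPython releases, so the printed lists are not expected to match the 2.x listings above.

```python
import dis

L = lambda x: x in [25401, 25402, 25408]
T = lambda x: x in (25401, 25402, 25408)

def opnames(func):
    """Return the opcode names of func's bytecode, in order."""
    return [ins.opname for ins in dis.get_instructions(func)]

print("list :", opnames(L))
print("tuple:", opnames(T))
# True when the compiler stored the tuple as a prebuilt constant.
print((25401, 25402, 25408) in T.__code__.co_consts)
```

If BUILD_LIST is absent from the first line of output, the interpreter has folded the list literal into a constant, as described for the later versions above.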
Re: preferring [] or () in list of error codes?
m...@pixar.com wrote:
> Is there any reason to prefer one or the other of these statements?
>
> if e.message.code in [25401,25402,25408]:
> if e.message.code in (25401,25402,25408):
>
> I'm currently using [], but only coz I think it's prettier
> than ().
>
> context: these are database errors and e is database exception,
> so there's probably been zillions of instructions and io's
> handling that already.

I lightly prefer the (a, b, c) -- you do put spaces after the comma, don't you? A tuple can be kept as a constant, but it requires (not very heavy) program analysis to determine that the list need not be constructed each time the statement is executed. In addition, a tuple is allocated as a single block, while a list is a pair of allocations.

The cost is tiny, however, and your sense of aesthetics is part of your code. So unless you only very slightly prefer brackets, if I were you I'd go with the list form.

--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list
Re: preferring [] or () in list of error codes?
On Mon, Jun 8, 2009 at 2:36 PM, wrote:
> Is there any reason to prefer one or the other of these statements?
>
> if e.message.code in [25401,25402,25408]:
> if e.message.code in (25401,25402,25408):
>
> I'm currently using [], but only coz I think it's prettier
> than ().

I like to use tuples / () if the sequence literal is ultimately static. Purely because in my mind that just makes it a little more clear -- a list is mutable, so I use it when it should be or may be mutated; if it never would, I use a tuple. It just seems clearer to me that way.

But a tuple also takes up a little less space in memory, so it's a bit more efficient that way. I have absolutely no idea if reading / checking for contents in a list vs tuple has any performance difference, but would suspect it'd be tiny (and probably irrelevant in a small case like that), but still.

--S
--
http://mail.python.org/mailman/listinfo/python-list
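The memory point can be checked roughly with sys.getsizeof. This is a sketch under the assumption of a CPython interpreter; the exact byte counts depend on the version and platform, and getsizeof reports only the container itself, not the integers it refers to.

```python
import sys

codes_list = [25401, 25402, 25408]
codes_tuple = (25401, 25402, 25408)

# Size of each container object in bytes (contents not included).
print("list :", sys.getsizeof(codes_list))
print("tuple:", sys.getsizeof(codes_tuple))
```

For three items the difference is a few dozen bytes at most, which fits the view that the choice here is mostly about clarity rather than efficiency.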
preferring [] or () in list of error codes?
Is there any reason to prefer one or the other of these statements? if e.message.code in [25401,25402,25408]: if e.message.code in (25401,25402,25408): I'm currently using [], but only coz I think it's prettier than (). context: these are database errors and e is database exception, so there's probably been zillions of instructions and io's handling that already. Many TIA! Mark -- Mark Harrison Pixar Animation Studios -- http://mail.python.org/mailman/listinfo/python-list