Re: [Python-Dev] Why not using the hash when comparing strings?
Hrvoje Niksic wrote:
> On 10/19/2012 03:22 AM, Benjamin Peterson wrote:
>> It would be interesting to see how common it is for strings which have
>> their hash computed to be compared.
>
> Since all identifier-like strings mentioned in Python are interned, and
> therefore have had their hash computed, I would imagine comparing them
> to be fairly common. After all, strings are often used as makeshift
> enums in Python.
>
> On the flip side, those strings are typically small, so a measurable
> overall speed improvement brought by such a change seems unlikely.

I'm pretty sure it would result in a small slowdown. Many (most?) of the
comparisons against interned identifiers are done by dictionary lookups,
and the dictionary lookup code only tries the string comparison after it
has determined that the hashes match.

The only time the contents of dictionary key strings are actually
compared is when the hashes match but the pointers differ; it is already
the case that if the hashes don't match, the strings are never compared.

___ Python-Dev mailing list Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
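The short-circuit order described above can be sketched in pure Python (an illustrative model, not CPython's actual C code):

```python
# Illustrative model of the per-slot key test in a dict lookup: the
# expensive content comparison runs only when the hashes already match
# and the two objects are not the very same (e.g. interned) string.
def keys_match(stored_key, stored_hash, lookup_key, lookup_hash):
    if stored_hash != lookup_hash:
        return False                     # hashes differ: contents never compared
    if stored_key is lookup_key:
        return True                      # identical object: nothing to compare
    return stored_key == lookup_key      # full comparison only as a last resort

a = "some_identifier"
b = "".join(["some_", "identifier"])     # equal contents, distinct object
assert keys_match(a, hash(a), b, hash(b))
assert not keys_match(a, hash(a), "other", hash("other"))
```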
Re: [Python-Dev] PEP 383 (again)
Hrvoje Niksic wrote:
> Assume a UTF-8 locale. A file named b'\xff', being an invalid UTF-8
> sequence, will be converted to the half-surrogate '\udcff'. However,
> a file named b'\xed\xb3\xbf', a valid[1] UTF-8 sequence, will also be
> converted to '\udcff'. Those are quite different POSIX pathnames; how
> will Python know which one it was when I later pass '\udcff' to
> open()?
>
> [1] I'm assuming that it's valid UTF-8 because it passes through Python
> 2.5's '\xed\xb3\xbf'.decode('utf-8'). I don't claim to be a UTF-8
> expert.

I'm not a UTF-8 expert either, but I got bitten by this yesterday. I was
uploading a file to a Google Search Appliance and it was rejected as
invalid UTF-8 despite having been encoded into UTF-8 by Python. The
cause was a byte sequence which decoded to a half surrogate similar to
your example above. Python will happily decode and encode such
sequences, but as I found to my cost, other systems reject them.

Reading Wikipedia implies that Python is wrong to accept these
sequences, and I think (though I'm not a lawyer) that RFC 3629 also
implies this:

"The definition of UTF-8 prohibits encoding character numbers between
U+D800 and U+DFFF, which are reserved for use with the UTF-16 encoding
form (as surrogate pairs) and do not directly represent characters."

and

"Implementations of the decoding algorithm above MUST protect against
decoding invalid sequences."
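For what it's worth, this is how it played out in later versions (the snippet below assumes a Python 3 interpreter, where the strict UTF-8 codec was tightened):

```python
# Python 3's strict UTF-8 decoder rejects encoded surrogates, as RFC
# 3629 requires, so the "valid" sequence from the example above now
# raises an error instead of round-tripping silently.
try:
    b'\xed\xb3\xbf'.decode('utf-8')
    raise AssertionError("strict decoder accepted an encoded surrogate")
except UnicodeDecodeError:
    pass

# The PEP 383 ambiguity is still visible if you opt in to the permissive
# error handlers: two quite different byte strings both map to '\udcff'.
assert b'\xff'.decode('utf-8', 'surrogateescape') == '\udcff'
assert b'\xed\xb3\xbf'.decode('utf-8', 'surrogatepass') == '\udcff'
```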
Re: [Python-Dev] slightly inconsistent set/list pop behaviour
Andrea Griffini wrote:
> On Wed, Apr 8, 2009 at 12:57 PM, Jack Diederich wrote:
>> You wrote a program to find the two smallest ints that would have a
>> hash collision in the CPython set implementation? I'm impressed.
>> And by impressed I mean frightened.
>
> ?
>
> print set([0,8]).pop(), set([8,0]).pop()

If 'smallest ints' means the sum of the absolute values, then these are
slightly smaller:

>>> print set([-1,6]).pop(), set([6,-1]).pop()
6 -1
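The collision itself can be verified without relying on pop() order, assuming CPython's int hashing and its usual 8-slot small-set table (both implementation details):

```python
# Small CPython sets start with 8 slots and place a key in slot
# hash(key) % 8 (an implementation detail assumed here), so ints whose
# hashes agree modulo 8 fight over the same slot.
assert hash(0) % 8 == hash(8) % 8      # the original pair

# CPython special-cases hash(-1) to -2, and in Python (-2) % 8 == 6,
# so -1 collides with 6 even though |-1| + |6| < |0| + |8|.
assert hash(-1) == -2
assert hash(-1) % 8 == hash(6) % 8     # the 'slightly smaller' pair
```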
Re: [Python-Dev] Ruby-style Blocks in Python [Pseudo-PEP]
tav wrote:
> I explain in detail in this blog article:
>
> http://tav.espians.com/ruby-style-blocks-in-python.html

"This is also possible in Python but at the needless cost of naming and
defining a function first"

The cost of defining the function first is probably much less than the
cost of your __do__ function. Your proposal also seems much more limited
than passing functions around: Python allows you to pass in multiple
functions where appropriate, or to store them for later calling.
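The point about first-class functions can be sketched briefly (all names here are illustrative):

```python
# Functions are ordinary values: several can be passed at once, and they
# can be stored and called later - neither of which a single trailing
# block can express.
def compose_apply(f, g, value):
    """Apply f, then g - two functions passed in together."""
    return g(f(value))

stored = [lambda x: x + 1, lambda x: x * 2]   # stored for later calling
assert compose_apply(stored[0], stored[1], 10) == 22
assert [cb(5) for cb in stored] == [6, 10]
```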
Re: [Python-Dev] repeated keyword arguments
"Steven D'Aprano" <[EMAIL PROTECTED]> wrote:
> It would be nice to be able to do this:
>
> defaults = dict(a=5, b=7)
> f(**defaults, a=8) # override the value of a in defaults
>
> but unfortunately that gives a syntax error. Reversing the order would
> override the wrong value. So as Python exists now, no, it's not
> terribly useful. But it's not inherently a stupid idea.

There is already an easy way to do that using functools.partial, and it
is documented and therefore presumably deliberate behaviour: "If
additional keyword arguments are supplied, they extend and override
keywords."

>>> from functools import partial
>>> def f(a=1, b=2, c=3):
...     print a, b, c
...
>>> g = partial(f, b=99)
>>> g()
1 99 3
>>> g(a=100, b=101)
100 101 3
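In later Python versions (3.5+, via PEP 448) the override can also be written with a dict display, where keys to the right win; shown here in Python 3 spelling, hence return instead of print:

```python
from functools import partial

defaults = dict(a=5, b=7)
merged = {**defaults, 'a': 8}   # right-hand 'a' overrides the unpacked one
assert merged == {'a': 8, 'b': 7}

# functools.partial behaves as documented: call-time keywords extend and
# override the keywords stored in the partial object.
def f(a=1, b=2, c=3):
    return (a, b, c)

g = partial(f, b=99)
assert g() == (1, 99, 3)
assert g(a=100, b=101) == (100, 101, 3)
```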
Re: [Python-Dev] Adding start to enumerate()
"Steven D'Aprano" <[EMAIL PROTECTED]> wrote:
> On Mon, 12 May 2008 08:20:51 am Georg Brandl wrote:
>> I believe the following is a common use-case for enumerate()
>> (at least, I've used it quite some times):
>>
>> for lineno, line in enumerate(fileobject):
>>     ...
>>
>> For this, it would be nice to have a start parameter for enumerate().
>
> Why would it be nice? What would you use it for?
>
> The only thing I can think of is printing lines with line numbers, and
> starting those line numbers at one instead of zero. If that's the only
> use-case, should it require built-in support?

If you are generating paginated output, then a function to generate an
arbitrary page would likely want to enumerate starting at some value
larger than one. Of course in that case you'll also want to skip part
way through the data, but I think it is more likely that you'll want to
enumerate the partial data (e.g. if it is a database query) rather than
slice the enumeration.
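The pagination case can be sketched like this (function and parameter names are illustrative):

```python
# Number one page of results, continuing the numbering from earlier
# pages - this is where enumerate()'s start parameter earns its keep.
def numbered_page(rows, page, page_size):
    start = page * page_size + 1          # 1-based; page 2 starts at 21
    return list(enumerate(rows, start))

page_rows = ["alpha", "beta", "gamma"]    # e.g. one page of a DB query
assert numbered_page(page_rows, 2, 10) == [
    (21, "alpha"), (22, "beta"), (23, "gamma")]
```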
Re: [Python-Dev] Complexity documentation request
Dimitrios Apostolou <[EMAIL PROTECTED]> wrote:
> On another note, which sorting algorithm is Python using? Perhaps we
> can add this as a footnote. I always thought it was quicksort, with a
> worst case of O(n^2).

See http://svn.python.org/projects/python/trunk/Objects/listsort.txt
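The document linked above describes CPython's list.sort(): an adaptive, stable mergesort ("timsort") with O(n log n) worst case, not quicksort. Its stability is easy to observe:

```python
# A stable sort keeps equal keys in their original relative order -
# something quicksort does not guarantee.
records = [("b", 2), ("a", 1), ("b", 1), ("a", 2)]
by_letter = sorted(records, key=lambda r: r[0])
assert by_letter == [("a", 1), ("a", 2), ("b", 2), ("b", 1)]
```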
Re: [Python-Dev] Declaring setters with getters
Greg Ewing <[EMAIL PROTECTED]> wrote:
> Fred Drake wrote:
>> @property
>> def attribute(self):
>>     return 42
>>
>> @property.set
>> def attribute(self, value):
>>     self._ignored = value
>
> Hmmm... if you were allowed general lvalues as the target of a
> def, you could write that as
>
> def attribute.set(self, value):
>     ...

Dotted names would be sufficient rather than general lvalues. I like
this; I think it looks cleaner than the other options, especially if
you write both getter and setter in the same style:

attribute = property()

def attribute.fget(self):
    return 42

def attribute.fset(self, value):
    self._ignored = value
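For the record, the spelling Python eventually adopted (property objects grew getter/setter/deleter methods in 2.6) covers the same ground without new syntax:

```python
class Thing:
    @property
    def attribute(self):
        return self._value

    @attribute.setter           # the accepted alternative to property.set
    def attribute(self, value):
        self._value = value

t = Thing()
t.attribute = 42
assert t.attribute == 42
```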
Re: [Python-Dev] Declaring setters with getters
"Steven Bethard" <[EMAIL PROTECTED]> wrote in news:[EMAIL PROTECTED]:
> On 10/31/07, Fred Drake <[EMAIL PROTECTED]> wrote:
>> If I had to choose built-in names, though, I'd prefer "property",
>> "propset", "propdel". Another possibility that seems reasonable
>> (perhaps a bit better) would be:
>>
>> class Thing(object):
>>
>>     @property
>>     def attribute(self):
>>         return 42
>>
>>     @property.set
>>     def attribute(self, value):
>>         self._ignored = value
>>
>>     @property.delete
>>     def attribute(self):
>>         pass
>
> +1 on this spelling if possible. Though based on Guido's original
> recipe it might have to be written as::
>
>     @property.set(attribute)
>     def attribute(self, value):
>         self._ignored = value

It *can* be written as Fred suggested: see
http://groups.google.co.uk/group/comp.lang.python/browse_thread/thread/b442d08c9a019a8/8a381be5edc26340

However, that depends on hacking the stack frames, so the implementation
probably isn't appropriate here.
Re: [Python-Dev] generators and with
"tomer filiba" <[EMAIL PROTECTED]> wrote in news:[EMAIL PROTECTED]:
> why not add __enter__ and __exit__ to generator objects?
> it's really a trivial addition: __enter__ returns self, __exit__ calls
> close().
> it would be used to ensure close() is called when the generator is
> disposed, instead of doing that manually. typical usage would be:
>
> with mygenerator() as g:
>     g.next()
>     bar = g.send("foo")

You can already ensure that the close() method is called quite easily:

with contextlib.closing(mygenerator()) as g:
    g.next()
    bar = g.send("foo")
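A runnable confirmation, in Python 3 spelling with next(g) rather than g.next(): leaving the with-block calls close(), which raises GeneratorExit inside the generator and runs its finally clause.

```python
from contextlib import closing

log = []

def mygenerator():
    try:
        yield 1
        yield 2
    finally:
        log.append("closed")     # runs when close() is called

with closing(mygenerator()) as g:
    assert next(g) == 1

# The generator was abandoned after one value, yet it was finalized.
assert log == ["closed"]
```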
Re: [Python-Dev] whitespace normalization
"Neal Norwitz" <[EMAIL PROTECTED]> wrote in news:[EMAIL PROTECTED]:
> I just checked in a whitespace normalization change that was way too
> big. Should this task be automated?

IMHO, changing whitespace retrospectively in a version control system is
a bad idea.

How much overhead would it be to have a checkin hook which compares each
modified file against the output of running reindent.py over the same
file, and rejects the checkin if it changes anything? (With, of course,
an appropriate message suggesting the use of reindent.py before
reattempting the checkin.)

That way the whitespace ought to stay normalized, so you shouldn't need
a separate cleanup step and you won't be breaking diff and blame for the
sources (and if the reindent does ever break anything, it should be more
traceable).
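The proposed hook might be sketched like this (normalize() is a simplified stand-in for reindent.py; all names are illustrative):

```python
def normalize(text):
    """Stand-in for reindent.py: expand tabs and strip trailing spaces."""
    lines = [line.expandtabs(4).rstrip() for line in text.splitlines()]
    return "\n".join(lines) + "\n"

def rejected_files(files):
    """Names of files the checkin hook would bounce back to the author,
    i.e. those the normalizer would change."""
    return [name for name, text in files.items() if normalize(text) != text]

files = {
    "clean.py": "def f():\n    return 1\n",
    "messy.py": "def f():\t\n    return 1  \n",   # tab + trailing spaces
}
assert rejected_files(files) == ["messy.py"]
```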
Re: [Python-Dev] __del__ unexpectedly being called twice
"Terry Reedy" <[EMAIL PROTECTED]> wrote in news:[EMAIL PROTECTED]:
> "Duncan Booth" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]
>> There's a thread on comp.lang.python at the moment under the subject
>> "It is __del__ calling twice for some instances?" which seems to show
>> that when releasing a long chain of old-style classes every 50th
>> approximately has its finaliser called twice. I've verified that this
>> happens on both Python 1.4 and 1.5.
>
> Should we assume you meant 2.4 and 2.5?

Probably. 2.5b3 to be a bit more precise.
[Python-Dev] __del__ unexpectedly being called twice
There's a thread on comp.lang.python at the moment under the subject "It
is __del__ calling twice for some instances?" which seems to show that
when releasing a long chain of old-style classes every 50th
approximately has its finaliser called twice. I've verified that this
happens on both Python 1.4 and 1.5.

My guess is that there's a bug in the trashcan mechanism: calling the
__del__ method means creating a descriptor, and if that descriptor gets
queued in the trashcan then releasing it calls the __del__ method a
second time. I'm not sure if there is going to be a particularly easy
fix for that.

Would someone who knows this code (instance_dealloc in classobject.c)
like to have a look at it, should I just submit a bug report, or isn't
it worth bothering about?

The code which exhibits the problem:

#!/usr/local/bin/python -d
# -*- coding: koi8-u -*-
import sys

class foo:
    def __init__(self, other):
        self.other = other
        self._deleted = False
        global ini_cnt
        ini_cnt += 1

    def __del__(self):
        if self._deleted:
            print "aargh!"
        self._deleted = True
        global del_cnt
        del_cnt += 1
        print "del", del_cnt, "at", id(self)

def stat():
    print "-"*20
    print "ini_cnt = %d" % ini_cnt
    print "del_cnt = %d" % del_cnt
    print "difference = %d" % (ini_cnt-del_cnt)

ini_cnt = 0
del_cnt = 0
loop_cnt = 55

a = foo(None)
for i in xrange(loop_cnt):
    a = foo(a)
stat()
a = None
stat()

The original thread is at:
http://groups.google.com/group/comp.lang.python/browse_thread/thread/293acf433a39583b/bfd4af9c6008a34e
Re: [Python-Dev] segmentation fault in Python 2.5b3 (trunk:51066)
Thomas Heller <[EMAIL PROTECTED]> wrote in news:[EMAIL PROTECTED]:
>> /* if no docstring given and the getter has one, use that one */
>> if ((doc == NULL || doc == Py_None) && get != NULL &&
>>     PyObject_HasAttrString(get, "__doc__")) {
>>         if (!(get_doc = PyObject_GetAttrString(get, "__doc__")))
>>             return -1;
>>         Py_DECREF(get_doc); /* it is INCREF'd again below */
>>         ^^
>>         doc = get_doc;
>> }
>>
>> Py_XINCREF(get);
>> Py_XINCREF(set);
>> Py_XINCREF(del);
>> Py_XINCREF(doc);
>
> A strange programming style, if you ask me, and I wonder why Coverity
> doesn't complain about it.

Does Coverity recognise objects in Python's internal pools as
deallocated? If not, it wouldn't complain, because all that the
Py_DECREF does is link the block into a pool's freeblock list, so any
subsequent reference from the Py_XINCREF is still a reference to
allocated memory.

[Off topic: Microsoft have (or had?) a similarly screwy bit in their
ActiveX ATL libraries: a newly created ActiveX object has its reference
count incremented before calling FinalConstruct, and then decremented to
0 (using a method which decrements the reference count but doesn't free
the object) before being incremented again. If in the meanwhile you
increment and decrement the reference count in another thread, then it
goes bang.]

The moral is to regard the reference counting rules as law: no matter
how sure you are that you can cheat, don't, or you'll regret it.
Re: [Python-Dev] reference leaks, __del__, and annotations
"Jim Jewett" <[EMAIL PROTECTED]> wrote in news:[EMAIL PROTECTED]:
> As a strawman proposal:
>
> deletes = [(obj.__del__.cycle, obj) for obj in cycle
>            if hasattr(obj, "__del__") and
>            hasattr(obj.__del__, "cycle")]
> deletes.sort()
> for (cycle, obj) in deletes:
>     obj.__del__()
>
> Lightweight __del__ methods (such as most resource managers) could set
> the cycle attribute to True, and thereby ensure that they won't cause
> unbreakable cycles. Fancier object frameworks could use different
> values for the cycle attribute. Any object whose __del__ is not
> annotated will still be at least as likely to get finalized as it is

That doesn't look right to me. Surely if you have a cycle, what you
want to do is to pick just *one* of the objects in the cycle and break
the link which makes it participate in the cycle. That should be
sufficient to cause the rest of the cycle to collapse, with __del__
methods being called from the normal refcounting mechanism. So
something like this:

for obj in cycle:
    if hasattr(obj, "__breakcycle__"):
        obj.__breakcycle__()
        break

Every object which knows it can participate in a cycle then has the
option of defining a method which it can use to tear down the cycle,
e.g. by releasing the resource and then deleting all of its attributes,
but no guarantees are made over which object has this method called.

An object with a __breakcycle__ method would have to be extra careful,
as its methods could still be called after it has broken the cycle, but
it does mean that the responsibilities are in the right place (i.e.
defining the method implies taking that into account).
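As it turned out (Python 3.4, PEP 442), the collector itself took on this job: __del__ now runs on objects in garbage cycles and the collector breaks the cycle afterwards, so no user-level protocol like the hypothetical __breakcycle__ above was needed. A Python 3 demonstration:

```python
import gc

finalized = []

class Node:
    def __del__(self):
        finalized.append("node")

a, b = Node(), Node()
a.partner, b.partner = b, a   # a reference cycle whose members define __del__
del a, b                      # unreachable, but kept alive by the cycle
gc.collect()                  # the cycle collector finalizes both objects
assert finalized == ["node", "node"]
```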
Re: [Python-Dev] The decorator(s) module
Georg Brandl <[EMAIL PROTECTED]> wrote in news:[EMAIL PROTECTED]:
> Unfortunately, a @property decorator is impossible...

It all depends what you want (and whether you want the implementation
to be portable to other Python implementations). Here's one possible
but not exactly portable example:

from inspect import getouterframes, currentframe
import unittest

class property(property):
    @classmethod
    def get(cls, f):
        locals = getouterframes(currentframe())[1][0].f_locals
        prop = locals.get(f.__name__, property())
        return cls(f, prop.fset, prop.fdel, prop.__doc__)

    @classmethod
    def set(cls, f):
        locals = getouterframes(currentframe())[1][0].f_locals
        prop = locals.get(f.__name__, property())
        return cls(prop.fget, f, prop.fdel, prop.__doc__)

    @classmethod
    def delete(cls, f):
        locals = getouterframes(currentframe())[1][0].f_locals
        prop = locals.get(f.__name__, property())
        return cls(prop.fget, prop.fset, f, prop.__doc__)

class PropTests(unittest.TestCase):
    def test_setgetdel(self):
        class C(object):
            def __init__(self, colour):
                self._colour = colour

            @property.set
            def colour(self, value):
                self._colour = value

            @property.get
            def colour(self):
                return self._colour

            @property.delete
            def colour(self):
                self._colour = 'none'

        inst = C('red')
        self.assertEquals(inst.colour, 'red')
        inst.colour = 'green'
        self.assertEquals(inst._colour, 'green')
        del inst.colour
        self.assertEquals(inst._colour, 'none')

if __name__ == '__main__':
    unittest.main()
Re: [Python-Dev] Path PEP and the division operator
Nick Coghlan <[EMAIL PROTECTED]> wrote in news:[EMAIL PROTECTED]:
> Duncan Booth wrote:
>> I'm not convinced by the rationale given why atime, ctime, mtime and
>> size are methods rather than properties, but I do find this PEP much
>> more agreeable than the last time I looked at it.
>
> A better rationale for doing it is that all of them may raise
> IOException. It's rude for properties to do that, so it's better to
> make them methods instead.

Yes, that rationale sounds good to me.

> That was a general guideline that came up the first time adding Path
> was proposed - if the functionality involved querying or manipulating
> the actual filesystem (and therefore potentially raising IOError),
> then it should be a method. If the operation related solely to the
> string representation, then it could be a property.

Perhaps Bjorn could add that to the PEP?
Re: [Python-Dev] Path PEP and the division operator
BJörn Lindqvist <[EMAIL PROTECTED]> wrote in news:[EMAIL PROTECTED]:
> On 2/4/06, Guido van Rossum <[EMAIL PROTECTED]> wrote:
>> I won't even look at the PEP as long as it uses / or // (or any other
>> operator) for concatenation.
>
> That's good, because it doesn't. :)
> http://www.python.org/peps/pep-0355.html

No, but it does say that / may be reintroduced 'if the BFDL so
desires'. I hope that doesn't mean the BDFL may be overruled. :^)

I'm not convinced by the rationale given why atime, ctime, mtime and
size are methods rather than properties, but I do find this PEP much
more agreeable than the last time I looked at it.
Re: [Python-Dev] properties and block statement
Stefan Rank <[EMAIL PROTECTED]> wrote in news:[EMAIL PROTECTED]:
> I think there is no need for a special @syntax for this to work.
>
> I suppose it would be possible to allow a trailing block after any
> function invocation, with the effect of creating a new namespace that
> gets treated as containing keyword arguments.

I suspect that without any syntax changes at all it will be possible
(for some stack-crawling implementation of 'propertycontext', and
assuming nobody makes property objects immutable) to do:

class C(object):
    with propertycontext() as x:
        doc = """ Yay for property x! """
        def fget(self):
            return self._x
        def fset(self, value):
            self._x = value

For inheritance you would have to specify the base property:

class D(C):
    with propertycontext(C.x) as x:
        def fset(self, value):
            self._x = value+1

propertycontext could look something like:

import sys
from contextlib import contextmanager

@contextmanager
def propertycontext(parent=None):
    classframe = sys._getframe(2)
    cvars = classframe.f_locals
    marker = object()
    keys = ('fget', 'fset', 'fdel', 'doc')
    old = [cvars.get(key, marker) for key in keys]
    if parent:
        pvars = [getattr(parent, key)
                 for key in ('fget', 'fset', 'fdel', '__doc__')]
    else:
        pvars = [None]*4
    args = dict(zip(keys, pvars))
    prop = property()
    try:
        yield prop
        for key, orig in zip(keys, old):
            v = cvars.get(key, marker)
            if v is not orig:
                args[key] = v
        prop.__init__(**args)
    finally:
        for k, v in zip(keys, old):
            if v is marker:
                if k in cvars:
                    del cvars[k]
            else:
                cvars[k] = v
Re: [Python-Dev] bug in urlparse
[EMAIL PROTECTED] wrote in news:[EMAIL PROTECTED]:
> According to RFC 2396[1] section 5.2:
>
> g) If the resulting buffer string still begins with one or more
>    complete path segments of "..", then the reference is
>    considered to be in error. Implementations may handle this
>    error by retaining these components in the resolved path (i.e.,
>    treating them as part of the final URI), by removing them from
>    the resolved path (i.e., discarding relative levels above the
>    root), or by avoiding traversal of the reference.
>
> If I read this right, it explicitly allows the urlparse.urljoin
> behavior ("handle this error by retaining these components in the
> resolved path").

Yes, the urljoin behaviour is explicitly allowed; however, it is not
the most commonly implemented of the permitted behaviours. Both IE and
Mozilla/Firefox handle this error by stripping the spurious ".."
elements from the front of the path. Apache, and I hope other web
servers, use the third permitted method, i.e. rejecting requests to
these invalid URLs.

The net effect of this is that on some sites using a Python spider
(e.g. webchecker.py) will produce a large number of error messages for
links which browsers will actually resolve successfully. (At least
that's when I first noticed this particular problem.) Depending on your
reasons for spidering a site, this can be either a good thing or an
annoyance.
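Python's behaviour later changed to match the browsers: urllib.parse.urljoin in Python 3 follows RFC 3986, whose resolution algorithm removes spurious leading ".." segments rather than retaining them.

```python
from urllib.parse import urljoin

# One of RFC 3986's "abnormal examples" (section 5.4.2): the excess
# ".." segments are discarded rather than retained in the result.
result = urljoin('http://a/b/c/d;p?q', '../../../g')
assert result == 'http://a/g'
assert '..' not in result
```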
Re: [Python-Dev] Re: anonymous blocks
Jim Fulton <[EMAIL PROTECTED]> wrote in news:[EMAIL PROTECTED]:
>> No, the return sets a flag and raises StopIteration, which should
>> make the iterator also raise StopIteration, at which point the real
>> return happens.
>
> Only if exc is not None
>
> The only return in the pseudocode is inside "if exc is not None".
> Is there another return that's not shown? ;)

Ah yes, I see now what you mean. I would think that the relevant
pseudo-code should look more like:

        except StopIteration:
            if ret:
                return exc
            if exc is not None:
                raise exc # XXX See below
            break
Re: [Python-Dev] Re: anonymous blocks
Jim Fulton <[EMAIL PROTECTED]> wrote in news:[EMAIL PROTECTED]:
> Guido van Rossum wrote:
>> I've written a PEP about this topic. It's PEP 340: Anonymous Block
>> Statements (http://python.org/peps/pep-0340.html).
>
> Some observations:
>
> 1. It looks to me like a bare return or a return with an EXPR3 that
>    happens to evaluate to None inside a block simply exits the block,
>    rather than exiting a surrounding function. Did I miss something,
>    or is this a bug?

No, the return sets a flag and raises StopIteration, which should make
the iterator also raise StopIteration, at which point the real return
happens.

If the iterator fails to re-raise the StopIteration exception (the spec
only says it should, not that it must) I think the return would be
ignored, but a subsequent exception would then get converted into a
return value. I think the flag needs to be reset to avoid this case.

Also, I wonder whether other exceptions from next() shouldn't be
handled a bit differently. If BLOCK1 throws an exception, and this
causes the iterator to also throw an exception, then one exception will
be lost. I think it would be better to propagate the original exception
rather than the second exception.

So something like this (added lines to handle both of the above):

        itr = EXPR1
        exc = arg = None
        ret = False
        while True:
            try:
                VAR1 = next(itr, arg)
            except StopIteration:
                if exc is not None:
                    if ret:
                        return exc
                    else:
                        raise exc # XXX See below
                break
+           except:
+               if ret or exc is None:
+                   raise
+               raise exc # XXX See below
+           ret = False
            try:
                exc = arg = None
                BLOCK1
            except Exception, exc:
                arg = StopIteration()