Re: [Python-Dev] A "record" type (was Re: Py2.6 ideas)
Josiah Carlson wrote:
> one thing to note with your method - you can't guarantee the order of
> the attributes as they are being displayed.

Actually, my record type *can*; see the hack using the __names__ field.
It won't preserve that order during iteration -- but it's only a
prototype so far, and it could be fixed if there was interest.

/larry/

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Py2.6 ideas
Josiah Carlson wrote:
> Larry Hastings <[EMAIL PROTECTED]> wrote:
>> For new code, I can't think of a single place I'd want to use a
>> "NamedTuple" where a "record" wouldn't be arguably more suitable.  The
>> "NamedTuple" makes sense only for "fixing" old APIs like os.stat.
>
> I disagree.
>
>     def add(v1, v2):
>         return Vector(i+j for i,j in izip(v1, v2))
>
>     x, y, z = point

I realize I'm in the minority here, but I prefer to use tuples only as
lightweight frozen lists of homogeneous objects.  I understand the
concept that a tuple is a frozen container of heterogeneous objects
referenced by position; it's considered a perfectly Pythonic idiom; it
has a long-standing history in mathematical notation... I *get* it, I
just don't *agree* with it.  Better to make explicit the association of
data with its name.  x.dingus is simply better than remembering "the
ith element of x is the 'dingus'".

The examples you cite don't sway me; I'd be perfectly happy to use a
positionless "record" that lacked the syntactic shortcuts you suggest.
I stand by my assertion above.

At least now you know "where I'm coming from",

/larry/
Re: [Python-Dev] A "record" type (was Re: Py2.6 ideas)
Larry Hastings <[EMAIL PROTECTED]> wrote:
> Steven Bethard wrote:
>> On 2/20/07, Larry Hastings <[EMAIL PROTECTED]> wrote:
>>> I considered using __slots__, but that was gonna take too long.
>>
>> Here's a simple implementation using __slots__:
>
> Thanks for steering me to it.  However, your implementation and Mr.
> Hettinger's original NamedTuple both require you to establish a type
> at the onset; with my prototype, you can create records ad-hoc as you
> can with dicts and tuples.  I haven't found a lot of documentation on
> how to use __slots__, but I'm betting it wouldn't mesh well with my
> breezy ad-hoc records type.

If it helps, you can think of Steven's and Raymond's types as
variations of a C struct.  They are fixed at type definition time, but
that's more or less the point.

Also, you can find more than you ever wanted to know about __slots__ by
searching on Google for 'python __slots__' (without quotes).  It would
work *just fine* with your ad-hoc method, though one thing to note with
your method: you can't guarantee the order of the attributes as they
are being displayed.  Your example would have the same issues as dict
does here:

    >>> dict(b=1, a=2)
    {'a': 2, 'b': 1}

Adding more attributes could further arbitrarily rearrange them.

 - Josiah
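The ordering point above can be sketched with a tiny ad-hoc record that pins its repr order via an explicit field tuple -- a minimal, hypothetical sketch of the `__names__` hack from the thread, not Larry's actual prototype. (Note that since CPython 3.7 plain dicts preserve insertion order, so the `dict(b=1, a=2)` demonstration above no longer reproduces; the explicit names tuple sidesteps the question entirely.)

```python
class record(dict):
    """Minimal sketch: a dict whose repr order follows an explicit
    names tuple instead of the dict's own iteration order."""

    def __init__(self, names, **kwargs):
        super().__init__(**kwargs)
        # store the display order on the instance, bypassing dict keys
        self.__dict__["_names"] = tuple(names)

    def __repr__(self):
        body = ", ".join(f"{n}={self[n]!r}" for n in self._names)
        return f"record({body})"

r = record(("b", "a"), a=2, b=1)
print(repr(r))   # record(b=1, a=2) -- order pinned by the names tuple
```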
Re: [Python-Dev] A "record" type (was Re: Py2.6 ideas)
Steven Bethard wrote:
> On 2/20/07, Larry Hastings <[EMAIL PROTECTED]> wrote:
>> I considered using __slots__, but that was gonna take too long.
>
> Here's a simple implementation using __slots__:

Thanks for steering me to it.  However, your implementation and Mr.
Hettinger's original NamedTuple both require you to establish a type at
the onset; with my prototype, you can create records ad-hoc as you can
with dicts and tuples.  I haven't found a lot of documentation on how
to use __slots__, but I'm betting it wouldn't mesh well with my breezy
ad-hoc records type.

Cheers,

/larry/
Re: [Python-Dev] A "record" type (was Re: Py2.6 ideas)
Steven Bethard gmail.com> writes:
> Here's a simple implementation using __slots__:
>
> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/502237

That's pretty cool!  Two suggestions:

1. rename the _items method to __iter__, so that you get easy casting
   to tuples and lists;
2. put a check in the metaclass such as ``assert '__init__' not in
   bodydict`` to make clear to the users that they cannot override the
   __init__ method; that's the metaclass's job.

Great hack!

 Michele Simionato
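Both suggestions can be sketched with a small metaclass along the lines of the recipe. This is an illustration under assumptions -- the recipe's real internals differ, and the `RecordMeta` name is hypothetical:

```python
class RecordMeta(type):
    def __new__(mcs, name, bases, bodydict):
        # Suggestion 2: refuse a user-supplied __init__ -- generating
        # one from __slots__ is the metaclass's job.
        assert '__init__' not in bodydict, "cannot override __init__"
        slots = bodydict.get('__slots__', ())

        def __init__(self, *args):
            for field, value in zip(slots, args):
                setattr(self, field, value)
        bodydict['__init__'] = __init__

        # Suggestion 1: __iter__ instead of a private _items method,
        # so tuple(rec) and list(rec) just work.
        bodydict['__iter__'] = lambda self: (getattr(self, f) for f in slots)
        return type.__new__(mcs, name, bases, bodydict)

class Point(metaclass=RecordMeta):
    __slots__ = ('x', 'y')

p = Point(3, 4)
print(tuple(p))   # (3, 4)
```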
Re: [Python-Dev] Py2.6 ideas
Larry Hastings <[EMAIL PROTECTED]> wrote:
> For new code, I can't think of a single place I'd want to use a
> "NamedTuple" where a "record" wouldn't be arguably more suitable.  The
> "NamedTuple" makes sense only for "fixing" old APIs like os.stat.

I disagree.

    def add(v1, v2):
        return Vector(i+j for i,j in izip(v1, v2))

    x, y, z = point

These and more are examples where not having a defined ordering (as
would be the case with a 'static dict') would reduce its usefulness.
Having a defined ordering (as is the case with lists and tuples)
implies indexability (I want the ith item in this sequence!).  It also
allows one to use a NamedTuple without change otherwise anywhere you
would have previously used a tuple (except for doctests, which would
need to be changed).  This was all discussed earlier.

 - Josiah
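The drop-in behavior argued for above is exactly what the `namedtuple` that later shipped in `collections` (Python 2.6) provides -- a sketch in today's Python 3, with `zip` standing in for the `izip` of the era:

```python
from collections import namedtuple

Vector = namedtuple("Vector", "x y z")

def add(v1, v2):
    # works because a named tuple is iterable like a plain tuple
    return Vector(*(i + j for i, j in zip(v1, v2)))

point = Vector(1.0, 2.0, 3.0)
x, y, z = point              # tuple unpacking unchanged
assert point[0] == point.x   # indexing and named access coexist
print(add(point, point))     # Vector(x=2.0, y=4.0, z=6.0)
```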
Re: [Python-Dev] Py_ssize_t
Raymond Hettinger writes:
> Two people had some difficulty building non-upgraded third-party
> modules with Py2.5 on 64-bit machines (I think wxPython was one of the
> problems)

In my experience wxPython is problematic, period.  It's extremely
tightly bound to internal details of everything around it.  In
particular, on every package system where I've (tried to) build it, the
first thing the package does is to check for its own version of Python,
and pull it in if it's not there.
Re: [Python-Dev] A "record" type (was Re: Py2.6 ideas)
On 2/20/07, Larry Hastings <[EMAIL PROTECTED]> wrote:
> # the easy way to define a "subclass" of record
> def Point(x, y):
>     return record(x = x, y = y)
>
> # you can use hack-y features to make your "subclasses" more swell
> def Point(x, y):
>     x = record(x = x, y = y)
>     # a hack to print the name "Point" instead of "record"
>     x.__classname__ = "Point"
>     # a hack to impose an ordering on the repr() display
>     x.__names__ = ("x", "y")
>     return x
>
> p = Point(3, 5)
> q = Point(2, y=5)
> r = Point(y=2, x=4)
> print p, q, r
>
> # test pickling
> import pickle
> pikl = pickle.dumps(p)
> pp = pickle.loads(pikl)
> print pp
> print pp == p
>
> # test that the output repr works to construct
> s = repr(p)
> print repr(s)
> peval = eval(s)
> print peval
> print p == peval
>
> Yeah, I considered using __slots__, but that was gonna take too long.

Here's a simple implementation using __slots__:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/502237

And you don't have to hack anything to get good help() and a nice
repr().  Declare a simple class for your type and you're ready to go::

    >>> class Point(Record):
    ...     __slots__ = 'x', 'y'
    ...
    >>> Point(3, 4)
    Point(x=3, y=4)

STeVe

--
I'm not *in*-sane.  Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
        --- Bucky Katt, Get Fuzzy
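The ASPN recipe link is long dead, so here is a minimal, hypothetical sketch of the idea it describes -- a `__slots__`-based `Record` base class whose subclasses get a sensible constructor and repr for free. This is not the recipe's actual code, just the shape of it:

```python
class Record:
    """Minimal sketch of a __slots__-based record base class.
    Field names come from the subclass's __slots__ declaration."""
    __slots__ = ()

    def __init__(self, *args, **kwargs):
        fields = self.__slots__
        for name, value in zip(fields, args):
            setattr(self, name, value)
        for name, value in kwargs.items():
            setattr(self, name, value)

    def __repr__(self):
        body = ", ".join(f"{n}={getattr(self, n)!r}" for n in self.__slots__)
        return f"{type(self).__name__}({body})"

class Point(Record):
    __slots__ = 'x', 'y'

print(Point(3, 4))   # Point(x=3, y=4)
```

Because the subclasses declare `__slots__` and the base defines an empty one, instances carry no per-instance `__dict__` -- which is what makes the type "lightweight" compared with an ordinary class.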
[Python-Dev] A "record" type (was Re: Py2.6 ideas)
Larry Hastings wrote:
> I'd prefer a lightweight frozen dict, let's call it a "record" after
> one of the suggestions in this thread.  That achieves symmetry:
>
>                mutable &      immutable &
>                heavyweight    lightweight
>              +-----------------------------
>   positional |  list           tuple
>   keyword    |  dict           record

I knocked together a quick prototype of a "record" to show what I had
in mind.  Here goes:

--
class record(dict):
    __classname__ = "record"

    def __repr__(self):
        s = [self.__classname__, "("]
        comma = ""
        for name in self.__names__:
            s += (comma, name, "=", repr(self[name]))
            comma = ", "
        s += ")"
        return "".join(s)

    def __delitem__(self, name):
        raise TypeError("object is read-only")

    def __setitem__(self, name, value):
        self.__delitem__(name)

    def __getattr__(self, name):
        if name in self:
            return self[name]
        # object has no __getattr__ to fall back on; raise directly
        raise AttributeError(name)

    # note: Python never calls __hasattr__ -- hasattr() is implemented
    # via getattr() -- so this method is effectively dead code
    def __hasattr__(self, name):
        return name in self.__names__

    def __setattr__(self, name, value):
        # hack: allow setting __classname__ and __names__
        if name in ("__classname__", "__names__"):
            super(record, self).__setattr__(name, value)
        else:
            # a shortcut to throw our exception
            self.__delitem__(name)

    def __init__(self, **args):
        names = []
        for name, value in args.iteritems():
            # was "names += name", which splits a multi-character
            # field name into its individual letters
            names.append(name)
            super(record, self).__setitem__(name, value)
        self.__names__ = tuple(names)

if __name__ == "__main__":
    r = record(a=1, b=2, c="3")
    print r
    print r.a
    print r.b
    print r.c

    # these throw a TypeError
    # r.c = 123
    # r.d = 456

    # the easy way to define a "subclass" of record
    def Point(x, y):
        return record(x = x, y = y)

    # you can use hack-y features to make your "subclasses" more swell
    def Point(x, y):
        x = record(x = x, y = y)
        # a hack to print the name "Point" instead of "record"
        x.__classname__ = "Point"
        # a hack to impose an ordering on the repr() display
        x.__names__ = ("x", "y")
        return x

    p = Point(3, 5)
    q = Point(2, y=5)
    r = Point(y=2, x=4)
    print p, q, r

    # test pickling
    import pickle
    pikl = pickle.dumps(p)
    pp = pickle.loads(pikl)
    print pp
    print pp == p

    # test that the output repr works to construct
    s = repr(p)
    print repr(s)
    peval = eval(s)
    print peval
    print p == peval
--

Yeah, I considered using __slots__, but that was gonna take too long.

Cheers,

/larry/
Re: [Python-Dev] Py_ssize_t
On 2/20/07, Tim Peters <[EMAIL PROTECTED]> wrote:
> In any case, hash codes are defined to be of type "long" in the C API,
> so there appears no painless way to boost their size on boxes where
> sizeof(Py_ssize_t) > sizeof(long).

But that would only be on Windows; I believe other vendors have a
64-bit long on 64-bit machines.  I suppose the pain wouldn't be any
greater than the pain of turning int into Py_ssize_t.  Perhaps less so
in Py3k since there the issue that PyInt only holds a C long is solved
(by equating it to PyLong :).

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Py_ssize_t
[Raymond Hettinger]
> After thinking more about Py_ssize_t, I'm surprised that we're not
> hearing about 64 bit users having a couple of major problems.
>
> If I'm understanding what was done for dictionaries, the hash table
> can grow larger than the range of hash values.  Accordingly, I would
> expect large dictionaries to have an unacceptably large number of
> collisions.  OTOH, we haven't heard a single complaint, so perhaps my
> understanding is off.
> ...

As others have noted, it would require a truly gigantic dict for anyone
to notice, and nobody yet has enough RAM to build something that large.
I added this comment to dictobject.c for 2.5:

    Theoretical Python 2.5 headache: hash codes are only C "long", but
    sizeof(Py_ssize_t) > sizeof(long) may be possible.  In that case,
    and if a dict is genuinely huge, then only the slots directly
    reachable via indexing by a C long can be the first slot in a
    probe sequence.  The probe sequence will still eventually reach
    every slot in the table, but the collision rate on initial probes
    may be much higher than this scheme was designed for.  Getting a
    hash code as fat as Py_ssize_t is the only real cure.  But in
    practice, this probably won't make a lick of difference for many
    years (at which point everyone will have terabytes of RAM on
    64-bit boxes).

Ironically, IIRC we /have/ had a complaint in the other direction:
someone on SF claims to have a box where sizeof(Py_ssize_t) <
sizeof(long).  Something else breaks as a result of that.  I think I
always implicitly assumed sizeof(Py_ssize_t) >= sizeof(long) would
hold.

In any case, hash codes are defined to be of type "long" in the C API,
so there appears no painless way to boost their size on boxes where
sizeof(Py_ssize_t) > sizeof(long).
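The claim that "the probe sequence will still eventually reach every slot" can be sketched in Python using the recurrence documented in dictobject.c's comments (PERTURB_SHIFT == 5). This is an illustration of the algorithm, not CPython's actual C code:

```python
from itertools import islice

PERTURB_SHIFT = 5

def probe_sequence(h, table_size):
    """Yield the slot indices visited for hash h in a table whose
    size is a power of two, per CPython's open-addressing scheme."""
    mask = table_size - 1
    i = h & mask          # initial probe slot
    perturb = h
    while True:
        yield i
        i = (5 * i + perturb + 1) & mask
        perturb >>= PERTURB_SHIFT
        # once perturb reaches 0, i -> (5*i + 1) mod 2**k is a
        # full-period recurrence, so every slot is eventually visited

slots = set(islice(probe_sequence(12345, 8), 32))
print(sorted(slots))   # [0, 1, 2, 3, 4, 5, 6, 7]
```

The "headache" in the comment is only about the *initial* slot: with 32-bit hashes and more than 2**32 slots, `h & mask` can never land on the high slots, so initial probes pile up in the low part of the table even though later probes still cover everything.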
[Python-Dev] NamedTuple (was: Py2.6 ideas)
[Raymond Hettinger]
> The constructor signature ... Point(*fetchall(s)), and it allows for
> direct construction with Point(2,3) without the slower and weirder
> form: Point((2,3)).

[Jim Jewett]
>> If you were starting from scratch, I would agree whole-heartedly;
>> this is one of my most frequent mistakes.  The question is whether
>> it makes sense to "fix" NamedTuple without also fixing regular
>> tuple, list,

Yes.  Tuples and lists both have syntactic support for direct
construction, and NamedTuples aspire to that functionality:

    vec = (dx*3.0, dy*dx/dz, dz)          # Regular tuple
    vec = Vector(dx*3.0, dy*dx/dz, dz)    # Named tuple

I've worked with the current version of the recipe for a long time, and
after a while the wisdom of the signature becomes self-evident.  We
REALLY don't want:

    vec = Vector((dx*3.0, dy*dx/dz, dz))  # yuck

For conversion from other iterables, it is REALLY easy to write:

    vec = Vector(*partial_derivatives)

Remember, list() and tuple() are typically used as casts, not as direct
constructors.  How often do you write:

    dlist = list((dx*3.0, dy*dx/dz, dz))

That is usually written:

    dlist = [dx*3.0, dy*dx/dz, dz]

I think the Vec((dx, dy, dz)) syntax falls into the category of foolish
consistency.

Raymond
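Both spellings compared above are easy to check against the `namedtuple` that eventually shipped in `collections` (Python 2.6); the `Vector` type here is the thread's example, not a standard name:

```python
from collections import namedtuple

Vector = namedtuple("Vector", "dx dy dz")

# direct construction, like a tuple display
v = Vector(1.0, 2.0, 3.0)

# conversion from another iterable: unpack with *, no extra parens
partial_derivatives = [1.0, 2.0, 3.0]
w = Vector(*partial_derivatives)

assert v == w == (1.0, 2.0, 3.0)   # still compares equal to a plain tuple
```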
Re: [Python-Dev] Py2.6 ideas
At 09:56 AM 2/20/2007 -0800, Larry Hastings wrote:
> My final bit of feedback: why is it important that a NamedTuple have a
> class name?

In a word: pickling.
Re: [Python-Dev] Py2.6 ideas
Michele Simionato wrote:
> ``Contract = namedtuple('Contract stock strike volatility expiration
> rate iscall'.split())`` is not that bad either, but I agree that this
> is a second order issue.

That also nicely makes another point: this form accepts not just a list
of strings but any iterable.  That sounds nice.  I'd vote against the
one-string approach; it seems wholly un-Pythonic to me.  Python already
has a parser, so don't invent your own.

Speaking of un-Pythonic-ness, the whole concept of a "named tuple"
strikes me as un-Pythonic.  It's the only data structure I can think of
where every item inside has two names.  (Does TOOWTDI apply to member
lookup?)  I'd prefer a lightweight frozen dict, let's call it a
"record" after one of the suggestions in this thread.  That achieves
symmetry:

               mutable &      immutable &
               heavyweight    lightweight
             +-----------------------------
  positional |  list           tuple
  keyword    |  dict           record

For new code, I can't think of a single place I'd want to use a
"NamedTuple" where a "record" wouldn't be arguably more suitable.  The
"NamedTuple" makes sense only for "fixing" old APIs like os.stat.

My final bit of feedback: why is it important that a NamedTuple have a
class name?  In the current implementation, the first identifier split
from the argument string is used as the "name" of the NamedTuple, and
the following arguments are names of fields.  Dicts and sets are
anonymous, not to mention lists and tuples; why do NamedTuples need
names?  My feeling is: if you want a class name, use a class.

/larry/
Re: [Python-Dev] Py2.6 ideas
At 10:17 AM 2/20/2007 +, Fuzzyman wrote:
> Michele Simionato wrote:
>> Raymond Hettinger verizon.net> writes:
>>
>>> * Add a pure python named_tuple class to the collections module.
>>> I've been using the class for about a year and found that it greatly
>>> improves the usability of tuples as records.
>>> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/500261
>>
>> [snip..]
>> 4. I want help(MyNamedTuple) to work well; in particular it should
>>    display the right module name.  That means that in the m
>>    dictionary you should add a __module__ attribute:
>>
>>    __module__ = sys._getframe(1).f_globals['__name__']
>
> Hello all,
>
> If this is being considered for inclusion in the standard library,
> using '_getframe' hackery will guarantee that it doesn't work with
> alternative implementations of Python (like IronPython at least,
> which doesn't have Python stack frames).

Here's a way that doesn't need it:

    @namedtuple
    def Point(x, y):
        """The body of this function is ignored -- but this docstring
        will be used for the Point class"""

This approach also gets rid of the messy string stuff, and it also
allows one to specify default values.  The created type can be given
the function's __name__, __module__, and __doc__.
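The decorator idea above can be sketched on top of the `collections.namedtuple` that later shipped (the `defaults` parameter used here arrived in Python 3.7). The field names and default values are read from the decorated function's signature via `inspect`; this is an illustration of the proposal, not its author's implementation:

```python
import inspect
from collections import namedtuple as _namedtuple

def namedtuple(func):
    """Build a named-tuple type from a function's signature; the
    function body is ignored, its docstring becomes the class doc."""
    params = inspect.signature(func).parameters
    fields = list(params)
    defaults = tuple(p.default for p in params.values()
                     if p.default is not inspect.Parameter.empty)
    cls = _namedtuple(func.__name__, fields, defaults=defaults)
    cls.__doc__ = func.__doc__ or cls.__doc__
    cls.__module__ = func.__module__   # no _getframe hackery needed
    return cls

@namedtuple
def Point(x, y=0):
    """The body of this function is ignored -- but this docstring
    will be used for the Point class."""

print(Point(2, 3))   # Point(x=2, y=3)
```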
[Python-Dev] NamedTuple (was: Py2.6 ideas)
Raymond Hettinger explained:
> The constructor signature ... Point(*fetchall(s)), and it allows for
> direct construction with Point(2,3) without the slower and weirder
> form: Point((2,3)).

If you were starting from scratch, I would agree whole-heartedly; this
is one of my most frequent mistakes.  The question is whether it makes
sense to "fix" NamedTuple without also fixing regular tuple, list, set,
etc.  Assuming this goes to collections rather than builtins, would it
be reasonable to include both versions?

> Also, the current signature works better with keyword arguments:
> Point(x=2, y=3) or Point(2, y=3) which wouldn't be common but would
> be consistent with the relationship between keyword arguments and
> positional arguments in other parts of the language.

Yes and no.  One important trait of (pre 2.6) keyword arguments is that
they have defaults.  When I'm using a record format that someone else
defined, it is pretty common for large portions of most records to be
essentially wasted.  It would be nice if I could create a record by
just specifying the fields that have non-default values.  I can do this
with a data class (since attribute lookup falls back to the class), but
not with this tuple factory.

> The string form for the named tuple factory was arrived at because it
> was easier to write, read, and alter than its original form with a
> list of strings:
>
> Contract = namedtuple('Contract stock strike volatility expiration rate iscall')
>
> vs.
>
> Contract = namedtuple('Contract', 'stock', 'strike', 'volatility',
>                       'expiration', 'rate', 'iscall')

The type name is somehow different.  Does either of these affect your
decision?

    # separate arguments for type name and field names
    Contract = namedtuple('Contract',
        "stock strike volatility expiration rate iscall")

    # no explicit assignment, and bad hackery to get it
    namedtuple('Contract', "stock strike volatility expiration rate iscall")

-jJ
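The wish for default field values did eventually land: `collections.namedtuple` grew a `defaults` parameter in Python 3.7, applied to the rightmost fields. A sketch using the thread's `Contract` example (the particular default values are made up for illustration):

```python
from collections import namedtuple

Contract = namedtuple(
    "Contract",
    "stock strike volatility expiration rate iscall",
    defaults=(0.0, 0.0, None, 0.0, True),  # the rightmost five fields
)

# create a record by specifying only the non-default field
c = Contract("GOOG")
print(c.strike, c.iscall)   # 0.0 True

# records stay immutable; _replace returns a modified copy
d = c._replace(strike=500.0)
```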
Re: [Python-Dev] Py_ssize_t
> My suspicion is that building Python for a 64-bit address space is
> still a somewhat academic exercise.  I know we don't do this at
> Google (we switch to other languages long before the datasets become
> so large we'd need a 64-bit address space for Python).  What's your
> experience at EWT?

Two people had some difficulty building non-upgraded third-party
modules with Py2.5 on 64-bit machines (I think wxPython was one of the
problems) but they either gave up or switched machines before we could
isolate the problem and say for sure whether Py_ssize_t was the
culprit.

I had remembered the PEP saying that there might be some issues for
non-upgraded third-party modules and have wondered whether others were
similarly affected.

Raymond
Re: [Python-Dev] Py_ssize_t
On 2/20/07, Raymond Hettinger <[EMAIL PROTECTED]> wrote:
> After thinking more about Py_ssize_t, I'm surprised that we're not
> hearing about 64 bit users having a couple of major problems.
>
> If I'm understanding what was done for dictionaries, the hash table
> can grow larger than the range of hash values.  Accordingly, I would
> expect large dictionaries to have an unacceptably large number of
> collisions.  OTOH, we haven't heard a single complaint, so perhaps my
> understanding is off.

Not until the hash table has 4 billion entries.  I believe that would
be 96 GB just for the hash table; plus probably at least that for that
many unique key strings.  Not to mention the values (but those needn't
be unique).  I think the benefits of a 64-bit architecture start above
2 or 3 GB of RAM in use, so there's quite a bit of expansion space for
64-bit users before they run into this theoretical problem.

> The other area where I expected to hear wailing and gnashing of teeth
> is users compiling with third-party extensions that haven't been
> updated to a Py_ssize_t API and still use longs.  I would have
> expected some instability due to the size mismatches in function
> signatures -- the difference would only show up with giant sized data
> structures -- the bigger they are, the harder they fall.  OTOH, there
> have not been any complaints either -- I would have expected someone
> to submit a patch to pyport.h that allowed a #define to force
> Py_ssize_t back to a long so that the poster could make a reliable
> build that included non-updated third-party extensions.
>
> In the absence of a bug report, it's hard to know whether there is a
> real problem.  Have all major third-party extensions adopted
> Py_ssize_t or is some divine force helping unconverted extensions
> work with converted Python code?  Maybe the datasets just haven't
> gotten big enough yet.

My suspicion is that building Python for a 64-bit address space is
still a somewhat academic exercise.  I know we don't do this at Google
(we switch to other languages long before the datasets become so large
we'd need a 64-bit address space for Python).  What's your experience
at EWT?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Making builtins more efficient
If this is not a replay of an old message, please move the discussion
to python-ideas.

On 2/20/07, Steven Elliott <[EMAIL PROTECTED]> wrote:
> I'm finally getting back into this.  I'd like to take one more shot
> at it with a revised version of what I proposed before.
>
> [rest of the proposal snipped; it appears in full in the following
> message]
Re: [Python-Dev] Making builtins more efficient
I'm finally getting back into this. I'd like to take one more shot at it with a revised version of what I proposed before. For those of you that did not see the original thread it was about ways that accessing builtins could be more efficient. It's a bit much to summarize again now, but you should be able to find it in the archive with this subject and a date of 2006-03-08. On Fri, 2006-03-10 at 12:46 +1300, Greg Ewing wrote: > Steven Elliott wrote: > > One way of handling it is to > > alter STORE_ATTR (op code for assigning to mod.str) to always check to > > see if the key being assigned is one of the default builtins. If it is, > > then the module's indexed array of builtins is assigned to. > > As long as you're going to all that trouble, it > doesn't seem like it would be much harder to treat > all global names that way, instead of just a predefined > set. The compiler already knows all of the names that > are used as globals in the module's code. What I have in mind may be close to what you are suggesting above. My thought now is that builtins are a set of tokens that typically, but don't necessarily, point to the same objects in all modules. Such tokens, which I'll refer to as "global tokens", can be roughly broken into two sets: 1) Global tokens that typically point to the same object in all modules. 2) Global tokens that that are likely to point to the different objects (or be undefined) in different modules. Set 1) is pretty much the the builtins. "True" and "len" are likely to point to the same objects in all modules, but not necessarily. Set 2) might be things like "os" and "sys" which are often defined (imported) in modules, but not necessarily. Access to the globals of a module, including the current module, is done with one of three opcodes (LOAD_GLOBAL, LOAD_ATTR and LOAD_NAME). 
For each of these opcodes the following snippet of code from ceval.c (for LOAD_GLOBAL) is relevant to this discussion: /* This is the un-inlined version of the code above */ x = PyDict_GetItem(f->f_globals, w); if (x == NULL) { x = PyDict_GetItem(f->f_builtins, w); if (x == NULL) { load_global_error: format_exc_check_arg( PyExc_NameError, GLOBAL_NAME_ERROR_MSG, w); break; } } So, to avoid the hash table lookups above maybe the global tokens could be assigned an index value that is fixed for any given version of the interpreter and that is the same for all modules (that "True" is always index 7, "len" is always index 3, etc.) Once a set of indexes have been determined a new opcode, that I'll call "LOAD_GTOKEN", could be created that avoids the hash table lookup by functioning in a way that is similar to LOAD_FAST (pull a local variable value out of an array). For example, static references to "True" could always be compiled to LOAD_GTOKEN 7 (True) As to set 1) and set 2) that I mentioned above - there is only a need to distinguish between the two sets if a copy-on-write mechanism is used. That way global tokens that are likely to have their value changed (group 2) ) can all be together in one group so that only that group needs to be copied when one of the global tokens is written to. For example code such as: True = 1 print True would be compiled into something like: 1 LOAD_CONST 1 (1) STORE_GTOKEN1 7 (True) 2 LOAD_GTOKEN17 (True) PRINT_ITEM PRINT_NEWLINE Note that "1" has been appended to "STORE_GTOKEN" to indicate that group 1) is being worked with. The store command will copy the array of pointers once, the first time it is called. Just as a new opcode is needed for LOAD_GLOBAL one would be needed for LOAD_ATTR. Perhaps "LOAD_ATOKEN" would work. 
For example:

    amodule.len = my_len
    print amodule.len

would be compiled into something like:

    1           LOAD_GLOBAL    0 (my_len)
                LOAD_GLOBAL    1 (amodule)
                STORE_ATOKEN1  3 (len)

    2           LOAD_GLOBAL    1 (amodule)
                LOAD_ATOKEN1   3 (len)
                PRINT_ITEM
                PRINT_NEWLINE
                LOAD_CONST     0 (None)
                RETURN_VALUE

Note that it looks almost identical to the code that is currently generated, but the oparg "3" shown for "LOAD_ATOKEN1" above indexes into an array (like LOAD_FAST) to get at the attribute directly, whereas the oparg for LOAD_ATTR is an index into an array of constants/strings which is then used to retrieve the attribute from the module's global hash table.

> > That's great, but I'm curious if additional gains can be
> > made by focusing just on builtins.
>
> As long as builtins can be shadowed, I can't see how
> to make any extra use of the fact that it's a builtin.
> A semantic change would be needed, such as forbidding
> shadowing of builtins, or at least forbidding this
> from outside the module.

I now think that it best not to think of builtins a
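The mechanism proposed in the message above can be modelled in pure Python. This is a hedged sketch rather than real interpreter code: the function names, the token indexes, and the array layout are all illustrative assumptions.

```python
# Model of the current LOAD_GLOBAL behaviour: up to two hash-table
# lookups (module globals, then builtins), mirroring the ceval.c snippet.
def load_global_dict(name, f_globals, f_builtins):
    try:
        return f_globals[name]
    except KeyError:
        try:
            return f_builtins[name]
        except KeyError:
            raise NameError("global name %r is not defined" % name)

# Model of the proposed LOAD_GTOKEN behaviour: each "global token" gets a
# fixed index decided at interpreter-build time, so a lookup becomes a
# single array access, the same trick LOAD_FAST uses for locals.
GTOKEN_INDEX = {"len": 3, "True": 7}       # hypothetical fixed indexes
gtoken_array = [None] * 16                 # one slot per global token
gtoken_array[GTOKEN_INDEX["len"]] = len
gtoken_array[GTOKEN_INDEX["True"]] = True

def load_gtoken(index, array):
    return array[index]    # no hashing, no string comparison

# Both paths resolve the same names; the second skips the dict machinery.
assert load_global_dict("len", {}, {"len": len}) is len
assert load_gtoken(GTOKEN_INDEX["len"], gtoken_array) is len
```

The copy-on-write part of the proposal would then amount to copying `gtoken_array` the first time a STORE_GTOKEN-style write hits it, so that unmodified modules can keep sharing one array.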
Re: [Python-Dev] Py_ssize_t
On Feb 20, 2007, at 4:47 AM, Raymond Hettinger wrote:
> The other area where I expected to hear wailing and gnashing of teeth
> is users compiling with third-party extensions that haven't been
> updated to a Py_ssize_t API and still use longs. I would have expected
> some instability due to the size mismatches in function signatures --
> the difference would only show up with giant sized data structures --
> the bigger they are, the harder they fall. OTOH, there have not been
> any complaints either -- I would have expected someone to submit a
> patch to pyport.h that allowed a #define to force Py_ssize_t back to a
> long so that the poster could make a reliable build that included
> non-updated third-party extensions.

When I did an experimental port of our big embedded app to Python 2.5, that's (almost) exactly what I did. I didn't add the #define to a Python header file, but to our own, and it worked pretty well, IIRC. I never went farther than the experimental phase though.

-Barry
___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Welcome to the "Python-Dev" mailing list
Juan Carlos:

Please note that the python-dev list is for the developers of the Python language, rather than for Python application developers, so your query isn't really appropriate for this list.

I would suggest that you read the Mailman license carefully, as it will tell you what you are allowed to do with the software. Since Mailman is distributed under the GPL, it is likely you can do all the things you want to. If you have any further queries, there is a Mailman mailing list to which you should refer. Please see http://www.gnu.org/software/mailman/ for further details.

Good luck with your project.

regards
Steve

Juan Carlos Suarez wrote:
> *Good morning*, thanks a lot for your answer, I confirmed by this
> means my subscription.
> What *I really wish and need* is to be able to use Mailman freely: as
> a distributor of my own mailing lists, as a mailing list manager, to
> be able to send my mails via Mailman, and to be able to open my own
> mail address.
> How should I do this? I need your instructions and help.
> I'm waiting for your answer soon.
> Thanks, my regards, *Juan Carlos Suarez*
>
> You wrote:
> Welcome to the Python-Dev@python.org mailing list! If you are a new
> subscriber, please take the time to introduce yourself briefly in your
> first post. It is appreciated if you lurk around for a while before
> posting!
> :-)
>
> Additional information on Python's development process can be found in
> the Python Developer's Guide:
>
> http://www.python.org/dev/
>
> To post to this list, send your email to:
>
> python-dev@python.org
>
> General information about the mailing list is at:
>
> http://mail.python.org/mailman/listinfo/python-dev
>
> If you ever want to unsubscribe or change your options (eg, switch to
> or from digest mode, change your password, etc.), visit your
> subscription page at:
>
> http://mail.python.org/mailman/options/python-dev/juancarlosuarez%40ciudad.com.ar
>
> You can also make such adjustments via email by sending a message to:
>
> [EMAIL PROTECTED]
>
> with the word `help' in the subject or body (don't include the
> quotes), and you will get back a message with instructions.
>
> You must know your password to change your options (including changing
> the password, itself) or to unsubscribe. It is:
>
> casandra07
>
> Normally, Mailman will remind you of your python.org mailing list
> passwords once every month, although you can disable this if you
> prefer. This reminder will also include instructions on how to
> unsubscribe or change your account options. There is also a button on
> your options page that will email your current password to you.

--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
Blog of Note: http://holdenweb.blogspot.com
See you at PyCon? http://us.pycon.org/TX2007
Re: [Python-Dev] Py_ssize_t
Raymond Hettinger schrieb:
> If I'm understanding what was done for dictionaries, the hash table
> can grow larger than the range of hash values. Accordingly, I would
> expect large dictionaries to have an unacceptably large number of
> collisions. OTOH, we haven't heard a single complaint, so perhaps my
> understanding is off.

I think this would happen, but users don't have enough memory to notice it. For a dictionary with more than 4G entries, you need 72GiB of memory (8 bytes each for the key, value, and cached hash). So you start to see massive collisions only when you have that much memory, and in that dictionary you would also need space for the keys and values themselves. Very few people have machines with 128+GiB of main memory, so no complaints yet.

But you are right: extending the hash value to be a 64-bit quantity was "forgotten", mainly because it isn't a count of something, and being a "count of something" was the primary criterion for the 2.5 changes.

> The other area where I expected to hear wailing and gnashing of teeth
> is users compiling with third-party extensions that haven't been
> updated to a Py_ssize_t API and still use longs. I would have expected
> some instability due to the size mismatches in function signatures --
> the difference would only show up with giant sized data structures --
> the bigger they are, the harder they fall. OTOH, there have not been
> any complaints either -- I would have expected someone to submit a
> patch to pyport.h that allowed a #define to force Py_ssize_t back to a
> long so that the poster could make a reliable build that included
> non-updated third-party extensions.

On most 64-bit systems, there is also an option to run 32-bit programs (at least on AMD64, Sparc-64, and PPC64 there is). So people are more likely to do that when they run into problems, rather than recompiling the 64-bit Python.

> In the absence of a bug report, it's hard to know whether there is a
> real problem.
> Have all major third-party extensions adopted Py_ssize_t or is some
> divine force helping unconverted extensions work with converted
> Python code?

I know Matthias Klose has fixed all extension modules in the entire Debian source to compile without warnings on 64-bit machines. They may not all work yet, but yes, for all modules in Debian, it has been fixed. Not sure whether Matthias is a divine force, but working for Canonical comes fairly close :-)

> Maybe the datasets just haven't gotten big enough yet.

Primarily that. We still have a few years ahead to find all the bugs before people start complaining that Python is unstable on 64-bit systems. By the time people would actually see problems, hopefully they will all have been resolved.

Regards,
Martin
Re: [Python-Dev] Py2.6 ideas
Michele Simionato wrote:
> Raymond Hettinger verizon.net> writes:
>> * Add a pure python named_tuple class to the collections module. I've
>> been using the class for about a year and found that it greatly
>> improves the usability of tuples as records.
>> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/500261
> [snip..]
> 4. I want help(MyNamedTuple) to work well; in particular it should
>    display the right module name. That means that in the m dictionary
>    you should add a __module__ attribute:
>
>    __module__ = sys._getframe(1).f_globals['__name__']

Hello all,

If this is being considered for inclusion in the standard library, using '_getframe' hackery will guarantee that it doesn't work with alternative implementations of Python (IronPython, at least, doesn't have Python stack frames). At the very least, wrapping it in a try/except would be helpful.

All the best,

Michael Foord
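The guard Michael suggests might look like the sketch below. The helper name and the fallback value are my own assumptions, not part of Raymond's recipe:

```python
import sys

def calling_module_name(depth=1, default='<unknown>'):
    """Best-effort lookup of the caller's module name.

    Falls back to `default` on implementations without sys._getframe
    (e.g. IronPython) or when the stack is shallower than `depth`.
    """
    try:
        frame = sys._getframe(depth)
    except (AttributeError, ValueError):
        # AttributeError: sys._getframe doesn't exist on this runtime;
        # ValueError: fewer frames on the stack than requested.
        return default
    return frame.f_globals.get('__name__', default)
```

The factory could then set `__module__ = calling_module_name()` and still import cleanly everywhere, at the cost of a less helpful `help()` display on platforms without frame introspection.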
Re: [Python-Dev] Py_ssize_t
On 20 Feb, 2007, at 10:47, Raymond Hettinger wrote:
> The other area where I expected to hear wailing and gnashing of teeth
> is users compiling with third-party extensions that haven't been
> updated to a Py_ssize_t API and still use longs. I would have expected
> some instability due to the size mismatches in function signatures --
> the difference would only show up with giant sized data structures --
> the bigger they are, the harder they fall. OTOH, there have not been
> any complaints either -- I would have expected someone to submit a
> patch to pyport.h that allowed a #define to force Py_ssize_t back to a
> long so that the poster could make a reliable build that included
> non-updated third-party extensions.

Maybe that's because most sane 64-bit systems use LP64 and therefore don't have any problems with mixing Py_ssize_t and long. AFAIK Windows is the only major platform that doesn't use the LP64 model, and 64-bit Windows isn't used a lot.

Ronald
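Ronald's LP64/LLP64 point can be checked from Python itself. This is an illustrative sketch; the model labels are the standard C data-model names rather than anything from CPython's headers:

```python
import struct

sizeof_long = struct.calcsize('l')   # C long, the type old extensions used
sizeof_ptr = struct.calcsize('P')    # pointer size, same as Py_ssize_t

if sizeof_ptr == 8 and sizeof_long == 8:
    model = 'LP64'    # most 64-bit Unix: long/Py_ssize_t mixups are masked
elif sizeof_ptr == 8 and sizeof_long == 4:
    model = 'LLP64'   # 64-bit Windows: a long truncates Py_ssize_t values
else:
    model = 'ILP32'   # 32-bit build: the mismatch cannot arise

# On LP64 a signature compiled with `long` happens to match Py_ssize_t,
# which is why most 64-bit users never noticed unconverted extensions.
assert sizeof_ptr in (4, 8)
```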
[Python-Dev] Py_ssize_t
After thinking more about Py_ssize_t, I'm surprised that we're not hearing about 64-bit users having a couple of major problems.

If I'm understanding what was done for dictionaries, the hash table can grow larger than the range of hash values. Accordingly, I would expect large dictionaries to have an unacceptably large number of collisions. OTOH, we haven't heard a single complaint, so perhaps my understanding is off.

The other area where I expected to hear wailing and gnashing of teeth is users compiling with third-party extensions that haven't been updated to a Py_ssize_t API and still use longs. I would have expected some instability due to the size mismatches in function signatures -- the difference would only show up with giant sized data structures -- the bigger they are, the harder they fall. OTOH, there have not been any complaints either -- I would have expected someone to submit a patch to pyport.h that allowed a #define to force Py_ssize_t back to a long, so that the poster could make a reliable build that included non-updated third-party extensions.

In the absence of a bug report, it's hard to know whether there is a real problem. Have all major third-party extensions adopted Py_ssize_t, or is some divine force helping unconverted extensions work with converted Python code? Maybe the datasets just haven't gotten big enough yet.

Raymond
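Raymond's collision worry is a pigeonhole argument: once a table has more slots than there are distinct hash values, some slots can never be hit directly. A toy model, with an 8-bit hash range standing in for 32-bit hashes (the sizes here are illustrative, not CPython's):

```python
HASH_BITS = 8          # stand-in for a 32-bit hash on a 64-bit build
TABLE_SIZE = 512       # more slots than there are distinct hash values

def first_slot(h, size=TABLE_SIZE):
    return h % size    # the slot where probing starts

# Only 2**HASH_BITS distinct hashes exist, so at most that many slots can
# ever be a first probe target; the remaining slots fill only through
# collision resolution, no matter how large the table grows.
reachable = {first_slot(h) for h in range(2 ** HASH_BITS)}
assert len(reachable) == 2 ** HASH_BITS   # 256 distinct first slots...
assert len(reachable) < TABLE_SIZE        # ...out of 512 available
```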
Re: [Python-Dev] Py2.6 ideas
Raymond Hettinger wrote:
> The constructor signature has been experimented with several times and
> had best results in its current form, which allows the *args for
> casting a record set returned by SQL or by the CSV module, as in
> Point(*fetchall(s)),

I think you mean something like [Point(*tup) for tup in fetchall(s)], which I don't like for the reasons explained later.

> and it allows for direct construction with Point(2,3) without the
> slower and weirder form: Point((2,3)). Also, the current signature
> works better with keyword arguments: Point(x=2, y=3) or Point(2, y=3),
> which wouldn't be common but would be consistent with the relationship
> between keyword arguments and positional arguments in other parts of
> the language.

I don't buy this argument. Yes, Point(2,3) is nicer than Point((2,3)) in the interactive interpreter and in doctests, but in real life one always has tuples coming back as return values from functions. Consider your own example, TestResults(*doctest.testmod()). I will argue that the * does not feel particularly good and that it would be better to just write TestResults(doctest.testmod()). Moreover, I believe that having a subclass constructor incompatible with the base class constructor is very evil. First of all, you must be consistent with the tuple constructor, not with "other parts of the language".

Finally, I did some timing of code like this:

    from itertools import imap

    Point = namedtuple('Point x y'.split())
    lst = [(i, i*i) for i in range(500)]

    def with_imap():
        for _ in imap(Point, lst):
            pass

    def with_star():
        for _ in (Point(*t) for t in lst):
            pass

and as expected the performance is worse with the * notation. In short, I don't feel any substantial benefit coming from the *args constructor.
> The string form for the named tuple factory was arrived at because it
> was easier to write, read, and alter than its original form with a
> list of strings:
>
>     Contract = namedtuple('Contract stock strike volatility expiration
>                            rate iscall')
>
> vs.
>
>     Contract = namedtuple('Contract', 'stock', 'strike', 'volatility',
>                           'expiration', 'rate', 'iscall')
>
> The former is easier to edit and to re-arrange. Either form is trivial
> to convert programmatically to the other, and the definition step only
> occurs once while the use of the new type can appear many times
> throughout the code. Having experimented with both forms, I've found
> the string form to be best though it seems a bit odd. Yet, the
> decision isn't central to the proposal and is still an open question.

``Contract = namedtuple('Contract stock strike volatility expiration rate iscall'.split())`` is not that bad either, but I agree that this is a second-order issue.

Michele Simionato
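For comparison, here is how the two construction styles look with the `collections.namedtuple` API (shown purely as an illustration; the factory being debated in this thread spelled its arguments differently):

```python
from collections import namedtuple

Point = namedtuple('Point', 'x y')
rows = [(i, i * i) for i in range(5)]    # stand-in for fetchall()/csv rows

# Raymond's signature: fields are positional arguments, so each tuple
# coming back from a function must be unpacked with *.
unpacked = [Point(*t) for t in rows]

# The tuple-style construction Michele argues for survives as the
# _make() alternate constructor, which accepts the iterable directly.
made = list(map(Point._make, rows))

assert unpacked == made == [Point(i, i * i) for i in range(5)]
```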
Re: [Python-Dev] Py2.6 ideas
>> * I remembered why the __repr__ function had a 'show' argument. I've
>> changed the name now to make it more clear and added a docstring.
>> The idea was that some use cases require that the repr exactly match
>> the default style for tuples, and the optional argument allowed for
>> that possibility with almost no performance hit.
>
> But what about simply changing the __repr__? . . .
> In [4]: Point.__repr__ = tuple.__repr__

Okay, that is the better way.

Raymond
Re: [Python-Dev] Py2.6 ideas
Raymond Hettinger rcn.com> writes:
> More thoughts on named tuples after trying out all of Michele's
> suggestions:
>
> * The lowercase 'namedtuple' seemed right only because it's a
> function, but as a factory function, it is somewhat class-like. In
> use, 'NamedTuple' more closely matches my mental picture of what is
> happening and distinguishes what it does from the other two entries in
> collections, 'deque' and 'defaultdict', which are used to create
> instances instead of new types.

This is debatable. I remember Guido using lowercase for metaclasses in the famous descrintro essay. I still prefer lowercase for class factories. But I will not fight on this ;)

> * I remembered why the __repr__ function had a 'show' argument. I've
> changed the name now to make it more clear and added a docstring.
> The idea was that some use cases require that the repr exactly match
> the default style for tuples, and the optional argument allowed for
> that possibility with almost no performance hit.

But what about simply changing the __repr__?

    In [2]: Point = NamedTuple('Point','x','y')

    In [3]: Point(1,2)
    Out[3]: Point(x=1, y=2)

    In [4]: Point.__repr__ = tuple.__repr__

    In [5]: Point(1,2)
    Out[5]: (1, 2)

It feels clearer to me.

Michele Simionato
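Michele's suggestion works as advertised with the `collections.namedtuple` that eventually shipped (shown here as an illustration; the `NamedTuple` prototype in this thread differs in spelling):

```python
from collections import namedtuple

Point = namedtuple('Point', 'x y')

# The named __repr__ the factory generates...
assert repr(Point(1, 2)) == 'Point(x=1, y=2)'

# ...can be swapped for tuple's own repr on the class, restoring the
# plain tuple display with no per-call overhead.
Point.__repr__ = tuple.__repr__
assert repr(Point(1, 2)) == '(1, 2)'
```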