Re: [Python-Dev] Can LOAD_GLOBAL be optimized to a simple array lookup?
2006/8/24, Brett Cannon [EMAIL PROTECTED]:
> On 8/23/06, K.S.Sreeram [EMAIL PROTECTED] wrote:
> > Hi all, I noticed in Python/ceval.c that LOAD_GLOBAL uses a dictionary
> > lookup, and was wondering if that can be optimized to a simple array
> > lookup.
>
> No, not as the language stands now.
>
> > If I'm right, there are 3 kinds of name lookups: locals, outer scopes
> > (closures), and globals (not counting attribute lookup). Locals are
> > identified either by the presence of assignments or by their presence
> > in the arg list, so all name lookups can be classified into the 3 types
> > at compile/load time. Since we know, at load time, which names are
> > global, can't we simply build a global name table and replace
> > LOAD_GLOBALs with a lookup at the corresponding index into the global
> > name table?
>
> But we don't know statically what the globals will be. You can import a
> module and put something in its global namespace externally. That is
> done after load time or compile time.

I think that it can be implemented for the language as it stands now. I
don't know whether it would be a good thing or not.

In principle, you can add a feature to the dict implementation that will
allow it to notify you when the value of a specific key changes. If you
have that, you can change the LOAD_GLOBAL implementation to:

1. Look up the global.
2. Ask for notification when the global dict changes in a way that would
   change the meaning of the global.
3. Change the LOAD_GLOBAL opcode to something like LOAD_CONST, and set up
   the notification from the dict to update the LOAD_CONST opcode to point
   to the new object.

In that way, LOAD_GLOBAL will cause a dict lookup only once. Changing the
value of globals will require more work, though.

Again, I'm not saying that it's desired, just that it's possible.

Have a good day,
Noam
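P.S. Here is a toy sketch, in pure Python, of the notification idea. The
names WatchedDict and watch() are invented just for illustration - the
real change would have to be made to the C dict implementation:

    class WatchedDict(dict):
        def __init__(self, *args, **kwargs):
            dict.__init__(self, *args, **kwargs)
            self._watchers = {}     # key -> list of callbacks

        def watch(self, key, callback):
            # Ask to be called when the value of 'key' changes.
            self._watchers.setdefault(key, []).append(callback)

        def __setitem__(self, key, value):
            dict.__setitem__(self, key, value)
            for callback in self._watchers.get(key, ()):
                callback(value)

    # The interpreter would do the equivalent of this for LOAD_GLOBAL:
    g = WatchedDict(x=1)
    cache = [g['x']]       # the "LOAD_CONST" - a cached reference
    g.watch('x', lambda new: cache.__setitem__(0, new))
    g['x'] = 2             # the cache is updated by the notification,
    assert cache[0] == 2   # so no dict lookup is needed on later loads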
[Python-Dev] ANN: byteplay - a bytecode assembler/disassembler
Hello,

I wanted to tell you that I wrote a Python bytecode
assembler/disassembler, and would be happy if people tried it and said
what they think. I'm sending this message to this list because this module
deals with pretty low-level Python, so I thought it might interest the
audience here. If I was wrong - please forgive me.

The idea is to define an object which is equivalent to Python's code
object, but which is easy to work with. To explain what I mean, I'll show
a quick example. We can define this stupid function:

>>> def f(a, b):
...     print (a, b)

We can convert it to an equivalent object, and see how it stores the byte
code:

>>> from byteplay import *
>>> c = Code.from_code(f.func_code)
>>> from pprint import pprint; pprint(c.code)
[(SetLineno, 2),
 (LOAD_FAST, 'a'),
 (LOAD_FAST, 'b'),
 (BUILD_TUPLE, 2),
 (PRINT_ITEM, None),
 (PRINT_NEWLINE, None),
 (LOAD_CONST, None),
 (RETURN_VALUE, None)]

We can change the bytecode easily, and see what happens:

>>> c.code[3:3] = [(ROT_TWO, None)]
>>> f.func_code = c.to_code()
>>> f(3, 5)
(5, 3)

The idea is basically the same as that of Michael Hudson's bytecodehacks,
but this one works with Python 2.4 and 2.5. I also think that it's simpler
to use. I borrowed some code from Phillip J. Eby's peak.util.assembler -
the main difference between his package and mine is that mine lets you
play with existing bytecode, not only create new code objects.

I learned a lot about Python's bytecode from writing this, and I think
that others may learn from it as well - I think it's much easier to
understand how code objects work by studying equivalent objects which were
meant to be as simple as possible, instead of as fast as possible.

I think it got pretty good testing - I patched __import__ so that after a
module is imported, all function objects (found by the gc module) were
disassembled and assembled again. I then managed to get the complete test
suite to pass!

You can download the module from
http://byteplay.googlecode.com/svn/trunk/byteplay.py . I wrote
documentation (which goes to some length to explain how bytecode works).
It's on the wiki: http://wiki.python.org/moin/ByteplayDoc .

I even thought that it might get into the standard library, because it
seemed to me to be pretty good documentation of bytecode details, and
because it has a pretty straightforward interface. It also makes meddling
with bytecode much less dangerous - for example, you can see Raymond's
original recipe for binding constants at compile time (I reposted it at
http://python.pastebin.com/768312) and how using byteplay makes it simple
(posted at http://python.pastebin.com/768318 so you can view the diff).
But, of course - it's up to you. I will be entirely satisfied if people
simply find it useful.

Have a good day,
Noam
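P.S. The round-trip test looked roughly like this (a sketch from memory -
Code.from_code and to_code are byteplay's real interface, the rest is
illustrative):

    import gc
    import types
    from byteplay import Code

    def reassemble_all_functions():
        # Disassemble and reassemble, in place, the bytecode of every
        # function the gc module can find.
        for obj in gc.get_objects():
            if isinstance(obj, types.FunctionType):
                c = Code.from_code(obj.func_code)   # disassemble
                obj.func_code = c.to_code()         # assemble again

Running the test suite after doing this to every imported module is a
fairly thorough check that assembling is the inverse of disassembling.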
Re: [Python-Dev] Empty Subscript PEP on Wiki - keep or toss?
Hello,

I posted it as a pre-PEP, in the hope that it may become a PEP and be
accepted. As it happened, Guido said no at the end, so I stopped pushing
the subject.

I think that the main reason for the no was that my use case wasn't
convincing enough - the objections were that this isn't useful enough, not
that it does anything harmful*. As the one who does think it's useful, I
have the tiniest hope that if, in the future, people become familiar with
the package and see the usefulness of allowing an empty subscript list,
the decision will change. If the only result of me posting it as a PEP is
a final Rejected status that will prevent any chance of that happening, I
don't think I'll bother to make it a PEP. If that's not the case, then
I'll make it a PEP and post it.

Have a good week,
Noam

* Yes, I know that adding an unneeded feature to the language can be
considered harmful. You may not agree with my distinction in this case. As
it is, I barely consider this an added feature - I would say it's mostly a
small generalization.

2006/6/30, Georg Brandl [EMAIL PROTECTED]:
> [EMAIL PROTECTED] wrote:
> > Noam Raphael posted an empty subscript PEP on the Python Wiki:
> > http://wiki.python.org/moin/EmptySubscriptListPEP
> > It's not linked to by any other pages on the wiki. Is there a reason
> > it wasn't added to the peps repository?
>
> Perhaps the author forgot to submit it to the PEP editor, or he decided
> to abandon it after the mostly negative discussion here.
>
> Georg
Re: [Python-Dev] Pre-PEP: Allow Empty Subscript List Without Parentheses
2006/6/18, Shane Hathaway [EMAIL PROTECTED]:
> Try to think more about how users will use your API. You haven't
> specified where those names (sheet1, income_tax, and profit) are coming
> from. What do you expect users of your library to do to bring those
> names into their namespace?

That's a good question. I'm going to do some bytecode hacks! Something
like this:

    from spreadsheetlib import SourceCube, CalculatedCube
    income_tax = SourceCube([])
    income_tax[] = 0.18
    years = set([2003, 2004, 2005])
    profit = SourceCube([years])
    profit[2003] = 1000; profit[2004] = 2000; profit[2005] = 2500
    real_profit = CalculatedCube([years],
        lambda year: profit[year] / (1 + income_tax[]))
    print real_profit[2004]
    (1694.9152542372883)

It may be what Talin meant about a higher level language, but I don't
really change the language - I only inspect the function to see on what
other changeable objects it depends. Those changeable objects implement
some sort of change-notification protocol, and that allows the system to
automatically recalculate the result when one of the values it depends on
changes. (Actually, I intend to change the function to depend directly on
the changeable object instead of looking it up every time in the global
namespace, but I don't think that changes the explanation.)

Note that requiring that all changeable objects be attributes of some
other object won't remove the need for bytecode hacking: the only way is
to explicitly specify a list of all the objects that the function depends
on, and then give a function that gets these as arguments. This will
really be inconvenient.

But thanks for the suggestion!
Noam
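P.S. A minimal sketch of the change-notification idea (all names here are
invented for illustration - the real library also finds the dependencies
automatically by inspecting the function):

    class Cell(object):
        def __init__(self, value=None):
            self.value = value
            self.dependents = []    # recalculation callbacks

        def set(self, value):
            self.value = value
            for recalc in self.dependents:
                recalc()

    income_tax = Cell(0.18)
    profit = Cell(2000.0)
    real_profit = Cell()

    def recalc_real_profit():
        real_profit.value = profit.value / (1 + income_tax.value)

    for cell in (income_tax, profit):
        cell.dependents.append(recalc_real_profit)
    recalc_real_profit()

    profit.set(2360.0)        # real_profit is recalculated automatically
    print real_profit.value   # 2000.0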
Re: [Python-Dev] Pre-PEP: Allow Empty Subscript List Without Parentheses
Hello,

2006/6/16, Josiah Carlson [EMAIL PROTECTED]:
> I'm not a mathematician, and I don't really work with arrays of any
> dimensionality, so the need for 0-D subscripting via arr[], while being
> cute, isn't compelling to my uses for Python.

Thanks for appreciating its cuteness...

> Now, I appreciate the desire to reduce code length and complexity, but
> from what I understand, the ultimate result of such a change to your
> code would be to go from: arr[()] to: arr[]. I don't see how this can
> reduce lines of code in implementation or use. At most it is two
> characters per use, and a change in documentation (specifying how you
> subscript 0-D arrays). If you can show an example where actual code
> line count is reduced with this change, I can't guarantee that I would
> get behind this proposal in a few months (if the conversation starts up
> again), but it may make me feel less that your proposal is essentially
> about aesthetics.

I meant the extra code for writing a special class to handle scalars, if I
decide that the x[()] syntax is too ugly or too hard to type, so that I
write a special class which will allow the syntax x.value.

The extra parentheses might not seem to matter for code using that
library, but I intend for people to use it directly, in an interactive
way, just like you type an expression in a spreadsheet. I expect that for
such a usage, the extra parentheses will be slightly unfun.

I know that it's not such a big difference, but I'm not talking about a
big change to the language either - it affects fewer than 20 lines of code
(probably could be done with even less), and doesn't cause any problems
with anything.

I can imagine Guido designing the grammar, thinking, "Should I allow an
empty subscript list? No, why should anyone want such a thing? Besides, if
someone wants it, we can always add it later." - at least, that's how I
might think if I designed a language. So now, a use has been found.
Indeed, it is fairly rare. But why not allow it now?

Have a good week,
Noam
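P.S. For concreteness, this is how you reach a zero-dimensional array's
value today (shown with numpy; the a[()] spelling is the one the proposal
would let you write as a[]):

    import numpy
    a = numpy.array(5)    # a zero-dimensional array
    print a.ndim          # 0
    print a[()]           # 5 - today's spelling
    # Under the proposal, a[] would mean the same thing.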
Re: [Python-Dev] Pre-PEP: Allow Empty Subscript List Without Parentheses
2006/6/17, Martin v. Löwis [EMAIL PROTECTED]:
> Noam Raphael wrote:
> > I meant the extra code for writing a special class to handle scalars,
> > if I decide that the x[()] syntax is too ugly or too hard to type, so
> > I write a special class which will allow the syntax x.value.
>
> What I cannot understand is why you use a zero-dimensional array to
> represent a scalar. Scalars are directly supported in Python:
>
> x = 5

I need a zero-dimensional array as a single cell - an object that holds a
value that can change over time. It works just like a cell in a
spreadsheet: say that when you change the value of cell A1 to 0.18, cell
A2 changes to 5. When using the library I design, you would write
sheet1[0, 0] = 0.18, and, magically, sheet1[0, 1] would become 5.

But in my library, everything is meaningful and doesn't have to be
two-dimensional. So if, in the spreadsheet example, A1 meant the income
tax rate, you would write income_tax[] = 0.18, and, magically,
profit['Jerusalem', 2005] would become 5.

I hope I managed to explain my use case and why the simplest way to treat
scalars like income_tax is as zero-dimensional arrays.

> Also, in an assignment, what are you putting on the right-hand side?
> A read access from another zero-dimensional array?

I hope my example explained that, but you can put there any object - for
example, you can write income_tax[] = 0.18.

(If I still didn't manage to explain myself, please say so - it seems that
it's not a very simple example and I'm not a very good explainer, at least
in English.)

Noam
Re: [Python-Dev] Pre-PEP: Allow Empty Subscript List Without Parentheses
Hi,

Sorry for my repeated posts. I just wanted to say that I improved my patch
a little bit, so it does exactly the same thing, but with less code: you
can see for yourself at http://python.pastebin.com/715221 - it changes
exactly 10 lines of code, and adds 8 additional lines, all of them really
short and obvious. I thought that it might convince someone that it's just
a little generalization of syntax, nothing frightening...

Noam

2006/6/17, Noam Raphael [EMAIL PROTECTED]:
> I know that it's not such a big difference, but I'm not talking about a
> big change to the language either - it affects less than 20 lines of
> code (probably could be done with even less), and doesn't cause any
> problems with anything.
Re: [Python-Dev] Pre-PEP: Allow Empty Subscript List Without Parentheses
Hello,

It seems to me that people don't object to my proposal, but don't find it
useful to them either. The question is what to do next. I guess one
possibility is to raise this discussion again in a few months, when people
will be less occupied with the 2.5 beta. This is ok, although I would
prefer a decision before that, because it might affect the design of the
library - should I find a permanent workaround, or one that I know will be
removed in the future?

If you do want to continue the discussion to reach a decision, please do.
You can say that if nobody else on python-dev is interested, it shouldn't
be implemented. You can examine my use case, say whether you think it's
reasonable, and suggest alternative solutions - or say that you see how
allowing an empty subscript list solves it elegantly (yes!) My point is, I
don't want this discussion to die naturally because nobody is interested,
since I am interested. So please say what you think should happen to it,
so we can reach a conclusion.

Now, if the discussion is to continue, Nick proposed an alternative:

2006/6/11, Nick Coghlan [EMAIL PROTECTED]:
> For your specific use cases, though, I'd be inclined to tweak the API a
> bit, and switch to using attributes for the single-valued data:
>
> tax_rates.income_tax = 0.18

It's probably ok, although I would prefer not having to artificially group
scalars just to make them attributes of something. I would prefer
remaining with one object, and having something like income_tax.setvalue()
or even income_tax.value.

> Although the income tax rate should actually depend on the current
> financial year, since it can change over time as the government
> increases taxes ;)

But that's exactly why I prefer writing simply

    income_tax[] = 0.18

when it's a constant, which is completely analogous to

    income_tax[2005] = 0.17; income_tax[2006] = 0.18

when it depends on something.

By the way, another thing about consistency: a friend of mine raised the
point that there isn't another example of forbidden empty brackets - [],
{}, (), x() are all allowed.

And about the other thing Nick said:

> I guess I'm really only -0 on the idea of x[] invoking x.__getitem__(),
> and allowing the class to decide whether or not to define a default
> value for the subscript. I wouldn't implement it myself, but I wouldn't
> object strenuously if Guido decided it was OK :)

I would prefer an empty tuple, since invoking __getitem__ with no
arguments would be a special case: for all other possible subscript lists,
exactly one argument is passed to __getitem__. This leaves us with one
special case: a subscript list with one item and without a trailing comma
results in __getitem__ not getting a tuple, where in all other cases it
does get a tuple. This works exactly like parentheses: they don't mean a
tuple only when there's one item inside them and no trailing comma.

Good bye,
Noam
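P.S. To make the current mapping from subscripts to __getitem__ arguments
concrete (this is all valid today's Python):

    class Show(object):
        def __getitem__(self, subscript):
            return subscript

    s = Show()
    print s[1]       # 1      - one item, no trailing comma: not a tuple
    print s[1,]      # (1,)   - trailing comma: a tuple
    print s[1, 2]    # (1, 2) - two items: a tuple
    print s[()]      # ()     - today's only way to pass an empty tuple
    # Under the proposal, s[] would also pass () to __getitem__.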
Re: [Python-Dev] Pre-PEP: Allow Empty Subscript List Without Parentheses
Hello,

I'll try to answer the questions in one message. Sorry for not being able
to do it until now.

About the joke - it isn't one; I really need this.

About the timing - of course, I can live with this getting into 2.6, and I
think that I may even be able to stay alive if it were rejected. I still
personally think that if people agree that it's a good idea it might get
in, since there's almost nothing to be decided except for that - but of
course, I can understand not wanting to rush things too much.

About whether NumPy should return real scalars or 0-dimensional arrays - I
don't know.

About the use case - I think that's the real thing I didn't explain well
and needs explanation, so I will try to do it better this time.

I'm talking about something similar to a spreadsheet in that it saves
data, calculation results, and the way to produce the results. However, it
is not similar to a spreadsheet in that the data isn't saved in an
infinite two-dimensional array with numerical indices. Instead, the data
is saved in a few tables, each storing a different kind of data. The
tables may have any desired number of dimensions, and are indexed by
meaningful indices, instead of by natural numbers.

For example, you may have a table called sales_data. It stores the sales
data in years from set([2003, 2004, 2005]), for car models from
set(['Subaru', 'Toyota', 'Ford']), for cities from set(['Jerusalem',
'Tel Aviv', 'Haifa']). To refer to the sales of Ford in Haifa in 2004, you
simply write: sales_data[2004, 'Ford', 'Haifa']. If the table is a source
of data (that is, not calculated), you can set values by writing:
sales_data[2004, 'Ford', 'Haifa'] = 1500.

Tables may be computed tables. For example, you may have a table which
holds, for each year, the total sales in that year, with the income tax
subtracted. It may be defined by a function like this:

    lambda year: sum(sales_data[year, model, city]
                     for model in models
                     for city in cities) / (1 + income_tax_rate)

Now, like in a spreadsheet, the function is kept, so that if you change
the data, the result is automatically recalculated. So, if you discover a
mistake in your data, you can write:

    sales_data[2004, 'Ford', 'Haifa'] = 2000

and total_sales[2004] will be automatically recalculated.

Now, note that the total_sales table depends also on income_tax_rate. This
is a variable, just like sales_data. Unlike sales_data, it's a single
value. We should be able to change it, with the result that all the cells
of the total_sales table are recalculated. But how will we do it? We can
write

    income_tax_rate = 0.18

but it will have a completely different meaning. The way to make
income_tax_rate changeable is to think of it as a 0-dimensional table. It
makes sense: sales_data depends on 3 parameters (year, model, city),
total_sales depends on 1 parameter (year), and income_tax_rate depends on
0 parameters. That's the only difference. So, thinking of it like this, we
simply write:

    income_tax_rate[] = 0.18

Now the system can know that the income tax rate has changed, and
recalculate what's needed. We also have to change the previous function a
tiny bit, to:

    lambda year: sum(sales_data[year, model, city]
                     for model in models
                     for city in cities) / (1 + income_tax_rate[])

But that's fine - it just makes it clearer that income_tax_rate[] is a
part of the model that may change its value.

I hope that I managed to explain the use case better this time - please
ask if my description isn't clear enough.
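Here is a sketch of how such a zero-parameter table could work today, with
the () subscript spelled explicitly (the class is invented just for
illustration, and the recalculation machinery is left out):

    class Table(object):
        def __init__(self):
            self._data = {}
            self.on_change = []     # recalculation callbacks

        def __getitem__(self, key):
            return self._data[key]

        def __setitem__(self, key, value):
            self._data[key] = value
            for callback in self.on_change:
                callback()

    income_tax_rate = Table()      # a 0-dimensional table
    income_tax_rate[()] = 0.18     # today: [()]; under the proposal: []
    print income_tax_rate[()]      # 0.18

The same class serves the multi-dimensional tables too: sales_data[2004,
'Ford', 'Haifa'] simply passes the tuple (2004, 'Ford', 'Haifa') as the
key.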
Thanks for your comments, please send more,
Noam
Re: [Python-Dev] Pre-PEP: Allow Empty Subscript List Without Parentheses
Hello,

2006/6/10, Nick Coghlan [EMAIL PROTECTED]:
> The closest parallel would be with return/yield, as those actually
> create real tuples the same way subscripts do, and allow the expression
> to be omitted entirely. By that parallel, however, an implicit
> subscript (if adopted) should be None rather than ().
>
> Adapting the table from the pre-PEP to describe return statements (and
> yield expressions):
>
>     return i, j, k  --  return (i, j, k)
>     return i, j     --  return (i, j)
>     return i,       --  return (i, )
>     return i        --  return (i)
>     return ()       #   (No implicit equivalent)
>     return          --  return None
>
> With the status quo, however, subscripts are simply equivalent to the
> RHS of an assignment statement in *requiring* that the expression be
> non-empty:
>
>     x = i, j, k  --  x = (i, j, k)
>     x = i, j     --  x = (i, j)
>     x = i,       --  x = (i, )
>     x = i        --  x = (i)
>     x = ()       #   (No implicit equivalent)
>     x = None     #   (No implicit equivalent)
>
> The PEP doesn't make a sufficiently compelling case for introducing
> yet-another-variant on the implicit behaviour invoked when a particular
> subexpression is missing from a construct.

I hope that my (hopefully) better explanation made the use case more
compelling, but I want to add two points in favour of an empty tuple:

1. If you want, you can ignore the x[(i, j, k)] equivalence completely,
since it doesn't work all the time - for example, you can write
x[1:2, 3:4], but you can't write x[(1:2, 3:4)]. You can think of
x[i, j, k] as a syntax for specifying a cell in a 3-dimensional array,
resulting in a call to x.__getitem__ with a 3-tuple describing the
subscript for each dimension. In that view, x[], which is a syntax for
specifying the cell of a 0-dimensional array, should result in a
__getitem__ call with an empty tuple, as there are no subscripts to be
described.

2. My equivalences are better than yours :-), since they deal with
equivalences for this specific syntax, while yours deal with similar
properties of a syntax for doing something completely different.

> I guess I could have gone with my initial instinct of -1 and saved
> myself some mental exercise ;)

Why? Mental exercise is a good way to keep you mental ;)

Noam
[Python-Dev] Pre-PEP: Allow Empty Subscript List Without Parentheses
Hello,

Recently I discovered a small change to the Python grammar that could help
me a lot. It's simply this: currently, the expression x[] is a syntax
error. I suggest that it become valid syntax, equivalent to x[()], just as
x[a, b] is equivalent to x[(a, b)] right now.

I discussed this in python-list, and Fredrik Lundh suggested that I
quickly write a pre-PEP if I want this to go into 2.5. Since I want this,
I wrote a pre-PEP. It's available in the wiki, at
http://wiki.python.org/moin/EmptySubscriptListPEP and I also copied it to
this message.

I know that now is really close to 2.5b1, but I thought that perhaps there
was still a chance of this suggestion getting in, since:

* It's a simple change, and there's almost nothing to be decided except
  whether to accept it or not.
* It has a simple implementation (it was fairly easy for me to implement,
  and I know almost nothing about the AST).
* It causes no backwards compatibility issues.

Ok, here's the pre-PEP. Please say what you think about it.

Have a good day,
Noam


PEP: XXX
Title: Allow Empty Subscript List Without Parentheses
Version: $Revision$
Last-Modified: $Date$
Author: Noam Raphael [EMAIL PROTECTED]
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 09-Jun-2006
Python-Version: 2.5?
Post-History: 30-Aug-2002

Abstract
========

This PEP suggests to allow the use of an empty subscript list, for example
``x[]``, which is currently a syntax error. It is suggested that in such a
case, an empty tuple will be passed as an argument to the __getitem__ and
__setitem__ methods. This is consistent with the current behaviour of
passing a tuple with n elements to those methods when a subscript list of
length n is used, if it includes a comma.

Specification
=============

The Python grammar specifies that inside the square brackets trailing an
expression, a list of subscripts, separated by commas, should be given. If
the list consists of a single subscript without a trailing comma, a single
object (an ellipsis, a slice or any other object) is passed to the
resulting __getitem__ or __setitem__ call. If the list consists of many
subscripts, or of a single subscript with a trailing comma, a tuple is
passed to the resulting __getitem__ or __setitem__ call, with an item for
each subscript.

Here is the formal definition of the grammar::

    trailer: '(' [arglist] ')' | '[' subscriptlist ']' | '.' NAME
    subscriptlist: subscript (',' subscript)* [',']
    subscript: '.' '.' '.' | test | [test] ':' [test] [sliceop]
    sliceop: ':' [test]

This PEP suggests to allow an empty subscript list, with nothing inside
the square brackets. It will result in passing an empty tuple to the
resulting __getitem__ or __setitem__ call. The change in the grammar is to
make subscriptlist in the first quoted line optional::

    trailer: '(' [arglist] ')' | '[' [subscriptlist] ']' | '.' NAME

Motivation
==========

This suggestion allows you to refer to zero-dimensional arrays elegantly.
In NumPy, you can have arrays with a different number of dimensions. In
order to refer to a value in a two-dimensional array, you write
``a[i, j]``. In order to refer to a value in a one-dimensional array, you
write ``a[i]``. You can also have a zero-dimensional array, which holds a
single value (a scalar). To refer to its value, you currently need to
write ``a[()]``, which is unexpected - the user may not even know that
when he writes ``a[i, j]`` he constructs a tuple, so he won't guess the
``a[()]`` syntax.
If the suggestion is accepted, the user will be able to write ``a[]`` in
order to refer to the value, as expected. It will even work without
changing the NumPy package at all!

In the normal use of NumPy, you usually don't encounter zero-dimensional
arrays. However, the author of this PEP is designing another library for
managing multi-dimensional arrays of data. Its purpose is similar to that
of a spreadsheet - to analyze data and preserve the relations between a
source of a calculation and its destination. In such an environment you
may have many multi-dimensional arrays - for example, the sales of several
products over several time periods. But you may also have several
zero-dimensional arrays, that is, single values - for example, the income
tax rate. It is desired that the access to the zero-dimensional arrays be
consistent with the access to the multi-dimensional arrays. Just using the
name of the zero-dimensional array to obtain its value isn't going to work
- the array and the value it contains have to be distinguished.

Rationale
=========

Passing an empty tuple to the __getitem__ or __setitem__ call was chosen
because it is consistent with passing a tuple of n elements when a
subscript list of n elements is used. Also, it will make NumPy and similar
packages work as expected for zero-dimensional arrays without any changes.
Another hint for consistency: currently, these equivalences hold::

    x[i, j, k]  --  x[(i, j, k)]
Re: [Python-Dev] Alternative path suggestion
Hello all again!

Thanks to Mike's suggestion, I have now opened a new wiki page,
AlternativePathDiscussion, at
http://wiki.python.org/moin/AlternativePathDiscussion

The idea is to organize the discussion by dividing it into multiple
sections, and seeing what is agreed upon and what should be further
discussed. I have now written there what I think, and quoted a few
opinions from other posts. The posts by others are only a minority -
what's written there currently is mostly what I think.

I'm sorry for the inconvenience, but please go there and post your
opinions (you can copy and paste from your emails, of course).

I apologize, first for not replying to each post, and second for only
writing my opinions in the wiki. I simply write pretty slowly in English,
and am already developing a growing sleep-hours deficit. I hope you
forgive me.

Have a good day,
Noam
Re: [Python-Dev] Saving the hash value of tuples
On 4/2/06, Guido van Rossum [EMAIL PROTECTED] wrote:
> > I tried the change, and it turned out that I had to change cPickle a
> > tiny bit: it uses a 2-tuple which is allocated when the module
> > initializes to look up tuples in a dict. I changed it to properly use
> > PyTuple_New and Py_DECREF, and now the complete test suite passes. I
> > ran test_cpickle before the change and after it, and it took the same
> > time (0.89 seconds on my computer).
>
> Not just cPickle. I believe enumerate() also reuses a tuple.

Maybe it does, but I believe that it doesn't calculate the hash value of
the tuple - otherwise, the test suite would probably have failed.

> > What do you think? I see three possibilities:
> >
> > 1. Nothing should be done, everything is as it should be.
> > 2. The cPickle module should be changed to not abuse the tuple, but
> >    there's no reason to add an extra word to the tuple structure and
> >    break binary backwards compatibility.
> > 3. Both should be changed.
>
> I'm -1 on the change. Tuples are pretty fundamental in Python and
> hashing them is relatively rare. I think the extra required space for
> all tuples isn't worth the potential savings for some cases.

That's fine with me. But what about option 2? Perhaps cPickle (and maybe
enumerate) should properly discard their tuples, so that if someone in the
future decides that saving the hash value is a good idea, he won't
encounter strange bugs? At least in cPickle I didn't notice any loss of
speed because of the change, and it's quite sensible, since there's a
tuple-reuse mechanism anyway.

Noam
[Python-Dev] Saving the hash value of tuples
Hello,

I've found out that the hash value of tuples isn't saved after it's
calculated. With strings it's different: the hash value of a string is
calculated only on the first call to hash(string), and saved in the
structure for future use. Saving the value makes dict lookup of tuples an
operation with an amortized cost of O(1).

Saving the hash value means that if an item of the tuple changes its hash
value, the hash value of the tuple won't be changed. I think that's ok,
since:

1. Hash values of things shouldn't change.
2. Dicts assume that too.

I tried the change, and it turned out that I had to change cPickle a tiny
bit: it uses a 2-tuple which is allocated when the module initializes to
look up tuples in a dict. I changed it to properly use PyTuple_New and
Py_DECREF, and now the complete test suite passes. I ran test_cpickle
before the change and after it, and it took the same time (0.89 seconds on
my computer).

What do you think? I see three possibilities:

1. Nothing should be done, everything is as it should be.
2. The cPickle module should be changed to not abuse the tuple, but
   there's no reason to add an extra word to the tuple structure and break
   binary backwards compatibility.
3. Both should be changed.

I will be happy to send a patch, if someone shows interest.

Have a good day,
Noam
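P.S. The difference is easy to see from Python (an illustrative check -
the exact numbers are machine-dependent, and this relies on CPython
implementation details):

    import timeit

    s = 'a' * 1000000            # a long string
    t = tuple(range(1000000))    # a long tuple
    hash(s); hash(t)             # first calls - both cost O(n)

    # Repeated hashing: the string returns its saved value; the tuple
    # is recomputed from all its items on every call.
    print timeit.Timer('hash(s)', 'from __main__ import s').timeit(100)
    print timeit.Timer('hash(t)', 'from __main__ import t').timeit(100)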
Re: [Python-Dev] Saving the hash value of tuples
Ok, I uploaded it. Patch no. 1462796:
https://sourceforge.net/tracker/index.php?func=detail&aid=1462796&group_id=5470&atid=305470

On 4/1/06, Aahz [EMAIL PROTECTED] wrote:
> On Sat, Apr 01, 2006, Noam Raphael wrote:
> > I've found out that the hash value of tuples isn't saved after it's
> > calculated. With strings it's different: the hash value of a string
> > is calculated only on the first call to hash(string), and saved in
> > the structure for future use. Saving the value makes dict lookup of
> > tuples an operation with an amortized cost of O(1). [...] I will be
> > happy to send a patch, if someone shows interest.
>
> Regardless of whether anyone shows interest, please submit a patch!
> Then post the URL back here. That way if someone gets interested in the
> future, your code is still available.
>
> --
> Aahz ([EMAIL PROTECTED])   *   http://www.pythoncraft.com/
>
> "Look, it's your affair if you want to play with five people, but don't
> go calling it doubles." --John Cleese anticipates Usenet
Re: [Python-Dev] PEP 351
Hello,

I just wanted to say this: you can reject PEP 351, but please don't reject
the idea of frozen objects completely. I'm working on an idea similar to
that of the PEP, and I think that it can be done elegantly, without the
concrete problems that Raymond pointed out. I didn't work on it in the
last few weeks, because of my job, but I hope to come back to it soon and
post a PEP and a reference implementation in CPython.

My quick responses, mostly to try to show that I know a bit about what I'm
talking about:

First, about the last point: I suggest that the function be named
frozen(x), which suggests that nothing happens to x - you only get a
frozen x. I suggest that this operation won't be called "freezing x", but
"making a frozen copy of x".

Now, along with the original order.

Frozen dicts - if you want, you can decide that dicts aren't frozenable,
and that's ok. But if you do want to make frozen copies of dicts, it isn't
really such a problem - it's similar to hashing a tuple, which requires
recursive hashing of all its elements; for making a frozen copy of a dict,
you make a frozen copy of all its values.

Treating all containers polymorphically - I don't suggest that. In my
suggestion, you may have frozen lists, frozen tuples (which are normal
tuples with frozen elements), frozen sets and frozen dicts.

Treating tuples as frozen lists - I don't suggest that either. But if my
suggestion is accepted, there would be no need for tuples - frozen lists
would be just as useful.

And about the other concerns:

> More important than all of the above is the thought that auto-freezing
> is like a bad C macro, it makes too much implicit and hides too much --
> the supported methods change, there is a issue keeping in sync with the
> non-frozen original, etc. In my experience with frozensets, I've
> learned that freezing is not an incidental downstream effect; instead,
> it is an intentional, essential part of the design and needs to be
> explicit.

I think these concerns can only be judged given a real suggestion, along
with an implementation. I have already implemented most of my idea in
CPython, and I think it's elegant and doesn't cause problems. Of course, I
may not be objective about the subject, but I only ask that you wait for
the real suggestion before turning it down.

To summarize, I see the faults in PEP 351. I think that another, fairly
similar idea might be a good one.

Have a good week,
Noam
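P.S. A rough pure-Python sketch of the frozen() semantics I have in mind
(the real proposal is a CPython change; FrozenDict here is invented just
to illustrate the recursion, the way hashing a tuple recursively hashes
its elements):

    class FrozenDict(object):
        def __init__(self, items):
            self._dict = dict(items)
        def __getitem__(self, key):
            return self._dict[key]
        def __hash__(self):
            return hash(frozenset(self._dict.iteritems()))

    def frozen(x):
        """Return a frozen copy of x; x itself is not changed."""
        if isinstance(x, (list, tuple)):
            return tuple(frozen(item) for item in x)    # a frozen list
        if isinstance(x, (set, frozenset)):
            return frozenset(frozen(item) for item in x)
        if isinstance(x, dict):
            # make a frozen copy of all the values
            return FrozenDict((k, frozen(v)) for k, v in x.iteritems())
        return x    # ints, strings etc. are already immutable

    d = frozen({'rates': [0.17, 0.18]})
    print d['rates']    # (0.17, 0.18) - the inner list was frozen too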
Re: [Python-Dev] When do sets shrink?
Hello,

I thought about another reason to resize the hash table when it has too
few elements. It's not only a matter of memory usage, it's also a matter
of time usage: iteration over a set or a dict requires going over the
whole table. For example, iteration over a set which once had 1,000,000
members and now has 2 can take 1,000,000 operations every time you
traverse all the (2) elements.

Apologies:
1. It may be trivial to you - I'm sorry, I thought about it just now.
2. You can, of course, still make whatever tradeoff you like.

Noam
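P.S. This is easy to see from Python (illustrative timings only - they
depend on the machine and on CPython internals):

    import timeit

    # A set that once held a million elements but now holds two:
    setup = "s = set(xrange(1000000))\nwhile len(s) > 2: s.pop()"
    print timeit.Timer('list(s)', setup).timeit(100)

    # Compare with a set that was small from the start:
    print timeit.Timer('list(s)', 's = set([1, 2])').timeit(100)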
Re: [Python-Dev] When do sets shrink?
On 12/31/05, Raymond Hettinger [EMAIL PROTECTED] wrote:
> [Noam]
> > For example, iteration over a set which once had 1,000,000 members
> > and now has 2 can take 1,000,000 operations every time you traverse
> > all the (2) elements.
>
> Do you find that to be a common or plausible use case?

I don't have a concrete example at this minute, but a set which is
repeatedly filled with elements and then emptied by pop operations doesn't
seem that far-fetched to me.

> Was Guido's suggestion of s=set(s) unworkable for some reason? dicts
> and sets emphasize fast lookups over fast iteration -- apps requiring
> many iterations over a collection may be better off converting to a
> list (which has no dummy entries or empty gaps between entries).

It's workable, but I think that most Python programmers haven't read that
specific message, and are expecting operations which should take a short
time to take a short time. Converting to a list won't help the use case
above, and anyway, it's another thing that I wouldn't expect anyone to do
- there's no reason that iteration over a set should take a long time.

(I'm speaking from my point of view, which I believe is common. I don't
expect programs I write in Python to be super-fast - if that were the
case, I would write them in C. I do expect them to take a reasonable
amount of time, and in the case of iteration over a set, that means a time
proportional to the number of elements in the set.)

> Would the case be improved by incurring the time cost of 999,998 tests
> for possible resizing (one for each pop) and some non-trivial number of
> resize operations along the way (each requiring a full iteration over
> the then-current size)?

I believe it would. It seems to me that those 999,998 tests take not much
more than a machine clock tick each, which means about 1 millisecond total
on today's computers. The resize operations will take a few more
milliseconds. It all doesn't really matter, since probably everything else
will take much more. I now ran this code:

    s = set()
    for j in xrange(1000000):
        s.add(j)
    while s:
        tmp = s.pop()

and it took 2.4 seconds to finish. And that's okay - I'm just saying that
a few additional clock ticks per operation will usually not matter when
the overall complexity is the same, but changes in order of complexity can
matter much more.

> Even if this unique case could be improved, what is the impact on
> common cases? Would a downsizing scheme risk thrashing with the
> over-allocation scheme in cases with mixed adds and pops?

I think that there shouldn't be additional damage beyond those clock
ticks. The simple method I took from Introduction to Algorithms works no
matter what sequence of adds and pops you have.

> Is there any new information/research beyond what has been obvious from
> the moment the dict resizing scheme was born?

I wanted to say that there isn't any new information, and yet I don't
think that I have to assume that everything in current Python is the best
that can be. All I did was find another reason why a downsizing scheme
might be good, and post it to ask if people have thought about it. If you
have a document listing all the design decisions that went into the dict
implementation, then please send it to me and I won't ask about things
that were already thought about. But the answer is yes: I believe that the
current dict resizing scheme was born before the iterator protocol was
introduced, and that may be a reason why the current scheme doesn't try to
minimize the number of empty hash table entries.
Noam
Re: [Python-Dev] When do sets shrink?
On 12/29/05, Fredrik Lundh [EMAIL PROTECTED] wrote:
> Noam Raphael wrote:
> > I'm not saying that practically it must be used - I'm just saying
> > that it can't be called a heuristic, and that it doesn't involve any
> > fancy overkill size hinting or history tracking. It actually means
> > something like this:
> > 1. If you want to insert and the table is full, resize the table to
> >    twice the current size.
> > 2. If you delete and the number of elements turns out to be less than
> >    a quarter of the size of the table, resize the table to half of
> >    the current size.
>
> sure sounds like a heuristic algorithm to me... (as in "not guaranteed
> to be optimal under all circumstances, even if it's probably quite good
> in all practical cases")

I'm not saying it's optimal, but it really is amortized O(1) per
insert/delete. I looked this up in Introduction to Algorithms, which has a
complicated explanation. A simple explanation is that after every resize
the table is exactly half-full. Say it has n elements and the table size
is 2*n. To get to the next resize, you have to do at least n/2 removals of
elements, or n insertions of elements. After that, you do a resize
operation. In either case, you do an O(n) resize operation after at least
O(n) insertions/removals, which are O(1) operations. This means that the
total cost remains O(n) per n simple operations, which you can say is O(1)
per simple operation.

I hope that if you read this slowly it makes sense...

Noam
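P.S. A toy sketch of the doubling/halving policy, just to make the
counting concrete (real CPython dicts and sets use different growth rules,
and this ignores the actual hashing entirely):

    class ResizingTable(object):
        MIN_SIZE = 8

        def __init__(self):
            self.size = self.MIN_SIZE   # number of slots
            self.used = 0               # number of elements

        def add(self):
            if self.used == self.size:          # table full:
                self.size *= 2                  # O(n) resize
            self.used += 1

        def remove(self):
            self.used -= 1
            if self.size > self.MIN_SIZE and self.used < self.size // 4:
                self.size //= 2                 # less than a quarter
                                                # full: O(n) resize

    # After either resize the table is about half-full, so the next
    # resize is at least size/4 simple operations away.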
Re: [Python-Dev] A few questions about setobject
On 12/28/05, Martin v. Löwis [EMAIL PROTECTED] wrote:
> The setentry typedef clearly violates the principles of the API, so it
> should be renamed. (And so will _setobject, although I think it will be
> much easier to remove it.)
>
> > Perhaps the header file should stick with writing
> >     struct { long hash; PyObject *key; }
> > three times (or define it in a macro and then undefine it), and the
> > typedef be left to the .c file?
>
> That would not be conforming to the C language standard (although
> accepted by most compilers).

Can you explain why it is not conforming? Can't a typedef be used
interchangeably with the original type?

> Not sure what this refers to in your message: the text of the C API
> documentation certainly is desirable as it stands (although it should
> be clearer as to whether struct names should be prefixed).
>
> > Even if it is, it seems that the second sentence contradicts the
> > first sentence.
>
> Why does that seem so? To quote, the first sentence is 'All user
> visible names defined by Python.h (except those defined by the included
> standard headers) have one of the prefixes Py or _Py.' and the second
> sentence is 'Names beginning with _Py are for internal use by the
> Python implementation and should not be used by extension writers.' I
> cannot see any contradiction between these.

Oops. It's the first and the third:

The first: "All user visible names defined by Python.h (except those
defined by the included standard headers) have one of the prefixes Py or
_Py."

The third: "Structure member names do not have a reserved prefix."

I think that "structure member names" refers to things like setentry. The
third sentence contradicts the first, since structure member names are
user-visible names. Anyway, it seems to me best if the third sentence is
removed and all names start with Py or _Py.

> > I think it should be ok because it's never really used as a PyObject.
> > Am I missing something? (Ok, I now thought that maybe it's because
> > some parts don't treat dummy elements specially. But it seems to me
> > that most parts do treat them specially, so perhaps it would be
> > better to make a few changes so that all parts will treat them
> > specially?)
>
> In principle, you are right. One place that doesn't special-case the
> dummy is set_clear_internal (in fact, the Py_DEBUG assertion is
> completely useless there, AFAICT). The tricky question is: can we be
> certain that we find all places, in all code paths, where we have to
> special-case the dummy? Having PyObject* which don't point to PyObject
> is error-prone. Also, what would we gain? If you think it is speed: I
> doubt it. In one place, a comment suggests that actually seeing the
> dummy element is so much more unlikely than the other cases that, for
> performance, the test for the dummy is done last. We would lose
> additional speed in the cases where the dummy isn't yet considered.

Ok, I tried. It took me 25 minutes to change the code, while going over
every occurrence of "key" and "decref" in setobject.c. (It seems to me
that the dummy element can only be accessed through entry->key.) Most of
the code isn't bothered by the dummy element, since it uses the data
structure in a more abstract way. I think that it simplifies the code,
while adding a condition in only two places, which I don't think should
make anything noticeably slower. The result passes the complete test
suite.

In one sentence: I think that it makes the code slightly better, and the
risk is pretty low. I thought to post it as a patch, but sourceforge
didn't work for me, and it's not that long, so I paste it at the end of
this message.
Feel free to do whatever you want with it.

> > 3) The type of the result of a binary operator applied on a set and
> > a frozenset is the type of the left set. You are welcome to ignore
> > this, but I just wanted to say that it seems to me better to make
> > the operator symmetric, and to return a frozenset only if both sets
> > are frozen.
>
> How would you implement this? The result is obtained from copying the
> left operand, and then applying the other operand. This is done so that
> set subtyping becomes possible:
>
> >>> class myset(set): pass
> ...
> >>> x = myset([2, 6])
> >>> y = set([2, 6])
> >>> x.union(y)
> myset([2, 6])
>
> So if the result is not obtained by copying the left operand first, how
> would you compute the result type, so that this example still works?

The behaviour will change to work like in other types - the returned value
will be of the base type:

>>> class MyInt(int): pass
...
>>> x = MyInt(3)
>>> y = 5
>>> x.__add__(y)
8

I'm not talking about backwards compatibility - I'm just currently asking
if others also feel that the symmetric version is preferable.

Ok, here's the diff:

=== modified file 'Objects/setobject.c'
--- Objects/setobject.c
+++ Objects/setobject.c
@@ -13,8 +13,12 @@
 /* This must be >= 1. */
 #define PERTURB_SHIFT 5

-/* Object used as dummy key to fill deleted entries */
-static PyObject *dummy = NULL; /* Initialized by first call to make_new_set() */
+/*
Re: [Python-Dev] Keep default comparisons - or add a second set?
And another example:

>>> a = 1+2j
>>> b = 2+1j
>>> a < b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: no ordering relation is defined for complex numbers

I came to think that, forgetting backwards compatibility for a while, the
best thing for comparison operators to do is to raise a TypeError by
default, and work only for types for which comparison makes sense. I think
it is a case of "explicit is better than implicit", and I now have two
reasons for that:

1. Things like Decimal(3.0) == 3.0 will make more sense (raise an
exception which explains that Decimals should not be compared to floats,
instead of constantly returning false).

2. It is more forward compatible - when it is discovered that two types
can sensibly be compared, the comparison can be defined, without changing
an existing behaviour which doesn't raise an exception.

Perhaps you can explain to me again why arbitrary objects should be
comparable? I don't see why sorting objects according to values should
work when the order has no real meaning. I don't see why you need all
objects to be comparable for storing them in containers with the behaviour
of dict or set.

If the reason is that you want containers that work across multiple
sessions, and are identity-based (that is, only one object can be used as
a key), you can attach to each object an id that isn't session-dependent,
and use that instead of the default memory address.

This may be a reason for dropping the default "hash is id": suppose that
you want a persistent storage that works like dicts but is not associated
with one Python session (it may be exactly Zope's BTrees - I don't know).
Currently you have a problem with using __hash__ for that purpose, since
the hash value of an object can change between sessions - that happens
when it's the id of the object.

Now suppose that we have a "persistent id" function - it can be
implemented by using weakrefs to associate a unique value with an object
the first time that the function is called, and storing it with the object
when serializing it. Also suppose that we drop the default hash method, so
that where currently hash(x) is id(x), hash(x) will raise a TypeError.
Then our persistent storage can use the persistent id instead of the
default id, and it will work. (You might not want mutable objects to be
used as keys, but that's another problem - the data structure will be
consistent anyway.)

In fewer words: the built-in id() is just one way to assign identities to
objects. __hash__ shouldn't use it implicitly when there's no value-based
hash value - if it doesn't, the rule that x == y implies
hash(x) == hash(y) will be preserved also between different sessions, so
persistent objects will be able to use hash values.

Does it make sense to you?

Noam
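P.S. A sketch of the persistent-id idea (all names invented; note that not
every object is weakref-able, and serialization is left out entirely):

    import weakref
    import itertools

    _next_id = itertools.count()
    _ids = {}      # id(obj) -> persistent id
    _refs = {}     # id(obj) -> weakref, used to clean the entry up

    def persistent_id(obj):
        """Return an id for obj that isn't tied to its memory address."""
        addr = id(obj)
        if addr not in _ids:
            def cleanup(ref, addr=addr):
                # the object died - its address may now be reused
                del _ids[addr], _refs[addr]
            _ids[addr] = _next_id.next()
            _refs[addr] = weakref.ref(obj, cleanup)
        return _ids[addr]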
Re: [Python-Dev] Keep default comparisons - or add a second set?
On 12/28/05, Adam Olsen [EMAIL PROTECTED] wrote:
> I agree for greater-than and less-than operators but I'm not convinced
> for equality. Consider this in contrast to your example:
>
> >>> a = 1+2j
> >>> b = 1+2j
> >>> a is b
> False
> >>> a == b
> True
>
> Complex numbers can't be sorted but they can be tested for equality.
> Decimal numbers can't currently be tested for equality against a float
> but there's no loss-of-accuracy argument to prevent it (just a
> difficulty of implementation one.) If the comparison is to fail I would
> prefer an exception rather than an id-based fallback though.

I think we agree. I don't think that every type that supports equality
comparison should support order comparison. I think that if there's no
meaningful comparison (whether equality or order), an exception should be
raised.

> Speaking of id, there's no reason why id(a) == id(b) has to fail for
> mismatched types in the face of persistence so long as the result of
> id() has the same lifetime as the target object. This could be done
> using weakrefs or by making an id type with a strong reference to the
> target object.

I don't mean to change the current behaviour of id() - I just meant that
an additional one may be implemented, possibly by a specific library
(Zope, for instance), so the built-in one shouldn't be used as a fallback
default.

Noam
Re: [Python-Dev] Keep default comparisons - or add a second set?
On 12/29/05, Robert Brewer [EMAIL PROTECTED] wrote:
> Just to keep myself sane...
>
> def date_range(start=None, end=None):
>     if start == None:
>         start = datetime.date.today()
>     if end == None:
>         end = datetime.date.today()
>     return end - start
>
> Are you saying the 'if' statements will raise TypeError if start or end
> are dates? That would be a sad day for Python. Perhaps you're saying
> that there is a meaningful comparison between None and anything else,
> but please clarify if so.

Yes, I'm suggesting that they will raise a TypeError. Your example shows
that the change is not compatible with a lot of existing Python code,
which means that it's a Python 3000 thing. The following code will
continue to work:

    def date_range(start=None, end=None):
        if start is None:
            start = datetime.date.today()
        if end is None:
            end = datetime.date.today()
        return end - start

Using "is None" instead of "== None" is considered better style even now.

Noam
[Python-Dev] set.copy documentation string
is currently "Return a shallow copy of a set." Perhaps "shallow" should be
removed, since set members are supposed to be immutable, so there's no
point in a deep copy?

Noam
[Python-Dev] When do sets shrink?
Hello,

If I do something like this:

    import gc

    s = set()
    for i in xrange(1000000):
        s.add(i)
    while s:
        s.pop()
    gc.collect()

the memory consumption of the process remains the same even after the
pops. I checked the code (that's where I started from, really), and
there's nothing in set.pop or set.remove that resizes the table. And it
turns out that it's the same with dicts.

Should something be done about it?

Noam
Re: [Python-Dev] set.copy documentation string
On 12/29/05, Martin v. Löwis [EMAIL PROTECTED] wrote:
> Noam Raphael wrote:
> > is currently "Return a shallow copy of a set." Perhaps "shallow"
> > should be removed, since set members are supposed to be immutable, so
> > there's no point in a deep copy?
>
> That still doesn't make copy return a deep copy, right? "shallow copy"
> is more precise than "copy", and correct - what is gained from removing
> it?

Perhaps it bothers the programmer with something that shouldn't bother
him. I mean that I might do help(set.copy), and then think: "Oh, it
returns a shallow copy. Wait a minute - 'shallow' means that I get a new
object, which references the same objects as the old one. Perhaps I should
use another function, which does deep copying? Let me think about it - no.
All members of a set are immutable, so it doesn't matter."

I think that in this case, the fact that the copy is shallow is an
implementation detail, since there's no sense in making a deep copy of a
set. Implementation details should probably not be a part of the
definition of what a method does.

I know it's just a word, and that it doesn't matter a lot. But why not
make things a tiny bit better?

Noam
Re: [Python-Dev] When do sets shrink?
On 12/29/05, Martin v. Löwis [EMAIL PROTECTED] wrote:
> Noam Raphael wrote:
> > Should something be done about it?
>
> It's very difficult to do something useful about it. Even if you manage
> to determine how much memory you want to release, it's nearly
> impossible to actually release the memory to the operating system,
> because of the many layers of memory management software that all need
> to agree that the memory should be reused somewhere else (instead of
> keeping it on that layer, just in case somebody using that layer wants
> some memory).

I checked - when doing the same thing with lists, all the memory was
released for use by other Python objects, and most of it was released for
use by the operating system.

Noam
Re: [Python-Dev] set.copy documentation string
On 12/29/05, Martin v. Löwis [EMAIL PROTECTED] wrote:
> If "makes no sense" means "would not make a difference", then you are
> wrong: objects in a set are not necessarily unmodifiable; they just
> have to be hashable.

Oh, you are right. I thought so much about dropping the default hash=id,
or more generally about the idea that only frozen objects should be
hashable, that I forgot that it's not the case yet... :)

(I used the term "frozen" instead of "immutable" since I think that
"immutable" is not defined very well - tuples are considered immutable
even though their value can change if they reference mutable objects.)

Thanks,
Noam
Re: [Python-Dev] When do sets shrink?
On 12/29/05, Raymond Hettinger [EMAIL PROTECTED] wrote: What could be done is to add a test for excess dummy entries and trigger a periodic resize operation. That would make the memory available for other parts of the currently running script and possibly available for the O/S. The downside is slowing down a fine-grained operation like pop(). For dictionaries, this wasn't considered worth it. For sets, I made the same design decision. It wasn't an accident. I don't plan on changing that decision unless we find a body of real world code that would be better-off with more frequent re-sizing. The computer scientist in me prefers O() terms over changes in a constant factor, but that's only me. Perhaps a note about it should be added to the documentation, though? Noam
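Worth noting for readers hitting this in practice: you can trigger the resize yourself by rebuilding the container after a bulk of removals, since a fresh set or dict allocates a table sized for its current contents (a manual workaround, not interpreter behavior):

    s = set(xrange(1000000))
    for i in xrange(999000):
        s.discard(i)      # the table keeps its old size
    s = set(s)            # rebuild: new table sized for the ~1000 survivors

    d = dict.fromkeys(xrange(1000000))
    for i in xrange(999000):
        del d[i]
    d = dict(d)           # same idiom for dicts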
[Python-Dev] A few questions about setobject
Hello, I'm going over setobject.c/setobject.h, while trying to change them to support cheap frozen-copying. I have a few questions. 1) This is a part of setobject.h:

    typedef struct {
        long hash;
        PyObject *key;
    } setentry;

    typedef struct _setobject PySetObject;
    struct _setobject {
        ...
        setentry *table;
        setentry *(*lookup)(PySetObject *so, PyObject *key, long hash);
        setentry smalltable[PySet_MINSIZE];
        ...
    };

It seems to me that setentry and _setobject are defined for every file that includes Python.h. In the Python C API, in the section about include files, it is written that: "All user visible names defined by Python.h (except those defined by the included standard headers) have one of the prefixes Py or _Py. Names beginning with _Py are for internal use by the Python implementation and should not be used by extension writers. Structure member names do not have a reserved prefix." Is this desirable? Even if it is, it seems that the second sentence contradicts the first sentence. Perhaps the header file should stick with writing struct { long hash; PyObject *key; } three times (or define it in a macro and then undefine it), and the typedef be left to the .c file? 2) The hash table used by sets uses a dummy element for deleted entries. The implementation goes into the trouble of allocating it, managing its reference count, and deallocating it at the end. What is the reason for that? It seems to me that the only requirement of the dummy element is that it shouldn't be a pointer to a valid PyObject, and as such I would think that defining it like

    int dummy_int;
    PyObject *dummy = (PyObject *)(&dummy_int);

would be enough, and that it shouldn't be INCREFed or DECREFed every time it is used. I think it should be ok because it's never really used as a PyObject. Am I missing something? (Ok, I now thought that maybe it's because some parts don't treat dummy elements specially. But it seems to me that most parts do treat them specially, so perhaps it would be better to make a few changes so that all parts will treat them specially?) 3) The type of the result of a binary operator applied on a set and a frozenset is the type of the left set. You are welcome to ignore this, but I just wanted to say that it seems to me better to make the operator symmetric, and to return a frozenset only if both sets are frozen. If you think that these questions belong to c.l.py, then please say so and I will go away. Have a good day, Noam
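For readers following the dummy-entry question: in open addressing, a deleted slot cannot simply be emptied, because that would cut the probe chains of other keys that collided past it. A toy Python sketch (not the real C code) shows the sentinel's role:

    DUMMY = object()   # unique sentinel for deleted slots

    class ToySet(object):
        """Fixed-size open-addressing set with lazy deletion (keep it
        well under-full; there is no resizing in this toy)."""
        def __init__(self, size=8):
            self._slots = [None] * size

        def _indices(self, key):
            i = hash(key) % len(self._slots)
            while True:                  # linear probing, for simplicity
                yield i
                i = (i + 1) % len(self._slots)

        def add(self, key):
            free = None
            for i in self._indices(key):
                slot = self._slots[i]
                if slot is DUMMY:
                    if free is None:
                        free = i         # reusable, but keep looking for key
                elif slot is None:
                    self._slots[i if free is None else free] = key
                    return
                elif slot == key:
                    return               # already present

        def __contains__(self, key):
            for i in self._indices(key):
                slot = self._slots[i]
                if slot is None:
                    return False         # a truly empty slot ends the probe
                if slot is not DUMMY and slot == key:
                    return True          # DUMMY is skipped, not a chain end

        def discard(self, key):
            for i in self._indices(key):
                slot = self._slots[i]
                if slot is None:
                    return
                if slot is not DUMMY and slot == key:
                    # Writing None here would break lookups of keys
                    # whose probe chains pass through this slot:
                    self._slots[i] = DUMMY
                    return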
Re: [Python-Dev] A missing piece of information in weakref documentation
On 12/12/05, Aahz [EMAIL PROTECTED] wrote: Please submit a doc patch to SF (or even just a bug report if you don't have time). The patch may be plain text or reST; no need for Latex. Done - patch number 1379023. Noam
Re: [Python-Dev] For Python 3k, drop default/implicit hash, and comparison
On 11/27/05, Martin v. Löwis [EMAIL PROTECTED] wrote: Noam Raphael wrote: I would greatly appreciate repliers that find a tiny bit of reason in what I said (even if they don't agree), and not deny it all as a complete load of rubbish. I don't understand what your message is. With this posting, did you suggest that somebody does something specific? If so, who is that one, and what should he do? Perhaps I felt a bit attacked. It was probably my fault, and anyway, a general message like this is not the proper way - I'm sorry. Anyway, a lot of your posting is what I thought was common knowledge; and with some of it, I disagree. This is fine, of course. We may want to compare wheels based on value, for example to make sure that all the car's wheels fit together nicely: assert car.wheel1 == car.wheel2 == car.wheel3 == car.wheel4. I would never write it that way. This would suggest that the wheels have to be the same. However, this is certainly not true for wheels: they have to be of the same make. Now, you write that wheels only carry manufacturer and diameter. However, I would expect that wheels grow additional attributes over time, like whether they are left or right, and what their wear level is. So to write your property, I would write car.wheel1.manufacturer_and_make() == car.wheel2.manufacturer_and_make() == car.wheel3.manufacturer_and_make() == car.wheel4.manufacturer_and_make() You may be right in the case of wheels. From time to time, in the real (programming) world, I encounter objects that I wish to compare by value - this is certainly the case for built-in objects, but is sometimes the case for more complex objects. We may want to associate values with wheels based on their values. For example, it's reasonable to suppose that the price of every wheel of the same model is the same. In that case, we'll write: price[wheel] = 25. Again, I would not write it this way. I would find wheel.price() most natural. Many times the objects are not yours to add attributes to, or may have __slots__ defined. The truth is that I prefer not to add attributes to external objects even when it's possible. If I have the notion of a price list, then I would try to understand what the price list is keyed by, e.g. model number: price[wheel.model] = 25 Sometimes there's no key - it's just the state of the object (what if wheels don't have a model number?) Now again, how will we say that a specific wheel is broken? Like this: broken[Ref(wheel)] = True If I want things to be keyed by identity, I would write broken = IdentityDictionary() ... broken[wheel] = True although I would prefer to write wheel.broken = True (a sketch of such an IdentityDictionary follows after this message). I personally prefer the first method, but the second one is ok too. I think that most objects, especially most user-defined objects, have a *value*. I don't have an exact definition, but a hint is that two objects that were created in the same way have the same value. Here I disagree. Consider the wheel example. I would expect that a wheel has a wear level or some such, and that this changes over time, and that it belongs to the value of the wheel (value being a synonym for state). As this changes over time, it is certainly not the case that the object is created with that value. Think of lists: what is their value? Are they created with it? My tongue failed me. I meant: created in the same way = have gone through the same series of actions.
That is:

    a = []; a.append(5); a.extend([2,1]); a.pop()
    b = []; b.append(5); b.extend([2,1]); b.pop()
    a == b

Sometimes we wish to use the identity of objects as a dictionary key or as a set member - and I claim that we should do that by using the Ref class, whose *value* is the object's *identity*, or by using a dict/set subclass, and not by misusing the __hash__ and __eq__ methods. I think we should add a specific type of dictionary then. That's OK too. My point was that the one who uses the objects should explicitly specify whether he means value-based or identity-based lookup. This means that if an object has a value, it should not make __eq__ and __hash__ be identity-based just to make identity-based lookup easier and implicit. I think that whenever value-based comparison is meaningful, the __eq__ and __hash__ should be value-based. Treating objects by identity should be done explicitly, by the one who uses the objects, by using the is operator or the Ref class. It should not be the job of the object to decide which method (value or identity) is more useful - it should allow the user to use both methods, by defining __eq__ and __hash__ based on value. If objects are compared for value equality, the object should decide which part of its state goes into that comparison. It may be that two objects compare equal even though their state is memberwise different: Rational(1,2) == Rational(5,10) I completely agree. Indeed, the value of an object is in many times
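The IdentityDictionary that Martin mentions above does not exist in the standard library; a minimal sketch of what it could look like (the name and API are illustrative):

    class IdentityDictionary(object):
        """Maps keys by identity rather than by value.

        Storing the key object itself keeps it alive, so its id()
        cannot be recycled while the entry exists."""
        def __init__(self):
            self._data = {}
        def __setitem__(self, obj, value):
            self._data[id(obj)] = (obj, value)
        def __getitem__(self, obj):
            return self._data[id(obj)][1]
        def __delitem__(self, obj):
            del self._data[id(obj)]
        def __contains__(self, obj):
            return id(obj) in self._data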
Re: [Python-Dev] For Python 3k, drop default/implicit hash, and comparison
On 11/27/05, Samuele Pedroni [EMAIL PROTECTED] wrote: well, this still belongs to comp.lang.python. ... not if you think python-dev is a forum for such discussions on OO thinking vs other paradigms. Perhaps my style made it look like a discussion on OO thinking vs other paradigms, but my conclusion is exactly about the issue of this thread - Jim suggested to drop default __hash__ and __eq__ for Python 3K. Guido decided not to, because it's useful to use them for identity-based comparison and lookup. I say that I disagree, because I think that __hash__ and __eq__ should be used for value-based comparison and lookup, and because if the user of the object does explicit identity-based comparison/lookup, it doesn't matter to him whether __hash__ and __eq__ are defined or not. I also suggested, in a way, that it's OK to define a default value-based __eq__ method. Noam
Re: [Python-Dev] For Python 3k, drop default/implicit hash, and comparison
Three weeks ago, I read this and thought, well, you have two options for a default comparison, one based on identity and one on value, both are useful sometimes and Guido prefers identity, and it's OK. But today I understood that I still think otherwise. In two sentences: sometimes you wish to compare objects according to identity, and sometimes you wish to compare objects according to values. Identity-based comparison is done by the is operator; value-based comparison should be done by the == operator. Let's take the car example, and expand it a bit. Let's say wheels have attributes - say, diameter and manufacturer. Let's say those can't change (which is reasonable), to make wheels hashable. There are two ways to compare wheels: by value and by identity. Two wheels may have the same value, that is, they have the same diameter and were created by the same manufacturer. Two wheels may have the same identity, that is, they are actually the same wheel. We may want to compare wheels based on value, for example to make sure that all the car's wheels fit together nicely: assert car.wheel1 == car.wheel2 == car.wheel3 == car.wheel4. We may want to compare wheels based on identity, for example to make sure that we actually bought four wheels in order to assemble the car: assert car.wheel1 is not car.wheel2 and car.wheel3 is not car.wheel1 and car.wheel3 is not car.wheel2... We may want to associate values with wheels based on their values. For example, it's reasonable to suppose that the price of every wheel of the same model is the same. In that case, we'll write: price[wheel] = 25. We may want to associate values with wheels based on their identities. For example, we may want to note that a specific wheel is broken. For this, I'll first define a general class (I defined it before in one of the discussions, that's because I believe it's useful):

    class Ref(object):
        def __init__(self, obj):
            self._obj = obj
        def __call__(self):
            return self._obj
        def __eq__(self, other):
            return isinstance(other, Ref) and self._obj is other._obj
        def __hash__(self):
            return id(self._obj) ^ 0xBEEF

Now again, how will we say that a specific wheel is broken? Like this: broken[Ref(wheel)] = True Note that the Ref class also allows us to group wheels of the same kind in a set, regardless of their __hash__ method. I think that most objects, especially most user-defined objects, have a *value*. I don't have an exact definition, but a hint is that two objects that were created in the same way have the same value. Sometimes we wish to compare objects based on their identity - in those cases we use the is operator. Sometimes we wish to compare objects based on their value - and that's what the == operator is for. Sometimes we wish to use the value of objects as a dictionary key or as a set member, and that's easy. Sometimes we wish to use the identity of objects as a dictionary key or as a set member - and I claim that we should do that by using the Ref class, whose *value* is the object's *identity*, or by using a dict/set subclass, and not by misusing the __hash__ and __eq__ methods. I think that whenever value-based comparison is meaningful, the __eq__ and __hash__ should be value-based. Treating objects by identity should be done explicitly, by the one who uses the objects, by using the is operator or the Ref class. It should not be the job of the object to decide which method (value or identity) is more useful - it should allow the user to use both methods, by defining __eq__ and __hash__ based on value.
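A quick usage sketch of the Ref class just defined (the Wheel class is a stand-in for the demo):

    class Wheel(object):     # stand-in; any object works
        pass

    w1, w2 = Wheel(), Wheel()
    broken = {}
    broken[Ref(w1)] = True
    print Ref(w1) in broken    # True  -- same identity, a fresh Ref is fine
    print Ref(w2) in broken    # False -- a different wheel
    print Ref(w1)()            # the wheel itself: calling a Ref unwraps it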
Please give me examples which prove me wrong. I currently think that the only objects for which value-based comparison is not meaningful are objects which represent entities which are outside of the process, or in other words, entities which are not computational. This includes files, sockets, possibly user-interface objects, loggers, etc. I think that objects that represent purely data have a value that they can be compared according to. Even wheels that don't have any attributes are simply equal to other wheels, and not equal to other objects. Since user-defined classes can interact with the environment only through other objects or functions, it is reasonable to suggest that they should get a value-based equality operator. Many times the value is defined by the __dict__ and __slots__ members, so it seems to me a reasonable default. I would greatly appreciate repliers that find a tiny bit of reason in what I said (even if they don't agree), and not deny it all as a complete load of rubbish. Thanks, Noam
Re: [Python-Dev] str.dedent
On 11/19/05, Steven Bethard [EMAIL PROTECTED] wrote: You are missing an important point here: There are intentionally no line breaks in this string; it must be a single line, or else showerror will break it in funny ways. So converting it to a multi-line string would break it, dedent or not. Only if you didn't include newline escapes, e.g.:

    msg = textwrap.dedent('''\
        IDLE's subprocess can't connect to %s:%d. This may be due \
        to your personal firewall configuration. It is safe to \
        allow this internal connection because no data is visible on \
        external ports.''' % address)

Unfortunately, it won't help, since the 'dedent' method won't treat those spaces as indentation. But if those messages were printed to the standard error, the line breaks would be ok, and the use case valid. Noam
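To spell out why the escapes defeat dedent: each backslash-newline pair joins the source lines before textwrap.dedent ever sees the string, so the continuation lines' indentation ends up inside the single joined line, where dedent will not touch it. A self-contained demo:

    import textwrap

    msg = '''\
        This may be due \
        to your firewall.'''
    # The escaped newline joined both source lines, so the second
    # line's indentation is now interior whitespace:
    print(repr(textwrap.dedent(msg)))
    # -> 'This may be due         to your firewall.'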
Re: [Python-Dev] str.dedent
On 11/14/05, M.-A. Lemburg [EMAIL PROTECTED] wrote: We have to draw a line somewhere - otherwise you could just as well add all functions that accept single string arguments as methods to the basestring sub-classes. Please read my first post in this thread - I think there's more reason for 'dedent' to be a string method than there is, for example, for 'expandtabs', since it allows you to write clearer code. The point is that the presented use case does not originate in a common need (to dedent strings), but from a desire to write Python code with embedded indented triple-quoted strings which lies in the scope of the parser, not that of string objects. That's a theoretical argument. In practice, if you do it in the parser, you have two options: 1. Automatically dedent all strings. 2. Add a 'd' or some other letter before the string. Option 1 breaks backwards compatibility, and makes the parser do unexpected things. Option 2 adds another string-prefix letter, which is confusing, and it will also be hard to find out what that letter means. On the other hand, adding .dedent() at the end is very clear, and is just as easy. Now, about performance, please see the message I'll post in a few minutes... Noam
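For concreteness, the contrast being argued (str.dedent is the proposal here, not an existing method; textwrap.dedent is what works today):

    import textwrap

    # Today: the call wraps the literal from the front.
    msg = textwrap.dedent('''\
        usage: frobnicate [options]
          -h    show this help''')

    # Proposed (hypothetical -- str has no dedent method today):
    # msg = '''\
    #     usage: frobnicate [options]
    #       -h    show this help'''.dedent()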
Re: [Python-Dev] str.dedent
Just two additional notes: On 9/15/05, Raymond Hettinger [EMAIL PROTECTED] wrote: -1 Let it continue to live in textwrap where the existing pure python code adequately serves all string-like objects. It's not worth losing the duck typing by attaching new methods to str, unicode, UserString, and everything else aspiring to be string-like. It may seem like the 'dedent' code would have to be written a lot of times, but I've checked the examples. It may be needed to write different versions for 'str' and for 'unicode', but these are going to be unified. In UserString you'll have to add exactly one line:

    def dedent(self): return self.data.dedent()

I've just taken the line created for 'isalpha' and replaced 'isalpha' with 'dedent'. So in the long run, there will be exactly one implementation of 'dedent' in the Python code. (I don't know of any other objects which try to provide the full string interface.) Another reason for preferring a 'dedent' method over a 'dedent' function in some module is that it allows sometime in the future to add an optimization to the compiler, so that it will dedent the string at compile time (this can't work for a function, since the function is found at run time). This will solve the performance problem completely, so that there will be an easy way to write multiline strings which do not interfere with the visual structure of the code, without the need to worry about performance. I'm not saying that this optimization has to be done now, just that 'dedent' as a method makes it possible, which adds to the other arguments for making it a method. Noam
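The compile-time idea can be sketched with modern tooling: an AST pass that folds .dedent() calls on string literals into the already-dedented constant. Everything here is illustrative - the ast module used this way postdates the thread, str.dedent itself is hypothetical, and textwrap.dedent stands in to compute the folded value:

    import ast
    import textwrap

    class DedentFolder(ast.NodeTransformer):
        """Replace <string literal>.dedent() with the dedented constant."""
        def visit_Call(self, node):
            self.generic_visit(node)
            func = node.func
            if (isinstance(func, ast.Attribute) and func.attr == 'dedent'
                    and isinstance(func.value, ast.Constant)
                    and isinstance(func.value.value, str)
                    and not node.args and not node.keywords):
                folded = ast.Constant(textwrap.dedent(func.value.value))
                return ast.copy_location(folded, node)
            return node

    source = "msg = '    hello\\n    world'.dedent()"
    tree = DedentFolder().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    namespace = {}
    exec(compile(tree, '<demo>', 'exec'), namespace)
    print(namespace['msg'])   # 'hello\nworld' -- no .dedent call survives at runtime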
Re: [Python-Dev] str.dedent
On 11/14/05, Fredrik Lundh [EMAIL PROTECTED] wrote: so is putting the string constant in a global variable, outside the scope you're in, like you'd do with any other constant. Usually when I use a constant a single time, I write it where I use it, and don't give it a name. I don't do:

    messagea = "The value of A is"
    ... (a long class definition)
    print messagea, A

This is what I mean when I say "constant" - a value which is known when I write the code, not necessarily an arbitrary value that may change, so I write it at the beginning of the program for others to know it's there. There's no reason why multiline strings that are used only once should be defined at the beginning of a program (think about a simple CGI script, which prints HTML parts in a function.) (how about a new rule: you cannot post to a zombie thread on python-dev unless they've fixed/reviewed/applied or otherwise processed at least one tracker item earlier the same day. there are hundreds of items on the bugs and patches trackers that could need some loving care) I posted to this thread because it was relevant to a new post about dedenting strings. Anyway, I looked at bug 1356720 (Ctrl+C for copy does not work when caps-lock is on), and posted there a very simple patch which will most probably solve the problem. I also looked at bug 1337987 (IDLE, F5 and wrong external file content. (on error!)). One problem it raises is that IDLE doesn't have a revert command and that it doesn't notice if the file was changed outside of IDLE. I am planning to fix it. The other problem that is reported in that bug is that exceptions show misleading code lines when the source file was changed but wasn't loaded into Python. Perhaps in compiled code, not only the file name should be written but also its modification time? This way, when tracebacks print lines of changed files, they can warn if the line might not be the right line. Noam
Re: [Python-Dev] Event loops, PyOS_InputHook, and Tkinter
On 11/15/05, Fredrik Lundh [EMAIL PROTECTED] wrote: If you want to write portable code that keeps things running in the background while the users hack away at the standard interactive prompt, InputHook won't help you. So probably it should be improved, or changed a bit, to work also on Windows. Or perhaps it's Tkinter. Anyway, what I'm saying is - don't use threads! Process events in the main thread while it doesn't run the user's Python code. If he runs another thread - that's his problem. The implicit event loop should never execute Python code while a user's Python code is running in the main thread. Noam
Re: [Python-Dev] Event loops, PyOS_InputHook, and Tkinter
On 11/13/05, Greg Ewing [EMAIL PROTECTED] wrote: Noam Raphael wrote: All that is needed to make Tkinter and Michiels' code run together is a way to say "add this callback to the input hook" instead of the current "replace the current input hook with this callback". Then, when the interpreter is idle, it will call all the registered callbacks, one at a time, and everyone would be happy. Except for those who don't like busy waiting. I'm not sure I understand what you meant. If you meant that it will work slowly - a lot of people (including me) are using Tkinter without a mainloop from the interactive shell, and don't feel the difference. It uses exactly the method I described. Noam
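A sketch of the additive registration being described (the registration API is invented; in CPython the single master hook would be what actually gets installed as PyOS_InputHook):

    _idle_callbacks = []

    def add_input_hook(callback):
        """Register a callback to be run while the interpreter is idle."""
        _idle_callbacks.append(callback)

    def _master_input_hook():
        # Installed once as the real input hook; fans out to every
        # registered callback, so toolkits stop overwriting each other.
        for callback in list(_idle_callbacks):
            callback()
        return 0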
Re: [Python-Dev] Should the default equality operator compare values instead of identities?
On 11/3/05, Josiah Carlson [EMAIL PROTECTED] wrote: ... Right, but lists (dicts, tuples, etc.) are defined as containers, and their comparison operation is defined on their contents. Objects are not defined as containers in the general case, so defining comparisons based on their contents (as opposed to identity) is just one of the two assumptions to be made. I personally like the current behavior, and I see no /compelling/ reason to change it. You obviously feel so compelled for the behavior to change that you are willing to express your desires. How about you do something more productive and produce a patch which implements the changes you want, verify that it passes tests in the standard library, then post it on sourceforge. If someone is similarly compelled and agrees with you (so far I've not seen any public support for your proposal by any of the core developers), the discussion will restart, and it will be decided (not by you or I). Thanks for the advice - I will try to do as you suggest. To summarize, I think that value-based equality testing would usually be what you want, and currently implementing it is a bit of a pain. Actually, implementing value-based equality testing, when you have a finite set of values you want to test, is quite easy:

    def __eq__(self, other):
        for i in self.__cmp_eq__:
            if getattr(self, i) != getattr(other, i):
                return False
        return True

With a simple metaclass that discovers all of those values automatically, and/or your own protocol for exclusion, you are done. Remember, not all 5-line functions should become builtin/default behavior, and this implementation shows that it is not a significant burden for you (or anyone else) to implement this in your own custom library. You are right that not all 5-line functions should become builtin/default behaviour. However, I personally think that this one should, since: 1. It doesn't add complexity, or a new builtin. 2. Those five lines don't include the metaclass code, which will probably take more than five lines and won't be trivial. 3. It will make other objects behave better, not only mine - other classes will get a meaningful comparison operator, for free. P.S. One thing that you should remember is that even if your patch is accepted, and even if this is desirable, Python 2.5 is supposed to be released sometime next year (spring/summer?), and because it is a backwards incompatible change, it would need at least 2.6-2.7 before it becomes the default behavior without a __future__ import, which is another 3-4 years down the line. I hope that the warning can go in by Python 2.5, so the change (which I think will cause relatively few backwards incompatibility problems) can go in by Python 2.6, which I think is less than 2 years down the line. I understand you are passionate, really I do (you should see some of my proposals), but by the time these things get around to getting into mainline Python, there are high odds that you probably won't care about them much anymore (I've come to feel that way myself about many of my proposals), and I think it is a good idea to attempt to balance - when it comes to Python - "Now is better than never." and "Although never is often better than *right* now." Removing __hash__, changing __eq__, and trying to get in copy-on-write freezing (which is really copy-and-cache freezing), all read to me like "We gotta do this now!", which certainly isn't helping the proposal. Thanks - I should really calm down a bit.
I will try to go safely and slowly, and I hope that in the end I will succeed in making my own small contribution to Python. Noam
Re: [Python-Dev] Should the default equality operator compare values instead of identities?
On 11/5/05, Josiah Carlson [EMAIL PROTECTED] wrote: ... 1. It doesn't add complexity, or a new builtin. It changes default behavior (which I specified as a portion of my statement, which you quote). And you are wrong, it adds complexity to the implementation of both class instantiation and the default comparison mechanism. The former, I believe, you will find more difficult to patch than the comparison, though if you have not yet had adventures in that which is writing C extension modules, modifying the default class instantiation may be the deal breaker for you (I personally would have no idea where to start). Sorry, I meant complexity to the Python user - it won't require him to learn more in order to write programs in Python.

    class eqMetaclass(type):
        def __new__(cls, name, bases, dct):
            if '__cmp_include__' in dct:
                include = dict.fromkeys(dct['__cmp_include__'])
            else:
                include = dict.fromkeys(dct.keys())
            for i in dct.get('__cmp_exclude__', ()):
                _ = include.pop(i, None)
            dct['__cmp_eq__'] = include.keys()
            return type.__new__(cls, name, bases, dct)

It took 10 lines of code, and was trivial (except for not-included multi-metaclass support code, which is being discussed in another thread). Oh, I suppose I should modify that __eq__ definition to be smarter about comparison...

    def __eq__(self, other):
        if not hasattr(other, '__cmp_eq__'):
            return False
        if dict.fromkeys(self.__cmp_eq__) != \
           dict.fromkeys(other.__cmp_eq__):
            return False
        for i in self.__cmp_eq__:
            if getattr(self, i) != getattr(other, i):
                return False
        return True

Thanks for the implementation. It would be very useful in order to explain my suggestion. It's nice that it compares only attributes, not types. It makes it possible for two people to write classes that can be equal to one another. Wow, 20 lines of support code, how could one ever expect users to write that? ;) This might mean that implementing it in C, once I find the right place, won't be too difficult. And I think that for most users it will be harder than it was for you, and there are some subtleties in those lines. 3. It will make other objects behave better, not only mine - other classes will get a meaningful comparison operator, for free. You are asserting that the comparison previously wasn't meaningful. It has a meaning, though it may not be exactly what you wanted it to be, which is why Python allows users to define __eq__ operators to be exactly what they want, and which is why I don't find your uses compelling. I think that value-based equality testing is a better default, since in more cases it does what you want it to, and since in those cases they won't have to write those 20 lines, or download them from somewhere. ... From what I have previously learned from others in python-dev, the warnings machinery is slow, so one is to be wary of using warnings unless absolutely necessary. Regardless of it being absolutely necessary, it would be 2 years at least before the feature would actually make it into Python and become default behavior, IF it were desirable default behavior. All right. I hope that those warnings will be ok - it's yet to be seen. And about those 2 years - better later than never. ... You should also realize that you can make contributions to Python without changing the language or the implementation of the language. Read and review patches, help with bug reports, hang out on python-list and attempt to help the hundreds (if not thousands) of users who are asking for help, try to help new users in python-tutor, etc. I confess that I don't do these a lot.
I can say that I teach Python to beginners from time to time, and that where I work I help a lot of other people with Python. If you have an idea for a language change, offer it up on python-list first (I've forgotten to do this more often than I would like to admit), and if it generally has more cool than ick, then bring it back here. I will. Thanks again. Noam
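For readers who want to see Josiah's sketch from the previous message in action, a hypothetical usage (Python 2 metaclass syntax, with the __eq__ function defined above attached by hand):

    class Point(object):
        __metaclass__ = eqMetaclass
        __cmp_include__ = ['x', 'y']   # compare instances on these attributes
        def __init__(self, x, y):
            self.x = x
            self.y = y

    Point.__eq__ = __eq__              # the module-level __eq__ from above

    print Point(1, 2) == Point(1, 2)   # True: same attribute values
    print Point(1, 2) == Point(1, 3)   # False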
Re: [Python-Dev] Should the default equality operator compare values instead of identities?
On 11/6/05, Josiah Carlson [EMAIL PROTECTED] wrote: ... Sorry, I meant complexity to the Python user - it won't require him to learn more in order to write programs in Python. Ahh, but it does add complexity. Along with knowing __doc__, __slots__, __metaclass__, __init__, __new__, __cmp__, __eq__, ..., __str__, __repr__, __getitem__, __setitem__, __delitem__, __getattr__, __setattr__, __delattr__, ... The user must also know what __cmp_include__ and __cmp_exclude__ mean in order to understand code which uses them, and they must understand that exclude entries overwrite include entries. You are right. But that's Python - I think that nobody knows all the exact details of what all these do. You look in the documentation. It is a complication - but it's of the type that I can live with, if there's a reason. Wow, 20 lines of support code, how could one ever expect users to write that? ;) This might mean that implementing it in C, once I find the right place, won't be too difficult. And I think that for most users it will be harder than it was for you, and there are some subtleties in those lines. So put it in the Python Cookbook: http://aspn.activestate.com/ASPN/Cookbook/Python A good idea. 3. It will make other objects behave better, not only mine - other classes will get a meaningful comparison operator, for free. You are asserting that the comparison previously wasn't meaningful. It has a meaning, though it may not be exactly what you wanted it to be, which is why Python allows users to define __eq__ operators to be exactly what they want, and which is why I don't find your uses compelling. I think that value-based equality testing is a better default, since in more cases it does what you want it to, and since in those cases they won't have to write those 20 lines, or download them from somewhere. You are making a value judgement on what people want to happen with default Python. Until others state that they want such an operation as a default, I'm going to consider this particular argument relatively unfounded. All right. I will try to collect more examples for my proposal. From what I have previously learned from others in python-dev, the warnings machinery is slow, so one is to be wary of using warnings unless absolutely necessary. Regardless of it being absolutely necessary, it would be 2 years at least before the feature would actually make it into Python and become default behavior, IF it were desirable default behavior. All right. I hope that those warnings will be ok - it's yet to be seen. And about those 2 years - better later than never. It won't be OK. Every comparison using the default operator will incur a speed penalty while it checks the (pure Python) warning machinery to determine if the warning has been issued yet. This alone makes the transition require a __future__ import. How will the __future__ statement help? I think that the warning is still needed, so that people using code that may stop working will know about it. I see that they can add a __future__ import and see if it still works, but it will catch far fewer problems, because usually code would be run without the __future__ import. If it really slows down things, it seems to me that the only solution is to optimize the warning module... Noam
Re: [Python-Dev] Why should the default hash(x) == id(x)?
On 11/3/05, Greg Ewing [EMAIL PROTECTED] wrote: 3. If someone does want to associate values with objects, he can explicitly use id: dct[id(x)] = 3. This is fragile. Once all references to x are dropped, it is possible for another object to be created having the same id that x used to have. The dict now unintentionally references the new object. You are right. Please see the simple ref class that I wrote in my previous post, which solves this problem. Noam
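The hazard Greg describes is easy to observe in CPython, where a freed object's address - and therefore its id - is routinely recycled (typical behavior, not a guarantee):

    d = {}
    x = object()
    d[id(x)] = 'data about x'
    del x               # nothing keeps x alive; its id may be recycled

    y = object()        # often lands at the address x just vacated
    print(id(y) in d)   # frequently True: d now seems to describe y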
[Python-Dev] Why should the default hash(x) == id(x)?
Hello, While writing my PEP about unifying mutable and immutable, I came upon this: Is there a reason why the default __hash__ method returns the id of the objects? It is consistent with the default __eq__ behaviour, which is the same as "is", but: 1. It can easily become inconsistent, if someone implements __eq__ and doesn't implement __hash__. 2. It is confusing: even if someone doesn't implement __eq__, he may see that it is suitable as a key to a dict, and expect it to be found by other objects with the same value. 3. If someone does want to associate values with objects, he can explicitly use id: dct[id(x)] = 3. This seems to better explain what he wants. Now, I just thought of a possible answer: because he wants to store in his dict both normal objects and objects of his user-defined type, which turn out to be not equal to any other object. This leads me to another question: why should the default __eq__ method be the same as "is"? If someone wants to check if two objects are the same object, that's what the is operator is for. Why not make the default __eq__ really compare the objects, that is, their dicts and their slot-members? I would be happy to get answers. Noam
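Point 1 in action, under the Python 2 semantics of this thread (Python 3 later closed this trap by setting __hash__ to None when a class defines __eq__):

    class Point(object):
        def __init__(self, x):
            self.x = x
        def __eq__(self, other):
            return self.x == other.x
        # no __hash__, so hashing still falls back to id()

    d = {Point(1): 'marked'}
    print Point(1) in d    # False: an equal key hashes differently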
Re: [Python-Dev] apparent ruminations on mutable immutables (was:PEP 351, the freeze protocol)
Thank you for your encouraging words! I am currently working on a PEP. I am sure that writing it is a good idea, and that it would help with explaining this idea both to others and to myself. What I already wrote makes me think that it can be accomplished with no really large changes to the language - only six built-in types are affected, and there is no reason why existing code, both in C and in Python, would stop working. I hope others would be interested in the idea too, when I finish writing the PEP draft, so it would be discussed. Trying the idea with PyPy is a really nice idea - it seems that it would be much simpler to implement, and I'm sure that learning PyPy would be interesting. Thanks again, and I would really like to hear your comments when I post the PEP draft, Noam
Re: [Python-Dev] Why should the default hash(x) == id(x)?
On 11/2/05, Josiah Carlson [EMAIL PROTECTED] wrote: ... A quick search in the list archives via google (search site:mail.python.org object __hash__) says that Guido wanted to remove the default __hash__ method for object in Python 2.4, but that never actually happened. http://www.python.org/sf/660098 http://mail.python.org/pipermail/python-dev/2003-December/041375.html There may be more code which relies on the default behavior now, but fixing such things is easy. Cool! If Guido also thinks that it should be gone, who am I to argue... (Seriously, I am in favor of removing it. I really think that it is confusing.) And if backwards-compatibility is a problem: you can, in Python 2.5, show a warning when the default __hash__ method is being called, saying that it is going to disappear in Python 2.6. [Snip - I will open a new thread about the equality operator] As for removing the default __hash__ for objects, I'm actually hovering around a -0, if only because it is sometimes useful to generate unique keys for dictionaries (which can be done right now with object() ), and I acknowledge that it would be easy to subclass and use that instead. I can suggest a new class, that will help you in the cases that you do want a dict of identities:

    class ref(object):
        def __init__(self, obj):
            self._obj = obj
        def __call__(self):
            return self._obj
        def __eq__(self, other):
            return isinstance(other, ref) and self._obj is other._obj
        def __hash__(self):
            return hash(id(self._obj))

It has the advantage over using ids as keys that it saves a reference to the object, so it won't be killed. It lets you make a dict of object identities just as easily as before, in a more explicit and less error-prone way. Perhaps it should become a builtin? Noam
[Python-Dev] Should the default equality operator compare values instead of identities?
I think it should. (I copy here messages from the thread about the default hash method.) On 11/2/05, Michael Chermside [EMAIL PROTECTED] wrote: Why not make the default __eq__ really compare the objects, that is, their dicts and their slot-members? Short answer: not the desired behavior. Longer answer: there are three common patterns in object design. There are value objects, which should be considered equal if all fields are equal. There are identity objects which are considered equal only when they are the same object. And then there are (somewhat less common) value objects in which a few fields don't count -- they may be used for caching a pre-computed result for example. The default __eq__ behavior has to cater to one of these -- clearly either value objects or identity objects. Guido chose to cater to identity objects believing that they are actually more common in most situations. A beneficial side-effect is that the default behavior of __eq__ is QUITE simple to explain, and if the implementation is easy to explain then it may be a good idea. This is a very nice observation. I wish to explain why I think that the default __eq__ should compare values, not identities. 1. If you want to compare identities, you can always use is. There is currently no easy way to compare your user-defined classes by value, in case they happen to be value objects, in Michael's terminology - you have to compare every single member. (Comparing the __dict__ attributes is ugly, and will not always work). If the default were to compare the objects by value, and they happen to be identity objects, you can always do: def __eq__(self, other): return self is other 2. I believe that counter to what Michael said, value objects are more common than identity objects, at least when talking about user-defined classes, and especially when talking about simple user-defined classes, where the defaults are most important, since the writer wouldn't care to define all the appropriate protocols. (this was a long sentence) Can you give examples of common identity objects? I believe that they are usually dealing with some input/output, that is, with things that interact with the environment (files, for example). I believe almost all algorithmic classes are value objects. And I think that usually, comparison based on value will give the correct result for identity objects too, since if they do I/O, they will usually hold a reference to an I/O object, like file, which is an identity object by itself. This means that the comparison will compare those objects, and return false, since the I/O objects they hold are not the same one. 3. I think that value-based comparison is also quite easy to explain: user-defined classes combine functions with a data structure. In Python, the data structure is simply member names which reference other objects. The default, value-based, comparison, checks if two objects have the same member names, and that they are referencing equal (by value) objects, and if so, returns True. I think that explaining this is not harder than explaining the current dict comparison. Now, for Josiah's reply: On 11/2/05, Josiah Carlson [EMAIL PROTECTED] wrote: This leads me to another question: why should the default __eq__ method be the same as is? If someone wants to check if two objects are the same object, that's what the is operator is for. Why not make the default __eq__ really compare the objects, that is, their dicts and their slot-members? 
Using 'is' makes sense when the default hash is id (and actually in certain other cases as well). Actually comparing the contents of an object is certainly not desirable with the default hash, and probably not desirable in the general case because equality doesn't always depend on /all/ attributes of extension objects. Explicit is better than implicit. In the face of ambiguity, refuse the temptation to guess. I hope that the default hash would stop being id, as Josiah showed that Guido decided, so let's not discuss it. Now, about the good point that sometimes the state doesn't depend on all the attributes. Right. But the current default doesn't compare them well either - you have no escape from writing an equality operator by yourself. And I think this is not the common case. I think that the meaning of "in the face of ambiguity, refuse the temptation to guess" is that you should not write code that changes its behaviour according to what the user will do, based on your guess as to what he meant. This is not the case - the value-based comparison is strictly defined. It may just not be what the user would want - and in most cases, I think it will be. "Explicit is better than implicit" says only "better"; identity-based comparison is just as implicit as value-based comparison. (I want to add that there is a simple way to support value-based comparison when some members don't count, by writing a metaclass that will check if your class
Re: [Python-Dev] Should the default equality operator compare values instead of identities?
I've looked for classes in my /usr/lib/python2.4 directory. I won't go over all the 7346 classes that were found there, but let's see:

Identity objects that will continue to work because they contain other identity objects:
- SocketServer, and everything which inherits from it (like HTTPServer)
- Queue
- csv (contains _csv objects)

Value objects that would probably gain a meaningful equality operator:
- StringIO
- ConfigParser
- markupbase, HTMLParser
- HexBin, BinHex
- cgi.FieldStorage
- AST Nodes

Others:
- Cookie - inherits its __eq__ method from dict

I'll stop here. I was not strictly scientific: I chose classes whose purpose I thought I could easily guess, and perhaps discarded classes that didn't look interesting to me. But I didn't have any bad intention when choosing the classes. I have seen no class whose equality operator would be damaged by the change. I have seen quite a lot of classes which didn't define an equality operator, and for which a value-based comparison would be the right way to compare them. I'm getting more convinced in my opinion. Noam
Re: [Python-Dev] Should the default equality operator compare values instead of identities?
On 11/2/05, Raymond Hettinger [EMAIL PROTECTED] wrote: Should the default equality operator compare values instead of identities? No. Look back into last year's python-dev postings where we agreed that identity would always imply equality. There were a number of practical reasons. Also, there are a number of places in CPython where that assumption is implicit. Perhaps you've meant something else, or I didn't understand? Identity implying equality is true also in value-based comparison. If the default __eq__ operator compares by value, I would say that it would do something like:

    def __eq__(self, other):
        if self is other:
            return True
        if type(self) is not type(other):
            return False
        # compare the __dict__ and any __slots__, and if they are
        # all ==, return True

Noam
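Filled out, that sketch might read as follows (a hypothetical default, not what Python does; __slots__ is assumed to be a tuple or list of names):

    def default_eq(self, other):
        if self is other:
            return True
        if type(self) is not type(other):
            return False
        if getattr(self, '__dict__', {}) != getattr(other, '__dict__', {}):
            return False
        for name in getattr(type(self), '__slots__', ()):
            if getattr(self, name, None) != getattr(other, name, None):
                return False
        return True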
Re: [Python-Dev] apparent ruminations on mutable immutables (was: PEP 351, the freeze protocol)
On 11/1/05, Josiah Carlson [EMAIL PROTECTED] wrote: ... I am an advocate for PEP 351. However, I am against your proposed implementation/variant of PEP 351 because I don't believe it adds enough to warrant the additional complication and overhead necessary for every object (even tuples would need to get a .frozen_cache member). Give me a recursive freeze from PEP 351 (which handles objects that are duplicated, but errors out on circular references), and I'll be happy. That's fine - but it doesn't mean that I must be happy with it. ... This isn't correct - freezing a set won't require a single copy to be performed, as long as the frozen copy isn't saved after the original is changed. Copy+cache always requires one copy. You are wrong, and you even say you are wrong...freezing a set doesn't require a COPY, IF the frozen COPY isn't saved after the original is CHANGED. Creating an immutable set IS CREATING A COPY, so it ALSO copies, and you admit as much, but then say the equivalent of copying isn't copying because I say so. No, I am not wrong. I am just using misleading terms. I will call a frozen copy a frozen image. Here it goes: freezing a set doesn't require a COPY, IF the frozen IMAGE isn't saved after the original is CHANGED. I suggest that there would be a way to create a frozenset without COPYING an O(n) amount of MEMORY. When a frozen set is created by a call to frozen(x), it would not copy all the data, but would rather reference the existing data, which was created by the non-frozen set. Only if the original set changes while there's a frozen set referencing the data would the MEMORY actually be copied. I call it a frozen copy because it behaves as a frozen copy, even though not all the memory is being copied. When you call the COPY function in the COPY module with a string, it doesn't really copy memory - the same string is returned. When you copy a file inside subversion, it doesn't actually copy all the data associated with it, but does something smarter, which takes O(1). The point is, for the user, it's a copy. Whether or not memory is actually being copied is an implementation detail. ... I think that adding an additional attribute to literally every single object to handle the caching of 'frozen' objects, as well as a list to every object to handle callbacks which should be called on object mutation, along with a _call_stuff_when_mutated() method that handles these callback calls, IN ADDITION TO the __freeze__ method which is necessary to support this, is a little much, AND IS CERTAINLY NOT A SIMPLIFICATION! I don't agree. You don't need to add a list to every object, since you can store all those relations in one place, with a standard function for registering them. Anyway, code written in Python (which is the language we are discussing) WON'T BE COMPLICATED! The frozen mechanism, along with two new protocols (__frozen__ and __changed__), would be added automatically! The internal state of a class written in Python can be automatically frozen, since it's basically a dict. Now let's see if it's a simplification: 1. No Python code would have to be made more complicated because of the change. 2. There would be no need to find workarounds, like cStringIO, for the fact that strings and tuples are immutable. 3. You would be able to put any kind of object into a set, or use it as a dict key. 4.
Classes (like the graph example) would be able to give users things without having to make a choice between risking their users with strange bugs, making a complicated interface, making very inefficient methods, and writing complicated wrapper classes. I will ask you: Is this a complication? The answer is: it requires a significant change of the CPython implementation. But about the Python language: it's definitely a simplification. Let us pause for a second and consider: Original PEP proposed 1 new method: __freeze__, which could be implemented as a subclass of the original object (now), and integrated into the original classes as time goes on. One could /register/ __freeze__ functions/methods a la Pickle, at which point objects wouldn't even need a native freeze method. Your suggestion offers 2 new methods along with 2 new instance variables. Let's see, a callback handler, __freeze__, the cache, and the callback list. Doesn't that seem a little excessive to you to support freezing? It does to me. If Guido were to offer your implementation of freeze, or no freeze at all, I would opt for no freeze, as implementing your freeze on user-defined classes would be a pain in the ass, not to mention implementing them in C code would be more than I would care to do, and more than I would ask any of the core developers to work on. As I said above: this suggestion would certainly require more change in the Python implementation than your suggestion. But the Python language would gain a lot more. Implementing my frozen on user-defined classes would not be a pain
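To make the disputed mechanism concrete, here is a toy rendering of the copy-on-write scheme being argued for (all names are invented; the real thing would live inside the C implementation and share the set's internal table directly):

    class CowSet(object):
        """Sketch: frozen() is O(1); the one O(n) copy happens only if
        the set mutates while a frozen image is outstanding."""
        def __init__(self, items=()):
            self._data = set(items)
            self._image = None          # FrozenImage sharing self._data

        def frozen(self):
            if self._image is None:
                self._image = FrozenImage(self._data)   # no copying here
            return self._image          # the same image, again and again

        def add(self, item):
            if self._image is not None:
                self._image._data = set(self._data)     # copy once, on write
                self._image = None      # detach; a later frozen() makes a new image
            self._data.add(item)

    class FrozenImage(object):
        def __init__(self, data):
            self._data = data           # shared until the owner mutates
        def __contains__(self, item):
            return item in self._data
        def __iter__(self):
            return iter(self._data)
        def __len__(self):
            return len(self._data)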
Re: [Python-Dev] a different kind of reduce...
On 11/1/05, Reinhold Birkenfeld [EMAIL PROTECTED] wrote: Hmm, using the function's own namespace is an interesting idea. It might also be a good place to put other functionals: results = f.map(data) newf = f.partial(somearg) And we have solved the "map, filter and reduce are going away! Let's all weep together" problem with one strike! Reinhold I have no problems with map and filter going away. About reduce - please remember that you need to add this method to any callable, including every type (I mean the constructor). I am not sure it is a good trade for throwing away one builtin, which is a perfectly reasonable function. Noam
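For illustration, the f.map / f.partial idea can be mocked up today with a decorator (an entirely hypothetical API, and functools.partial only arrived in Python 2.5):

    import functools   # functools.partial is Python 2.5+, an assumption here

    def functional(f):
        """Hypothetical decorator attaching the mused-about methods."""
        f.map = lambda data: map(f, data)
        f.partial = lambda *args, **kw: functools.partial(f, *args, **kw)
        return f

    @functional
    def square(x):
        return x * x

    @functional
    def scale(factor, x):
        return factor * x

    print square.map([1, 2, 3])    # [1, 4, 9]
    double = scale.partial(2)
    print double(21)               # 42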
Re: [Python-Dev] apparent ruminations on mutable immutables (was: PEP 351, the freeze protocol)
On 11/1/05, Josiah Carlson [EMAIL PROTECTED] wrote: ... I still consider it dead. If the implementation is hard to explain, it's a bad idea. It is sometimes true, but not always. It may mean two other things: 1. The one trying to explain is not talented enough. 2. The implementation is really not very simple. A hash table, used so widely in Python, is really not a simple idea, and it's not that easy to explain. Also, not all user-defined classes have a __dict__, and not all user-defined classes can have arbitrary attributes added to them.

    >>> class foo(object):
    ...     __slots__ = ['lst']
    ...     def __init__(self):
    ...         self.lst = []
    ...
    >>> a = foo()
    >>> a.bar = 1
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    AttributeError: 'foo' object has no attribute 'bar'

It doesn't matter. It only means that the implementation would have to make frozen copies also of __slots__ items, when freezing a user-defined class. I am afraid that this question proves that I didn't convey my idea to you. If you like, please forgive my inability to explain it clearly, and try again to understand my idea, by going over what I wrote again, and thinking on it. You can also wait for the PEP that I intend to write. And you can also forget about it, if you don't want to bother with it - you've already helped a lot. Noam
Re: [Python-Dev] PEP 351, the freeze protocol
Hello, I have slept quite well, and talked about it with a few people, and I still think I'm right. About the users-changing-my-internal-data issue: Again, user semantics. Tell your users not to modify entries, and/or you can make copies of objects you return. If your users are too daft to read and/or follow directions, then they deserve to have their software not work. ... When someone complains that something doesn't work, I tell them to read the documentation. If your users haven't been told to RTFM often enough to actually make it happen, then you need a RTFM-bat. Want to know how you make one? You start wrapping the objects you return which segfaults the process if they change things. When they start asking, tell them it is documented quite clearly: do not modify objects returned, or else. Then there's the other option, which I provide below. I disagree. I read the manual when I don't know what something does. If I can guess what it does (and this is where conventions are good), I don't read the manual. And let's say I ought to read the complete manual for every method that I use, and that I deserve a death punishment (or a segmentation fault) if I don't. But let's say that I wrote software, without reading the manual, and it worked. I have gone to work on other things, and suddenly a bug arises. When the poor guy who needs to fix it goes over the code, everything looks absolutely correct. Should he also read the complete manuals of every library that I used, in order to fix that bug? And remember that in this case, the object could have traveled between several places (including functions in other libraries), before it was changed, and the original data structure starts behaving weirdly. You suggest two ways for solving the problem. The first is by copying my mutable objects to immutable copies: Also from the sounds of it, you are storing both source and destination values in the same table...hrm, that sounds quite a bit like a spreadsheet. How does every spreadsheet handle that again? Oh yeah, they only ever store immutables (generally strings which are interpreted). But I suppose since you are (of course) storing mutable objects, you need to work a bit harder...so store mutables, and return immutable copies (which you can cache if you want, and invalidate when your application updates the results...like a wx.Grid update on changed). This is basically ok. It's just that in my solution, for many objects it's not necessary to make a complete copy just to prevent changing the value: making frozen copies of objects which can't reference nonfrozen objects (sets, for example) costs O(1), thanks to the copy-on-write. Now, about the graph: So let me get this straight: You've got a graph. You want to be able to change the graph, but you don't want your users to accidentally change the graph. Sounds to me like an API problem, not a freeze()/mutable problem. Want an API?

    class graph:
        ...
        def IterOutgoing(self, node): ...
        def IterIncoming(self, node): ...
        def IsAdjacent(self, node1, node2): ...
        def IterNodes(self): ...
        def AddEdge(self, f_node, t_node): ...
        def RemEdge(self, node1, node2): ...
        def AddNode(self): ...

If you are reasonable in your implementation, all of the above operations can be fast, and you will never have to worry about your users accidentally mucking about with your internal data structures: because you aren't exposing them.
If you are really paranoid, you can take the next step and implement it in Pyrex or C, so that only a malicous user can muck about with internal structures, at which point you stop supporting them. This will work. It's simply... well, not very beautiful. I have to invent a lot of names, and my users need to remember them all. If I give them a frozen set, with all the vertices than a vertex points to (which is an absolutely reasonable API), they will know how to deal with it without learning a lot of method names, thanks to the fact that they are already familiar with sets, and that a lot of thought has gone into the set interface. Now, about copy-on-write: ... What you have written here is fairly unintelligible, but thankfully you clarify yourself...pity it still doesn't work, I explain below. I'm sorry if I am sometimes hard to understand. English is not my mother tongue, and it degrades as the hour gets later - and sometimes, things are hard to explain. If I don't explain myself, please say so and I'll try again. This is an excellent example - I wrote about callbacks, and went to sleep. Let me try to explain again how it *does* work. The frozen() function, and the __frozen__ protocol, would get another optional argument - an object to be notified when the *nonfrozen* object has changed. It may be called at most once - only on the first change to the object, and only if the object which requested to be notified is still
Re: [Python-Dev] PEP 351, the freeze protocol
On 10/31/05, Josiah Carlson [EMAIL PROTECTED] wrote: ... And I'm going to point out why you are wrong. I still don't think so. I think that I need to reclarify what I mean. About the users-changing-my-internal-data issue: ... You can have a printout before it dies: I'm crashing your program because something attempted to modify a data structure (here's the traceback), and you were told not to. Then again, you can even raise an exception when people try to change the object, as imdict does, as tuples do, etc. Both solutions would solve the problem, but would require me to wrap the built-in set with something which doesn't allow changes. This is a lot of work - but it's quite similiar to what my solution would actually do, in a single built-in function. You suggest two ways for solving the problem. The first is by copying my mutable objects to immutable copies: And by caching those results, then invalidating them when they are updated by your application. This is the same as what you would like to do, except that I do not rely on copy-on-write semantics, which aren't any faster than freeze+cache by your application. This isn't correct - freezing a set won't require a single copy to be performed, as long as the frozen copy isn't saved after the original is changed. Copy+cache always requires one copy. ... I never claimed it was beautiful, I claimed it would work. And it does. There are 7 methods, which you can reduce if you play the special method game: RemEdge - __delitem__((node, node)) RemNode - __delitem__(node) #forgot this method before IterNodes - __iter__() IterOutgoing,IterIncoming - IterAdjacent(node) I just wanted to say that this game is of course a lot of fun, but it doesn't simplify the interface. In any case, whether you choose to use freeze, or use a different API, this particular problem is solvable without copy-on-write semantics. Right. But I think that a significant simplification of the API is a nice bonus for my solution. And about those copy-on-write semantics - it should be proven how complex they are. Remember that we are talking about frozen-copy-on-write, which I think would simplify matters considerably - for example, there are at most two instances sharing the same data, since the frozen copy can be returned again and again. Now, about copy-on-write: ... Thank you for the clarification (btw, your english is far better than any of the foreign languages I've been taught over the years). Thanks! It seems that if you are forced to use a language from time to time it improves a little... ... Even without validation, there are examples that force a high number of calls, which are not O(1), ammortized or otherwise. [Snap - a very interesting example] Now, the actual time analysis on repeated freezings and such gets ugly. There are actually O(k) objects, which take up O(k**2) space. When you modify object b[i][j] (which has just been frozen), you get O(k) callbacks, and when you call freeze(b), it actually results in O(k**2) time to re-copy the O(k**2) pointers to the O(k) objects. It should be obvious that this IS NOT AMMORTIZABLE to original object creation time. That's absolutely right. My ammortized analysis is correct only if you limit yourself to cases in which the original object doesn't change after a frozen() call was made. In that case, it's ok to count the O(k**2) copy with the O(k**2) object creation, because it's made only once. Why it's ok to analyze only that limited case? 
I am suggesting a change in Python: that every object you would like be mutable, and would support the frozen() protocol. When you evaluate my suggestion, you need to take a program, and measure its performance in the current Python and in a Python which implements my suggestion. This means that the program should work also on the current Python. In that case, my assumption is true - you won't change objects after you have frozen them, simply because these objects (strings which are used as dict keys, for example) can't be changed at all in the current Python implementation! I will write it in another way: I am proposing a change that will make Python objects, including strings, mutable, and gives you other advantages as well. I claim that it won't make existing Python programs run slower in O() terms. It would allow you to do many things that you can't do today; some of them would be fast, like editing a string, and some of them would be less fast - for example, repeatedly changing an object and freezing it. I think that the performance penalty may be rather small - remember that in programs which do not change strings, there would never be a need to copy the string data at all. And since I think that usually most of the dict lookups are for method or function names, there would almost never be a need to constuct a new object on dict lookup, because you search for the same names again and again, and a new object is created only on the
Re: [Python-Dev] PEP 351, the freeze protocol
I thought about something - I think that the performance penalty may be rather small - remember that in programs which do not change strings, there would never be a need to copy the string data at all. And since I think that usually most of the dict lookups are for method or function names, there would almost never be a need to constuct a new object on dict lookup, because you search for the same names again and again, and a new object is created only on the first frozen() call. You might even gain performance, because s += x would be faster. Name lookups can take virtually the same time they take now - method names can be saved from the beginning as frozen strings, so finding them in a dict will take just another bit test - is the object frozen - before doing exactly what is done now. Remember, the strings we are familiar with are simply frozen strings... ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 351, the freeze protocol
Hello, It seems that we both agree that freezing is cool (-; . We disagree on whether a copy-on-write behaviour is desired. Your arguments agains copy-on-write are: 1. It's not needed. 2. It's complicated to implement. But first of all, you didn't like my use cases. I want to argue with that. In reading your description of a 'table of values', I can't help but be reminded of the wxPython (and wxWidget) wx.Grid and its semantics. It offers arbitrary tables of values (whose editors and viewers you can change at will), which offers a mechanism by which you can listen to changes that occur to the contents of a cell. I can't help but think that if you offered a protocol by which a user can signal that a cell has been changed, perhaps by writing the value to the table itself (table.SetValue(row, col, value)), every read a deepcopy (or a PEP 351 freeze), etc., that both you and the users of your table would be much happier. Perhaps I didn't make it clear. The difference between wxPython's Grid and my table is that in the table, most values are *computed*. This means that there's no point in changing the values themselves. They are also used frequently as set members (I can describe why, but it's a bit complicated.) I want to say that even if sets weren't used, the objects in the table should have been frozen. The fact the sets (and dicts) only allow immutable objects as members/keys is just for protecting the user. They could have declared, you shouldn't change anything you insert - as long as you don't, we'll function properly. The only reason why you can't compute hash values of mutable objects is that you don't want your user to make mistakes, and make the data structure inconsistent. As for the graph issue, you've got a bigger problem than users just being able to edit edge lists, users can clear the entire dictionary of vertices (outgoing.clear()). It seems to me that a more reasonable method to handle this particular case is to tell your users don't modify the dictionaries or the edge lists, and/or store your edge lists as tuples instead of lists or dictionaries, and/or use an immutable dictionary (as offered by Barry in the PEP). As I wrote before, telling my users don't modify the edge lists is just like making lists hashable, and telling all Python users, dont modify lists that are dictionary keys. There's no way to tell the users that - there's no convention for objects which should not be changed. You can write it in the documentation, but who'll bother looking there? I don't think that your other suggestions will work: the data structure of the graph itself can't be made of immutable objects, because of the fact that the graph is a mutable object - you can change it. It can be made of immutable objects, but this means copying all the data every time the graph changes. Now, about copy-on-write: There's also this little issue of copy on write semantics with Python. Anyone who tells you that copy on write is easy, is probably hanging out with the same kind of people who say that threading is easy. Of course both are easy if you limit your uses to some small subset of interesting interactions, but copy on write gets far harder when you start thinking of dictionaries, lists, StringIOs, arrays, and all the possible user-defined classes, which may be mutated beyond obj[key] = value and/or obj.attr = value (some have obj.method() which mutates the object). As such, offering a callback mechanism similar to weak references is probably pretty close to impossible with CPython. 
Let's limit ourselves to copy-on-write of objects which do not contain nonfrozen objects. Perhaps it's enough - the table, the graph, and strings, are perfect examples of these. Implementation doesn't seem to complicated to me - whenever the object is about to change, and there is a connected frozen copy, you make a shallow copy of the object, point the frozen copy to it, release the reference to the frozen copy, and continue as usual. That's all. I really think that this kind of copy-on-write is correct. The temporary freezing of sets in order to check if they are members of other sets is a not-very-nice way of implementing it. This kind of copy-on-write would allow, in principle, for Python strings to become mutable, with almost no speed penalty. It would allow my table, and other containers, to automatically freeze the objects that get into it, without having to trust the user on not changing the objects - and remember that there's no way of *telling* him not to change the objects. Now, the computer scientist in me wants to explain (and think about) freezing containers of nonfrozen objects. What I actually want is that as long as an object doesn't change after it's freezed, the cost of freezing would be nothing - that is, O(1). Think about a mutable string object, which is used in the same way as the current, immutable strings. It is constructed once, and then may be used as a key in a
Re: [Python-Dev] PEP 351, the freeze protocol
Hello, I have thought about freezing for some time, and I think that it is a fundamental need - the need to know, sometimes, that objects aren't going to change. This is mostly the need of containers. dicts need to know that the objects that are used as keys aren't going to change, because if they change, their hash value changes, and you end up with a data structure in an inconsistent state. This is the need of sets too, and of heaps, and binary trees, and so on. I want to give another example: I and my colleges designed something which can be described as an electronic spreadsheet in Python. We called it a table. The values in the table are Python objects, and the functions which relate them are written in Python. Then comes the problem: the user has, of course, access to the objects stored in the table. What would happen if he changes them? The answer is that the table would be in an inconsistent state, since something which should be the return value of a function is now something else, and there's no way for the table to know about that. The solution is to have a freeze protocol. It may be called frozen (like frozen(set([1,2,3]))), so that it will be clear that it does not change the object itself. The definition of a frozen object is that its value can't change - that is, if you compare it with another object, you should get the same result as long as the other object hasn't changed. As a rule, only frozen objects should be hashable. I want to give another, different, use case for freezing objects. I once thought about writing a graph package in Python - I mean a graph with vertices and edges. The most obvious way to store a directed graph is as a mapping (dict) from a node to the set of nodes that it points to. Since I want to be able to find also which nodes point to a specific node, I will store another mapping, from a node to the set of nodes that point to it. Now, I want a method of the graph which will return the set of nodes that a given node points to, for example to let me write if y in graph.adjacent_nodes(x) then. The question is, what will the adjacent_nodes method return? If it returns the set which is a part of the data structure, there is nothing (even no convention!) that will prevent the user from playing with it. This will corrupt the data structure, since the change won't be recorded in the inverse mapping. adjacent_nodes can return a copy of the set, it's a waste if you only want to check whether an object is a member of the set. I gave this example to say that the frozen protocol should (when possible) return an object which doesn't really contain a copy of the data, but rather gives an image of the original object. If the original object changes while there are frozen copies of it, the data will be copied, and all the frozen objects will then reference a version of the data that will never change again. This will solve the graph problem nicely - adjacent_nodes would simply return a frozen copy of the set, and a copy operation would happen only in the rare cases when the returned set is being modified. This would also help the container use cases: they may call the frozen() method on objects that should be inserted into the container, and usually the data won't be copied. Some objects can't be created in their final form, but can only be constructed step after step. This means that they must be non-frozen objects. Sometimes they are constructed in order to get into a container. 
Unless the frozen() method is copy-on-change the way I described, all the data would have to be copied again, just for the commitment that it won't change. I don't mean to frighten, but in principle, this may mean that immutable strings might be introduced, which will allow us to get rid of all the cStringIO workarounds. Immutable strings would be constructed whenever they are needed, at a low performance cost (remember that a frozen copy of a given object has to be constructed only once - once it has been created, the same object can be returned on additional frozen() calls.) Copy-on-change of containers of non-frozen objects requires additional complication: it requires frozen objects to have a way for setting a callback that will be called when the original object was changed. This is because the change makes the container of the original object change, so it must drop its own frozen copy. This needs to happen only once per frozen object, since after a change, all the containers drop their frozen copies. I think this callback is conceptually similar to the weakref callback. Just an example that copy-on-change (at least of containers of frozen objects) is needed: sets. It was decided that you can test whether a non-frozen set is a member of a set. I understand that it is done by temporarily freezing the set, and that it caused some threading issues. A copy-on-change mechanism might solve it more elegantly. What do you think? Noam ___ Python-Dev
Re: [Python-Dev] [Python-checkins] commit of r41352 - in python/trunk: . Lib Lib/distutils Lib/distutils/command Lib/encodings
That might be reasonable. I just noticed that it is convenient to do svn propset svn:ignore -F .cvsignore . Without a file, I wouldn't know how to edit the property, so I would probably do svn propget svn:ignore . ignores vim ignores svn propset svn:ignore -F ignores . rm ignores Won't svn propedit svn:ignore . do the trick? Noam ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] IDLE development
Hello, More than a year and a half ago, I posted a big patch to IDLE which adds support for completion and much better calltips, along with some other improvements. Since then, I had some mail conversations with Kurt B. Kaiser, who is responsible for IDLE, which resulted in nothing. My last mail, from Jul 10, saying (with more details) I made the minor changes you asked for, let's get it in, it's not very complicated was unanswered. This is just an example of the fact that IDLE development was virtually nonexistent in the last months, because most patches were simply ignored. I and my colleges use IDLE intensively - that is, a heavily patched IDLE. It includes my patch and many other improvements made by me and my friends. The improved IDLE is MUCH better than the standard IDLE, especially for interactive work. Since we would like to share our work with the rest of the world, if nothing is changed we would start a new IDLE fork soon, perhaps at python-hosting.com. I really don't like that - maintaining a fork requires a lot of extra work, and it is certain that many more people will enjoy our work if it integrated in the standard Python distribution. But sending patches and watching them stay open despite a continuous nagging is worse. Please, either convince KBK to invest more time in IDLE development, or find someone else who would take care of it. If you like, I would happily help in the development. I hope I am not sounding offensive. It's actually quite simple: if the excellent development environment IDLE can't develop inside standard Python, it should be developed outside it. As I said, I prefer the first option. Have a good week, Noam Raphael ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] IDLE development
On 9/11/05, Guido van Rossum [EMAIL PROTECTED] wrote: On 9/10/05, Noam Raphael [EMAIL PROTECTED] wrote: I and my colleges use IDLE intensively - that is, a heavily patched IDLE. It includes my patch and many other improvements made by me and my friends. The improved IDLE is MUCH better than the standard IDLE, especially for interactive work. Could it be that this is a rather subjective judgement? It wouldn't be the first time that someone pushing for their personal set of functionality changes is overlooking the needs of other user groups. I don't think so, since: 1. These are added features, not functionality changes. 2. There are quite a lot of people using the improved IDLE where I work, and I never heard anyone saying he prefers the standard IDLE - on the contrary, many are asking how they can use the improved IDLE in their homes. 3. Kurt agreed to integrate the change - he just didn't do it. Since we would like to share our work with the rest of the world, if nothing is changed we would start a new IDLE fork soon, perhaps at python-hosting.com. I have no problem with this. You might be able to save yourself some maintenance work by structuring your version as a set of subclasses rather than a set of patches (even if you distribute it as a complete working program). Many people have needs that aren't met by standard Python; they write their own modules or extensions and distribute these independently from Python; your case probably isn't all that different. I think that rewriting the patches as subclasses will be a lot of work, and won't be a very good idea - if you change one line in a function, copy-pasting it to a subclass and changing the line seems a little weird for me - not to mention the cases where some refactoring needs to be done. I think we will be talking about a separate package - say, idleforklib instead of idlelib. You can always run diff to find the differences between the two packages. Often the needs of certain user groups and the development speeds of such 3rd party modules are so different that it simply doesn't make sense to fold them in the Python distribution anyway -- consider what you would have to do if Kurt accepted your patches: you'll still have to wait until Python 2.5 is released before others can benefit from your changes, and if you come up with an improvement after that release, your next chance will be 18 months later... I don't think so - if IDLE is developed on the Python CVS, we can still distribute a stand-alone package with IDLE from the CVS head, for eager people. All others will get the changes a year later, which isn't that bad. Perhaps it can even be less than a year - since IDLE is a GUI application and not a library, so there isn't a lot of backward compatibility to maintain, it seems to me that updated versions can be shipped also with new minor versions of Python. The advantages of developing IDLE on the Python CVS are that there is no need to synchronize two versions, and a wider audience. Of course, after you see the improved IDLE you will surely decide to immediately import it into the Python CVS, so there's not much of a problem... :) Noam ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Split MIME headers into multiple lines near a space
On 5/30/05, Nick Coghlan [EMAIL PROTECTED] wrote: Noam's suggestion seems reasonable to me, but I'm not sure what the performance implications are. I think that they are not critical. The number of lines can grow by at most twice, because shorter words would not have a line of their own. The added rfind call seems not very significant to me, since it is preceded by about log2n string encodings, to test if an encoded prefix fits the required line length. Have a good day, Noam ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Getting rid of unbound methods: patch available
Hello, I would like to add here another small thing which I encountered this week, and seems to follow the same logic as does Guido's proposal. It's about staticmethods. I was writing a class, and its pretty-printing method got a function for converting a value to a string as an argument. I wanted to supply a default function. I thought that it should be in the namespace of the class, since its main use lies there. So I made it a staticmethod. But - alas! After I declared the function a staticmethod, I couldn't make it a default argument for the method, since there's nothing to do with staticmethod instances. The minor solution for this is to make staticmethod objects callable. This would solve my problem. But I suggest a further step: I suggest that if this is done, it would be nice if classname.staticmethodname would return the classmethod instance, instead of the function itself. I know that this things seems to contradict Guido's proposal, since he suggests to return the function instead of a strange object, and I suggest to return a strange object instead of a function. But this is not true; Both are according to the idea that class attributes should be, when possible, the same objects that were created when defining the class. This is more consistent with the behaviour of modules (module attributes are the objects that were created when the code was run), and is more consistent with the general convention, that running A = B causes A == B to be true. Currently, Class.func = staticmethod(func), and Class.func = func, don't behave by this rule. If the suggestions are accepted, both will. I just think it's simpler and cleaner that way. Just making staticmethods callable would solve my practical problem too. Noam Raphael ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Getting rid of unbound methods: patch available
and is more consistent with the general convention, that running A = B causes A == B to be true. Currently, Class.func = staticmethod(func), and Class.func = func, don't behave by this rule. If the suggestions are accepted, both will. Well, given that attribute assignment can be overloaded, you can't depend on that requirement all the time. Yes, I know. For example, I don't know how you can make this work for classmethods. (although I have the idea that if nested scopes were including classes, and there was a way to assign names to a different scope, then there would be no need for them. But I have no idea how this can be done, so never mind.) I just think of it as a very common convention, and I don't find the exceptions aesthetically pleasing. But of course, I accept practical reasons for not making it that way. I recommend that you work around it by setting the default to None and substituting the real default in the function. That's a good idea, I will probably use it. (I thought of a different way: don't use decorators, and wrap the function in a staticmethod after defining the function that uses it. But this is really ugly.) Thanks for your reply, Noam ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] An import hook which does nothing
Hello, I'm currently writing an import hook which will cache information on the local disk, instead of retrieving it every time from the slow NFS where I work. To make sure that I understand how import hooks work, and as a starting point, I wrote a dummy import hook, which is supposed to behave like the python's built-in import mechanism. I post it here, for two reasons: 1. So that developers who understand the import mechanism better than I do may review it and find things which I didn't do exactly right. 2. Because I think other people might find it helpful when writing new import hooks, as a starting point as well as for better understanding -- there's nothing like a working example to help you settle up on what does which where. (Although perhaps a few more comments, in addition to those which I wrote, might help...) (Note: I wrote the DEBUG parts in order to make sure that my importer works, because if it fails things might be done by the built-in importer and I won't notice. These parts can be removed, of course.) Do you think that it might be useful? Maybe something like that can go into the examples section of the imp module? Thanks, Noam Raphael import imp import sys DEBUG = True if DEBUG: myimports = [] class InPackageFinder(object): Find a module/package in a package. def find_module(self, fullname, path=None): if path is None: # Not in a package - don't handle it here. return None try: f, fn, desc = imp.find_module(fullname[fullname.rfind('.')+1:], path) except ImportError: # Well, I don't find it, maybe others will. return None return Loader(f, fn, desc) class TopLevelFinder(object): Find a top level module/package. def __init__(self, path): self._path = path def find_module(self, fullname): try: f, fn, desc = imp.find_module(fullname, [self._path]) except ImportError: # It is not in this path. Maybe in another one. return None return Loader(f, fn, desc) class Loader(object): Load a module/package. def __init__(self, f, fn, desc): self._f, self._fn, self._desc = f, fn, desc def load_module(self, fullname): if DEBUG: myimports.append(fullname) try: return imp.load_module(fullname, self._f, self._fn, self._desc) finally: if self._f is not None: # For packages we have None instead of a file object. self._f.close() def install(): sys.meta_path.append(InPackageFinder()) sys.path_hooks.append(TopLevelFinder) sys.path_importer_cache.clear() if DEBUG: myimports.extend(sys.modules.iterkeys()) if DEBUG: def checkok(): return not [x for x in sys.modules if sys.modules[x] is not None and hasattr(sys.modules[x], __file__) and x not in myimports] ___ Python-Dev mailing list [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com