Re: [Python-Dev] Can LOAD_GLOBAL be optimized to a simple array lookup?

2006-08-23 Thread Noam Raphael
2006/8/24, Brett Cannon [EMAIL PROTECTED]:



 On 8/23/06, K.S.Sreeram [EMAIL PROTECTED] wrote:
  Hi all,
 
  I noticed in Python/ceval.c that LOAD_GLOBAL uses a dictionary lookup,
  and was wondering if that can be optimized to a simple array lookup.


 No, not as the language stands now.

  If I'm right there are 3 kinds of name lookups: locals, outer
  scopes(closures), and globals. (not counting attribute lookup). Locals
  are identified by, either the presence of assignments, or their presence
  in the arg list. So all name lookups can be classified into the 3 types
  at compile/load time.
 
  Since we know, at load time, which names are global.. Can't we simply
  build a global name table and replace LOAD_GLOBALs with a lookup at the
  corresponding index into the global name table?


 But we don't know statically what the globals will be.  You can import a
 module and put something in its global namespace externally.  That is done
 after load time or compile time.

I think that it can be implemented for the language as it stands now.
I don't know whether it will be a good thing or not.

In principle, you could add a feature to the dict implementation that
would allow it to notify observers when the value of a specific key
changes. If you have that, you can change the LOAD_GLOBAL
implementation to:
1. look up the global.
2. ask for notification when the global dict changes in a way that
would change the meaning of the global.
3. change the LOAD_GLOBAL opcode to something like LOAD_CONST, and set
the notification from the dict to update the LOAD_CONST opcode to
point to the new object.

In that way, LOAD_GLOBAL will cause a dict lookup only once. Changing
the value of globals will require more work, though.
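
To illustrate the mechanism, here is a rough pure-Python sketch of the
notification protocol this would need. The WatchedDict class and its
watch() method are hypothetical - CPython's dict has no such hook, and
the real change would have to be made in C:

class WatchedDict(dict):
    # A dict that runs callbacks when a watched key is reassigned.
    # Purely illustrative.
    def __init__(self, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)
        self._watchers = {}  # key -> list of callbacks

    def watch(self, key, callback):
        self._watchers.setdefault(key, []).append(callback)

    def __setitem__(self, key, value):
        dict.__setitem__(self, key, value)
        for callback in self._watchers.get(key, ()):
            callback(value)

# A "compiled" global reference could then cache the looked-up value
# and let the dict push updates into the cache:
g = WatchedDict(x=1)
cache = [g['x']]  # the one-time dict lookup
g.watch('x', lambda value: cache.__setitem__(0, value))
g['x'] = 2
assert cache[0] == 2  # the cached value was updated, with no new lookup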

Again, I'm not saying that it's desired, just that it's possible.

Have a good day,
Noam


[Python-Dev] ANN: byteplay - a bytecode assembler/disassembler

2006-08-14 Thread Noam Raphael
Hello,

I wanted to tell you that I wrote a Python bytecode
assembler/disassembler, and would be happy if people tried it and said
what they think.

I'm sending this message to this list because this module deals with
pretty low-level Python, so I thought it might interest the audience
here. If I was wrong - please forgive me.

The idea is to define an object which is equivalent to Python's code
object, but which is easy to work with. To explain what I mean, I'll
show a quick example. We can define this stupid function:

>>> def f(a, b):
...     print (a, b)

We can convert it to an equivalent object, and see how it stores the byte code:

>>> from byteplay import *
>>> c = Code.from_code(f.func_code)
>>> from pprint import pprint; pprint(c.code)
[(SetLineno, 2),
 (LOAD_FAST, 'a'),
 (LOAD_FAST, 'b'),
 (BUILD_TUPLE, 2),
 (PRINT_ITEM, None),
 (PRINT_NEWLINE, None),
 (LOAD_CONST, None),
 (RETURN_VALUE, None)]

We can change the bytecode easily, and see what happens:

>>> c.code[3:3] = [(ROT_TWO, None)]
>>> f.func_code = c.to_code()
>>> f(3, 5)
(5, 3)

The idea is basically the same as that of Michael Hudson's
bytecodehacks, but this one works with Python 2.4 and 2.5. I also
think that it's simpler to use. I borrowed some code from Phillip J.
Eby's peak.util.assembler - the main difference between his package
and mine is that mine lets you play with existing bytecode, not only
create new code objects.

I learned a lot about Python's bytecode from writing this, and I think
that others may learn from it as well - I think it's much easier to
understand how code objects work by understanding equivalent objects
which were meant to be as simple as possible, instead of as fast as
possible.

I think it got pretty good testing - I patched __import__ so that
after a module is imported, all function objects (found by the gc
module) were disassembled and assembled again. I then managed to get
the complete test suite to pass!
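
For the curious, that round-trip test might look roughly like this (a
hypothetical reconstruction - the author's actual test harness isn't
shown):

import gc
import types
from byteplay import Code

def roundtrip_all_functions():
    # Disassemble and reassemble the bytecode of every function the
    # garbage collector can see; running the test suite afterwards
    # checks that nothing was lost in translation.
    for obj in gc.get_objects():
        if isinstance(obj, types.FunctionType):
            obj.func_code = Code.from_code(obj.func_code).to_code()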

You can download the module from
You can download the module from
http://byteplay.googlecode.com/svn/trunk/byteplay.py . I wrote
documentation (which goes to some length to explain how bytecode
works). It's on the wiki: http://wiki.python.org/moin/ByteplayDoc .

I even thought that it might make it into the standard library,
because it seemed to me to be pretty good documentation of bytecode
details, and because it has a pretty straightforward interface. It also makes
meddling with bytecode much less dangerous - for example, you can see
Raymond's original recipe for binding constants at compile time (I
reposted it at http://python.pastebin.com/768312) and how using
byteplay makes it simple (posted at http://python.pastebin.com/768318
so you can view the diff). But, of course - it's up to you. I will be
entirely satisfied if people simply find it useful.

Have a good day,
Noam


Re: [Python-Dev] Empty Subscript PEP on Wiki - keep or toss?

2006-07-01 Thread Noam Raphael
Hello,

I posted it as a pre-PEP, in the hope that it would become a PEP and
be accepted. As it happened, Guido said "no" at the end, so I stopped
pushing the subject.

I think that the main reason for the "no" was that my use case wasn't
convincing enough - the objections were that this wasn't useful
enough, not that it does anything harmful*. As the one who does think
it's useful, I have the tiniest hope that if, in the future, people
become familiar with the package and see the usefulness of allowing an
empty subscript list, the decision will change. If the only result of
me posting it as a PEP is a final "rejected" status that will prevent
any chance of that happening, I don't think I'll bother to make it a
PEP. If that's not the case, then I'll make it a PEP and post it.

Have a good week,
Noam

* Yes, I know that adding an unneeded feature to the language can be
considered harmful. You may not agree with my distinction in this
case. As it is, I barely consider this an added feature - I would say
it's mostly a small generalization.

2006/6/30, Georg Brandl [EMAIL PROTECTED]:
 [EMAIL PROTECTED] wrote:
  Noam Raphael posted an empty subscript PEP on the Python Wiki:
 
  http://wiki.python.org/moin/EmptySubscriptListPEP
 
  It's not linked to by any other pages on the wiki.  Is there a reason it
  wasn't added to the peps repository?

 Perhaps the author forgot to submit it to the PEP editor, or he decided
 to abandon it after the mostly negative discussion here.

 Georg



Re: [Python-Dev] Pre-PEP: Allow Empty Subscript List Without Parentheses

2006-06-18 Thread Noam Raphael
2006/6/18, Shane Hathaway [EMAIL PROTECTED]:
 Try to think more about how users will use your API.  You haven't
 specified where those names (sheet1, income_tax, and profit) are coming
 from.  What do you expect users of your library to do to bring those
 names into their namespace?

That's a good question. I'm going to do some bytecode hacks! Something
like this:

from spreadsheetlib import SourceCube, CalculatedCube
income_tax = SourceCube([])
income_tax[] = 0.18
years = set([2003, 2004, 2005])
profit = SourceCube([years])
profit[2003] = 1000; profit[2004] = 2000; profit[2005] = 2500
real_profit = CalculatedCube([years], lambda year: profit[year] / (1+
income_tax[]))
print real_profit[2004]
(1694.9152542372883)

It may be what Talin meant about a higher level language, but I
don't really change the language - I only inspect the function to see
on what other changeable objects it depends. Those changeable objects
implement some sort of change notification protocol, and it allows the
system to automatically recalculate the result when one of the values
it depends on changes.

(Actually, I intend to change the function to depend directly on the
changeable object instead of looking it up every time in the global
namespace, but I don't think that it changes the explanation.)
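
As a rough sketch of the kind of introspection involved (not the
author's actual code - co_names is an over-approximation, since it
also lists attribute names):

def changeable_dependencies(func, is_changeable):
    # co_names lists the global and attribute names the code refers
    # to; keep those bound, in the function's globals, to objects that
    # implement the change-notification protocol.
    return [name for name in func.func_code.co_names
            if is_changeable(func.func_globals.get(name))]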

Note that requiring all changeable objects to be attributes of some
other object won't remove the need for bytecode hacking: the only
alternative is to explicitly specify a list of all the objects that
the function depends on, and then provide a function that takes these
as arguments. That would really be inconvenient.

But thanks for the suggestion!

Noam


Re: [Python-Dev] Pre-PEP: Allow Empty Subscript List Without Parentheses

2006-06-17 Thread Noam Raphael
Hello,

2006/6/16, Josiah Carlson [EMAIL PROTECTED]:
 I'm not a mathematician, and I don't really work with arrays of any
 dimensionality, so the need for 0-D subscripting via arr[] while being
 cute, isn't compelling to my uses for Python.

Thanks for appreciating its cuteness...

 Now, I appreciate the desire to reduce code length and complexity, but
 from what I understand, the ultimate result of such a change to your
 code would be to go from:
 arr[()]
 to:
 arr[]

 I don't see how this can reduce lines of code in implementation or use.
 At most it is a two characters per use, and a change in documentation
 (specifying how you subscript 0-D arrays).  If you can show an example
 where actual code line count is reduced with this change, I can't
 guarantee that I would get behind this proposal in a few months (if the
 conversation starts up again), but it may make me feel less that your
 proposal is essentially about aesthetics.

I meant the extra code for writing a special class to handle scalars:
if I decide that the "x[()]" syntax is too ugly or too hard to type,
I would write a special class which allows the syntax "x.value".

The extra parentheses might not seem to matter for code using that
library, but I intend for people to use it directly, in an interactive
way, just like you type an expression in a spreadsheet. I expect that
for such a usage, the extra parentheses will be slightly unfun.

I know that it's not such a big difference, but I'm not talking about
a big change to the language either - it affects less than 20 lines of
code (probably could be done with even less), and doesn't cause any
problems with anything.

I can imagine Guido designing the grammar, thinking, "Should I allow
an empty subscript list? No, why should anyone want such a thing?
Besides, if someone wants it, we can always add it later." - at
least, that may be how I would think if I designed a language. So now,
a use has been found. Indeed, it is fairly rare. But why not allow it now?

Have a good week,
Noam


Re: [Python-Dev] Pre-PEP: Allow Empty Subscript List Without Parentheses

2006-06-17 Thread Noam Raphael
2006/6/17, Martin v. Löwis [EMAIL PROTECTED]:
 Noam Raphael wrote:
  I meant the extra code for writing a special class to handle scalars,
  if I decide that the x[()] syntax is too ugly or too hard to type,
  so I write a special class which will allow the syntax x.value.

 What I cannot understand is why you use a zero-dimensional array to
 represent a scalar. Scalars are directly supported in Python:

 x = 5

I need a zero-dimensional array as a single cell - an object that
holds a value that can change over time. It works just like a cell in
a spreadsheet: For example, say that if you change the value of cell
A1 to 0.18, cell A2 changes to 5. When using the library I design, you
would write sheet1[0, 0] = 0.18, and, magically, sheet1[0, 1] will
become 5. But in my library, everything is meaningful and doesn't have
to be two-dimensional. So, if in the spreadsheet example, A1 meant the
income tax rate, you would write income_tax[] = 0.18, and,
magically, profit['Jerusalem', 2005] will become 5.

I hope I managed to explain myself - my use case and why the simplest
way to treat scalars like income_tax is as zero-dimensional arrays.
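
A minimal sketch of such a cell, written for the syntax available
today (the class and its notification hook are hypothetical, not the
author's library):

class Cell(object):
    # A zero-dimensional "array": one value, plus change notification.
    def __init__(self, value=None):
        self._value = value
        self._on_change = []  # callbacks that recalculate dependents

    def __getitem__(self, key):
        assert key == ()  # cell[()] today; cell[] under the proposal
        return self._value

    def __setitem__(self, key, value):
        assert key == ()
        self._value = value
        for callback in self._on_change:
            callback()

income_tax = Cell(0.18)
income_tax[()] = 0.17  # would be spelled income_tax[] = 0.17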

 Also, in an assignment, what are you putting on the right-hand side?
 A read access from another zero-dimensional array?

I hope my example explained that, but you can put there any object -
for example, you can write income_tax[] = 0.18

(If I didn't yet manage to explain myself, please say so - it seems
that it's not a very simple example and I'm not a very good explainer,
at least in English.)

Noam


Re: [Python-Dev] Pre-PEP: Allow Empty Subscript List Without Parentheses

2006-06-17 Thread Noam Raphael
Hi, sorry for my repeated posts. I just wanted to say that I improved
my patch a little bit, so it does exactly the same thing, but with
smaller code: you can see for yourself at
http://python.pastebin.com/715221 - it changes exactly 10 lines of
code and adds 8 more, all of them really short and obvious.

I thought that it might convince someone that it's just a little
generalization of syntax, nothing frightening...

Noam

2006/6/17, Noam Raphael [EMAIL PROTECTED]:
 I know that it's not such a big difference, but I'm not talking about
 a big change to the language either - it affects less than 20 lines of
 code (probably could be done with even less), and doesn't cause any
 problems with anything.


Re: [Python-Dev] Pre-PEP: Allow Empty Subscript List Without Parentheses

2006-06-16 Thread Noam Raphael
Hello,

It seems to me that people don't object to my proposal, but don't find
it useful to them either.

The question is, what to do next. I guess one possibility is to raise
this discussion again in a few months, when people will be less
occupied with 2.5 beta. This is ok, although I would prefer a decision
before that, because it might affect the design of the library -
should I find a permanent workaround, or one that I know will be
removed in the future?

If you do want to continue the discussion to reach a decision, please
do. You can say that if nobody else on python-dev is interested, it
shouldn't be implemented. You can examine my use case, say if you
think it's reasonable, and suggest alternative solutions - or say
that you see how allowing an empty subscript list solves it elegantly
(yes!).

My point is, I don't want this discussion to naturally die because
nobody is interested, since I am interested. So please say what you
think should happen to it, so we can reach a conclusion.

Now, if the discussion is to continue, Nick proposed an alternative:

2006/6/11, Nick Coghlan [EMAIL PROTECTED]:
 For your specific use cases, though, I'd be inclined to tweak the API a bit,
 and switch to using attributes for the single-valued data:

 tax_rates.income_tax = 0.18

It's probably ok, although I would prefer not having to artificially
group scalars just to make them attributes of something. I would
prefer remaining with one object, and having something like
income_tax.setvalue(), or even income_tax.value.

 Although the income tax rate should actually depend on the current financial
 year, since it can change over time as the government increases taxes ;)

But that's exactly why I prefer writing simply income_tax[] = 0.18
when it's a constant, which is completely analogous to
income_tax[2005] = 0.17; income_tax[2006] = 0.18 when it depends on
something.

By the way, another note about consistency: a friend of mine raised
the point that there is no other example of forbidden empty brackets
- [], {}, (), x() are all allowed.

And about the other thing Nick said:
 I guess I'm really only -0 on the idea of x[] invoking x.__getitem__(), and
 allowing the class to decide whether or not to define a default value for the
 subscript. I wouldn't implement it myself, but I wouldn't object strenuously
 if Guido decided it was OK :)

I would prefer an empty tuple, since invoking __getitem__ with no
arguments would be a special case: for all other possible subscript
lists, exactly one argument is passed to __getitem__. This leaves us
with one special case: a subscript list with one item and without a
trailing comma results in __getitem__ not getting a tuple, where in
all other cases it does get a tuple. This works exactly like
parentheses: they don't mean a tuple only when there's one item inside
them and no trailing comma.
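
The rule is easy to check interactively; everything here runs today
except the last, proposed case:

class Show(object):
    def __getitem__(self, key):
        return key

s = Show()
assert s[1] == 1          # one subscript, no trailing comma: not a tuple
assert s[1,] == (1,)      # trailing comma: a 1-tuple
assert s[1, 2] == (1, 2)  # two subscripts: a tuple
assert s[()] == ()        # today's spelling of the empty case
# Under the proposal, s[] would likewise pass () to __getitem__.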

Good bye,
Noam


Re: [Python-Dev] Pre-PEP: Allow Empty Subscript List Without Parentheses

2006-06-10 Thread Noam Raphael
Hello,

I'll try to answer the questions in one message. Sorry for not being
able to do it until now.

About the joke - it isn't, I really need it.

About the timing - Of course, I can live with this getting into 2.6,
and I think that I may even be able to stay alive if this were
rejected. I still personally think that if people agree that it's a
good idea it might get in, since there's almost nothing to be decided
except for that - but of course, I can understand not wanting to rush
things too much.

About whether NumPy should return real scalars or 0-dimensional arrays
- I don't know.

About the use case - I think that's the real thing I didn't explain
well and needs explanation, so I will try to do it better this time.

I'm talking about something similar to a spreadsheet in that it saves
data, calculation results, and the way to produce the results.
However, it is not similar to a spreadsheet in that the data isn't
saved in an infinite two-dimensional array with numerical indices.
Instead, the data is saved in a few tables, each storing a different
kind of data. The tables may be with any desired number of dimensions,
and are indexed by meaningful indices, instead of by natural numbers.

For example, you may have a table called sales_data. It will store the
sales data in years from set([2003, 2004, 2005]), for car models from
set(['Subaru', 'Toyota', 'Ford']), for cities from set(['Jerusalem',
'Tel Aviv', 'Haifa']). To refer to the sales of Ford in Haifa in 2004,
you will simply write: sales_data[2004, 'Ford', 'Haifa']. If the table
is a source of data (that is, not calculated), you will be able to set
values by writing: sales_data[2004, 'Ford', 'Haifa'] = 1500.

Tables may be computed tables. For example, you may have a table which
holds for each year the total sales in that year, with the income tax
subtracted. It may be defined by a function like this:

lambda year: sum(sales_data[year, model, city] for model in models for
city in cities) / (1 + income_tax_rate)

Now, like in a spreadsheet, the function is kept, so that if you
change the data, the result will be automatically recalculated. So, if
you discovered a mistake in your data, you will be able to write:

sales_data[2004, 'Ford', 'Haifa'] = 2000

and total_sales[2004] will be automatically recalculated.

Now, note that the total_sales table depends also on the
income_tax_rate. This is a variable, just like sales_data. Unlike
sales_data, it's a single value. We should be able to change it, with
the result of all the cells of the total_sales table recalculated. But
how will we do it? We can write

income_tax_rate = 0.18

but it will have a completely different meaning. The way to make the
income_tax_rate changeable is to think of it as a 0-dimensional table.
It makes sense: sales_data depends on 3 parameters (year, model,
city), total_sales depends on 1 parameter (year), and income_tax_rate
depends on 0 parameters. That's the only difference. So, thinking of
it like this, we will simply write:

income_tax_rate[] = 0.18

Now the system can know that the income tax rate has changed, and
recalculate what's needed. We will also have to change the previous
function a tiny bit, to:

lambda year: sum(sales_data[year, model, city] for model in models for
city in cities) / (1 + income_tax_rate[])

But it's fine - it just makes it clearer that income_tax_rate[] is a
part of the model that may change its value.
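
To make the description concrete, here is a toy sketch of such a model
(hypothetical classes, not the author's library - a real system would
cache results and recalculate them on change notifications):

class SourceTable(object):
    def __init__(self):
        self.data = {}
    def __getitem__(self, key):
        return self.data[key]
    def __setitem__(self, key, value):
        self.data[key] = value

class ComputedTable(object):
    def __init__(self, func):
        self.func = func
    def __getitem__(self, key):
        return self.func(key)  # recomputed on demand in this toy version

sales_data = SourceTable()       # indexed by 3 parameters
income_tax_rate = SourceTable()  # indexed by 0 parameters
sales_data[2004, 'Ford', 'Haifa'] = 1500
income_tax_rate[()] = 0.18       # income_tax_rate[] = 0.18 if accepted
total_sales = ComputedTable(lambda year:
    sales_data[year, 'Ford', 'Haifa'] / (1 + income_tax_rate[()]))
print total_sales[2004]          # 1271.18...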

I hope that I managed to explain the use case better this time -
please ask if my description isn't clear enough.

Thanks for your comments, please send more,
Noam


Re: [Python-Dev] Pre-PEP: Allow Empty Subscript List Without Parentheses

2006-06-10 Thread Noam Raphael
Hello,

2006/6/10, Nick Coghlan [EMAIL PROTECTED]:
 The closest parallel would be with return/yield, as those actually create real
 tuples the same way subscripts do, and allow the expression to be omitted
 entirely.

 By that parallel, however, an implicit subscript (if adopted) should be None
 rather than ().

 Adapting the table from the pre-PEP to describe return statements (and yield
 expressions):

  return i, j, k  <-->  return (i, j, k)
  return i, j     <-->  return (i, j)
  return i,       <-->  return (i, )
  return i        <-->  return (i)
                        return ()    # (No implicit equivalent)
  return          <-->  return None

 With the status quo, however, subscripts are simply equivalent to the RHS of
 an assignment statement in *requiring* that the expression be non-empty:

  x = i, j, k  <-->  x = (i, j, k)
  x = i, j     <-->  x = (i, j)
  x = i,       <-->  x = (i, )
  x = i        <-->  x = (i)
                     x = ()      # (No implicit equivalent)
                     x = None    # (No implicit equivalent)

 The PEP doesn't make a sufficiently compelling case for introducing
 yet-another-variant on the implicit behaviour invoked when a particular
 subexpression is missing from a construct.

I hope that my (hopefully) better explanation made the use case more
compelling, but I want to add two points in favour of an empty tuple:

1. If you want, you can ignore the x[(i, j, k)] equivalence
completely, since it doesn't work all the time - for example, you can
write x[1:2, 3:4], but you can't write x[(1:2, 3:4)]. You can
think of x[i, j, k] as a syntax for specifying a cell in a
3-dimensional array, resulting in a call to x.__getitem__ with a
3-tuple describing the subscript for each dimension. In that view,
x[], which is a syntax for specifying the cell of a 0-dimensional
array, should result in a __getitem__ call with an empty tuple, as
there are no subscripts to be described.

2. My equivalencies are better than yours :-), since they are dealing
with equivalencies for this specific syntax, while yours are dealing
with similar properties of a syntax for doing something completely
different.

 I guess I could have gone with my initial instinct of -1 and saved myself some
 mental exercise ;)

Why? Mental exercise is a good way to keep you mental ;)

Noam


[Python-Dev] Pre-PEP: Allow Empty Subscript List Without Parentheses

2006-06-09 Thread Noam Raphael
Hello,

Recently I discovered a small change to the Python grammar that could
help me a lot.

It's simply this: currently, the expression x[] is a syntax error. I
suggest that it become valid syntax, equivalent to x[()], just as
x[a, b] is equivalent to x[(a, b)] right now.

I discussed this in python-list, and Fredrik Lundh suggested that I
quickly write a pre-PEP if I want this to go into 2.5. Since I want
this, I wrote a pre-PEP.

It's available in the wiki, at
http://wiki.python.org/moin/EmptySubscriptListPEP and I also copied it
to this message.

I know that now is really close to 2.5b1, but I thought that perhaps
there was still a chance for this suggestion getting in, since:
 * It's a simple change and there's almost nothing to be decided
except whether to accept it or not.
 * It has a simple implementation (It was fairly easy for me to
implement it, and I know almost nothing about the AST).
 * It causes no backwards compatibility issues.

Ok, here's the pre-PEP. Please say what you think about it.

Have a good day,
Noam


PEP: XXX
Title: Allow Empty Subscript List Without Parentheses
Version: $Revision$
Last-Modified: $Date$
Author: Noam Raphael [EMAIL PROTECTED]
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 09-Jun-2006
Python-Version: 2.5?
Post-History: 30-Aug-2002

Abstract


This PEP suggests allowing the use of an empty subscript list, for
example ``x[]``, which is currently a syntax error. It is suggested
that in such a case, an empty tuple will be passed as an argument to
the __getitem__ and __setitem__ methods. This is consistent with the
current behaviour of passing a tuple with n elements to those methods
when a subscript list of length n is used, if it includes a comma.


Specification
=

The Python grammar specifies that inside the square brackets trailing
an expression, a list of subscripts, separated by commas, should be
given. If the list consists of a single subscript without a trailing
comma, a single object (an ellipsis, a slice or any other object) is
passed to the resulting __getitem__ or __setitem__ call. If the list
consists of many subscripts, or of a single subscript with a trailing
comma, a tuple is passed to the resulting __getitem__ or __setitem__
call, with an item for each subscript.

Here is the formal definition of the grammar:

::
trailer: '(' [arglist] ')' | '[' subscriptlist ']' | '.' NAME
subscriptlist: subscript (',' subscript)* [',']
subscript: '.' '.' '.' | test | [test] ':' [test] [sliceop]
sliceop: ':' [test]

This PEP suggests allowing an empty subscript list, with nothing
inside the square brackets. It will result in passing an empty tuple
to the resulting __getitem__ or __setitem__ call.

The change in the grammar is to make subscriptlist in the first
quoted line optional:

::
trailer: '(' [arglist] ')' | '[' [subscriptlist] ']' | '.' NAME


Motivation
==

This suggestion allows you to refer to zero-dimensional arrays elegantly. In
NumPy, you can have arrays with a different number of dimensions. In
order to refer to a value in a two-dimensional array, you write
``a[i, j]``. In order to refer to a value in a one-dimensional array,
you write ``a[i]``. You can also have a zero-dimensional array, which
holds a single value (a scalar). To refer to its value, you currently
need to write ``a[()]``, which is unexpected - the user may not even
know that when he writes ``a[i, j]`` he constructs a tuple, so he
won't guess the ``a[()]`` syntax. If the suggestion is accepted, the
user will be able to write ``a[]`` in order to refer to the value, as
expected. It will even work without changing the NumPy package at all!
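
For illustration (not part of the PEP text), this is how a
zero-dimensional array behaves in NumPy today:

import numpy

a = numpy.array(5)  # a zero-dimensional array holding a scalar
print a.ndim        # 0
print a[()]         # 5 - the current, surprising spelling
a[()] = 7           # assignment uses the same empty-tuple subscript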

In the normal use of NumPy, you usually don't encounter
zero-dimensional arrays. However, the author of this PEP is designing
another library for managing multi-dimensional arrays of data. Its
purpose is similar to that of a spreadsheet - to analyze data and
preserve the relations between a source of a calculation and its
destination. In such an environment you may have many
multi-dimensional arrays - for example, the sales of several products
over several time periods. But you may also have several
zero-dimensional arrays, that is, single values - for example, the
income tax rate. It is desired that the access to the zero-dimensional
arrays will be consistent with the access to the multi-dimensional
arrays. Just using the name of the zero-dimensional array to obtain
its value isn't going to work - the array and the value it contains
have to be distinguished.


Rationale
=

Passing an empty tuple to the __getitem__ or __setitem__ call was
chosen because it is consistent with passing a tuple of n elements
when a subscript list of n elements is used. Also, it will make NumPy
and similar packages work as expected for zero-dimensional arrays
without any changes.

Another hint for consistency: Currently, these equivalences hold:

::
x[i, j, k]  <-->  x[(i, j, k)]

Re: [Python-Dev] Alternative path suggestion

2006-05-07 Thread Noam Raphael
Hello all again!

Thanks to Mike's suggestion, I have now opened a new wiki page,
AlternativePathDiscussion, at
http://wiki.python.org/moin/AlternativePathDiscussion

The idea is to organize the discussion by dividing it into multiple
sections, and seeing what is agreed and what should be further
discussed. I now wrote there what I think, and quoted a few opinions
from other posts. The posts by others are only a minority - what's
written there currently is mostly what I think. I'm sorry for the
inconvenience, but please go there and post your opinions (you can
copy and paste from your emails, of course).

I apologize first for not replying to each post, and second for only
writing my opinions in the wiki. I simply write pretty slowly in
English, and am already developing a growing sleep-hours deficit. I
hope you forgive me.

Have a good day,
Noam


Re: [Python-Dev] Saving the hash value of tuples

2006-04-02 Thread Noam Raphael
On 4/2/06, Guido van Rossum [EMAIL PROTECTED] wrote:
  I tried the change, and it turned out that I had to change cPickle a
  tiny bit: it uses a 2-tuple which is allocated when the module
  initializes to lookup tuples in a dict. I changed it to properly use
  PyTuple_New and Py_DECREF, and now the complete test suite passes. I
  ran test_cpickle before the change and after it, and it took the same
  time (0.89 seconds on my computer).

 Not just cPickle. I believe enumerate() also reuses a tuple.

Maybe it does, but I believe that it doesn't calculate its hash value
- otherwise, the test suite would probably have failed.

  What do you think? I see three possibilities:
1. Nothing should be done, everything is as it should be.
2. The cPickle module should be changed to not abuse the tuple, but
  there's no reason to add an extra word to the tuple structure and
  break binary backwards compatibility.
3. Both should be changed.

 I'm -1 on the change. Tuples are pretty fundamental in Python and
 hashing them is relatively rare. I think the extra required space for
 all tuples isn't worth the potential savings for some cases.

That's fine with me. But what about option 2? Perhaps cPickle (and
maybe enumerate) should properly discard their tuples, so that if
someone in the future decides that saving the hash value is a good
idea, he won't encounter strange bugs? At least in cPickle I didn't
notice any loss of speed because of the change, and it's quite
sensible, since there's a tuple-reuse mechanism anyway.

Noam


[Python-Dev] Saving the hash value of tuples

2006-04-01 Thread Noam Raphael
Hello,

I've found out that the hash value of tuples isn't saved after it's
calculated. With strings it's different: the hash value of a string is
calculated only on the first call to hash(string), and saved in the
structure for future use. Saving the value makes dict lookup of tuples
an operation with an amortized cost of O(1).

Saving the hash value means that if an item of the tuple changes its
hash value, the hash value of the tuple won't be changed. I think it's
ok, since:
  1. Hash values of objects shouldn't change.
  2. Dicts assume that too.
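
In pure Python, the caching pattern being proposed looks roughly like
this (an illustration of the idea only - the real change would be an
extra field in the C-level tuple structure, as strings already have):

class HashCachingTuple(object):
    def __init__(self, items):
        self._items = tuple(items)
        self._hash = None  # not yet computed

    def __hash__(self):
        if self._hash is None:  # computed once, on first use
            self._hash = hash(self._items)
        return self._hash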

I tried the change, and it turned out that I had to change cPickle a
tiny bit: it uses a 2-tuple which is allocated when the module
initializes to lookup tuples in a dict. I changed it to properly use
PyTuple_New and Py_DECREF, and now the complete test suite passes. I
ran test_cpickle before the change and after it, and it took the same
time (0.89 seconds on my computer).

What do you think? I see three possibilities:
  1. Nothing should be done, everything is as it should be.
  2. The cPickle module should be changed to not abuse the tuple, but
there's no reason to add an extra word to the tuple structure and
break binary backwards compatibility.
  3. Both should be changed.

I will be happy to send a patch, if someone shows interest.

Have a good day,
Noam


Re: [Python-Dev] Saving the hash value of tuples

2006-04-01 Thread Noam Raphael
Ok, I uploaded it.
Patch no. 1462796:
https://sourceforge.net/tracker/index.php?func=detail&aid=1462796&group_id=5470&atid=305470


On 4/1/06, Aahz [EMAIL PROTECTED] wrote:
 On Sat, Apr 01, 2006, Noam Raphael wrote:
 
  I've found out that the hash value of tuples isn't saved after it's
  calculated. With strings it's different: the hash value of a string is
  calculated only on the first call to hash(string), and saved in the
  structure for future use. Saving the value makes dict lookup of tuples
  an operation with an amortized cost of O(1).
   [...]
  I will be happy to send a patch, if someone shows interest.

 Regardless of whether anyone shows interest, please submit a patch!  Then
 post the URL back here.  That way if someone gets interested in the
 future, your code is still available.
 --
 Aahz ([EMAIL PROTECTED])   * http://www.pythoncraft.com/

 Look, it's your affair if you want to play with five people, but don't
 go calling it doubles.  --John Cleese anticipates Usenet



Re: [Python-Dev] PEP 351

2006-02-11 Thread Noam Raphael
Hello,

I just wanted to say this: you can reject PEP 351, please don't reject
the idea of frozen objects completely. I'm working on an idea similar
to that of the PEP, and I think that it can be done elegantly, without
the concrete problems that Raymond pointed out. I didn't work on it in the
last few weeks, because of my job, but I hope to come back to it soon
and post a PEP and a reference implementation in CPython.

My quick responses, mostly to try to convince that I know a bit about
what I'm talking about:

First, about the last point: I suggest that the function be named
frozen(x), which suggests that nothing happens to x - you only get a
frozen x. I suggest that this operation won't be called "freezing x",
but "making a frozen copy of x".

Now, to the concerns, in their original order. Frozen dicts - if you want, you
can decide that dicts aren't frozenable, and that's ok. But if you do
want to make frozen copies of dicts, it isn't really such a problem -
it's similar to hashing a tuple, which requires recursive hashing of
all its elements; for making a frozen copy of a dict, you make a
frozen copy of all its values.

Treating all containers polymorphically - I don't suggest that. In my
suggestion, you may have frozen lists, frozen tuples (which are normal
tuples with frozen elements), frozen sets and frozen dicts.

Treating tuples as frozen lists - I don't suggest doing that. But if
my suggestion is accepted, there would be no need for tuples - frozen
lists would be just as useful.

And about the other concerns:

 More important than all of the above is the thought that auto-freezing is
 like a bad C macro, it makes too much implicit and hides too much -- the
 supported methods change, there is a issue keeping in sync with the
 non-frozen original, etc.

 In my experience with frozensets, I've learned that freezing is not an
 incidental downstream effect; instead, it is an intentional, essential part
 of the design and needs to be explicit.

I think these concerns can only be judged given a real suggestion,
along with an implementation. I have already implemented most of my
idea in CPython, and I think it's elegant and doesn't cause problems.
Of course, I may not be objective about the subject, but I only ask to
wait for the real suggestion before turning it down.

To summarize, I see the faults in PEP 351. I think that another,
fairly similar idea might be a good one.

Have a good week,
Noam


Re: [Python-Dev] When do sets shrink?

2005-12-31 Thread Noam Raphael
Hello,

I thought about another reason to resize the hash table when it has
too few elements. It's not only a matter of memory usage, it's also a
matter of time usage: iteration over a set or a dict requires going
over the whole table. For example, iteration over a set which once had
1,000,000 members and now has 2 can take 1,000,000 operations every
time you traverse all the (2) elements.
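
A quick way to see the effect (an illustrative experiment, not from
the original post):

import time

s = set(xrange(1000000))
while len(s) > 2:
    s.pop()
# s now holds 2 elements, but its internal table is still sized for a
# million, so every full iteration scans the whole table:
start = time.clock()
for i in xrange(100):
    list(s)
print 'sparse table:', time.clock() - start
s = set(s)  # rebuild at the appropriate size
start = time.clock()
for i in xrange(100):
    list(s)
print 'compact table:', time.clock() - start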

Apologies:
1. It may be trivial to you - I'm sorry, I thought about it just now.
2. You can, of course, still do whatever tradeoff you like.

Noam


Re: [Python-Dev] When do sets shrink?

2005-12-31 Thread Noam Raphael
On 12/31/05, Raymond Hettinger [EMAIL PROTECTED] wrote:
 [Noam]
  For example, iteration over a set which once had
  1,000,000 members and now has 2 can take 1,000,000 operations every
  time you traverse all the (2) elements.

 Do you find that to be a common or plausible use case?

I don't have a concrete example in this minute, but a set which is
repeatedly filled with elements and then emptied by pop operations
doesn't seem to me that far-fetched.

 Was Guido's suggestion of s=set(s) unworkable for some reason?  dicts
 and sets emphasize fast lookups over fast iteration -- apps requiring
 many iterations over a collection may be better off converting to a list
 (which has no dummy entries or empty gaps between entries).

It's workable, but I think that most Python programmers haven't read
that specific message, and are expecting operations which should take
a short time to take a short time. Converting to a list won't help the
use-case above, and anyway, it's another thing that I wouldn't expect
anyone to do - there's no reason that iteration over a set should take
a long time.

(I'm speaking of my point of view, which I believe is common. I don't
expect programs I write in Python to be super-fast - if that were the
case, I would write them in C. I do expect them to take a reasonable
amount of time, and in the case of iteration over a set, that means a
time proportional to the number of elements in the set.)

 Would the case be improved by incurring the time cost of 999,998 tests
 for possible resizing (one for each pop) and some non-trivial number of
 resize operations along the way (each requiring a full-iteration over
 the then current size)?

I believe it would. It seems to me that those 999,998 tests take not
much more than a machine clock cycle each, which means about 1
millisecond on today's computers. Those resize operations will take a
few more milliseconds. It all doesn't really matter, since probably
all other things will take much more. I ran this code

>>> s = set()
>>> for j in xrange(1000000):
...     s.add(j)
...
>>> while s:
...     tmp = s.pop()
...

And it took 2.4 seconds to finish. And it's okay - I'm just saying
that a few additional clock ticks per operation will usually not
matter when the overall complexity is the same, but changes in order
of complexity can matter much more.

 Even if this unique case could be improved, what is the impact on common
 cases?  Would a downsizing scheme risk thrashing with the
 over-allocation scheme in cases with mixed adds and pops?

I think that there shouldn't be additional damage beyond those clock
ticks. The simple method I took from "Introduction to Algorithms"
works no matter what sequence of adds and pops you have.

 Is there any new information/research beyond what has been obvious from
 the moment the dict resizing scheme was born?

I wanted to say that there isn't any new information, and yet I don't
think that I have to assume that everything in current Python is the
best that can be. All I did was finding another reason why a
downsizing scheme might be good, and posting it to ask if people have
thought about it. If you have a document listing all the design
decisions that went into dict implementation, then please send it to
me and I won't ask about things that were already thought about.

But the answer is, yes. I believe that the current dict resizing
scheme was born before the iterator protocol was introduced, and it
may be a reason why the current scheme doesn't try to minimize the
number of empty hash table entries.

Noam


Re: [Python-Dev] When do sets shrink?

2005-12-29 Thread Noam Raphael
On 12/29/05, Fredrik Lundh [EMAIL PROTECTED] wrote:
 Noam Raphael wrote:

  I'm not saying that practically it must be used - I'm just saying that
  it can't be called a heuristic, and that it doesn't involve any fancy
  overkill size hinting or history tracking. It actually means
  something like this:
  1. If you want to insert and the table is full, resize the table to
  twice the current size.
  2. If you delete and the number of elements turns out to be less than
  a quarter of the size of the table, resize the table to half of the
  current size.

 sure sounds like a heuristic algorithm to me... (as in not guaranteed to
 be optimal under all circumstances, even if it's probably quite good in all
 practical cases)

I'm not saying it's optimal, but it is really amortized O(1) per
insert/delete. I looked this up in "Introduction to Algorithms", and
it has a complicated explanation. A simple explanation is that after
every resize the table is exactly half-full. Say it has n elements and
the table size is 2*n. To get to the next resize, you have to do at
least n/2 removals of elements, or n insertions of elements. After
that, you do a resize operation. In either case, you do an O(n) resize
operation only after at least O(n) insertions/removals, which are O(1)
operations. This means that the total cost remains O(n) per n simple
operations, which you can say is O(1) per simple operation.
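
A sketch of the policy in question, with the usual textbook thresholds
(illustrative constants, not anything CPython actually uses):

class ResizePolicy(object):
    # Track table size under the grow-at-full / shrink-at-quarter rule.
    MIN_SIZE = 8

    def __init__(self):
        self.size = self.MIN_SIZE
        self.used = 0

    def after_insert(self):
        self.used += 1
        if self.used >= self.size:  # table full: double it
            self.size *= 2

    def after_delete(self):
        self.used -= 1
        if self.size > self.MIN_SIZE and self.used < self.size // 4:
            self.size //= 2  # table mostly empty: halve it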

I hope that if you read this slowly it makes sense...

Noam


Re: [Python-Dev] A few questions about setobject

2005-12-28 Thread Noam Raphael
On 12/28/05, Martin v. Löwis [EMAIL PROTECTED] wrote:

 The setentry typedef clearly violates the principles of the API, so
 it should be renamed.
(And so should _setobject, although I think it will be much easier to remove it.)

  Perhaps the header file should stick
  with writing struct { long hash; PyObject *key; } three times (or
  define it in a macro and then undefine it), and the typedef be left to
  the .c file?

 That would not be conforming to the C language standard (although
 accepted by most compilers).

Can you explain why it is not conforming to the standard? Can't a
typedef be used interchangeably with the original type?

 Not sure what this refers to in your message: the text of the C API
 documentation certainly is desirable as it stands (although it should
 be clearer as to whether struct names should be prefixed).

  Even if it is, it seems that the second sentence
  contradicts the first sentence.

 Why does that seem so? To quote, the first sentence is

 'All user visible names defined by Python.h (except those defined by
 the included standard headers) have one of the prefixes Py or _Py.'

 and the second sentence is

 'Names beginning with _Py are for internal use by the Python
 implementation and should not be used by extension writers.'

 I cannot see any contradiction between these.

Oops. It's the first and the third:
The first: All user visible names defined by Python.h (except those
defined by the included standard headers) have one of the prefixes
Py or _Py.
The third: Structure member names do not have a reserved prefix.

I think that structure member names refers to things like setentry.
The third sentence contradicts the first since structure member names
are user visible names. Anyway, it seems to me best if the third
sentence will be removed and all names will start with Py or _Py.

  I think it should be ok because it's never used
  really as a PyObject. Am I missing something? (Ok, I now thought that
  maybe it's because some parts don't treat dummy elements specially.
  But it seems to me that most parts do treat them specially, so perhaps
  it would be better to make a few changes so that all parts will treat
  them specially?)

 In principle, you are right. One place that doesn't special-case the
 dummy is set_clear_internal (in fact, the Py_DEBUG assertion is
 completely useless there, AFAICT).

 The tricky question is: can we be certain that we find all places,
 in all code paths, where we have to special-case the dummy? Having
 PyObject* which don't point to PyObject is error-prone.

 Also, what would we gain? If you think it is speed: I doubt it. In
 one place, a comment suggests that actually seeing the dummy element
 is so much more unlikely than the other cases that, for performance,
 the test for the dummy is done last. We would lose additional speed
 in the cases where the dummy isn't yet considered.

Ok, I tried. It took me 25 minutes to change the code, going over
every occurrence of "key" and "decref" in setobject.c. (It seems to
me that the dummy element can only be accessed through entry->key.)
Most of the code isn't bothered by the dummy element, since it uses
the data structure in a more abstract way. I think that it simplifies
the code, while adding a condition in only two places, which I don't
think should make anything noticably slower. The result passes the
complete test suite. In one sentence: I think that it makes the code
slightly better, and the risk is pretty low.

I thought to post it as a patch, but sourceforge didn't work for me,
and it's not that long, so I paste it at the end of this message. Feel
free to do whatever you want with it.

  3) The type of the result of a binary operator applied on a set and a
  frozenset is the type of the left set. You are welcomed to ignore
  this, but I just wanted to say that it seems to me better to make the
  operator symmetric, and to return a frozenset only if both sets are
  frozen.

 How would you implement this? The result is obtained from copying the
 left operand, and then applying the other operand. This is done so
 that set subtyping becomes possible:

 >>> class myset(set): pass
 ...
 >>> x = myset([2,6])
 >>> y = set([2,6])
 >>> x.union(y)
 myset([2, 6])

 So if the result is not obtained by copying the left operand first,
 how would you compute the result type, so that this example still
 works?

The behaviour will change to work like in other types - the returned
value will be of the base type:
>>> class MyInt(int): pass
...
>>> x = MyInt(3)
>>> y = 5
>>> x.__add__(y)
8

I'm not talking about backwards compatibility - I'm just currently
asking if others also feel that the symmetric version is preferable.
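
For concreteness, the symmetric rule would be something like this (a
sketch of the suggestion, not an implementation):

def union_result_type(a, b):
    # frozenset only when both operands are frozen; a plain set otherwise
    if isinstance(a, frozenset) and isinstance(b, frozenset):
        return frozenset
    return set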

Ok, here's the diff:

=== modified file 'Objects/setobject.c'
--- Objects/setobject.c
+++ Objects/setobject.c
@@ -13,8 +13,12 @@
 /* This must be >= 1. */
 #define PERTURB_SHIFT 5

-/* Object used as dummy key to fill deleted entries */
-static PyObject *dummy = NULL; /* Initialized by first call to
make_new_set() */
+/* 

Re: [Python-Dev] Keep default comparisons - or add a second set?

2005-12-28 Thread Noam Raphael
And another example:

>>> a = 1+2j
>>> b = 2+1j
>>> a < b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: no ordering relation is defined for complex numbers

I came to think that, when forgetting backwards compatibility for a
while, the best thing for comparison operators to do is to raise a
TypeError by default, and work only for types that it makes sense to
compare. I think it is more "explicit is better than implicit", and
I now have two reasons for that:
1. Things like Decimal(3.0) == 3.0 will make more sense (raise an
exception which explains that Decimals should not be compared to
floats, instead of returning false constantly).
2. It is more forward compatible - when it is discovered that two
types can sensibly be compared, the comparison can be defined, without
changing an existing behaviour which doesn't raise an exception.

Perhaps you can explain to me again why arbitrary objects should be
comparable? I don't see why sorting objects according to values should
work when the order has no real meaning. I don't see why you need all
objects to be comparable for storing them in containers with the
behaviour of dict or set.

If the reason is that you want containers that work among multiple
sessions, and are identity-based (that is, only one object can be
used as a key), you can attach to each object an id that isn't session
dependent, and use that instead of the default memory address.

It may be a reason for dropping the default "hash is id": suppose that
you want a persistent storage that will work like dicts but will not
be associated with one Python session (it may be exactly Zope's BTrees
- I don't know). Currently you have a problem with using __hash__ for
that purpose, since the hash value of an object can change between
sessions - that happens when it's the id of the object. Now suppose
that we have a persistent id function - it can be implemented by
using weakrefs to associate a unique value with an object on the first
time that the function is called, and storing it with the object when
serializing it. Also suppose that we drop the default hash method, so
where currently hash(x) is id(x), hash(x) will raise a TypeError. Then
our persistent storage can use the persistent id instead of the
default id, and it will work. (You might not want mutable objects to
be used as keys, but that's another problem - the data structure will
be consistent anyway.)
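
Such a persistent id function could be sketched like this (a
hypothetical illustration of the weakref idea, not an existing API; it
assumes the objects are hashable and weak-referenceable):

import itertools
import weakref

_next_token = itertools.count()
_tokens = weakref.WeakKeyDictionary()  # object -> persistent token

def persistent_id(obj):
    # Hand out a stable token the first time an object is seen; the
    # weak reference keeps the entry alive only as long as the object.
    # A real system would also store the token when serializing obj.
    try:
        return _tokens[obj]
    except KeyError:
        _tokens[obj] = token = _next_token.next()  # next(...) in Py3
        return token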

In fewer words, the built-in id() is just one way to assign identities
to objects. __hash__ shouldn't use it implicitly when there's no
value-based hash value - if it didn't, the rule that x == y implies
hash(x) == hash(y) would be preserved between different sessions as
well, so persistent objects would be able to use hash values.

Does it make sense to you?

Noam


Re: [Python-Dev] Keep default comparisons - or add a second set?

2005-12-28 Thread Noam Raphael
On 12/28/05, Adam Olsen [EMAIL PROTECTED] wrote:
 I agree for greaterthan and lessthan operators but I'm not convinced
 for equality.  Consider this in contrast to your example:
  >>> a = 1+2j
  >>> b = 1+2j
  >>> a is b
  False
  >>> a == b
  True

 Complex numbers can't be sorted but they can be tested for equality.
 Decimal numbers can't currently be tested for equality against a float
 but there's no loss-of-accuracy argument to prevent it (just a
 difficulty of implementation one.)

 If the comparison is to fail I would prefer an exception rather than
 an id-based fallback though.

I think we agree. I don't think that every type that supports equality
comparison should support order comparison. I think that if there's no
meaningful comparison (whether equality or order), an exception should
be raised.

 Speaking of id, there's no reason why id(a) == id(b) has to fail for
 mismatched types in the face of persistence so long as the result of
 id() has the same lifetime as the target object.  This could be done
 using weakrefs or by making an id type with a strong reference to the
 target object.

I don't mean to change the current behaviour of id() - I just meant
that an additional one may be implemented, possibly by a specific
library (Zope, for instance), so the built-in one shouldn't be used as
a fallback default.

Noam


Re: [Python-Dev] Keep default comparisons - or add a second set?

2005-12-28 Thread Noam Raphael
On 12/29/05, Robert Brewer [EMAIL PROTECTED] wrote:

  Just to keep myself sane...

  def date_range(start=None, end=None):
      if start == None:
          start = datetime.date.today()
      if end == None:
          end = datetime.date.today()
      return end - start

  Are you saying the if statements will raise TypeError if start or end are
 dates? That would be a sad day for Python. Perhaps you're saying that there
 is a meaningful comparison between None and anything else, but please
 clarify if so.

Yes, I'm suggesting that they will raise a TypeError. Your example
shows that the change is not compatible with a lot of existing Python
code, which means that it's a Python 3000 thing. The following code
will continue to work:

def date_range(start=None, end=None):
    if start is None:
        start = datetime.date.today()
    if end is None:
        end = datetime.date.today()
    return end - start

Using "is None" instead of "== None" is considered better style even now.

Noam


[Python-Dev] set.copy documentation string

2005-12-28 Thread Noam Raphael
is currently "Return a shallow copy of a set."

Perhaps "shallow" should be removed, since set members are supposed to
be immutable so there's no point in a deep copy?

Noam


[Python-Dev] When do sets shrink?

2005-12-28 Thread Noam Raphael
Hello,

If I do something like this:

import gc

s = set()
for i in xrange(1000000):
    s.add(i)
while s:
    s.pop()
gc.collect()

the memory consumption of the process remains the same even after the pops.

I checked the code (that's where I started from, really), and there's
nothing in set.pop or set.remove that resizes the table. And it turns
out that it's the same with dicts.

Should something be done about it?

Noam


Re: [Python-Dev] set.copy documentation string

2005-12-28 Thread Noam Raphael
On 12/29/05, Martin v. Löwis [EMAIL PROTECTED] wrote:
 Noam Raphael wrote:
  is currently Return a shallow copy of a set.
 
  Perhaps shallow should be removed, since set members are supposed to
  be immutable so there's no point in a deep copy?

  That still doesn't make copy return a deep copy, right? "shallow copy"
  is more precise than "copy", and correct - what is gained from
  removing it?

Perhaps it bothers the programmer with something that shouldn't bother
him. I mean that I might do help(set.copy), and then think: "Oh, it
returns a shallow copy. Wait a minute - 'shallow' means that I get a
new object, which references the same objects as the old one. Perhaps
I should use another function, which does deep copying? Let me think
about it - no. All members of a set are immutable, so it doesn't
matter." I think that in this case, the fact that the copy is shallow
is an implementation detail, since there's no sense in making a deep
copy of a set. Implementation details should probably not be a part of
the definition of what a method does.

I know it's just a word, and that it doesn't matter a lot. But why not
make things a tiny bit better?

Noam


Re: [Python-Dev] When do sets shrink?

2005-12-28 Thread Noam Raphael
On 12/29/05, Martin v. Löwis [EMAIL PROTECTED] wrote:
 Noam Raphael wrote:
  Should something be done about it?

 It's very difficult to do something useful about it. Even if
 you manage to determine how much memory you want to release,
 it's nearly impossible to actually release the memory to the
 operating system, because of the many layers of memory
 management software that all need to agree that the memory
 should be reused somewhere else (instead of keeping it on
 that layer, just in case somebody using that layer wants
 some memory).

I checked - when doing the same thing with lists, all the memory was
released for use by other Python objects, and most of it was released
for use by the operating system.

Noam


Re: [Python-Dev] set.copy documentation string

2005-12-28 Thread Noam Raphael
On 12/29/05, Martin v. Löwis [EMAIL PROTECTED] wrote:
 If "makes no sense" means "would not make a difference", then you
 are wrong. Objects in a set are not necessarily unmodifiable;
 they just have to be hashable.

Oh, you are right. I thought so much about dropping default hash=id,
or more generally that only frozen objects should be hashable, that I
forgot that it's not the case yet... :)

(I used the term "frozen" instead of "immutable" since I think that
"immutable" is not defined very well, because tuples are considered
immutable even though their value can change if they reference mutable
objects.)

Thanks,
Noam


Re: [Python-Dev] When do sets shrink?

2005-12-28 Thread Noam Raphael
On 12/29/05, Raymond Hettinger [EMAIL PROTECTED] wrote:
 What could be done is to add a test for excess dummy entries and trigger
 a periodic resize operation.  That would make the memory available for
 other parts of the currently running script and possibly available for
 the O/S.

 The downside is slowing down a fine-grained operation like pop().  For
 dictionaries, this wasn't considered worth it.  For sets, I made the
 same design decision.  It wasn't an accident.  I don't plan on changing
 that decision unless we find a body of real world code that would be
 better-off with more frequent re-sizing.

The computer scientist in me prefers O() terms over changes in a
constant factor, but that's only me. Perhaps a note about it should be
added to the documentation, though?
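
(For code that does care, a workaround sketch: rebuilding the
container allocates a fresh, right-sized table. maybe_shrink and the
threshold below are hypothetical, application-specific choices:)

def maybe_shrink(s, threshold=10000):
    # copying a set re-hashes into a table sized for the live entries,
    # dropping the dummy entries left behind by pop()/remove()
    if len(s) < threshold:
        s = set(s)
    return s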

Noam


[Python-Dev] A few questions about setobject

2005-12-26 Thread Noam Raphael
Hello,

I'm going over setobject.c/setobject.h, while trying to change them to
support cheap frozen-copying. I have a few questions.

1) This is a part of setobject.h:

typedef struct {
    long hash;
    PyObject *key;
} setentry;

typedef struct _setobject PySetObject;
struct _setobject {
    ...
    setentry *table;
    setentry *(*lookup)(PySetObject *so, PyObject *key, long hash);
    setentry smalltable[PySet_MINSIZE];
    ...
};

It seems to me that setentry and _setobject are defined for every file
that includes Python.h. In the Python C API, in the section about
include files, it is written that:
"All user visible names defined by Python.h (except those defined by
the included standard headers) have one of the prefixes Py or _Py.
Names beginning with _Py are for internal use by the Python
implementation and should not be used by extension writers. Structure
member names do not have a reserved prefix."

Is this desirable? Even if it is, it seems that the second sentence
contradicts the first sentence. Perhaps the header file should stick
with writing struct { long hash; PyObject *key; } three times (or
define it in a macro and then undefine it), and the typedef be left to
the .c file?


2) The hash table used by sets uses a dummy element for deleted
entries. The implementation goes to the trouble of allocating it,
managing its reference count, and deallocating it at the end. What is
the reason for that? It seems to me that the only requirement of the
dummy element is that it shouldn't be a pointer to a valid PyObject,
and as such I would think that defining it like

int dummy_int;
PyObject *dummy = (PyObject *)(&dummy_int);

would be enough, and that it shouldn't be INCREFed or DECREFed every
time it is used. I think it should be ok because it's never used
really as a PyObject. Am I missing something? (Ok, I now thought that
maybe it's because some parts don't treat dummy elements specially.
But it seems to me that most parts do treat them specially, so perhaps
it would be better to make a few changes so that all parts will treat
them specially?)
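
(The Python-level analog of such a dummy is just a fresh sentinel that
cannot collide with any real key - a toy linear-probing lookup, made
up for illustration, to show that the dummy never has to behave like a
real entry:)

_DUMMY = object()   # marks "deleted"; None marks "never used"

def probe(table, key):
    # table is a list of None, _DUMMY, or (hash, key) slots
    i = hash(key) % len(table)
    while table[i] is not None:
        slot = table[i]
        if slot is not _DUMMY and slot[1] == key:
            return i                 # found the live entry
        i = (i + 1) % len(table)     # keep probing past dummies
    return -1                        # not present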


3) The type of the result of a binary operator applied on a set and a
frozenset is the type of the left set. You are welcomed to ignore
this, but I just wanted to say that it seems to me better to make the
operator symmetric, and to return a frozenset only if both sets are
frozen.


If you think that these questions belong to c.l.py, then please say so
and I will go away.

Have a good day,
Noam


Re: [Python-Dev] A missing piece of information in weakref documentation

2005-12-12 Thread Noam Raphael
On 12/12/05, Aahz [EMAIL PROTECTED] wrote:
 Please submit a doc patch to SF (or even just a bug report if you don't
 have time).  The patch may be plain text or reST; no need for Latex.

Done - patch number 1379023.

Noam


Re: [Python-Dev] For Python 3k, drop default/implicit hash, and comparison

2005-11-27 Thread Noam Raphael
On 11/27/05, Martin v. Löwis [EMAIL PROTECTED] wrote:
 Noam Raphael wrote:
   I would greatly appreciate repliers that find a tiny bit of reason in
   what I said (even if they don't agree), and not deny it all as a
   complete load of rubbish.

 I don't understand what your message is. With this posting, did you
 suggest that somebody does something specific? If so, who is that one,
 and what should he do?

Perhaps I felt a bit attacked. It was probably my fault, and anyway, a
general message like this is not the proper way - I'm sorry.


 Anyway, a lot of your posting is what I thought was common knowledge;
 and with some of it, I disagree.

This is fine, of course.
  We may want to compare wheels based on value, for example to make sure
  that all the car's wheels fit together nicely: assert car.wheel1 ==
  car.wheel2 == car.wheel3 == car.wheel4.

 I would never write it that way. This would suggest that the wheels
 have to be the same. However, this is certainly not true for wheels:
 they have to be of the same make. Now, you write that wheels
 only carry manufacturer and diameter. However, I would expect that
 wheels grow additional attributes over time, like whether they are
 left or right, and what their wear level is. So to write your property,
 I would write

 car.wheel1.manufacturer_and_make() ==
 car.wheel2.manufacturer_and_make() ==
 car.wheel3.manufacturer_and_make() ==
 car.wheel4.manufacturer_and_make()

You may be right in the case of wheels. From time to time, in the real
(programming) world, I encounter objects that I wish to compare by
value - this is certainly the case for built-in objects, but is
sometimes the case for more complex objects.

  We may want to associate values with wheels based on their values. For
  example, it's reasonable to suppose that the price of every wheel of
  the same model is the same. In that case, we'll write: price[wheel] =
  25.

 Again, I would not write it this way. I would find

 wheel.price()

Many times the objects are not yours to add attributes to, or they may
have __slots__ defined. The truth is that I prefer not to add
attributes to external objects even when it's possible.

 most natural. If I have the notion of a price list, then I would
 try to understand what the price list is keyed-by, e.g. model number:

 price[wheel.model] = 25

Sometimes there's no key - it's just the state of the object (what
if wheels don't have a model number?)

  Now again, how will we say that a specific wheel is broken? Like this:
 
  broken[Ref(wheel)] = True

 If I want things to be keyed by identity, I would write

 broken = IdentityDictionary()
 ...
 broken[wheel] = True

 although I would prefer to write

 wheel.broken = True

I personally prefer the first method, but the second one is ok too.

  I think that most objects, especially most user-defined objects, have
  a *value*. I don't have an exact definition, but a hint is that two
  objects that were created in the same way have the same value.

 Here I disagree. Consider the wheel example. I would expect that
 a wheel has a wear level or some such, and that this changes over
  time, and that it belongs to the value of the wheel ("value"
  being a synonym for "state"). As this changes over time, it is certainly
 not that the object is created with that value.

 Think of lists: what is their value? Are they created with it?

My tongue failed me. I meant: "created in the same way" = "have gone
through the same series of actions". That is:
a = []; a.append(5); a.extend([2,1]); a.pop()
b = []; b.append(5); b.extend([2,1]); b.pop()
a == b

  Sometimes we wish to use the
  identity of objects as a dictionary key or as a set member - and I
  claim that we should do that by using the Ref class, whose *value* is
  the object's *identity*, or by using a dict/set subclass, and not by
  misusing the __hash__ and __eq__ methods.

 I think we should use a specific type of dictionary then.

That's OK too. My point was that the one who uses the objects should
explicitly specify whether he means value-based or identity-based
lookup. This means that if an object has a value, it should not make
__eq__ and __hash__ be identity-based just to make identity-based
lookup easier and implicit.

  I think that whenever value-based comparison is meaningful, the __eq__
  and __hash__ should be value-based. Treating objects by identity
  should be done explicitly, by the one who uses the objects, by using
  the is operator or the Ref class. It should not be the job of the
  object to decide which method (value or identity) is more useful - it
  should allow the user to use both methods, by defining __eq__ and
  __hash__ based on value.

 If objects are compared for value equality, the object should decide
 which part of its state goes into that comparison. It may be that
 two objects compare equal even though their state is memberwise
 different:

 Rational(1,2) == Rational(5,10)

I completely agree. Indeed, the value of an object is in many times

Re: [Python-Dev] For Python 3k, drop default/implicit hash, and comparison

2005-11-27 Thread Noam Raphael
On 11/27/05, Samuele Pedroni [EMAIL PROTECTED] wrote:
 well, this still belongs to comp.lang.python.
...
 not if you think python-dev is a forum for such discussions
 on OO thinking vs other paradigms.

Perhaps my style made it look like a discussion on OO thinking vs
other paradigms, but my conclusion is exactly about the issue of this
thread -
Jim suggested to drop default __hash__ and __eq__ for Python 3K. Guido
decided not to, because it's useful to use them for identity-based
comparison and lookup. I say that I disagree, because I think that
__hash__ and __eq__ should be used for value-based comparison and
lookup, and because if the user of the object does explicit
identity-based comparison/lookup, it doesn't matter to him whether
__hash__ and __eq__ are defined or not. I also suggested, in a way,
that it's OK to define a default value-based __eq__ method.

Noam


Re: [Python-Dev] For Python 3k, drop default/implicit hash, and comparison

2005-11-26 Thread Noam Raphael
Three weeks ago, I read this and thought: well, you have two options
for a default comparison, one based on identity and one on value, both
are useful sometimes, Guido prefers identity, and that's OK. But
today I understood that I still think otherwise.

In two sentences: sometimes you wish to compare objects according to
identity, and sometimes you wish to compare objects according to
values. Identity-based comparison is done by the "is" operator;
value-based comparison should be done by the == operator.

Let's take the car example, and expand it a bit. Let's say wheels have
attributes - say, diameter and manufacturer. Let's say those can't
change (which is reasonable), to make wheels hashable. There are two
ways to compare wheels: by value and by identity. Two wheels may have
the same value, that is, they have the same diameter and were created
by the same manufacturer. Two wheels may have the same identity, that
is, they are actually the same wheel.

We may want to compare wheels based on value, for example to make sure
that all the car's wheels fit together nicely: assert car.wheel1 ==
car.wheel2 == car.wheel3 == car.wheel4. We may want to compare wheels
based on identity, for example to make sure that we actually bought
four wheels in order to assemble the car: assert car.wheel1 is not
car.wheel2 and car.wheel3 is not car.wheel1 and car.wheel3 is not
car.wheel2...

We may want to associate values with wheels based on their values. For
example, it's reasonable to suppose that the price of every wheel of
the same model is the same. In that case, we'll write: price[wheel] =
25. We may want to associate values with wheels based on their
identities. For example, we may want to note that a specific wheel is
broken. For this, I'll first define a general class (I defined it
before in one of the discussions, that's because I believe it's
useful):

class Ref(object):
    def __init__(self, obj):
        self._obj = obj
    def __call__(self):
        return self._obj
    def __eq__(self, other):
        return isinstance(other, Ref) and self._obj is other._obj
    def __hash__(self):
        return id(self._obj) ^ 0xBEEF

Now again, how will we say that a specific wheel is broken? Like this:

broken[Ref(wheel)] = True

Note that the Ref class also allows us to group wheels of the same
kind in a set, regardless of their __hash__ method.
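
(A quick usage sketch, with wheel1 and wheel2 standing in for two
distinct wheel objects from the example above:)

seen = set([Ref(wheel1), Ref(wheel2), Ref(wheel1)])
print len(seen)               # 2 - wheel1 is counted only once
print Ref(wheel1) in seen     # True - identity-based membership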

I think that most objects, especially most user-defined objects, have
a *value*. I don't have an exact definition, but a hint is that two
objects that were created in the same way have the same value.
Sometimes we wish to compare objects based on their identity - in
those cases we use the is operator. Sometimes we wish to compare
objects based on their value - and that's what the == operator is for.
Sometimes we wish to use the value of objects as a dictionary key or
as a set member, and that's easy. Sometimes we wish to use the
identity of objects as a dictionary key or as a set member - and I
claim that we should do that by using the Ref class, whose *value* is
the object's *identity*, or by using a dict/set subclass, and not by
misusing the __hash__ and __eq__ methods.

I think that whenever value-based comparison is meaningful, the __eq__
and __hash__ should be value-based. Treating objects by identity
should be done explicitly, by the one who uses the objects, by using
the is operator or the Ref class. It should not be the job of the
object to decide which method (value or identity) is more useful - it
should allow the user to use both methods, by defining __eq__ and
__hash__ based on value.

Please give me examples which prove me wrong. I currently think that
the only objects for whom value-based comparison is not meaningful,
are objects which represent entities which are outside of the
process, or in other words, entities which are not computational.
This includes files, sockets, possibly user-interface objects,
loggers, etc. I think that objects that represent purely data, have
a value that they can be compared according to. Even wheels that
don't have any attributes are simply equal to other wheels, and not
equal to other objects. Since user-defined classes can interact with
the environment only through other objects or functions, it is
reasonable to suggest that they should get a value-based equality
operator. Many times the value is defined by the __dict__ and
__slots__ members, so it seems to me a reasonable default.

I would greatly appreciate repliers that find a tiny bit of reason in
what I said (even if they don't agree), and not deny it all as a
complete load of rubbish.

Thanks,
Noam


Re: [Python-Dev] str.dedent

2005-11-19 Thread Noam Raphael
On 11/19/05, Steven Bethard [EMAIL PROTECTED] wrote:
  You are missing an important point here: There are intentionally no line
  breaks in this string; it must be a single line, or else showerror will
  break it in funny ways. So converting it to a multi-line string would
  break it, dedent or not.

 Only if you didn't include newline escapes, e.g.::

 msg = textwrap.dedent('''\
     IDLE's subprocess can't connect to %s:%d.  This may be due \
     to your personal firewall configuration.  It is safe to \
     allow this internal connection because no data is visible on \
     external ports.''' % address)


Unfortunately, it won't help, since the 'dedent' method won't treat
those spaces as indentation.
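
(A hedged demonstration of why: the trailing backslashes splice the
lines together inside the literal, so the indentation of each
continuation line survives as a run of spaces in the middle of one
long line, where dedent never looks:)

import textwrap

msg = '''\
    This may be due \
    to your firewall.'''
print repr(textwrap.dedent(msg))
# 'This may be due     to your firewall.'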

But if those messages were printed to the standard error, the line
breaks would be ok, and the use case valid.

Noam


Re: [Python-Dev] str.dedent

2005-11-14 Thread Noam Raphael
On 11/14/05, M.-A. Lemburg [EMAIL PROTECTED] wrote:
 We have to draw a line somewhere - otherwise you could
 just as well add all functions that accept single
 string arguments as methods to the basestring
 sub-classes.

Please read my first post in this thread - I think there's more reason
for 'dedent' to be a string method than there is, for example, for
'expandtabs', since it allows you to write clearer code.

 The point is that the presented use case does not
 originate in a common need (to dedent strings), but
 from a desire to write Python code with embedded
 indented triple-quoted strings which lies in the scope
 of the parser, not that of string objects.

That's a theoretical argument. In practice, if you do it in the
parser, you have two options:
1. Automatically dedent all strings.
2. Add a 'd' or some other letter before the string.

Option 1 breaks backwards compatibility, and makes the parser do
unexpected things. Option 2 adds another string-prefix letter, which
is confusing, and it will also be hard to find out what that letter
means. On the other hand, adding .dedent() at the end is very clear,
and is just as easy.
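
(To sketch the proposed spelling - str.dedent is hypothetical here, it
does not exist as a method today:)

def usage():
    print '''\
          Usage: frob [options]
            -h    show this help'''.dedent()
    # today the same thing needs an import and a wrapping call:
    # print textwrap.dedent('''...''')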

Now, about performance, please see the message I'll post in a few minutes...

Noam


Re: [Python-Dev] str.dedent

2005-11-14 Thread Noam Raphael
Just two additional notes:

On 9/15/05, Raymond Hettinger [EMAIL PROTECTED] wrote:

 -1

 Let it continue to live in textwrap where the existing pure python code
 adequately serves all string-like objects.  It's not worth losing the
 duck typing by attaching new methods to str, unicode, UserString, and
 everything else aspiring to be string-like.

It may seem like the 'dedent' code would have to be written a lot of
times, but I've checked the examples. It may be necessary to write
different versions for 'str' and for 'unicode', but these are going to
be unified. In UserString you'll have to add exactly one line:

def dedent(self): return self.data.dedent()

I've just taken the line created for 'isalpha' and replaced 'isalpha'
with 'dedent'. So in the long run, there will be exactly one
implementation of 'dedent' in the Python code. (I don't know of any
other objects which try to provide the full string interface.)

Another reason for preferring a 'dedent' method over a 'dedent'
function in some module, is that it allows sometime in the future to
add an optimization to the compiler, so that it will dedent the string
at compile time (this can't work for a function, since the function is
only looked up at run time). This will solve the performance problem
completely, so that there will be an easy way to write multilined
strings which do not interfere with the visual structure of the code,
without the need to worry about performance. I'm not saying that this
optimization has to be done now, just that 'dedent' as a method makes
it possible, which adds to the other arguments for making it a method.

Noam


Re: [Python-Dev] str.dedent

2005-11-14 Thread Noam Raphael
On 11/14/05, Fredrik Lundh [EMAIL PROTECTED] wrote:
 so is putting the string constant in a global variable, outside the scope
 you're in, like you'd do with any other constant.

Usually when I use a constant a single time, I write it where I use
it, and don't give it a name. I don't do:

messagea = "The value of A is "

... (a long class definition)
print messagea, A

This is what I mean when I say "constant" - a value which is known
when I write the code, not necessarily an arbitrary value that may
change, so I write it at the beginning of the program for others to
know it's there.

There's no reason why multilined strings that are used only once
should be defined at the beginning of a program (think about a simple
CGI script, which prints HTML parts in a function).

 (how about a new rule: you cannot post to a zombie thread on python-
 dev unless they've fixed/reviewed/applied or otherwise processed at least
 one tracker item earlier the same day.  there are hundreds of items on the
 bugs and patches trackers that could need some loving care)

I posted to this thread because it was relevant to a new post about
dedenting strings. Anyway, I looked at bug 1356720 (Ctrl+C for copy
does not work when caps-lock is on), and posted there a very simple
patch which will most probably solve the problem. I also looked at bug
1337987 (IDLE, F5 and wrong external file content. (on error!)). One
problem it raises is that IDLE doesn't have a revert command and
that it doesn't notice if the file was changed outside of IDLE. I am
planning to fix it.

The other problem that is reported in that bug is that exceptions show
misleading code lines when the source file was changed but wasn't
loaded into Python. Perhaps in compiled code, not only the file name
should be written but also its modification time? This way, when
tracebacks print lines of changed files, they can warn if the line
might not be the right line.

Noam


Re: [Python-Dev] Event loops, PyOS_InputHook, and Tkinter

2005-11-14 Thread Noam Raphael
On 11/15/05, Fredrik Lundh [EMAIL PROTECTED] wrote:
 If you want to write portable code that keeps things running in the
 background while the users hack away at the standard interactive
 prompt, InputHook won't help you.

So probably it should be improved, or changed a bit, to work also on
Windows. Or perhaps it's Tkinter. Anyway, what I'm saying is - don't
use threads! Process events in the main thread while it doesn't run
the user's Python code. If he runs another thread - that's his
problem. The implicit event loop should never execute Python code
while a user's Python code is running in the main thread.

Noam


Re: [Python-Dev] Event loops, PyOS_InputHook, and Tkinter

2005-11-12 Thread Noam Raphael
On 11/13/05, Greg Ewing [EMAIL PROTECTED] wrote:
 Noam Raphael wrote:

  All that is needed to make Tkinter and Michiels'
  code run together is a way to say add this callback to the input
  hook instead of the current replace the current input hook with this
  callback. Then, when the interpreter is idle, it will call all the
  registered callbacks, one at a time, and everyone would be happy.

 Except for those who don't like busy waiting.

I'm not sure I understand what you meant. If you meant that it will
work slowly - a lot of people (including me) are using Tkinter without
a mainloop from the interactive shell, and don't feel the difference.
It uses exactly the method I described.

Noam


Re: [Python-Dev] Should the default equality operator compare values instead of identities?

2005-11-05 Thread Noam Raphael
On 11/3/05, Josiah Carlson [EMAIL PROTECTED] wrote:
...

 Right, but lists (dicts, tuples, etc.) are defined as containers, and
 their comparison operation is defined on their contents.  Objects are
 not defined as containers in the general case, so defining comparisons
 based on their contents (as opposed to identity) is just one of the two
 assumptions to be made.

 I personally like the current behavior, and I see no /compelling/ reason
 to change it.  You obviously feel so compelled for the behavior to
 change that you are willing to express your desires.  How about you do
 something more productive and produce a patch which implements the
 changes you want, verify that it passes tests in the standard library,
 then post it on sourceforge.  If someone is similarly compelled and
 agrees with you (so far I've not seen any public support for your
 proposal by any of the core developers), the discussion will restart,
 and it will be decided (not by you or I).

Thanks for the advice - I will try to do as you suggest.


  To summarize, I think that value-based equality testing would usually
  be what you want, and currently implementing it is a bit of a pain.

 Actually, implementing value-based equality testing, when you have a
 finite set of values you want to test, is quite easy.

 def __eq__(self, other):
     for i in self.__cmp_eq__:
         if getattr(self, i) != getattr(other, i):
             return False
     return True

 With a simple metaclass that discovers all of those values automatically,
 and/or your own protocol for exclusion, you are done.  Remember, not
 all 5-line functions should become builtin/default behavior, and this
 implementation shows that it is not a significant burden for you (or
 anyone else) to implement this in your own custom library.

You are right that not all 5-line functions should become
builtin/default behaviour. However, I personally think that this one
should, since:
1. It doesn't add complexity, or a new builtin.
2. Those five lines don't include the metaclass code, which will
probably take more than five lines and won't be trivial.
3. It will make other objects behave better, not only mine - other
classes will get a meaningful comparison operator, for free.

 P.S. One thing that you should remember is that even if your patch is
 accepted, and even if this is desireable, Python 2.5 is supposed to be
 released sometime next year (spring/summer?), and because it is a
 backwards incompatible change, would need at least 2.6-2.7 before it
 becomes the default behavior without a __future__ import, which is
 another 3-4 years down the line.

I hope that the warning can go in by Python 2.5, so the change (which
I think will cause relatively few backwards incompatibility problems)
can go in by Python 2.6, which I think is less than 2 years down the
line.

 I understand you are passionate, really I do (you should see some of my
 proposals), but by the time these things get around to getting into
 mainline Python, there are high odds that you probably won't care about
 them much anymore (I've come to feel that way myself about many of my
 proposals), and I think it is a good idea to attempt to balance - when
 it comes to Python - "Now is better than never." and "Although never is
 often better than *right* now."

 Removing __hash__, changing __eq__, and trying to get in copy-on-write
 freezing (which is really copy-and-cache freezing), all read to me like
 "We gotta do this now!", which certainly isn't helping the proposal.

Thanks - I should really calm down a bit. I will try to go safely and
slowly, and I hope that in the end I will succeed in making my own
small contribution to Python.

Noam


Re: [Python-Dev] Should the default equality operator compare values instead of identities?

2005-11-05 Thread Noam Raphael
On 11/5/05, Josiah Carlson [EMAIL PROTECTED] wrote:
...
  1. It doesn't add complexity, or a new builtin.

 It changes default behavior (which I specified as a portion of my
 statement, which you quote).

 And you are wrong, it adds complexity to the implementation of both
 class instantiation and the default comparison mechanism.  The former, I
 believe, you will find more difficult to patch than the comparison,
 though if you have not yet had adventures in that which is writing C
 extension modules, modifying the default class instantiation may be
 the deal breaker for you (I personally would have no idea where to start).

Sorry, I meant complexity to the Python user - it won't require him to
learn more in order to write programs in Python.

 class eqMetaclass(type):
     def __new__(cls, name, bases, dct):
         if '__cmp_include__' in dct:
             include = dict.fromkeys(dct['__cmp_include__'])
         else:
             include = dict.fromkeys(dct.keys())

         for i in dct.get('__cmp_exclude__', ()):
             _ = include.pop(i, None)

         dct['__cmp_eq__'] = include.keys()
         return type.__new__(cls, name, bases, dct)

 It took 10 lines of code, and was trivial (except for not-included
 multi-metaclass support code, which is being discussed in another thread).

 Oh, I suppose I should modify that __eq__ definition to be smarter about
 comparison...

 def __eq__(self, other):
     if not hasattr(other, '__cmp_eq__'):
         return False
     if dict.fromkeys(self.__cmp_eq__) != \
        dict.fromkeys(other.__cmp_eq__):
         return False
     for i in self.__cmp_eq__:
         if getattr(self, i) != getattr(other, i):
             return False
     return True

Thanks for the implementation. It would be very useful in order to
explain my suggestion.
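
(A hedged usage sketch that wires the two snippets above together -
the Point class and its attributes are made up for illustration:)

class Point(object):
    __metaclass__ = eqMetaclass          # builds __cmp_eq__ = ['x', 'y']
    __cmp_include__ = ['x', 'y']

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):             # the generic __eq__ from above
        if not hasattr(other, '__cmp_eq__'):
            return False
        if dict.fromkeys(self.__cmp_eq__) != dict.fromkeys(other.__cmp_eq__):
            return False
        for i in self.__cmp_eq__:
            if getattr(self, i) != getattr(other, i):
                return False
        return True

print Point(1, 2) == Point(1, 2)         # True - value-based
print Point(1, 2) == Point(1, 3)         # False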

It's nice that it compares only attributes, not types. It makes it
possible for two people to write classes that can be equal to one
another.


 Wow, 20 lines of support code, how could one ever expect users to write
 that? ;)

This might mean that implementing it in C, once I find the right
place, won't be too difficult.

And I think that for most users it will be harder than it was for you,
and there are some subtleties in those lines.


  3. It will make other objects behave better, not only mine - other
  classes will get a meaningful comparison operator, for free.

 You are assuming that the comparison previously wasn't meaningful.  It has a
 meaning, though it may not be exactly what you wanted it to be, which is
 why Python allows users to define __eq__ operators to be exactly what
 they want, and which is why I don't find your uses compelling.

I think that value-based equality testing is a better default, since
in more cases it does what you want it to, and since in those cases
they won't have to write those 20 lines, or download them from
somewhere.

...

 From what I have previously learned from others in python-dev, the
 warnings machinery is slow, so one is to be wary of using warnings
 unless absolutely necessary. Regardless of it being absolutely necessary,
 it would be 2 years at least before the feature would actually make it
 into Python and become default behavior, IF it were desireable default
 behavior.

All right. I hope that those warnings will be ok - it's yet to be
seen. And about those 2 years - better later than never.
...

 You should also realize that you can make contributions to Python
 without changing the language or the implementation of the langauge.
 Read and review patches, help with bug reports, hang out on python-list
 and attempt to help the hundreds (if not thousands) of users who are
 asking for help, try to help new users in python-tutor, etc.

I confess that I don't do these a lot. I can say that from time to
time I teach beginners Python, and that where I work I help a lot of
other people with Python.

 If you
 have an idea for a language change, offer it up on python-list first
 (I've forgotten to do this more often than I would like to admit), and
 if it generally has more cool than ick, then bring it back here.

I will. Thanks again.

Noam


Re: [Python-Dev] Should the default equality operator compare values instead of identities?

2005-11-05 Thread Noam Raphael
On 11/6/05, Josiah Carlson [EMAIL PROTECTED] wrote:
...
 
  Sorry, I meant complexity to the Python user - it won't require him to
  learn more in order to write programs in Python.

 Ahh, but it does add complexity.  Along with knowing __doc__, __slots__,
 __metaclass__, __init__, __new__, __cmp__, __eq__, ..., __str__,
 __repr__, __getitem__, __setitem__, __delitem__, __getattr__,
 __setattr__, __delattr__, ...


 The user must also know what __cmp_include__ and __cmp_exclude__ means
 in order to understand code which uses them, and they must understand
 that exclude entries overwrite include entries.

You are right. But that's Python - I think that nobody knows all the
exact details of what all these do. You look in the documentation. It
is a compliation - but it's of the type that I can live with, if
there's a reason.

   Wow, 20 lines of support code, how could one ever expect users to write
   that? ;)
 
  This might mean that implementing it in C, once I find the right
  place, won't be too difficult.
 
  And I think that for most users it will be harder than it was for you,
  and there are some subtleties in those lines.

 So put it in the Python Cookbook:
 http://aspn.activestate.com/ASPN/Cookbook/Python

A good idea.

3. It will make other objects behave better, not only mine - other
classes will get a meaningful comparison operator, for free.
  
   You are assuming that the comparison previously wasn't meaningful.  It has a
   meaning, though it may not be exactly what you wanted it to be, which is
   why Python allows users to define __eq__ operators to be exactly what
   they want, and which is why I don't find your uses compelling.
  
  I think that value-based equality testing is a better default, since
  in more cases it does what you want it to, and since in those cases
  they won't have to write those 20 lines, or download them from
  somewhere.

 You are making a value judgement on what people want to happen with
 default Python. Until others state that they want such an operation as a
 default, I'm going to consider this particular argument relatively
 unfounded.

All right. I will try to collect more examples for my proposal.

   From what I have previously learned from others in python-dev, the
   warnings machinery is slow, so one is to be wary of using warnings
   unless absolutely necessary. Regardless of it being absolutely necessary,
   it would be 2 years at least before the feature would actually make it
   into Python and become default behavior, IF it were desireable default
   behavior.
 
  All right. I hope that those warnings will be ok - it's yet to be
  seen. And about those 2 years - better later than never.

 It won't be OK.  Every comparison using the default operator will incur
 a speed penalty while it checks the (pure Python) warning machinery to
 determine if the warning has been issued yet.  This alone makes the
 transition require a __future__ import.

How will the __future__ statement help? I think that the warning is
still needed, so that people using code that may stop working will
know about it. I see that they can add a __future__ import and see if
it still works, but that will catch far fewer problems, because usually
code would be run without the __future__ import.

If it really slows things down, it seems to me that the only solution
is to optimize the warning module...

Noam


Re: [Python-Dev] Why should the default hash(x) == id(x)?

2005-11-04 Thread Noam Raphael
On 11/3/05, Greg Ewing [EMAIL PROTECTED] wrote:
  3. If someone does want to associate values with objects, he can
  explicitly use id:
  dct[id(x)] = 3.

 This is fragile. Once all references to x are dropped,
 it is possible for another object to be created having
 the same id that x used to have. The dict now
 unintentionally references the new object.

You are right. Please see the simple ref class that I wrote in my
previous post, which solves this problem.

Noam


[Python-Dev] Why should the default hash(x) == id(x)?

2005-11-02 Thread Noam Raphael
Hello,

While writing my PEP about unifying mutable and immutable, I came upon this:

Is there a reason why the default __hash__ method returns the id of the object?

It is consistent with the default __eq__ behaviour, which is the same
as "is", but:

1. It can easily become inconsistent, if someone implements __eq__ and
doesn't implement __hash__ (see the sketch after this list).
2. It is confusing: even if someone doesn't implement __eq__, he may
see that it is suitable as a key to a dict, and expect it to be found
by other objects with the same value.
3. If someone does want to associate values with objects, he can
explicitly use id:
dct[id(x)] = 3. This seems to better explain what he wants.
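
(A sketch of the inconsistency in point 1 - Point is a made-up
example; on Python 2.x the id-based default __hash__ is silently kept
when only __eq__ is overridden:)

class Point(object):
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __eq__(self, other):             # value-based equality...
        return isinstance(other, Point) and \
               (self.x, self.y) == (other.x, other.y)
    # ...but no __hash__, so hashing still uses id()

d = {Point(1, 2): 'marked'}
print Point(1, 2) == d.keys()[0]         # True  - equal by value
print Point(1, 2) in d                   # False - different hash bucket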


Now, I just thought of a possible answer: because he wants to store
in his dict both normal objects and objects of his user-defined type,
which turn out to be not equal to any other object.

This leads me to another question: why should the default __eq__
method be the same as "is"? If someone wants to check if two objects
are the same object, that's what the "is" operator is for. Why not
make the default __eq__ really compare the objects, that is, their
dicts and their slot-members?

I would be happy to get answers.

Noam


Re: [Python-Dev] apparent ruminations on mutable immutables (was:PEP 351, the freeze protocol)

2005-11-02 Thread Noam Raphael
Thank you for your encouraging words!

I am currently working on a PEP. I am sure that writing it is a good
idea, and that it would help with explaining this idea both to others
and to myself.

What I already wrote makes me think that it can be accomplished with
no really large changes to the language - only six built-in types are
affected, and there is no reason why existing code, both in C and in
Python, would stop working.

I hope others would be interested in the idea too, when I finish
writing the PEP draft, so it would be discussed. Trying the idea with
PyPy is a really nice idea - it seems that it would be much simpler to
implement, and I'm sure that learning PyPy would be interesting.

Thanks again, and I would really like to hear your comments when I
post the PEP draft,
Noam


Re: [Python-Dev] Why should the default hash(x) == id(x)?

2005-11-02 Thread Noam Raphael
On 11/2/05, Josiah Carlson [EMAIL PROTECTED] wrote:
...

 A quick search in the list archives via a Google search
 ("site:mail.python.org object __hash__")
 says that Guido wanted to remove the default __hash__ method for object
 in Python 2.4, but that never actually happened.

 http://www.python.org/sf/660098
 http://mail.python.org/pipermail/python-dev/2003-December/041375.html

 There may be more code which relies on the default behavior now, but
 fixing such things is easy.

Cool! If Guido also thinks that it should be gone, who am I to argue...

(Seriously, I am in favor of removing it. I really think that it is confusing.)

And if backwards-compatibility is a problem: You can, in Python 2.5,
show a warning when the default __hash__ method is being called,
saying that it is going to disappear in Python 2.6.

[Snip - I will open a new thread about the equality operator]

 As for removing the default __hash__ for objects, I'm actually hovering
 around a -0, if only because it is sometimes useful to generate unique
 keys for dictionaries (which can be done right now with object() ), and
 I acknowledge that it would be easy to subclass and use that instead.

I can suggest a new class that will help you in the cases where you
do want a dict of identities:

class ref(object):
    def __init__(self, obj):
        self._obj = obj
    def __call__(self):
        return self._obj
    def __eq__(self, other):
        return isinstance(other, ref) and self._obj is other._obj
    def __hash__(self):
        return hash(id(self._obj))
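
(A quick usage sketch, assuming the ref class above:)

x = []                    # a list: unhashable, can't be a dict key directly
d = {ref(x): 3}
print d[ref(x)]           # 3 - identity-based lookup works
print ref(x)() is x       # True - the object stays alive and reachable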

It has the advantage over using ids as keys that it keeps a reference
to the object, so the object won't be garbage-collected.

It lets you make a dict of object identities just as easily as before,
in a more explicit and less error-prone way. Perhaps it should become
a builtin?

Noam


[Python-Dev] Should the default equality operator compare values instead of identities?

2005-11-02 Thread Noam Raphael
I think it should.

(I copy here messages from the thread about the default hash method.)

On 11/2/05, Michael Chermside [EMAIL PROTECTED] wrote:
  Why not make the default __eq__ really compare the objects, that is,
  their dicts and their slot-members?

 Short answer: not the desired behavior. Longer answer: there are
 three common patterns in object design. There are "value objects",
 which should be considered equal if all fields are equal. There are
 "identity objects" which are considered equal only when they are
 the same object. And then there are (somewhat less common) value
 objects in which a few fields don't count -- they may be used for
 caching a pre-computed result for example. The default __eq__
 behavior has to cater to one of these -- clearly either value
 objects or identity objects. Guido chose to cater to identity
 objects believing that they are actually more common in most
 situations. A beneficial side-effect is that the default behavior
 of __eq__ is QUITE simple to explain, and if the implementation is
 easy to explain then it may be a good idea.

This is a very nice observation. I wish to explain why I think that
the default __eq__ should compare values, not identities.

1. If you want to compare identities, you can always use is. There
is currently no easy way to compare your user-defined classes by
value, in case they happen to be value objects, in Michael's
terminology - you have to compare every single member. (Comparing the
__dict__ attributes is ugly, and will not always work). If the default
were to compare the objects by value, and they happen to be identity
objects, you can always do:
def __eq__(self, other):
    return self is other

2. I believe that counter to what Michael said, value objects are
more common than identity objects, at least when talking about
user-defined classes, and especially when talking about simple
user-defined classes, where the defaults are most important, since the
writer wouldn't care to define all the appropriate protocols. (this
was a long sentence) Can you give examples of common identity
objects? I believe that they are usually dealing with some
input/output, that is, with things that interact with the environment
(files, for example). I believe almost all algorithmic classes are
value objects. And I think that usually, comparison based on value
will give the correct result for identity objects too, since if they
do I/O, they will usually hold a reference to an I/O object, like
file, which is an identity object by itself. This means that the
comparison will compare those objects, and return false, since the I/O
objects they hold are not the same one.

3. I think that value-based comparison is also quite easy to explain:
user-defined classes combine functions with a data structure. In
Python, the data structure is simply member names which reference
other objects. The default, value-based comparison checks if two
objects have the same member names, and that they are referencing
equal (by value) objects, and if so, returns True. I think that
explaining this is not harder than explaining the current dict
comparison.


Now, for Josiah's reply:

On 11/2/05, Josiah Carlson [EMAIL PROTECTED] wrote:
  This leads me to another question: why should the default __eq__
  method be the same as is? If someone wants to check if two objects
  are the same object, that's what the is operator is for. Why not
  make the default __eq__ really compare the objects, that is, their
  dicts and their slot-members?

 Using 'is' makes sense when the default hash is id (and actually in
 certain other cases as well). Actually comparing the contents of an
 object is certainly not desireable with the default hash, and probably
 not desireable in the general case because equality doesn't always
 depend on /all/ attributes of extension objects.

"Explicit is better than implicit."
"In the face of ambiguity, refuse the temptation to guess."

I hope that the default hash will stop being id, as Josiah showed
that Guido decided, so let's not discuss it.

Now, about the good point that sometimes the state doesn't depend on
all the attributes. Right. But the current default doesn't compare
them well either - you have no escape from writing an equality operator
yourself. And I think this is not the common case.

I think that the meaning of "in the face of ambiguity, refuse the
temptation to guess" is that you should not write code that changes
its behaviour according to what the user will do, based on your guess
as to what he meant. This is not the case - the value-based comparison
is strictly defined. It may just not be what the user would want - and
in most cases, I think it will.

"Explicit is better than implicit" says only "better". Identity-based
comparison is just as implicit as value-based comparison.

(I want to add that there is a simple way to support value-based
comparison when some members don't count, by writing a metaclass that
will check if your class 

Re: [Python-Dev] Should the default equality operator compare values instead of identities?

2005-11-02 Thread Noam Raphael
I've looked for classes in my /usr/lib/python2.4 directory. I won't go
over all the 7346 classes that were found there, but let's see:

identity objects that will continue to work because they contain
other identity objects
======================
SocketServer, and everything which inherits from it (like HTTPServer)
Queue
csv (contains _csv objects)

value objects that would probably gain a meaningful equality operator
======================================================================
StringIO
ConfigParser
markupbase, HTMLParser
HexBin, BinHex
cgi.FieldStorage
AST Nodes

others
======
Cookie - inherits its __eq__ method from dict.

I'll stop here. I was not strictly scientific, because I chose classes
whose purpose I thought I could guess easily, and perhaps
discarded classes that didn't look interesting to me. But I didn't
have any bad intention when choosing the classes. I have seen no class
whose equality operator the change would damage. I have seen quite
a lot of classes which didn't define an equality operator, and that a
value-based comparison would be the right way to compare them.

I'm getting more convinced of my opinion.

Noam


Re: [Python-Dev] Should the default equality operator compare valuesinstead of identities?

2005-11-02 Thread Noam Raphael
On 11/2/05, Raymond Hettinger [EMAIL PROTECTED] wrote:
  Should the default equality operator compare values instead of
 identities?

 No.  Look back into last year's python-dev postings where we agreed that
 identity would always imply equality.  There were a number of practical
 reasons.  Also, there are a number of places in CPython where that
 assumption is implicit.

Perhaps you meant something else, or I didn't understand? Identity
implying equality is true also in value-based comparison. If the
default __eq__ operator compares by value, I would say that it would
do something like:

def __eq__(self, other):
    if self is other:
        return True
    if type(self) is not type(other):
        return False
    # (compare the __dict__ and any __slots__, and if they are all ==,
    # return True.)
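
(A hedged sketch of that last step, for concreteness - the __slots__
handling here is simplified and ignores inherited slots:)

def _state(obj):
    # the object's value: __dict__ entries plus any filled __slots__
    state = dict(getattr(obj, '__dict__', {}))
    for name in getattr(type(obj), '__slots__', ()):
        if hasattr(obj, name):
            state[name] = getattr(obj, name)
    return state

def default_eq(self, other):
    if self is other:
        return True
    if type(self) is not type(other):
        return False
    return _state(self) == _state(other)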

Noam


Re: [Python-Dev] apparent ruminations on mutable immutables (was: PEP 351, the freeze protocol)

2005-11-01 Thread Noam Raphael
On 11/1/05, Josiah Carlson [EMAIL PROTECTED] wrote:
...

 I am an advocate for PEP 351.  However, I am against your proposed
 implementation/variant of PEP 351 because I don't believe it ads enough
 to warrant the additional complication and overhead necessary for every
 object (even tuples would need to get a .frozen_cache member).

 Give me a recursive freeze from PEP 351 (which handles objects that are
 duplicated, but errors out on circular references), and I'll be happy.

That's fine - but it doesn't mean that I must be happy with it.

...
 
  This isn't correct - freezing a set won't require a single copy to be
  performed, as long as the frozen copy isn't saved after the original
  is changed. Copy+cache always requires one copy.

 You are wrong, and you even say you are wrong...freezing a set doesn't
 require a COPY, IF the frozen COPY isn't saved after the original is
 CHANGED. Creating an immutable set IS CREATING A COPY, so it ALSO
 copies, and you admit as much, but then say the equivalent of copying
 isn't copying because I say so.

No, I am not wrong. I am just using misleading terms. I will call a
frozen copy a "frozen image". Here it goes: freezing a set doesn't
require a COPY, IF the frozen IMAGE isn't saved after the original is
CHANGED. I suggest that there would be a way to create a frozenset
without COPYING an O(n) amount of MEMORY. When a frozen set is created
by a call frozen(x), it would not copy all the data, but would rather
reference the existing data, which was created by the non-frozen set.
Only if the original set changes, when there's a frozen set
referencing the data, the MEMORY would be actually copied.

I call it a frozen copy because it behaves as a frozen copy, even
though not all the memory is being copied. When you call the COPY
function in the COPY module with a string, it doesn't really copy
memory - the same string is returned. When you copy a file inside
subversion, it doesn't actually copy all the data associated with it,
but does something smarter, which takes O(1). The point is, for the
user, it's a copy. Whether or not memory is actually being copied, is
an implementation detail.
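
(To make the no-copy claim concrete, a toy copy-on-write sketch -
CowList and _FrozenView are made-up names, and a real set would apply
the same idea to its hash table:)

class _FrozenView(object):
    def __init__(self, data):
        self._data = data               # shared, not copied
    def __iter__(self):
        return iter(self._data)
    def __len__(self):
        return len(self._data)

class CowList(object):
    def __init__(self, data=()):
        self._data = list(data)
        self._view = None               # live frozen image, if any
    def frozen(self):
        # O(1): hand out a view over the very same storage
        if self._view is None:
            self._view = _FrozenView(self._data)
        return self._view
    def append(self, item):
        if self._view is not None:
            # someone holds a frozen image: copy once, right now
            self._data = list(self._data)
            self._view = None
        self._data.append(item)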

...

 I think that adding an additional attribute to literally every single
 object to handle the caching of 'frozen' objects, as well as a list to
 every object to handle callbacks which should be called on object
 mutation, along with a _call_stuff_when_mutated() method that handles
 these callback calls, IN ADDITION TO the __freeze__ method which is
 necessary to support this, is a little much, AND IS CERTAINLY NOT A
 SIMPLIFICATION!

I don't agree. You don't need to add a list to every object, since you
can store all those relations in one place, with a standard function
for registering them. Anyway, code written in Python (which is the
language we are discussing) WON'T BE COMPLICATED! The frozen
mechanism, along with two new protocols (__frozen__ and __changed__),
would be added automatically! The internal state of a class written in
Python can be automatically frozen, since it's basically a dict. Now
let's see if it's a simplification:

1. No Python code would have to be made more complicated because of the change.
2. There would be no need to find workarounds, like cStringIO, for the
fact that strings and tuples are immutable.
3. You would be able to put any kind of object into a set, or use it
as a dict key.
4. Classes (like the graph example) would be able to give users things
without having to make a choice between risking their users with
strange bugs, making a complicated interface, making very inefficient
methods, and writing complicated wrapper classes.

I will ask you: Is this a complication?
The answer is: it requires a significant change to the CPython
implementation. But about the Python language: it's definitely a
simplification.

 Let us pause for a second and consider:
 Original PEP proposed 1 new method: __freeze__, which could be
 implemented as a subclass of the original object (now), and integrated
 into the original classes as time goes on.  One could /register/
 __freeze__ functions/methods a'la Pickle, at which point objects
 wouldn't even need a native freeze method.

 Your suggestion offers 2 new methods along with 2 new instance variables.
 Let's see, a callback handler, __freeze__, the cache, and the callback
 list.  Doesn't that seem a little excessive to you to support freezing?
 It does to me.  If Guido were to offer your implementation of freeze, or
 no freeze at all, I would opt for no freeze, as implementing your freeze
 on user-defined classes would be a pain in the ass, not to mention
 implementing them in C code would be more than I would care to do, and
 more than I would ask any of the core developers to work on.

As I said above: this suggestion would certainly require more change
in the Python implementation than your suggestion. But the Python
language would gain a lot more. Implementing my frozen on user-defined
classes would not be a pain 

Re: [Python-Dev] a different kind of reduce...

2005-11-01 Thread Noam Raphael
On 11/1/05, Reinhold Birkenfeld [EMAIL PROTECTED] wrote:
  Hmm, using the function's own namespace is an interesting idea.  It
  might also be a good place to put other functionals:
 
 results = f.map(data)
 newf = f.partial(somearg)

 And we have solved the map, filter and reduce are going away! Let's all
 weep together problem with one strike!

 Reinhold

I have no problems with map and filter going away. About reduce -
please remember that you need to add this method to any callable,
including every type (I mean the constructor). I am not sure it is a
good trade for throwing away one builtin, which is a perfectly
reasonable function.

Noam
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] apparent ruminations on mutable immutables (was: PEP 351, the freeze protocol)

2005-11-01 Thread Noam Raphael
On 11/1/05, Josiah Carlson [EMAIL PROTECTED] wrote:
...

 I still consider it dead.
If the implementation is hard to explain, it's a bad idea.

It is sometimes true, but not always. It may mean two other things:
1. The one trying to explain is not talented enough.
2. The implementation is really not very simple. A hash table, used so
widely in Python, is really not a simple idea, and it's not that easy
to explain.

 Also, not all user-defined classes have a __dict__, and not all
 user-defined classes can have arbitrary attributes added to them.

 >>> class foo(object):
 ...     __slots__ = ['lst']
 ...     def __init__(self):
 ...         self.lst = []
 ...
 >>> a = foo()
 >>> a.bar = 1
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
 AttributeError: 'foo' object has no attribute 'bar'
 
It doesn't matter. It only means that the implementation would have to
make frozen copies of __slots__ items as well, when freezing a
user-defined class.
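
Roughly like this (just a sketch of the idea, with an invented helper
name):

def instance_state(obj):
    # Collect everything a frozen copy would have to snapshot: the
    # __dict__, if there is one, plus any values stored in __slots__.
    state = dict(getattr(obj, '__dict__', {}))
    slots = getattr(type(obj), '__slots__', ())
    if isinstance(slots, str):
        slots = [slots]       # __slots__ may be a single string
    for name in slots:
        if hasattr(obj, name):
            state[name] = getattr(obj, name)
    return state   # each value would be frozen() in turn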

I am afraid that this question proves that I didn't convey my idea to
you. If you like, please forgive my inability to explain it clearly,
and try again to understand my idea, by going over what I wrote again,
and thinking on it. You can also wait for the PEP that I intend to
write. And you can also forget about it, if you don't want to bother
with it - you've already helped a lot.

Noam
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 351, the freeze protocol

2005-10-31 Thread Noam Raphael
Hello,

I have slept quite well, and talked about it with a few people, and I
still think I'm right.

About the users-changing-my-internal-data issue:

 Again, user semantics.  Tell your users not to modify entries, and/or
 you can make copies of objects you return.  If your users are too daft
 to read and/or follow directions, then they deserve to have their
 software not work.
...
 When someone complains that something doesn't work, I tell them to read
 the documentation.  If your users haven't been told to RTFM often enough
 to actually make it happen, then you need a RTFM-bat. Want to know how
 you make one?  You start wrapping the objects you return which segfaults
 the process if they change things. When they start asking, tell them it
 is documented quite clearly: "do not modify objects returned, or else."
 Then there's the other option, which I provide below.

I disagree. I read the manual when I don't know what something does.
If I can guess what it does (and this is where conventions are good),
I don't read the manual. And suppose I really ought to read the
complete manual for every method I use, and deserve the death penalty
(or a segmentation fault) if I don't. Still, say I wrote some software
without reading the manual, and it worked. I have since moved on to
other things, and suddenly a bug arises. When the poor
guy who needs to fix it goes over the code, everything looks
absolutely correct. Should he also read the complete manuals of every
library that I used, in order to fix that bug? And remember that in
this case, the object could have traveled between several places
(including functions in other libraries) before it was changed, and
the original data structure starts behaving weirdly.

You suggest two ways for solving the problem. The first is by copying
my mutable objects to immutable copies:

 Also from the sounds of it, you are storing both source and destination
 values in the same table...hrm, that sounds quite a bit like a
 spreadsheet.  How does every spreadsheet handle that again?  Oh yeah,
 they only ever store immutables (generally strings which are interpreted).
 But I suppose since you are (of course) storing mutable objects, you
 need to work a bit harder...so store mutables, and return immutable
 copies (which you can cache if you want, and invalidate when your
 application updates the results...like a wx.Grid update on changed).

This is basically ok. It's just that in my solution, for many objects
it's not necessary to make a complete copy just to prevent changing
the value: Making frozen copies of objects which can't reference
nonfrozen objects (sets, for example), costs O(1), thanks to the
copy-on-write.

Now, about the graph:

 So let me get this straight: You've got a graph.  You want to be able to
 change the graph, but you don't want your users to accidentally change
 the graph. Sounds to me like an API problem, not a freeze()/mutable problem.
 Want an API?

 class graph:
     ...
     def IterOutgoing(self, node):
         ...
     def IterIncoming(self, node):
         ...
     def IsAdjacent(self, node1, node2):
         ...
     def IterNodes(self):
         ...
     def AddEdge(self, f_node, t_node):
         ...
     def RemEdge(self, node1, node2):
         ...
     def AddNode(self):
         ...

 If you are reasonable in your implementation, all of the above
 operations can be fast, and you will never have to worry about your
 users accidentally mucking about with your internal data structures:
 because you aren't exposing them.  If you are really paranoid, you can
 take the next step and implement it in Pyrex or C, so that only a
 malicous user can muck about with internal structures, at which point
 you stop supporting them.

This will work. It's simply... well, not very beautiful. I have to
invent a lot of names, and my users need to remember them all. If I
give them a frozen set containing all the vertices that a vertex points to
(which is an absolutely reasonable API), they will know how to deal
with it without learning a lot of method names, thanks to the fact
that they are already familiar with sets, and that a lot of thought
has gone into the set interface.
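
For example, something like this - note that you can already get the
*interface* today with frozenset; my proposal only changes the cost of
the copy (Graph here is a toy example, not a real package):

class Graph(object):
    def __init__(self):
        self._outgoing = {}   # node -> set of adjacent nodes

    def add_edge(self, f_node, t_node):
        self._outgoing.setdefault(f_node, set()).add(t_node)
        # (updating the inverse mapping is omitted for brevity)

    def adjacent_nodes(self, node):
        # A full copy today; O(1) with a copy-on-write frozen().
        return frozenset(self._outgoing.get(node, ()))

The users get back a set they already know how to use, and they simply
can't corrupt the graph through it.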

  Now, about copy-on-write:
...

 What you have written here is fairly unintelligible, but thankfully you
 clarify yourself...pity it still doesn't work, I explain below.

I'm sorry if I am sometimes hard to understand. English is not my
mother tongue, and it degrades as the hour gets later - and sometimes,
things are hard to explain. If I don't explain myself, please say so
and I'll try again. This is an excellent example - I wrote about
callbacks, and went to sleep. Let me try to explain again how it
*does* work.

The frozen() function, and the __frozen__ protocol, would get another
optional argument - an object to be notified when the *nonfrozen*
object has changed. It may be called at most once - only on the first
change to the object, and only if the object which requested to be
notified is still 

Re: [Python-Dev] PEP 351, the freeze protocol

2005-10-31 Thread Noam Raphael
On 10/31/05, Josiah Carlson [EMAIL PROTECTED] wrote:
...
 And I'm going to point out why you are wrong.

I still don't think so. I think I need to clarify again what I mean.

  About the users-changing-my-internal-data issue:
...
 You can have a printout before it dies:
 I'm crashing your program because something attempted to modify a data
 structure (here's the traceback), and you were told not to.

 Then again, you can even raise an exception when people try to change
 the object, as imdict does, as tuples do, etc.

Both solutions would solve the problem, but would require me to wrap
the built-in set with something which doesn't allow changes. This is a
lot of work - but it's quite similar to what my solution would
actually do, in a single built-in function.

  You suggest two ways for solving the problem. The first is by copying
  my mutable objects to immutable copies:

 And by caching those results, then invalidating them when they are
 updated by your application.  This is the same as what you would like to
 do, except that I do not rely on copy-on-write semantics, which aren't
 any faster than freeze+cache by your application.

This isn't correct - freezing a set won't require a single copy to be
performed, as long as the frozen copy isn't kept around after the
original changes. Copy+cache always requires one copy.

...
 I never claimed it was beautiful, I claimed it would work.  And it does.
 There are 7 methods, which you can reduce if you play the special method
 game:

 RemEdge - __delitem__((node, node))
 RemNode - __delitem__(node) #forgot this method before
 IterNodes - __iter__()
 IterOutgoing,IterIncoming - IterAdjacent(node)

I just wanted to say that this game is of course a lot of fun, but it
doesn't simplify the interface.

 In any case, whether you choose to use freeze, or use a different API,
 this particular problem is solvable without copy-on-write semantics.

Right. But I think that a significant simplification of the API is a
nice bonus of my solution. And as for those copy-on-write semantics -
it remains to be shown how complex they really are. Remember that we are talking
about frozen-copy-on-write, which I think would simplify matters
considerably - for example, there are at most two instances sharing
the same data, since the frozen copy can be returned again and again.

Now, about copy-on-write:
  ...
 Thank you for the clarification (btw, your english is far better than
 any of the foreign languages I've been taught over the years).
Thanks! It seems that if you are forced to use a language from time to
time it improves a little...

...

 Even without validation, there are examples that force a high number of
 calls, which are not O(1), amortized or otherwise.

[Snip - a very interesting example]

 Now, the actual time analysis on repeated freezings and such gets ugly.
 There are actually O(k) objects, which take up O(k**2) space.  When you
 modify object b[i][j] (which has just been frozen), you get O(k)
 callbacks, and when you call freeze(b), it actually results in O(k**2)
 time to re-copy the O(k**2) pointers to the O(k) objects.  It should be
 obvious that this IS NOT AMORTIZABLE to original object creation time.

That's absolutely right. My amortized analysis is correct only if you
limit yourself to cases in which the original object doesn't change
after frozen() has been called. In that case, it's ok to count the
O(k**2) copy together with the O(k**2) object creation, because it's made only
once.

Why is it ok to analyze only that limited case? I am suggesting a
change to Python: that every object be mutable and support the
frozen() protocol. When you evaluate my suggestion, you need to take a
program and measure its performance in the current Python and in a
Python which implements my suggestion. This means that the program
should also work on the current Python. In that case, my
assumption is true - you won't change objects after you have frozen
them, simply because these objects (strings which are used as dict
keys, for example) can't be changed at all in the current Python
implementation!

Let me put it another way: I am proposing a change that will make
Python objects, including strings, mutable, and give you other
advantages as well. I claim that it won't make existing Python
programs run slower in O() terms. It would allow you to do many things
that you can't do today; some of them would be fast, like editing a
string, and some of them would be less fast - for example, repeatedly
changing an object and freezing it.

I think that the performance penalty may be rather small - remember
that in programs which do not change strings, there would never be a
need to copy the string data at all. And since I think that usually
most of the dict lookups are for method or function names, there would
almost never be a need to construct a new object on dict lookup,
because you search for the same names again and again, and a new
object is created only on the 

Re: [Python-Dev] PEP 351, the freeze protocol

2005-10-31 Thread Noam Raphael
I thought about something -

 I think that the performance penalty may be rather small - remember
 that in programs which do not change strings, there would never be a
 need to copy the string data at all. And since I think that usually
 most of the dict lookups are for method or function names, there would
 almost never be a need to construct a new object on dict lookup,
 because you search for the same names again and again, and a new
 object is created only on the first frozen() call. You might even gain
 performance, because s += x would be faster.

Name lookups can take virtually the same time they take now - method
names can be saved from the beginning as frozen strings, so finding
them in a dict will take just one more bit test - "is the object
frozen?" - before doing exactly what is done now. Remember, the strings we are
familiar with are simply frozen strings...
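
If it helps, here is a toy model of what I mean (FString stands in for
the proposed mutable string; the real test would be one flag check in
C):

class FString(object):
    def __init__(self, chars, frozen=False):
        self.chars = chars
        self.frozen = frozen
        self._cache = None      # the frozen copy, made at most once

    def frozen_copy(self):
        if self.frozen:
            return self
        if self._cache is None:
            self._cache = FString(self.chars, frozen=True)
        return self._cache

    def __hash__(self):
        return hash(self.chars)

    def __eq__(self, other):
        return self.chars == getattr(other, 'chars', other)

def lookup(namespace, name):
    # The extra cost per dict lookup: one flag test, and (rarely) one
    # frozen-copy construction on the first use of the name.
    return namespace[name.frozen_copy()]

names = {FString('foo', frozen=True): 1}
print lookup(names, FString('foo'))   # 1
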
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 351, the freeze protocol

2005-10-30 Thread Noam Raphael
Hello,

It seems that we both agree that freezing is cool (-; . We disagree on
whether a copy-on-write behaviour is desired. Your arguments against
copy-on-write are:
1. It's not needed.
2. It's complicated to implement.

But first of all, you didn't like my use cases. I want to argue with that.

 In reading your description of a 'table of values', I can't help but be
 reminded of the wxPython (and wxWidget) wx.Grid and its semantics.  It
 offers arbitrary tables of values (whose editors and viewers you can
 change at will), which offers a mechanism by which you can listen to
 changes that occur to the contents of a cell.  I can't help but think
 that if you offered a protocol by which a user can signal that a cell
 has been changed, perhaps by writing the value to the table itself
 (table.SetValue(row, col, value)), every read a deepcopy (or a PEP 351
 freeze), etc., that both you and the users of your table would be much
 happier.

Perhaps I didn't make it clear. The difference between wxPython's Grid
and my table is that in the table, most values are *computed*. This
means that there's no point in changing the values themselves. They
are also used frequently as set members (I can describe why, but it's
a bit complicated.)

I want to say that even if sets weren't used, the objects in the table
should have been frozen. The fact that sets (and dicts) only allow
immutable objects as members/keys is just to protect the user.
They could have declared, "you shouldn't change anything you insert -
as long as you don't, we'll function properly." The only reason why
you can't compute hash values of mutable objects is that you don't
want your user to make mistakes, and make the data structure
inconsistent.

 As for the graph issue, you've got a bigger problem than users just
 being able to edit edge lists, users can clear the entire dictionary of
 vertices (outgoing.clear()).  It seems to me that a more reasonable
 method to handle this particular case is to tell your users don't
 modify the dictionaries or the edge lists, and/or store your edge lists
 as tuples instead of lists or dictionaries, and/or use an immutable
 dictionary (as offered by Barry in the PEP).

As I wrote before, telling my users "don't modify the edge lists" is
just like making lists hashable and telling all Python users, "don't
modify lists that are dictionary keys." There's no way to tell the
users that - there's no convention for objects which should not be
changed. You can write it in the documentation, but who'll bother
looking there?

I don't think that your other suggestions will work: the data
structure of the graph itself can't reasonably be made of immutable
objects, because the graph as a whole is a mutable object - you can
change it. It *can* be built from immutable objects, but that means
copying all the data every time the graph changes.


Now, about copy-on-write:

 There's also this little issue of copy on write semantics with Python.
 Anyone who tells you that copy on write is easy, is probably hanging
 out with the same kind of people who say that threading is easy.  Of
 course both are easy if you limit your uses to some small subset of
 interesting interactions, but copy on write gets far harder when you
 start thinking of dictionaries, lists, StringIOs, arrays, and all the
 possible user-defined classes, which may be mutated beyond obj[key] =
 value and/or obj.attr = value (some have obj.method() which mutates the
 object). As such, offering a callback mechanism similar to weak
 references is probably pretty close to impossible with CPython.

Let's limit ourselves to copy-on-write of objects which do not contain
nonfrozen objects. Perhaps it's enough - the table, the graph, and
strings, are perfect examples of these. Implementation doesn't seem to
complicated to me - whenever the object is about to change, and there
is a connected frozen copy, you make a shallow copy of the object,
point the frozen copy to it, release the reference to the frozen copy,
and continue as usual. That's all.
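
In code, the whole mechanism is about this much (a toy sketch for a
set-like object; the real thing would of course be done in C):

class FrozenView(object):
    # A frozen copy which (usually) just shares its source's data.
    def __init__(self, data):
        self._data = data
    def __contains__(self, item):
        return item in self._data
    def __iter__(self):
        return iter(self._data)

class COWSet(object):
    # A mutable set supporting cheap frozen copies.
    def __init__(self, items=()):
        self._data = set(items)
        self._view = None        # live frozen copy, if any

    def frozen(self):
        # O(1): the frozen copy shares our data until the next change.
        if self._view is None:
            self._view = FrozenView(self._data)
        return self._view

    def add(self, item):
        if self._view is not None:
            # First change since frozen(): give the view its own
            # shallow copy, then disconnect it.
            self._view._data = set(self._data)
            self._view = None
        self._data.add(item)

Nothing is ever copied unless the original actually changes while a
frozen copy is alive.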

I really think that this kind of copy-on-write is correct. The
temporary freezing of sets in order to check if they are members of
other sets is a not-very-nice way of implementing it. This kind of
copy-on-write would allow, in principle, for Python strings to become
mutable, with almost no speed penalty. It would allow my table, and
other containers, to automatically freeze the objects that go into
them, without having to trust the user not to change the objects - and
remember that there's no way of *telling* him not to change the
objects.

Now, the computer scientist in me wants to explain (and think about)
freezing containers of nonfrozen objects. What I actually want is that
as long as an object doesn't change after it's frozen, the cost of
freezing would be nothing - that is, O(1). Think about a mutable
string object, which is used in the same way as the current, immutable
strings. It is constructed once, and then may be used as a key in a

Re: [Python-Dev] PEP 351, the freeze protocol

2005-10-29 Thread Noam Raphael
Hello,

I have thought about freezing for some time, and I think that it is a
fundamental need - the need to know, sometimes, that objects aren't
going to change.

This is mostly the need of containers. dicts need to know that the
objects that are used as keys aren't going to change, because if they
change, their hash value changes, and you end up with a data structure
in an inconsistent state. This is the need of sets too, and of heaps,
and binary trees, and so on.

I want to give another example: my colleagues and I designed something
which can be described as an electronic spreadsheet in Python. We
called it a table. The values in the table are Python objects, and
the functions which relate them are written in Python. Then comes the
problem: the user has, of course, access to the objects stored in the
table. What would happen if he changes them? The answer is that the
table would be in an inconsistent state, since something which should
be the return value of a function is now something else, and there's
no way for the table to know about that.

The solution is to have a freeze protocol. It may be called frozen
(like frozen(set([1,2,3]))), so that it will be clear that it does not
change the object itself. The definition of a frozen object is that
its value can't change - that is, if you compare it with another
object, you should get the same result as long as the other object
hasn't changed. As a rule, only frozen objects should be hashable.

I want to give another, different, use case for freezing objects. I
once thought about writing a graph package in Python - I mean a graph
with vertices and edges. The most obvious way to store a directed
graph is as a mapping (dict) from a node to the set of nodes that it
points to. Since I want to be able to find also which nodes point to a
specific node, I will store another mapping, from a node to the set of
nodes that point to it. Now, I want a method of the graph which will
return the set of nodes that a given node points to, for example to
let me write "if y in graph.adjacent_nodes(x): ...". The question is,
what will the adjacent_nodes method return? If it returns the set
which is a part of the data structure, there is nothing (even no
convention!) that will prevent the user from playing with it. This
will corrupt the data structure, since the change won't be recorded in
the inverse mapping. adjacent_nodes could return a copy of the set,
but that's a waste if you only want to check whether an object is a
member of the set.

I gave this example to say that the frozen protocol should (when
possible) return an object which doesn't really contain a copy of the
data, but rather gives an image of the original object. If the
original object changes while there are frozen copies of it, the data
will be copied, and all the frozen objects will then reference a
version of the data that will never change again.

This will solve the graph problem nicely - adjacent_nodes would simply
return a frozen copy of the set, and a copy operation would happen
only in the rare cases when the original set is modified later. This
would also help the container use cases: they may call the frozen()
method on objects that should be inserted into the container, and
usually the data won't be copied. Some objects can't be created in
their final form, but can only be constructed step by step. This
means that they must be non-frozen objects. Sometimes they are
constructed in order to get into a container. Unless the frozen()
method is copy-on-change the way I described, all the data would have
to be copied again, just for the commitment that it won't change.

I don't mean to frighten anyone, but in principle, this may mean that
mutable strings might be introduced, which would allow us to get rid
of all the cStringIO workarounds. Immutable strings would be
constructed whenever they are needed, at a low performance cost
(remember that a frozen copy of a given object has to be constructed
only once - once it has been created, the same object can be returned
on additional frozen() calls.)

Copy-on-change of containers of non-frozen objects requires additional
complication: it requires frozen objects to have a way to set a
callback that will be called when the original object is changed.
This is because the change makes the container of the original object
change, so it must drop its own frozen copy. This needs to happen only
once per frozen object, since after a change, all the containers drop
their frozen copies. I think this callback is conceptually similar to
the weakref callback.

Just one example showing that copy-on-change (at least for containers
of frozen objects) is needed: sets. It was decided that you can test whether a
non-frozen set is a member of a set. I understand that it is done by
temporarily freezing the set, and that it caused some threading
issues. A copy-on-change mechanism might solve it more elegantly.
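
(This is the behaviour I mean:

>>> s = set([1, 2])
>>> set_of_sets = set([frozenset([1, 2])])
>>> s in set_of_sets
True

s is not hashable, and yet the test works - through a temporary frozen
copy of s.)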

What do you think?

Noam
___
Python-Dev 

Re: [Python-Dev] [Python-checkins] commit of r41352 - in python/trunk: . Lib Lib/distutils Lib/distutils/command Lib/encodings

2005-10-29 Thread Noam Raphael
 That might be reasonable. I just noticed that it is convenient to do

 svn propset svn:ignore -F .cvsignore .

 Without a file, I wouldn't know how to edit the property, so I would
 probably do

 svn propget svn:ignore . > ignores
 vim ignores
 svn propset svn:ignore -F ignores .
 rm ignores


Won't svn propedit svn:ignore . do the trick?

Noam
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] IDLE development

2005-09-10 Thread Noam Raphael
Hello,

More than a year and a half ago, I posted a big patch to IDLE which
adds support for completion and much better calltips, along with some
other improvements. Since then, I had some mail conversations with
Kurt B. Kaiser, who is responsible for IDLE, which resulted in
nothing. My last mail, from Jul 10, saying (with more details) I made
the minor changes you asked for, let's get it in, it's not very
complicated was unanswered.

This is just an example of the fact that IDLE development was
virtually nonexistent in recent months, because most patches were
simply ignored.

My colleagues and I use IDLE intensively - that is, a heavily patched
IDLE. It includes my patch and many other improvements made by me and
my friends.

The improved IDLE is MUCH better than the standard IDLE, especially
for interactive work. Since we would like to share our work with the
rest of the world, if nothing is changed we would start a new IDLE
fork soon, perhaps at python-hosting.com.

I really don't like that - maintaining a fork requires a lot of extra
work, and it is certain that many more people will enjoy our work if
it integrated in the standard Python distribution. But sending patches
and watching them stay open despite a continuous nagging is worse.

Please, either convince KBK to invest more time in IDLE development,
or find someone else who would take care of it. If you like, I would
happily help in the development.

I hope I am not sounding offensive. It's actually quite simple: if the
excellent development environment IDLE can't evolve inside standard
Python, it should be developed outside it. As I said, I prefer the
first option.

Have a good week,
Noam Raphael
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] IDLE development

2005-09-10 Thread Noam Raphael
On 9/11/05, Guido van Rossum [EMAIL PROTECTED] wrote:
 On 9/10/05, Noam Raphael [EMAIL PROTECTED] wrote:
  I and my colleges use IDLE intensively - that is, a heavily patched
  IDLE. It includes my patch and many other improvements made by me and
  my friends.
 
  The improved IDLE is MUCH better than the standard IDLE, especially
  for interactive work.
 
 Could it be that this is a rather subjective judgement? It wouldn't be
 the first time that someone pushing for their personal set of
 functionality changes is overlooking the needs of other user groups.
 
I don't think so, since:
1. These are added features, not functionality changes.
2. There are quite a lot of people using the improved IDLE where I
work, and I never heard anyone saying he prefers the standard IDLE -
on the contrary, many are asking how they can use the improved IDLE at
home.
3. Kurt agreed to integrate the change - he just didn't do it.

  Since we would like to share our work with the
  rest of the world, if nothing is changed we would start a new IDLE
  fork soon, perhaps at python-hosting.com.
 
 I have no problem with this. You might be able to save yourself some
 maintenance work by structuring your version as a set of subclasses
 rather than a set of patches (even if you distribute it as a complete
 working program). Many people have needs that aren't met by standard
 Python; they write their own modules or extensions and distribute
 these independently from Python; your case probably isn't all that
 different.
 
I think that rewriting the patches as subclasses will be a lot of
work, and won't be a very good idea - if you change one line in a
function, copy-pasting it to a subclass and changing the line seems a
little weird to me - not to mention the cases where some refactoring
needs to be done. I think we will be talking about a separate package
- say, idleforklib instead of idlelib. You can always run diff to find
the differences between the two packages.

 Often the needs of certain user groups and the development speeds of
 such 3rd party modules are so different that it simply doesn't make
 sense to fold them in the Python distribution anyway -- consider what
 you would have to do if Kurt accepted your patches: you'll still have
 to wait until Python 2.5 is released before others can benefit from
 your changes, and if you come up with an improvement after that
 release, your next chance will be 18 months later...
 
I don't think so - if IDLE is developed on the Python CVS, we can
still distribute a stand-alone package with IDLE from the CVS head,
for eager people. All others will get the changes a year later, which
isn't that bad. Perhaps it can even be less than a year - since IDLE
is a GUI application and not a library, there isn't a lot of backward
compatibility to maintain, and it seems to me that updated versions
could also be shipped with new minor versions of Python.

The advantages of developing IDLE on the Python CVS are that there is
no need to synchronize two versions, and that it reaches a wider audience. Of course,
after you see the improved IDLE you will surely decide to immediately
import it into the Python CVS, so there's not much of a problem... :)

Noam
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Split MIME headers into multiple lines near a space

2005-06-05 Thread Noam Raphael
On 5/30/05, Nick Coghlan [EMAIL PROTECTED] wrote:
 Noam's suggestion seems reasonable to me, but I'm not
 sure what the performance implications are.

I think that they are not critical. The number of lines can at most
double, because shorter words would not get a line of their own.
The added rfind call seems insignificant to me, since it is preceded
by about log2(n) string encodings that test whether an encoded prefix
fits the required line length.
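
To be concrete, the change amounts to something like this (a sketch,
not the actual email package code):

def split_chunk(line, maxlen):
    # Prefer to break at the last space before maxlen; fall back to a
    # hard break when there is no suitable space.
    pos = line.rfind(' ', 0, maxlen)
    if pos <= 0:
        pos = maxlen
    return line[:pos], line[pos:].lstrip(' ')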

Have a good day,
Noam
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Getting rid of unbound methods: patch available

2005-01-20 Thread Noam Raphael
Hello,

I would like to add another small thing which I encountered this
week, and which seems to follow the same logic as Guido's proposal.

It's about staticmethods. I was writing a class whose pretty-printing
method took, as an argument, a function for converting a value to a
string. I wanted to supply a default function. I
thought that it should be in the namespace of the class, since its
main use lies there. So I made it a staticmethod.

But - alas! After I declared the function a staticmethod, I couldn't
make it a default argument for the method, since there's nothing you
can do with staticmethod instances - they aren't even callable.
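
Here is the problem, boiled down (the names are made up):

class Table(object):
    @staticmethod
    def default_repr(value):
        return repr(value)

    # Inside the class body, default_repr is the staticmethod wrapper,
    # not a plain function, so the default below is the wrapper itself:
    def pprint(self, value, to_str=default_repr):
        return to_str(value)

Table().pprint(42)   # TypeError: 'staticmethod' object is not callable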

The minor solution for this is to make staticmethod objects callable.
This would solve my problem. But I suggest a further step: I suggest
that if this is done, it would be nice if classname.staticmethodname
returned the staticmethod instance, instead of the function itself.
I know that this seems to contradict Guido's proposal, since he
suggests returning the function instead of a strange object, and I
suggest returning a strange object instead of a function. But this is
not true; both follow the idea that class attributes should
be, when possible, the same objects that were created when defining
the class. This is more consistent with the behaviour of modules
(module attributes are the objects that were created when the code was
run), and is more consistent with the general convention, that running
A = B
causes
A == B
to be true. Currently, Class.func = staticmethod(func), and Class.func
= func, don't behave by this rule. If the suggestions are accepted,
both will.

I just think it's simpler and cleaner that way. Just making
staticmethods callable would solve my practical problem too.

Noam Raphael
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Getting rid of unbound methods: patch available

2005-01-20 Thread Noam Raphael
  and is more consistent with the general convention, that running
  A = B
  causes
  A == B
  to be true. Currently, neither Class.func = staticmethod(func) nor
  Class.func = func follows this rule. If the suggestions are accepted,
  both will.
 
 Well, given that attribute assignment can be overloaded, you can't
 depend on that requirement all the time.
 
Yes, I know. For example, I don't know how you can make this work for
classmethods. (Although I have the idea that if nested scopes included
classes, and there were a way to assign names to a different
scope, then there would be no need for them. But I have no idea how
this can be done, so never mind.)

I just think of it as a very common convention, and I don't find the
exceptions aesthetically pleasing. But of course, I accept practical
reasons for not making it that way.

 I recommend that you work around it by setting the default to None and
 substituting the real default in the function.

That's a good idea, I will probably use it. (I thought of a different
way: don't use decorators, and wrap the function in a staticmethod
after defining the function that uses it. But this is really ugly.)
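
For the record, the None-default version looks like this:

class Table(object):
    @staticmethod
    def default_repr(value):
        return repr(value)

    def pprint(self, value, to_str=None):
        if to_str is None:
            # Attribute access goes through the descriptor machinery,
            # so this yields the plain, callable function.
            to_str = self.default_repr
        return to_str(value)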

Thanks for your reply,
Noam
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] An import hook which does nothing

2004-12-02 Thread Noam Raphael
Hello,
I'm currently writing an import hook which will cache information on the 
local disk, instead of retrieving it every time from the slow NFS at
my workplace.
To make sure that I understand how import hooks work, and as a starting 
point, I wrote a dummy import hook, which is supposed to behave like
Python's built-in import mechanism.
I post it here, for two reasons:
1. So that developers who understand the import mechanism better than I 
do may review it and find things which I didn't do exactly right.
2. Because I think other people might find it helpful when writing new 
import hooks, as a starting point as well as for better understanding -- 
there's nothing like a working example to help you figure out what
does what, and where. (Although perhaps a few more comments, in addition to 
those which I wrote, might help...)

(Note: I wrote the DEBUG parts in order to make sure that my importer 
works, because if it fails, imports might be handled by the built-in importer 
and I won't notice. These parts can be removed, of course.)

Do you think that it might be useful? Maybe something like that can go 
into the examples section of the imp module?

Thanks,
Noam Raphael
import imp
import sys

DEBUG = True

if DEBUG:
    myimports = []


class InPackageFinder(object):
    """Find a module/package in a package."""

    def find_module(self, fullname, path=None):
        if path is None:
            # Not in a package - don't handle it here.
            return None

        try:
            f, fn, desc = imp.find_module(fullname[fullname.rfind('.')+1:],
                                          path)
        except ImportError:
            # Well, I don't find it, maybe others will.
            return None

        return Loader(f, fn, desc)


class TopLevelFinder(object):
    """Find a top level module/package."""

    def __init__(self, path):
        self._path = path

    def find_module(self, fullname):
        try:
            f, fn, desc = imp.find_module(fullname, [self._path])
        except ImportError:
            # It is not in this path. Maybe in another one.
            return None

        return Loader(f, fn, desc)


class Loader(object):
    """Load a module/package."""

    def __init__(self, f, fn, desc):
        self._f, self._fn, self._desc = f, fn, desc

    def load_module(self, fullname):
        if DEBUG:
            myimports.append(fullname)

        try:
            return imp.load_module(fullname, self._f, self._fn, self._desc)
        finally:
            if self._f is not None:
                # For packages we have None instead of a file object.
                self._f.close()


def install():
    sys.meta_path.append(InPackageFinder())
    sys.path_hooks.append(TopLevelFinder)
    sys.path_importer_cache.clear()

    if DEBUG:
        # Record the modules imported before the hook was installed,
        # so that checkok() doesn't flag them.
        myimports.extend(sys.modules.iterkeys())

if DEBUG:
    def checkok():
        return not [x for x in sys.modules if
                    sys.modules[x] is not None
                    and hasattr(sys.modules[x], '__file__')
                    and x not in myimports]
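
For completeness, this is how it can be exercised (assuming the code
above is saved as a module named dummyimport - the name is made up):

import dummyimport
dummyimport.install()

import xml.dom.minidom   # should now go through the new finders

print dummyimport.myimports[-3:]
print dummyimport.checkok()   # should print True
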
___
Python-Dev mailing list
[EMAIL PROTECTED]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com