Re: what is the difference between the two kinds of brackets?

2007-10-21 Thread Alex Martelli
James Stroud <[EMAIL PROTECTED]> wrote:
   ...
> > I wonder if it's the philosophical difference between:
> > 
> > "Anything not expressly allowed is forbidden"
> > 
> > and
> > 
> > "Anything not expressly forbidden is allowed"  ?
> > 
> > - Hendrik
> 
> The latter is how I interpret any religious moral code--life is a lot
> more fun that way. Maybe that percolates to how I use python?

FYI, in Security the first approach is also known as "Default Deny", the
second one as "Default Permit".

explains why "default permit" is THE very dumbest one of the "six
dumbest ideas in computer security" which the article is all about.

But then, the needs of Security are often antithetical to everything
else we wish for -- security and convenience just don't mix:-(

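To make the two policies concrete, here is a minimal sketch. The command
names and both helper functions are made up for illustration; they do not
come from the article.

```python
# Hypothetical example: deciding whether a requested command may run.
ALLOWED = {"ls", "cat", "grep"}      # default deny: explicit allowlist
FORBIDDEN = {"rm", "shutdown"}       # default permit: explicit denylist

def default_deny(command):
    """Anything not expressly allowed is forbidden."""
    return command in ALLOWED

def default_permit(command):
    """Anything not expressly forbidden is allowed."""
    return command not in FORBIDDEN

# A command nobody anticipated is refused by one policy, accepted by the other.
print(default_deny("mkfs"), default_permit("mkfs"))
```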

Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: C++ version of the C Python API?

2007-10-21 Thread Alex Martelli
"Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
   ...
> The most popular ones are Boost.Python, CXX, and PySTL.

I think SIP is also pretty popular (see
).


Alex


Re: Appending a list's elements to another list using a list comprehension

2007-10-18 Thread Alex Martelli
Debajit Adhikary <[EMAIL PROTECTED]> wrote:
   ...
> How does "a.extend(b)" compare with "a += b" when it comes to
> performance? Does a + b create a completely new list that it assigns
> back to a? If so, a.extend(b) would seem to be faster. How could I
> verify things like these?

That's what the timeit module is for, but make sure that the snippet
you're timing has no side effects (since it's repeatedly executed).
E.g.:

brain:~ alex$ python -mtimeit -s'z=[1,2,3];b=[4,5,6]' 'a=z[:];a.extend(b)'
100 loops, best of 3: 0.769 usec per loop
brain:~ alex$ python -mtimeit -s'z=[1,2,3];b=[4,5,6]' 'a=z[:];a+=b'
100 loops, best of 3: 0.664 usec per loop
brain:~ alex$ python -mtimeit -s'z=[1,2,3];b=[4,5,6]' 'a=z[:];a.extend(b)'
100 loops, best of 3: 0.769 usec per loop
brain:~ alex$ python -mtimeit -s'z=[1,2,3];b=[4,5,6]' 'a=z[:];a+=b'
100 loops, best of 3: 0.665 usec per loop
brain:~ alex$ 

The repetition of the measurements shows them very steady, so now you
know that += is about 100 nanoseconds faster (on my laptop) than extend
(the reason is: it saves the tiny cost of looking up 'extend' on a; to
verify this, use much longer lists and you'll notice that while overall
times for both approaches increase, the difference between the two
approaches remains about the same for lists of any length).

But the key point to retain is: make sure that the snippet is free of
side effects, so that each of the MANY repetitions that timeit does is
repeating the SAME operation.  If we initialized a in the -s and then
just extended it in the snippet, we'd be extending a list that keeps
growing at each repetition -- a very different operation than extending
a list of a certain fixed starting length (here, serendipitously, we'd
end up measuring the same difference -- but in the general case, where
timing difference between approaches DOES depend on the sizes of the
objects involved, our measurements would instead become meaningless).

Therefore, we initialize in -s an auxiliary list, and copy it in the
snippet.  That's better than the more natural alternative:

brain:~ alex$ python -mtimeit 'a=[1,2,3];a+=[4,5,6]'
100 loops, best of 3: 1.01 usec per loop
brain:~ alex$ python -mtimeit 'a=[1,2,3];a.extend([4,5,6])'
100 loops, best of 3: 1.12 usec per loop
brain:~ alex$ python -mtimeit 'a=[1,2,3];a+=[4,5,6]'
100 loops, best of 3: 1.02 usec per loop
brain:~ alex$ python -mtimeit 'a=[1,2,3];a.extend([4,5,6])'
100 loops, best of 3: 1.12 usec per loop

as in this "more natural alternative" we're also paying each time
through the snippet the cost of building the literal lists; this
overhead (which is a lot larger than the difference we're trying to
measure!) does not DISTORT the measurement but it sure OBSCURES it to
some extent (losing us about one significant digit worth of difference
in this case).  Remember, the WORST simple operation you can do in
measurement is gauging a small number delta as the difference of two
much larger numbers X and X+delta... so, make X as small as feasible to
reduce the resulting loss of precision!-)

You can find more details on commandline use of timeit at
 (see adjacent nodes in Python
docs for examples and details on the more advanced use of timeit inside
your own code) but I hope these indications may be of help anyway.

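For use inside your own code, a minimal sketch of the same measurement
(Python 3 syntax; the repeat and number values are arbitrary choices of
mine, not anything the docs prescribe) might look like this:

```python
import timeit

# The setup builds the auxiliary lists once; each statement copies z first,
# so every repetition extends a list of the same starting length.
setup = "z = [1, 2, 3]; b = [4, 5, 6]"
number = 100000

t_extend = min(timeit.repeat("a = z[:]; a.extend(b)",
                             setup=setup, repeat=3, number=number))
t_iadd = min(timeit.repeat("a = z[:]; a += b",
                           setup=setup, repeat=3, number=number))

print("extend: %.3f usec per loop" % (t_extend / number * 1e6))
print("+=:     %.3f usec per loop" % (t_iadd / number * 1e6))
```

Taking the minimum of the repeats mirrors the "best of 3" that the
command line reports.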

Alex


Re: Best way to generate alternate toggling values in a loop?

2007-10-18 Thread Alex Martelli
Grant Edwards <[EMAIL PROTECTED]> wrote:
   ...
> I like the solution somebody sent me via PM:
> 
> def toggle():
>     while 1:
>         yield "Even"
>         yield "Odd"

I think the itertools-based solution is more elegant:

toggle = itertools.cycle(('Even', 'Odd'))

and use toggle rather than toggle() later; or, just use that
itertools.cycle call inside the expression instead of toggle().

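A short usage sketch (the rows list is made up for illustration):

```python
import itertools

toggle = itertools.cycle(('Even', 'Odd'))

# zip stops when the rows run out, so the endless cycle is harmless here.
rows = ['a', 'b', 'c']
labelled = [(label, row) for label, row in zip(toggle, rows)]
print(labelled)
```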

Alex
 


Re: Inheriting automatic attributes initializer considered harmful?

2007-10-17 Thread Alex Martelli
Andrew Durdin <[EMAIL PROTECTED]> wrote:

> On 10/17/07, Thomas Wittek <[EMAIL PROTECTED]> wrote:
> >
> > Writing such constructors for all classes is very tedious.
> > So I subclass them from this base class to avoid writing these constructors:
> >
> >   class AutoInitAttributes(object):
> >       def __init__(self, **kwargs):
> >           for k, v in kwargs.items():
> >               getattr(self, k) # assure that the attribute exists
> >               setattr(self, k, v)
> >
> > Is there already a standard lib class doing (something like) this?
> > Or is it even harmful to do this?
> 
> It depends on your kwargs and where they're coming from.  You could do
> something like this, for example:
> 
> def fake_str(self):
>     return "not a User"
> 
> u = User(__str__=fake_str)
> str(u)

...and, if you did, that would be totally harmless (in a new-style class
as shown by the OP):

>>> class AutoInitAttributes(object):
...     def __init__(self, **kwargs):
...         for k, v in kwargs.items():
...             getattr(self, k) # assure that the attribute exists
...             setattr(self, k, v)
... 
>>> class User(AutoInitAttributes): pass
... 
>>> def fake_str(self):
...     return "not a User"
... 
>>> u = User(__str__=fake_str)
>>> str(u)
'<__main__.User object at 0x635f0>'
>>> 

fake_str is not called, because special-method lookup occurs on the
TYPE, *NOT* on the instance.
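
The same point in a smaller sketch (Python 3 syntax, where every class is
new-style): setting __str__ on the instance changes nothing for str(),
while setting it on the class does.

```python
class User(object):
    pass

def fake_str(self):
    return "not a User"

u = User()
u.__str__ = fake_str             # lands in the instance dict; str() ignores it
print(str(u) == "not a User")

User.__str__ = fake_str          # set on the type; str() now finds it
print(str(u))
```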

The OP's idea is handy for some "generic containers" (I published it as
the "Bunch" class back in 2001 in
, and I
doubt it was original even then); it's not particularly recommended for
classes that need to have some specific *NON*-special methods, because
then the "overwriting" issue MIGHT possibly be a (minor) annoyance.

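For readers who have not seen it, a sketch along the lines of that
"Bunch" idea (written from memory, not a verbatim copy of the recipe)
could be:

```python
class Bunch(object):
    """Generic container: keyword arguments become attributes."""
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

point = Bunch(x=1, y=2)
point.z = 3                      # attributes can still be added later
print(point.x, point.y, point.z)
```

Unlike the OP's version, this one does not check that the attribute
already exists, which is what makes it a generic container.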

Alex


Re: int to str in list elements..

2007-10-14 Thread Alex Martelli
Abandoned <[EMAIL PROTECTED]> wrote:

> Hi..
> I have a list as a=[1, 2, 3  ] (4 million elements)
> and
> b=",".join(a)
> than
> TypeError: sequence item 0: expected string, int found
> I want to change list to  a=['1','2','3'] but i don't want to use FOR
> because my list very very big.
> I'm sorry my bad english.
> King regards

Try b=','.join(map(str, a)) -- it WILL take up some memory (temporarily)
to build the huge resulting string, but there's no real way to avoid
that.

It does run a bit faster than a genexp with for...:

brain:~ alex$ python -mtimeit -s'a=range(4000*1000)' 'b=",".join(map(str,a))'
10 loops, best of 3: 3.37 sec per loop

brain:~ alex$ python -mtimeit -s'a=range(4000*1000)' 'b=",".join(str(x) for x in a)'
10 loops, best of 3: 4.36 sec per loop

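On a small list, so the result is easy to check, the two spellings look
like this (a minimal sketch):

```python
a = [1, 2, 3]

# str.join accepts only strings, so each element is converted first.
b_map = ",".join(map(str, a))
b_gen = ",".join(str(x) for x in a)

print(b_map)
```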

Alex


Re: Python on imac

2007-10-14 Thread Alex Martelli
Raffaele Salmaso <[EMAIL PROTECTED]> wrote:

> Alex Martelli wrote:
> > I use Mac OSX 10.4 and this assertion seems unfounded -- I can't see any
> > wx as part of the stock Python (2.3.5).  Maybe you mean something else?
> Very old version, see
> /System/Library/Frameworks/Python.framework/Versions/2.3/Extras/lib/python
> /wx-2.5.3-mac-unicode
>

Ah, I see it now, thanks.


Alex


Re: Python on imac

2007-10-14 Thread Alex Martelli
James Stroud <[EMAIL PROTECTED]> wrote:
   ...
> For OS X 10.4, wx has come as part of the stock python install. You may

I use Mac OSX 10.4 and this assertion seems unfounded -- I can't see any
wx as part of the stock Python (2.3.5).  Maybe you mean something else?


Alex


Re: The fundamental concept of continuations

2007-10-13 Thread Alex Martelli
Matthias Benkard <[EMAIL PROTECTED]> wrote:

> continuations.  There used to be a project called Stackless Python that
> tried to add continuations to Python, but as far as I know, it has always
> been separate from the official Python interpreter.  I don't know whether
> it's still alive.  You may want to check http://stackless.com/

Alive and well, but it has removed continuations (which were indeed in
early versions, as per the paper at
).


Alex


Re: Problem of Readability of Python

2007-10-07 Thread Alex Martelli
Steven D'Aprano <[EMAIL PROTECTED]> wrote:

> On Sun, 07 Oct 2007 13:24:14 -0700, Alex Martelli wrote:
> 
> > And yes, you CAN save about 1/3 of those 85 nanoseconds by having
> > '__slots__=["zop"]' in your class A(object)... but that's the kind of
> > thing one normally does only to tiny parts of one's program that have
> > been identified by profiling as dramatic bottlenecks
> 
> Seems to me that:
> 
> class Record(object):
>     __slots__ = ["x", "y", "z"]
> 
> 
> has a couple of major advantages over:
> 
> class Record(object):
>     pass
> 
> 
> aside from the micro-optimization that classes using __slots__ are faster
> and smaller than classes with __dict__.
> 
> (1) The field names are explicit and self-documenting;
> (2) You can't accidentally assign to a mistyped field name without Python
> letting you know immediately.
> 
> 
> Maybe it's the old Pascal programmer in me coming out, but I think 
> they're big advantages.

I'm also an old Pascal programmer (ask anybody who was at IBM in the
'80s who was the most active poster on the TURBO FORUM about Turbo
Pascal, and PASCALVS FORUM about Pascal/VS...), and yet I consider these
"advantages" to be trivial in most cases compared to the loss in
flexibility, such as the inability to pickle (without bothering to code
an explicit __getstate__) and the inability to "monkey-patch" instances
on the fly -- not to mention the bother of defining a separate 'Record'
class for each and every combination of attributes you might want to put
together.
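
A small sketch of the monkey-patching side of that trade-off (class and
attribute names are made up for illustration):

```python
class SlotRecord(object):
    __slots__ = ["x", "y", "z"]

class PlainRecord(object):
    pass

s = SlotRecord()
s.x = 1.0
try:
    s.colour = "red"             # not listed in __slots__: refused
except AttributeError as exc:
    print("SlotRecord:", exc)

p = PlainRecord()
p.x = 1.0
p.colour = "red"                 # a plain instance accepts any attribute
print("PlainRecord:", p.__dict__)
```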

If you REALLY pine for Pascal's records, you might choose to inherit
from ctypes.Structure, which has the additional "advantages" of
specifying a C type for each field and (a real advantage;-) creating an
appropriate __init__ method.

>>> import ctypes
>>> class Record(ctypes.Structure):
...     _fields_ = (('x',ctypes.c_float),('y',ctypes.c_float),('z',ctypes.c_float))
... 
>>> r=Record()
>>> r.x
0.0
>>> r=Record(1,2,3)
>>> r.x
1.0
>>> r=Record('zip','zop','zap')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError:  float expected instead of str instance

See?  You get type-checking too -- Pascal looms closer and closer!-)

And if you need an array of 1000 such Records, just use as the type
Record*1000 -- think of the savings in memory (no indirectness, no
overallocations as lists may have...).
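
A minimal sketch of such an array (the name RecordArray is mine):

```python
import ctypes

class Record(ctypes.Structure):
    _fields_ = (('x', ctypes.c_float),
                ('y', ctypes.c_float),
                ('z', ctypes.c_float))

RecordArray = Record * 1000      # array type holding 1000 Records inline
records = RecordArray()          # zero-initialized

records[0].x = 1.5
print(len(records), records[0].x, records[999].z)
print(ctypes.sizeof(records) == 1000 * ctypes.sizeof(Record))
```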

If I had any real need for such things, I'd probably use a metaclass (or
class decorator) to also add a nice __repr__ function, etc...


Alex


Re: Newbie packages Q

2007-10-07 Thread Alex Martelli
MarkyMarc <[EMAIL PROTECTED]> wrote:
   ...
> > As long as '/python' comes in the list before any other directory that
> > might interfere (by dint of having a Test.py or Test/__init__.py), and
> > in particular in the non-pathological case where there are no such
> > possible interferences, my assertion here quoted still holds.
> >
> > If you're having problems in this case, run with python -v to get
> > information about all that's being imported, print sys.path and
> > sys.modules just before the import statement that you think is failing,
> > and copy and paste all the output here, including the traceback from said
> > failing import.
> >
> > Alex
> 
> OK thank you, with some help from the -v option and debugging I found
> a test package in some package. I now renamed it and load it with
> sys.path.append.
> And now the btest.py works.

Good.

> BUT does this mean I have to set the path too the package in every
> __init__.py class?
> Or have do I tell a subpackage that it is part of a big package ?

The package directory (the one containing __init__.py) must be on some
directory in sys.path, just like a plain something.py module would have
to be in order to be importable.  How you arrange for this is up to you
(I normally install all add-ons in the site-packages directory of my
Python installation: that's what Python's distutils do by default).
As for conflicts in names (of modules and/or packages), they're of course
better avoided than worked around; not naming any module test.py, nor any
package (directory containing an __init__.py) Test, is a good start.

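To see which file a given name would actually resolve to, a sketch like
the following can help. It assumes Python 3.4 or later for
importlib.util.find_spec; the helper where_is is made up for
illustration, and "json" is just a stand-in for your own package name.

```python
import importlib.util
import sys

def where_is(name):
    """Return the file that 'import name' would load, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin

print(sys.path)                  # directories searched, in order
print(where_is("json"))          # path of the package's __init__.py
print("json" in sys.modules)     # whether it has been imported already
```

If the path printed is not the one you expected, some other directory
earlier in sys.path holds a module or package of the same name.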

Alex


Re: Newbie packages Q

2007-10-07 Thread Alex Martelli
MarkyMarc <[EMAIL PROTECTED]> wrote:
   ...
> > > > > And sys.path is  /python/Test/bpack
> >
> > sys.path must be a LIST.  Are you saying you set yours to NOT be a list,
> > but, e.g., a STRING?!  (It's hard to tell, as you show no quotes there).
   ...
> > > I also tried to put /python/ and /python/Test in the sys.path same
> > > result.
> >
> > If the only ITEM in the list that is sys.path is the string '/python',
> > then any Python code you execute will be able to import Test.apack (as
> > well as Test.bpack, or just Test).
> 
> Of course I have more than just the /python string in the sys.path.
> I have a list of paths, depending on which system the code run on.

As long as '/python' comes in the list before any other directory that
might interfere (by dint of having a Test.py or Test/__init__.py), and
in particular in the non-pathological case where there are no such
possible interferences, my assertion here quoted still holds.

If you're having problems in this case, run with python -v to get
information about all that's being imported, print sys.path and
sys.modules just before the import statement that you think is failing,
and copy and paste all the output here, including the traceback from said
failing import.


Alex


Re: Problem of Readability of Python

2007-10-07 Thread Alex Martelli
Licheng Fang <[EMAIL PROTECTED]> wrote:
   ...
> Python Tutorial says an empty class can be used to do this. But if
> namespaces are implemented as dicts, wouldn't it incur much overhead
> if one defines empty classes as such for some very frequently used
> data structures of the program?

Just measure:

$ python -mtimeit -s'class A(object):pass' -s'a=A()' 'a.zop=23'
100 loops, best of 3: 0.241 usec per loop

$ python -mtimeit -s'a=[None]' 'a[0]=23'
1000 loops, best of 3: 0.156 usec per loop

So, the difference, on my 18-months-old laptop, is about 85 nanoseconds
per write-access; if you have a million such accesses in a typical run
of your program, it will slow the program down by about 85 milliseconds.
Is that "much overhead"?  If your program does nothing else except those
accesses, maybe, but then why are you writing that program AT ALL?-)

And yes, you CAN save about 1/3 of those 85 nanoseconds by having
'__slots__=["zop"]' in your class A(object)... but that's the kind of
thing one normally does only to tiny parts of one's program that have
been identified by profiling as dramatic bottlenecks, to shave off the
last few nanoseconds in the very last stages of micro-optimization of a
program that's ALMOST, but not QUITE, fast enough... knowing about such
"extreme last-ditch optimization tricks" is of very doubtful value (and
I think I'm qualified to say that, since I _do_ know many of them...:-).
There ARE important performance things to know about Python, but those
worth a few nanoseconds don't matter much.


Alex


Re: Newbie packages Q

2007-10-07 Thread Alex Martelli
MarkyMarc <[EMAIL PROTECTED]> wrote:
   ...
> > > And sys.path is  /python/Test/bpack

sys.path must be a LIST.  Are you saying you set yours to NOT be a list,
but, e.g., a STRING?!  (It's hard to tell, as you show no quotes there).

> > The 'Test' package is *not* in your sys.path.
> 
> I can say yes to the first:
> The atest.py is in the right dir/package.
> And the third. If it is not good enough that this /python/Test/bpack
> is in the path.
> Then I can not understand the package thing.
> 
> I also tried to put /python/ and /python/Test in the sys.path same
> result.

If the only ITEM in the list that is sys.path is the string '/python',
then any Python code you execute will be able to import Test.apack (as
well as Test.bpack, or just Test).


Alex


Re: weakrefs and bound methods

2007-10-07 Thread Alex Martelli
Mathias Panzenboeck <[EMAIL PROTECTED]> wrote:

> Marc 'BlackJack' Rintsch wrote:
> > ``del b`` just deletes the name `b`.  It does not delete the object.
> > There's still the name `_` bound to it in the interactive interpreter.
> > `_` stays bound to the last non-`None` result in the interpreter.
> 
> Actually I have the opposite problem. The reference (to the bound method)
> gets lost but it shouldn't!

weakrefs to bound methods require some subtlety, see
 (or what
I believe is the better treatment of this recipe in the printed edition
of the Python Cookbook -- of course, being the latter's editor, I'm
biased;-).


Alex


Re: weakrefs and bound methods

2007-10-07 Thread Alex Martelli
Mathias Panzenboeck <[EMAIL PROTECTED]> wrote:
   ...
> I only inserted them so I can see if the objects are really freed. How can
> I see that without a __del__ method?

You can use weakref.ref instances with finalizer functions - see the
long post I just made on this thread for a reasonably rich and complex
example.

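In its simplest form (a sketch; Node and note_freed are made-up names,
Python 3 syntax), the idea is:

```python
import gc
import weakref

class Node(object):
    pass

freed = []

def note_freed(ref):
    # Called once the referent is gone; 'ref' is the now-dead weak reference.
    freed.append(ref)

n = Node()
r = weakref.ref(n, note_freed)
print(r() is n)                  # the object is still alive here

del n
gc.collect()                     # only matters when reference loops exist
print(r() is None, len(freed))
```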

Alex


Re: weakrefs and bound methods

2007-10-07 Thread Alex Martelli
Steven D'Aprano <[EMAIL PROTECTED]> wrote:
   ...
> Without __del__, what should I have done to test that my code was 
> deleting objects and not leaking memory?

See module gc in the Python standard library.


> What should I do when my objects need to perform some special processing
> when they are freed, if I shouldn't use __del__?

The solid, reliable way is:

from __future__ import with_statement

and use module contextlib from the Python standard library (or handcode
an __exit__ method, but that's rarely needed), generating these special
objects that require special processing only in 'with' statements.  This
"resource acquisition is initialization" (RAII) pattern is the RIGHT way
to ensure timely finalization (particularly but not exclusively in
garbage-collected languages, and particularly but not exclusively to
ease portability to different garbage collection strategies -- e.g.,
among CPython and future versions of IronPython and/or Jython that will
support the with statement).
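
A minimal sketch of both routes, in Python 3 syntax (where the
__future__ import is no longer needed); Resource and opened are made-up
names for illustration:

```python
from contextlib import closing, contextmanager

class Resource(object):
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

# closing() calls .close() when the block exits, even on an exception.
with closing(Resource()) as res:
    pass
print(res.closed)

# The same guarantee written by hand as a generator-based context manager.
@contextmanager
def opened():
    r = Resource()
    try:
        yield r
    finally:
        r.close()

with opened() as res2:
    pass
print(res2.closed)
```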

An alternative that will work in pre-2.5 Python (and, I believe but I'm
not sure, in Jython and IronPython _today_) is to rely on the weakref
module of the standard Python library.  If your finalizer, in order to
perform "special processing", requires access to some values that depend
on the just-freed object, you'll have to carefully stash those values
"elsewhere", because the finalizer gets called _after_ the object is
freed (this crucial bit of sequencing semantics is what allows weak
references to work while "strong finalizers" [aka destructors] don't
play well with garbage collection when reference-loops are possible).
E.g., weakref.ref instances are hashable, so you can keep a per-class
dict keyed by them to hold the special values that are needed for
special processing at finalization, and use accessors as needed to make
those special values still look like attributes of instances of your
class.

E.g., consider:

import weakref

class ClosingAtDel(object):
    _xs = {}
    def __init__(self, x):
        self._r = weakref.ref(self, self._closeit)
        self._xs[self._r] = x
    @property
    def x(self):
        return self._xs[self._r]
    @classmethod
    def _closeit(cls, theweakref):
        cls._xs[theweakref].close()
        del cls._xs[theweakref]

This will ensure that .close() is called on the object 'wrapped' in the
instance of ClosingAtDel when the latter instance goes away -- even when
the "going away" is due to a reference loop getting collected by gc.  If
ClosingAtDel had a __del__ method, that would interfere with the garbage
collection.  For example, consider adding to that class the following
test/example code:

class Zap(object):
    def close(self): print 'closed', self

c = ClosingAtDel(Zap())
d = ClosingAtDel(Zap())
print c.x, d.x
# create a reference loop
c.xx = d; d.xx = c
# garbage-collect it anyway
import gc
del c; del d; gc.collect()
print 'done!'

you'll get a pleasant, expected output:

$ python wr.py
<__main__.Zap object at 0x6b430> <__main__.Zap object at 0x6b490>
closed <__main__.Zap object at 0x6b430>
closed <__main__.Zap object at 0x6b490>
done!

Suppose that ClosingAtDel was instead miscoded with a __del__, e.g.:


class ClosingAtDel(object):
    def __init__(self, x):
        self.x = x
    def __del__(self):
        self.x.close()


Now, the same test/example code would emit a desolating...:

$ python wr.py
<__main__.Zap object at 0x6b5b0> <__main__.Zap object at 0x6b610>
done!

I.e., the assumed-to-be-crucial calls to .close() have NOT been
performed, because __del__ inhibits collection of reference-looping
garbage.  _Ensuring_ you always avoid reference loops (in intricate
real-life cases) is basically unfeasible (that's why we HAVE gc in the
first place -- for non-loopy cases, reference counting suffices;-), so
the best strategy is to avoid coding __del__ methods, just as Marc
recommends.

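For readers on Python 3.4 or later, the standard library's
weakref.finalize packages the same weakref-callback idea. The following
is a sketch under that assumption, with Zap changed to record the call
instead of printing so the result is easy to check:

```python
import gc
import weakref

class Zap(object):
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

class ClosingAtDel(object):
    def __init__(self, x):
        self.x = x
        # The finalizer keeps x alive (through the bound method x.close)
        # but holds no strong reference to self.
        self._finalizer = weakref.finalize(self, x.close)

z1, z2 = Zap(), Zap()
c = ClosingAtDel(z1)
d = ClosingAtDel(z2)
c.xx = d; d.xx = c               # create a reference loop
del c; del d
gc.collect()                     # the loop is collected, finalizers run
print(z1.closed, z2.closed)
```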

Alex


Re: Can you please give me some advice?

2007-09-30 Thread Alex Martelli
Byung-Hee HWANG <[EMAIL PROTECTED]> wrote:

> Hi there,
> 
> What is different between Ruby and Python?

Not all that much; Python is more mature, Ruby more fashionable.

> I am wondering what language
> is really mine for work. Somebody tell me Ruby is clean or Python is
> really easy! Anyway I will really make decision today what I have to
> study from now on. What I make the decision is more difficult than to
> know why I have to learn English. Yeah I do not like to learn English
> because it is just very painful..

www.python.or.kr/
http://wiki.python.org/moin/KoreanPythonBooks


Alex


Re: Python 3.0 migration plans?

2007-09-28 Thread Alex Martelli
John Nagle <[EMAIL PROTECTED]> wrote:

> TheFlyingDutchman wrote:
> > It seems that Python 3 is more significant for what it removes than
> > what it adds.
> > 
> > What are the additions that people find the most compelling?
> 
> I'd rather see Python 2.5 finished, so it just works.

And I'd rather see peace on Earth and goodwill among men than _either_
Python 3 or your cherished "finished" 2.5 -- the comparison and implied
tradeoff make about as much sense as yours.

> All the major third-party libraries working and available with
> working builds for all major platforms.  That working set
> of components in all the major distros used on servers.
> The major hosting companies having that up and running on
> their servers.  Windows installers that install a collection
> of components that all play well together.
> 
> That's what I mean by "working".

I.e., you mean tasks appropriate for maintainers of all the major
third-party libraries, distros, and hosting companies -- great, go
convince them, or go convince all warmongers on Earth to make peace if
you want an even harder task with even better potential impact on the
state of the world, then.


Alex


Re: Google and Python

2007-09-24 Thread Alex Martelli
Bryan Olson <[EMAIL PROTECTED]> wrote:
   ...
> > YouTube (one of Google's most valuable properties) is essentially
> > all-Python (except for open-source infrastructure components such as
> > lighttpd).  Also, at Google I'm specifically "Uber Tech Lead, Production
> > Systems": while I can't discuss details, my main responsibilities relate
> > to various software projects that are part of our "deep infrastructure",
> > and our general philosophy there is "Python where we can, C++ where we
> > must". 
> 
> Good motto. So is most of Google's code base now in
> Python? About what is the ratio of Python code to C++
> code? Of course lines of code is kind of a bogus measure.
> Of all those cycles Google executes, about what portion
> are executed by a Python interpreter?

I don't have those numbers at hand, and if I did they would be
confidential: you know that Google doesn't release many numbers at all
about its operations, most particularly not about our production
infrastructure (not even, say, how many servers we have, in how many data
centers, with what bandwidth, and so on).

Still, I wouldn't say that "most" of our codebase is in Python: there's
a lot of Java, a lot of C++, a lot of Python, a lot of Javascript (which
may not correspond to all that many "cycles Google executes" since the
main point of coding in Javascript is having it execute in the user's
browser, of course, but it's still code that gets developed, debugged,
deployed, maintained), and a lot of other languages including ones that
Google developed in-house such as
 .


> > Python is definitely not "just a tiny little piece" nor (by a
> > long shot) used only for "scripting" tasks; 
> 
> Ah, sorry. I meant the choice of scripting language was
> a tiny little piece of Google's method of operation.

In the same sense in which other such technology choices (C++, Java,
what operating systems, what relational databases, what http servers,
and so on) are similarly "tiny pieces", maybe.  Considering the number
of technology choices that must be made, plus the number of other
choices that aren't directly about technology but, say, about
methodology (style guides for each language in use, mandatory code
reviews before committing to the shared codebase, release-engineering
practices, standards for unit-tests and other kinds of tests, and so on,
and so forth), one could defensibly make a case that each and every such
choice must of necessity be "but a tiny little piece" of the whole.

> "Scripting language" means languages such as Python,
> Perl, and Ruby.

A widespread terminology, but nevertheless a fundamentally bankrupt one:
when a language is used to develop an application, it's very misleading
to call it a "scripting language", as it implies that it's instead used
only to "script" something else.  When it comes time to decide which mix
of languages to use to develop a new application, it's important to
avoid being biased by having tagged some languages as "scripting" ones,
some (say Java) as "application" ones, others yet (say C++) as "system"
ones -- the natural subconscious process would be to say "well I'm
developing an X, I should use an X language, not a Y language or a Z
language", which is most likely to lead to wrong choices.


> > if the mutant space-eating
> > nanovirus should instantly stop the execution of all Python code, the
> > powerful infrastructure that has been often described as "Google's
> > secret weapon" would seize up.
> 
> And the essence of the Google way is to employ a lot of
> smart programmers to build their own software to run on
> Google's infrastructure. Choice of language is trivia.

No, it's far from trivial, any more than choice of operating system, and
so on.  Google is a technology company: exactly which technologies to
use and/or develop for the various necessary tasks, far from being
trivial, is the very HEART of its operation.

Your ludicrous claim is similar to saying that the essence of a certain
hedge fund is to employ smart traders to make a lot of money by
sophisticated trades (so far so reasonable) and (here comes the idiocy)
"choice of currencies and financial instruments is trivia" (?!?!?!) --
it's the HEART of such a fund, to pick and choose which positions to
build, unwind, or sell-on, and which (e.g.) currencies should be
involved in such positions is obviously *crucial*, one of the many
important decisions those "smart traders" make every day, and far from
the least important of the many.  And similarly, OF COURSE, for choices
of technologies (programming languages very important among those) for a
technology company, just like, say, what horticultural techniques and
chemicals to employ would be for a company whose "essence" was
cultivating artichokes for sale on the market, and so on.


> I think both Python and Google are great. What I find
> ludicrous is the idea that the bits one hears about how
> Google builds its software make a case for how others
>

Re: Google and Python

2007-09-20 Thread Alex Martelli
Bryan Olson <[EMAIL PROTECTED]> wrote:
   ...
> TheFlyingDutchman asked of someone:
> > Would you know what technique the custom web server uses
> > to invoke a C++ app 
> 
> No, I expect he would not know that. I can tell you
> that GWS is just for Google, and anyone else is almost
> certainly better off with Apache.

Or lighttpd, like YouTube (cfr
).


> How does Google use Python? As their scripting-language
> of choice. A fine choice, but just a tiny little piece.
> 
> Maybe Alex will disagree with me. In my short time at
> Google, I was uber-nobody.

YouTube (one of Google's most valuable properties) is essentially
all-Python (except for open-source infrastructure components such as
lighttpd).  Also, at Google I'm specifically "Uber Tech Lead, Production
Systems": while I can't discuss details, my main responsibilities relate
to various software projects that are part of our "deep infrastructure",
and our general philosophy there is "Python where we can, C++ where we
must".  Python is definitely not "just a tiny little piece" nor (by a
long shot) used only for "scripting" tasks; if the mutant space-eating
nanovirus should instantly stop the execution of all Python code, the
powerful infrastructure that has been often described as "Google's
secret weapon" would seize up.

The internal web applications needed to restore things, btw, would seize
up too; as I already said I can't give details of the ones I'm
responsible for (used by Google's network specialists, reliability
engineers, hardware technicians, etc), but Guido did manage to get
permission to talk about his work, Mondrian
() -- that's what we all use to review code, whatever language it's in,
before it can be submitted to the Google codebase (code reviews are a
mandatory step of development at Google).  Internal web applications are
the preferred way at Google to make any internal functionality
available, of course.


Alex


Re: Using pseudonyms

2007-09-18 Thread Alex Martelli
Aahz <[EMAIL PROTECTED]> wrote:

> For that matter, there are plenty of people who are better known by some
> nickname that is not their legal name.

Yep.  For example, some people whose legal name is "Alessandro" (which
no American is ever going to be able to spell right -- ONE L, TWO S's,
NOT an X or a J instead, "DRO" ending rather than "DER", etc), might
choose to avoid the hassle and go by "Alex" (just to make up a case...).


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: super() doesn't get superclass

2007-09-17 Thread Alex Martelli
Ben Finney <[EMAIL PROTECTED]> wrote:

> Am I mistaken in thinking that "superclass of foo" is equivalent to
> "parent class of foo"? If so, I'd lay heavy odds that I'm not alone in
> that thinking.

"That thinking" (confusing "parent" with "ancestor") makes sense only
(if at all) in a single-inheritance world.  Python's super() exists to
support MULTIPLE inheritance.

In general, "a superclass of foo" means "a class X such that foo is a
subclass of X" and thus applies to all parents, all parents of parents,
and so on ("issubclass" does NOT mean "is a DIRECT AND IMMEDIATE
subclass", but "is a subclass"; check the Python builtin function of
that name).
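
A minimal sketch of that distinction, in Python 3 syntax; the classes A, B
and C are hypothetical and not from the thread. issubclass accepts any
ancestor, while __bases__ lists only the direct parents:

```python
class A(object):
    pass

class B(A):
    pass

class C(B):
    pass

# issubclass means "is a subclass", not "is a DIRECT subclass":
# A is a superclass of C even though it is not a parent of C.
print(issubclass(C, A))

# __bases__ holds only the direct parents of a class.
print(C.__bases__ == (B,))

# The method resolution order lists every superclass super() may visit.
print([k.__name__ for k in C.__mro__])
```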


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: can Python be useful as functional?

2007-09-17 Thread Alex Martelli
Rustom Mody <[EMAIL PROTECTED]> wrote:

> Can someone help? Heres the non-working code
> 
> def si(l):
>     p = l.next()
>     yield p
>     (x for x in si(l) if x % p != 0)
> 
> There should be an yield or return somewhere but cant figure it out

Change last line to

for x in (x for x in si(l) if x % p != 0): yield x

if you wish.
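
For reference, a self-contained sketch of the corrected generator in
Python 3 syntax (next(l) instead of l.next()).  Feeding it
itertools.count(2) is my assumption about the intended use; note that
every number consumed adds one level of generator nesting, so this is
only suitable for short runs:

```python
from itertools import count, islice

def si(numbers):
    p = next(numbers)
    yield p
    # Re-yield everything the nested sieve produces that p does not divide.
    for x in (x for x in si(numbers) if x % p != 0):
        yield x

first_primes = list(islice(si(count(2)), 10))
print(first_primes)
```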


Alex

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: how to join array of integers?

2007-09-16 Thread Alex Martelli
Paul Rudin <[EMAIL PROTECTED]> wrote:
   ...
> Isn't it odd that the generator isn't faster, since the comprehension
> presumably builds a list first and then iterates over it, whereas the
> generator doesn't need to make a list?

The generator doesn't, but the implementation of join then does
(almost).  See Objects/stringobject.c line 1745:

seq = PySequence_Fast(orig, "");

As per ,
"""
PyObject* PySequence_Fast(PyObject *o, const char *m)

Return value: New reference.

Returns the sequence o as a tuple, unless it is already a tuple or list,
in which case o is returned. Use PySequence_Fast_GET_ITEM() to access
the members of the result. Returns NULL on failure. If the object is not
a sequence, raises TypeError with m as the message text.
"""

If orig is neither a list nor a tuple, but for example a generator,
PySequence_Fast builds a list from it (even though its docs which I just
quoted say it builds a tuple -- building the list is clearly the right
choice, so I'd say it's the docs that are wrong, not the code;-)... so
in this particular case the usual advantage of the generator disappears.

PySequence_Fast is called in 13 separate spots in 8 C files in the
Python 2.5 sources, so there may be a few more surprises like this;-).


Alex
 
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Just bought Python in a Nutshell

2007-09-15 Thread Alex Martelli
7stud <[EMAIL PROTECTED]> wrote:

> Used copies of computer books for out of date editions are always
> cheap.  "Python in a Nutshell (2nd ed)" is a reference book with a
> frustratingly poor index--go figure.  It also contains errors not
> posted in the errata.

You can always enter errata at
 and thus help
all future readers of the book (if your errata are confirmed to be
valid).  Vague mentions of "errors not posted in the errata" are far
less useful (and unconfirmed, too).


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: "once" assigment in Python

2007-09-14 Thread Alex Martelli
Lorenzo Di Gregorio <[EMAIL PROTECTED]> wrote:

> When employing Python it's pretty straightforward to translate the
> instance to an object.
> 
> instance = Component(input=wire1,output=wire2)
> 
> Then you don't use "instance" *almost* anymore: it's an object which
> gets registered with the simulator kernel and gets called by reference
> and event-driven only by the simulator kernel.  We might reuse the
> name for calling some administrative methods related to the instance
> (e.g. for reporting) but that's a pretty safe thing to do.  Of course
> all this can be done during initialization, but there are some good
> reasons (see Verilog vs VHDL) why it's handy to be able to do it
> *anywhere*.  The annoying problem was that every time the program flow
> goes over the assignment, the object gets recreated.

If you originally set, e.g.,

  instance = None

then using in your later code:

  instance = instance or Component(...)

will stop the multiple creations.  Other possibilities include using a
compound name (say an.instance where 'an' is an instance of a suitable
container class) and overriding the __new__ method of class Component so
that it will not return multiple distinct objects with identical
attributes.  "Has this *plain* name ever been previously assigned to
anything at all" is simply not a particularly good condition to test for
(you COULD probably write a decorator that ensures that all
uninitialized local variables of a function are instead initialized to
None, but I'd DEFINITELY advise against such "black magic").
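
A minimal sketch of the first suggestion, in Python 3 syntax.  The
Component class here is a hypothetical stand-in (with a creation counter
added purely for illustration), not the poster's real class:

```python
class Component(object):
    """Hypothetical stand-in for the simulator component."""
    created = 0

    def __init__(self, **wires):
        Component.created += 1
        self.wires = wires

instance = None  # set once, before the code that may run repeatedly

for _ in range(3):
    # Only the first pass builds a Component; later passes reuse it.
    instance = instance or Component(input="wire1", output="wire2")

print(Component.created)
```

This relies on a Component instance being true in a boolean context; if
the class defines __len__ or a truth-value method that can return false,
an explicit "if instance is None:" test is the safer spelling.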


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python 3K or Python 2.9?

2007-09-13 Thread Alex Martelli
TheFlyingDutchman <[EMAIL PROTECTED]> wrote:

> > >>> Foo.bar(foo, "spam")
> > >>> foo.bar("spam")
> 
> That looks like a case of "There's more than one way to do it". ;)
> The first form is definitely consistent with the
> method declaration, so there's a lot to be said for using that style
> when teaching people to make classes -> send self, receive self.

On the other hand, the first form, Foo.bar(foo, "spam"), is not
polymorphic: it doesn't allow for foo to be an instance of some OTHER
class (possibly subclassing Foo and overriding bar) -- it will call the
Foo version of bar anyway.

type(foo).bar(foo, "spam") *IS* almost semantically equivalent to the
obviously simpler foo.bar("spam") -- but doesn't cover the possibility
for foo to do a *per-instance* override of 'bar'.

getattr(foo, 'bar', functools.partial(type(foo).bar, foo))("spam") is
getting closer to full semantic equivalence.  And if you think that's
"another OBVIOUS way of doing it" wrt foo.bar("spam"), I think your
definition of "obvious" may need a reset (not to mention the fact that
the "equivalent" version is way slower;-).

Foo.bar(foo, "spam")'s different semantics are important when any
implementation of type(foo).bar (or other method yet) wants to BYPASS
polymorphism to redirect part of the functionality to a specific type's
implementation of bar ('super' may help in some cases, but it keeps some
polymorphic aspects and pretty often you just want to cut all
polymorphism off and just redirect to ONE specific implementation).
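
A small sketch of the differences among these spellings, in Python 3
syntax.  The subclass Sub and the per-instance override are hypothetical
additions for illustration only:

```python
import functools

class Foo(object):
    def bar(self, arg):
        return "Foo.bar " + arg

class Sub(Foo):
    def bar(self, arg):
        return "Sub.bar " + arg

foo = Sub()

# Explicit class: polymorphism is bypassed, Foo's version always runs.
explicit = Foo.bar(foo, "spam")
# Normal call: dispatches on the actual type of foo.
normal = foo.bar("spam")
# Lookup on the type: polymorphic, but blind to per-instance overrides.
via_type = type(foo).bar(foo, "spam")

# A per-instance override, stored in the instance's own __dict__.
foo.bar = lambda arg: "instance bar " + arg
after_override = foo.bar("spam")
via_type_after = type(foo).bar(foo, "spam")
# The getattr form finds the instance attribute first.
via_getattr = getattr(foo, 'bar',
                      functools.partial(type(foo).bar, foo))("spam")

print(explicit, normal, via_type, sep=" | ")
print(after_override, via_type_after, via_getattr, sep=" | ")
```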


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: newbie: self.member syntax seems /really/ annoying

2007-09-12 Thread Alex Martelli
Carl Banks <[EMAIL PROTECTED]> wrote:
   ...
> How about this?  The decorator could generate a bytecode wrapper that
> would have the following behavior, where __setlocal__ and
> __execute_function__ are special forms that are not possible in
> Python.  (The loops would necessarily be unwrapped in the actual
> bytecode.)

I'm not entirely sure how you think those "special forms" would work.

Right now, say, if the compiler sees somewhere in your function
z = 23
print z
it thereby knows that z is a local name, so it adds a slot to the
function's locals-array, suppose it's the 11th slot, and generates
bytecode for "LOAD_FAST 11" and "STORE_FAST 11" to access and bind that
'z'.  (The string 'z' is stored in f.func_code.co_varnames but is not
used for the access or storing, just for debug/reporting purposes; the
access and storing are very fast because they need no lookup).

If instead it sees a "print z" with no assignment to name z anywhere in
the function's body, it generates instead bytecode "LOAD_GLOBAL `z`"
(where the string `z` is actually stored in f.func_code.co_names).  The
string (variable name) gets looked up in dict f.func_globals each and
every time that variable is accessed or bound/rebound.

If the compiler turns this key optimization off (because it sees an exec
statement anywhere in the function, currently), then the bytecode it
generates (for variables it can't be sure are local, but can't be sure
otherwise either as they MIGHT be assigned in that exec...) is different
again -- it's LOAD_NAME (which is like LOAD_GLOBAL in that it does need
to look up the variable name string, but often even slower because it
needs to look it up in the locals and then also in the globals if not
currently found among the locals -- so it may often have to pay for two
lookups, not just one).
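
The local-versus-global decision can be inspected with the dis module.
This is a sketch in Python 3 syntax, where f.func_code is spelled
f.__code__; the two functions are hypothetical, and the exact opcode
names printed vary between CPython versions:

```python
import dis

def uses_local():
    z = 23          # the assignment makes z a local name
    return z

def uses_global():
    return z        # no assignment in the body: z is looked up as a global

def opnames(func):
    """Return the names of the bytecode operations in func, in order."""
    return [ins.opname for ins in dis.get_instructions(func)]

print(uses_local.__code__.co_varnames)   # names held in local slots
print(uses_global.__code__.co_names)     # names looked up by string
print(opnames(uses_local))
print(opnames(uses_global))
```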

So it would appear that to make __setlocal__ work, among other minor
revolutions to Python's code objects (many things that are currently
tuples, built once and for all by the compiler at def time, would have
to become lists so that __setlocal__ can change them on the fly), all
the LOAD_GLOBAL occurrences would have to become LOAD_NAME instead (so,
all references to globals would slow down, just as they're slowed down
today when the compiler sees an exec statement in the function body).
Incidentally, Python 3.0 is moving the OTHER way, giving up the chore of
dropping optimization to support 'exec' -- the latter will become a
function instead of a statement and the compiler will NOT get out of its
way to make it work "right" any more; if LOAD_NAME remains among Python
bytecodes (e.g. it may remain in use for class-statement bodies) it
won't be easy to ask the compiler to emit it instead of LOAD_GLOBAL (the
trick of just adding "exec 'pass'" will not work any more;-).

So, "rewriting" the bytecode on the fly (to use LOAD_NAME instead of
LOAD_GLOBAL, despite the performance hit) seems to be necessary; if
you're willing to take those two performance hits (at decoration time,
and again each time the function is called) I think you could develop
the necessary bytecode hacks even today.

> This wouldn't be that much slower than just assigning local variables
> to locals by hand, and it would allow assignments in the
> straightforward way as well.

The big performance hit comes from the compiler having no clue about
what you're doing (exactly the crucial hint that "assigning local
variables by hand" DOES give the compiler;-)

> There'd be some gotchas, so extra care is required, but it seems like
> for the OP's particular use case of a complex math calculation script,
> it would be a decent solution.

Making such complex calculations even slower doesn't look great to me.

 
> I understand where the OP is coming from.  I've done flight
> simulations in Java where there are lot of complex calculations using
> symbols.  This is a typical formula (drag force calculation) that I
> would NOT want to have to use self.xxx for:
> 
> FX_wind = -0.5 * rho * Vsq * Sref * (C_D_0 + C_D_alphasq*alpha*alpha +
> C_D_esq*e*e)

If ALL the names in every formula always refer to nothing but instance
variables (no references to globals or builtins such as sin, pi, len,
abs, and so on, by barenames) then there might be better tricks, ones
that rely on that knowledge to actually make things *faster*, not
slower.  But they'd admittedly require a lot more work (basically a
separate specialized compiler to generate bytecode for these cases).


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: newbie: self.member syntax seems /really/ annoying

2007-09-12 Thread Alex Martelli
Chris Mellon <[EMAIL PROTECTED]> wrote:
   ...
> This is terrible and horrible, please don't use it. That said,
> presenting the magic implicit_self context manager!

...which doesn't work in functions -- just try changing your global
code:

> with implicit_self(t):
>     print a
>     print b
>     a = 40
>     b = a * 2

into a function and a call to it:

def f():
    with implicit_self(t):
        print a
        print b
        a = 40
        b = a * 2
f()

...even with different values for the argument to _getframe.  You just
can't "dynamically" add local variables to a function, beyond the set
the compiler has determined are local (and those are exactly the ones
that are *assigned to* in the function's body -- no less, no more --
where "assigned to" includes name-binding statements such as 'def' and
'class' in addition to plain assignment statements, of course).
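
A quick way to see this in CPython, sketched in Python 3 syntax; the
name hidden_value is hypothetical and is assumed not to exist as a
global.  Writing into the dict returned by locals() does not create a
real local variable:

```python
def f():
    # The compiler saw no assignment to hidden_value in this body, so the
    # reference further down was compiled as a global lookup; poking the
    # dict returned by locals() cannot change that after the fact.
    locals()['hidden_value'] = 40
    try:
        return hidden_value
    except NameError:
        return "NameError"

result = f()
print(result)
```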

Making, say, 'a' hiddenly mean 'x.a', within a function, requires a
decorator that suitably rewrites the function's bytecode... (after
which, it WOULD still be terrible and horrible and not to be used, just
as you say, but it might at least _work_;-).  Main problem is, the
decorator needs to know the set of names to be "faked out" in this
terrible and horrible way at the time the 'def' statement executes: it
can't wait until runtime (to dynamically determine what's in vars(self))
before it rewrites the bytecode (well, I guess you _could_ arrange a
complicated system to do that, but it _would_ be ridiculously slow).

You could try defeating the fundamental optimization that stands in your
way by adding, say,
exec 'pass'
inside the function-needing-fakeouts -- but we're getting deeper and
deeper into the mire...;-)


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python 3K or Python 2.9?

2007-09-12 Thread Alex Martelli
Chris Mellon <[EMAIL PROTECTED]> wrote:
   ...
> > Actually you could do the "magic first-parameter insertion" just when
> > returning a bound or unbound method object in the function's __get__
> > special method, and that would cover all of the technical issues you
   ...
> This would mean that mixing functions and methods would have to be
> done like you do it in C++, with lots of careful knowledge and
> inspection of what you're working with.

Not particularly -- it would not require anything special that's not
required today.

> What would happen to stuff
> like inspect.getargspec?

It would return the signature of the function, if asked to analyze a
function, and the signature of the method, if asked to analyze a method.
Not exactly rocket science, as it happens.


> Besides, if self isn't in the argument spec, you know that the very
> next thing people will complain about is that it's not implicitly used
> for locals,

Whether 'self' needs to be explicit as a function's first argument, and
whether it needs to be explicit (as a "self." ``prefix'') to access
instance variables (which is what I guess you mean here by "locals",
since reading it as written makes zero sense), are of course separate
issues.

> and I'll punch a kitten before I accept having to read
> Python code guessing if something is a global, a local, or part of
> self like I do in C++.

Exactly: the technical objections that are being raised are bogus, and
the REAL objections from the Python community boil down to: we like it
better the way it is now.  Bringing technical objections that are easily
debunked doesn't _strengthen_ our real case: in fact, it _weakens_ it.
So, I'd rather see discussants focus on how things SHOULD be, rather
than argue they must stay that way because of technical difficulties
that do not really apply.

The real advantage of making 'self' explicit is that it IS explicit, and
we like it that way, just as much as its critics detest it.  Just like,
say, significant indentation, it's a deep part of Python's culture,
tradition, preferences, and mindset, and neither is going to go away (I
suspect, in fact, that, even if Guido somehow suddenly changed his mind,
these are issues on which even he couldn't impose a change at this point
without causing a fork in the community).  Making up weak technical
objections (ones that ignore the possibilities of __get__ or focus on
something so "absolutely central" to everyday programming practice as
inspect.getargspec [!!!], for example;-) is just not the right way to
communicate this state of affairs.


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python 3K or Python 2.9?

2007-09-12 Thread Alex Martelli
Duncan Booth <[EMAIL PROTECTED]> wrote:
   ...
> As for omitting 'self' from method definitions, at first sight you might
> think the compiler could just decide that any 'def' directly inside a
> class could silently insert 'self' as an additional argument. This 
> doesn't work though because not everything defined in a class has to be
> an instance method: static methods don't have a self parameter at all,
> class methods traditionally use 'cls' instead of 'self' as the name of
> the first parameter and it is also possible to define a function inside
> a class block and use it as a function. e.g.

Actually you could do the "magic first-parameter insertion" just when
returning a bound or unbound method object in the function's __get__
special method, and that would cover all of the technical issues you
raise.  E.g.:

> class Weird:
>     def factory(arg):
>         """Returns a function based on its argument"""
> 
>     foo = factory("foo")
>     bar = factory("bar")
>     del factory
> 
> When factory is called, it is a simple function not a method. If it had

Sure, that's because the function object itself is called, not a bound
or unbound method object -- indeed, factory.__get__ never gets called
here.

> class C:
>     def method(self): pass
> 
> and
> 
> def foo(self): pass
> class C: pass
> C.method = foo
> 
> both of these result in effectively the same class (although the second
> one has a different name for the method in tracebacks).

And exactly the same would occur if the self argument was omitted from
the signature and magically inserted when __get__ does its job.

> That consistency really is important. Whenever I see a 'def' I know 
> exactly what parameters the resulting function will take regardless of
> the context.

And this non-strictly-technical issue is the only "true" one.

> Another area to consider is what happens when I do:
> 
> foo = FooClass()
> 
> foo.bar(x)
> # versus
> f = foo.bar
> f(x)
> 
> Both of these work in exactly the same way in Python: the self parameter

And so they would with the "__get__ does magic" rule, NP.

> My point here is that in Python the magic is clearly defined and 
> overridable (so we can have static or class methods that act 
> differently).

And so it would be with that rule, since staticmethod &c create
different descriptor objects.

Really, the one and only true issue is that the Python community doesn't
like "magic".  It would be perfectly feasible, we just don't wanna:-).
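
For readers who have not met __get__: a sketch, in Python 3 syntax
(where unbound method objects no longer exist), of the hook where
binding happens today and where any such "magic insertion" would have to
live.  The class and function names are hypothetical:

```python
class C(object):
    def method(self, x):
        return (self, x)

def foo(self, x):
    return (self, x)

C.other = foo                # a plain function attached after the fact

obj = C()
raw = C.__dict__['method']   # the plain function stored in the class
bound = raw.__get__(obj, C)  # what attribute lookup does behind the scenes

f = obj.method               # the same binding, done implicitly
print(bound(1) == f(1))
```

Calling raw directly never goes through __get__, which is why a function
used inside the class body behaves as a simple function.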


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: concise code (beginner)

2007-09-09 Thread Alex Martelli
bambam <[EMAIL PROTECTED]> wrote:

> > O(n) to find the element you wish to remove and move over
> > everything after it,
> 
> Is that how lists are stored in cPython? It seems unlikely?

So-called "lists" in Python are stored contiguously in memory (more like
"vectors" in some other languages), so e.g. L[n] is O(1) [independent
from n] but removing an element is O(N) [as all following items need to
be shifted 1 place down].


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Using s.sort([cmp[, key[, reverse]]]) to sort a list of objects based on a attribute

2007-09-09 Thread Alex Martelli
Stefan Arentz <[EMAIL PROTECTED]> wrote:

> Miki <[EMAIL PROTECTED]> writes:
> 
> > >   steps.sort(key = lambda s: s.time)
> > This is why attrgetter in the operator module was invented.
> > from operator import attrgetter
> > ...
> > steps.sort(key=attrgetter("time"))
> 
> Personally I prefer the anonymous function over attrgetter :)

However, Python disagrees with you...:

brain:~ alex$ python -mtimeit -s'from operator import attrgetter;
L=map(complex,xrange(999))' 'sorted(L, key=lambda x:x.real)'
1000 loops, best of 3: 567 usec per loop

brain:~ alex$ python -mtimeit -s'from operator import attrgetter;
L=map(complex,xrange(999))' 'sorted(L, key=attrgetter("real"))'
1000 loops, best of 3: 367 usec per loop

A speed-up of 35% is a pretty clear indicator of what _Python_ "prefers"
in this situation:-).
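
A sketch, in Python 3 syntax, confirming that the two key functions give
the same ordering; the short list is an arbitrary example, and the speed
difference is best re-measured with timeit on your own machine:

```python
from operator import attrgetter

L = [complex(i) for i in range(5, 0, -1)]

by_lambda = sorted(L, key=lambda x: x.real)
by_attrgetter = sorted(L, key=attrgetter("real"))

print(by_lambda == by_attrgetter)
```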


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: unexpected behavior: did i create a pointer?

2007-09-09 Thread Alex Martelli
Arnaud Delobelle <[EMAIL PROTECTED]> wrote:
   ...
> > >>> def lower_list(L):
> > ...     for i, x in enumerate(L):
> > ...         L[i] = x.lower()
> > ...
> > >>> s = ['STRING']
> > >>> lower_list(s)
> > >>> print s == ['string']
> > True
> >
> > >>> def lower_string(s):
> > ...     s = s.lower()
> > ...
> > >>> s = "STRING"
> > >>> lower_string(s)
> 
> Let's see what happens here:  when lower_string(s) is called, the 's'
> which is local to lower_string is made to point to the same object as
> the global s (i.e. the string object with value "STRING").  In the
> body of the function, the statement s=s.lower() makes the local 's'
> point to a new string object returned by s.lower().  Of course this has
> no effect on what object the global 's' points to.

Yep, the analogy with C pointers would work fine here:

void lower_string(char* s) {
s = 
}

would fail to have the intended effect in C just as its equivalent does
in Python (in both Python and C, rebinding the local name s has no
effect on the caller of lower_string).  Add an indirection:

void lower_list(item* L) {
   ...
   L[i] = 
}

this indirection (via indexing) *does* modify the memory area (visible
by the caller) to which L points.

The difference between "name=something" and "name[i]=something" is so
*HUGE* in C (and in Python) that anybody who doesn't grok that
difference just doesn't know or understand any C (nor any Python).
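
The same contrast on the Python side, as a self-contained sketch in
Python 3 syntax (the variable names text and items are mine):

```python
def lower_string(s):
    s = s.lower()         # rebinds the local name only

def lower_list(L):
    for i, x in enumerate(L):
        L[i] = x.lower()  # stores into the object the caller also sees

text = "STRING"
lower_string(text)

items = ["STRING"]
lower_list(items)

print(text, items)
```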


> What I think is a more dangerous misconception is to think that the
> assignment operator (=) has the same meaning in C and python.

I've seen the prevalence of that particular misconception drop
dramatically over the years, as a growing fraction of the people who
come to Python after some previous programming experience become more
and more likely to have been exposed to *Java*, where assignment
semantics are very close to Python (despite Java's unfortunate
complication with "unboxed" elementary scalar types, in practice a vast
majority of occurrences of "a=b" in Java have just the same semantics as
they do in Python); teaching Python semantics to people with Java
exposure is trivially easy (moving from "ALMOST every variable is an
implicit reference -- excepting int and float ones" to "EVERY variable
is an implicit reference"...).


Alex
 
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Class design (information hiding)

2007-09-08 Thread Alex Martelli
Gregor Horvath <[EMAIL PROTECTED]> wrote:

> Alexander Eisenhuth schrieb:
> > 
> > I'm wondering how information hiding in Python is meant. As I 
> > understand it, there is no public / protected / private mechanism,
> > but a '_' and '__' naming convention.
> > 
> > As I figured out, only public and private are possible, speaking in the
> > "C++ manner". Are you all happy with it? What does "the zen of python"
> > say to that design? (protected is useless?)
> 
> My favourite thread to this FAQ:
> 
>
> http://groups.google.at/group/comp.lang.python/browse_thread/thread/2c85d6412d9e99a4/b977ed1312e10b21#b977ed1312e10b21

Why, thanks for the pointer -- I'm particularly proud of having written
"""
The only really workable way to develop large software projects, just as
the only really workable way to run a large business, is a state of
controlled chaos.
"""
*before* I had read Brown and Eisenhardt's "Competing on the Edge:
Strategy as Structured Chaos" (at that time I had no real-world interest
in strategically managing a large business -- it was based on mere
intellectual curiosity and extrapolation that I wrote "controlled chaos"
where B & E have "structured chaos" so well and clearly explained;-).

BTW, if you want to read my entire post on that Austrian server, the
most direct URL is

...


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sort of an odd way to debug...

2007-09-04 Thread Alex Martelli
xkenneth <[EMAIL PROTECTED]> wrote:
   ...
> What I'd like to do, is define a base class. This base class would
> have a function, that gets called every time another function is
> called (regardless of whether in the base class or a derived class),
> and prints the doc string of each function whenever it's called. I'd
> like to be able to do this without explicitly specifying the function
> inside all of the other functions of a base class or derived class.

So you need to write a metaclass that wraps every function attribute of
the class into a wrapper performing such prints as you desire.  The
metaclass will be inherited by subclasses (unless metaclass conflicts
intervene in multiple-inheritance situations).

You don't appear to need the printing-wrapper to be a method, and it's
simpler to have it be a freestanding function, such as:

import functools
def make_printing_wrapper(f):
    @functools.wraps(f)
    def wrapper(*a, **k):
        print f.__doc__
        return f(*a, **k)
    return wrapper

Now, the metaclass could be, say:

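import inspect
class MetaWrapFunctions(type):
    def __init__(cls, name, bases, attrs):
        type.__init__(cls, name, bases, attrs)
        for k, f in attrs.iteritems():
            if inspect.isfunction(f):
                # the class object already exists when __init__ runs, so
                # rebind on the class itself: changing attrs has no effect
                setattr(cls, k, make_printing_wrapper(f))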

and the base class:

class Base:
    __metaclass__ = MetaWrapFunctions

Now, the code:

> class Derived(Base):
> """This function prints something"""
> def printSometing(something)
>  #ghost function get's called here
>  print something
> 
> Output would be:
> This function prints something
> something

Should behave as you described.  I have not tested the code I'm
suggesting (so there might be some errors of detail) but the general
idea should work.
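
For anyone trying this on a current interpreter, here is a sketch of the
same idea in Python 3 syntax (metaclass keyword instead of
__metaclass__).  Two things are my assumptions: the docstring is placed
on the method, since that is what gets printed, and the method returns
its argument only to make the behaviour easy to check:

```python
import functools
import inspect

def make_printing_wrapper(f):
    @functools.wraps(f)
    def wrapper(*a, **k):
        print(f.__doc__)
        return f(*a, **k)
    return wrapper

class MetaWrapFunctions(type):
    def __init__(cls, name, bases, attrs):
        super().__init__(name, bases, attrs)
        for k, f in attrs.items():
            if inspect.isfunction(f):
                # The class already exists here, so rebind on the class.
                setattr(cls, k, make_printing_wrapper(f))

class Base(metaclass=MetaWrapFunctions):
    pass

class Derived(Base):
    def print_something(self, something):
        """This function prints something"""
        print(something)
        return something

Derived().print_something("something")
```

Every plain function in the class body gets wrapped, special methods
such as __init__ included, which may or may not be what you want.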


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: list index()

2007-09-04 Thread Alex Martelli
Neil Cerutti <[EMAIL PROTECTED]> wrote:

> It's probable that a simpler implementation using slice
> operations will be faster for shortish lengths of subseq. It was
> certainly easier to get it working correctly. ;)
> 
> def find(seq, subseq):
>   for i, j in itertools.izip(xrange(len(seq)-len(subseq)),
>  xrange(len(subseq), len(seq))):
> if subseq == seq[i:j]:
>   return i
>   return -1

Simpler yet (though maybe slower!-):

def find(seq, subseq):
    L = len(subseq)
    # +1 so that a match at the very end of seq is also tried
    for i in xrange(0, len(seq)-L+1):
        if subseq == seq[i:i+L]: return i
    return -1

also worth trying (may be faster in some cases, e.g. when the first item
of the subsequence occurs rarely in the sequence):

def find(seq, subseq):
    L = len(subseq)
    firstitem = subseq[0]
    # index's stop argument is exclusive, hence the +1
    end = len(seq) - L + 1
    i = -1
    while 1:
        try: i = seq.index(firstitem, i+1, end)
        except ValueError: return -1
        if subseq == seq[i:i+L]: return i

For particularly long sequences (with hashable items) it might even be
worth trying variants of Boyer-Moore, Horspool, or Knuth-Morris-Pratt;
while these search algorithms are mostly intended for text strings,
since you need tables indexed by the item values, using dicts for such
tables might yet be feasible (however, the program won't be quite that
simple).  Benchmarking of various possibilities on typical input data
for your application is recommended, as performance may vary!
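
As a starting point for such benchmarking, a sketch of the index-based
variant in Python 3 syntax with the edge cases made explicit.  Returning
0 for an empty subsequence is my assumed convention, not something
settled in the thread:

```python
def find(seq, subseq):
    """Return the first index where subseq occurs in seq, or -1."""
    L = len(subseq)
    if L == 0:
        return 0                    # assumed convention for an empty subseq
    firstitem = subseq[0]
    end = len(seq) - L + 1          # one past the last possible start
    if end <= 0:
        return -1                   # subseq is longer than seq
    i = -1
    while True:
        try:
            # Jump straight to the next candidate start position.
            i = seq.index(firstitem, i + 1, end)
        except ValueError:
            return -1
        if subseq == seq[i:i + L]:
            return i

print(find([1, 2, 1, 2, 3], [1, 2, 3]))
```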


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Can you use -getattr- to get a function in the current module?

2007-09-03 Thread Alex Martelli
Sergio Correia <[EMAIL PROTECTED]> wrote:

> This works:
> 
> # Module spam.py
> 
> import eggs
> 
> print getattr(eggs, 'omelet')(100)
> 
> That is, I just call the function omelet inside the module eggs and
> evaluate it with the argument 100.
> 
> But what if the function 'omelet' is in the module where I do the
> getattr (that is, in spam.py).  If I do any of this
> 
> print getattr(spam, 'omelet')(100)
> print getattr('','omelet')(100)
> print getattr('omelet')(100)
> 
> It wont work. Any ideas?

globals() returns a dict of all globals defined so far, so, _after_ 'def
omelet ...' has executed, globals()['omelet'](100) should be OK.
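
A minimal sketch in Python 3 syntax, assuming it runs at module level;
the body of omelet is made up for illustration:

```python
def omelet(eggs):
    return "omelet with %d eggs" % eggs

# globals() is the current module's namespace, keyed by name, so the
# lookup only works once the def statement above has executed.
func = globals()['omelet']
print(func(100))
```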


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Adding attributes stored in a list to a class dynamically.

2007-09-02 Thread Alex Martelli
Nathan Harmston <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> Sorry if the subject line of post is wrong, but I think that is what
> this is called. I want to create objects with
> 
> class Coconuts(object):
>     def __init__(self, a, b, *args, **kwargs):
>         self.a = a
>         self.b = b
> 
> def spam( l ):
>     return Coconuts( l.a, l.b, l.attributes )
> 
> l in a parse line of a file which is a tuple wrapped with
> attrcol..with attributes a, b and attributes (which is a list of
> strings in the format key=value ie...
>[ "id=bar", "test=1234", "doh=qwerty" ]  ).
> 
> I want to add attributes to Coconuts so that I can do
> print c.id, c.test, c.doh
> 
> However I'm not sure how to do this:
> 
> how can i assign args, kwargs within the constructor of coconuts and
> how can I deconstruct the list to form the correct syntax to be able
> to be used for args, kwargs.

If you want to pass the attributes list it's simpler to do that
directly, avoiding *a and **k constructs.  E.g.:

  def __init__(self, a, b, attrs):
      self.a = a
      self.b = b
      for attr in attrs:
          name, value = attr.split('=')
          setattr(self, name, value)

You may want to add some better error-handling (this code just raises
exceptions if any item in attrs has !=1 occurrences of the '=' sign,
etc, etc), but I hope this gives you the general idea.

Note that you'll have trouble accessing attributes that just happen to
be named like a Python keyword, e.g. if you have "yield=23" as one of
your attributes you will NOT be able to just say c.yield to get at that
attribute.  Also, I'm assuming it's OK for all of these attributes'
values to be strings, etc, etc.
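
Putting it together as a runnable sketch in Python 3 syntax.  Splitting
on the first '=' only, so that values may themselves contain '=', is my
own variation on the code above, and the sample attributes are made up:

```python
class Coconuts(object):
    def __init__(self, a, b, attrs):
        self.a = a
        self.b = b
        for attr in attrs:
            # Split on the first '=' only, so values may contain '='.
            name, value = attr.split('=', 1)
            setattr(self, name, value)

c = Coconuts(1, 2, ["id=bar", "test=1234", "yield=23"])
print(c.id, c.test)
# c.yield would be a syntax error, since yield is a keyword;
# getattr with the name as a string still reaches the attribute.
print(getattr(c, 'yield'))
```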


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Why is this loop heavy code so slow in Python? Possible Project Euler spoilers

2007-09-02 Thread Alex Martelli
Mark Dickinson <[EMAIL PROTECTED]> wrote:

> On Sep 2, 12:55 pm, [EMAIL PROTECTED] (Alex Martelli) wrote:
> > Mark Dickinson <[EMAIL PROTECTED]> wrote:
> > > Well, for one thing, you're creating half a million xrange objects in
> > > the course of the search.  All the C code has
> > > to do is increment a few integers.
> >
> > I don't think the creation of xrange objects is a meaningful part of
> > Python's execution time here.  Consider:
> > [...]
> 
> Agreed---I just came to the same conclusion after doing some tests.
> So maybe it's the billion or so integer objects being created that
> dominate the running time?  (Not sure how many integer objects
> actually are created here: doesn't Python cache *some* small
> integers?)

Yep, some, say -5 to 100 or thereabouts; it also caches on a free-list
all the "empty" integer-objects it ever has (rather than returning the
memory to the system), so I don't think there's much optimization to be
had on that score either.


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Why is this loop heavy code so slow in Python? Possible Project Euler spoilers

2007-09-02 Thread Alex Martelli
Paul Rubin <http://[EMAIL PROTECTED]> wrote:

> [EMAIL PROTECTED] (Alex Martelli) writes:
> > ...which suggests that creating an xrange object is _cheaper_ than
> > indexing a list...
> 
> Why not re-use the xrange instead of keeping a list around?
> 
> Python 2.4.4 (#1, Oct 23 2006, 13:58:00) 
> >>> a = xrange(3)
> >>> print list(a)
> [0, 1, 2]
> >>> print list(a)
> [0, 1, 2]

Reusing xranges is exactly what my code was doing -- at each for loop
you need an xrange(1, k) for a different value of k, which is why you
need some container to keep them around (and a list of xrange objects is
the simplest applicable container).

Your suggestion doesn't appear to make any sense in the context of the
optimization problem at hand -- what list(...) calls are you thinking
of?!  Please indicate how your suggestion would apply in the context of:

def f3(M=M, solutions=solutions):
    "pull out all the stops"
    xrs = [xrange(1, k) for k in xrange(0, M+1)]
    for a in xrs[M]:
        a2 = a*a
        for b in xrs[M-a]:
            s = a2 + b*b
            for c in xrs[M-a-b]:
                if s == c*c:
                    solutions[a+b+c] += 1


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Google spreadsheets

2007-09-02 Thread Alex Martelli
iapain <[EMAIL PROTECTED]> wrote:

> On Aug 31, 5:40 pm, Michele Simionato <[EMAIL PROTECTED]>
> wrote:
> > I would like to upload a tab-separated file to a Google spreadsheet
> > from Python. Does anybody
> > have a recipe handy? TIA,
> >
> > Michele Simionato
> 
> Probably its irrelevant to python. Use should see Google Spreadsheet
> API and use it in your python application.
> 
> http://code.google.com/apis/spreadsheets/

For Python-specific use, you probably want to get the Python version of
the GData client libraries,
 ; an example of using it
with a spreadsheet is at
 .


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: code check for modifying sequence while iterating over it?

2007-09-02 Thread Alex Martelli
Neal Becker <[EMAIL PROTECTED]> wrote:

> After just getting bitten by this error, I wonder if any pylint, pychecker
> variant can detect this error?

I know pychecker can't (and I doubt pylint can, but I can't download the
latest version to check as logilab's website is temporarily down for
maintenance right now).  It's a very thorny problem to detect a
reasonable subset of likely occurrences of this bug by static analysis
only, i.e., without running the code:-(
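
For readers who have not hit this bug, here is a minimal sketch of it together with one common way around it. It is written for a current Python 3 rather than the Python 2 of this thread, and the function names are made up for the illustration.

```python
def drop_evens_buggy(items):
    """Remove even numbers while iterating over the same list (the bug)."""
    for x in items:
        if x % 2 == 0:
            # Removing shifts later elements left, so the loop's internal
            # index skips whatever slid into the current position.
            items.remove(x)
    return items


def drop_evens_safe(items):
    """Build the filtered list first, then replace the contents in place."""
    items[:] = [x for x in items if x % 2 != 0]
    return items


print(drop_evens_buggy([2, 4, 6, 8]))  # some even numbers survive
print(drop_evens_safe([2, 4, 6, 8]))   # []
```

A static checker would have to prove that the object being mutated is the same one being iterated, which aliasing and calls into other functions make hard in general.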


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: localizing a sort

2007-09-02 Thread Alex Martelli
Ricardo Aráoz <[EMAIL PROTECTED]> wrote:
> Peter Otten wrote:
   ...
> > print ''.join(sorted(a, cmp=lambda x,y: locale.strcoll(x,y)))
> >> aeiouàáäèéëìíïòóöùúü
> > 
> > The lambda is superfluous. Just write cmp=locale.strcoll instead.
> 
> No it is not :
> >>> print ''.join(sorted(a, cmp=locale.strcoll(x,y)))
> Traceback (most recent call last):
>   File "", line 1, in 
> TypeError: strcoll expected 2 arguments, got 0
> 
> You need the lambda to assign both arguments.

No, your mistake is that you're CALLING locale.strcoll, while as Peter
suggested you should just PASS it as the cmp argument.  I.e.,

''.join(sorted('ciao', cmp=locale.strcoll))

Using key=locale.strxfrm should be faster (at least when you're sorting
long-enough lists of strings), which is why strxfrm (and key=...:-)
exist in the first place, but cmp=locale.strcoll, while usually slower,
is entirely correct.  That lambda _IS_ superfluous, as Peter said.
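
A minimal sketch of the two approaches side by side, assuming a current Python 3, where sorted() no longer accepts cmp= and functools.cmp_to_key is the standard adapter. The helper names are invented here, and no setlocale call is made, so the result follows whatever collation locale the program already has in effect.

```python
import locale
from functools import cmp_to_key


def sort_with_strcoll(words):
    """Comparison-based sort: strcoll is called once per comparison."""
    # The function object itself is passed, not the result of calling it.
    return sorted(words, key=cmp_to_key(locale.strcoll))


def sort_with_strxfrm(words):
    """Key-based sort: strxfrm is called once per item."""
    return sorted(words, key=locale.strxfrm)


words = ["pear", "Apple", "fig", "apple"]
print(sort_with_strcoll(words))
print(sort_with_strxfrm(words))
```

Both calls should print the same ordering; the difference is only in how often the locale machinery gets invoked.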


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Why is this loop heavy code so slow in Python? Possible Project Euler spoilers

2007-09-02 Thread Alex Martelli
Mark Dickinson <[EMAIL PROTECTED]> wrote:

> On Sep 2, 9:45 am, [EMAIL PROTECTED] wrote:
> > [snip code]
> >
> > Thanks for that. I realise that improving the algorithm will speed
> > things up. I wanted to know why my less than perfect algorithm was so
> > much slower in python than exactly the same algorithm in C. Even when
> > turning off gcc's optimiser with the -O0 flag, the C version is still
> >
> > > 100 times quicker.
> 
> Well, for one thing, you're creating half a million xrange objects in
> the course of the search.  All the C code has
> to do is increment a few integers.

I don't think the creation of xrange objects is a meaningful part of
Python's execution time here.  Consider:

M = 1000
solutions = [0] * M

def f2():
    "a*a + b*b precalculated"
    for a in xrange(1, M):
        a2 = a*a
        for b in xrange(1, M - a):
            s = a2 + b*b
            for c in xrange(1, M - a - b):
                if s == c*c:
                    solutions[a+b+c] += 1

def f3(M=M, solutions=solutions):
    "pull out all the stops"
    xrs = [xrange(1, k) for k in xrange(0, M+1)]
    for a in xrs[M]:
        a2 = a*a
        for b in xrs[M-a]:
            s = a2 + b*b
            for c in xrs[M-a-b]:
                if s == c*c:
                    solutions[a+b+c] += 1

import time

t = time.time()
f2()
e = time.time()
print e-t, max(xrange(M), key=solutions.__getitem__)

solutions = [0]*M
t = time.time()
f3(M, solutions)
e = time.time()
print e-t, max(xrange(M), key=solutions.__getitem__)


f2 is Arnaud's optimization of the OP's algorithm by simple hoisting; f3
further hoists the xrange creation -- it creates only 1000 such objects
rather than half a million.  And yet...:

brain:~/py25 alex$ python puz.py
34.6613101959 840
36.2000119686 840
brain:~/py25 alex$ 

...which suggests that creating an xrange object is _cheaper_ than
indexing a list...


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: status of Programming by Contract (PEP 316)?

2007-09-01 Thread Alex Martelli
Ricardo Aráoz <[EMAIL PROTECTED]> wrote:
   ...
> >> We should remember that the level
> >> of security of a 'System' is the same as the level of security of it's
> >> weakest component,
   ...
> You win the argument, and thanks you prove my point. You typically
> concerned yourself with the technical part of the matter, yet you
> completely ignored the point I was trying to make.

That's because I don't particularly care about "the point you were
trying to make" (either for or against -- as I said, it's a case of ROI
for different investments [in either security, or, more germanely to
this thread, reliability] rather than of useful/useless classification
of the investments), while I care deeply about proper system thinking
(which you keep failing badly on, even in this post).

> In the third part of your post, regarding security, I think you went off
> the road. The weakest component would not be one of the requisites of
> access, the weakest component I was referring to would be an actual
> APPLICATION,

Again, F- at system thinking: a system's components are NOT just
"applications" (what's the alternative to their being "actual", btw?),
nor is it necessarily likely that an application would be the weakest
one of the system's components (these wrong assertions are in addition
to your original error, which you keep repeating afterwards).

For example, in a system where access is gained *just* by knowing a
secret (e.g., a password), the "weakest component" is quite likely to be
that handy but very weak architectural choice -- or, seen from another
viewpoint, the human beings that are supposed to know that password,
remember, and keep it secret.  If you let them choose their password,
it's too likely to be "fred" or other easily guessable short word; if
you force them to make it at least 8 characters long, it's too likely to
be "fredfred"; if you force them to use length, mixed case and digits,
it's too likely to be "Fred2Fred".  If you therefore decide that
passwords chosen by humans are too weak and generate one for them,
obtaining, say, "FmZACc2eZL", they'll write it down (perhaps on a
post-it attached to their screen...) because they just can't commit to
memory a lot of long really-random strings (and nowadays the poor users
are all too likely to need to memorize far too many passwords).  A
clever attacker has many other ways to try to steal passwords, from
"social engineering" (pose as a repair person and ask the user to reveal
their password as a prerequisite of obtaining service), to keystroke
sniffers of several sorts, fake applications that imitate real ones and
steal the password before delegating to the real apps, etc, etc.

Similarly, if all that's needed is a physical token (say, some sort of
electronic key), that's relatively easy to purloin by traditional means,
such as pickpocketing and breaking-and-entering; certain kinds of
electronic keys (such as the passive unencrypted RFID chips that are
often used e.g. to control access to buildings) are, in addition,
trivially easy to "steal" by other (technological) means.

Refusing to admit that certain components of a system ARE actually part
of the system is weak, blinkered thinking that just can't allow halfway
decent system design -- be that for purposes of reliability, security,
availability, or whatever else.  Indeed, if certain parts of the system's
architecture are OUTSIDE your control (because you can't redesign the
human mind, for example;-), all the more important then to make them the
focus of the whole design (since you must design AROUND them, and any
amelioration of their weaknesses is likely to have great ROI -- e.g., if
you can make the users take a 30-minutes short course in password
security, and accompany that with a password generator that makes
reasonably memorable though random ones, you're going to get substantial
returns on investment in any password-using system's security).

> e.g. an ftp server. In that case, if you have several
> applications running your security will be the security of the weakest
> of them.

Again, false as usual, and for the same reason I already explained: if
your system can be broken by breaking any one of several components,
then it's generally WEAKER than the weakest of the components.  Say that
you're running on the system two servers, an FTP one that can be broken
into by 800 hackers in the world, and a SSH one that can only be broken
into by 300 hackers in the world; unless every single one of the hackers
who are able to break into the SSH server is *also* able to break into
the FTP one (a very special case indeed!), there are now *MORE* than 800
hackers in the world that can break into your system as a whole -- in
other words, again and no matter how often you repeat falsities to the
contrary without a shred of supporting argument, your assertion is
*FALSE*, and in this case your security is *WEAKER* than the security of
the weaker of the two components.
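
The counting argument can be sketched with plain sets. The attacker IDs below are hypothetical stand-ins, chosen so that 100 people can break both servers; written for Python 3.

```python
def attackers_of_system(*component_attackers):
    """Anyone able to break at least one component can break the system."""
    return set().union(*component_attackers)


# Hypothetical attacker IDs: 800 can break FTP, 300 can break SSH,
# and 100 of them happen to be able to break both.
ftp_attackers = set(range(0, 800))
ssh_attackers = set(range(700, 1000))

both = attackers_of_system(ftp_attackers, ssh_attackers)
print(len(ftp_attackers), len(ssh_attackers), len(both))  # 800 300 1000
```

The union stays at 800 only in the special case where the SSH set is entirely contained in the FTP set.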

I do not really much care what point

Re: status of Programming by Contract (PEP 316)?

2007-09-01 Thread Alex Martelli
Russ <[EMAIL PROTECTED]> wrote:
   ...
> > > the inputs. To test the
> > > post-conditions, you just need a call at the bottom of the function,
> > > just before the return,
   ...
> > there's nothing to stop you putting the calls before every return.
> 
> Oops! I didn't think of that. The idea of putting one before every
> return certainly doesn't appeal to me. So much for that idea.

try:
  blah blah with as many return statements as you want
finally:
  something that gets executed unconditionally at the end

You'll need some convention such as "all the return statements are of
the same form ``return result''" (where the result may be computed
differently each time), but that's no different from the conventions you
need anyway to express such things as ``the value that foobar had at the
time the function was called''.
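
A runnable sketch of that convention, assuming Python 3 and non-negative integer input; the function and the `checked` list are made up for the illustration.

```python
checked = []  # records each postcondition check, for illustration only


def integer_sqrt(n):
    """Largest r with r*r <= n; one postcondition check covers all returns."""
    result = None
    try:
        if n < 2:
            result = n
            return result
        result = int(n ** 0.5)
        # Correct any floating-point rounding in either direction.
        while result * result > n:
            result -= 1
        while (result + 1) * (result + 1) <= n:
            result += 1
        return result
    finally:
        # Runs on every exit path; skip the check if an exception left
        # result unset.
        if result is not None:
            assert result * result <= n < (result + 1) ** 2
            checked.append(n)


print(integer_sqrt(17), integer_sqrt(0))  # 4 0
```

Every return statement has the form `return result`, so the single check in the finally clause sees the value about to be returned whichever branch produced it.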


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: status of Programming by Contract (PEP 316)?

2007-09-01 Thread Alex Martelli
Ricardo Aráoz <[EMAIL PROTECTED]> wrote:
   ...
> We should remember that the level
> of security of a 'System' is the same as the level of security of it's
> weakest component,

Not true (not even for security, much less for reliability which is
what's being discussed here).

It's easy to see how this assertion of yours is totally wrong in many
ways...

Example 1: a toy system made up of subsystem A (which has a probability
of 90% of working right) whose output feeds into subsystem B (which has
a probability of 80% of working right).  A's failures and B's failures
are statistically independent (no common-mode failures, &c).

The ``level of goodness'' (probability of working right) of the weakest
component, B, is 80%; but the whole system has a ``level of goodness''
(probability of working right) of just 72%, since BOTH subsystems must
work right for the whole system to do so.  72 != 80 and thus your
assertion is false.

More generally: subsystems "in series" with independent failures can
produce a system that's weaker than its weakest component.
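
The arithmetic of Example 1 as a small Python 3 sketch; the helper name is invented here and independence of failures is assumed, as in the text.

```python
def series_reliability(probabilities):
    """Probability that every stage works, assuming independent failures."""
    result = 1.0
    for p in probabilities:
        result *= p
    return result


print(round(series_reliability([0.9, 0.8]), 2))  # 0.72
```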


Example 2: another toy system made up of subsystems A1, A2 and A3, each
trying to transform the same input supplied to all of them into a 1 bit
result; each of these systems works right 80% of the time, statistically
independently (no common-mode failures, &c).  The three subsystems'
results are reconciled by a simple majority-voting component M which
emits as the system's result the bit value that's given by two out of
three of the Ai subsystems (or, of course, the value given unanimously
by all) and has extremely high reliability thanks to its utter
simplicity (say 99.9%, high enough that we can ignore M's contribution
to system failures in a first-order analysis).

The whole system will fail when all Ai fail together (probability
0.2**3) or when 2 out of them fail while the third one is working
(probability 3*0.8*0.2**2):

>>> 0.2**3+3*0.2**2*0.8
0.10400000000000004

So, the system as a whole has a "level of goodness" (probability of
working right) of almost 90% -- again different from the "weakest
component" (each of the three Ai's), in this case higher.

More generally: subsystems "in parallel" (arranged so as to be able to
survive the failure of some subset) with independent failures can
produce a system that's stronger than its weakest component.
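
Example 2 generalizes to "at least k of n units must work". A sketch under the same assumptions (independent failures, a perfectly reliable voter), using math.comb from Python 3.8 or later; the function name is made up for the illustration.

```python
from math import comb  # Python 3.8+


def k_of_n_reliability(p, n, k):
    """Probability that at least k of n independent units work."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


print(round(k_of_n_reliability(0.8, 3, 2), 3))  # 0.896
```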


Even in the field of security, which (changing the subject...) you
specifically refer to, similar considerations apply.  If your assertion
was correct, then removing one component would never WEAKEN a system's
security -- it might increase it if it was the weakest, otherwise it
would leave it intact.  And yet, a strong and sound tradition in
security is to require MULTIPLE components to be all satisfied e.g. for
access to secret information: e.g. the one wanting access must prove
their identity (say by retinal scan), possess a physical token (say a
key) AND know a certain secret (say a password).  Do you really think
that, e.g., removing the need for the retinal scan would make the
system's security *STRONGER*...?  It would clearly weaken it, as a
would-be breaker would now need only to purloin the key and trick the
secret password out of the individual knowing it, without the further
problem of falsifying a retinal scan successfully.  Again, such security
systems exist and are traditional exactly because they're STRONGER than
their weakest component!


So, the implication accompanying your assertion, that strengthening a
component that's not the weakest one is useless, is also false.  It may
indeed have extremely low returns on investment, depending on system's
structure and exact circumstances, but then again, it may not; nothing
can be inferred about this ROI issue from the consideration in question.


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: list index()

2007-09-01 Thread Alex Martelli
Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote:

> On Sat, 01 Sep 2007 13:44:28 -0600, Michael L Torrie wrote:
> 
> > Alex Martelli wrote:
> > 
> >> is the "one obvious way to do it" (the set(...) is just a simple and
> >> powerful optimization -- checking membership in a set is roughly O(1),
> >> while checking membership in a list of N items is O(N)...).
> > 
> > Depending on a how a set is stored, I'd estimate any membership check in
> > a set to be O(log N).
> 
> Sets are stored as hash tables so membership check is O(1) just like Alex
> said.

"Roughly" O(1), as I said, because of the usual issues with cost of
hashing, potential hashing conflicts, re-hashing (which requires
thinking in terms of *amortized* big-O, just like, say, list appends!),
etc, just like for any hash table implementation (though Python's, long
used and finely tuned in dicts then adopted for sets, is an exceedingly
good implementation, it IS possible to artificially construct a "worst
case" -- e.g., set(23+sys.maxint*i*2+i for i in xrange(24,199))...)
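
The integer expression above depends on how Python 2 hashes ints and longs, so it is not reproduced here. The sketch below forces the same kind of degenerate behavior by a different technique on a current CPython 3: a hypothetical key class whose instances all report the same hash.

```python
class SameHash:
    """Hypothetical key type whose instances all land on one probe chain."""

    eq_calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return 42  # every instance collides with every other

    def __eq__(self, other):
        SameHash.eq_calls += 1
        return self.value == other.value


n = 200
colliding = {SameHash(i) for i in range(n)}

SameHash.eq_calls = 0
found = SameHash(-1) in colliding
print(found, SameHash.eq_calls >= n)  # False True
```

A failed membership test here has to compare against every stored element, which is the O(N) behavior that a well-distributed hash avoids.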


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: list index()

2007-08-31 Thread Alex Martelli
[EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
   ...
> Why wouldn't "the one obvious way" be:
> 
>  def inAnotB(A, B):
>  inA  = set(os.listdir(A))
>  inBs = set(os.listdir(B))
>  return inA.difference(inBs)

If you want a set as the result, that's one possibility (although
possibly a bit wasteful as you're building one more set than necessary);
I read the original request as implying a sorted list result is wanted,
just like os.listdir returns (possibly sorted in case-independent order
depending on the underlying filesystem).  There's no real added value in
destroying inA's ordering by making it a set, when the list
comprehension just "naturally keeps" its ordering.
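
A small Python 3 sketch of the difference, using hypothetical lists of names in place of real os.listdir() results.

```python
def in_a_not_b_list(names_a, names_b):
    """Keep names_a's order; the set is only a fast membership test."""
    in_b = set(names_b)
    return [name for name in names_a if name not in in_b]


def in_a_not_b_set(names_a, names_b):
    """Same members, but the result is an unordered set."""
    return set(names_a) - set(names_b)


# Hypothetical directory listings standing in for os.listdir() results.
names_a = ["zeta.txt", "alpha.txt", "mid.txt"]
names_b = ["alpha.txt", "other.txt"]

print(in_a_not_b_list(names_a, names_b))  # ['zeta.txt', 'mid.txt']
print(in_a_not_b_set(names_a, names_b) == {"zeta.txt", "mid.txt"})  # True
```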


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: status of Programming by Contract (PEP 316)?

2007-08-31 Thread Alex Martelli
Michele Simionato <[EMAIL PROTECTED]> wrote:
   ...
> I would not call that an attack. If you want to see an attack, wait
> for
> Alex replying to you observations about the low quality of code at
> Google! ;)

I'm not going to deny that Google Groups has glitches, particularly in
its user interface (that's why I'm using MacSOUP instead, even though
Groups, were it perfect, would offer me a lot of convenience).

We have a LOT of products (see
, plus a few more at
;
 for an overview,
 for a list of more
lists...), arguably too many in the light of the "It's best to do one
thing really, really well" ``thing we've found to be true''; given the
70-20-10 rule we use (we spend 70% of our resources on search and ads
[and of course infrastructure supporting those;-)], 20% on "adjacent
businesses" such as News, Desktop and Maps, 10% on all the rest
combined), products in the "other" (10%) category may simply not receive
sufficient time, resources and attention.

We've recently officially raised "Apps" to the status of a third pillar
for Google (after Search and Ads), but I don't know which of our many
products are officially within these pillar-level "Apps" -- maybe a good
starting hint is what's currently included in the Premier Edition of
Google Apps, i.e.: Gmail (with 99.9% uptime guarantee), Google Talk,
Google Calendar, Docs & Spreadsheets, Page Creator and Start Page.

I do notice that Google Groups is currently not in that "elite" (but
then, neither are other products we also offer in for-pay editions, such
as Google Earth and Sketchup) but I have no "insider information" as to
what this means or portends for the future (of course not: if I _did_
have insider information, I could not talk about the subject!-).

Notice, however, that none of these points depend on use of Python vs
(or side by side with) other programming languages, DbC vs (or side by
side with) other methodologies, and other such technical and
technological issues: rather, these are strategical problems in the
optimal allocation of resources that (no matter how abundant they may
look on the outside) are always "scarce" compared to the bazillion ways
in which they _could_ be employed -- engineers' time and attention,
machines and networking infrastructure, and so forth.


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: status of Programming by Contract (PEP 316)?

2007-08-31 Thread Alex Martelli
Paul Rubin  wrote:
   ...
> Hi Alex, I'm a little confused: does Production Systems mean stuff
> like the Google search engine, which (as you described further up in
> your message) achieves its reliability at least partly by massive
> redundancy and failover when something breaks?

The infrastructure supporting that engine (and other things), yes.

>  In that case why is it
> so important that the software be highly reliable?  Is a software

Think "common-mode failures": if a program has a bug, so do all
identical copies of that program.  Redundancy works for cheap hardware
because the latter's "unreliability" is essentially free of common-mode
failures (when properly deployed): it wouldn't work against a design
mistake in the hardware units.  Think of the famous Pentium division
bug: no matter how many redundant but identical such buggy CPUs you
place in parallel to compute divisions, in the error cases they'll all
produce the same wrong results.  Software bugs generally work (or,
rather, fail to work;-) similarly to hardware design bugs.
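
A toy Python 3 illustration of why voting across identical copies does not help against a design bug. The buggy function and its single bad input pair are hypothetical and are not a model of the actual Pentium flaw.

```python
from collections import Counter


def majority(results):
    """Return the answer given by most replicas."""
    return Counter(results).most_common(1)[0][0]


def divide_with_design_bug(a, b):
    """Hypothetical unit with a deterministic bug for one input pair."""
    if (a, b) == (10, 2):
        return 4  # wrong every time, on every identical copy
    return a // b


# Three identical buggy replicas: voting cannot help.
print(majority([divide_with_design_bug(10, 2) for _ in range(3)]))  # 4

# One replica glitches independently while two are right: voting helps.
print(majority([5, 4, 5]))  # 5
```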

There are (for both hw and sw) also classes of mistakes that don't quite
behave that way -- "occasional glitches" that are not necessarily
repeatable and are heavily state-dependent ("race conditions" in buggy
multitasking SW, for example; and many more examples for HW, where flaky
behavior may be triggered by, say, temperature situations).  Here, from
a systems viewpoint, you might have a system that _usually_ says that
10/2 is 5, but once in a while says it's 4 instead (as opposed to the
"Pentium division bug" case where it would always say 4) -- this is much
more likely to be caused by flaky HW, but might possibly be caused by
the SW running on it (or the microcode in between -- not that it makes
much of a difference one way or another from a systems viewpoint).

Catching such issues can, again, benefit from redundancy (and
monitoring, "watchdog" systems, health and sanity checks running in the
background, &c).  "Quis custodiet custodes" is an interesting problem
here, since bugs or flakiness in the monitoring/watchdog infrastructure
have the potential to do substantial global harm; one approach is to
invest in giving that infrastructure an order of magnitude more
reliability than the systems it's overseeing (for example by using more
massive and *simple* redundancy, and extremely straightforward
architectures).  There's ample literature in the matter, but it
absolutely needs a *systems* approach: focusing just on the HW, just on
the SW, or just on the microcode in-between;-), just can't help much.

> some good hits they should display) but the server is never actually
> down, can you still claim 100% uptime?

I've claimed nothing (since all such measurements and methodologies
would no doubt be considered confidential unless and until cleared for
publication -- this has been done for a few whitepapers about some
aspects of Google's systems, but never to the best of my knowledge for
the "metasystem" as a whole), but rather pointed to
,
a publicly available site which does publish its methodology (at
); summarizing, as they
have no way to check that the results are "right" for the many sites
they keep an eye on, they rely on the HTTP result codes (as well as
validity of HTTP headers returned, and of course whether the site does
return a response at all).

> problem.  Of course then there's a second level system to manage the
> restarts that has to be very reliable, but it doesn't have to deal
> with much weird concocted input the way that a public-facing internet
> application has to.

Indeed, Production Systems' software does *not* "have to deal" with
input from the general public -- it's infrastructure, not user-facing
applications (except in as much as the "users" are Google engineers or
operators, say).  IOW, it's *exactly* the code that "has to be very
reliable" (nice to see that we agree on this;-), and therefore, if as
you then said "Russ's point stands", would NOT be in Python -- but it
is. So, I disagree about the "standing" status of his so-called "point".
 
> Therefore I think Russ's point stands, that we're talking about a
> different sort of reliability in these highly redundant systems, than
> in the systems Russ is describing.

Russ specifically mentioned *mission-critical applications* as being
outside of Python's possibilities; yet search IS mission critical to
Google.  Yes, reliability is obtained via a "systems approach",
considering HW, microcode, SW, and other issues yet such as power
supplies, cooling units, network cables, etc, not as a single opaque big
box but as an articulated, extremely complex and large system that needs
testing, monitoring, watchdogging, etc, at many levels -- there is no
other real way to make systems reliable (you can't do it by just looking
at components in isolation).  Note t

Re: list index()

2007-08-30 Thread Alex Martelli
Ricardo Aráoz <[EMAIL PROTECTED]> wrote:
   ...
> Alex Martelli wrote:
> > <[EMAIL PROTECTED]> wrote:
> >...
> >> In my case of have done os.listdir() on two directories. I want to see
> >> what files are in directory A that are not in directory B.
> > 
> > So why would you care about WHERE, in the listdir of B, are to be found
> > the files that are in A but not B?!  You should call .index only if you
> > CARE about the position.
> > 
> > def inAnotB(A, B):
> > inA = os.listdir(A)
> > inBs = set(os.listdir(B))
> > return [f for f in inA if f not in inBs]
> > 
> > is the "one obvious way to do it" (the set(...) is just a simple and
> > powerful optimization -- checking membership in a set is roughly O(1),
> > while checking membership in a list of N items is O(N)...).
> 
> And what is the order of passing a list into a set? O(N)+?

Roughly O(N), yes (with the usual caveats about hashing costs, &c;-).
So, when A has M files and B has N, your total costs are roughly O(M+N)
instead of O(M*N) -- a really juicy improvement for large M and N!


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: list index()

2007-08-30 Thread Alex Martelli
<[EMAIL PROTECTED]> wrote:
   ...
> In my case of have done os.listdir() on two directories. I want to see
> what files are in directory A that are not in directory B.

So why would you care about WHERE, in the listdir of B, are to be found
the files that are in A but not B?!  You should call .index only if you
CARE about the position.

def inAnotB(A, B):
    inA = os.listdir(A)
    inBs = set(os.listdir(B))
    return [f for f in inA if f not in inBs]

is the "one obvious way to do it" (the set(...) is just a simple and
powerful optimization -- checking membership in a set is roughly O(1),
while checking membership in a list of N items is O(N)...).


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: status of Programming by Contract (PEP 316)?

2007-08-30 Thread Alex Martelli
Russ <[EMAIL PROTECTED]> wrote:
   ...
> programs." Any idea how much Python is used for flight control systems
> in commercial
> transport aircraft or jet fighters?

Are there differences in reliability requirements between the parts of
such control systems that run on aircraft themselves, and those that run
in airports' control towers?  Because Python *IS* used in the latter
case, cfr  ... if
on-plane control SW requires hard-real-time response, that might be a
more credible reason why Python would be inappropriate (any garbage
collected language is NOT a candidate for hard-real-time SW!) than your
implied aspersions against Python's reliability.

According to
,
Google's current uptime is around 99.99%, with many months at 100% and a
few at 99.98% -- and that's on *cheap*, not-that-reliable commodity HW,
and in real-world conditions where power can go away, network cables can
accidentally get cut, etc.  I'm Uber Tech Lead for Production Systems at
Google -- i.e., the groups I uber-lead are responsible for some software
which (partly by automating things as much as possible) empowers our
wondrous Site Reliability Engineers and network specialists to achieve
that uptime in face of all the Bad Stuff the world can and does throw at
us.  Guess what programming language I'm a well-known expert of...?


> The important question is this: why do I waste my time with bozos like
> you?

Yeah, good question indeed, and I'm asking myself that -- somebody who
posts to this group in order to attack the reliability of the language
the group is about (and appears to be supremely ignorant about its use
in air-traffic control and for high-reliability mission-critical
applications such as Google's "Production Systems" software) might well
be considered not worth responding to.  OTOH, you _did_ irritate me
enough that I feel happier for venting in response;-)

Oh, FYI -- among the many tasks I undertook in my quarter-century long
career was leading and coordinating pilot projects in Eiffel for one
employer, many years ago.  The result of the pilot was that Eiffel and
its DbC features didn't really let us save any of the extensive testing
we performed for C++-coded components, and the overall reliability of
such extensively tested components was not different in a statistically
significant way whether they were coded in C++ or Eiffel; Eiffel did let
us catch a few bugs marginally earlier (but then, I'm now convinced
that, at that distant time, we used by far too few unit-tests for early
bug catching and relied too much on regression and acceptance tests),
but that definitely was _not_ enough to pay for itself.  DbC and
allegedly rigorous compile-time typechecking (regularly too weak due to
Eiffel's covariant vs contravariant approach, btw...), based on those
empirical results, appear to be way overhyped.

A small decorator library supporting DbC would probably be a nice
addition to Python, but it should first prove itself in the field by
being released and supported as an add-on and gaining wide acceptance:
"arguments" such as yours are definitely NOT going to change that.

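To make the idea concrete, here is a rough sketch of what the core of such a decorator might look like in a current Python 3. The name `contract` and its `pre`/`post` parameters are hypothetical; this is not the syntax PEP 316 proposes, and not any existing library's API.

```python
import functools


def contract(pre=None, post=None):
    """Hypothetical decorator: check pre(*args, **kwargs) and post(result)."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if pre is not None:
                assert pre(*args, **kwargs), "precondition failed"
            result = func(*args, **kwargs)
            if post is not None:
                assert post(result), "postcondition failed"
            return result
        return wrapper
    return decorate


@contract(pre=lambda xs: len(xs) > 0, post=lambda r: r >= 0)
def mean_of_squares(xs):
    return sum(x * x for x in xs) / len(xs)


print(mean_of_squares([1, 2, 3]))
```

Because the checks are plain assert statements, running Python with -O strips them, which is one simple way to switch contracts off in production.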

Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What's the difference ?

2007-08-29 Thread Alex Martelli
<[EMAIL PROTECTED]> wrote:
   ...
> Weird. Hetland's book, "Beginning Python" states that it's a matter of
> taste.

If your taste is for more verbose AND slower notation without any
compensating advantage, sure.

> Martelli's "Python Cookbook 2nd Ed." says to use the get()
> method instead as you never know if a key is in the dict.  However, I
> can't seem to find any reference to has_key in his book.

.get is not a direct alternative to ``in'' (it's an alternative to an
idiom where you key into the dict if the key is present and otherwise
supply a default value, and it's MUCH better in that case).  has_key is
probably not even mentioned in the Cookbook (2nd edition) since there is
never a good need for it in the Python versions it covers (2.2 and up),
but you can probably find traces in the 1st edition (which also covered
Python back to 1.5.2, where has_key *was* needed); the Nutshell (2nd ed)
mentions it briefly in a table on p. 60.
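
A short Python 3 sketch of the distinction: .get replaces the "test, then index" idiom, while a bare ``in'' is a pure membership test. The helper names are invented for the illustration.

```python
def lookup_verbose(d, key, default=None):
    """The idiom that .get replaces: test, then index."""
    if key in d:
        return d[key]
    return default


def lookup_with_get(d, key, default=None):
    """Same result via dict.get."""
    return d.get(key, default)


ages = {"ann": 31}
print(lookup_verbose(ages, "bob", 0), lookup_with_get(ages, "bob", 0))  # 0 0
print("ann" in ages)  # True: a pure membership test, no value needed
```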


> According to Chun in "Core Python Programming", has_key will be
> obsoleted in future versions of Python, so he recommends using "in" or
> "not in".

Yes, we're removing has_key in Python 3.0 (whose first alpha will be out
reasonably soon, but is not going to be ready for production use for
quite a bit longer), among other redundant things that exist in 2.* only
for legacy and backwards compatibility reasons.  This makes 3.0 simpler
(a little closer to the "only one obvious way" ideal).

But you should use ``in'' and ``not in'' anyway, even if you don't care
about 3.* at all, because they only have advantages wrt has_key, without
any compensating disadvantage.


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What's the difference ?

2007-08-29 Thread Alex Martelli
Alex <[EMAIL PROTECTED]> wrote:

> Hye,
> 
> I was just wondering what is the difference between
> 
> >> if my_key in mydict:
> >> ...
> 
> and
> 
> >> if mydict.has_keys(my_key):

Mis-spelled (no final s in the method name).

> >> ...
> 
> I've search a bit in the python documentation, and the only things I
> found was that they are "equivalent".

Semantically they are, but `in' is faster, more concise, & readable.


> But in this (quiet old) sample ( "http://aspn.activestate.com/ASPN/
> Cookbook/Python/Recipe/59875" ), there is difference between the two
> notation.

What that example is pointing to as "wrong way" is a NON-equivalent
approach that's extremely slow:
  if my_key in mydict.keys():

The call to keys() takes time and memory to build a list of all keys,
after which the ``in'' operator, having a list as the RHS operand, is
also quite slow (O(N), vs O(1)!).  So, never use that useless and silly
call to keys() in this context!
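
A sketch of the two spellings for a current Python 3. There, keys() returns a view whose membership test is also a hash lookup, so list(d) is used below to mimic the list that Python 2's keys() built; the function names are made up.

```python
def has_key_fast(d, key):
    """Hash lookup on the dict itself."""
    return key in d


def has_key_slow(d, key):
    """Builds a list of all keys, then scans it linearly."""
    return key in list(d)  # mimics Python 2's `key in d.keys()`


d = dict.fromkeys(range(1000))
print(has_key_fast(d, 999), has_key_slow(d, 999))  # True True
print(hasattr(d, "has_key"))  # False on Python 3, where has_key was removed
```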


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to replace a method in an instance.

2007-08-27 Thread Alex Martelli
Bruno Desthuilliers <[EMAIL PROTECTED]>
wrote:

> >>> Of course, a function in a
> >>> class is also know as a method.
> >> Less obvious but still wrong !-)
> > 
> > I wish the authors of the Python books would get a clue then.
> 
> I'd think that at least some authors of some Python books would explain
> all this much better than I did. But FWIW, all these rules are clearly
> documented in the Fine Manual.

Speaking as one such author, I think I do a reasonable job of this in
"Python in a Nutshell" (2nd ed): on p. 82 and 85 I have brief mentions
that "class attributes bound to functions are also known as methods of
the class" (p.82) and again that "functions (called methods in this
context) are important attributes for most class objects" (p.85); on
p.91-94, after explaining descriptors, instances, and the basics of
attribute reference, I can finally cover the subject thoroughly in
"Bound and Unbound Methods".  I realize that a beginner might be
confused into believing that "class attributes bound to functions" means
"function in a class", if they stop reading before p.91;-), but I don't
literally make that wrong assertion...;-)


Alex


Re: How to free memory ( ie garbage collect) at run time with Python 2.5.1(windows)

2007-08-27 Thread Alex Martelli
rfv-370 <[EMAIL PROTECTED]> wrote:

> have made the following small test:
> 
> Before starting my test my UsedPhysicalMemory(PF): 555Mb
> 
> >>> tf=range(0,1000)     PF: 710Mb (so 155Mb for my List)
> >>> tf=[0,1,2,3,4,5]     PF: 672Mb (Why? Why the remaining 117Mb is not freed?)
> >>> del tf               PF: 672Mb (unused memory not freed)

Integer objects that are once generated are kept around in a "free list"
against the probability that they might be needed again in the future (a
few other types of objects similarly keep a per-type free-list, but I
think int is the only one that keeps an unbounded amount of memory
there).  Like any other kind of "cache", this free-list (in normal
cases) hoards a bit more memory than needed, but results in better
runtime performance; anomalous cases like your example can however
easily "bust" this too-simple heuristic.

> So how can I force Python to clean the memory and free the memory that
> is not used?

On Windows, with Python 2.5, I don't know of a good approach (on Linux
and other Unix-like systems I've used a strategy based on forking, doing
the bit that needs a bazillion ints in the child process, ending the
child process; but that wouldn't work on Win -- no fork).

I suggest you enter a feature request to let gc grow a way to ask every
type object to prune its cache, on explicit request from the Python
program; this will not solve the problem in Python 2.5, but work on 3.0
is underway and this is just the right time for such requests.


Alex


Re: ANN: SCF released GPL

2007-08-26 Thread Alex Martelli
hg <[EMAIL PROTECTED]> wrote:
   ...
> I am looking for a free subversion server resource to put the code ...
> if you know of any.

Check out code.google.com -- it has a hosting service for open source
code, too, these days (and it IS subversion).


Alex


Re: beginner, idiomatic python

2007-08-26 Thread Alex Martelli
bambam <[EMAIL PROTECTED]> wrote:
   ...
> Bags don't seem to be built in to my copy of Python, and

A "bag" is a collections.defaultdict(int) [[you do have to import
collections -- it's in the standard library, NOT built-in]].
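
A minimal sketch of using it as a bag (the sample words are just an
illustration):

```python
import collections

bag = collections.defaultdict(int)
for word in "the cat and the hat and the bat".split():
    bag[word] += 1      # a missing key starts out at int() == 0

print("%d %d %d" % (bag["the"], bag["and"], bag["cat"]))
```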


Alex


Re: beginner, idiomatic python

2007-08-26 Thread Alex Martelli
bambam <[EMAIL PROTECTED]> wrote:

> Is it safe to write
> 
> A = [x for x in A if x in U]
> 
> or is that undefined? I understand that the slice operation

It's perfectly safe and well-defined, as the assignment rebinds the LHS
name only AFTER the RHS list comprehension is done.
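
A small sketch that makes the order of events visible (the values, and
the extra name "original", are only for illustration):

```python
A = [1, 2, 3, 4, 5, 6]
U = set([2, 4, 6, 8])
original = A    # a second reference to the list A names right now

# The comprehension reads the old list to completion and builds a NEW
# list; only then is the name A rebound to that new list.
A = [x for x in A if x in U]
```

The old list is untouched (it is still reachable as original); only the
name A now refers to something else.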


Alex


Re: Need a better understanding on how MRO works?

2007-08-26 Thread Alex Martelli
Steven W. Orr <[EMAIL PROTECTED]> wrote:
   ...
> =>accepts whatever dictionary you give it (so you can, though shouldn't,
> =>do strange things such as pass globals()...:-).
> 
> In fact, I wanted to make a common routine that could be called from 
> multiple modules. I have classes that need to be created from those 
> multiple modules. I did run into trouble when I created a common routine
> even though I passed globals() as one of the args. The """though 
> shouldn't""" is prompting me to ask why, and where I might be able to read
> more.

The dictionary you pass to new.classobj should be specifically
constructed for the purpose -- globals() will contain all sorts of odds
and ends that have nothing much to do with the case.

You appear to be trying to embody a lot of black magic in your "common
routine", making it communicate with its callers by covert channels; the
way you use globals() to give that routine subtle "side effects" (making
the routine stick entries there) as well as pass it an opaque,
amorphous, unknown blob of input information, strongly suggests that
the magic is running away with you (a good general reference about that
is ).

"Explicit is better than implicit", "simple is better than complex",
etc, can be read by typing ``import this'' at an interactive Python
prompt.

The best book I know about the do's and don't's of large-scale software
architecture is Lakos' "Large-Scale C++ Software Design",
 -- very C++ specific, but even though some of the issues only apply
to C++ itself, many of its crucial lessons will help with large scale SW
architecture in just about any language, Python included.

What I had to say about the lures and pitfalls of black magic in Python
specifically is spread through the Python Cookbook 2nd edition (and, to
a lesser extent, Python in a Nutshell).


Alex


Re: Does shuffle() produce uniform result ?

2007-08-25 Thread Alex Martelli
tooru honda <[EMAIL PROTECTED]> wrote:
   ...
> def rand2():
>     while True:
>         randata = urandom(2*1024)
>         for i in xrange(0, 2*1024, 2):
>             yield int(hexlify(randata[i:i+2]),16)  # integer in [0,65535]

Another equivalent possibility, which might well be faster:

import array
   ...
def rand2():
    while True:
        x = array.array("H")
        x.fromstring(urandom(2*4000))
        for i in x: yield i


Alex


Re: Need a better understanding on how MRO works?

2007-08-25 Thread Alex Martelli
Steven W. Orr <[EMAIL PROTECTED]> wrote:
   ...
> Thanks Alex. I am humbled, though I was before I started.
> I really don't have a lot of understanding of what you're saying so I'll
> probably have to study this for about a year or so.
> 
> * (I need to look up what dictproxy is.) I don't have any idea what the
> ramifications are of your use of the word DISTINCT. Are you somehow 
> suggesting that new.classobj does a deep copy of the globals copy that's
> passed to it?

No, most definitely NOT deep!!!, but type.__new__ does "a little" of
what you've said (a shallow copy, which is not quite "a copy" because it
embeds [some of] the entries in slots).  new.classobj determines the
metaclass (from the bases, or a __metaclass__ entry in the dictionary)
and calls it to generate the new class.  For modern style classes, the
metaclass is type; for old-style legacy classes, it's types.ClassType, and
they're not exactly identical in behavior (of course not, or there would
be no point in having both:-).

> 
> * Also, I'd like to understand what the difference is between 
> nclass = new.classobj(name,(D1,),globals())
> vs. 
> def classfactory():
>     class somename(object):
>         def somestuff():
>             pass
>     return somename
> G1 = classfactory()
> globals()[name] = G1
> 
> Does new.classobj do anything special?

No, new.classobj does essentially the same thing that Python does after
evaluating a class statement to prepare the class's name, bases and
dictionary: finds the metaclass and calls it with these arguments.

A key difference of course is that a class statement prepares the class
dictionary as a new, ordinary, distinct dictionary, while new.classobj
accepts whatever dictionary you give it (so you can, though shouldn't,
do strange things such as pass globals()...:-).
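
To illustrate with new-style classes only (a sketch: it calls the
metaclass type directly rather than going through new.classobj, and the
class and attribute names are invented):

```python
class D1(object):
    def m1(self):
        return "D1.m1"

# A class statement prepares name, bases and a fresh dictionary...
class C1(D1):
    answer = 42

# ...and then does roughly what this explicit metaclass call does,
# here with a dictionary built specifically for the purpose.
C2 = type("C2", (D1,), {"answer": 42})
```

Both C1 and C2 are ordinary subclasses of D1 with metaclass type; the
only real difference is who built the dictionary.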


Alex


Re: Need a better understanding on how MRO works?

2007-08-25 Thread Alex Martelli
Steven W. Orr <[EMAIL PROTECTED]> wrote:
   ...
>  name = 'C1'
>  nclass = new.classobj(name,(D1,),globals())
>  globals()[name] = nclass

Here, you're creating a VERY anomalous class C1 whose __dict__ is
globals(), i.e. the dict of this module object;

>  name = 'C2'
>  nclass = new.classobj(name,(D1,),globals())
>  globals()[name] = nclass

and here you're creating another class with the SAME __dict__;

>  globals()[name].m1 = m1replace

So of course this assignment affects the 'm1' entries in the dict of
both classes, since they have the SAME dict object (a la Borg) -- that
is, IF they're old-style classes (i.e. if D1 is old-style), since in
that case a class's __dict__ is in fact a dict object, plain and simple.

However, if D1 is new-style, then C1.__dict__ and C2.__dict__ are in
fact instances of dictproxy -- each with a copy of the entries that
were in globals() when you called new.classobj, but DISTINCT from each
other and from globals(), so that further changes in one (or in globals)
don't affect the other (nor globals).
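
The new-style half of this is easy to check (a sketch with invented
names; it passes a small purpose-built dict instead of globals() and
calls type directly, and it cannot show the old-style sharing on an
interpreter that has no old-style classes):

```python
class D1(object):
    pass

shared = {"m1": "original"}

# Two new-style classes created from the SAME dictionary object.
C1 = type("C1", (D1,), shared)
C2 = type("C2", (D1,), shared)

# Rebinding the attribute on one class...
C1.m1 = "replaced"

# ...leaves the other class, and the dictionary passed in, untouched,
# because each class got its own copy of the entries.
result = (C1.m1, C2.m1, shared["m1"])
```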

I guess this might be a decent interview question if somebody claims to
be a "Python guru": if they can make head or tail out of this mess, boy
they *ARE* a Python guru indeed (in fact I'd accord minor guruhood even
to somebody who can get a glimmer of understanding of this with ten
minutes at a Python interactive prompt or the like, as opposed to
needing to understand it "on paper" without the ability to explore:-).

Among the several "don't"s to learn from this: don't use old-style
classes, don't try to make two classes share the same dictionary, and
don't ask about MRO in a question that has nothing to do with MRO
(though I admit that was a decent attempt at misdirection, it wouldn't
slow down even the minor-guru in any appreciable way:-).


Alex


Re: Does shuffle() produce uniform result ?

2007-08-25 Thread Alex Martelli
tooru honda <[EMAIL PROTECTED]> wrote:

> At the end, I think it is worthwhile to implement my own shuffle and 
> random methods based on os.urandom.  Not only does the resulting code
> gets rid of the minuscule bias, but the program also runs much faster.
> 
> When using random.SystemRandom.shuffle, posix.open and posix.close from
> calling os.urandom account for almost half of the total execution time
> for my program.  By implementing my own random and getting a much larger
> chunk of random bytes from os.urandom each time, I am able to reduce the
> total execution time by half.

If I were in your shoes, I would optimize by subclassing
random.SystemRandom and overriding the random method to use os.urandom
with some large block size and then parcel it out, instead of the
_urandom(7) that it now uses.  E.g., something like:

import binascii, os, random

class SystemBlockRandom(random.SystemRandom):

    def __init__(self):
        random.SystemRandom.__init__(self)
        def rand7():
            while True:
                randata = os.urandom(7*1024)
                for i in xrange(0, 7*1024, 7):
                    yield long(binascii.hexlify(randata[i:i+7]),16)
        self.rand7 = rand7().next

    def random(self):
        """Get the next random number in the range [0.0, 1.0)."""
        return (self.rand7() >> 3) * random.RECIP_BPF

(untested code).  No need to reimplement anything else, it seems to me.


Alex


Re: yet another indentation proposal

2007-08-20 Thread Alex Martelli
Michael Tobis <[EMAIL PROTECTED]> wrote:

> On Aug 19, 11:51 pm, James Stroud <[EMAIL PROTECTED]> wrote:
> 
> > What's wrong with just saying the current indent level? I'd much rather
> > hear "indent 4" than "tab tab tab tab".
> 
> Alternatively, you might also consider writing a simple pre and
> postprocessor so that you could read and write python the way you
> would prefer

As I pointed out in another post to this thread, that's essentially what
Tools/scripts/pindent.py (part of the Python source distribution) does.
Just use and/or adapt that...


Alex


Re: yet another indentation proposal

2007-08-20 Thread Alex Martelli
Aaron <[EMAIL PROTECTED]> wrote:
   ...
> That's probably what I'll end up doing.  The only drawback to that is that
> it solves the problem for me only.  Perhaps I will open source the scripts
> and write up some documentation so that other folks in a similar situation
> don't have to reinvent the wheel.  

As I pointed out in another post to this thread,
Tools/scripts/pindent.py IS open-source, indeed it's part of the Python
source distribution.  Why not use and/or adapt that?

> The only unfortunate aspect to that is 
> that blind newbies to the language will have to figure out setting up a
> shell script or batch file to pipe the output of the filter into Python on
> top of learning the language.  I admit, it's probably not that much work,
> but it is one more stumblingblock that blind newcomers will have to 
> overcome.

pindent.py's approach ensures that the tool's output is also entirely
valid Python (as it only adds comments to mark and "explain" block
ends!) so no "piping the output into Python" is at all needed; you only
need (editor-dependent) to ensure pindent.py is run when you LOAD a
Python source file into your editor.  If anything, pindent.py and/or the
screen reader of choice might be tweaked to "read out" Python sources
more clearly (e.g. by recognizing block-end comments and reading them
differently than other comments are read).


Alex


Re: yet another indentation proposal

2007-08-20 Thread Alex Martelli
Jakub Stolarski <[EMAIL PROTECTED]> wrote:

> Why not just use comments and some filter. Just write # _{ at the
> beginning and # _} at the end. Then filter just before runing
> indenting with those control sequences? Then there's no need to change
> interpreter.

As I pointed out in another post to this thread, that's essentially what
Tools/scripts/pindent.py (part of the Python source distribution) does
(no need to comment the beginning of a block since it's always a colon
followed by newline; block-end comments in pindent.py are more
informative).  Just use and/or adapt that...


Alex


Re: Parser Generator?

2007-08-19 Thread Alex Martelli
Jack <[EMAIL PROTECTED]> wrote:

> Thanks for the suggestion. I understand that more work is needed for natural
> language
> understanding. What I want to do is actually very simple - I pre-screen the
> user
> typed text. If it's a simple syntax my code understands, like, Weather in
> London, I'll
> redirect it to a weather site. Or, if it's "What is ... " I'll probably
> redirect it to wikipedia.
> Otherwise, I'll throw it to a search engine. So, extremelyl simple stuff ...



"""
NLTK — the Natural Language Toolkit — is a suite of open source Python
modules, data sets and tutorials supporting research and development in
natural language processing.
"""


Alex

Re: Sorting a list of Unicode strings?

2007-08-19 Thread Alex Martelli
[EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
   ...
> > > Maybe I'm missing something fundamental here, but if I have a list of
> > > Unicode strings, and I want to sort these alphabetically, then it
> > > places those that begin with unicode characters at the bottom.
   ...
> Anyway, I know _why_ it does this, but I really do need it to sort
> them correctly based on how humans would look at it.

Depending on the nationality of those humans, you may need very
different sorting criteria; indeed, in some countries, different sorting
criteria apply to different use cases (such as sorting surnames versus
sorting book titles, etc; sorry, I don't recall specific examples, but
if you delve on sites about i18n issues you'll find some).

In both Swedish and Danish, I believe, A-with-ring sorts AFTER the
letter Z in the alphabet; so, having Aaland (where I'm using Aa for
A-with-ring, since this newsreader has some problem in letting me enter
non-ascii characters;-) sort "right at the bottom", while it "doesn't
look right" to YOU (maybe an English-speaker?) may look right to the
inhabitants of that locality (be they Danes or Swedes -- but I believe
Norwegian may also work similarly in terms of sorting).

The Unicode consortium does define a standard collation algorithm (UCA)
and table (DUCET) to use when you need a locale-independent ordering; at

you'll be able to obtain James Tauber's Python implementation of UCA, to
work with the DUCET found at
.

I suspect you won't like the collation order you obtain this way, but
you might start from there, subsetting and tweaking the DUCET into an
OUCET (Oliver Unicode Collation Element Table;-) that suits you better.

A simpler, rougher approach, if you think the "right" collation is
obtained by ignoring accents, diacritics, etc (even though the speakers
of many languages that include diacritics, &c, disagree;-) is to use the
key=coll argument in your sorting call, passing a function coll that
maps any Unicode string to what you _think_ it should be like for
sorting purposes.  The .translate method of Unicode string objects may
help there: it takes a dict mapping Unicode ordinals to ordinals or
strings (or None for characters you want to delete as part of the
translation).

For example, suppose that what we want is the following somewhat silly
collation: we only care about ISO-8859-1 characters, and want to ignore
for sorting purposes any accent (be it grave, acute or circumflex),
umlauts, slashes through letters, tildes, cedillas.  htmlentitydefs has
a useful dict called codepoint2name that helps us identify those "weirdly
decorated foreign characters".

def make_transdict():
    import htmlentitydefs
    cp2n = htmlentitydefs.codepoint2name
    suffixes = 'acute grave circ uml slash tilde cedil'.split()
    td = {}
    for x in range(128, 256):
        if x not in cp2n: continue
        n = cp2n[x]
        for s in suffixes:
            if n.endswith(s):
                # keep what precedes the suffix, e.g. 'eacute' -> u'e'
                td[x] = unicode(n[:-len(s)])
                break
    return td

def coll(us, td=make_transdict()):
    return us.translate(td)

listofus.sort(key=coll)


I haven't tested this code, but it should be reasonably easy to fix any
problems it might have, as well as making make_transdict "richer" to
meet your goals.  Just be aware that the resulting collation (e.g.,
sorting a-ring just as if it was a plain a) will be ABSOLUTELY WEIRD to
anybody who knows something about Scandinavian languages...!!!-)


Alex


Re: yet another indentation proposal

2007-08-19 Thread Alex Martelli
Paddy <[EMAIL PROTECTED]> wrote:
   ...
> Can screen reaaderss be customized?

Open-source ones surely can (e.g., NVDA is an open-source reader for
Windows written in Python,  -- alas, if
you search for NVDA Google appears to be totally convinced you mean
NVidia instead, making searches pretty useless, sigh).

> Maybe their is a way to get the screen reader to say indent and dedent
> at thee appropriate places?

There definitely should be.

> Or maybe a filter to put those wordds into the source?

.../Tools/scripts/pindent.py (comes with the Python source distribution,
and I hope that, like the whole Tools directory, it would also come with
any sensible packaged Python distribution) should already be sufficient
for this particular task.  The "indent" always happens (in correct
Python sources) on the next line after one ending with a colon;
pindent.py can add or remove "block-closing comments" at each dedent
(e.g., "# end for" if the dedent is terminating a for-statement), and
can adjust the indentation to make it correct if given a Python source
with such block-closing comments but messed-up indentation.


Alex


Re: clarification

2007-08-19 Thread Alex Martelli
samwyse <[EMAIL PROTECTED]> wrote:
   ...
> > brain:~ alex$ python -mtimeit -s'sos=[set(range(x,x+4)) for x in
> > range(0, 100, 3)]' 'r=set()' 'for x in sos: r.update(x)'
> > 10 loops, best of 3: 18.8 usec per loop
> > 
> > brain:~ alex$ python -mtimeit -s'sos=[set(range(x,x+4)) for x in
> > range(0, 100, 3)]' 'r=reduce(set.union, sos, set())'
> > 1 loops, best of 3: 87.2 usec per loop
> > 
> > Even in a case as tiny as this one, "reduce" is taking over 4 times
> > longer than the loop with the in-place mutator -- and it only gets
> > worse, as we're talking about O(N squared) vs O(N) performance!  Indeed,
> > this is part of what makes reduce an "attractive nuisance"...;-).  [[And

The set-union case, just like the list-catenation case, is O(N squared)
(when approached in a functional way) because the intermediate result
often _grows_ [whenever a new set or list operand adds items], and thus
a new temporary value must be allocated, and the K results-so-far
"copied over" (as part of constructing the new temp value) from the
previous temporary value; and sum(range(N)) grows quadratically in N.
The in-place approach avoids that fate by a strategy of proportional
over-allocation (used by both sets and lists) that makes in-place
operations such as .update(X) and .extend(X) amortized O(K) where K is
len(X).

In the set-intersection case, the intermediate result _shrinks_ rather
than growing, so the amount of data "copied over" is a diminishing
quantity at each step, and so the analysis showing quadratic behavior
for the functional approach does not hold; behavior may be roughly
linear, influenced in minor ways by accidental regularities in the sets'
structure and order (especially likely for short sequences of small
sets, as in your case).  Using a slightly longer sequence of slightly
larger sets, with little structure to each, e.g.:

random.seed(12345)  # set seed to ensure total repeatability
los=[set(random.sample(range(1000), 990)) for x in range(200)]

at the end of the setup (the intersection of these 200 sets happens to
contain 132 items), I measure (as usual on my 18-months-old Macbook Pro
laptop):

stmt = 'reduce(set.intersection,los)'
best of 3: 1.66e+04 usec per loop
stmt = 'intersect_all(los)'
best of 3: 1.66e+04 usec per loop

and occasionally 1.65 or 1.67 instead of 1.66 for either or both,
whether with 10,000 or 100,000 loops.  (Not sure whether your
observations about the reduce-based approach becoming faster with more
loops may be due to anomalies in Windows' scheduler, or the accidental
regularities mentioned above; my timings are probably more regular since
I have two cores, one of which probably ends up dedicated to whatever
task I'm benchmarking while the other one runs all "background" stuff).

> turn indicates that both implementations actually work about same and
> your "O(n squared)" argument is irrelevant.

It's indeed irrelevant when the behavior _isn't_ quadratic (as in the
case of intersections) -- but unfortunately it _is_ needlessly quadratic
in most interesting cases involving containers (catenation of sequences,
union of sets, merging of dictionaries, merging of priority-queues,
...), because in those cases the intermediate temporary values tend to
grow, as I tried to explain in more detail above.
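
The in-place pattern for two of those cases might look like this (a
sketch; concat_all and merge_all are names invented here, not library
functions):

```python
def concat_all(lists):
    """Catenate a sequence of lists without quadratic copying."""
    result = []
    for lst in lists:
        result.extend(lst)    # amortized O(len(lst)); no full re-copy
    return result

def merge_all(dicts):
    """Merge a sequence of dicts; later dicts win on duplicate keys."""
    result = {}
    for d in dicts:
        result.update(d)      # mutates result in place
    return result
```

Each step only pays for the operand being added, not for everything
accumulated so far.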


Alex


Re: How to call module functions inside class instance functions?

2007-08-18 Thread Alex Martelli
beginner <[EMAIL PROTECTED]> wrote:
   ...
> testmodule.py
> -
> """Test Module"""
> 
> def __module_level_func():
>     print "Hello"
> 
> class TestClass:
>     def class_level_func(self):
>         __module_level_func()
> 
> 
> main.py
> --
> import testmodule
> 
> x=testmodule.TestClass()
> x.class_level_func()
> 
> 
> The error message I am encountering is: NameError: global name
> '_TestClass__module_level_func' is not defined
> 
> I think it has something to do with the two underscores for
> __module_level_func. Maybe it has something to do with the python
> implementation of the private class level functions.
> 
> By the way, the reason I am naming it __module_level_func() is because
> I'd like __module_level_func() to be private to the module, like the C
> static function. If the interpreter cannot really enforce it, at least
> it is some sort of naming convention for me.

The two underscores are exactly the cause of your problem: as you see in
the error message, the compiler has inserted the CLASS name (not MODULE
name) implicitly there.  This "name mangling" is part of Python's rules.

Use a SINGLE leading underscore (NOT double ones) as the "sort of naming
convention" to indicate privacy, and Python will support you (mostly by
social convention, but a little bit technically, too); use a different
convention (particularly one that fights against the language rules;-)
and you're "fighting city hall" to no good purpose and without much hope
of achieving anything whatsoever thereby.
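
A compact sketch of both spellings side by side (the function names are
invented for the example, and it returns strings instead of printing):

```python
def _module_level_func():       # single underscore: private by convention
    return "Hello"

def __mangled_func():           # double underscore: runs into name mangling
    return "not reachable by this name from inside a class body"

class TestClass(object):
    def ok(self):
        return _module_level_func()

    def broken(self):
        # Inside a class body the compiler rewrites __mangled_func to
        # _TestClass__mangled_func, which does not exist at module level.
        return __mangled_func()

x = TestClass()
greeting = x.ok()
try:
    x.broken()
    error = None
except NameError as e:
    error = str(e)
```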


Alex


Re: Understanding closures

2007-08-18 Thread Alex Martelli
Ramashish Baranwal <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> I want to use variables passed to a function in an inner defined
> function. Something like-
> 
> def fun1(method=None):
>     def fun2():
>         if not method: method = 'GET'
>         print '%s: this is fun2' % method
>         return
>     fun2()
> 
> fun1()
> 
> However I get this error-
> UnboundLocalError: local variable 'method' referenced before
> assignment
> 
> This however works fine.
> 
> def fun1(method=None):
>     if not method: method = 'GET'
>     def fun2():
>         print '%s: this is fun2' % method
>         return
>     fun2()
> 
> fun1()
> 
> Is there a simple way I can pass on the variables passed to the outer
> function to the inner one without having to use or refer them in the
> outer function?

Sure, just don't ASSIGN TO those names in the inner function.  Any name
ASSIGNED TO in a given function is local to that specific function (save
for global statements, which bypass variables of containing functions
anyway).
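
One way to do that is to put the default into a differently-named local
(a sketch; the name "effective" is invented, and the function returns
the string rather than printing it):

```python
def fun1(method=None):
    def fun2():
        # fun2 only READS `method`, so the name still refers to fun1's
        # argument; the defaulted value lives in a new local name.
        effective = method or 'GET'
        return '%s: this is fun2' % effective
    return fun2()
```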


Alex


Re: clarification

2007-08-18 Thread Alex Martelli
samwyse <[EMAIL PROTECTED]> wrote:
   ...
> Finally, does anyone familar with P3K know how best to do the reduction
> without using 'reduce'?  Right now, sets don't support the 'add' and 
> 'multiply' operators, so 'sum' and (the currently ficticious) 'product'
> won't work at all; while 'any' and 'all' don't work as one might hope.
> Are there an 'intersection' and 'union' built-ins anywhere?

For intersection and union of a sequence of sets, I'd use:

def union_all(sos):
    result = set()
    for s in sos: result.update(s)
    return result

def intersect_all(sos):
    it = iter(sos)
    result = set(it.next())
    for s in it: result.intersection_update(s)
    return result

The latter will raise an exception if sos is empty -- I don't think the
"intersection of no sets at all" has a single natural interpretation
(while the "union of no sets at all" appears to be naturally interpreted
as an empty set)... if you disagree, just wrap a try/except around the
initialization of result, and return whatever in the except clause.
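
Such a variant might look like this (a sketch; the "empty" parameter is
an invented way of letting the caller choose the answer for no sets at
all, and it uses the next() builtin rather than the .next method):

```python
def intersect_all(sos, empty=None):
    it = iter(sos)
    try:
        result = set(next(it))
    except StopIteration:
        # No sets at all: hand back whatever the caller asked for.
        return empty
    for s in it:
        result.intersection_update(s)
    return result
```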

Of course, hoisting the unbound method out of the loops can afford the
usual small optimization.  But my point is that, in Python, these
operations (like, say, the concatenation of a sequence of lists, etc)
are best performed "in place" via loops calling mutator methods such as
update and intersection_update (or a list's extend, etc), rather than
"functionally" (building and tossing away many intermediate results).
E.g., consider:

brain:~ alex$ python -mtimeit -s'sos=[set(range(x,x+4)) for x in
range(0, 100, 3)]' 'r=set()' 'for x in sos: r.update(x)'
10 loops, best of 3: 18.8 usec per loop

brain:~ alex$ python -mtimeit -s'sos=[set(range(x,x+4)) for x in
range(0, 100, 3)]' 'r=reduce(set.union, sos, set())'
1 loops, best of 3: 87.2 usec per loop

Even in a case as tiny as this one, "reduce" is taking over 4 times
longer than the loop with the in-place mutator -- and it only gets
worse, as we're talking about O(N squared) vs O(N) performance!  Indeed,
this is part of what makes reduce an "attractive nuisance"...;-).  [[And
so is sum, if used OTHERWISE than for the documented purpose, computing
"the sum of a sequence of numbers": a loop with r.extend is similarly
faster, to concatenate a sequence of lists, when compared to sum(sol,
[])...!!!]]


Alex


Re: Can python threads take advantage of use dual core ?

2007-08-17 Thread Alex Martelli
Stefan Behnel <[EMAIL PROTECTED]> wrote:
   ...
> Which virtually all computation-intensive extensions do. Also, note the

gmpy doesn't (release the GIL), even though it IS computationally
intensive -- I tried, but it slows things down horribly even on an Intel
Core Duo.  I suspect that may partly be due to the locking strategy of
the underlying GMP 4.2 library (which I haven't analyzed in depth).  In
practice, when I want to exploit both cores to the hilt with gmpy-based
computations, I run multiple processes.


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: using super() to call two parent classes __init__() method

2007-08-16 Thread Alex Martelli
7stud <[EMAIL PROTECTED]> wrote:

> When I run the following code and call super() in the Base class's
> __init__ () method,  only one Parent's __init__() method is called.
> 
> 
> class Parent1(object):
>     def __init__(self):
>         print "Parent1 init called."
>         self.x = 10
> 
> class Parent2(object):
>     def __init__(self):
>         print "Parent2 init called."
>         self.y = 15
> 
> class Base(Parent1, Parent2):
>     def __init__(self):
>         super(Base, self).__init__()
>         self.z = 20
> 
> b = Base()
> 
> --output:--
> Parent1 init called.

Yep -- Parent1.__init__ doesn't call its own super's __init__, so it
doesn't participate in cooperative superclass delegation and "the buck
stops there".
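
A cooperative version of the same hierarchy could look like this (a
sketch; the "calls" list is added here only to record which __init__
methods run, in place of the print statements):

```python
calls = []

class Parent1(object):
    def __init__(self):
        calls.append("Parent1")
        super(Parent1, self).__init__()   # pass the buck along the MRO
        self.x = 10

class Parent2(object):
    def __init__(self):
        calls.append("Parent2")
        super(Parent2, self).__init__()
        self.y = 15

class Base(Parent1, Parent2):
    def __init__(self):
        super(Base, self).__init__()
        self.z = 20

b = Base()
```

With every class delegating to its own super, the call started in Base
travels through Parent1 and then Parent2 before reaching object.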


Alex


Re: Opinions about this new Python book?

2007-08-15 Thread Alex Martelli
Neil Cerutti <[EMAIL PROTECTED]> wrote:

> On 2007-08-15, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > For some reason, the author makes the claim that the term
> > "Predicate" is "bandied about quite a bit in the literature" of
> > Python. I have 17 or so Python books and I don't think I've
> > ever seen this used in conjunction with Python...or in any of
> > the docs I've skimmed. What the!?
> 
> The document searching facility reveals that the term is bandied
> about in five places in the standard documentation. These uses
> seem approriate and uncontroversial to me.
> 
> These document functions accepting predicates as aruments:
> 
> 6.5.1 Itertools functions
> 6.5.3 Recipes
> 11.47 Creating a new Distutils command
> 26.10.1 Types and members
> 
> The following provides a few predicate functions (weird! I'd have
> never thought to look there for, e.g., ismodule):
> 
> 6.7 operator -- Standard operators as functions

Module inspect also provides useful predicates (though I don't remember
if its docs CALL them predicates;-).
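
A few of them in action (a sketch; using os and os.path as the objects
to inspect is an arbitrary choice):

```python
import inspect
import os

is_mod = inspect.ismodule(os)                # a module object
is_func = inspect.isfunction(os.path.join)   # a plain Python function
is_cls = inspect.isclass(os)                 # a module is not a class

# Predicates also work as filters, e.g. for getmembers:
func_names = [name for name, obj in
              inspect.getmembers(os.path, inspect.isfunction)]
```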


Alex


Re: closing StringIO objects

2007-08-15 Thread Alex Martelli
Neil Cerutti <[EMAIL PROTECTED]> wrote:

> The documentation says the following about StringIO.close:
> 
>   close( ) 
>   Free the memory buffer. 
> 
> Or else... what? 

Or else the memory buffer sticks around, so you can keep calling
getvalue as needed.  I believe the freeing will happen anyway,
eventually, if and when the StringIO instance is garbage collected (just
like, say, a file object's underlying fd gets closed when the file
object is garbage collected), but relying on such behavior is often
considered a dubious practice nowadays (given the existence of many
Python implementations whose GC strategies differ).
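
A minimal sketch of the explicit approach. It assumes io.StringIO, the present-day counterpart of the StringIO module under discussion, where reading a closed buffer raises ValueError:

```python
import io

buf = io.StringIO()
buf.write(u'hello')
before = buf.getvalue()     # fine: the buffer is still around

buf.close()                 # free the buffer explicitly, don't wait for the GC
try:
    buf.getvalue()
    raised = False
except ValueError:          # the buffer is gone once the object is closed
    raised = True
```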


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Help with optimisation

2007-08-13 Thread Alex Martelli
special_dragonfly <[EMAIL PROTECTED]> wrote:
   ...
> dom=xml.dom.minidom.parseString(text_buffer)

If you need to optimize code that parses XML, use ElementTree (some
other parsers are also fast, but minidom ISN'T).
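
A minimal sketch of the ElementTree equivalent, assuming the standard-library location xml.etree.ElementTree; the contents of text_buffer here are invented sample data, not the poster's:

```python
import xml.etree.ElementTree as ET

# Made-up document standing in for the poster's text_buffer
text_buffer = '<fields><field name="a">1</field><field name="b">2</field></fields>'

root = ET.fromstring(text_buffer)     # plays the role of minidom.parseString
values = dict((f.get('name'), f.text) for f in root.findall('field'))
```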


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Something in the function tutorial confused me.

2007-08-13 Thread Alex Martelli
Neil Cerutti <[EMAIL PROTECTED]> wrote:
   ...
> > Then we get into unpacking assignments and augmented
> > assignments, but I don't really want to write two more pages
> > worth of summary...;-).
> 
> Thanks very much for taking the time to help clear up my
> erroneous model of assignment in Python. I'd taken a conceptual
> shortcut that's not justified.

You're very welcome, it's always a pleasure to help!


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: LRU cache?

2007-08-11 Thread Alex Martelli
Paul Rubin  wrote:

> Anyone got a favorite LRU cache implementation?  I see a few in google
> but none look all that good.  I just want a dictionary indexed by
> strings, that remembers the last few thousand entries I put in it.

So what's wrong with Evan Prodromou's lrucache.py module that's in pypi?
Haven't used it, but can't see anything wrong at a glance.
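
For comparison, a rough sketch of the idea. This is not the lrucache.py module mentioned above; it is a hypothetical LRUCache class built on collections.OrderedDict, which only arrived in later Python releases:

```python
from collections import OrderedDict

class LRUCache(object):
    """Keep at most maxsize entries, evicting the least recently used."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def __getitem__(self, key):
        value = self._data.pop(key)      # raises KeyError if missing
        self._data[key] = value          # re-insert as most recently used
        return value

    def __setitem__(self, key, value):
        self._data.pop(key, None)
        self._data[key] = value
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)   # drop the oldest entry

    def __contains__(self, key):
        return key in self._data

    def __len__(self):
        return len(self._data)

cache = LRUCache(2)
cache['a'] = 1
cache['b'] = 2
cache['a']          # touch 'a', so 'b' becomes the oldest entry
cache['c'] = 3      # over capacity: 'b' is evicted
```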


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Something in the function tutorial confused me.

2007-08-11 Thread Alex Martelli
Neil Cerutti <[EMAIL PROTECTED]> wrote:
   ...
> OK, I've thought about this some more and I think the source of
> my confusion was I thought assignment in Python meant binding a
> name to something, not mutating an object. But in the case of
> augmented assignment, assignment no longer means that?

"Plain" assignment *to a plain name* does mean "binding a name" (the
LHS) to "something" (the RHS).

Other assignments (ones that are not "plain" assignments to names) may
have different meanings.  For example:

>>> class act(object):
...   def __init__(self, c): self._c = c
...   def getC(self): return self._c
...   def setC(self, *ignore): self._c += 1
...   c = property(getC, setC)
...   
>>> x = act(0)
>>> x.c
0
>>> x.c = 23
>>> x.c
1
>>> 

Here's an example where a plain assignment (to an attribute of x, not to
a plain name) obviously DOESN'T mean "binding a name to something": the
"something" (the RHS) is completely ignored, so the plain assignment is
mutating an object (x) and not binding any name to anything.

Plain assignments to items and slices can also often be best seen as
"mutating an object" (the one being indexed or sliced on the LHS) rather
than "binding a name".  For example:

>>> l=list('ciao')
>>> l[1:3]='app'
>>> l
['c', 'a', 'p', 'p', 'o']

If I were teaching Python and came upon this example, I would definitely
not try to weasel-word the explanation of what's going on in terms of
"binding a name" (or several "names", including "rebinding" a new
"name" l[4] to the 'o' that was previously "bound" to l[3], etc:-):
it's just orders of magnitude simpler to explain this as "mutating an
object", namely the list l.

I take almost 3 pages in "Python in a Nutshell" (47 to 49 in the second
edition) to summarily explain every kind of assignment -- and that's in a
work in which I've tried (successfully, I believe from reviews) to be
very, *VERY* concise;-).

Summarizing that summary;-), a plain assignment to an identifier binds
that name; a plain assignment to an attribute reference x.y asks object
x (x can be any expression) to bind its attribute named 'y'; a plain
assignment to an indexing x[y] (x and y are arbitrary expressions) asks
object x to bind its item indicated by the value of y; a plain
assignment to a slicing is equivalent to the plain assignment to the
indexing with an index of slice(start, stop, stride) [[slice is a Python
builtin type]].

Plain assignment to an identifier "just happens"; all other cases of
plain assignment are requests to an object to bind one or more of its
attributes or items (i.e., requests for specific mutations of an object)
-- as for, say, any method call (which might also be a request for some
kind of mutation), the object will do whatever it pleases with the
request (including, perhaps, "refusing" it, by raising an exception).
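
A sketch of those "requests to an object", using a hypothetical Recorder class that merely logs what it is asked to do instead of binding anything:

```python
class Recorder(object):
    """Made-up class: records each assignment request it receives."""
    def __init__(self):
        object.__setattr__(self, 'log', [])   # bypass our own __setattr__
    def __setattr__(self, name, value):
        self.log.append(('setattr', name, value))
    def __setitem__(self, index, value):
        self.log.append(('setitem', index, value))

x = Recorder()
x.y = 1          # asks x to bind its attribute named 'y'
x[2] = 3         # asks x to bind its item indicated by 2
x[1:5:2] = 'ab'  # same item request, with slice(1, 5, 2) as the index
```

Nothing is actually bound here: the object chose to do something else with each request, which is exactly the latitude described above.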

Then we get into unpacking assignments and augmented assignments, but I
don't really want to write two more pages worth of summary...;-).


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Destruction of generator objects

2007-08-10 Thread Alex Martelli
Stefan Bellon <[EMAIL PROTECTED]> wrote:

> On Thu, 09 Aug, Graham Dumpleton wrote:
> 
> > result = application(environ, start_response)
> > try:
> >     for data in result:
> >         if data:    # don't send headers until body appears
> >             write(data)
> >     if not headers_sent:
> >         write('')   # send headers now if body was empty
> > finally:
> >     if hasattr(result,'close'):
> >         result.close()
> 
> Hm, not what I hoped for ...
> 
> Isn't it possible to add some __del__ method to the generator object
> via some decorator or somehow else in a way that works even with Python
> 2.4 and can then be nicely written without cluttering up the logic
> between consumer and producer?

No, you cannot do what you want in Python 2.4.  If you can't upgrade to
2.5 or better, whatever the reason may be, you will have to live with
2.4's limitations (there ARE reasons we keep making new releases, after
all...:-).
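
For reference, a minimal sketch of what 2.5 and later allow: a try/finally around a yield, plus the generator's close() method (written with the next() builtin of later releases; producer and events are made-up names):

```python
events = []

def producer():
    try:
        yield 1
        yield 2
    finally:
        events.append('cleanup')   # runs when the generator is closed

gen = producer()
first = next(gen)
gen.close()       # raises GeneratorExit inside the generator at the paused yield
```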


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Something in the function tutorial confused me.

2007-08-10 Thread Alex Martelli
Neil Cerutti <[EMAIL PROTECTED]> wrote:
   ...
> The Python Language Reference seems a little confused about the
> terminology.
> 
>   3.4.7 Emulating numeric types
>   6.3.1 Augmented assignment statements
> 
> The former refers to "augmented arithmetic operations", which I
> think is a nice terminology, since assignment is not necessarily
> taking place. Then the latter muddies the waters.

Assignment *IS* "necessarily taking place"; if you try the augmented
assignment on something that DOESN'T support assignment, you'll get an
exception.  Consider:

>>> tup=([],)
>>> tup[0] += ['zap']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment

Tuples don't support item ASSIGNMENT, and += is an ASSIGNMENT, so tuples
don't allow a += on any of their items.

If you thought that += wasn't an assignment, this behavior and error
message would be very problematic; since the language reference ISN'T
confused and has things quite right, this behavior and error message are
perfectly consistent and clear.
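
A related sketch: the augmented assignment first performs the in-place operation on the list, and only then attempts the item assignment, which is the step the tuple refuses. So the exception is raised, yet the list has already been extended:

```python
tup = ([],)
try:
    tup[0] += ['zap']       # roughly: tup[0] = tup[0].__iadd__(['zap'])
    failed = False
except TypeError:
    failed = True

inner = tup[0]              # the in-place add ran before the failed assignment
```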


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Something in the function tutorial confused me.

2007-08-10 Thread Alex Martelli
greg <[EMAIL PROTECTED]> wrote:

> Steve Holden wrote:
> 
> > For some reason your reply got right up my nose,
> 
> I'm sorry about that. Sometimes it's hard to judge the
> level of experience with Python that a poster has. In

Because of this, a Google search for

  " " python

may sometimes help; when you get 116,000 hits, as for "Steve Holden"
python, that may be a reasonable indication that the poster is one of
the world's Python Gurus (in fact, the winner of the 2007 Frank Willison
Award -- congratulations, Steve!!!).


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Help with Dictionaries and Classes requested please.

2007-08-10 Thread Alex Martelli
Sion Arrowsmith <[EMAIL PROTECTED]> wrote:

> special_dragonfly <[EMAIL PROTECTED]> wrote:
> >if key in FieldsDictionary:
> >    FieldsDictionary[key].append(FieldClass(*line.split(",")))
> >else:
> >    FieldsDictionary[key]=[FieldClass(*line.split(","))]
> 
> These four lines can be replaced by:
> 
> FieldsDictionary.setdefault(key, []).append(FieldClass(*line.split(",")))

Even better might be to let FieldsDictionary be an instance of
collections.defaultdict(list) [[assuming Python 2.5 is in use]], in
which case the simpler

   FieldsDictionary[key].append(FieldClass(*line.split(",")))

will Just Work.  setdefault was a valiant attempt at fixing this
problem, but defaultdict is better.
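
A minimal sketch of the defaultdict version. The sample lines are invented, and a plain tuple stands in for the poster's FieldClass:

```python
from collections import defaultdict

fields = defaultdict(list)              # missing keys start out as []
lines = ['a,1,x', 'b,2,y', 'a,3,z']     # made-up sample data

for line in lines:
    parts = line.split(',')
    key = parts[0]
    fields[key].append(tuple(parts[1:]))   # no "if key in ..." test needed
```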


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Ipc mechanisms and designs.

2007-08-10 Thread Alex Martelli
king kikapu <[EMAIL PROTECTED]> wrote:

> Hi,
> 
> inspired of the topic "The Future of Python Threading", i started to
> realize that the only way to utilize the power of multiple cores using
> Python, is spawn processes and "communicate" with them.
> 
> If we have the scenario:
> 
> 1. Windows (mainly) development
> 2. Processes are running in the same machine
> 3. We just want to "pass" info from one process to another. Info may
> be simple data types or user defined Python objects.
> 
> what is the best solution (besides sockets) that someone can implement
> so to have 2 actually processes that interchanged data between them ?
> I looked at Pyro and it looks really good but i wanted to experiment
> with a simpler solution.

Check out 


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: boolean operations on sets

2007-08-06 Thread Alex Martelli
Michael J. Fromberger <[EMAIL PROTECTED]>
wrote:
   ...
> Also, it is a common behaviour in many programming languages for logical
> connectives to both short-circuit and yield their values, so I'd argue
> that most programmers are probably accustomed to it.  The && and || 
> operators of C and its descendants also behave in this manner, as do the

Untrue, alas...:

brain:~ alex$ cat a.c
#include <stdio.h>

int main()
{
    printf("%d\n", 23 && 45);
    return 0;
}
brain:~ alex$ gcc a.c
brain:~ alex$ ./a.out 
1

In C, && and || _do_ "short circuit", BUT they always return 0 or 1,
*NOT* "yield their values" (interpreted as "return the false or true
value of either operand", as in Python).
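
For contrast, a short sketch of the Python behavior, where the operators hand back one of their operands rather than a normalized 0 or 1:

```python
a = 23 and 45        # both true: yields the last operand
b = 0 and 45         # short-circuits: yields the first false operand
c = 0 or 'spam'      # yields the first true operand
d = [] or {}         # both false: yields the last operand
```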


Alex
  
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Formatting Results so that They Can be Nicely Imported into a Spreadsheet.

2007-08-05 Thread Alex Martelli
[EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
   ...
> Even with the "if i" included, we end up with an
> empty list at the start. This because the first "blank"
> line wasn't blank, it was a space, so it passes the
> "if i" test.

...and you can fix that by changing the test to [... if i.split()].
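
A quick sketch of the difference on made-up input, where one "blank" line is a single space and another is a tab:

```python
lines = [' ', 'a b c', '', '\t', 'd e']

# A whitespace-only string is still true, so "if i" lets it through
with_if_i = [i.split() for i in lines if i]

# split() on a whitespace-only string gives [], which is false
with_split = [i.split() for i in lines if i.split()]
```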


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Efficient Rank Ordering of Nested Lists

2007-08-04 Thread Alex Martelli
Cousin Stanley <[EMAIL PROTECTED]> wrote:
   ...
>   for i , item in reversed( enumerate( sorted( single_list ) ) ) :
   ...
> TypeError: argument to reversed() must be a sequence

Oops, right.  Well then,

aux_seq = list(enumerate(sorted(single_list)))
for i, item in reversed(aux_seq): ...

or the like.
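
Put together as a complete function, a sketch along those lines (rank_list is a made-up name, and the sample data is the third sublist from the original question):

```python
def rank_list(single_list):
    d = {}
    # enumerate() is an iterator, so build a real list before reversing it
    aux_seq = list(enumerate(sorted(single_list)))
    for i, item in reversed(aux_seq):
        d[item] = i           # going backwards, the lowest index of a tie wins
    return [d[item] for item in single_list]

ranks = rank_list([-1.1, 2.2, 0, -1.1, 13])
```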


Alex

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Efficient Rank Ordering of Nested Lists

2007-08-03 Thread Alex Martelli
[EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

> A naive approach to rank ordering (handling ties as well) of nested
> lists may be accomplished via:
> 
>    def rankLists(nestedList):
>        def rankList(singleList):
>            sortedList = list(singleList)
>            sortedList.sort()
>            return map(sortedList.index, singleList)
>        return map(rankList, nestedList)
> 
> >>> unranked = [ [ 1, 2, 3, 4, 5 ], [ 3, 1, 5, 2, 4 ], [ -1.1, 2.2,
> 0, -1.1, 13 ] ]
> >>> print rankLists(unranked)
> 
>[[0, 1, 2, 3, 4], [2, 0, 4, 1, 3], [0, 3, 2, 0, 4]]
> 
> This works nicely when the dimensions of the nested list are small.
> It is slow when they are big.  Can someone suggest a clever way to
> speed it up?

Each use of sortedList.index is O(N) [where N is len(singleList)], and
you have N such uses in the map in the inner function, so this approach
is O(N squared).  Neil's suggestion to use bisect replaces the O(N)
.index with an O(log N) search, so the overall performance is O(N log N)
[[and you can't do better than that, big-O wise, because the sort step
is also O(N log N)]].
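
One possible reading of the bisect approach, as a sketch (the function names are made up, and this is not necessarily the code that was suggested in the thread):

```python
import bisect

def rank_lists(nested_list):
    def rank_list(single_list):
        sorted_list = sorted(single_list)          # O(N log N)
        # bisect_left finds the leftmost match in O(log N),
        # so tied items share the lowest rank, as .index would give
        return [bisect.bisect_left(sorted_list, item) for item in single_list]
    return [rank_list(sub) for sub in nested_list]

unranked = [[1, 2, 3, 4, 5], [3, 1, 5, 2, 4], [-1.1, 2.2, 0, -1.1, 13]]
ranked = rank_lists(unranked)
```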

"beginner"'s advice to use a dictionary is also good and may turn out to
be faster, just because dicts are SO fast in Python -- but you need to
try and measure both alternatives.  One way to use a dict (warning,
untested code):

  def rankList(singleList):
      d = {}
      # reversed() needs a sequence, so materialize enumerate() first
      for i, item in reversed(list(enumerate(sorted(singleList)))):
          d[item] = i
      return [d[item] for item in singleList]

If you find the for-clause too rich in functionality, you can of course
split it up a bit; but note that you do need the 'reversed' to deal with
the corner case of duplicate items (without it, you'd end up with 1s
instead of 0s for the third one of the sublists in your example).  If
speed is of the essence you might try to measure what happens if you
replace the returned expression with map(d.__getitem__, singleList), but
I suspect the list comprehension is faster as well as clearer.  

Another potential small speedup is to replace the first 3 statements
with just one:

d = dict((item,i) for i,item in reversed(list(enumerate(sorted(singleList)))))

but THIS density of functionality is a bit above my personal threshold
of comfort ("sparse is better than dense":-).


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Pythonic way for missing dict keys

2007-08-02 Thread Alex Martelli
Bruno Desthuilliers <[EMAIL PROTECTED]>
wrote:

> Alex Popescu a écrit :
> > Bruno Desthuilliers <[EMAIL PROTECTED]> wrote in
> > news:[EMAIL PROTECTED]: 
> (snip)
> >> if hasattr(obj, '__call__'):
> >># it's a callable
> >>
> >> but I don't find it so Pythonic to have to check for a __magic__
> >> method. 
> > 
> > It looks like Python devs have decided it is Pythonic, because it is
> > already in the PEP. 
> 
> I do know, and I disagree with this decision.
> 
> FWIW, repr(obj) is mostly syntactic sugar for obj.__repr__(), 
> getattr(obj, name) for obj.__getattr__(name), type(obj) for 
> obj.__class__  etc... IOW, we do have a lot of builtin functions that
> mostly encapsulate calls to __magic__ methods, and I definitively don't
> understand why this specific one (=> callable(obj)) should disappear. I

Maybe because it DOESN'T "encapsulate a call" to a magic method, but
rather the mere check for the presence of one?

> usually have lot of respect for Guido's talent as a language designer
> (obviously since Python is still MFL), but I have to say I find this 
> particular decision just plain stupid. Sorry.

The mere check of whether an object possesses some important special
method is best accomplished through the abstract-base-classes machinery
(new in Python 3.0: see ).  At
this time there is no Callable ABC, but you're welcome to argue for it
on the python-3000 mailing list (please do check the archives and/or
check privately with the PEP owner first to avoid duplication).
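
For what it's worth, a sketch of what such a check looks like with the ABC machinery. It assumes a later Python 3 release, where a Callable ABC is available as collections.abc.Callable; Greeter is a made-up class:

```python
from collections.abc import Callable

class Greeter(object):
    def __call__(self):
        return 'hi'

# isinstance against the ABC checks for the presence of __call__
checks = (isinstance(Greeter(), Callable),
          isinstance(len, Callable),
          isinstance(42, Callable))
```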


Alex
 
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Where do they tech Python officialy ?

2007-08-01 Thread Alex Martelli
Alex Popescu <[EMAIL PROTECTED]> wrote:
   ...
> Have you seen/heard of Jim lately? Cause I haven't. By the time he was
> the lead of the AspectJ team his charismatic presence was everywhere (at
> least around that project).

He wasn't at OSCON this year, but I hope to see him at Pycon next year.
I don't see this as a deep dark M$ plot to kidnap and hide the best and
brightest Open Sourcers, because I know what it means to get a wonderful
challenging new job and pour all you have into it (I've had to skip a
couple Pycons, myself, though I hope to be back next year).


> However I do agree with you. The only remark is that US trends are not
> hitting my part of Eu so quickly ;-) (things are indeed changing).

About 3 years ago I was also getting sick and tired about my own part of
the EU, which is part of why I emigrated:-).  I do see things getting
better in Southern Europe, albeit from a distance.


> > These are the ones you don't wan't to work for anyway !-)
> 
> Well... this is sometimes debatable :-).

A totally clueless employer may still be a way to make some quick and
dirty money right now, but it will barely be enough to pay for the extra
Maalox and Zantac you'll need.  Looking back on your life when you're
closer to retirement than to when you started working, you'll see what a
mistake it was to accept clueless-employers' offers, and how much
happier your life would have been if you'd known that up front:-).


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Where do they tech Python officialy ?

2007-08-01 Thread Alex Martelli
NicolasG <[EMAIL PROTECTED]> wrote:

> > Open source projects do not require previous professional experience to
> > accept volunteers.  So, one way out of your dilemma is to make a name
> > for yourself as an open source contributor -- help out with Python
> > itself and/or with any of the many open source projects that use Python,
> > and you will both learn a lot _and_ acquire "professional experience"
> > that any enlightened employer will recognize as such.  That will take a
> > while, but not as long as getting a college degree (and it will be far
> > cheaper than the degree).
> >
> > Alex
> 
> I think this is the best idea to escape the python amateur circle and
> go in to open source project that are considered to be professional
> projects. I don't know if it will be better to find a project to
> contribute or to start a new one .. Will have a look around and think
> about.

Unless you have some specific new idea that you're keen to address and
can't be met by existing projects, joining an existing project would
normally be a better bet.  One-person projects are rarely as important
as larger ones, and it's quite hard to get other collaborators to a new
project; working in a project with existing code and contributors will
also be more instructive.  As for which OS projects are "considered to
be professional", just about all large successful ones are so
considered: after all, even games, say, are "professional projects" from
the POV of firms that develop and sell them, such as EA!-)


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Floats as keys in dict

2007-08-01 Thread Alex Martelli
Brian Elmegaard <[EMAIL PROTECTED]> wrote:

> I am making a script to optimize by dynamic programming. I do not know
> the vertices and nodes before the calculation, so I have decided to
> store the nodes I have in play as keys in a dict.
> 
> However, the dict keys are then floats and I have to round the values
> of new possible nodes in each step. When profiling I see that the most
> time consuming part of my script is rounding.
> 
> Is there a faster way than round() or is there a better way to test
> than 'in' or should I store the keys in another way than a dict?

You may want to consider keeping a sorted list and using standard
library module bisect for searches and insertions -- its behavior is
O(log N) for a search, O(N) for an insertion, but it might be that in
your case saving the rounding could be worth it.
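
A rough sketch of the sorted-list idea, with no rounding at all. The helper names find_close and add_node and the tolerance value are assumptions made for illustration:

```python
import bisect

def find_close(sorted_keys, x, tol):
    """Return the index of a key within tol of x, or -1 if there is none."""
    i = bisect.bisect_left(sorted_keys, x)       # O(log N) search
    for j in (i - 1, i):                         # only the two neighbours can be close
        if 0 <= j < len(sorted_keys) and abs(sorted_keys[j] - x) <= tol:
            return j
    return -1

def add_node(sorted_keys, x, tol):
    """Insert x unless a close-enough key is already present."""
    if find_close(sorted_keys, x, tol) < 0:
        bisect.insort(sorted_keys, x)            # O(N) insertion

nodes = []
for value in [0.5, 0.1 + 0.2, 0.3, 1.0, 0.5000000001]:
    add_node(nodes, value, 1e-9)
```

Whether this beats round() plus a dict depends on how many nodes are in play, so it needs measuring.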

Otherwise, you need to consider a different container, based either on
comparisons (e.g. AVL trees, of which there are several 3rd party
implementations as Python extensions) or on a hashing function that will
give the same hash for two numbers that are "close enough" (e.g., hash
ignoring the lowest N bits of the float's mantissa for some N).

round() operates on decimals and that may not be as fast as working on
binary representations, but, to be fast, a helper function giving the
"hash of a binary-rounded float" would have to be coded in C (or maybe
use psyco).
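
To make the "ignore the lowest N bits" idea concrete, here is a pure-Python sketch using struct. It shows only the technique; it is unlikely to be faster than round(), which is why a C helper would be needed in practice. The name quantize and the choice of 20 bits are assumptions:

```python
import struct

def quantize(x, nbits=20):
    """Return x with the lowest nbits of its 52-bit mantissa cleared."""
    (bits,) = struct.unpack('<Q', struct.pack('<d', x))   # float -> raw 64 bits
    bits &= ~((1 << nbits) - 1)                           # zero the low bits
    return struct.unpack('<d', struct.pack('<Q', bits))[0]

# Nearby floats collapse to the same dict key
d = {}
d[quantize(0.1 + 0.2)] = 'node'
found = quantize(0.3) in d
```

Note that two close values which happen to straddle a bucket boundary will still get different keys, so this is "close enough" only in the same sense that rounding is.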


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Where do they tech Python officialy ?

2007-08-01 Thread Alex Martelli
Alex Popescu <[EMAIL PROTECTED]> wrote:
   ...
> > and you will both learn a lot _and_ acquire "professional experience"
> > that any enlightened employer will recognize as such.  
> 
> It depends :-). In my experience I met employers being concerned by my
> implication in the oss world :-).

Considering that even the King of Proprietary Software, Microsoft, now
happily hires major Open Source figures such as Jim Hugunin (MS was also
a top-tier sponsor at the recent OSCON, with both managerial and senior
technical employees giving keynotes and tech talks), it boggles the mind
to think about which kind of company would instead be "concerned" by a
candidate's OS experience.


> > That will take a
> > while, but not as long as getting a college degree (and it will be far
> > cheaper than the degree).
> 
> I don't know much about the open community in Python world, but in Java
> world becoming a project member may be more difficult than getting a 
> degree (or close to :-)) ).

In a major project, you will of course have to supply useful
contributions as well as proving to have a reasonable personality &c
before being granted committer privileges; and a few projects (centered
on a group of committers employed by a single firm or on an otherwise
close-knit small clique) are not very open to the outside world at all.
But (at least wrt projects using Python, C, C++ -- I have no experience
of opensource projects focused on Java instead) that is the exception,
not the rule.


Alex
-- 
http://mail.python.org/mailman/listinfo/python-list

