Re: Py2.7/FreeBSD: maximum number of open files

2011-11-30 Thread Chris Torek
In article mailman.2711.1321299276.27778.python-l...@python.org
Christian Heimes  li...@cheimes.de wrote:
On 14.11.2011 19:28, Tobias Oberstein wrote:
 Thanks! This is probably the most practical option I can go with.
 
 I've just tested: the backported new IO on Python 2.7 will indeed
 open 32k files on FreeBSD. It also creates the files much faster.
 The old, non-monkey-patched version was getting slower and
 slower as more files were opened/created ..

I wonder what's causing the O(n^2) behavior. Is it the old file type or
BSD's fopen() fault?

It is code in libc.  My old stdio (still in use on FreeBSD) was
never designed to be used in situations with more than roughly 256
file descriptors -- hence the "short" in the file number field.
(The OS used to be full of other places that kept the maximum
allowable file descriptor fairly small, such as the on-stack copies
of fd_set objects in select() calls.)

You will want to redesign the code that finds or creates a free
FILE object, and probably some of the things that work with
line-buffered FILEs (specifically calls to _fwalk() when reading
from a line-buffered stream).
-- 
In-Real-Life: Chris Torek, Wind River Systems
Intel require I note that my opinions are not those of WRS or Intel
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Best way to check that you are at the beginning (the end) of an iterable?

2011-09-08 Thread Chris Torek
In article mailman.854.1315441399.27778.python-l...@python.org
Cameron Simpson  c...@zip.com.au wrote:
Facilities like feof() in C and eof in Pascal already lead to lots of
code that runs happily with flat files and behaves badly in interactive
or piped input. It is _so_ easy to adopt a style like:

  while not eof(filehandle):
      line = filehandle.nextline()
      ...

Minor but important point here: eof() in Pascal is predictive (uses
a crystal ball to peer into the future to see whether EOF is is
about to occur -- which really means, reads ahead, causing that
interactivity problem you mentioned), but feof() in C is post-dictive.
The feof(stream) function returns a false value if the stream has
not yet encountered an EOF, but your very next attempt to read from
it may (or may not) immediately encounter that EOF.

Thus, feof() in C is sort of (but not really) useless.  (The actual
use cases are to distinguish between EOF and error after a
failed read from a stream -- since C lacks exceptions, getc() just
returns EOF to indicate "failed to get a character, due to end of
file or error" -- or in some more obscure cases, such as the
nonstandard getw(), to distinguish between a valid -1 value and
having encountered an EOF.  The companion ferror() function tells
you whether an earlier EOF value was due to an error.)
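
To make the post-dictive usage concrete, here is a minimal C sketch
(mine, not from the original posting): feof() and ferror() are
consulted only *after* getc() has returned EOF, to find out why the
reading stopped.

    #include <stdio.h>

    /* Copy "in" to "out"; return 0 on normal EOF, -1 on read error. */
    int copy_stream(FILE *in, FILE *out) {
        int c;

        while ((c = getc(in)) != EOF)
            putc(c, out);
        return ferror(in) ? -1 : 0;
    }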
-- 
In-Real-Life: Chris Torek, Wind River Systems
Intel require I note that my opinions are not those of WRS or Intel
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Why do class methods always need 'self' as the first parameter?

2011-09-05 Thread Chris Torek
Chris Torek nos...@torek.net writes:
[snip]
 when you have [an] instance and call [an] instance or class method:

[note: I have changed the names very slightly here, and removed
additional arguments, on purpose]

 black_knight = K()
 black_knight.spam()
 black_knight.eggs()

 the first parameters ... are magic, and invisible.

 Thus, Python is using the "explicit is better than implicit" rule
 in the definition, but not at the call site. ...

In article m2wrdnf53u@cochabamba.vanoostrum.org
Piet van Oostrum  p...@vanoostrum.org wrote:
It *is* explicit also at the call site. It only is written at the left
of the dot rather than at the right of the parenthesis.

It cannot possibly be explicit.  The first parameter to one of the
method functions is black_knight, but the first parameter to the
other method is black_knight.__class__.

Which one is which?  Is spam() the instance method and eggs() the
class method, or is spam() the class method and eggs() the instance
method?  (One does not, and should not, have to *care*, which is
kind of the point here. :-) )

And that is necessary to locate which definition of the method
applies.

By that I assume you mean the name black_knight here.  But the
name is not required to make the call; see the last line of the
following code fragment:

funclist = []
...
black_knight = K()
funclist.append(black_knight.spam)
funclist.append(black_knight.eggs)
...
# At this point, let's say len(funclist) > 2,
# and some number of funclist[i] entries are ordinary
# functions that have no special first parameter.
random.choice(funclist)()

It would be silly to repeat this information after the parenthesis.
Not only silly, it would be stupid as it would be a source of errors,
and a violation of DRY (Don't Repeat Yourself).

Indeed.  But I believe the above is a demonstration of how the
self or cls parameter is in fact implicit, not explicit.

(I am using python 2.x, and doing this in the interpreter:

random.choice(funclist)

-- without the parentheses to call the function -- produces:

<bound method K.[name omitted] of <__main__.K object at 0x249f50>>
<bound method type.[name omitted] of <class '__main__.K'>>
<function ordinary at 0x682b0>

The first is the instance method, whose name I am still keeping
secret; the second is the class method; and the third is the ordinary
function I added to the list.  The actual functions print their
own name and their parameters if any, and one can see that the
class and instance methods get one parameter, and the ordinary
function gets none.)
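
For concreteness, here is one possible shape for the pieces used in
the demonstration above -- the real method names are deliberately
kept secret in the post, so all the names here are invented:

    import random

    def ordinary():
        print 'ordinary(): no special first parameter'

    class K(object):
        def spam(self):
            print 'spam: got %r' % (self,)

        @classmethod
        def eggs(cls):
            print 'eggs: got %r' % (cls,)

    funclist = []
    black_knight = K()
    funclist.append(black_knight.spam)
    funclist.append(black_knight.eggs)
    funclist.append(ordinary)
    random.choice(funclist)()  # every entry is callable with no arguments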
-- 
In-Real-Life: Chris Torek, Wind River Systems
Intel require I note that my opinions are not those of WRS or Intel
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Why doesn't threading.join() return a value?

2011-09-02 Thread Chris Torek
On Sep 2, 2:23 pm, Alain Ketterlin al...@dpt-info.u-strasbg.fr
wrote:
 Sorry, you're wrong, at least for POSIX threads:

 void pthread_exit(void *value_ptr);
 int pthread_join(pthread_t thread, void **value_ptr);

 pthread_exit can pass anything, and that value will be retrieved with
 pthread_join.

In article bf50c8e1-1476-41e1-b2bc-61e329bfa...@s12g2000yqm.googlegroups.com
Adam Skutt  ask...@gmail.com wrote:
No, it can only pass a void*, which isn't much better than passing an
int.

It is far better than passing an int, although it leaves you with
an annoying storage-management issue, and sidesteps any reasonable
attempts at type-checking (both of which are of course par for
the course in C).  For instance:

struct some_big_value {
    ... lots of stuff ...
};
struct some_big_value storage_management_problem[SIZE];
...
void *func(void *initial_args) {
    ...
#ifdef ONE_WAY_TO_DO_IT
    pthread_exit(&storage_management_problem[index]);
    /* NOTREACHED */
#else /* the other way */
    return &storage_management_problem[index];
#endif
}
...
int error;
pthread_t threadinfo;
pthread_attr_t attr;
...
pthread_attr_init(&attr);
/* set attributes if desired */
error = pthread_create(&threadinfo, &attr, func, args_to_func);
if (error) {
    ... handle error ...
} else {
    ...
    void *rv;
    error = pthread_join(threadinfo, &rv);
    if (rv == PTHREAD_CANCELED) {
        ... the thread was canceled ...
    } else {
        struct some_big_value *ret = rv;
        ... work with ret->field ...
    }
}

(Or, do dynamic allocation, and have a struct with a distinguishing
ID followed by a union of multiple possible values, or a flexible
array member, or whatever.  This means you can pass any arbitrary
data structure back, provided you can manage the storage somehow.)
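
A sketch of that dynamic-allocation variant, using a hypothetical
tagged struct (the joiner takes over, and must free, the storage):

    #include <stdlib.h>

    struct result {
        enum { R_COUNT, R_MESSAGE } tag;
        union {
            long count;
            char *message;
        } u;
    };

    void *worker(void *initial_args) {
        struct result *r = malloc(sizeof *r);

        r->tag = R_COUNT;
        r->u.count = 42;
        return r;   /* whoever pthread_join()s us must free() this */
    }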

Passing a void* is not equivalent to passing "anything", not even
in C.  Moreover, specific values are still reserved, like
PTHREAD_CANCELED.

Some manual pages are clearer about this than others.  Here is one
that I think is not bad:

The symbolic constant PTHREAD_CANCELED expands to a constant
expression of type (void *), whose value matches no pointer to
an object in memory nor the value NULL.

So, provided you use pthread_exit() correctly (always pass either
NULL or the address of some actual object in memory), the special
reserved value is different from all of your values.

(POSIX threads are certainly klunky, but not all *that* badly designed
given the constraints.)

Re. the original question: since you can define your own Thread
subclass, with wathever attribute you want, I guess there was no need to
use join() to communicate the result. The Thread's run() can store its
result in an attribute, and the client can get it from the same
attribute after a successful call to join().

For that matter, you can use the following to get what the OP asked
for.  (Change all the instance variables to __-prefixed versions
if you want them to be Mostly Private.)

import threading

class ValThread(threading.Thread):
    """like threading.Thread, but the target function's return val is captured"""
    def __init__(self, group=None, target=None, name=None,
                 args=(), kwargs=None, verbose=None):
        super(ValThread, self).__init__(group, None, name, None, None, verbose)
        self.value = None
        self.target = target
        self.args = args
        self.kwargs = {} if kwargs is None else kwargs

    def run(self):
        """run the thread"""
        if self.target:
            self.value = self.target(*self.args, **self.kwargs)

    def join(self, timeout=None):
        """join, then return value set by target function"""
        super(ValThread, self).join(timeout)
        return self.value
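
A hypothetical use of the ValThread above:

    def work(a, b):
        return a + b

    t = ValThread(target=work, args=(2, 3))
    t.start()
    print t.join()  # prints 5 (assuming the join did not time out)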
-- 
In-Real-Life: Chris Torek, Wind River Systems
Intel require I note that my opinions are not those of WRS or Intel
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Why doesn't threading.join() return a value?

2011-09-02 Thread Chris Torek
In article roy-030914.19162802092...@news.panix.com
Roy Smith  r...@panix.com wrote:
Thread.join() currently returns None, so there's 
no chance for [return value] confusion.

Well, still some actually.  If you use my example code (posted
elsethread), you need to know:

  - that there was a target function (my default return
value if there is none is None); and
  - that the joined thread really did finish (if you pass
a timeout value, rather than None, and the join times
out, the return value is again None).

Of course, if your target function always exists and never returns
None, *then* there's no chance for confusion. :-)
-- 
In-Real-Life: Chris Torek, Wind River Systems
Intel require I note that my opinions are not those of WRS or Intel
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Algorithms Library - Asking for Pointers

2011-09-02 Thread Chris Torek
In article 18fe4afd-569b-4580-a629-50f6c7482...@c29g2000yqd.googlegroups.com
Travis Parks  jehugalea...@gmail.com wrote:
[Someone] commented that the itertools algorithms will perform
faster than the hand-written ones. Are these algorithms optimized
internally?

They are written in C, so avoid a lot of CPython interpreter
overhead.  Mileage in Jython, etc., may vary...
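
A rough, machine-dependent way to see the effect for yourself (my
sketch, CPython 2.x; the exact numbers will vary):

    import timeit

    # the looping and function application happen in C
    print timeit.timeit('list(imap(abs, data))',
                        'from itertools import imap; data = range(1000)',
                        number=10000)
    # the loop runs in the bytecode interpreter
    print timeit.timeit('[abs(x) for x in data]',
                        'data = range(1000)',
                        number=10000)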
-- 
In-Real-Life: Chris Torek, Wind River Systems
Intel require I note that my opinions are not those of WRS or Intel
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Why do class methods always need 'self' as the first parameter?

2011-08-31 Thread Chris Torek
In article 0dc26f12-2541-4d41-8678-4fa53f347...@g9g2000yqb.googlegroups.com
T. Goodchild asked, in part:
... One of the things that bugs me is the requirement that all class
methods have 'self' as their first parameter.

In article 4e5e5628$0$29977$c3e8da3$54964...@news.astraweb.com
Steven D'Aprano  steve+comp.lang.pyt...@pearwood.info wrote:
[Comprehensive reply, noting that these are actually instance
methods, and that there are class and static methods as well]:

Python does have class methods, which receive the class, not the instance,
as the first parameter. These are usually written something like this:

class K(object):
    @classmethod
    def spam(cls, args):
        print cls  # always prints class K, never the instance

Just like self, the name cls is a convention only. Class methods are usually
used for alternate constructors.

There are also static methods, which don't receive any special first
argument, plus any other sort of method you can invent, by creating
descriptors... but that's getting into fairly advanced territory. ...
[rest snipped]

I am not sure whether T. Goodchild was asking any of the above or
perhaps also one other possible question: if an instance method
is going to receive an automatic first self parameter, why require
the programmer to write that parameter in the def?  For instance
we *could* have:

class K(object):
    def meth1(arg1, arg2):
        self.arg1 = arg1 # self is magically available
        self.arg2 = arg2

    @classmethod
    def meth2(arg):
        use(cls) # cls is magically available

and so on.  This would work fine.  It just requires a bit of implicit
sneakiness in the compiler: an instance method magically creates
a local variable named self that binds to the invisible first
parameter, and a class method magically creates a local variable
named cls that binds to the invisible first parameter, and so
on.

Instead, we have a syntax where you, the programmer, write out the
name of the local variable that binds to the first parameter.  This
means the first parameter is visible.  Except, it is only visible
at the function definition -- when you have the instance and call
the instance or class method:

black_knight = K()
black_knight.meth1('a', 1)
black_knight.meth2(2)

the first parameters (black_knight, and black_knight.__class__,
respectively) are magic, and invisible.

Thus, Python is using the "explicit is better than implicit" rule
in the definition, but not at the call site.  I have no problem with
this.  Sometimes I think implicit is better than explicit.  In this
case, there is no need to distinguish, at the calls to meth1() and
meth2(), as to whether they are class or instance methods.  At
the *calls* they would just be distractions.

At the *definitions*, they are not as distraction-y since it is
important to know, during the definition, whether you are operating
on an instance (meth1) or the class itself (meth2), or for that
matter on neither (static methods).  One could determine this from
the absence or presence of @classmethod or @staticmethod, but
the minor redundancy in the def statement seems, well, minor.

Also, as a bonus, it lets you obfuscate the code by using a name
other than self or cls. :-)
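
For instance, this is perfectly legal (if unkind to the reader):

    class K(object):
        def meth(this, value):  # "this" instead of "self"
            this.value = value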
-- 
In-Real-Life: Chris Torek, Wind River Systems
Intel require I note that my opinions are not those of WRS or Intel
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: try... except with unknown error types

2011-08-31 Thread Chris Torek
In article mailman.286.1313956388.27778.python-l...@python.org,
Terry Reedy  tjre...@udel.edu wrote:
I would expect that catching socket.error (or even IOError) should catch 
all of those.

exception socket.error
    A subclass of IOError ...

Except that, as Steven D'Aprano almost noted elsethread, it isn't
(a subclass of IOError -- the note was that it is not a subclass
of EnvironmentError).  In 2.x anyway:

>>> import socket
>>> isinstance(socket.error, IOError)
False
>>> isinstance(socket.error, EnvironmentError)
False
>>> 

(I just catch socket.error directly for this case.)

(I have also never been sure whether something is going to raise
an IOError or an OSError for various OS-related read or write
operation failures -- such as exceeding a resource limit, for
instance -- so most places that do I/O operations on OS files, I
catch both.  Still, it sure would be nice to have a static analysis
tool that could answer questions about potential exceptions. :-) )
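
The belt-and-braces pattern that parenthetical describes looks like
this (a sketch; "stream" and "handle_error" are hypothetical):

    try:
        data = stream.read()
    except (IOError, OSError), err:
        handle_error(err)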
-- 
In-Real-Life: Chris Torek, Wind River Systems
Intel require I note that my opinions are not those of WRS or Intel
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Why no warnings when re-assigning builtin names?

2011-08-31 Thread Chris Torek
(I realize this thread is old.  I have been away for a few weeks.
I read through the whole thread, though, and did not see anyone
bring up this one particular point: there is already a linting
script that handles this.)

On Mon, Aug 15, 2011 at 10:52 PM, Gerrat Rickert
grick...@coldstorage.com wrote:
 With surprising regularity, I see program postings (eg. on StackOverflow)
 from inexperienced Python users accidentally re-assigning built-in names.

 For example, they'll innocently call some variable, `list', and assign a
 list of items to it.

In article mailman.22.1313446504.27778.python-l...@python.org
Chris Angelico  ros...@gmail.com wrote:
It's actually masking, not reassigning. That may make it easier or
harder to resolve the issue.

If you want a future directive that deals with it, I'd do it the other
way - from __future__ import mask_builtin_warning or something - so
the default remains as it currently is. But this may be a better job
for a linting script.

The pylint program already does this:

$ cat shado.py
"""module doc"""
def func(list):
    """func doc"""
    return list
$ pylint shado.py
* Module shado
W0622:  2:func: Redefining built-in 'list'
...
Your code has been rated at 6.67/10

If your shadowing is done on purpose, you can put in a pylint
comment directive to suppress the warning.
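
Something like this (the exact directive spelling varies a bit
between pylint versions):

    def func(list):  # pylint: disable=W0622
        """deliberately shadows the built-in"""
        return list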

Pylint is the American Express Card of Python coding: don't leave
$HOME without it! :-)
-- 
In-Real-Life: Chris Torek, Wind River Systems
Intel require I note that my opinions are not those of WRS or Intel
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Why do class methods always need 'self' as the first parameter?

2011-08-31 Thread Chris Torek
x = X()
x.xset('value')
x.show()
-- 
In-Real-Life: Chris Torek, Wind River Systems
Intel require I note that my opinions are not those of WRS or Intel
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Early binding as an option

2011-08-04 Thread Chris Torek
Chris Angelico wrote:
[snippage]
 def func(x):
     len = len  # localize len
     for i in x:
         len(i)  # use it exactly as you otherwise would

In article 4e39a6b5$0$29973$c3e8da3$54964...@news.astraweb.com
Steven D'Aprano  steve+comp.lang.pyt...@pearwood.info wrote:
That can't work. The problem is that because len is assigned to in the body
of the function, the Python compiler treats it as a local variable. So the
line "len = len" is interpreted as <local>len = <local>len, which doesn't yet
exist. There's no way of saying "<local>len = <global>len" in the body of the
function.

So you must either:

(1) use a different name: length = len

(2) use a fully-qualified name: import builtins; len = builtins.len

(This is my preferred form, given what one has now, if one is going
to do this in the function body.  Of course in 2.x it is spelled
__builtin__.len instead...)

(3) do the assignment as a default parameter, which has slightly different
binding rules: def func(x, <local>len=<global>len)

(4) manual lookup: len = builtins.__dict__['len']  # untested


I don't recommend that last one, unless you're deliberately trying to write
obfuscated code :)

If Python *were* to have some kind of tie this symbol down now
operation / keyword / whatever, one could write:

def func(x):
    snap len       # here the new keyword is "snap"
    for i in x:
        ... len(i) ...  # use it exactly as you otherwise would

Of course there is no *need* for any new syntax with the other
construction:

def func(x, len=len):  # snapshots len at def() time
    for i in x:
        ... len(i) ...

but if one were to add it, it might look like:

def func(x, snap len):

The advantage (?) of something like a snap or snapshot or whatever
keyword / magic-function / whatever is that you can apply it to
more than just function names, e.g.:

def func(arg):
    # for this example, assume that arg needs to have the
    # following attributes:
    snap arg.kloniblatt, arg.frinkle, arg.tippy
    ...

Here, in the ... section, a compiler (whether byte-code, or JIT,
or whatever -- JIT makes the most sense in this case) could grab
the attributes, looking up their types and any associated stuff it
wants to, and then assume that for the rest of that function's
execution, those are not allowed to change.  (But other arg.whatever
items are, here.  If you want to bind *everything*, perhaps snap
arg or snap arg.* -- see below.)

Even a traditional (CPython) byte-code compiler could do something
sort of clever here, by making those attributes read-only to
whatever extent the snapshot operation is defined as fixing the
binding (e.g., does it recurse into sub-attributes? does it bind
only the name-and-type, or does it bind name-and-type-and-value,
or name-and-type-and-function-address-if-function, or ... -- all
of which has to be pinned down before any such suggestion is taken
too seriously :-) ).
-- 
In-Real-Life: Chris Torek, Wind River Systems
Intel require I note that my opinions are not those of WRS or Intel
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How should I document exceptions thrown by a method?

2011-07-27 Thread Chris Torek
Arcadio arcadiosinc...@gmail.com writes:
 I have a Settings class that is used to hold application settings. A
 Settings object initializes itself from a ConfigParser that gets
 passed in as an argument to the constructor.

In article 87oc0fpg9o@benfinney.id.au
Ben Finney  ben+pyt...@benfinney.id.au wrote:
So the caller is aware of, and takes responsibility for, the
ConfigParser instance.

 If a setting isn't found in whatever the ConfigParser is reading
 settings from, the ConfigParser's get() method will raise an
 exception. Should I just say that clients of my Settings class should
 be prepared to catch exceptions thrown by ConfigParser? Do I even have
 to mention that as it might be just implied?

In this case IMO it is implied that one might get exceptions from the
object one passes as an argument to a callable.

Yes.  But on the other hand:

>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
...

:-)

I would suggest that in this case, too, explicit is better than
implicit: the documentation should say "will invoke x.get() and
therefore propagate any exception that x.get() might raise".

 Or should Setting's constructor catch any exceptions raised by the
 ConfigParser and convert it to a Settings- specific exception class
 that I then document?

Please, no. Since the ConfigParser object was created and passed in by
the calling code, the calling code needs to know about the exceptions
from that object.

In *some* cases (probably not applicable here), one finds a good
reason to transform one exception to another.  In this case I agree
with Ben Finney though, and so does import this:

...
Simple is better than complex.
...

Letting exceptions flow upward unchanged is (usually) simpler,
hence better.
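
For completeness, the rarer transform-the-exception case might look
like the following sketch (SettingsError is hypothetical; note that
the original exception's text is preserved):

    import ConfigParser

    class SettingsError(Exception):
        """hypothetical application-specific error"""

    def get_setting(parser, section, option):
        try:
            return parser.get(section, option)
        except ConfigParser.Error, err:
            raise SettingsError('bad settings: %s' % err)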
-- 
In-Real-Life: Chris Torek, Wind River Systems
Intel require I note that my opinions are not those of WRS or Intel
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Use self.vars in class.method(parameters, self.vars)

2011-07-22 Thread Chris Torek
In article 0ddc2626-7b99-46ee-9974-87439ae09...@e40g2000yqn.googlegroups.com
caccolangrifata  caccolangrif...@gmail.com wrote:
I'm very very new with python, and I have some experience with java
programming, so probably you guys will notice.
Anyway this is my question:
I'd like to use class scope vars in method parameter ...

Others have answered what appears to have been your actual
question.  Here's an example of using an actual class scope
variable.

(Note: I have a sinus headache, which is probably the source
of some of the weirder names :-) )

class Florg(object):
    _DEFAULT_IPPY = 17

    @classmethod
    def set_default_ippy(cls, ippy):
        cls._DEFAULT_IPPY = ippy

    def __init__(self, name, ippy = None):
        if ippy is None:
            ippy = self.__class__._DEFAULT_IPPY
        self.name = name
        self.ippy = ippy

    def zormonkle(self):
        print('%s ippy = %s' % (self.name, self.ippy))

def example():
    flist = [Florg('first')]
    flist.append(Florg('second'))
    flist.append(Florg('third', 5))
    Florg.set_default_ippy(-4)
    flist.append(Florg('fourth'))
    flist.append(Florg('fifth', 5))

    for florg in flist:
        florg.zormonkle()

if __name__ == '__main__':
    example()
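
For reference, example() should print:

    first ippy = 17
    second ippy = 17
    third ippy = 5
    fourth ippy = -4
    fifth ippy = 5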
-- 
In-Real-Life: Chris Torek, Wind River Systems
Intel require I note that my opinions are not those of WRS or Intel
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python ++ Operator?

2011-07-19 Thread Chris Torek
In article mailman.1057.1310717193.1164.python-l...@python.org
Chris Angelico  ros...@gmail.com wrote:
I agree that [C's ++ operators are] often confusing (i+++++j) ...

For what it is worth, this has to be written as:

i++ + ++j /* or i+++ ++j */

or similar (e.g., newline after the middle + operator) as the
lexer will group adjacent ++ characters into a single ++ operator
whenever it can (the so-called greedy matching that regular
expression recognizers are famous for), and only later will the
parser and semantic analysis phases realize that i++ ++ +j is
invalid and complain.

but there are several places where they're handy. ...
However, Python doesn't work as close to the bare metal, so it
doesn't have such constructs.

More specifically, Python has appropriate higher-level constructs
that, in effect, maintain mental invariants in a better (for some
value of better) way.  Instead of:

lst[i++] = val; /* or: *p++ = val; */

which has the effect of appending an item to an array-based list
of items -- the invariant here is that i (or p in the pointer
version) always tells you where the place the *next* item -- one
simply writes:

lst.append(val)

(which also makes sure that there is *room* in the array-based
list, something that requires a separate step in C).
-- 
In-Real-Life: Chris Torek, Wind River Systems
Intel require I note that my opinions are not those of WRS or Intel
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python ++ Operator?

2011-07-15 Thread Chris Torek
In article mailman.1055.1310716536.1164.python-l...@python.org
Chris Angelico  ros...@gmail.com wrote:
2011/7/15 Rafael Durán Castañeda rafadurancastan...@gmail.com:
 Hello all,
 What's the meaning of using i++? Even, does exist ++ operator in python?

++i is legal Python but fairly useless. It's the unary + operator,
applied twice. It doesn't increment the variable.

Well...

class Silly:
    def __init__(self, value):
        self.value = value
        self._pluscount = 0
    def __str__(self):
        return str(self.value)
    def __pos__(self):
        self._pluscount += 1
        if self._pluscount == 2:
            self.value += 1
            self._pluscount = 0
        return self

def main():
    i = Silly(0)
    print('initially, i = %s' % i)
    print('plus-plus i = %s' % ++i)
    print('finally, i = %s' % i)

main()

:-)

(Of course, +i followed by +i *also* increments i...)
-- 
In-Real-Life: Chris Torek, Wind River Systems
Intel require I note that my opinions are not those of WRS or Intel
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: difflib-like library supporting moved blocks detection?

2011-07-13 Thread Chris Torek
In article mailman.1002.1310591600.1164.python-l...@python.org
Vlastimil Brom  vlastimil.b...@gmail.com wrote:
I'd like to ask about the availability of a text diff library, like
difflib, which would support the detection of moved text blocks.

If you allow arbitrary moves, the minimal edit distance problem
(string-to-string edit) becomes substantially harder.  If you only
allow insert, delete, or in-place-substitute, you have what is
called the Levenshtein distance case.  If you allow transpositions
you get Damerau-Levenshtein.  These are both solveable with a
dynamic programming algorithm.  Once you allow move operations,
though, the problem becomes NP-complete.

See http://pages.cs.brandeis.edu/~shapird/publications/JDAmoves.pdf
for instance.  (They give an algorithm that produces usually
acceptable results in polynomial time.)
-- 
In-Real-Life: Chris Torek, Wind River Systems
Intel require I note that my opinions are not those of WRS or Intel
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Does hashlib support a file mode?

2011-07-06 Thread Chris Torek
 - Do the usual dance for default arguments:
   def file_to_hash(path, m=None):
       if m is None:
           m = hashlib.md5()

[instead of

def file_to_hash(path, m = hashlib.md5()):

]

In article b317226a-8008-4177-aaa6-3fdc30125...@e20g2000prf.googlegroups.com
Phlip  phlip2...@gmail.com wrote:
Not sure why if that's what the defaulter does?

For the same reason that:

def spam(somelist, so_far = []):
    for i in somelist:
        if has_eggs(i):
            so_far.append(i)
    return munch(so_far)

is probably wrong.  Most beginners appear to expect this to "take
a list of things that pass my has_eggs test, add more things
to that list, and return whatever munch(adjusted_list) returns" ...
which it does.  But then they *also* expect:

result1_on_clean_list = spam(list1)
result2_on_clean_list = spam(list2)
result3_on_partly_filled_list = spam(list3, prefilled3)

to run with a clean so_far list for *each* of the first two
calls ... but it does not; the first call starts with a clean
list, and the second one starts with so_far containing all
the results accumulated from list1.

(The third call, of course, starts with the prefilled3 list and
adjusts that list.)

I did indeed get an MD5-style string of what casually appeared
to be the right length, so that implies the defaulter is not to
blame...

In this case, if you do:

print('big1:', file_to_hash('big1'))
print('big2:', file_to_hash('big2'))

you will get two md5sum values for your two files, but the
md5sum value for big2 will not be the equivalent of "md5sum big2"
but rather that of "cat big1 big2 | md5sum".  The reason is
that you are re-using the md5-sum-so-far on the second call
(for file 'big2'), so you have the accumulated sum from file
'big1', which you then update via the contents of 'big2'.
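
A corrected version along the lines suggested at the top of the
post, with a fresh md5 object per call unless the caller really does
pass one in (a sketch; the chunked read is my own addition):

    import hashlib

    def file_to_hash(path, m=None):
        if m is None:
            m = hashlib.md5()  # fresh hash object on every call
        fp = open(path, 'rb')
        try:
            for chunk in iter(lambda: fp.read(8192), ''):
                m.update(chunk)
        finally:
            fp.close()
        return m.hexdigest()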
-- 
In-Real-Life: Chris Torek, Wind River Systems
Intel require I note that my opinions are not those of WRS or Intel
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: subtle error slows code by 10x (builtin sum()) - replace builtin sum without using import?

2011-07-01 Thread Chris Torek
In article f6dbf631-73a9-485f-8ada-bc7376ac6...@h25g2000prf.googlegroups.com
bdb112  boyd.blackw...@gmail.com wrote:
First a trap for new players, then a question to developers

Code accelerated by numpy can be slowed down by a large factor if you
neglect to import numpy.sum .

from timeit import Timer
frag = 'x=sum(linspace(0,1,1000))'
Timer(frag, setup='from numpy import linspace').timeit(1000)
# 0.6 sec
Timer(frag, setup='from numpy import sum, linspace').timeit(1000)
# (difference is I import numpy.sum)
# 0.04 sec -- 15x faster!

This is obvious of course - but it is very easy to forget to import
numpy.sum and pay the price in execution.

Question:
Can I replace the builtin sum function globally for test purposes so
that my large set of codes uses the replacement?
The replacement would simply issue warnings.warn() if it detected an
ndarray argument, then call the original sum
I could then find the offending code and use the appropriate import to
get numpy.sum


Sure, just execute code along these lines before running any of
the tests:

import __builtin__
import warnings

_sys_sum = sum # grab it before we change __builtin__.sum

def hacked_sum(sequence, start=0):
    if isinstance(sequence, whatever):   # "whatever" = e.g. numpy.ndarray
        warnings.warn('your warning here')
    return _sys_sum(sequence, start)

__builtin__.sum = hacked_sum

(You might want to grab a stack trace too, using the traceback
module.)  You said without using import but all you have to
do is arrange for python to import this module before running
any of your own code, e.g., with $PYTHONHOME and a modified
site file.
-- 
In-Real-Life: Chris Torek, Wind River Systems
Intel require I note that my opinions are not those of WRS or Intel
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Safely modify a file in place -- am I doing it right?

2011-06-29 Thread Chris Torek
In article 4e0b6383$0$29996$c3e8da3$54964...@news.astraweb.com
 steve+comp.lang.pyt...@pearwood.info wrote:
I have a script running under Python 2.5 that needs to modify files in
place. I want to do this with some level of assurance that I won't lose
data. ... I have come up with this approach:

[create temp file in suitable directory, write new data, and
use os.rename() to atomically swap out the old file for the
new]

As Grant Edwards said, this is the right general idea.  There
are lots of variations.  If you want to make the original
be a backup, the sequence:

os.link(original_name, backup_name)
os.rename(new_synced_file, original_name)

should generally do the trick (rename will unlink the target
which means that the backup name will refer to the original
inode).

import os, tempfile
def safe_modify(filename):
    fp = open(filename, 'r')
    data = modify(fp.read())
    fp.close()
    # Use a temporary file.
    loc = os.path.dirname(filename)
    fd, tmpname = tempfile.mkstemp(dir=loc, text=True)
    # In my real code, I need a proper Python file object,
    # not just a file descriptor.
    outfile = os.fdopen(fd, 'w')
    outfile.write(data)
    outfile.close()

It is a good idea to use outfile.flush() and then os.fsync() before
doing the close, as well.  Among other things, this *usually* gets
you some kind of notice-of-failure in the case of deferred writes
across a network (e.g., NFS).  (While it would be nice for os.close()
to deliver failure notices, in practice the fsync() is at least
sometimes required.  This is the OS's fault, not Python's. :-) )

    # Move the temp file over the original.
    os.rename(tmpname, filename)

os.rename is an atomic operation, at least under Linux and Mac,
so if the move fails, the original file should be untouched.

This seems to work for me, but is this the right way to do it?
Is there a better/safer way?

For additional checking and cleanup purposes, you may want to catch
exceptions and delete the temporary file if the rename has not yet
been done (and therefore the original file is still intact).

You will likely also need to fiddle with the permission bits
on the file resulting from the mkstemp() call (to make them
match those on the original file).  Alternatively, you may want
to build your own mkstemp() (this can be a bit of a challenge!).

Finally, as I implied above in talking about the os.link()-then-
os.rename() sequence, if the original file has multiple links to
it, note that this breaks the links.  If this is not what you
want, the problem has no fully general solution (but there are
various application-specific solutions).
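
A sketch folding the fsync, permission-copying, and cleanup
suggestions into the original function (shutil.copymode copies the
permission bits; "transform" is a stand-in for the real work):

    import os, shutil, tempfile

    def safe_rewrite(filename, transform):
        fp = open(filename, 'r')
        data = transform(fp.read())
        fp.close()
        loc = os.path.dirname(filename) or '.'
        fd, tmpname = tempfile.mkstemp(dir=loc, text=True)
        try:
            outfile = os.fdopen(fd, 'w')
            outfile.write(data)
            outfile.flush()
            os.fsync(outfile.fileno())  # push data to stable storage
            outfile.close()
            shutil.copymode(filename, tmpname)  # match original mode bits
            os.rename(tmpname, filename)        # atomic swap
        except Exception:
            os.unlink(tmpname)  # original file is still intact
            raise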
-- 
In-Real-Life: Chris Torek, Wind River Systems
Intel require I note that my opinions are not those of WRS or Intel
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Interpreting Left to right?

2011-06-25 Thread Chris Torek
(Re:

x = x['huh'] = {}

which binds x to a new dictionary, then binds that dictionary's 'huh'
key to the same dictionary...)

In article mailman.389.1308949722.1164.python-l...@python.org
Tycho Andersen  ty...@tycho.ws wrote:
Perhaps I'm thick, but (the first thing I did was read the docs and) I
still don't get it. From the docs:

An assignment statement evaluates the expression list (remember that
this can be a single expression or a comma-separated list, the latter
yielding a tuple) and assigns the single resulting object to each of
the target lists, from left to right.

The target list in this case is, in effect:

    eval(x), eval(x['huh'])

For a single target, it evaluates the RHS and assigns the result to
the LHS. Thus

x = x['foo'] = {}

first evaluates

x['foo'] = {}

which should raise a NameError, since x doesn't exist yet. Where am I
going wrong?

I believe you are still reading this as:

   x = (something)

and setting aside x and "something", and only then peering into the
"something" and finding:

x['foo'] = {}

and -- while keeping all of the other x = (something) at bay, trying
to do the x['foo'] assignment.

This is the wrong way to read it!

The correct way to read it is:

  - Pseudo_eval(x) and pseudo_eval(x['foo']) are both to be set
to something, so before we look any more closely at the x and
x['foo'] part, we need to evaluate the something part.

  - The "something" part is: {}, so create a dictionary.  There is
no name bound to this result, but for discussion let's bind tmp
to it.

  - Now that we have evaluated the RHS of the assignment statement
(which we are calling tmp even though it has no actual name),
*now* we can go eval() (sort of -- we only evaluate them for
assignment, rather than for current value) the pieces of the LHS.

  - The first piece of the LHS is x.  Eval-ing x for assignment
gets us the as-yet-unbound x, and we do:

        x = tmp

which binds x to the new dictionary.

  - The second piece of the LHS is x['foo'].  Eval-ing this for
assignment gets us the newly-bound x, naming the dictionary;
the key 'foo', a string; and now we bind x['foo'], doing:

        x['foo'] = tmp

which makes the dictionary contain itself.

Again, Python's assignment statement (not expression) has the form:

    {one or more "LHS =" parts, AKA the target lists} {expression-list}

and the evaluation order is, in effect and using pseudo-Python:

1. expression-list -- the (single) RHS
   tmp = eval(expression-list)

2. for LHS-part in target-list: # left-to-right
       LHS-part = tmp

When there is only one item in the target-list (i.e., just one
x = part in the whole statement), or when all of the parts of
the target-list are independent of each other and of the
expression-list, the order does not matter.  When the parts are
interdependent, then this left-to-right order *is* important.
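
A quick demonstration that the left-to-right order really does what
the walkthrough above says:

    >>> x = x['foo'] = {}
    >>> x is x['foo']
    True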
-- 
In-Real-Life: Chris Torek, Wind River Systems
Intel require I note that my opinions are not those of WRS or Intel
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Significant figures calculation

2011-06-25 Thread Chris Torek
In article mailman.386.1308949143.1164.python-l...@python.org
Jerry Hill  malaclyp...@gmail.com wrote:
I'm curious.  Is there a way to get the number of significant digits
for a particular Decimal instance?

Yes:

def sigdig(x):
    """return the number of significant digits in x"""
    return len(x.as_tuple()[1])

import decimal
D = decimal.Decimal

for x in (
        '1',
        '1.00',
        '1.23400e-8',
        '0.003'
):
    print 'sigdig(%s): %d' % (x, sigdig(D(x)))
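
For reference, that loop should print:

    sigdig(1): 1
    sigdig(1.00): 3
    sigdig(1.23400e-8): 6
    sigdig(0.003): 1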
-- 
In-Real-Life: Chris Torek, Wind River Systems
Intel require I note that my opinions are not those of WRS or Intel
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: those darn exceptions

2011-06-24 Thread Chris Torek
Chris Torek wrote:
 I can then check the now-valid
 pid via os.kill().  However, it turns out that one form of trash
 is a pid that does not fit within sys.maxint.  This was a surprise
 that turned up only in testing, and even then, only because I
 happened to try a ridiculously large value as one of my test cases.

In article 96itucfad...@mid.individual.net
Gregory Ewing  greg.ew...@canterbury.ac.nz wrote:
It appears that this situation is not unique to os.kill(),
for example,

  >>> import os
  >>> os.read(2**64, 42)
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  OverflowError: Python int too large to convert to C long

In fact I'd expect it to happen any time you pass a
very large int to something that's wrapping a C function.

You can't really blame the wrappers for this -- it's not
reasonable to expect all of them to catch out-of-range ints
and do whatever the underlying function would have done if
it were given an invalid argument.

I think the lesson to take from this is that you should
probably add OverflowError to the list of things to catch
whenever you're calling a function with input that's not
fully validated.

Indeed.  (Provided that your call is the point at which the validation
should occur -- otherwise, let the exception flow upwards as usual.)

But again, this is why I would like to have the ability to use some
sort of automated tool, where one can point at any given line of
code and ask: what exceptions do you, my faithful tool, believe
can be raised as a consequence of this line of code?

If you point it at the call to main():

if __name__ == '__main__':
main()

then you are likely to get a useless answer (why, any exception
at all); but if you point it at a call to os.read(), then you get
one that is useful -- and tells you (or me) about the OverflowError.
If you point it at a call to len(x), then the tool tells you what
it knows about type(x) and x.__len__.  (This last may well be
nothing: some tools have only limited application.  However, if
the call to len(x) is preceded by an assert isinstance(x,
(some,fixed,set,of,types)) for instance, or if all calls to the
function that in turn calls len(x) are visible and the type of x
can be proven, the tool might tell you something useful again.)

It is clear at this point that a simple list (or tuple) of possible
exceptions is insufficient -- the tool has to learn, somehow, that
len() raises TypeError itself, but also raises whatever x.__len__
raises (where x is the parameter to len()).  If I ever get around
to attempting this in pylint (in my Copious Spare Time no doubt
:-) ), I will have to start with an external mapping from built
in function F to exceptions that F raises and figure out an
appropriate format for the table's entries.  That is about half
the point of this discussion (to provoke thought about how one
might express this); the other half is to note that the documentation
could probably be improved (as someone else already noted elsethread).
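
One possible shape for that external table (entirely hypothetical --
nothing like it exists in CPython today):

    # built-in callable -> (exceptions raised directly,
    #                       note about exceptions delegated to arguments)
    BUILTIN_RAISES = {
        'len':     (('TypeError',), 'plus whatever x.__len__ raises'),
        'os.kill': (('OSError', 'OverflowError'), None),
        'os.read': (('OSError', 'OverflowError'), None),
    }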

Note that, if nothing else, the tool -- even in limited form,
without the kind of type inference that pylint attempts -- gives
you the ability to automate part of the documentation process.
-- 
In-Real-Life: Chris Torek, Wind River Systems
Intel require I note that my opinions are not those of WRS or Intel
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writable iterators?

2011-06-23 Thread Chris Torek
    def __init__(self, sequence):
        if isinstance(sequence, dict):
            self._iter = self._dict_iter
            self._get = self._dict_get
            self._set = self._dict_set
        elif isinstance(sequence, list):
            self._iter = self._list_iter
            self._get = self._list_get
            self._set = self._list_set
        else:
            raise IndirectIterError(
                "don't know how to IndirectIter over %s" % type(sequence))
        self._seq = sequence

    def __str__(self):
        return '%s(%s)' % (self.__class__.__name__, self._seq)

    def __iter__(self):
        return self._iter()

    def _dict_iter(self):
        return _IInner(self, self._seq.keys())

    def _dict_get(self, index, keys):
        return self._seq[keys[index]]

    def _dict_set(self, index, keys, newvalue):
        self._seq[keys[index]] = newvalue

    def _list_iter(self):
        return _IInner(self, self._seq)

    def _list_get(self, index, _):
        return self._seq[index]

    def _list_set(self, index, _, newvalue):
        self._seq[index] = newvalue

if __name__ == '__main__':
    d = {'one': 1, 'two': 2, 'three': 3}
    l = [9, 8, 7]
    print 'modify dict %r' % d
    for i in IndirectIter(d):
        i.set(-i.get())
    print 'result: %r' % d
    print
    print 'modify list %r' % l
    for i in IndirectIter(l):
        i.set(-i.get())
    print 'result: %r' % l
-- 
In-Real-Life: Chris Torek, Wind River Systems
Intel require I note that my opinions are not those of WRS or Intel
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: those darn exceptions

2011-06-23 Thread Chris Torek
In article 96gb36fc6...@mid.individual.net,
Gregory Ewing  greg.ew...@canterbury.ac.nz wrote:
Chris Torek wrote:

 Oops!  It turns out that os.kill() can raise OverflowError (at
 least in this version of Python, not sure what Python 3.x does).

Seems to me that if this happens it indicates a bug in
your code. It only makes sense to pass kill() something
that you know to be the pid of an existing process,
presumably one returned by some other system call.

So if kill() raises OverflowError, you *don't* want
to catch and ignore it. You want to find out about it,
just as much as you want to find out about a TypeError,
so you can track down the cause and fix it.

A bunch of you are missing the point here, perhaps because my
original example was not the best, as it were.  (I wrote it
on the fly; the actual code was elsewhere at the time.)

I do, indeed, want to find out about it.  But in this case
what I want to find out is the number I thought was a pid,
was not a pid, and I want to find that out early and catch
the OverflowError() in the function in question.

(The two applications here are a daemon and a daemon-status-checking
program.  The daemon needs to see if another instance of itself is
already running [*].  The status-checking program needs to see if
the daemon is running [*].  Both open a pid file and read the contents.
The contents might be stale or trash.  I can check for trash because
int(some_string) raises ValueError.  I can then check the now-valid
pid via os.kill().  However, it turns out that one form of trash
is a pid that does not fit within sys.maxint.  This was a surprise
that turned up only in testing, and even then, only because I
happened to try a ridiculously large value as one of my test cases.
It *should*, for some value of should :-) , have turned up much
earlier, such as when running pylint.)

([*] The test does not have to be perfect, but it sure would be
nice if it did not result in a Python stack dump. :-) )
-- 
In-Real-Life: Chris Torek, Wind River Systems
Intel require I note that my opinions are not those of WRS or Intel
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writable iterators?

2011-06-23 Thread Chris Torek
In article iu00fs1...@news3.newsguy.com I wrote, in part:
Another possible syntax:

    for item in container with key:

which translates roughly to "bind both key and item to the value"
for lists, but "bind key to the key and value for the value" for
dictionary-ish items.  Then ... the OP would write, e.g.:

    for elem in sequence with index:
        ...
        sequence[index] = newvalue

which of course calls the usual container.__setitem__.  In this
case the new protocol is to have iterators define a function
that returns not just the next value in the sequence, but also
an appropriate key argument to __setitem__.  For lists, this
is just the index; for dictionaries, it is the key; for other
containers, it is whatever they use for their keys.

I note I seem to have switched halfway through thinking about
this from value to index for lists, and not written that. :-)

Here's a sample of a simple generator that does the trick for
list, buffer, and dict:

def indexed_seq(seq):
    """
    produce a pair
        key_or_index value
    such that seq[key_or_index] is value initially; you can
    write on seq[key_or_index] to set a new value while this
    operates.  Note that we don't allow tuple and string here
    since they are not writeable.
    """
    if isinstance(seq, (list, buffer)):
        for i, v in enumerate(seq):
            yield i, v
    elif isinstance(seq, dict):
        for k in seq:
            yield k, seq[k]
    else:
        raise TypeError("don't know how to index %s" % type(seq))

which shows that there is no need for a new syntax.  (Turning the
above into an iterator, and handling container classes that have
an __iter__ callable that produces an iterator that defines an
appropriate index-and-value-getter, is left as an exercise. :-) )
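
A hypothetical use of indexed_seq, mirroring the original poster's
loop:

    seq = [9, 8, 7]
    for index, value in indexed_seq(seq):
        seq[index] = -value
    print seq  # prints [-9, -8, -7]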
-- 
In-Real-Life: Chris Torek, Wind River Systems
Intel require I note that my opinions are not those of WRS or Intel
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How can I speed up a script that iterates over a large range (600 billion)?

2011-06-22 Thread Chris Torek
MASK = (1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0)
MODULOS = frozenset( (1, 7, 11, 13, 17, 19, 23, 29) )

# If we started counting from 7, we'd want:
#   itertools.compress(itertools.count(7,2), itertools.cycle(MASK))
# But we start counting from q which means we need to clip off
# the first ((q - 7) % 30) // 2 items:
offset = ((q - 7) % 30) // 2
for q in itertools.compress(itertools.count(q, 2),
        itertools.islice(itertools.cycle(MASK), offset, None, 1)):
    p = D.pop(q, None)
    if p is None:
        D[q * q] = q
        primes.primes.append(q)
        yield q
    else:
        twop = p + p
        x = q + twop
        while x in D or (x % 30) not in MODULOS:
            x += twop
        D[x] = p

def factors(num):
    """
    Return all the prime factors of the given number.
    """
    if num < 0:
        num = -num
    if num < 2:
        return
    for p in primes():
        q, r = divmod(num, p)
        while r == 0:
            yield p
            if q == 1:
                return
            num = q
            q, r = divmod(num, p)

if __name__ == '__main__':
    for arg in (sys.argv[1:] if len(sys.argv) > 1 else ['600851475143']):
        try:
            arg = int(arg)
        except ValueError, error:
            print error
        else:
            print '%d:' % arg,
            for fac in factors(arg):
                print fac,
                sys.stdout.flush()
            print
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to get return values of a forked process

2011-06-21 Thread Chris Torek
 On Tue, Jun 21, 2011 at 12:26 PM, Ian ian.l...@rocketmail.com wrote:
 myForkedScript has code like this:
 if fail:
     os._exit(1)
 else:
     os._exit(os.EX_OK)

 Is using os._exit() the correct way to get a return value back to the
 main process?

The correct way, no, but it is a correct way (and cheaper than
using a pipe to pickle and unpickle failure, the way the subprocess
module does it, for instance).  In any case, you *should* call
os._exit() either directly or indirectly after a successful fork
but a failed exec.

On Jun 21, 1:54 pm, Ian Kelly ian.g.ke...@gmail.com wrote:
 sys.exit() is the preferred way.

Using sys.exit() after a fork() has other risks (principally,
duplication of pending output when flushing write-mode streams),
which is why os._exit() is provided.

 I thought the value 'n', passed in os._exit(n) would be the value I
 get returned.  In the case of a failure, I get 256 returned rather
 than 1.

 According to the docs ...
   [snip documentation and description]
 However, I would advise using the subprocess module for this instead
 of the os module (which is just low-level wrappers around system
 calls).

Indeed, subprocess gives you convenience, safety, and platform
independence (at least across POSIX-and-Windows) with a relatively
low cost.  As long as the cost is low enough (and it probably is)
I agree with this.

In article d195a74d-e173-4168-8812-c03fc02e8...@fr19g2000vbb.googlegroups.com
Ian  ian.l...@rocketmail.com wrote:
Where did you find the Unix docs you pasted in?  I didn't find it in
the man pages.  Thank you.  Based on what you say, I will change my
os._exit() to sys.exit().

Not sure where Ian Kelly's documentation came from, but note that on
Unix, the os module also provides os.WIFSIGNALED, os.WTERMSIG,
os.WIFEXITED, and os.WEXITSTATUS for dissecting the status
integer returned from the various os.wait* calls.
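
A typical decoding sketch (POSIX; child_pid is assumed to come from
an earlier os.fork()).  This is also why a raw status of 256
corresponds to os._exit(1): the exit code lives in the high byte.

    pid, status = os.waitpid(child_pid, 0)
    if os.WIFSIGNALED(status):
        print 'child killed by signal', os.WTERMSIG(status)
    elif os.WIFEXITED(status):
        print 'child exit status', os.WEXITSTATUS(status)  # 256 >> 8 == 1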

Again, if you use the subprocess module, you are insulated from
this sort of detail (which, as always, has both advantages and
disadvantages).
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: those darn exceptions

2011-06-21 Thread Chris Torek
On Tue, 21 Jun 2011 01:43:39 +, Chris Torek wrote:
 But how can I know a priori
 that os.kill() could raise OverflowError in the first place?

In article 4e006912$0$29982$c3e8da3$54964...@news.astraweb.com
Steven D'Aprano  steve+comp.lang.pyt...@pearwood.info wrote:
You can't. Even if you studied the source code, you couldn't be sure that 
it won't change in the future. Or that somebody will monkey-patch 
os.kill, or a dependency, introducing a new exception.

Indeed.  However, if functions that know which exceptions they
themselves can raise declare this (through an __exceptions__
attribute for instance), then whoever changes the source or
monkey-patches os.kill can also make the appropriate change to
os.kill.__exceptions__.

More importantly though, most functions are reliant on their argument. 
You *cannot* tell what exceptions len(x) will raise, because that depends 
on what type(x).__len__ does -- and that could be anything. So, in 
principle, any function could raise any exception.

Yes; this is exactly why you need a type-inference engine to make
this work.  In this case, len() is more (though not quite exactly)
like the following user-defined function:

def len2(x):
    try:
        fn = x.__len__
    except AttributeError:
        raise TypeError("object of type %r has no len()" % type(x))
    return fn()

eg:

>>> len(3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'int' has no len()
>>> len2(3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in len2
TypeError: object of type <type 'int'> has no len()

In this case, len would not have any __exceptions__ field (or if
it does, it would not be a one-element tuple, but I currently think
it makes more sense for many of the built-ins to resort to rules
in the inference engine).  This is also the case for most operators,
e.g., ordinary "+" (or operator.add) is syntactic sugar for:

first_operand.__add__(second_operand)

or:

second_operand.__radd__(first_operand)

depending on both operands' types and the first operand's __add__.

The general case is clearly unsolveable (being isomorphic to the
halting problem), but this is not in itself an excuse for attempting
to solve more-specific cases.  A better excuse -- which may well
be better enough :-) -- occurs when the specific cases that *can*
be solved are so relatively-rare that the approach degenerates into
uselessness.

It is worth noting that the approach I have in mind does not
survive pickling, which means a very large subset of Python code
is indigestible to a pylint-like exception-inference engine.

Another question -- is the list of exceptions part of the function's 
official API? *All* of the exceptions listed, or only some of them?

All the ones directly-raised.  What to do about invisible
dependencies (such as those in len2() if len2 is invisible,
e.g., coded in C rather than Python) is ... less obvious. :-)

In general, you can't do this at compile-time, only at runtime. There's 
no point inspecting len.__exceptions__ at compile-time if len is a 
different function at runtime.

Right.  Which is why pylint is fallible ... yet pylint is still
valuable.  At least, I find it so.  It misses a lot of important
things -- it loses types across list operations, for instance --
but it catches enough to help.  Here is a made-up example based on
actual errors I have found via pylint:

"""doc"""
class Frob(object):
    """doc"""
    def __init__(self, arg1, arg2):
        self.arg1 = arg1
        self.arg2 = arg2

    def frob(self, nicate):
        """frobnicate the frob"""
        self.arg1 += nicate

    def quux(self):
        """return the frobnicated value"""
        example = self # demonstrate that pylint is not using the *name*
        return example.argl # typo, meant arg1
...

$ pylint frob.py
************* Module frob
E1101: 15:Frob.quux: Instance of 'Frob' has no 'argl' member

("Loses types across list operations" means that, e.g.:

def quux(self):
    return [self][0].argl

hides the type, and hence the typo, from pylint.  At some point I
intend to go in and modify it to track the element-types of list
elements: in enough cases, a list's elements all have the same
type, which means we can predict the type of list[i].  If a list
contains mixed types, of course, we have to fall back to the
failure-to-infer case.)

(This also shows that much real code might raise IndexError: any
list subscript that is out of range does so.  So a lot of real
functions *might* raise IndexError, etc., which is another argument
that in real code, an exception inference engine will wind up
concluding that every line might raise every exception.  Which
might be true, but I still believe, for the moment, that a tool
for inferring exceptions would have some value.)
-- 
In-Real-Life: Chris Torek, Wind River

Re: Boolean result of divmod

2011-06-20 Thread Chris Torek
In article 261fc85a-ca6b-4520-93ed-27e78bc21...@y30g2000yqb.googlegroups.com
Gnarlodious  gnarlodi...@gmail.com wrote:
What is the easiest way to get the first number as boolean?

divmod(99.6, 30.1)

divmod returns a 2-tuple:

>>> divmod(99.6,30.1)
(3.0, 9.290000000000001)

Therefore, you can subscript the return value to get either
element:

>>> divmod(99.6,30.1)[0]
3.0

Thus, you can call bool() on the subscripted value to convert
this to True-if-not-zero False-if-zero:

>>> bool(divmod(99.6,30.1)[0])
True
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


those darn exceptions

2011-06-20 Thread Chris Torek
Exceptions are great, but...

Sometimes when calling a function, you want to catch some or
even all the various exceptions it could raise.  What exceptions
*are* those?

It can be pretty obvious.  For instance, the os.* modules raise
OSError on errors.  The examples here are slightly silly until
I reach the real code at the bottom, but perhaps one will get
the point:

>>> import os
>>> os.kill(os.getpid(), 0) # am I alive?
>>> # yep, I am alive.
...

[I'm not sure why the interpreter wants more after my comment here.]

>>> os.kill(1, 0) # is init still running?
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 1] Operation not permitted
>>> # init is running, and I don't have permission to send it a signal
...
>>> os.kill(12345, 0) # what do we get for a pid that is NOT running?
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 3] No such process


So now I am ready to write my "is process pid running" function:

import os, errno

def is_running(pid):
    """Return True if the given pid is running, False if not."""
    try:
        os.kill(pid, 0)
    except OSError, err:
        # We get an EPERM error if the pid is running
        # but we are not allowed to signal it (even with
        # signal 0).  If we get any other error we'll assume
        # it's not running.
        if err.errno != errno.EPERM:
            return False
    return True

This function works great, and never raises an exception itself.
Or does it?

>>> is_running(1)
True
>>> is_running(os.getpid())
True
>>> is_running(12345)
False
>>> is_running(9999999999)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in is_running
OverflowError: long int too large to convert to int

Oops!  It turns out that os.kill() can raise OverflowError (at
least in this version of Python, not sure what Python 3.x does).

Now, I could add, to is_running, the clause:

    except OverflowError:
        return False

(which is what I did in the real code).  But how can I know a priori
that os.kill() could raise OverflowError in the first place?  This
is not documented, as far as I can tell.  One might guess that
os.kill() would raise TypeError for things that are not integers
(this is the case) but presumably we do NOT want to catch that
here.  For the same reason, I certainly do not want to put in a
full-blown:

    except Exception:
        return False

It would be better just to note somewhere that OverflowError is
one of the errors that os.kill() normally produces (and then,
presumably, document just when this happens, so although having
noted that it can, one could make an educated guess).

Functions have a number of special __ attributes.  I think it
might be reasonable to have all of the built-in functions, at least,
have one more, perhaps spelled __exceptions__, that gives you a
tuple of all the exceptions that the function might raise.
Imagine, then:

>>> os.kill.__doc__
'kill(pid, sig)\n\nKill a process with a signal.'

[this part exists]

>>> os.kill.__exceptions__
(<type 'exceptions.OSError'>, <type 'exceptions.TypeError'>, <type
'exceptions.OverflowError'>, <type 'exceptions.DeprecationWarning'>)

[this is my new proposed part]

With something like this, a pylint-like tool could compute the
transitive closure of all the exceptions that could occur in any
function, by using __exceptions__ (if provided) or recursively
finding exceptions for all functions called, and doing a set-union.
You could then ask which exceptions can occur at any particular
call site, and see if you have handled them, or at least, all the
ones you intend to handle.  (The DeprecationWarning occurs if you
pass a float to os.kill() -- which I would not want to catch.
Presumably the pylint-like tool, which might very well *be* pylint,
would have a comment directive you would put in saying I am
deliberately allowing these exceptions to pass on to my caller,
for the case where you are asking it to tell you which exceptions
you may have forgotten to catch.)
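
(The set-union half is simple enough to sketch, assuming both
__exceptions__ and some hypothetical __callees__ call-graph attribute
existed -- neither does today:

def all_exceptions(fn, seen=None):
    seen = set() if seen is None else seen
    if fn in seen:                  # guard against recursive calls
        return set()
    seen.add(fn)
    result = set(getattr(fn, '__exceptions__', ()))
    for callee in getattr(fn, '__callees__', ()):   # hypothetical
        result |= all_exceptions(callee, seen)
    return result

The hard part is producing the call graph and the base cases, not
the union.)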

User functions could set __exceptions__ for documentation purposes
and/or speeding up this pylint-like tool.  (Obviously, user-provided
functions might raise exception classes that are only defined in
user-provided code -- but to raise them, those functions have to
include whatever code defines them, so I think this all just works.)
The key thing needed to make this work, though, is the base cases
for system-provided code written in C, which pylint by definition
cannot inspect to find a set of exceptions that might be raised.
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html

Re: those darn exceptions

2011-06-20 Thread Chris Torek
In article mailman.211.1308626356.1164.python-l...@python.org
Chris Angelico  ros...@gmail.com wrote:
Interesting concept of pulling out all possible exceptions. Would be
theoretically possible to build a table that keeps track of them, but
automated tools may have problems:

a=5; b=7; c=12
d=1/(a+b-c) # This could throw ZeroDivisionError

if a+b>c:
  d=1/(a+b-c) # This can't, because it's guarded.
else:
  d=2

And don't tell me to rewrite that with try/except, because it's not
the same :)

I don't know if pylint is currently (or eventually :-) ) smart
enough to realize that the if test here guarantees that a+b-c >
0 (if indeed it does guarantee it -- this depends on the types of
a, b, and c and the operations invoked by the + and - operators
here! -- but pylint *does* track all the types, to the extent that
it can, so it has, in theory, enough information to figure this out
for integers, at least).

If not, though, you could simply tell pylint not to complain
here (via the usual # pylint: disable=ID, presumably), rather
than coding it as a try/except sequence.

I'd be inclined to have comments about the exceptions that this can
itself produce, but if there's exceptions that come from illogical
arguments (like the TypeError above), then just ignore them and let
them propagate. If is_process("asdf") throws TypeError rather than
returning False, I would call that acceptable behaviour.

Right, this is precisely what I want: the ability to determine
which exceptions something might raise, catch some subset of them,
and allow the remaining ones to propagate.

I can do the "catch subset, allow remainder to propagate" but the
first step -- determine possible exceptions -- is far too difficult
right now.  I have not found any documentation that points out that
os.kill() can raise TypeError, OverflowError, and DeprecationWarning.
TypeError was not a *surprise*, but the other two were.

(And this is only os.kill().  What about, say, subprocess.Popen()?
Strictly speaking, type inference cannot help quite enough here,
because the subprocess module does this:

    data = self._read_no_intr(errpipe_read, 1048576)
    # Exceptions limited to 1 MB
    os.close(errpipe_read)
    if data != "":
        self._waitpid_no_intr(self.pid, 0)
        child_exception = pickle.loads(data)
        raise child_exception

and the pickle.loads() can create any exception sent to it from
the child, which can truly be any exception due to catching all
exceptions raised in preexec_fn, if there is one.  Pylint can't do
type inference across the error-pipe between child and parent here.
However, it would suffice to set subprocess.__exceptions__ to some
reasonable tuple, and leave the preexec_fn exceptions to the text
documentation.  [Of course, strictly speaking, the fact that the
read cuts off at 1 MB means that even the pickle.loads() call might
fail!  But a megabyte of exception trace is probably plenty. :-) ])
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Improper creating of logger instances or a Memory Leak?

2011-06-18 Thread Chris Torek
In article ebafe7b6-aa93-4847-81d6-12d396a4f...@j28g2000vbp.googlegroups.com
foobar  wjship...@gmail.com wrote:
I've run across a memory leak in a long running process which I can't
determine if its my issue or if its the logger.

You do not say what version of python you are using, but on the
other hand I do not know how much the logger code has evolved
over time anyway. :-)

Each application thread gets a logger instance in its init() method
via:

self.logger = logging.getLogger('ivr-'+str(self.rand))

where self.rand is a suitably large random number to avoid collisions
of the log file's name.

This instance will live forever (since the thread shares the
main logging manager with all other threads).
-
class Manager:
    """
    There is [under normal circumstances] just one Manager instance, which
    holds the hierarchy of loggers.
    """
    def __init__(self, rootnode):
        """
        Initialize the manager with the root node of the logger hierarchy.
        """
        [snip]
        self.loggerDict = {}

    def getLogger(self, name):
        """
        Get a logger with the specified name (channel name), creating it
        if it doesn't yet exist. This name is a dot-separated hierarchical
        name, such as "a", "a.b", "a.b.c" or similar.

        If a PlaceHolder existed for the specified name [i.e. the logger
        didn't exist but a child of it did], replace it with the created
        logger and fix up the parent/child references which pointed to the
        placeholder to now point to the logger.
        """
        [snip]
        self.loggerDict[name] = rv
        [snip]
[snip]
Logger.manager = Manager(Logger.root)
-

So you will find all the various ivr-* loggers in
logging.Logger.manager.loggerDict[].

finally the last statements in the run() method are:

filehandler.close()
self.logger.removeHandler(filehandler)
del self.logger  # this was added to try and force a clean up
                 # of the logger instances.

There appears to be no __del__ handler and nothing that allows
removing a logger instance from the manager's loggerDict.  Of
course you could do this manually, e.g.:

...
self.logger.removeHandler(filehandler)
del logging.Logger.manager.loggerDict[self.logger.name]
del self.logger # optional

I am curious as to why you create a new logger for each thread.
The logging module has thread synchronization in it, so that you
can share one log (or several logs) amongst all threads, which is
more typically what one wants.
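
A minimal sketch of that more usual arrangement (the logger and file
names here are just placeholders):

import logging, threading

logger = logging.getLogger('ivr')       # same object in every thread
logger.setLevel(logging.INFO)
logger.addHandler(logging.FileHandler('ivr.log'))   # one file, opened once

# each thread then simply does:
logger.info('message from %s', threading.current_thread().name)
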
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: import from environment path

2011-06-18 Thread Chris Torek
In article 3a2b0261-ee10-40c0-8fad-342f186ee...@q30g2000yqb.googlegroups.com
Guillaume Martel-Genest  guillaum...@gmail.com wrote:
Here's my situation: I've got a script a.py that needs to call b.py. The
two scripts can't be in the same package. Script a.py knows the path of
b.py relative to an environment variable B_PATH, let's say B_PATH/foo/
b.py. The solution I found is to do the following:

b_dir = os.path.join(os.environ['B_PATH'], 'foo')
sys.path.append(b_dir)
import b
b.main()

Is it the right way to do it, or should I use subprocess.call instead?

The right way depends on what you want to happen.

Consider, e.g., the case where sys.path starts with:

['/some/where/here', '/some/where/there', ...]

and program a.py lives in /some/where/here.  Suppose B_PATH is
'/where/b/is'.  The sys.path.append will leave sys.path set to:

['/some/where/here', '/some/where/there', ..., '/where/b/is']

If /some/where/there happens to contain a b.py, your import b
will load /some/where/there/b.py rather than /where/b/is/b.py.

Did you want that?  Well, then, good!  If not ... bad! :-)

Consider what happens if there is a bug in b.main(), or b.main()
is missing entirely.  Then import b works, but the call b.main()
raises an exception directly in program a.py.

Did you want that?  Well, then, good!  If not ... bad! :-)

You might also want to take a look at PEP 302:

http://www.python.org/dev/peps/pep-0302/

If you use subprocess to run program B, it cannot affect program
A in any way that program A does not allow.  This gives you a lot
more control, with the price you pay being that you need to open
some kind of communications channel between the two programs if
you want more than the simplest kinds of data transfer.
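
For comparison, a hedged sketch of the subprocess route (b.py's
location and its use of stdout as the data channel are assumptions
here):

import os, subprocess, sys

b_script = os.path.join(os.environ['B_PATH'], 'foo', 'b.py')
proc = subprocess.Popen([sys.executable, b_script],
                        stdout=subprocess.PIPE)
out, _ = proc.communicate()     # simplest possible channel: b's stdout
if proc.returncode != 0:
    raise RuntimeError('b.py failed with status %d' % proc.returncode)
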
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Best way to insert sorted in a list

2011-06-17 Thread Chris Torek
In article c63e771c-8968-4d7a-9c69-b7fa6ff34...@35g2000prp.googlegroups.com
SherjilOzair  sherjiloz...@gmail.com wrote:
There are basically two ways to go about this.
One is, to append the new value, and then sort the list.
Another is to traverse the list, and insert the new value at the
appropriate position.

The second one's complexity is O(N), while the first one's is O(N *
log N).

This is not quite right; see below.

Still, the second one works much better, because C code is being used
instead of Python's.

Still, being a programmer, using the first way (a.insert(x);
a.sort()), does not feel right.

What has the community to say about this ? What is the best (fastest)
way to insert sorted in a list ?

In this case, the best way is most likely "don't do that at all".

First, we should note that a python list() data structure is actually
an array.  Thus, you can locate the correct insertion point pretty
fast, by using a binary or (better but not as generally applicable)
interpolative search to find the proper insertion point.

Having found that point, though, there is still the expense of
the insertion, which requires making some room in the array-that-
makes-the-list (I will use the name a as you did above):

position = locate_place_for_insert(a, the_item)
# The above is O(log n) for binary search,
# O(log log n) for interpolative search, where
# n is len(a).

a.insert(position, the_item)
# This is still O(n), alas.

Appending to the list is much faster, and if you are going to
dump a set of new items in, you can do that with:

# wrong way:
# for item in large_list:
#     a.append(item)
# right way, but fundamentally still the same cost (constant
# factor is much smaller due to built-in append())
a.append(large_list)

If len(large_list) is m, this is O(m).  Inserting each item in
the right place would be O(m log (n + m)).  But we still
have to sort:

a.sort()

This is O(log (n + m)), hence likely better than repeatedly inserting
in the correct place.
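
(In stock Python, by the way, the locate step is spelled bisect;
a small sketch of both steps:

import bisect

a = [1, 3, 7, 9]                 # already sorted
the_item = 5
position = bisect.bisect_right(a, the_item)   # the O(log n) search
a.insert(position, the_item)                  # the O(n) shuffle
# or, both steps fused into one call:
# bisect.insort_right(a, the_item)

-- which does not change the asymptotics, just the spelling.)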

Depending on your data and other needs, though, it might be best
to use a red-black tree, an AVL tree, or a skip list.  You might
also investigate radix sort, radix trees, and ternary search trees
(again depending on your data).
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Best way to insert sorted in a list

2011-06-17 Thread Chris Torek
In article itgi3801...@news2.newsguy.com I wrote, in part:
Appending to the list is much faster, and if you are going to
dump a set of new items in, you can do that with: [...]

In article mailman.96.1308348643.1164.python-l...@python.org
Ethan Furman et...@stoneleaf.us wrote:

 a.append(large_list)
 ^- should be a.extend(large_list)

Er, right.  Posted in haste (had to get out the door).  I also
wrote:

 If len(large_list) is m, this is O(m).  Inserting each item in
 the right place would be O(m log (n + m)).  But we still
 have to sort:

 a.sort()

In article mailman.98.1308353648.1164.python-l...@python.org,
Ian Kelly  ian.g.ke...@gmail.com wrote:
 This is O(log (n + m)), hence likely better than repeatedly inserting
 in the correct place.

Surely you mean O((n + m) log (n + m)).

Er, maybe?  (It depends on the relative values of m and n, and
the underlying sort algorithm to some extent. Some algorithms are
better at inserting a relatively small number of items into a
mostly-sorted large list.  As I recall, Shell sort does well with
this.)  But generally, yes.  See posted in haste above. :-)

There are a lot of other options, such as sorting just the list of
items to be inserted, which lets you do a single merge pass:

# UNTESTED
def merge_sorted(it1, it2, must_copy = True):
    """
    Merge two sorted lists/iterators it1 and it2.
    Roughly equivalent to sorted(list(it1) + list(it2)),
    except for attempts to be space-efficient.

    You can provide must_copy = False if the two iterators
    are already lists and can be destroyed for the purpose
    of creating the result.
    """

    # If it1 and it2 are deque objects, we don't need to
    # reverse them, as popping from the front is efficient.
    # If they are plain lists, popping from the end is
    # required.  If they are iterators or tuples we need
    # to make a list version anyway.  So:
    if must_copy:
        it1 = list(it1)
        it2 = list(it2)

    # Reverse sorted lists (it1 and it2 are definitely
    # lists now) so that we can pop off the end.
    it1.reverse()
    it2.reverse()

    # Now accumulate final sorted list.  Basically, this is:
    # take first (now last) item from each list, and put whichever
    # one is smaller into the result.  When either list runs
    # out, tack on the entire remaining list (whichever one is
    # non-empty -- if both are empty, the two extend ops are
    # no-ops, so we can just add both lists).
    #
    # Note that we have to re-reverse them to get
    # them back into forward order before extending.
    result = []
    while it1 and it2:
        # Note: I don't know if it might be faster
        # to .pop() each item and .append() the one we
        # did not want to pop after all.  This is just
        # an example, after all.
        last1 = it1[-1]
        last2 = it2[-1]
        if last2 < last1:
            result.append(last2)
            it2.pop()
        else:
            result.append(last1)
            it1.pop()
    it1.reverse()
    it2.reverse()
    result.extend(it1)
    result.extend(it2)
    return result

So, now if a is the original (sorted) list and b is the not-yet-
sorted list of things to add:

a = merge_sorted(a, sorted(b), must_copy = False)

will work, provided you are not required to do the merge in place.
Use the usual slicing trick if that is necessary:

a[:] = merge_sorted(a, sorted(b), must_copy = False)

If list b is already sorted, leave out the sorted() step.  If list
b is not sorted and is particularly long, use b.sort() to sort in
place, rather than making a sorted copy.
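
(Incidentally, Python 2.6 and later ship an iterator version of this
merge as heapq.merge, so -- roughly -- the above can be spelled:

import heapq

a = list(heapq.merge(a, sorted(b)))   # lazily merges two sorted iterables

at the cost of always building a new list.)
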
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: os.path and Path

2011-06-16 Thread Chris Torek
Steven D'Aprano wrote:
 Why do you think there's no Path object in the standard library? *wink*

In article mailman.16.1308239495.1164.python-l...@python.org
Ethan Furman  et...@stoneleaf.us wrote:
Because I can't find one in either 2.7 nor 3.2, and every reference I've 
found has indicated that the other Path contenders were too 
all-encompassing.

What I think Steven D'Aprano is suggesting here is that the general
problem is too hard, and specific solutions too incomplete, to
bother with.

Your own specific solution might work fine for your case(s), but it
is unlikely to work in general.

I am not aware of any Python implementations for VMS, CMS, VM,
EXEC-8, or other dinosaurs, but it would be ... interesting.
Consider a typical VMS full pathname:

DRA0:[SYS0.SYSCOMMON]FILE.TXT;3

The first part is the (literal) disk drive (a la MS-DOS A: or C:
but slightly more general).  The part in [square brackets] is the
directory path.  The extension (.txt) is limited to three characters,
and the part after the semicolon is the file version number, so
you can refer to a backup version.  (Typically one would use a
logical name like SYS$SYSROOT in place of the disk and/or
directory-sequence, so as to paper over the overly-rigid syntax.)

Compare with an EXEC-8 (now, apparently, OS 2200 -- I guess it IS
still out there somewhere) file name:

QUAL*FILE(cyclenumber)

where cycle-numbers are relative, i.e., +0 means use the current
file while +1 means create a new one and -1 means use the
first backup.  (However, one normally tied external file names to
internal names before running a program, via the @USE statement.)
The vile details are still available here:

   http://www.bitsavers.org/pdf/univac/1100/UE-637_1108execUG_1970.pdf

(Those of you who have never had to deal with these machines, as I
did in the early 1980s, should consider yourselves lucky. :-) )
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is the most efficient way to compare similar contents in two lists?

2011-06-13 Thread Chris Torek
In article mailman.188.1307988677.11593.python-l...@python.org
Chris Angelico  ros...@gmail.com wrote:
If order and duplicates matter, then you want a completely different
diff. I wrote one a while back, but not in Python. ...

If order and duplicates matter, one might want to look into
difflib. :-)
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Infinite recursion in __reduce__ when calling original base class reduce, why?

2011-06-13 Thread Chris Torek
In article 4df669ea$0$49182$e4fe5...@news.xs4all.nl
Irmen de Jong  irmen.nos...@xs4all.nl wrote:
I've pasted my test code below. It works fine if 'substitute' is True,
but as soon as it is set to False, it is supposed to call the original
__reduce__ method of the base class. However, that seems to crash
because of infinite recursion on Jython and IronPython and I don't
know why. It works fine in CPython and Pypy.

In this particular case (no fancy inheritance going on), the base
__reduce__ method would be object.__reduce__.  Perhaps in those
implementations, object.__reduce__ goes back to TestClass.__reduce__,
rather than being appropriately magic.

I wonder if my understanding of __reduce__ is wrong, or that I've
hit a bug in IronPython and Jython?  Do I need to do something with
__reduce_ex__ as well?

You should not *need* to; __reduce_ex__ is just there so that you
can do something different for different versions of the pickle
protocol (I believe).

Nonetheless, there is something at least slightly suspicious here:

import pickle

class Substitute(object):
    def __init__(self, name):
        self.name=name
    def getname(self):
        return self.name

class TestClass(object):
    def __init__(self, name):
        self.name=name
        self.substitute=True
    def getname(self):
        return self.name
    def __reduce__(self):
        if self.substitute:
            return Substitute, ("SUBSTITUTED:"+self.name,)
        else:
            # call the original __reduce__ from the base class
            return super(TestClass, self).__reduce__()  # <-- crashes on ironpython/jython
[snip]

In general, the way __reduce__ is written in other class implementations
(as distributed with Python2.5 at least) boils down to the very
simple:

def __reduce__(self):
    return self.__class__, (arg, um, ents)

For instance, consider a class with a piece that looks like this:

def __init__(self, name, value):
    self.name = name
    self.value = value
    self.giant_cached_state = None

def make_parrot_move(self):
    if self.giant_cached_state is None:
        self._do_lots_of_computation()
    return self._quickstuff_using_cache()

Here, the Full Internal State is fairly long but the part that
needs to be saved (or, for copy operations, copied -- but you can
override this with __copy__ and __deepcopy__ members, if copying
the cached state is a good idea) is quite short.  Pickled instances
need only save the name and value, not any of the computed cached
stuff (if present).  So:

def __reduce__(self):
    return self.__class__, (self.name, self.value)

If you define this (and no __copy__ and no __deepcopy__), the
pickler will save the name and value and call __init__ with the
name and value arguments.  The copy.copy and copy.deepcopy operations
will also call __init__ with these arguments (unless you add
__copy__(self) and __deepcopy__(self) functions).

So, it seems like in this case, you would want:

def __reduce__(self):
    if self.substitute:
        return Substitute, ("SUBSTITUTED:"+self.name,)
    else:
        return self.__class__, (self.name,)

or if you want to be paranoid and only do a Substitute if
self.__class__ is your own class:

if type(self) == TestClass and self.substitute:
    return Substitute, ("SUBSTITUTED:"+self.name,)
else:
    return self.__class__, (self.name,)

In CPython, if I import your code (saved in foo.py):

>>> x = foo.TestClass("janet")
>>> x
<foo.TestClass object at 0x66290>
>>> x.name
'janet'
>>> x.__reduce__()
(<class 'foo.Substitute'>, ('SUBSTITUTED:janet',))
>>> x.substitute=False
>>> x.__reduce__()
(<function _reconstructor at 0x70bf0>, (<class 'foo.TestClass'>, <type
'object'>, None), {'name': 'janet', 'substitute': False})

which is of course the same as:

>>> object.__reduce__(x)
(<function _reconstructor at 0x70bf0>, (<class 'foo.TestClass'>, <type
'object'>, None), {'name': 'janet', 'substitute': False})

which means that CPython's object.__reduce__() uses a smart fallback
reconstructor.  Presumably IronPython and Jython lack this.
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: parallel computations: subprocess.Popen(...).communicate()[0] does not work with multiprocessing.Pool

2011-06-12 Thread Chris Torek
In article mailman.105.1307737402.11593.python-l...@python.org
Hseu-Ming Chen  hseum...@gmail.com wrote:
I am having an issue when making a shell call from within a
multiprocessing.Process().  Here is the story: i tried to parallelize
the computations in 800-ish Matlab scripts and then save the results
to MySQL.   The non-parallel/serial version has been running fine for
about 2 years.  However, in the parallel version via multiprocessing
that i'm working on, it appears that the Matlab scripts have never
been kicked off and nothing happened with subprocess.Popen.  The debug
printing below does not show up either.

I obviously do not have your code, and have not even tried this as
an experiment in a simplified environment, but:

import subprocess
from multiprocessing import Pool

def worker(DBrow, config):
    # run one Matlab script
    cmd1 = "/usr/local/bin/matlab ... myMatlab.1.m"
    subprocess.Popen([cmd1], shell=True,
                     stdout=subprocess.PIPE).communicate()[0]
    print "this does not get printed"
    ...
# kick off parallel processing
pool = Pool()
for DBrow in DBrows: pool.apply_async(worker, (DBrow, config))
pool.close()
pool.join()

The multiprocessing code makes use of pipes to communicate between
the various subprocesses it creates.  I suspect these extra pipes
are interfering with your subprocesses, when pool.close() waits
for the Matlab script to do something with its copy of the pipes.
To make the subprocess module close them -- so that Matlab does
not have them in the first place and hence pool.close() cannot get
stuck there -- add close_fds=True to the Popen() call.
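
That is, a sketch of the suggested change, everything else unchanged:

subprocess.Popen([cmd1], shell=True, stdout=subprocess.PIPE,
                 close_fds=True).communicate()[0]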

There could still be issues with competing wait() and/or waitpid()
calls (assuming you are using a Unix-like system, or whatever the
equivalent is for Windows) eating the wrong subprocess completion
notifications, but that one is harder to solve in general :-) so
if close_fds fixes things, it was just the pipes.  If close_fds
does not fix things, you will probably need to defer the pool.close()
step until after all the subprocesses complete.
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: how to avoid leading white spaces

2011-06-08 Thread Chris Torek
On 03/06/2011 03:58, Chris Torek wrote:
 -
 This is a bit surprising, since both s1 in s2 and re.search()
 could use a Boyer-Moore-based algorithm for a sufficiently-long
 fixed string, and the time required should be proportional to that
 needed to set up the skip table.  The re.compile() gets to re-use
 the table every time.

In article mailman.2508.1307394262.9059.python-l...@python.org
Ian  hobso...@gmail.com wrote:
Is that true?  My immediate thought is that Boyer-Moore would quickly give
the number of characters to skip, but skipping them would be slow because
UTF8 encoded characters are variable sized, and the string would have to be
walked anyway.

As I understand it, strings in python 3 are Unicode internally and
(apparently) use wchar_t.  Byte strings in python 3 are of course
byte strings, not UTF-8 encoded.

Or am I misunderstanding something.

Here's python 2.7 on a Linux box:

>>> print sys.getsizeof('a'), sys.getsizeof('ab'), sys.getsizeof('abc')
38 39 40
>>> print sys.getsizeof(u'a'), sys.getsizeof(u'ab'), sys.getsizeof(u'abc')
56 60 64

This implies that strings in Python 2.x are just byte strings (same
as b... in Python 3.x) and never actually contain unicode; and
unicode strings (same as ... in Python 3.x) use 4-byte characters
per that box's wchar_t.
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Validating string for FDQN

2011-06-07 Thread Chris Torek
On Tue, Jun 7, 2011 at 3:23 PM, Nobody nob...@nowhere.com wrote:
 [1] If a hostname ends with a dot, it's fully qualified.
[otherwise not, so you have to use the resolver]

In article mailman.2521.1307425928.9059.python-l...@python.org,
Chris Angelico  ros...@gmail.com wrote:
Outside of BIND files, when do you ever see a name that actually ends
with a dot?

I type them in this way sometimes, when poking at network issues. :-)
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: how to avoid leading white spaces

2011-06-06 Thread Chris Torek
In article ef48ad50-da06-47a8-978a-47d6f4271...@d28g2000yqf.googlegroups.com
ru...@yahoo.com ru...@yahoo.com wrote (in part):
[mass snippage]
What I mean is that I see regexes as being an extremely small,
highly restricted, domain specific language targeted specifically
at describing text patterns.  Thus they do that job better than
than trying to describe patterns implicitly with Python code.

Indeed.

Kernighan has often used / supported the idea of little languages;
see:

http://www.princeton.edu/~hos/frs122/precis/kernighan.htm

In this case, regular expressions form a little language that is
quite well suited to some lexical analysis problems.  Since the
language is (modulo various concerns) targeted at the right level,
as it were, it becomes easy (modulo various concerns :-) ) to
express the desired algorithm precisely yet concisely.

On the whole, this is a good thing.

The trick lies in knowing when it *is* the right level, and how to
use the language of REs.

On 06/03/2011 08:05 PM, Steven D'Aprano wrote:
 If regexes were more readable, as proposed by Wall, that would go
 a long way to reducing my suspicion of them.

"Suspicion" seems like an odd term here.

Still, it is true that something (whether it be use of re.VERBOSE,
and whitespace-and-comments, or some New and Improved Syntax) could
help.  Dense and complex REs are quite powerful, but may also contain
and hide programming mistakes.  The ability to describe what is
intended -- which may differ from what is written -- is useful.

As an interesting aside, even without the re.VERBOSE flag, one can
build complex, yet reasonably-understandable, REs in Python, by
breaking them into individual parts and giving them appropriate
names.  (This is also possible in perl, although the perl syntax
makes it less obvious, I think.)
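
For instance, a small made-up sketch of the named-parts approach:

import re

sign     = r'[-+]?'
integer  = r'\d+'
fraction = r'(?:\.\d+)?'
exponent = r'(?:[eE]' + sign + integer + r')?'
number   = re.compile(sign + integer + fraction + exponent)

number.match('-12.5e3')   # each piece remains individually readable
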
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: float(nan) in set or as key

2011-06-05 Thread Chris Torek
In article mailman.2438.1307133316.9059.python-l...@python.org
Chris Angelico  ros...@gmail.com wrote:
Uhh, noob question here. I'm way out of my depth with hardware
floating point.

Isn't a signaling nan basically the same as an exception?

Not exactly, but one could think of them as very similar.

Elsethread, someone brought up the key distinction, which is
that in hardware that implements IEEE arithmetic, you have two
possibilities at pretty much all times:

 - op(args) causes an exception (and therefore does not deliver
   a result), or
 - op(args) delivers a result that may indicate exception-like
   lack of result.

In both cases, a set of "accrued exceptions" flags accumulates the
new exception, and a set of "most recent exceptions" flags tells
you about the current exception.  A set of "exception enable"
flags -- which has all the same elements as "current" and
"accrued" -- tells the hardware which exceptional results
should trap.

A number is NaN if it has all-1-bits for its exponent and at
least one nonzero bit in its mantissa.  (All-1s exponent, all-0s
mantissa represents Infinity, of the sign specified by the sign
bit.)  For IEEE double precision floating point, there are 52
mantissa bits, so there are (2^52-1) different NaN bit patterns.
One of those 52 bits is the "please signal on use" bit.
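
You can poke at the layout from Python with the struct module; a
small sketch, using the platform's default quiet NaN:

import struct

bits = struct.unpack('<Q', struct.pack('<d', float('nan')))[0]
exponent = (bits >> 52) & 0x7ff         # the 11-bit exponent field
mantissa = bits & ((1 << 52) - 1)       # the 52-bit mantissa field
print exponent == 0x7ff and mantissa != 0   # True: this is a NaN

(The sign bit is bit 63, ignored here.)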

A signalling NaN traps at (more or less -- details vary depending
on FPU architecture) load time.  However, there must necessarily
(for OS and thread-library level context switching) be a method
of saving the FPU state without causing an exception when loading
a NaN bit pattern, even if the NaN has the signal bit set.

Which would imply that the hardware did support exceptions (if it
did indeed support IEEE floating point, which specifies signalling nan)?

The actual hardware implementations (of which there are many) handle
the niggling details differently.  Some CPUs do not implement
Infinity and NaN in hardware at all, delivering a trap to the OS
on every use of an Inf-or-NaN bit pattern.  The OS then has to
emulate what the hardware specification says (if anything), and
make it look as though the hardware did the job.  Sometimes denorms
are also done in software.

Some implementations handle everything directly in hardware, and
some of those get it wrong. :-)  Often the OS has to fix up some
special case -- for instance, the hardware might trap on every NaN
and make software decide whether the bit pattern was a signalling
NaN, and if so, whether user code should receive an exception.

As I think John Nagle pointed out earlier, sometimes the hardware
does support exceptions, but rather loosely, where the hardware
delivers a morass of internal state and a vague indication that
"one or more exceptions happened somewhere near address A",
leaving a huge pile of work for software.

In Python, the decimal module gets everything either right or
close-to-right per the (draft? final? I have not kept up with
decimal FP standards) standard.  Internal Python floating point,
not quite so much.
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: float(nan) in set or as key

2011-06-05 Thread Chris Torek
 On Mon, Jun 6, 2011 at 8:54 AM, Chris Torek nos...@torek.net wrote:
 A signalling NaN traps at (more or less -- details vary depending on
 FPU architecture) load time.

On Mon, 06 Jun 2011 09:13:25 +1000, Chris Angelico wrote:
 Load. By this you mean the operation of taking a bit-pattern in RAM and
 putting it into a register? So, you can calculate 0/0, get a signalling
 NaN, and then save that into a memory variable, all without it trapping;
 and then it traps when you next perform an operation on that number?

I mean, if you think of the FPU as working (in principle) with
either just one or two registers and a load/store architecture, or
a tiny little FPU-stack (the latter is in fact the case for Intel
FPUs), with no optimization, you get a trap when you attempted to
load-up the sNaN value in order to do some operation on it.  For
instance, if x is an sNaN, "y = x + 1" turns into "load x; load
1.0; add; store y" and the trap occurs when you do the "load x".

In article 4dec2ba6$0$29996$c3e8da3$54964...@news.astraweb.com,
Steven D'Aprano  steve+comp.lang.pyt...@pearwood.info wrote:
The intended behaviour is operations on quiet NANs should return NANs, 
but operations on signalling NANs should cause a trap, which can either 
be ignored, and converted into a quiet NAN, or treated as an exception.

E.g. in Decimal:

>>> import decimal
>>> qnan = decimal.Decimal('nan')  # quiet NAN
>>> snan = decimal.Decimal('snan')  # signalling NAN
>>> 1 + qnan
Decimal('NaN')
>>> 1 + snan
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.1/decimal.py", line 1108, in __add__
    ans = self._check_nans(other, context)
  File "/usr/local/lib/python3.1/decimal.py", line 746, in _check_nans
    self)
  File "/usr/local/lib/python3.1/decimal.py", line 3812, in _raise_error
    raise error(explanation)
decimal.InvalidOperation: sNaN

Moreover:

>>> cx = decimal.getcontext()
>>> cx
Context(prec=28, rounding=ROUND_HALF_EVEN, Emin=-999999999, Emax=999999999,
capitals=1, flags=[], traps=[DivisionByZero, Overflow, InvalidOperation])
>>> cx.traps[decimal.InvalidOperation] = False
>>> snan
Decimal('sNaN')
>>> 1 + snan
Decimal('NaN')

so as you can see, by ignoring the InvalidOperation exception, we
had our sNaN converted to a (regular, non-signal-ing, quiet) NaN,
and 1 + NaN is still NaN.

(I admit that my mental model using "load"s can mislead a bit since:

>>> cx.traps[decimal.InvalidOperation] = True # restore trapping
>>> also_snan = snan

A simple copy operation is not a "load" in this particular sense,
and on most real hardware, one just uses an ordinary 64-bit integer
memory-copying operation to copy FP bit patterns from one place to
another.)

There is some good information on wikipedia:

http://en.wikipedia.org/wiki/NaN

(Until I read this, I was not aware that IEEE now recommends that
the quiet-vs-signal bit be 1-for-quiet 0-for-signal. I prefer the
other way around since you can then set memory to all-1-bits if it
contains floating point numbers, and get exceptions if you refer
to a value before seting it.)
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Multiprocessing.connection magic

2011-06-03 Thread Chris Torek
In article mailman.2417.1307082948.9059.python-l...@python.org
Claudiu Popa  cp...@bitdefender.com wrote:
Hello guys,
  While  working  at a dispatcher using
  multiprocessing.connection.Listener  module  I've stumbled upon some
  sortof  magic  trick  that  amazed  me. How is this possible and
  what  does  multiprocessing  library doing in background for this to
  work?

Most of Python's sharing routines (including multiprocessing
send, in this case) use the pickle routines to package data
for transport between processes.

Thus, you can see the magic pretty simply:

"Client, Python 2.6"

>>> from multiprocessing.connection import Client
>>> client = Client(("localhost", 8080))
>>> import shutil
>>> client.send(shutil.copy)

Here I just use pickle.dumps() to return (and print, since we are
in the interpreter) the string representation that client.send()
will send:

>>> import pickle
>>> import shutil
>>> pickle.dumps(shutil.copy)
'cshutil\ncopy\np0\n.'

"Server, 3.2"
>>> from multiprocessing.connection import Listener
>>> listener = Listener(("localhost", 8080))
>>> con = listener.accept()
>>> data = con.recv()
>>> data
<function copy at 0x024611E0>
>>> help(data)
Help on function copy in module shutil:
[snip]

On this end, the (different) version of python simply unpickles the
byte stream.  Starting a new python session (to get rid of any
previous imports):

$ python
...
>>> import pickle
>>> pickle.loads('cshutil\ncopy\np0\n.')
<function copy at 0x86ef0>
>>> help(_)
Help on function copy in module shutil:
...

The real magic is in the unpickler, which has figured out how to
access shutil.copy without importing shutil into the global namespace:

>>> shutil
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'shutil' is not defined


but we can expose that magic as well, by feeding pickle.loads()
a bad string:

>>> pickle.loads('cNotAModule\nfunc\np0\n.')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/pickle.py", line 1374, in loads
    return Unpickler(file).load()
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/pickle.py", line 858, in load
    dispatch[key](self)
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/pickle.py", line 1090, in load_global
    klass = self.find_class(module, name)
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/pickle.py", line 1124, in find_class
    __import__(module)
ImportError: No module named NotAModule


Note the rather total lack of security here -- in the receiver, by
doing con.recv(), you are trusting the sender not to send you a
dangerous or invalid pickle-data-stream.  This is why the documentation
includes the following:

Warning: The Connection.recv() method automatically unpickles
the data it receives, which can be a security risk unless you
can trust the process which sent the message.

Therefore, unless the connection object was produced using Pipe()
you should only use the recv() and send() methods after performing
some sort of authentication. See Authentication keys.

(i.e., do that :-) -- see the associated section on authentication)
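
A minimal sketch of the authenticated setup (the key itself is
obviously a placeholder):

from multiprocessing.connection import Listener, Client

listener = Listener(('localhost', 8080), authkey='not-a-real-secret')
# ... and in the client process:
client = Client(('localhost', 8080), authkey='not-a-real-secret')
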
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: float(nan) in set or as key

2011-06-03 Thread Chris Torek
On 2011-06-02, Nobody nob...@nowhere.com wrote:
 (I note that Python actually raises an exception for 0.0/0.0).

In article isasfm$inl$1...@reader1.panix.com
Grant Edwards  invalid@invalid.invalid wrote:
IMHO, that's a bug.  IEEE-754 states explicitly that 0.0/0.0 is NaN.
Python claims it implements IEEE-754.  Python got it wrong.

Indeed -- or at least, inconsistent.  (Again I would not mind at
all if Python had a "raise exception on NaN-result" mode *as well
as* "quietly make NaN", perhaps using signalling vs quiet NaN to
tell them apart in most cases, plus some sort of floating-point
context control, for instance.)

 Also, note that the convenience of NaN (e.g. not propagating from
 the untaken branch of a conditional) is only available for
 floating-point types. If it's such a good idea, why don't we have it
 for other types?

Mostly because for integers it's too late and there is no standard
for it.  For others, well:

>>> import decimal
>>> decimal.Decimal('nan')
Decimal("NaN")
>>> _ + 1
Decimal("NaN")
>>> decimal.setcontext(decimal.ExtendedContext)
>>> print decimal.Decimal(1) / 0
Infinity
>>> [etc]

(Note that you have to set the decimal context to one that does
not produce a zero-divide exception, such as the pre-loaded
decimal.ExtendedContext.  On my one Python 2.7 system -- all the
rest are earlier versions, with 2.5 the highest I can count on,
and that only by upgrading it on the really old work systems --
I note that fractions.Fraction(0,0) raises a ZeroDivisionError,
and there is no fractions.ExtendedContext or similar.)

 The definition is entirely arbitrary.

I don't agree, but even if was entirely arbitrary, that doesn't make
the decision meaningless.  IEEE-754 says it's True, and standards
compliance is valuable.  Each country's decision to drive on the
right/left side of the road is entire arbitrary, but once decided
there's a huge benefit to everybody following the rule.

This analogy perhaps works better than expected.  Whenever I swap
between Oz or NZ and the US-of-A, I have a brief mental clash that,
if I am not careful, could result in various bad things. :-)
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: how to avoid leading white spaces

2011-06-03 Thread Chris Torek
On 2011-06-03, ru...@yahoo.com ru...@yahoo.com wrote:
[prefers]
 re.split ('[ ,]', source)

This is probably not what you want in dealing with
human-created text:

>>> re.split('[ ,]', 'foo bar, spam,maps')
['foo', '', 'bar', '', 'spam', 'maps']

Instead, you probably want a comma followed by zero or
more spaces; or, one or more spaces:

>>> re.split(r',\s*|\s+', 'foo bar, spam,maps')
['foo', 'bar', 'spam', 'maps']

or perhaps (depending on how you want to treat multiple
adjacent commas) even this:

>>> re.split(r',+\s*|\s+', 'foo bar, spam,maps,, eggs')
['foo', 'bar', 'spam', 'maps', 'eggs']

although eventually you might want to just give in and use the
csv module. :-)  (Especially if you want to be able to quote
commas, for instance.)
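
For instance, a quick sketch of the csv route:

import csv

rows = list(csv.reader(['foo,"spam, with, commas",maps']))
# rows == [['foo', 'spam, with, commas', 'maps']]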

 ...  With regexes the code is likely to be less brittle than a
 dozen or more lines of mixed string functions, indexes, and
 conditionals.

In article 94svm4fe7...@mid.individual.net
Neil Cerutti  ne...@norwich.edu wrote:
[lots of snippage]
That is the opposite of my experience, but YMMV.

I suspect it depends on how familiar the user is with regular
expressions, their abilities, and their limitations.

People relatively new to REs always seem to want to use them
to count (to balance parentheses, for instance).  People who
have gone through the compiler course know better. :-)
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: how to avoid leading white spaces

2011-06-02 Thread Chris Torek
In article 94ph22frh...@mid.individual.net
 Neil Cerutti ne...@norwich.edu wrote:
 Python's str methods, when they're sufficent, are usually more
 efficient.

In article roy-e2fa6f.21571602062...@news.panix.com
Roy Smith  r...@panix.com replied:
I was all set to say, "prove it!" when I decided to try an experiment.  
Much to my surprise, for at least one common case, this is indeed 
correct.
 [big snip]
t1 = timeit.Timer("'laoreet' in text",
                  "text = '%s'" % text)
t2 = timeit.Timer("pattern.search(text)",
                  "import re; pattern = re.compile('laoreet'); text = '%s'" % text)
print t1.timeit()
print t2.timeit()
-
./contains.py
0.990975856781
1.91417002678
-

This is a bit surprising, since both "s1 in s2" and re.search()
could use a Boyer-Moore-based algorithm for a sufficiently-long
fixed string, and the time required should be proportional to that
needed to set up the skip table.  The re.compile() gets to re-use
the table every time.  (I suppose "in" could as well, with some sort
of cache of recently-built tables.)

Boyer-Moore search is roughly O(M/N) where M is the length of the
text being searched and N is the length of the string being sought.
(However, it depends on the form of the string, e.g., searching
for "ababa" is not as good as searching for "abcde".)

Python might be penalized by its use of Unicode here, since a
Boyer-Moore table for a full 16-bit Unicode string would need
65536 entries (one per possible ord() value).  However, if the
string being sought is all single-byte values, a 256-element
table suffices; re.compile(), at least, could scan the pattern
and choose an appropriate underlying search algorithm.

There is an interesting article here as well:
   http://effbot.org/zone/stringlib.htm
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: how to avoid leading white spaces

2011-06-02 Thread Chris Torek
In article is9ikg0...@news1.newsguy.com,
 Chris Torek nos...@torek.net wrote:
 Python might be penalized by its use of Unicode here, since a
 Boyer-Moore table for a full 16-bit Unicode string would need
 65536 entries (one per possible ord() value).

In article roy-751fac.23443902062...@news.panix.com
Roy Smith  r...@panix.com wrote:
I'm not sure what you mean by "full 16-bit Unicode string"?  Isn't 
unicode inherently 32 bit?

Well, not exactly.  As I understand it, Python is normally built
with a 16-bit unicode character type though (using either UCS-2
or UTF-16 internally; but I admit I have been far too lazy to look
up stuff like surrogates here :-) ).

In any case, while I could imagine building a 2^16 entry jump table, 
clearly it's infeasible (with today's hardware) to build a 2^32 entry 
table. But, there's nothing that really requires you to build a table at 
all.  If I understand the algorithm right, all that's really required is 
that you can map a character to a shift value.

Right.  See the URL I included for an example.  The point here,
though, is ... well:

For an 8 bit character set, an indexed jump table makes sense.  For a 
larger character set, I would imagine you would do some heuristic 
pre-processing to see if your search string consisted only of characters 
in one unicode plane and use that fact to build a table which only 
indexes that plane.  Or, maybe use a hash table instead of a regular 
indexed table.

Just so.  You have to pay for one scan through the string to build
a hash-table of offsets -- an expense similar to that for building
the 256-entry 8-bit table, perhaps, depending on string length --
but then you pay again for each character looked-at, since:

skip = hashed_lookup(table, this_char);

is a more complex operation than:

skip = table[this_char];

(where "table" is a simple array, hence the C-style semicolons: this
is not Python pseudo-code :-) ).  Hence, a penalty.

Not as fast, but only slower by a small constant factor, 
which is not a horrendous price to pay in a fully i18n world :-)

Indeed.
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Updated blog post on how to use super()

2011-06-01 Thread Chris Torek
Summary: super(cls, data) in a method gets you the next handler
for a given class cls and an instance data that has derived
from that class at some point.  In Python 2 you must spell out the
names of the class and instance (normally self) explicitly, while
Python 3 grabs, at compile time, the class from the lexically
enclosing class, and the instance from the first argument of the
method that invokes super.

The next handler depends on the instance's __mro__.  If all
your classes use at most single inheritance, the next handler
in class Cls1 is easy to predict:

class Cls1(Cls2):

Any instance of Cls1 always has Cls2 as its next, so:

def method(self, arg1, arg2):
    ...
    Cls2.method(self, arg1_mutated, arg2_mutated)
    ...

works fine.  But if you use multiple inheritance, the next method
is much harder to predict.  If you have a working super, you
can use:

super().method(arg1_mutated, arg2_mutated)

and it will find the correct next method in all cases.
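
A tiny diamond makes the dynamic lookup visible (class names made
up; Python 2 spelling of super shown):

class A(object):
    def method(self): print 'A'

class B(A):
    def method(self):
        print 'B'
        super(B, self).method()

class C(A):
    def method(self):
        print 'C'
        super(C, self).method()

class D(B, C):
    def method(self):
        print 'D'
        super(D, self).method()

D().method()    # prints D, B, C, A -- inside B, the next class is C, not A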

In article is5qd7$t5b$1...@speranza.aioe.org
Billy Mays  no...@nohow.com wrote:
What it does is clear to me, but why is it interesting or special isn't. 
  This looks like a small feature that would be useful in a handful of 
cases.

Indeed: it is useful when you have multiple inheritance, which, for
most programmers, is a handful of cases.

However, provided you *have* the Py3k super() in the first place,
it is also trivial and obviously-correct to write:

super().method(...)

whereas writing:

NextClass.method(...)

requires going up to the class definition to make sure that
NextClass is indeed the next class, and hence -- while usually
no more difficult to write -- less obviously-correct.

Moreover, if you write the easy-to-write obviously-correct
super().method, *your* class may now be ready for someone
else to use in a multiple-inheritance (MI) situation.  If you type
in the not-as-obviously-correct NextClass.method, *your* class
is definitely *not* ready for someone else to use in that MI
situation.

(I say may be ready for MI, because being fully MI ready requires
several other code discipline steps.  The point of super() -- at
least when implemented nicely, as in Py3k -- is that it makes it
easy -- one might even say super easy :-) -- to write your code
such that it is obviously correct, and also MI-friendly.)
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: float(nan) in set or as key

2011-06-01 Thread Chris Torek
Carl Banks wrote:
 For instance, say you are using an implementation that uses
  floating point, and you define a function that uses Newton's
  method to find a square root:
 
def square_root(N, x=None):
    if x is None:
        x = N/2
    for i in range(100):
        x = (x + N/x)/2
    return x
 
 It works pretty well on your floating-point implementation.
  Now try running it on an implementation that uses fractions
  by default
 
 (Seriously, try running this function with N as a Fraction.)

In article mailman.2376.1306950997.9059.python-l...@python.org
Ethan Furman  et...@stoneleaf.us wrote:
Okay, will this thing ever stop?  It's been running for 90 minutes now. 
  Is it just incredibly slow?

The numerator and denominator get very big, very fast.

Try adding a bit of tracing:

for i in range(100):
    x = (x + N/x) / 2
    print 'refinement %d: %s' % (i + 1, x)

and lo:

>>> square_root(fractions.Fraction(5,2))
refinement 1: 13/8
refinement 2: 329/208
refinement 3: 216401/136864
refinement 4: 93658779041/59235012928
refinement 5: 17543933782901678712641/11095757974628660884096
refinement 6: 
615579225157677613558476890352854841917537921/389326486355976942712506162834130868382115072
refinement 7: 
757875564891453502666431245010274191070178420221753088072252795554063820074969259096915201/479322593608746863553102599134385944371903608931825380820104910630730251583028097491290624
refinement 8: 
1148750743719079498041767029550032831122597958315559446437317334336105389279028846671983328007126798344663678217310478873245910031311232679502892062001786881913873645733507260643841/726533762792931259056428876869998002853417255598937481942581984634876784602422528475337271599486688624425675701640856472886826490140251395415648899156864835350466583887285148750848

In the worst case, the number of digits in numerator and denominator
could double on each pass, so if you start with 1 digit in each,
you end with 2**100 in each.  (You will run out of memory first
unless you have a machine with more than 64 bits of address space. :-) )
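
(If you really do want to run Newton's method on Fractions, one
workaround -- my sketch, not from the thread -- is to cap the
precision after each refinement with limit_denominator():

    import fractions

    def square_root(N, x=None, passes=20, max_den=10**30):
        if x is None:
            x = N / 2
        for _ in range(passes):
            # cap the digit growth after each refinement
            x = ((x + N / x) / 2).limit_denominator(max_den)
        return x

    print square_root(fractions.Fraction(5, 2))

at the cost of the result being merely a good approximation.)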

-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Unshelving the data?

2011-06-01 Thread Chris Torek
In article 4433955b-7f54-400a-af08-1f58a75e7...@j31g2000yqe.googlegroups.com
Uncle Ben  bgr...@nycap.rr.com wrote:
Shelving is a wonderfully simple way to get keyed access to a store of
items. I'd like to maintain this cache though.

Is there any way to remove a shelved key once it is hashed into the
system?

$ pydoc shelve
...
To summarize the interface (key is a string, data is an arbitrary
object):
...
d[key] = data   # store data at key (overwrites old data if
# using an existing key)
data = d[key]   # retrieve a COPY of the data at key (raise
# KeyError if no such key) -- NOTE that this
# access returns a *copy* of the entry!
del d[key]  # delete data stored at key (raises KeyError
# if no such key)
...

Seems pretty straightforward. :-)  Are you having some sort
of problem with "del"?
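
For instance (a sketch; the shelf path is made up):

    import shelve

    d = shelve.open('/tmp/demo_shelf')
    d['key'] = [1, 2, 3]
    del d['key']          # the key is now gone from the shelf
    print 'key' in d      # False
    d.close()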
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: float(nan) in set or as key

2011-05-30 Thread Chris Torek
In article 4de3358b$0$29990$c3e8da3$54964...@news.astraweb.com
Steven D'Aprano  steve+comp.lang.pyt...@pearwood.info wrote:
Better than a float method is a function which takes any number as 
argument:

>>> import math, fractions, decimal
>>> math.isnan(fractions.Fraction(2, 3))
False
>>> math.isnan(decimal.Decimal('nan'))
True

Ah, apparently someone's been using Larry Wall's time machine. :-)

I should have looked at the documentation.  In my case, though:

$ python
Python 2.5.1 (r251:54863, Dec 16 2010, 14:12:43) 
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import math
>>> math.isnan
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'isnan'

You can even handle complex NANs with the cmath module:

>>> import cmath
>>> cmath.isnan(complex(1, float('nan')))
True

Would it be appropriate to have isnan() methods for Fraction,
Decimal, and complex, so that you do not need to worry about whether
to use math.isnan() vs cmath.isnan()?  (I almost never work with
complex numbers, so I am not sure whether the "or" behavior --
cmath.isinf and cmath.isnan return true if either the real or the
imaginary part is an infinity or a NaN, respectively -- is
appropriate in algorithms that might be working on any of these
types of numbers.)

It might also be appropriate to have trivial always-False isinf and
isnan methods for integers.
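
(In the meantime, a sketch of a hand-rolled dispatcher -- my code,
not a stdlib API -- that papers over the math vs cmath split:

    import math, cmath, numbers

    def generic_isnan(x):
        if isinstance(x, numbers.Integral):
            return False            # integers are never NaN
        if isinstance(x, complex):
            return cmath.isnan(x)   # note the "or" behavior
        try:
            return math.isnan(x)    # float, Fraction, Decimal
        except TypeError:
            return x != x           # last-ditch NaN != NaN test

works with 2.6 or later, since that is where numbers, math.isnan,
and cmath.isnan first appear.)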
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to catch a line with Popen

2011-05-30 Thread Chris Torek
Chris Torek wrote:
 In at least some versions of Python 2
[the file-type object iterators behave badly with pipes]

(This may still be true in Python 3, I just have no experience with
Py3k.  "At least some version of Python 2" means the ones I have
access to, and have tried. :-) )

In article is0d44$d7m$1...@speranza.aioe.org
TheSaint  nob...@nowhere.net.no wrote:
I'm with Py3k :P. However, thank you for your guidelines.
My last attempt was to use a *for* with p.wait(), as mentioned earlier

If you have a process that has not yet terminated and that you
must stop from your own python program, calling the wait() method
will wait forever (because you are now waiting for yourself, in
effect -- waiting for yourself to terminate the other process).

The only time to call p.wait() (or p.communicate(), which itself
calls the wait() method) is when you believe the subprocess is
on its way to terminating -- in this case, after you force it
to do so.

That looks good enough. I noticed a little delay for the first lines;
I am fairly sure Popen assigns some buffer even if it is not set.

According to the documentation, the default buffer size of Python 2
is 0, which is passed to fdopen() and makes the resulting files
unbuffered.  I recall some sort of changes discussed for Py3k though.

Have you tried a perpetual ping? How would the line_at_a_time behave?

Since it is a generator that only requests another line when called,
it should be fine.  (Compare to the itertools cycle and repeat
generators, for instance, which return an infinite sequence.)
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Error: child process close a socket inherited from parent

2011-05-29 Thread Chris Torek
In article slrniu42cm.2s8.narkewo...@cnzuhnb904.ap.bm.net
narke  narkewo...@gmail.com wrote:
As illustrated in the following simple sample:

import sys
import os
import socket

class Server:
    def __init__(self):
        self._listen_sock = None

    def _talk_to_client(self, conn, addr):
        text = 'The brown fox jumps over the lazy dog.\n'
        while True:
            conn.send(text)
            data = conn.recv(1024)
            if not data:
                break
        conn.close()

    def listen(self, port):
        self._listen_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self._listen_sock.bind(('', port))
        self._listen_sock.listen(128)
        self._wait_conn()

    def _wait_conn(self):
        while True:
            conn, addr = self._listen_sock.accept()
            if os.fork() == 0:
                self._listen_sock.close()   # line x
                self._talk_to_client(conn, addr)
            else:
                conn.close()

if __name__ == '__main__':
    Server().listen(int(sys.argv[1]))

Unless I comment out the line x, I will get a 'Bad file descriptor'
error when my tcp client program (e.g, telnet) closes the connection to
the server.  But as I understood, a child process can close a unused
socket (file descriptor).

It can.

Do you know what's wrong here?

The problem turns out to be fairly simple.

The routine listen() forks, and the parent process (with nonzero pid)
goes into the else branch of _wait_conn(), hence closes the newly
accepted socket and goes back to waiting on the accept() call, which
is all just fine.

Meanwhile, the child (with pid == 0) calls close() on the listening
socket and then calls self._talk_to_client().

What happens when the client is done and closes his end?  Well,
take a look at the code in _talk_to_client(): it reaches the
if not data clause and breaks out of its loop, and calls close()
on the accepted socket ... and then returns to its caller, which
is _wait_conn().

What does _wait_conn() do next?  It has finished the "if" branch in
the "while True:" loop, so it must skip the "else" branch and go
around the loop again.  Which means its very next operation is
to call accept() on the listening socket it closed just before
it called self._talk_to_client().

If that socket is closed, you get an EBADF error raised.  If not,
the child and parent compete for the next incoming connection.
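
A minimal fix -- one of several possibilities, my sketch -- is to
make sure the child never falls back into that loop:

    if os.fork() == 0:
        self._listen_sock.close()
        self._talk_to_client(conn, addr)
        os._exit(0)   # child must not return into the accept() loop
    else:
        conn.close()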
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: float(nan) in set or as key

2011-05-29 Thread Chris Torek
Incidentally, note:

$ python
...
>>> nan = float("nan")
>>> nan
nan
>>> nan is nan
True
>>> nan == nan
False

In article 4de1e3e7$0$2195$742ec...@news.sonic.net
John Nagle  na...@animats.com wrote:
The correct answer to "nan == nan" is to raise an exception, because
you have asked a question for which the answer is neither True nor False.

Well, in some sense, the correct answer depends on which question
you *meant* to ask. :-)  Seriously, some (many?) instruction sets
have two kinds of comparison instructions: one that raises an
exception here, and one that does not.

The correct semantics for IEEE floating point look something like
this:

   1/0          INF
   INF + 1      INF
   INF - INF    NaN
   INF == INF   unordered
   NaN == NaN   unordered

INF and NaN both have comparison semantics which return
unordered. The FPU sets a bit for this, which most language
implementations ignore.

Again, this depends on the implementation.

This is similar to (e.g.) the fact that on the MIPS, there are two
different integer add instructions (addi and addiu): one
raises an overflow exception, the other performs C unsigned
style arithmetic (where, e.g., 0xffffffff + 1 = 0, in 32 bits).

Python should raise an exception on unordered comparisons.
Given that the language handles integer overflow by going to
arbitrary-precision integers, checking the FPU status bits is
cheap.

I could go for that myself.  But then you also need a "don't raise
exception but give me an equality test result" operator (for various
special-case purposes at least) too.  Of course a simple "classify
this float as one of normal, subnormal, zero, infinity, or NaN"
operator would suffice here (along with the usual "extract sign"
and "differentiate between quiet and signalling NaN" operations).
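
Such a classifier is easy enough to sketch in terms of the 2.6+
math module (my code, ignoring the sign bit and the
quiet-vs-signalling distinction):

    import math

    def classify(x):
        if math.isnan(x):
            return 'nan'
        if math.isinf(x):
            return 'infinity'
        if x == 0.0:
            return 'zero'
        # 2.0**-1022 is the smallest positive normal IEEE double
        return 'subnormal' if abs(x) < 2.0**-1022 else 'normal'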
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to catch a line with Popen

2011-05-29 Thread Chris Torek
In article irtj2o$h0m$1...@speranza.aioe.org
TheSaint  nob...@nowhere.net.no wrote:
Chris Rebert wrote:
I just want to elaborate the latest line, as soon as it's written on the
pipe, and print some result on the screen.
Imagine something like

>>> p = Popen(['ping','-c40','www.google.com'], stdout=PIPE)
>>> for line in p.stdout:
...     print(str(line).split()[7])

I'd like to see something like *time=54.4*
This is just an example, where if we remove the -c40 on the command line, 
I'd expect to read the latest line(s), until the program is killed.

In at least some versions of Python 2, file-like object next
iterators do not work right with unbuffered (or line-buffered)
pipe-file-objects.  (This may or may not be fixed in Python 3.)

A simple workaround is a little generator using readline():

def line_at_a_time(fileobj):
    """
    Return one line at a time from a file-like object.
    Works around the iter behavior of pipe files in
    Python 2.x, e.g., instead of "for line in file" you can
    write "for line in line_at_a_time(file)".
    """
    while True:
        line = fileobj.readline()
        if not line:
            return
        yield line

Adding this to your sample code gives something that works for me,
provided I fiddle with it to make sure that the only lines
examined are those with actual ping times:

p = subprocess.Popen(["ping", "-c5", "www.google.com"],
                     stdout = subprocess.PIPE)
for lineno, line in enumerate(line_at_a_time(p.stdout)):
    if 1 <= lineno <= 5:
        print line.split()[6]
    else:
        print line.rstrip('\n')
p.wait() # discard final result

(Presumably the enumerate() trick would not be needed in whatever
you really use.)
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: portable way of sending notifying a process

2011-05-29 Thread Chris Torek
In article 4de183e7$0$26108$426a7...@news.free.fr
News123  news1...@free.fr wrote:
I'm looking for a portable way (Windows XP / Windows Vista and Linux)
to send a signal from any python script to another one
(one signal would be enough)

This turns out to be pretty hard to do reliably-and-securely
even *without* crossing the Windows / Linux barrier.

It seems that neither of the signals HUP / USR1 is implemented under Windows.

Signals are also very messy and easy to get wrong on Unix, with
earlier Python versions missing a few key items to make them
entirely reliable (such as the sigblock/sigsetmask/sigpause suite,
and/or setting interrupt-vs-resume behavior on system calls).

What would be a light weight portable way, that one process can tell
another to do something?

The main requirement would be to have no CPU impact while waiting (thus
no polling)

Your best bet here is probably to use sockets.  Both systems have
ways to create service sockets and to connect to a socket as a
client.  Of course, making these secure can be difficult: you must
decide what sort of attack(s) could occur and how much effort to
put into defending against them.

(For instance, even if there is only a "wake up, I have done
something you should look at" signal that you can transmit by
connecting to a server and then closing the connection, what happens
if someone inside or outside your local network decides to repeatedly
poke that port in the hopes of causing a Denial of Service by making
the server use lots of CPU time?)
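
A bare-bones sketch of the socket approach (made-up port number,
no security at all, just to show the no-polling property):

    import socket

    PORT = 54321     # assumption: any agreed-upon free local port

    def wait_for_poke():
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.bind(('127.0.0.1', PORT))
        srv.listen(1)
        conn, addr = srv.accept()   # blocks, using no CPU, until poked
        conn.close()
        srv.close()

    def poke():
        c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        c.connect(('127.0.0.1', PORT))
        c.close()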

If nothing exists I might just write a wrapper around
pyinotify and (Tim Goldens code snippet allowing to watch a directory
for file changes)
http://timgolden.me.uk/python/win32_how_do_i/watch_directory_for_changes.html

and use a marker file in a marker directory
but I wanted to be sure of not reinventing the wheel.

It really sounds like you are writing client/server code in which
the client writes a file into a queue directory.  In this case,
that may be the way to go -- or you could structure it as an actual
client and server, i.e., the client connects to the server and
writes the request directly (but then you have to decide about
security considerations, which the OS's local file system may
provide more directly).
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: float(nan) in set or as key

2011-05-29 Thread Chris Torek
In article 4de31635$0$29990$c3e8da3$54964...@news.astraweb.com,
Steven D'Aprano  steve+comp.lang.pyt...@pearwood.info wrote:
That's also completely wrong. The correct way to test for a NAN is with 
the IEEE-mandated function isnan(). The NAN != NAN trick is exactly that, 
a trick, used by programmers when their language or compiler doesn't 
support isnan().

Perhaps it would be reasonable to be able to do:

x.isnan()

when x is a float.

Without support for isinf(), identifying an INF is just as hard as 
identifying an NAN, and yet their behaviour under equality is the 
complete opposite:

>>> inf = float('inf')
>>> inf == inf
True

Fortunately:

def isnan(x):
    return x != x

_inf = float("inf")
def isinf(x):
    return x == _inf
del _inf

both do the trick here.

I would like to have both modes (non-exception-ing and exception-ing)
of IEEE-style float available in Python, and am not too picky about
how they would be implemented or which one would be the default.
Python could also paper over the brokenness of various actual
implementations (where signalling vs quiet NaNs, and so on, do not
quite work right in all cases), with some performance penalty on
non-conformant hardware.
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: float(nan) in set or as key

2011-05-29 Thread Chris Torek
In article irv6ev01...@news1.newsguy.com I wrote, in part:
_inf = float("inf")
def isinf(x):
    return x == _inf
del _inf

Oops, take out the "del", or otherwise fix the obvious problem,
e.g., perhaps:

def isinf(x):
    return x == isinf._inf
isinf._inf = float("inf")

(Of course, if something like this were adopted properly, it would
all be in the base float type anyway.)
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Code Review

2011-05-25 Thread Chris Torek
:

find DIRLIST -ctime +N \( -type d -o -type f \) -exec rm -rf {} \;

but can also a great deal more since (a) it has many other options
than just -ctime, and (b) -exec will execute any arbitrary command.

---

import os
import time
import shutil
import argparse
import sys

def main():
    """
    main program: parse arguments, and clean out directories.
    """
    parser = argparse.ArgumentParser(
        description="Delete files and folders in a directory N days old",
        prog="directorycleaner")
    parser.add_argument("days", type=int,
        help="Numeric value: delete files and folders older than N days")
    parser.add_argument("directory", nargs="+",
        help="delete files and folders in this directory")

    args = parser.parse_args()

    for dirname in args.directory:
        clean_dir(dirname, args.days)

def clean_dir(dirname, n_days):
    """
    Clean one directory of files / subdirectories older than
    the given number of days.
    """
    time_to_live = n_days * 86400 # 86400 = seconds-per-day
    current_time = time.time()

    try:
        contents = os.listdir(dirname)
    except OSError, err:
        sys.exit("can't read %s: %s" % (dirname, err))
    for filename in contents:
        # Get the path of the file name
        path = os.path.join(dirname, filename)
        # Get the creation time of the file
        # NOTE: this only works on Windows-like systems
        when_created = os.path.getctime(path)
        # If the file/directory has expired, remove it
        if when_created + time_to_live < current_time:
            if os.path.isfile(path):
                print "os.remove(%s)" % path
            # It is not a file it is a directory
            elif os.path.isdir(path):
                print "shutil.rmtree(%s)" % path

if __name__ == "__main__":
    main()
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Condition.wait(timeout) oddities

2011-05-23 Thread Chris Torek
In article
94d1d127-b423-4bd4-853c-d92da9ac7...@glegroupsg2000goo.googlegroups.com
Floris Bruynooghe  comp.lang.pyt...@googlegroups.com wrote:
I'm a little confused about the corner cases of Condition.wait() with a
timeout parameter in the threading module.

When looking at the code the first thing that I don't quite get is that
the timeout should never work as far as I understand it.  .wait() always
needs to return while holding the lock, therefore it does an .acquire()
on the lock in a finally clause.  Thus pretty much ignoring the timeout
value.

It does not do a straight acquire; it uses self._acquire_restore(),
which, for a condition variable, instead does:

self.__block.acquire()
self.__count = count
self.__owner = owner

(assuming that you did not override the lock argument or passed
in a threading.RLock() object as the lock), due to this bit of
code in _Condition.__init__():

    # If the lock defines _release_save() and/or _acquire_restore(),
    # these override the default implementations (which just call
    # release() and acquire() on the lock).  Ditto for _is_owned().
    [snippage]
    try:
        self._acquire_restore = lock._acquire_restore
    except AttributeError:
        pass

That is, the lock it holds is the one on the blocking lock (the
__block of the underlying RLock), which is the same one you had
to hold in the first place to call the .wait() function.

To put it another way, the lock that .wait() waits for is
a new lock allocated for the duration of the .wait() operation:

    waiter = _allocate_lock()
    waiter.acquire()
    self.__waiters.append(waiter)
    saved_state = self._release_save()
    # <here we wait for the lock "waiter", with timeout>
    self._acquire_restore(saved_state)
    # the last stmt is the finally clause, I've just un-indented it

which is entirely different from the lock that .wait() re-acquires
(and which you held when you called .wait() initially) before it
returns.

The second issue is that while looking around for this I found two bug
reports: http://bugs.python.org/issue1175933 and
http://bugs.python.org/issue10218.  Both are proposing to add a return
value indicating whether the .wait() timed out or not similar to the
other .wait() methods in threading.  However the first was rejected
after some (seemingly inconclusive) discussion.

Tim Peters' reply seemed pretty conclusive to me. :-)

While the latter had
minimal discussion and and was accepted without reference to the earlier
attempt.  Not sure if this was a process oversight or what, but it does
leave the situation confusing.

But regardless I don't understand how the return value can be used
currently: yes you did time out but you're still promised to hold the
lock thanks to the .acquire() call on the lock in the finally block.

The return value is not generally useful for the reasons Tim Peters
noted originally.  Those are all still true even in the second
discussion.

In my small brain I just can't figure out how Condition.wait() can both
respect a timeout parameter and the promise to hold the lock on return. 

Remember, two different locks. :-)  There is a lock on the state
of the condition variable itself, and then there is a lock on which
one actually waits.  On both entry to and return from .wait(), you
(the caller) hold the lock on the state of the condition variable,
so you may inspect it and proceed based on the result.  In between,
you give up that lock, so that other threads may obtain it and
change the state of the condition variable.
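
The usual coding pattern that falls out of this is (a sketch using
the standard API, not code from the thread):

    import threading

    cv = threading.Condition()
    ready = False

    def consumer():
        with cv:                 # hold the condition's own lock
            while not ready:
                cv.wait(1.0)     # drops the lock while blocked,
                                 # re-acquires it before returning
            # the lock is held again here, so it is safe to
            # inspect and modify the shared state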

It seems to me that the only way to handle the timeout is to raise an
exception rather then return a value because when you get an exception
you can break the promise of holding the lock.

That *would* be a valid way to implement a timeout -- to return with
the condition variable lock itself no longer held -- but that would
require changing lots of other code structure.
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Converting a set into list

2011-05-16 Thread Chris Torek
Chris Torek nos...@torek.net wrote:
 >>> x = [3, 1, 4, 1, 5, 9, 2, 6]
 >>> list(set(x))
 This might not be the best example since the result is sorted
 by accident, while other list(set(...)) results are not. 

In article Xns9EE772D313153duncanbooth@127.0.0.1,
Duncan Booth  duncan.bo...@suttoncourtenay.org.uk wrote:
A minor change to your example makes it out of order even for integers:

>>> x = [7, 8, 9, 1, 4, 1]
>>> list(set(x))
[8, 9, 1, 4, 7]

or for that mattter:

>>> list(set([3, 32, 4, 32, 5, 9, 2, 6]))
[32, 2, 3, 4, 5, 6, 9]

Yes, but then it is no longer as easy as pi. :-)
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Converting a set into list

2011-05-14 Thread Chris Torek
In article 871v00j2bh@benfinney.id.au
Ben Finney  ben+pyt...@benfinney.id.au wrote:
As pointed out: you already know how to create a set from an object;
creating a list from an object is very similar:

list(set(aa))

But why are you doing that? What are you trying to achieve?

I have no idea why someone *else* is doing that, but I have used
this very expression to unique-ize a list:

>>> x = [3, 1, 4, 1, 5, 9, 2, 6]
>>> x
[3, 1, 4, 1, 5, 9, 2, 6]
>>> list(set(x))
[1, 2, 3, 4, 5, 6, 9]


Of course, this trick only works if all the list elements are
hashable.

This might not be the best example since the result is sorted
by accident, while other list(set(...)) results are not.  Add
sorted() or .sort() if needed:

>>> x = ['three', 'one', 'four', 'one', 'five']
>>> x
['three', 'one', 'four', 'one', 'five']
>>> list(set(x))
['four', 'five', 'three', 'one']
>>> sorted(list(set(x)))
['five', 'four', 'one', 'three']
>>>
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: checking if a list is empty

2011-05-11 Thread Chris Torek
In article 4dcab8bf$0$29980$c3e8da3$54964...@news.astraweb.com
Steven D'Aprano  steve+comp.lang.pyt...@pearwood.info wrote:
When you call len(x) you don't care about the details of how to calculate 
the length of x. The object itself knows so that you don't have to. The 
same applies to truth testing.

I have a data type that is an array of lists. When you call "if len(x) >
0" on it, it will blow up in your face, because len(x) returns a list of
lengths like [12, 0, 2, 5]. But if you say "if x", it will do the right
thing. You don't need to care how to truth-test my data type, because it 
does it for you. By ignoring my type's interface, and insisting on doing 
the truth-test by hand, you shoot yourself in the foot.

What this really points out is that "if x" and "if len(x) > 0" are
*different tests*.  Consider xml.etree.ElementTree Element objects.
The documentation says, in part:

In ElementTree 1.2 and earlier, the sequence behavior means
that an element without any subelements tests as false (since
it's an empty sequence), even if it contains text and
attributions. ...

Note: This behavior is likely to change somewhat in ElementTree
1.3.  To write code that is compatible in both directions, use
... len(element) to test for non-empty elements.

In this case, when x is an Element, the result of bool(x) *could*
mean just "x has sub-elements", but it could also/instead mean "x
has sub-elements, text, or attributions".
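
For instance (a sketch, under the 1.2 sequence behavior):

    import xml.etree.ElementTree as ET

    elem = ET.fromstring('<a>some text</a>')
    print len(elem)    # 0: no subelements
    print bool(elem)   # False, even though elem.text is 'some text'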

The issue raised at the beginning of this thread was: which test
is better when x is a list, the test that invokes bool(x), or
the test that invokes len(x)?  There is no answer to that, any more
than there is to which ice cream flavor is best. [%]  A more
interesting question to ask, in any given bit of code, is whether
bool(x) or len(x) is more appropriate for *all* the types x might
take at that point, rather than whether one or the other is better
for lists, where the result is defined as equivalent.

(The biggest problem with answering that tends to be deciding
what types x might take.)

[% Chocolate with raspberry, or mint, or similar.]
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: checking if a list is empty

2011-05-11 Thread Chris Torek
In article 4dc6a39a$0$29991$c3e8da3$54964...@news.astraweb.com
Steven D'Aprano  steve+comp.lang.pyt...@pearwood.info wrote:
In English, [the word "not"] negates a word or statement:

"the cat is not on the mat" -- "the cat is on the mat" is false.

As a mostly off topic aside, English is considerably more complicated
than that.  There are people who use the word "not" as a purely
boolean operator (a la computer languages), so that "the cat is
not not on the mat" means the cat IS on the mat, but others use
double negation as a form of intensifier, so that the phrase with
multiple "not"s is simply a more emphatic claim: the cat is really,
truly, *definitely*, not on that particular mat. :-)

In various other natural languages -- i.e., languages meant for
human-to-human communications, rather than for computers -- multiple
negatives are more often (or always?) intensifiers.  Some languages
have the idea of negative matching in much the same sense that
English has number [%] matching: "the cat is on the mat" and "the cats
are on the mat" are OK because the noun and verb numbers match,
but neither "the cats is on the mat" nor "the cat are on the mat"
is correct.

[% Number here is really 1 vs not-1: no cats, one cat, two cats.]

Of course, there are descriptivists and prescriptivists, and many
of the latter claim that using multi-valued boolean logic in English
is nonstandard or invalid.  Many of those in turn will tell
you that "ain't good English" ain't good English.  Still, one should
be aware of these forms and their uses, in much the same way as
one should be able to boldly split infinitives. :-)

Moving back towards on-topic-ness:

As an operator, not negates a true value to a false value. In 
mathematical Boolean algebra, there only is one true value and one false 
value, conventionally called True/False or 1/0. In non-Boolean algebras, 
you can define other values. In three-value logic, the negation of True/
False/Maybe is usually False/True/Maybe. In fuzzy logic, the logic values 
are the uncountable infinity (that's a technical term, not hyperbole) of 
real numbers between 0 and 1.

Or, to put it another way, before we can communicate clearly, we have
to pick out a set of rules.  Most computer languages do this pretty
well, and Python does a good (and reasonably conventional) job:

Python uses a boolean algebra where there are many ways of spelling the 
true and false values. The not operator returns the canonical bool 
values:

not <any true value> returns False
not <any false value> returns True

Take note of the distinction between lower-case true/false, which are 
adjectives, and True/False, which are objects of class bool.

(At least as of current versions of Python -- in much older versions
there was no real distinction between booleans and type int,
presumably a holdover from C.)

[remainder snipped as I have nothing else to add]
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What other languages use the same data model as Python?

2011-05-06 Thread Chris Torek
 John Nagle wrote:
 A reasonable compromise would be that is is treated as == on
 immutable objects.

(Note: I have no dog in this fight, I would be happy with a changed
is or with the current one -- leaky abstractions are fine with
me, provided I am told *when* they may -- or sometimes may not --
leak. :-) )

 On 5/5/2011 3:06 AM, Gregory Ewing wrote:
 That wouldn't work for tuples, which can contain references
 to other objects that are not immutable.

On Thu, May 5, 2011 at 9:41 AM, John Nagle na...@animats.com wrote:
 Such tuples are still identical, even if they
 contain identical references to immutable objects.

In article mailman.1196.1304613911.9059.python-l...@python.org
Ian Kelly  ian.g.ke...@gmail.com wrote:
>>> a = (1, 2, [3, 4, 5])
>>> b = (1, 2, [3, 4, 5])
>>> a == b
True
>>> a is b  # Using the proposed definition
True

I believe that John Nagle's proposal would make a is b false,
because while a and b are both immutable, they contain *different*
references to *mutable* objects (thus failing the "identical
references to immutable objects" part of the claim).

On the other hand, should one do:

L = [3, 4, 5]
a = (1, 2, L)
b = (1, 2, L)

then a is b should (I say) be True under the proposal -- even
though they contain (identical) references to *mutable* objects.
Loosely speaking, we would define the is relation as:

    (x is y) if and only if
        (id(x) == id(y)
         or
         (x is immutable and y is immutable and
          (for all components xi and yi of x and y, xi is yi)))

In this case, even if the tuples a and b have different id()s,
we would find that both have an immutable type, and both have
components -- in this case, numbered, subscriptable tuple elements,
but instances of immutable class types like decimal.Decimal would
have dictionaries instead -- and thus we would recursively apply
the modified is definition to each element.  (For tuples, the
all components implies that the lengths must be equal; for class
instances, it implies that they need to have is-equal attributes,
etc.)
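
A rough sketch of the proposed relation -- my code, handling only
the tuple case of "immutable with components" -- might read:

    def proposed_is(x, y):
        if id(x) == id(y):
            return True
        # only immutable containers recurse; tuples are the easy case
        if isinstance(x, tuple) and isinstance(y, tuple):
            return (len(x) == len(y) and
                    all(proposed_is(xi, yi) for xi, yi in zip(x, y)))
        return False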

It's not entirely clear to me whether different immutable classes
(i.e., different types) but with identical everything-else should
compare equal under this modified is.  I.e., today:

$ cp /usr/lib/python2.?/decimal.py /tmp/deccopy.py
$ python
...
>>> sys.path.append('/tmp')
>>> import decimal
>>> import deccopy
>>> x = decimal.Decimal('1')
>>> y = deccopy.Decimal('1')
>>> print x, y
1 1
>>> x == y
False

and obviously x is y is currently False:

>>> type(x)
<class 'decimal.Decimal'>
>>> type(y)
<class 'deccopy.Decimal'>

However, even though the types differ, both x and y are immutable
[%] and obviously (because I copied the code) they have all the
same operations.  Since they were both created with the same starting
value, x and y will behave identically given identical treatment.
As such, it might be reasonable to ask that x is y be True
rather than False.

[% This is not at all obvious -- I have written an immutable class,
and it is pretty easy to accidentally mutate an instance inside
the class implementation.  There is nothing to prevent this in
CPython, at least.  If there were a minor bug in the decimal.Decimal
code such that x.invoke_bug() modified x, then x would *not* be
immutable, even though it is intended to be.  (As far as I know
there are no such bugs in decimal.Decimal, it's just that I had
them in my Money class.)]
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What other languages use the same data model as Python?

2011-05-06 Thread Chris Torek
In article GOmwp.13554$vp.9...@newsfe14.iad
harrismh777  harrismh...@charter.net wrote:
There may be some language somewhere that does pass-by-reference which 
is not implemented under the hood as pointers, but I can't think of 
any...   'cause like I've been saying, way down under the hood, we only 
have direct and indirect memory addressing in today's processors. EOS.

There have been Fortran compilers that implemented modification of
variables via value-result rather than by-reference.

This is perhaps best illustrated by some code fragments:

      SUBROUTINE FOO(X, Y)
      INTEGER X, Y
      ...
      X = 3
      Y = 4
      RETURN
      END

      SUBROUTINE BAR(A)
      CALL FOO(A, 0)
      RETURN
      END

might compile to the equivalent of the following C code:

    void foo(int *x0, int *y0) {
        int x = *x0, y = *y0;
        ...
        *x0 = x;
        *y0 = y;
    }

    void bar(int *a0) {
        int a = *a0;
        int temp = 0;
        foo(&a, &temp);
        *a0 = a;
    }

In order to allow both by-reference and value-result, Fortran
forbids the programmer to peek at the machinery.  That is, the
following complete program is invalid:

      SUBROUTINE PEEK(X)
      INTEGER X, GOTCHA
      COMMON /BLOCK/ GOTCHA
      PRINT *, 'INITIALLY GOTCHA = ', GOTCHA
      X = 4
      PRINT *, 'AFTER X=4 GOTCHA = ', GOTCHA
      RETURN
      END

      PROGRAM MAIN
      INTEGER GOTCHA
      COMMON /BLOCK/ GOTCHA
      GOTCHA = 3
      CALL PEEK(GOTCHA)
      PRINT *, 'FINALLY   GOTCHA = ', GOTCHA
      STOP
      END

(It has been so long since I used Fortran that the above may not
be quite right in ways other than the one intended.  Please forgive
small errors. :-) )

The trick in subroutine peek is that it refers to both a global
variable (in Fortran, simulated with a common block) and a dummy
variable (as it is termed in Fortran) -- the parameter that aliases
the global variable -- in such a way that we can see *when* the
change happens.  If gotcha starts out set to 3, remains 3 after
assignment to x, and changes to 4 after peek() returns, then peek()
effectively used value-result to change the parameter.  If, on the
other hand, gotcha became 4 immediately after the assignment to
x, then peek() effectively used by-reference.

The key take-away here is not so much the trick by which we peeked
inside the implementation (although peeking *is* useful in solving
the murder mystery we have after some program aborts with a
core-dump or what-have-you), but rather the fact that the Fortran
language proper forbids us from peeking at all.  By forbidding it
-- by making the program illegal -- the language provide implementors
the freedom to use *either* by-reference or value-result.  All
valid Fortran programs behave identically under either kind of
implementation.

Like it or not, Python has similar "defined as undefined" grey
areas: one is not promised, for instance, whether the "is" operator
is always True for small integers that are equal (although it is
in CPython), nor when __del__ is called (if ever), and so on.  As
with the Python-named-Monty, we have "rigidly defined areas of
doubt and uncertainty".  These exist for good reasons: to allow
different implementations.
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What other languages use the same data model as Python?

2011-05-06 Thread Chris Torek
In article iq1e0j02...@news2.newsguy.com I wrote, in part:
Like it or not, Python has similar "defined as undefined" grey
areas: one is not promised, for instance, whether the "is" operator
is always True for small integers that are equal (although it is
in CPython), nor when __del__ is called (if ever), and so on.  As
with the Python-named-Monty, we have "rigidly defined areas of
doubt and uncertainty".  These exist for good reasons: to allow
different implementations.

Oops, attribution error: this comes from Douglas Adams rather
than Monty Python.
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Popen Question

2010-11-05 Thread Chris Torek
In article 891a9a80-c30d-4415-ac81-bddd0b564...@g13g2000yqj.googlegroups.com
moogyd  moo...@yahoo.co.uk wrote:
[sde:st...@lbux03 ~]$ python
Python 2.6 (r26:66714, Feb 21 2009, 02:16:04)
[GCC 4.3.2 [gcc-4_3-branch revision 141291]] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os, subprocess
>>> os.environ['MYVAR'] = "myval"
>>> p = subprocess.Popen(['echo', '$MYVAR'], shell=True)

Alain Ketterlin has already explained these to some extent.
Here is a bit more.

This runs, underneath:

['/bin/sh', '-c', 'echo', '$MYVAR']

(with arguments expressed as a Python list).  /bin/sh takes the
string after '-c' as a command, and the remaining argument(s) if
any are assigned to positional parameters ($0, $1, etc).

If you replace the command with something a little more explicit,
you can see this:

>>> p = subprocess.Popen(
...     [r'echo \$0=$0 \$1=$1', 'arg0', '$MYVAR'], shell=True)
>>> $0=arg0 $1=$MYVAR
p.wait()
0
>>>

(I like to call p.communicate() or p.wait(), although p.communicate()
is pretty much a no-op if you have not done any redirecting.  Note that
p.communicate() does a p.wait() for you.)

>>> p = subprocess.Popen(['echo', '$MYVAR'])
>>> $MYVAR

This time, as Alain noted, the shell does not get involved so no
variable expansion occurs.  However, you could do it yourself:

>>> p = subprocess.Popen(['echo', os.environ['MYVAR']])
>>> myval
p.wait()
0
>>>

>>> p = subprocess.Popen('echo $MYVAR', shell=True)
>>> myval

(here /bin/sh does the expansion, because you invoked it)

>>> p = subprocess.Popen('echo $MYVAR')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/subprocess.py", line 595, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.6/subprocess.py", line 1106, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

This attempted to run the executable named 'echo $MYVAR'.  It did
not exist so the underlying exec (after the fork) failed.  The
exception was passed back to the subprocess module, which raised
it in the parent for you to see.

If you were to create an executable named 'echo $MYVAR' (including
the blank and dollar sign) somewhere in your path (or use an explicit
path to it), it would run.  I will also capture the actual output
this time:

$ cat '/tmp/echo $MYVAR'
#! /usr/bin/awk NR>1{print}
this is a self-printing file
anything after the first line has NR > 1, so gets printed
$ chmod +x '/tmp/echo $MYVAR'
$ python
Python 2.5.1 (r251:54863, Feb  6 2009, 19:02:12) 
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import subprocess
>>> p = subprocess.Popen('/tmp/echo $MYVAR', stdout=subprocess.PIPE)
>>> print p.communicate()[0]
this is a self-printing file
anything after the first line has NR > 1, so gets printed

>>> p.returncode
0
>>>

Incidentally, fun with #!: you can make self-renaming scripts:

sh-3.2$ echo '#! /bin/mv' > /tmp/selfmove; chmod +x /tmp/selfmove
sh-3.2$ ls /tmp/*move*
/tmp/selfmove
sh-3.2$ /tmp/selfmove /tmp/I_moved
sh-3.2$ ls /tmp/*move*
/tmp/I_moved
sh-3.2$

or even self-removing scripts:

sh-3.2$ echo '#! /bin/rm' > /tmp/rmme; chmod +x /tmp/rmme
sh-3.2$ /tmp/rmme
sh-3.2$ /tmp/rmme
sh: /tmp/rmme: No such file or directory

(nothing to do with python, just the way #! interpreter lines work).
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Interaction btw unittest.assertRaises and __getattr__. Bug?

2010-10-27 Thread Chris Torek
In article c38a5bb4-6087-453c-8873-f193927fd...@d8g2000yqf.googlegroups.com
Inyeol inyeol@gmail.com wrote:
[snippage below]
import unittest

class C():
    def __getattr__(self, name):
        raise AttributeError

class Test(unittest.TestCase):
    def test_getattr(self):
        c = C()
        self.assertRaises(AttributeError, c.foo)

unittest.main()
-
... or am I missing something obvious?

As Benjamin Peterson noted, the error occurs too soon, so that
the unittest code never has a chance to see it.

The something obvious is to defer the evaluation just long enough:

self.assertRaises(AttributeError, lambda: c.foo)
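
(In Python 2.7 and later, the context-manager form of assertRaises
defers the attribute access the same way:

    with self.assertRaises(AttributeError):
        c.foo

since the lookup of c.foo now happens inside the with block.)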
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: merge list of tuples with list

2010-10-20 Thread Chris Torek
On Wed, Oct 20, 2010 at 1:33 PM, Daniel Wagner
brocki2...@googlemail.com wrote:
 Any more efficient ways or suggestions are still welcome!

In article mailman.58.1287547882.2218.python-l...@python.org
James Mills  prolo...@shortcircuit.net.au wrote:
Did you not see Paul Rubin's solution:

>>> [x+(y,) for x,y in zip(a,b)]
[(1, 2, 3, 7), (4, 5, 6, 8)]

I think this is much nicer and probably more efficient.

For a slight boost in Python 2.x, use itertools.izip() to avoid
making an actual list out of zip(a,b).  (In 3.x, plain zip() is
already an iterator rather than a list-result function.)

This method (Paul Rubin's) uses only a little extra storage, and
almost no extra when using itertools.izip() (or 3.x).  I think it
is more straightforward than multi-zip-ing (e.g., zip(*zip(*a) + [b]))
as well.  The two-zip method needs list()-s in 3.x as well, making
it clearer where the copies occur:

    list(zip(*a))     makes the list [(1, 4), (2, 5), (3, 6)]
                      [input value is still referenced via "a" so
                       sticks around]
    [b]               makes the tuple (7, 8) into the list [(7, 8)]
                      [input value is still referenced via "b" so
                       sticks around]
    +                 adds those two lists producing the list
                      [(1, 4), (2, 5), (3, 6), (7, 8)]
                      [the two input values are no longer referenced
                       and are thus discarded]
    list(zip(*that))  makes the list [(1, 2, 3, 7), (4, 5, 6, 8)]
                      [the input value -- the result of the addition
                       in the next to last step -- is no longer
                       referenced and thus discarded]

All these temporary results take up space and time.  The list
comprehension simply builds the final result, once.

Of course, I have not used timeit to try this out. :-)  Let's do
that, just for fun (and to let me play with timeit from the command
line):

(I am not sure why I have to give the full path to the
timeit.py source here)

sh-3.2$ python /System/Library/Frameworks/Python.framework/\
Versions/2.5/lib/python2.5/timeit.py \
'a=[(1,2,3),(4,5,6)];b=(7,8);[x+(y,) for x,y in zip(a,b)]'
10 loops, best of 3: 2.55 usec per loop

sh-3.2$ python [long path snipped] \
'a=[(1,2,3),(4,5,6)];b=(7,8);[x+(y,) for x,y in zip(a,b)]'
10 loops, best of 3: 2.56 usec per loop

sh-3.2$ python [long path snipped] \
'a=[(1,2,3),(4,5,6)];b=(7,8);zip(*zip(*a) + [b])'
10 loops, best of 3: 3.84 usec per loop

sh-3.2$ python [long path snipped] \
'a=[(1,2,3),(4,5,6)];b=(7,8);zip(*zip(*a) + [b])'
10 loops, best of 3: 3.85 usec per loop

Hence, even in 2.5 where zip makes a temporary copy of the list,
the list comprehension version is faster.  Adding an explicit use
of itertools.izip does help, but not much, with these short lists:

sh-3.2$ python ... -s 'import itertools' \
'a=[(1,2,3),(4,5,6)];b=(7,8);[x+(y,) for x,y in itertools.izip(a,b)]'
10 loops, best of 3: 2.27 usec per loop

sh-3.2$ python ... -s 'import itertools' \
'a=[(1,2,3),(4,5,6)];b=(7,8);[x+(y,) for x,y in itertools.izip(a,b)]'
10 loops, best of 3: 2.29 usec per loop

(It is easy enough to move the assignments to a and b into the -s
argument, but it makes relatively little difference since the list
comprehension and two-zip methods both have the same setup overhead.
The import, however, is pretty slow, so it is not good to repeat
it on every trip through the 10 loops -- on my machine it jumps
to 3.7 usec/loop, almost as slow as the two-zip method.)
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Catching a SIGSEGV signal on an import

2010-10-19 Thread Chris Torek
(I realize this is old but I am recovering from dental surgery and,
while on the Good Drugs for the pain, going through old stuff on
purpose :-) )

On Thu, 09 Sep 2010 05:23:14 -0700, Ryan wrote:
 In general, is there anyway to catch a  SIGSEGV on import?

In article pan.2010.09.09.21.20.26.16...@nowhere.com,
Nobody  nob...@nowhere.com wrote:
No. If SIGSEGV is raised, it often indicates that memory has been
corrupted. At that point, you can't assume that the Python runtime is
still functional.

Indeed.

Still, there *is* a way to do this, should you choose to live
somewhat dangerously.

First, make a copy of the original process.  Using Unix as an
example:

    pid = os.fork()
    if pid == 0:
        # child
        import untrustworthy
        os._exit(0)

The import will either succeed or fail.  If it fails with a SIGSEGV
the child process will die; if not, the child will move on to the
next statement and exit (using os._exit() to bypass exit handlers,
since this is a forked child etc).

The parent can then do a waitpid and see whether the child was able
to do the import.

The obvious flaw in this method is that something that causes Python
to die with a SIGSEGV when imported probably has some serious bugs
in it, and depending on the state of the importing process, these
bugs might not cause a problem immediately, but instead set time-bombs
that will go off later.  In this case, the child import will succeed
and the parent will then trust the import itself (note that you
have to re-do the same import in the parent as it is completely
independent after the fork()).  Still, if you are dead set on the
idea, the test code below that I threw together here may be helpful.

---

import os, signal, sys

pid = os.fork()
if pid == 0:
    # deliberately not checking len(sys.argv) nor using try
    # this allows you to see what happens if you run "python t.py"
    # instead of "python t.py sig" or "python t.py fail" or
    # "python t.py ok", for instance.
    if sys.argv[1] == 'sig':
        os.kill(os.getpid(), signal.SIGSEGV)
    if sys.argv[1] == 'fail':
        os._exit(1)
    # Replace the above stuff with the untrustworthy import,
    # assuming you like the general idea.
    os._exit(0)

print 'parent: child =', pid
wpid, status = os.waitpid(pid, 0)
print 'wpid =', wpid, 'status =', status
if os.WIFSIGNALED(status):
    print 'child died from signal', os.WTERMSIG(status)
    if os.WCOREDUMP(status):
        print '(core dumped)'
elif os.WIFEXITED(status):
    print 'child exited with', os.WEXITSTATUS(status)
    # at this point the parent can repeat the import
else:
    print 'I am confused, maybe I got the wrong pid'

---

The same kind of thing can be done on other OSes, but all the details
will differ.
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Simple logging example doesn't work!

2010-10-19 Thread Chris Torek
# ..., in that the
# name to underlying file mapping can change in the presence of
# a rename.  Do not use this for security-issue operations.)
def fd_is_open_to(fileno, filename):
    try:
        s2 = os.stat(filename)
    except OSError:
        return False
    s1 = os.fstat(fileno)
    return s1.st_dev == s2.st_dev and s1.st_ino == s2.st_ino

errs = False

# Configure our logs as directed.
logconf = conf['logging']

# First, adjust stderr output level.  We deliberately do
# this before changing other log handlers, so that new debug
# messages printed here can be seen.  (Maybe should do raise
# now and lower later, but does not seem worth the effort.)
level = logging.getLevelName(logconf['stderr-level'].upper())
g.stderr_logger.setLevel(level)

# Gripe about old unsupported config, if needed.
if conf['USE-FAST-LOGGER']:
    logger.error('FAST logger no longer supported')
    errs = True

# Now set up syslog logger, if any.
syslog_to = logconf['syslog-to']
if syslog_to:
    # Might be nice to remember previous syslog-to (if any)
    # and not create and delete handler if unchanged.  (But
    # see comments elsewhere within this function.)
    addr = get_syslog_addr(syslog_to)
    logger.debug('syslog to: %s' % str(addr))
    try:
        sh = logging.handlers.SysLogHandler(addr,
            logging.handlers.SysLogHandler.LOG_DAEMON)
        sh.setFormatter(logging.Formatter(g.syslog_format))
    except IOError, e:
        logger.error('syslog-to: %s', e)
        errs = True
        sh = g.syslog_logger
    level = logging.getLevelName(logconf['syslog-level'].upper())
    if sh:
        sh.setLevel(level)
else:
    logger.debug('syslog logging suppressed')
    sh = None

# And file logger, if any.
filepath = logconf['file']
if filepath:
    if not os.path.isabs(filepath):
        newpath = os.path.join(conf['NODEMGR-BASE-PATH'], filepath)
        logger.warning('logging file=%s: relative path converted to %s',
            filepath, newpath)
        filepath = newpath
    logger.debug('filelog to: %s' % str(filepath))
    mode = logconf['mode']
    maxsize = logconf['max-size']
    try:
        maxsize = utils.string_to_bytes(maxsize)
    except ValueError:
        logger.error('logging max-size=%s: not a valid size', maxsize)
        maxsize = 1 * 1024 * 1024 # 1 MB
    backup_count = logconf['backup-count']
    level = logging.getLevelName(logconf['level'].upper())
    # If mode is 'w' and maxsize==0, this will open an existing
    # file for writing, truncating it.  If the existing file is
    # our own currently-open log file, this does the wrong thing:
    # we really only want any new level to apply.
    #
    # (If mode is 'a', it's harmless to re-open it, and if
    # maxsize>0 the RotatingFileHandler changes the mode to 'a'.
    # In these cases we want to pick up any max-size or backup-count
    # changes as well.)
    fh = g.file_logger if mode == 'w' and maxsize == 0 else None
    if fh and fd_is_open_to(fh.stream.fileno(), filepath):
        pass # use it unchanged
    else:
        try:
            fh = logging.handlers.RotatingFileHandler(filepath, mode,
                maxsize, backup_count)
            fh.setFormatter(logging.Formatter(g.log_format))
        except IOError, e:
            logger.error('log to file: %s', e)
            errs = True
            fh = g.file_logger
    if fh:
        fh.setLevel(level)
else:
    logger.debug('file logging suppressed')
    fh = None

if not errs:
    # Swap out syslog and file loggers last, so that any previous
    # logging about syslog logging and file logging goes to the
    # old loggers (if any).
    g.syslog_logger = swapout(g.syslog_logger, sh)
    g.file_logger = swapout(g.file_logger, fh)

return errs
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: ANN: stats 0.1a calculator statistics for Python

2010-10-19 Thread Chris Torek
2010/10/17 Steven D'Aprano st...@remove-this-cybersource.com.au:
 http://pypi.python.org/pypi/stats

In article mailman.23.1287437081.15964.python-l...@python.org
Vlastimil Brom  vlastimil.b...@gmail.com wrote:
Thanks for this useful module!
I just wanted to report a marginal error triggered in the doctests:

Failed example:
    isnan(float('nan'))
Exception raised:
    Traceback (most recent call last):
      File "C:\Python25\lib\doctest.py", line 1228, in __run
        compileflags, 1) in test.globs
      File "<doctest __main__.isnan[0]>", line 1, in <module>
        isnan(float('nan'))
    ValueError: invalid literal for float(): nan

(python 2.5.4 on win XP; this might be OS specific; probably in the
newer versions float() was updated, the tests on 2.6 and 2.7 are ok ):

Indeed it was; in older versions float() just invoked the C library
routines, so float('nan') works on Mac OS X python 2.5, for instance,
but then you run into the fact that math.isnan() is only in 2.6 and
later :-)

Workaround, assuming an earlier "from math import *":

try:
    isnan(0.0)
except NameError:
    def isnan(x): return x != x

Of course you are still stuck with float('nan') failing on Windows.
I have no quick and easy workaround for that one.
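
(The nearest thing to one -- a sketch, untested on Windows -- is to
manufacture the NaN arithmetically when the string form fails:

    def make_nan():
        try:
            return float('nan')
        except ValueError:
            inf = 1e308 * 1e308   # overflows to +Infinity
            return inf - inf      # Infinity - Infinity gives NaN

assuming IEEE-style arithmetic underneath, which is a fairly safe
bet these days.)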
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Spreadsheet-style dependency tracking

2010-10-17 Thread Chris Torek
(r.S[c], nstack)
#nstack.pop()
#r.L.append(node)

# Build set S of all cells (r.S) which gives their dependencies.
# By indexing by cell, we can find cells from dependencies in visit().
for row in sheet:
    for cell in row:
        if cell:
            r.S[cell] = cell
            cell.visited = False

# Now simply (initial-)visit all the cells.
for cell in r.S.itervalues():
    visit(cell)

# Now r.L defines an evaluation order; it has at least one cycle in it
# if r.cycles is nonempty.
return (r.L, r.cycles)
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to implement retrying a lock tidily in Python?

2010-10-17 Thread Chris Torek
In article 4imro7-ds6@chris.zbmc.eu,  tinn...@isbd.co.uk wrote:
I'm writing some code that writes to a mbox file and want to retry
locking the mbox file a few times before giving up. ...

dest = mailbox.mbox(mbName, factory=None)
for tries in xrange(3):
    try:
        dest.lock()
        #
        #
        # Do some stuff to the mbox file
        #
        dest.unlock()
        break   # done what we need, carry on

    except mailbox.ExternalClashError:
        log("Destination locked, try " + str(tries))
        time.sleep(1)
        # and try again

... but this doesn't really work 'nicely' because the break after
dest.unlock() takes me to the same place as running out of the number
of tries in the for loop.

Seems to me the right place for this is a little wrapper lock as
it were:

def retried_lock(self, max_attempts=3):
    for tries in xrange(max_attempts):
        try:
            self.lock()
            return  # got the lock
        except mailbox.ExternalClashError:
            # log and sleep here, e.g.:
            log("Destination locked, try " + str(tries))
            time.sleep(1)
    raise mailbox.ExternalClashError  # or whatever

and now instead of dest.lock() you just do a dest.retried_lock().

Plumbing (including fitting this in as a context manager so
that you can just do "with dest" or some such) is left as an
exercise, :-)
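
(Or, to take away most of the exercise: a minimal sketch of that
plumbing via contextlib, assuming Python 2.5+ and the retried_lock()
above grafted onto the mailbox object:)

    from contextlib import contextmanager

    @contextmanager
    def locked(dest, max_attempts=3):
        dest.retried_lock(max_attempts)   # may raise ExternalClashError
        try:
            yield dest
        finally:
            dest.unlock()

    # usage -- the unlock happens even if the mbox-mangling code raises:
    with locked(dest) as mbox:
        pass   # do some stuff to the mbox file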
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: EOF while scanning triple-quoted string literal

2010-10-15 Thread Chris Torek
 On 2010-10-15, Grant Edwards inva...@invalid.invalid wrote:
 How do you create a [Unix] file with a name that contains a NULL byte?

On 2010-10-15, Seebs usenet-nos...@seebs.net wrote:
 So far as I know, in canonical Unix, you don't -- the syscalls all work
 with something like C strings under the hood, meaning that no matter what
 path name you send, the first null byte actually terminates it.

In article i9a84m$rp...@reader1.panix.com
Grant Edwards  inva...@invalid.invalid wrote:
Yes, all of the Unix syscalls use NULL-terminated path parameters (AKA
C strings).  What I don't know is whether the underlying filesystem
code also uses NULL-terminated strings for filenames or if they have
explicit lengths.  If the latter, there might be some way to bypass
the normal Unix syscalls and actually create a file with a NULL in its
name -- a file that then couldn't be accessed via the normal Unix
system calls.  My _guess_ is that the underlying filesystem code in
most all Unices also uses NULL-terminated strings, but I haven't
looked yet.

Multiple common on-disk formats (BSD's UFS variants and Linux's
EXTs, for instance) use counted strings, so it is possible -- via
disk corruption or similar -- to get impossible file names (those
containing either an embedded NUL or an embedded '/').

More notoriously, earlier versions of NFS could create files with
embedded slashes when serving non-Unix clients.  These were easily
removed with the same non-Unix client, but not on the server! :-)

None of this has anything to do with the original problem, in which
a triple-quoted string is left to contain arbitrary binary data
(up to, of course, the closing triple-quote).  Should that arbitrary
binary data itself happen to include a triple-quote, this trivial
encoding technique will fail.  (And of course, as others have noted,
it fails on some systems that distinguish between text and binary
file formats in the first place.)  This is why using some
text-friendly encoding scheme, such as base64, is a good idea.
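
(A minimal illustration with the stock base64 module -- the file
name here is made up:)

    import base64

    data = open('blob.bin', 'rb').read()
    encoded = base64.b64encode(data)    # pure ASCII, safe in any quoting
    assert base64.b64decode(encoded) == data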
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: what happens to Popen()'s parent-side file descriptors?

2010-10-15 Thread Chris Torek
In message pan.2010.10.15.06.27.02.360...@nowhere.com, Nobody wrote:
 Another gotcha regarding pipes: the reader only sees EOF once there are no
 writers, i.e. when the *last* writer closes their end.

In article i9atra$j4...@lust.ihug.co.nz
Lawrence D'Oliveiro  l...@geek-central.gen.new_zealand wrote:
Been there, been bitten by that.

Nobody mentioned the techniques of setting close_fds = True and
passing a preexec_fn that closes the extra pipe descriptors.  You
can also use fcntl.fcntl() to set the fcntl.FD_CLOEXEC flag on the
underlying file descriptors (this of course requires that you are
able to find them).
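
(For reference, the fcntl incantation -- the helper name is mine,
nothing standard:)

    import fcntl

    def set_cloexec(fd):
        # mark fd close-on-exec so exec'd children do not inherit it
        flags = fcntl.fcntl(fd, fcntl.F_GETFD)
        fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)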

The subprocess module sets FD_CLOEXEC on the pipe it uses to pass
back a failure to exec, or even to reach the exec, e.g., due to an
exception during preexec_fn.  One could argue that perhaps it should
set FD_CLOEXEC on the parent's remaining pipe descriptors, once
the child is successfully started, if it created them (i.e., if
the corresponding arguments were PIPE).  In fact, thinking about it
now, I *would* argue that.
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: what happens to Popen()'s parent-side file descriptors?

2010-10-14 Thread Chris Torek
 that any as-yet-uncollected fork()ed processes are eventually
waitpid()-ed for.

Can anyone explain the treatment of the pipe FDs opened in the parent
by Popen() to me or point me to some documentation?

The best documentation seems generally to be the source.  Fortunately
subprocess.py is written in Python.  (Inspecting C modules is less
straightforward. :-) )

Also, does Popen.returncode contain only the child's exit code or is
does it also contain signal info like the return of os.wait()?
Documentation on this is also unclear to me.

A negative value -N indicates that the child was terminated by
signal N (Unix only).  Again, the Python source is handy:

def _handle_exitstatus(self, sts):
    if os.WIFSIGNALED(sts):
        self.returncode = -os.WTERMSIG(sts)
    elif os.WIFEXITED(sts):
        self.returncode = os.WEXITSTATUS(sts)
    else:
        # Should never happen
        raise RuntimeError("Unknown child exit status!")
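
(A quick demonstration of the signal case, on a typical Unix:)

    import signal, subprocess

    p = subprocess.Popen(['sh', '-c', 'kill -TERM $$'])
    p.wait()
    print p.returncode                       # -15 on most systems
    print p.returncode == -signal.SIGTERM    # True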

The only things left out are the core-dump flag, and stopped/suspended.
The latter should never occur as os.waitpid() is called with only
os.WNOHANG, not os.WUNTRACED (of course a process being traced,
stopping at a breakpoint, would mess this up, but subprocess.Popen
is not a debugger :-) ).

It might be nice to capture os.WCOREDUMP(sts), though.

Also, while I was writing this, I discovered what appears to be a
buglet in _cleanup(), with regard to abandoned Unix processes that
terminate due to a signal.  Note that _handle_exitstatus() will
set self.returncode to (e.g.) -1 if the child exits due to SIGHUP.
The _cleanup() function, however, does this in part:

if inst.poll(_deadstate=sys.maxint) >= 0:
    try:
        _active.remove(inst)

The Unix-specific poll() routine, however, reads:

if self.returncode is None:
    try:
        pid, sts = os.waitpid(self.pid, os.WNOHANG)
        if pid == self.pid:
            self._handle_exitstatus(sts)
    except os.error:
        if _deadstate is not None:
            self.returncode = _deadstate
return self.returncode

Hence if pid 12345 is abandoned (and thus on _active), and we
os.waitpid(12345, os.WNOHANG) and get a status that has a termination
signal, we set self.returncode to -N, and return that.  Hence
inst.poll returns (e.g.) -1 and we never attempt to remove it from
_active.  Now that its returncode is not None, though, every later
poll() will continue to return -1.  It seems it would be better to
have _cleanup() read:

if inst.poll(_deadstate=sys.maxint) is not None:

(Note, this is python 2.5, which is what I have installed on my
Mac laptop, where I am writing this at the moment).
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: what happens to Popen()'s parent-side file descriptors?

2010-10-14 Thread Chris Torek
 on so this is not necessary, but again,
using the communicate function will close them for you.  In this
case, though, I am not entirely sure subprocess is the right hammer
-- it mostly will give you portability to Windows (well, plus the
magic for preexec_fn and reporting exec failure).

Once again, peeking at the source is the trick :-) ... the arguments
you provide for stdin, stdout, and stderr are used thus:

if stdin is None:
    pass
elif stdin == PIPE:
    p2cread, p2cwrite = os.pipe()
elif isinstance(stdin, int):
    p2cread = stdin
else:
    # Assuming file-like object
    p2cread = stdin.fileno()

(this is repeated for stdout and stderr) and the resulting
integer file descriptors (or None if not applicable) are
passed to os.fdopen() on the parent side.

(On the child side, the code does the usual shell-like dance
to move the appropriate descriptors to 0 through 2.)
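
(Roughly, that child-side dance -- simplified from what subprocess.py
actually does; the p2cread/c2pwrite/errwrite names follow its own
convention:)

    # in the child, after fork() and before exec():
    if p2cread is not None:
        os.dup2(p2cread, 0)    # stdin
    if c2pwrite is not None:
        os.dup2(c2pwrite, 1)   # stdout
    if errwrite is not None:
        os.dup2(errwrite, 2)   # stderr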
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: My first Python program

2010-10-13 Thread Chris Torek
In article slrnibboof.29uv.usenet-nos...@guild.seebs.net
Seebs  usenet-nos...@seebs.net wrote:
 * raising `Exception` rather than a subclass of it is uncommon.

Okay.  I did that as a quick fix when, finally having hit one of them,
I found out that 'raise Error message' didn't work.  :)  I'm a bit unsure
as to how to pick the right subclass, though.

For exceptions, you have two choices:

  - pick some existing exception that seems to make sense, or
  - define your own.

The obvious cases for the former are things like ValueError or
IndexError.  Indeed, in many cases, you just let a work-step
raise these naturally:

def frobulate(self, x):
    ...
    self.table[x] += ...   # raises IndexError when x out of range
    ...

For the latter, make a class that inherits from Exception.  In
a whole lot of cases a trivial/empty class suffices:

class SoLongAndThanksForAllTheFish(Exception):
    pass

def ...:
    ...
    if somecondition:
        raise SoLongAndThanksForAllTheFish()

Since Exception provides a base __init__() function, you can
include a string:

raise SoLongAndThanksForAllTheFish('RIP DNA')

which becomes the .message field:

>>> x = SoLongAndThanksForAllTheFish('RIP DNA')
>>> x.message
'RIP DNA'
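
Catching it then works like any other exception:

    try:
        raise SoLongAndThanksForAllTheFish('RIP DNA')
    except SoLongAndThanksForAllTheFish, e:
        print 'caught:', e.message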
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: My first Python program

2010-10-13 Thread Chris Torek
In article mailman.1673.1286992432.29448.python-l...@python.org
Jonas H. jo...@lophus.org wrote:
On 10/13/2010 06:48 PM, Seebs wrote:
 Is it safe for me to assume that all my files will have been flushed and
 closed?  I'd normally assume this, but I seem to recall that not every
 language makes those guarantees.

Not really. Files will be closed when the garbage collector collects the 
file object, but you can't be sure the GC will run within the next N 
seconds/instructions or something like that. So you should *always* make 
sure to close files after using them. That's what context managers were 
introduced for.

    with open('foobar') as fileobject:
        do_something_with(fileobject)

basically is equivalent to (simplified!)

    fileobject = open('foobar')
    try:
        do_something_with(fileobject)
    finally:
        fileobject.close()

So you can be sure `fileobject.close()` is called in *any* case.

Unfortunately "with" is newish and this code currently has to
support python 2.3 (if not even older versions).
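
(On 2.5 exactly you could at least enable it with a future import --
though that of course does nothing for 2.3:)

    from __future__ import with_statement   # needed on 2.5; built in from 2.6

So the try/finally spelling above is the one that works everywhere.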
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Class-level variables - a scoping issue

2010-10-10 Thread Chris Torek
In article 4cb14f8c$0$1627$742ec...@news.sonic.net
John Nagle  na...@animats.com wrote:
Here's an obscure bit of Python semantics which
is close to being a bug:

[assigning to instance of class creates an attribute within
the instance, thus obscuring the class-level version of the
attribute]

This is sort of a feature, but one I have been reluctant to use:
you can define default values for instances within the class,
and only write instance-specific values into instances as needed.
This would save space in various cases, for instance.

 Python protects global variables from similar confusion
by making them read-only when referenced from an inner scope
without a global statement.  But that protection isn't
applied to class-level variables referenced through 'self'.
Perhaps it should be.  

It's not really clear to me how one would distinguish between
accidental and deliberate creation of these variables,
syntactically speaking.

If you want direct, guaranteed access to the class-specific variable,
using __class__ is perhaps the Best Way right now:

>>> class K:
...     x = 42
...     def __init__(self): pass
... 
>>> inst = K()
>>> inst.x  # just to show that we're getting K.x here
42
>>> inst.x = 'hah'
>>> inst.x
'hah'
>>> inst.__class__.x
42


One could borrow the "nonlocal" keyword to mean "I know that
there is potential confusion here between instance-specific
attribute and class-level attribute", but the implication seems
backwards:

nonlocal self.foo

implies that you want self.foo to be shorthand for self.__class__.foo,
not that you know that self.__class__.foo exists but you *don't*
want to use that.

If Python had explicit local variable declarations, then:

local self.foo

would be closer to the implied semantics here.

As it is, I think Python gets this pretty much right, and if you
think this is more a bug than a feature, you can always insert
assert statements in key locations, e.g.:

assert 'foo' not in inst.__class__.__dict__, \
'overwriting class var foo'

(you can even make that a function using introspection, although
it could get pretty hairy).
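
(A sketch of such a function, hairiness and all -- the name and the
exact policy are mine, nothing standard:)

    def assert_not_shadowing(inst, name):
        # complain if binding `name' on inst would hide a class attribute
        assert name not in inst.__class__.__dict__, \
            'overwriting class var %s' % name

    assert_not_shadowing(inst, 'foo')   # fine: K has no class-level foo
    inst.foo = 1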
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: list parameter of a recursive function

2010-10-06 Thread Chris Torek
In article rsuun7-eml@rama.universe
TP  tribulati...@paralleles.invalid wrote:
I have a function f that calls itself recursively. It has a list as second 
argument, with default argument equal to None (and not [], as indicated at:
http://www.ferg.org/projects/python_gotchas.html#contents_item_6 )

This is the outline of my function:

def f(argument, some_list=None):

    if some_list == None:
        some_list = []
    [...]
    # creation of a new_argument
    # the function is called recursively only if some condition is respected
    if some_condition:
        some_list.append(elem)
        f(new_argument, some_list)
    # otherwise, we have reached a leaf of a branch of the recursive tree
    # (said differently, terminal condition has been reached for this branch)
    print "Terminal condition"

The problem is that when the terminal condition is reached, we return back 
to some other branch of the recursive tree, but some_list has the value 
obtained in the previous branch!

Yes, this is the way it is supposed to work. :-)

So, it seems that there is only one some_list list for all the recursive 
tree.  To get rid of this behavior, I have been compelled to do at the
beginning of f:

import copy from copy

[from copy import copy, rather]

some_list = copy( some_list )

I suppose this is not a surprise to you: I am compelled to create a new 
some_list with the same content.

The above will work, or for this specific case, you can write:

some_list = list(some_list)

which has the effect of making a shallow copy of an existing list:

>>> base = [1, 2]
>>> l1 = [base]
>>> l2 = list(l1)
>>> l1 is l2
False
>>> l1[0] is l2[0]
True
>>> base.append(3)
>>> l2
[[1, 2, 3]]


but will also turn *any* iterator into a (new) list; the latter
may often be desirable.

So, if I am right, all is happening as if the parameters of a function are 
always passed by address to the function. Whereas in C, they are always 
passed by copy (which gives relevance to pointers).

Am I right?

Mostly.  Python distinguishes between mutable and immutable items.
Mutable items are always mutable, immutable items are never mutable,
and the mutability of arguments is attached to their fundamental
mutability rather than to their being passed as arguments.  This
is largely a point-of-view issue (although it matters a lot to
people writing compilers, for instance).

Note that if f() is *supposed* to be able to modify its second
parameter under some conditions, you would want to make the copy
not at the top of f() but rather further in, and in this case,
that would be trivial:

def f(arg, some_list=None):
    if some_list is None:
        some_list = []
    ...
    if some_condition:
        # make copy of list and append something
        f(new_arg, some_list + [elem])
    elif other_condition:
        # continue modifying same list
        f(new_arg, some_list)
    ...

(Note: you can also use the fact that list1 + list2 produces a new
list to make a copy by writing x = x + [], or x = [] + x, but
x = list(x) is generally a better idea here.)
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Inheritance and name clashes

2010-10-03 Thread Chris Torek
        # ... here ... we're not using
        # that so I omit it
        func = None
        if func is not None:
            logger.debug('%s: %s%s',
                format_ipaddr(self.client_address), method, str(params))
            try:
                return func(*params)
            except MgrError, e:
                # Given, e.g., MgrError(ValueError('bad value')),
                # send the corresponding exc_type / exc_val back
                # via xmlrpclib, which transforms it into a Fault.
                raise e.exc_type, e.exc_val
            except xmlrpclib.Fault:
                # Already a Fault, pass it back unchanged.
                raise
            except TypeError, e:
                # If the parameter count did not match, we will get
                # a TypeError with the traceback ending with our own
                # call at func(*params).  We want to pass that back,
                # rather than logging it.
                #
                # If the TypeError happened inside func() or one of
                # its sub-functions, the traceback will continue beyond
                # here, i.e., its tb_next will not be None.
                if sys.exc_info()[2].tb_next is None:
                    raise
                # else fall through to error-logging code
            except:
                pass  # fall through to error-logging code

            # Any other exception is assumed to be a bug in the server.
            # Log a traceback for server debugging.
            # is logger.error exc_info thread-safe? let's assume so
            logger.error('internal failure in %s', method, exc_info=True)
            # traceback.format_exc().rstrip()
            raise xmlrpclib.Fault(2000, 'internal failure in ' + method)
        else:
            logger.info('%s: bad request: %s%s',
                format_ipaddr(self.client_address), method, str(params))
            raise Exception('method %s is not supported' % method)

    # Tests of the form:
    #   c = new_class_object(params)
    #   if c: ...
    # are turned into calls to the class's __nonzero__ method.
    # We don't do "if server:" in our own server code, but if we did
    # this would get called, and it's reasonable to just define it as
    # True.  Probably the existing SimpleXMLRPCServer (or one of its
    # base classes) should have done this, but they did not.
    #
    # For whatever reason, the xml-rpc library routines also pass
    # a client's __nonzero__ (on his server proxy connection) to us,
    # which reaches our dispatcher above.  By registering this in
    # our __init__, clients can do "if server:" to see if their
    # connection is up.  It's a frill, I admit.
    def __nonzero__(self):
        return True

    def register_admin_function(self, f, name=None):
        ... more stuff snipped out ...

# --END-- threading XML RPC server code
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: partial sums problem

2010-09-29 Thread Chris Torek
In article i7trs4$9e...@reader1.panix.com kj  no.em...@please.post wrote:
The following attempt to get a list of partial sums fails:

>>> s = 0
>>> [((s += t) and s) for t in range(1, 10)]
  File "<stdin>", line 1
    [((s += t) and s) for t in range(1, 10)]
          ^
SyntaxError: invalid syntax

What's the best way to get a list of partial sums?

Well, define "best"; but curiously enough, I wrote this just a few
days ago for other purposes, so here you go, a slightly cleaned-up /
better documented version of what I wrote:

def iaccumulate(vec, op):
    """Do an accumulative operation on a vector (any iterable, really).

    The result is a generator whose first call produces vec[0],
    second call produces vec[0] op vec[1], third produces
    (vec[0] op vec[1]) op vec[2], and so on.

    Mostly useful with + and *, probably.
    """
    iterable = iter(vec)
    acc = iterable.next()
    yield acc
    for x in iterable:
        acc = op(acc, x)
        yield acc

def cumsum(vec):
    """Return a list of the cumulative sums of a vector."""
    import operator

    return list(iaccumulate(vec, operator.add))
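
And to answer the original question directly:

    >>> cumsum(range(1, 10))
    [1, 3, 6, 10, 15, 21, 28, 36, 45]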
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)  http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list