Re: Py2.7/FreeBSD: maximum number of open files
In article <mailman.2711.1321299276.27778.python-l...@python.org>
Christian Heimes <li...@cheimes.de> wrote:
>Am 14.11.2011 19:28, schrieb Tobias Oberstein:
>> Thanks! This is probably the most practical option I can go. I've just
>> tested: the backported new IO on Python 2.7 will indeed open 32k files
>> on FreeBSD. It also creates the files much faster. The old,
>> non-monkey-patched version was getting slower and slower as more files
>> were opened/created ...
>
>I wonder what's causing the O(n^2) behavior. Is it the old file type or
>BSD's fopen() fault?

It is code in libc.  My old stdio (still in use on FreeBSD) was never
designed to be used in situations with more than roughly 256 file
descriptors -- hence the "short" in the file-number field.  (The OS
used to be full of other places that kept the maximum allowable file
descriptor fairly small, such as the on-stack copies of fd_set objects
in select() calls.)

You will want to redesign the code that finds or creates a free FILE
object, and probably some of the things that work with line-buffered
FILEs (specifically, calls to _fwalk() when reading from a
line-buffered stream).
-- 
In-Real-Life: Chris Torek, Wind River Systems
Intel require I note that my opinions are not those of WRS or Intel
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list
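(For reference, the monkey-patch under discussion looks roughly like
this -- a sketch for Python 2.7, where the backported io module
supplies the new file type; on Python 3 it is effectively a no-op,
since the builtin open already is io.open:)

```python
import io

try:
    import __builtin__ as builtins   # Python 2 spelling
except ImportError:
    import builtins                  # Python 3: open already is io.open

# Replace the old stdio-based file type with the io module's
# implementation for every subsequent call to open().
builtins.open = io.open
```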
Re: Best way to check that you are at the beginning (the end) of an iterable?
In article <mailman.854.1315441399.27778.python-l...@python.org>
Cameron Simpson <c...@zip.com.au> wrote:
>Facilities like feof() in C and eof in Pascal already lead to lots of
>code that runs happily with flat files and behaves badly in interactive
>or piped input. It is _so_ easy to adopt a style like:
>
>    while not eof(filehandle):
>        line = filehandle.nextline()
>        ...

Minor but important point here: eof() in Pascal is predictive (uses a
crystal ball to peer into the future to see whether EOF is about to
occur -- which really means "reads ahead", causing that interactivity
problem you mentioned), but feof() in C is post-dictive.  The
feof(stream) function returns a false value if the stream has not yet
encountered an EOF, but your very next attempt to read from it may (or
may not) immediately encounter that EOF.

Thus, feof() in C is sort of (but not really) useless.  (The actual
use cases are to distinguish between EOF and error after a failed read
from a stream -- since C lacks exceptions, getc() just returns EOF to
indicate "failed to get a character due to end of file or error" -- or
in some more obscure cases, such as the nonstandard getw(), to
distinguish between a valid -1 value and having encountered an EOF.
The companion ferror() function tells you whether an earlier EOF value
was due to an error.)
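(Python's file API, for comparison, is post-dictive like feof() but
needs no separate flag: a read at EOF simply returns an empty string.
A minimal sketch:)

```python
def lines(stream):
    """Yield lines until the stream reports EOF (an empty read)."""
    while True:
        line = stream.readline()
        if line == '':      # only EOF produces an empty string here
            return
        yield line
```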
Re: Why do class methods always need 'self' as the first parameter?
Chris Torek <nos...@torek.net> writes:
>[snip] when you have [an] instance and call [an] instance or class
>method: [note: I have changed the names very slightly here, and
>removed additional arguments, on purpose]
>
>    black_knight = K()
>    black_knight.spam()
>    black_knight.eggs()
>
>the first parameters ... are magic, and invisible.  Thus, Python is
>using the "explicit is better than implicit" rule in the definition,
>but not at the call site. ...

In article <m2wrdnf53u@cochabamba.vanoostrum.org>
Piet van Oostrum <p...@vanoostrum.org> wrote:
>It *is* explicit also at the call site. It only is written at the left
>of the dot rather than at the right of the parenthesis.

It cannot possibly be explicit.  The first parameter to one of the
method functions is black_knight, but the first parameter to the other
method is black_knight.__class__.  Which one is which?  Is spam() the
instance method and eggs() the class method, or is spam() the class
method and eggs() the instance method?  (One does not, and should not,
have to *care*, which is kind of the point here. :-) )

>And that is necessary to locate which definition of the method applies.

By that I assume you mean the name black_knight here.  But the name is
not required to make the call; see the last line of the following code
fragment:

    funclist = []
    ...
    black_knight = K()
    funclist.append(black_knight.spam)
    funclist.append(black_knight.eggs)
    ...
    # At this point, let's say len(funclist) > 2,
    # and some number of funclist[i] entries are ordinary
    # functions that have no special first parameter.
    random.choice(funclist)()

>It would be silly to repeat this information after the parenthesis. Not
>only silly, it would be stupid as it would be a source of errors, and
>an example of DRY.

Indeed.  But I believe the above is a demonstration of how the "self"
or "cls" parameter is in fact implicit, not explicit.
(I am using python 2.x, and doing this in the interpreter:
random.choice(funclist) -- without the parentheses to call the
function -- produces:

    <bound method K.[name omitted] of <__main__.K object at 0x249f50>>
    <bound method type.[name omitted] of <class '__main__.K'>>
    <function ordinary at 0x682b0>

The first is the instance method, whose name I am still keeping
secret; the second is the class method; and the third is the ordinary
function I added to the list.  The actual functions print their own
name and their parameters if any, and one can see that the class and
instance methods get one parameter, and the ordinary function gets
none.)
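(For anyone who wants to reproduce the experiment, here is a
self-contained sketch -- the method names here are stand-ins of my
own, since the originals were deliberately kept secret:)

```python
import random

class K(object):
    def spam(self):                 # instance method: receives the instance
        return ('instance', self)

    @classmethod
    def eggs(cls):                  # class method: receives the class
        return ('class', cls)

def ordinary():                     # plain function: no special first arg
    return ('plain', None)

black_knight = K()
funclist = [black_knight.spam, black_knight.eggs, ordinary]

# No name for the instance (or class) appears at the call site:
kind, first_arg = random.choice(funclist)()
```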
Re: Why doesn't threading.join() return a value?
On Sep 2, 2:23 pm, Alain Ketterlin <al...@dpt-info.u-strasbg.fr> wrote:
>Sorry, you're wrong, at least for POSIX threads:
>
>    void pthread_exit(void *value_ptr);
>    int pthread_join(pthread_t thread, void **value_ptr);
>
>pthread_exit can pass anything, and that value will be retrieved with
>pthread_join.

In article <bf50c8e1-1476-41e1-b2bc-61e329bfa...@s12g2000yqm.googlegroups.com>
Adam Skutt <ask...@gmail.com> wrote:
>No, it can only pass a void*, which isn't much better than passing an
>int.

It is far better than passing an int, although it leaves you with an
annoying storage-management issue, and sidesteps any reasonable
attempts at type-checking (both of which are of course par for the
course in C).  For instance:

    struct some_big_value { ... lots of stuff ... };
    struct some_big_value storage_management_problem[SIZE];
    ...
    void *func(void *initial_args) {
        ...
    #ifdef ONE_WAY_TO_DO_IT
        pthread_exit(&storage_management_problem[index]);
        /* NOTREACHED */
    #else /* the other way */
        return &storage_management_problem[index];
    #endif
    }
    ...
    int error;
    pthread_t threadinfo;
    pthread_attr_t attr;
    ...
    pthread_attr_init(&attr);
    /* set attributes if desired */
    error = pthread_create(&threadinfo, &attr, func, args_to_func);
    if (error) {
        ... handle error ...
    } else {
        ...
        void *rv;
        error = pthread_join(threadinfo, &rv);
        if (rv == PTHREAD_CANCELED) {
            ... the thread was canceled ...
        } else {
            struct some_big_value *ret = rv;
            ... work with ret->field ...
        }
    }

(Or, do dynamic allocation, and have a struct with a distinguishing
ID followed by a union of multiple possible values, or a flexible
array member, or whatever.  This means you can pass any arbitrary data
structure back, provided you can manage the storage somehow.)

>Passing a void* is not equivalent to passing anything, not even in C.
>Moreover, specific values are still reserved, like PTHREAD_CANCELED.

Some manual pages are clearer about this than others.
Here is one that I think is not bad:

    The symbolic constant PTHREAD_CANCELED expands to a constant
    expression of type (void *), whose value matches no pointer to
    an object in memory nor the value NULL.

So, provided you use pthread_exit() correctly (always pass either NULL
or the address of some actual object in memory), the special reserved
value is different from all of your values.  (POSIX threads are
certainly klunky, but not all *that* badly designed, given the
constraints.)

>Re. the original question: since you can define your own Thread
>subclass, with whatever attribute you want, I guess there was no need
>to use join() to communicate the result. The Thread's run() can store
>its result in an attribute, and the client can get it from the same
>attribute after a successful call to join().

For that matter, you can use the following to get what the OP asked
for.  (Change all the instance variables to __-prefixed versions if
you want them to be Mostly Private.)

    import threading

    class ValThread(threading.Thread):
        "like threading.Thread, but the target function's return val is captured"
        def __init__(self, group=None, target=None, name=None,
                     args=(), kwargs=None, verbose=None):
            super(ValThread, self).__init__(group, None, name,
                                            None, None, verbose)
            self.value = None
            self.target = target
            self.args = args
            self.kwargs = {} if kwargs is None else kwargs

        def run(self):
            "run the thread"
            if self.target:
                self.value = self.target(*self.args, **self.kwargs)

        def join(self, timeout=None):
            "join, then return value set by target function"
            super(ValThread, self).join(timeout)
            return self.value
Re: Why doesn't threading.join() return a value?
In article <roy-030914.19162802092...@news.panix.com>
Roy Smith <r...@panix.com> wrote:
>Thread.join() currently returns None, so there's no chance for [return
>value] confusion.

Well, still some, actually.  If you use my example code (posted
elsethread), you need to know:

  - that there was a target function (my default return value, if
    there is none, is None); and

  - that the joined thread really did finish (if you pass a timeout
    value, rather than None, and the join times out, the return value
    is again None).

Of course, if your target function always exists and never returns
None, *then* there's no chance for confusion. :-)
Re: Algorithms Library - Asking for Pointers
In article <18fe4afd-569b-4580-a629-50f6c7482...@c29g2000yqd.googlegroups.com>
Travis Parks <jehugalea...@gmail.com> wrote:
>[Someone] commented that the itertools algorithms will perform faster
>than the hand-written ones. Are these algorithms optimized internally?

They are written in C, so they avoid a lot of CPython interpreter
overhead.  Mileage in Jython, etc., may vary...
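(A quick way to see the C-level advantage on CPython -- an
illustrative sketch, not a rigorous benchmark:)

```python
import timeit

setup = 'data = list(range(10000))'

# Hand-written loop: every iteration executes interpreter bytecode.
hand = timeit.timeit(
    'total = 0\nfor x in data:\n    total += x',
    setup=setup, number=200)

# Builtin sum: the same loop runs inside C.
builtin = timeit.timeit('sum(data)', setup=setup, number=200)

# On CPython, 'builtin' typically comes out several times smaller
# than 'hand'.
```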
Re: Why do class methods always need 'self' as the first parameter?
In article <0dc26f12-2541-4d41-8678-4fa53f347...@g9g2000yqb.googlegroups.com>
T. Goodchild asked, in part:
>... One of the things that bugs me is the requirement that all class
>methods have 'self' as their first parameter.

In article <4e5e5628$0$29977$c3e8da3$54964...@news.astraweb.com>
Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote
[a comprehensive reply, noting that these are actually instance
methods, and that there are class and static methods as well]:
>Python does have class methods, which receive the class, not the
>instance, as the first parameter. These are usually written something
>like this:
>
>class K(object):
>    @classmethod
>    def spam(cls, args):
>        print cls  # always prints "class K", never the instance
>
>Just like self, the name cls is a convention only. Class methods are
>usually used for alternate constructors.
>
>There are also static methods, which don't receive any special first
>argument, plus any other sort of method you can invent, by creating
>descriptors... but that's getting into fairly advanced territory.
[rest snipped]

I am not sure whether T. Goodchild was asking any of the above, or
perhaps also one other possible question: if an instance method is
going to receive an automatic first "self" parameter, why require the
programmer to write that parameter in the "def"?  For instance, we
*could* have:

    class K(object):
        def meth1(arg1, arg2):
            self.arg1 = arg1  # "self" is magically available
            self.arg2 = arg2

        @classmethod
        def meth2(arg):
            use(cls)  # "cls" is magically available

and so on.  This would work fine.  It just requires a bit of implicit
sneakiness in the compiler: an instance method magically creates a
local variable named "self" that binds to the invisible first
parameter, and a class method magically creates a local variable named
"cls" that binds to the invisible first parameter, and so on.

Instead, we have a syntax where you, the programmer, write out the
name of the local variable that binds to the first parameter.  This
means the first parameter is visible.
Except, it is only visible at the function definition -- when you have
the instance and call the instance or class method:

    black_knight = K()
    black_knight.meth1('a', 1)
    black_knight.meth2(2)

the first parameters (black_knight, and black_knight.__class__,
respectively) are magic, and invisible.  Thus, Python is using the
"explicit is better than implicit" rule in the definition, but not at
the call site.

I have no problem with this.  Sometimes I think implicit is better
than explicit.  In this case, there is no need to distinguish, at the
calls to meth1() and meth2(), as to whether they are class or instance
methods.  At the *calls* they would just be distractions.  At the
*definitions*, they are not as distraction-y, since it is important to
know, during the definition, whether you are operating on an instance
(meth1) or the class itself (meth2), or for that matter on neither
(static methods).  One could determine this from the absence or
presence of @classmethod or @staticmethod, but the minor redundancy in
the def statement seems, well, minor.

Also, as a bonus, it lets you obfuscate the code by using a name other
than "self" or "cls". :-)
Re: try... except with unknown error types
In article <mailman.286.1313956388.27778.python-l...@python.org>,
Terry Reedy <tjre...@udel.edu> wrote:
>I would expect that catching socket.error (or even IOError) should
>catch all of those.
>
>    exception socket.error
>        A subclass of IOError ...

Except that, as Steven D'Aprano almost noted elsethread, it isn't (a
subclass of IOError -- the note was that it is not a subclass of
EnvironmentError).  In 2.x, anyway:

    >>> import socket
    >>> isinstance(socket.error, IOError)
    False
    >>> isinstance(socket.error, EnvironmentError)
    False

(I just catch socket.error directly for this case.)

(I have also never been sure whether something is going to raise an
IOError or an OSError for various OS-related read or write operation
failures -- such as exceeding a resource limit, for instance -- so
most places that do I/O operations on OS files, I catch both.  Still,
it sure would be nice to have a static analysis tool that could answer
questions about potential exceptions. :-) )
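(The catch-both idiom from that last paragraph, as a sketch --
read_config and the missing-file special case are invented for
illustration; on Python 3.3+ IOError and OSError are the same class,
so the tuple is harmless there:)

```python
import errno

def read_config(path):
    """Read a file, treating a missing file as a normal case.

    Catches both IOError and OSError, since either has historically
    surfaced from OS-level failures.
    """
    try:
        with open(path) as fp:
            return fp.read()
    except (IOError, OSError) as exc:
        if exc.errno == errno.ENOENT:
            return None          # missing file: fine, use defaults
        raise                    # anything else is a real problem
```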
Re: Why no warnings when re-assigning builtin names?
(I realize this thread is old.  I have been away for a few weeks.  I
read through the whole thread, though, and did not see anyone bring up
this one particular point: there is already a "linting" script that
handles this.)

On Mon, Aug 15, 2011 at 10:52 PM, Gerrat Rickert
<grick...@coldstorage.com> wrote:
>With surprising regularity, I see program postings (eg. on
>StackOverflow) from inexperienced Python users accidentally
>re-assigning built-in names.
>
>For example, they'll innocently call some variable, "list", and
>assign a list of items to it.

In article <mailman.22.1313446504.27778.python-l...@python.org>
Chris Angelico <ros...@gmail.com> wrote:
>It's actually masking, not reassigning. That may make it easier or
>harder to resolve the issue. If you want a future directive that deals
>with it, I'd do it the other way - from __future__ import
>mask_builtin_warning or something - so the default remains as it
>currently is. But this may be a better job for a linting script.

The pylint program already does this:

    $ cat shado.py
    """module doc"""
    def func(list):
        """func doc"""
        return list
    $ pylint shado.py
    ************* Module shado
    W0622:  2:func: Redefining built-in 'list'
    ...
    Your code has been rated at 6.67/10

If your shadowing is done on purpose, you can put in a pylint comment
directive to suppress the warning.

Pylint is the American Express Card of Python coding: don't leave
$HOME without it! :-)
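(The suppression directive looks like this -- W0622's symbolic name in
pylint is redefined-builtin:)

```python
def func(list):  # pylint: disable=redefined-builtin
    """Shadowing the builtin on purpose; pylint stays quiet here."""
    return list
```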
Re: Why do class methods always need 'self' as the first parameter?
    x = X()
    x.xset('value')
    x.show()
Re: Early binding as an option
Chris Angelico wrote:
>[snippage]
>def func(x):
>    len = len # localize len
>    for i in x:
>        len(i) # use it exactly as you otherwise would

In article <4e39a6b5$0$29973$c3e8da3$54964...@news.astraweb.com>
Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:
>That can't work. The problem is that because len is assigned to in the
>body of the function, the Python compiler treats it as a local
>variable. So the line len=len is interpreted as "locallen = locallen",
>which doesn't yet exist. There's no way of saying "locallen =
>globallen" in the body of the function.
>
>So you must either:
>
>(1) use a different name: length = len
>
>(2) use a fully-qualified name: import builtins; len = builtins.len

(This is my preferred form, given what one has now, if one is going to
do this in the function body.  Of course, in 2.x it is spelled
__builtin__.len instead...)

>(3) do the assignment as a default parameter, which has slightly
>different binding rules: def func(x, locallen=globallen)
>
>(4) manual lookup: len = builtins.__dict__['len']  # untested
>
>I don't recommend that last one, unless you're deliberately trying to
>write obfuscated code :)

If Python *were* to have some kind of "tie this symbol down now"
operation / keyword / whatever, one could write:

    def func(x):
        snap len  # here the new keyword is "snap"
        for i in x:
            ... len(i) ...  # use it exactly as you otherwise would

Of course, there is no *need* for any new syntax with the other
construction:

    def func(x, len=len):  # snapshots len at def() time
        for i in x:
            ... len(i) ...

but if one were to add it, it might look like:

    def func(x, snap len):

The advantage (?) of something like a "snap" or "snapshot" or whatever
keyword / magic-function / whatever is that you can apply it to more
than just function names, e.g.:

    def func(arg):
        # for this example, assume that arg needs to have the
        # following attributes:
        snap arg.kloniblatt, arg.frinkle, arg.tippy
        ...

Here, in the "..." section, a compiler (whether byte-code, or JIT, or
whatever -- JIT makes the most sense in this case) could grab the
attributes, looking up their types and any associated stuff it wants
to, and then assume that for the rest of that function's execution,
those are not allowed to change.  (But other arg.whatever items are,
here.  If you want to bind *everything*, perhaps "snap arg" or
"snap arg.*" -- see below.)

Even a traditional (CPython) byte-code compiler could do something
sort of clever here, by making those attributes read-only to whatever
extent the snapshot operation is defined as fixing the binding (e.g.,
does it recurse into sub-attributes?  does it bind only the
name-and-type, or does it bind name-and-type-and-value, or
name-and-type-and-function-address-if-function, or ... -- all of which
has to be pinned down before any such suggestion is taken too
seriously :-) ).
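(The def-time snapshot from option (3) is easy to demonstrate; the
names below are invented for illustration:)

```python
def length_of_longest(strings, len=len):   # binds builtin len at def time
    return max(len(s) for s in strings)

# Shadow the global name afterward; the function is unaffected,
# because its default argument captured the builtin when "def" ran.
len = lambda s: 0          # deliberately bad shadow, for demonstration
snapshot_result = length_of_longest(['spam', 'eggs', 'bacon'])
del len                    # remove the shadow; the builtin reappears
```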
Re: How should I document exceptions thrown by a method?
Arcadio <arcadiosinc...@gmail.com> writes:
>I have a Settings class that is used to hold application settings. A
>Settings object initializes itself from a ConfigParser that gets
>passed in as an argument to the constructor.

In article <87oc0fpg9o@benfinney.id.au>
Ben Finney <ben+pyt...@benfinney.id.au> wrote:
>So the caller is aware of, and takes responsibility for, the
>ConfigParser instance.

>>If a setting isn't found in whatever the ConfigParser is reading
>>settings from, the ConfigParser's get() method will raise an
>>exception. Should I just say that clients of my Settings class
>>should be prepared to catch exceptions thrown by ConfigParser? Do I
>>even have to mention that as it might be just implied?

>In this case IMO it is implied that one might get exceptions from the
>object one passes as an argument to a callable.

Yes.  But on the other hand:

    >>> import this
    The Zen of Python, by Tim Peters

    Beautiful is better than ugly.
    Explicit is better than implicit.
    ...

:-)  I would suggest that in this case, too, explicit is better than
implicit: the documentation should say "will invoke x.get() and
therefore propagate any exception that x.get() might raise".

>>Or should Setting's constructor catch any exceptions raised by the
>>ConfigParser and convert it to a Settings-specific exception class
>>that I then document?

>Please, no. Since the ConfigParser object was created and passed in
>by the calling code, the calling code needs to know about the
>exceptions from that object.

In *some* cases (probably not applicable here), one finds a good
reason to transform one exception to another.  In this case I agree
with Ben Finney, though, and so does "import this":

    ...
    Simple is better than complex.
    ...

Letting exceptions flow upward unchanged is (usually) simpler, hence
better.
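(What that explicit documentation can look like in practice -- a
hypothetical Settings class; the section and option names are
invented:)

```python
import configparser   # spelled ConfigParser in Python 2

class Settings(object):
    """Application settings, read from a caller-supplied parser.

    Will invoke parser.get() and therefore propagate any exception it
    raises (e.g. configparser.NoSectionError or
    configparser.NoOptionError).
    """
    def __init__(self, parser):
        # Exceptions from parser.get() deliberately escape unchanged.
        self.cache_dir = parser.get('paths', 'cache_dir')
```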
Re: Use self.vars in class.method(parameters, self.vars)
In article <0ddc2626-7b99-46ee-9974-87439ae09...@e40g2000yqn.googlegroups.com>
caccolangrifata <caccolangrif...@gmail.com> wrote:
>I'm very very new with python, and I have some experience with java
>programming, so probably you guys will notice. Anyway this is my
>question: I'd like to use class scope vars in method parameter ...

Others have answered what appears to have been your actual question.
Here's an example of using an actual class-scope variable.  (Note: I
have a sinus headache, which is probably the source of some of the
weirder names. :-) )

    class Florg(object):
        _DEFAULT_IPPY = 17

        @classmethod
        def set_default_ippy(cls, ippy):
            cls._DEFAULT_IPPY = ippy

        def __init__(self, name, ippy=None):
            if ippy is None:
                ippy = self.__class__._DEFAULT_IPPY
            self.name = name
            self.ippy = ippy

        def zormonkle(self):
            print('%s ippy = %s' % (self.name, self.ippy))

    def example():
        flist = [Florg('first')]
        flist.append(Florg('second'))
        flist.append(Florg('third', 5))
        Florg.set_default_ippy(-4)
        flist.append(Florg('fourth'))
        flist.append(Florg('fifth', 5))
        for florg in flist:
            florg.zormonkle()

    if __name__ == '__main__':
        example()
Re: Python ++ Operator?
In article <mailman.1057.1310717193.1164.python-l...@python.org>
Chris Angelico <ros...@gmail.com> wrote:
>I agree that [C's ++ operators are] often confusing (i+++++j) ...

For what it is worth, this has to be written as:

    i++ + ++j  /* or i+++ ++j */

or similar (e.g., newline after the middle + operator), as the lexer
will group adjacent ++ characters into a single "++" operator whenever
it can (the so-called greedy matching that regular expression
recognizers are famous for), and only later will the parser and
semantic-analysis phases realize that "i++ ++ +j" is invalid and
complain.

>but there are several places where they're handy. ... However, Python
>doesn't work as close to the bare metal, so it doesn't have such
>constructs.

More specifically, Python has appropriate higher-level constructs
that, in effect, maintain "mental invariants" in a better (for some
value of "better") way.  Instead of:

    lst[i++] = val;  /* or: *p++ = val; */

which has the effect of appending an item to an array-based list of
items -- the invariant here is that i (or p, in the pointer version)
always tells you where to place the *next* item -- one simply writes:

    lst.append(val)

(which also makes sure that there is *room* in the array-based list,
something that requires a separate step in C).
Re: Python ++ Operator?
In article <mailman.1055.1310716536.1164.python-l...@python.org>
Chris Angelico <ros...@gmail.com> wrote:
>2011/7/15 Rafael Durán Castañeda <rafadurancastan...@gmail.com>:
>> Hello all, What's the meaning of using i++? Even, does exist ++
>> operator in python?
>
>++i is legal Python but fairly useless. It's the unary + operator,
>applied twice. It doesn't increment the variable.

Well...

    class Silly:
        def __init__(self, value):
            self.value = value
            self._pluscount = 0

        def __str__(self):
            return str(self.value)

        def __pos__(self):
            self._pluscount += 1
            if self._pluscount == 2:
                self.value += 1
                self._pluscount = 0
            return self

    def main():
        i = Silly(0)
        print('initially, i = %s' % i)
        print('plus-plus i = %s' % ++i)
        print('finally, i = %s' % i)

    main()

:-)

(Of course, +i followed by +i *also* increments i...)
Re: difflib-like library supporting moved blocks detection?
In article <mailman.1002.1310591600.1164.python-l...@python.org>
Vlastimil Brom <vlastimil.b...@gmail.com> wrote:
>I'd like to ask about the availability of a text diff library, like
>difflib, which would support the detection of moved text blocks.

If you allow arbitrary moves, the minimal-edit-distance problem
(string-to-string edit) becomes substantially harder.

If you only allow insert, delete, or in-place-substitute, you have
what is called the Levenshtein distance case.  If you also allow
transpositions, you get Damerau-Levenshtein.  These are both solvable
with a dynamic programming algorithm.

Once you allow "move" operations, though, the problem becomes
NP-complete.  See

    http://pages.cs.brandeis.edu/~shapird/publications/JDAmoves.pdf

for instance.  (They give an algorithm that produces "usually
acceptable" results in polynomial time.)
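(For the record, the dynamic-programming solution for the plain
Levenshtein case fits in a dozen lines -- a generic sketch, not what
difflib does:)

```python
def levenshtein(a, b):
    """Minimal number of insert/delete/substitute edits turning a into b."""
    prev = list(range(len(b) + 1))        # distances from a[:0] to b prefixes
    for i, ca in enumerate(a, 1):
        curr = [i]                        # deleting i chars reaches b[:0]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,              # delete ca
                curr[j - 1] + 1,          # insert cb
                prev[j - 1] + (ca != cb), # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]
```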
Re: Does hashlib support a file mode?
>- Do the usual dance for default arguments:
>
>    def file_to_hash(path, m=None):
>        if m is None:
>            m = hashlib.md5()
>
>[instead of def file_to_hash(path, m = hashlib.md5()): ]

In article <b317226a-8008-4177-aaa6-3fdc30125...@e20g2000prf.googlegroups.com>
Phlip <phlip2...@gmail.com> wrote:
>Not sure why, if that's what the defaulter does?

For the same reason that:

    def spam(somelist, so_far=[]):
        for i in somelist:
            if has_eggs(i):
                so_far.append(i)
        return munch(so_far)

is probably wrong.  Most beginners appear to expect this to take a
list of things that pass my has_eggs test, add more things to that
list, and return whatever munch(adjusted_list) returns ... which it
does.  But then they *also* expect:

    result1_on_clean_list = spam(list1)
    result2_on_clean_list = spam(list2)
    result3_on_partly_filled_list = spam(list3, prefilled3)

to run with a clean so_far list for *each* of the first two calls ...
but it does not; the first call starts with a clean list, and the
second one starts with so_far containing all the results accumulated
from list1.  (The third call, of course, starts with the prefilled3
list and adjusts that list.)

>I did indeed get an MD5-style string of what casually appeared to be
>the right length, so that implies the defaulter is not to blame...

In this case, if you do:

    print('big1:', file_to_hash('big1'))
    print('big2:', file_to_hash('big2'))

you will get two md5sum values for your two files, but the md5sum
value for big2 will not be the equivalent of "md5sum big2", but rather
that of "cat big1 big2 | md5sum".  The reason is that you are re-using
the md5-sum-so-far on the second call (for file 'big2'), so you have
the accumulated sum from file 'big1', which you then update via the
contents of 'big2'.
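(The corrected function, for completeness -- a sketch combining the
default-argument dance with chunked reading:)

```python
import hashlib

def file_to_hash(path, m=None):
    """Hex MD5 of one file; a fresh hash object per call, never shared."""
    if m is None:
        m = hashlib.md5()          # evaluated at call time, not def time
    with open(path, 'rb') as fp:
        for chunk in iter(lambda: fp.read(65536), b''):
            m.update(chunk)
    return m.hexdigest()
```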
Re: subtle error slows code by 10x (builtin sum()) - replace builtin sum without using import?
In article <f6dbf631-73a9-485f-8ada-bc7376ac6...@h25g2000prf.googlegroups.com>
bdb112 <boyd.blackw...@gmail.com> wrote:
>First a trap for new players, then a question to developers.
>
>Code accelerated by numpy can be slowed down by a large factor if you
>neglect to import numpy.sum:
>
>    from timeit import Timer
>    frag = 'x=sum(linspace(0,1,1000))'
>    Timer(frag, setup='from numpy import linspace').timeit(1000)
>    # 0.6 sec
>    Timer(frag, setup='from numpy import sum, linspace').timeit(1000)
>    # difference is I import numpy.sum
>    # 0.04 sec    15x faster!
>
>This is obvious of course - but it is very easy to forget to import
>numpy.sum and pay the price in execution.
>
>Question: Can I replace the builtin sum function globally for test
>purposes so that my large set of codes uses the replacement?  The
>replacement would simply issue warnings.warn() if it detected an
>ndarray argument, then call the original sum.  I could then find the
>offending code and use the appropriate import to get numpy.sum

Sure, just execute code along these lines before running any of the
tests:

    import __builtin__
    import warnings

    _sys_sum = sum  # grab it before we change __builtin__.sum

    def hacked_sum(sequence, start=0):
        if isinstance(sequence, whatever):
            warnings.warn('your warning here')
        return _sys_sum(sequence, start)

    __builtin__.sum = hacked_sum

(You might want to grab a stack trace too, using the traceback
module.)

You said "without using import", but all you have to do is arrange for
python to import this module before running any of your own code,
e.g., with $PYTHONHOME and a modified site file.
Re: Safely modify a file in place -- am I doing it right?
In article 4e0b6383$0$29996$c3e8da3$54964...@news.astraweb.com steve+comp.lang.pyt...@pearwood.info wrote:

    I have a script running under Python 2.5 that needs to modify
    files in place.  I want to do this with some level of assurance
    that I won't lose data. ... I have come up with this approach:
    [create temp file in suitable directory, write new data, and use
    os.rename() to atomically swap out the old file for the new]

As Grant Edwards said, this is the right general idea.  There are lots of variations.  If you want to make the original be a backup, the sequence:

    os.link(original_name, backup_name)
    os.rename(new_synced_file, original_name)

should generally do the trick (rename will unlink the target, which means that the backup name will refer to the original inode).

    import os, tempfile

    def safe_modify(filename):
        fp = open(filename, 'r')
        data = modify(fp.read())
        fp.close()
        # Use a temporary file.
        loc = os.path.dirname(filename)
        fd, tmpname = tempfile.mkstemp(dir=loc, text=True)
        # In my real code, I need a proper Python file object,
        # not just a file descriptor.
        outfile = os.fdopen(fd, 'w')
        outfile.write(data)
        outfile.close()

It is a good idea to use outfile.flush() and then os.fsync() before doing the close, as well.  Among other things, this *usually* gets you some kind of notice-of-failure in the case of deferred writes across a network (e.g., NFS).  (While it would be nice for os.close() to deliver failure notices, in practice the fsync() is at least sometimes required.  This is the OS's fault, not Python's. :-) )

        # Move the temp file over the original.
        os.rename(tmpname, filename)

    os.rename is an atomic operation, at least under Linux and Mac, so
    if the move fails, the original file should be untouched.  This
    seems to work for me, but is this the right way to do it?  Is
    there a better/safer way?
For additional checking and cleanup purposes, you may want to catch exceptions and delete the temporary file if the rename has not yet been done (and therefore the original file is still intact).  You will likely also need to fiddle with the permission bits on the file resulting from the mkstemp() call (to make them match those on the original file).  Alternatively, you may want to build your own mkstemp() (this can be a bit of a challenge!).  Finally, as I implied above in talking about the os.link()-then-os.rename() sequence, if the original file has multiple links to it, note that this breaks the links.  If this is not what you want, the problem has no fully general solution (but there are various application-specific solutions).
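Putting the thread's pieces together, here is one possible Python 3 sketch (not the poster's actual code): modify() is a stand-in transformation, the temp file is flushed and fsync()ed before the close, the permission bits are copied with shutil.copymode(), and the temp file is removed if anything fails before the rename. POSIX rename-over-existing-target semantics are assumed.

```python
import os
import shutil
import tempfile

def modify(data):
    # Stand-in for the real transformation.
    return data.upper()

def safe_modify(filename):
    with open(filename, 'r') as fp:
        data = modify(fp.read())
    # Create the temp file in the same directory, so the final
    # rename stays within one filesystem (where it is atomic).
    loc = os.path.dirname(os.path.abspath(filename))
    fd, tmpname = tempfile.mkstemp(dir=loc, text=True)
    try:
        with os.fdopen(fd, 'w') as outfile:
            outfile.write(data)
            outfile.flush()
            os.fsync(outfile.fileno())  # push deferred writes to disk
        # Match the original file's permission bits.
        shutil.copymode(filename, tmpname)
        os.rename(tmpname, filename)
    except BaseException:
        os.unlink(tmpname)  # the original file is still intact
        raise
```

As the thread notes, this still breaks additional hard links to the original file; there is no fully general fix for that.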
Re: Interpreting Left to right?
(Re: x = x['huh'] = {}, which binds x to a new dictionary, then binds that dictionary's 'huh' key to the same dictionary...)

In article mailman.389.1308949722.1164.python-l...@python.org Tycho Andersen ty...@tycho.ws wrote:

    Perhaps I'm thick, but (the first thing I did was read the docs
    and) I still don't get it.  From the docs:

        An assignment statement evaluates the expression list
        (remember that this can be a single expression or a
        comma-separated list, the latter yielding a tuple) and assigns
        the single resulting object to each of the target lists, from
        left to right.

    The target list in this case is, in effect: eval(x),
    eval(x['huh']).  For a single target, it evaluates the RHS and
    assigns the result to the LHS.  Thus x = x['foo'] = {} first
    evaluates x['foo'] = {}, which should raise a NameError, since x
    doesn't exist yet.  Where am I going wrong?

I believe you are still reading this as:

    x = (something)

and setting aside x and something, and only then peering into the something and finding:

    x['foo'] = {}

and -- while keeping all of the other "x = (something)" at bay -- trying to do the x['foo'] assignment.  This is the wrong way to read it!  The correct way to read it is:

 - Pseudo_eval(x) and pseudo_eval(x['foo']) are both to be set to
   something, so before we look any more closely at the x and x['foo']
   part, we need to evaluate the something part.

 - The something part is: {}, so create a dictionary.  There is no
   name bound to this result, but for discussion let's bind tmp to it.

 - Now that we have evaluated the RHS of the assignment statement
   (which we are calling tmp even though it has no actual name), *now*
   we can go eval() (sort of -- we only evaluate them for assignment,
   rather than for current value) the pieces of the LHS.

 - The first piece of the LHS is x.  Eval-ing x for assignment gets us
   the as-yet-unbound x, and we do:

       x = tmp

   which binds x to the new dictionary.

 - The second piece of the LHS is x['foo'].
Eval-ing this for assignment gets us the newly-bound x, naming the dictionary; the key 'foo', a string; and now we bind x['foo'], doing:

    x['foo'] = tmp

which makes the dictionary contain itself.

Again, Python's assignment statement (not expression) has the form:

    one or more "LHS =" parts, AKA target lists, followed by an
    expression-list

and the evaluation order is, in effect and using pseudo-Python:

    1. expression-list -- the (single) RHS:

           tmp = eval(expression-list)

    2. for LHS-part in target-list:  # left-to-right
           LHS-part = tmp

When there is only one item in the target-list (i.e., just one "x =" part in the whole statement), or when all of the parts of the target-list are independent of each other and of the expression-list, the order does not matter.  When the parts are interdependent, then this left-to-right order *is* important.
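The whole discussion boils down to a two-line demonstration (Python 3 behaves the same way here):

```python
# The RHS {} is evaluated once; the targets are then bound left to
# right, so x is already bound to the new dictionary by the time
# x['foo'] is assigned -- no NameError.
x = x['foo'] = {}
```

If the targets were bound right to left instead, the x['foo'] binding would raise NameError, exactly as the poster expected.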
Re: Significant figures calculation
In article mailman.386.1308949143.1164.python-l...@python.org Jerry Hill malaclyp...@gmail.com wrote:

    I'm curious.  Is there a way to get the number of significant
    digits for a particular Decimal instance?

Yes:

    def sigdig(x):
        "return the number of significant digits in x"
        return len(x.as_tuple()[1])

    import decimal
    D = decimal.Decimal
    for x in ('1', '1.00', '1.23400e-8', '0.003'):
        print 'sigdig(%s): %d' % (x, sigdig(D(x)))
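In Python 3 the same idea can use the named `digits` field of the DecimalTuple that as_tuple() returns; a minimal sketch:

```python
from decimal import Decimal

def sigdig(x):
    """Return the number of significant digits in a Decimal.

    as_tuple() returns DecimalTuple(sign, digits, exponent); the
    length of the digits tuple is the number of stored digits.
    """
    return len(x.as_tuple().digits)
```

Note that trailing zeros entered by the user count (Decimal('1.00') stores three digits), while leading zeros do not (Decimal('0.003') stores one).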
Re: those darn exceptions
Chris Torek wrote:

    I can then check the now-valid pid via os.kill().  However, it
    turns out that one form of trash is a pid that does not fit within
    sys.maxint.  This was a surprise that turned up only in testing,
    and even then, only because I happened to try a ridiculously large
    value as one of my test cases.

In article 96itucfad...@mid.individual.net Gregory Ewing greg.ew...@canterbury.ac.nz wrote:

    It appears that this situation is not unique to os.kill(), for
    example:

        >>> import os
        >>> os.read(, 42)
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        OverflowError: Python int too large to convert to C long

    In fact I'd expect it to happen any time you pass a very large int
    to something that's wrapping a C function.  You can't really blame
    the wrappers for this -- it's not reasonable to expect all of them
    to catch out-of-range ints and do whatever the underlying function
    would have done if it were given an invalid argument.

    I think the lesson to take from this is that you should probably
    add OverflowError to the list of things to catch whenever you're
    calling a function with input that's not fully validated.

Indeed.  (Provided that your call is the point at which the validation should occur -- otherwise, let the exception flow upwards as usual.)

But again, this is why I would like to have the ability to use some sort of automated tool, where one can point at any given line of code and ask: what exceptions do you, my faithful tool, believe can be raised as a consequence of this line of code?  If you point it at the call to main():

    if __name__ == '__main__':
        main()

then you are likely to get a useless answer (why, any exception at all); but if you point it at a call to os.read(), then you get one that is useful -- and tells you (or me) about the OverflowError.

If you point it at a call to len(x), then the tool tells you what it knows about type(x) and x.__len__.  (This last may well be nothing: some tools have only limited application.  However, if the call to len(x) is preceded by an:

    assert isinstance(x, (some, fixed, set, of, types))

for instance, or if all calls to the function that in turn calls len(x) are visible and the type of x can be proven, the tool might tell you something useful again.)

It is clear at this point that a simple list (or tuple) of possible exceptions is insufficient -- the tool has to learn, somehow, that len() raises TypeError itself, but also raises whatever x.__len__ raises (where x is the parameter to len()).  If I ever get around to attempting this in pylint (in my Copious Spare Time, no doubt :-) ), I will have to start with an external mapping from built-in function F to exceptions that F raises, and figure out an appropriate format for the table's entries.

That is about half the point of this discussion (to provoke thought about how one might express this); the other half is to note that the documentation could probably be improved (as someone else already noted elsethread).  Note that, if nothing else, the tool -- even in limited form, without the kind of type inference that pylint attempts -- gives you the ability to automate part of the documentation process.
Re: writable iterators?
class IndirectIter(object):
    def __init__(self, sequence):
        if isinstance(sequence, dict):
            self._iter = self._dict_iter
            self._get = self._dict_get
            self._set = self._dict_set
        elif isinstance(sequence, list):
            self._iter = self._list_iter
            self._get = self._list_get
            self._set = self._list_set
        else:
            raise IndirectIterError(
                "don't know how to IndirectIter over %s" % type(sequence))
        self._seq = sequence

    def __str__(self):
        return '%s(%s)' % (self.__class__.__name__, self._seq)

    def __iter__(self):
        return self._iter()

    def _dict_iter(self):
        return _IInner(self, self._seq.keys())

    def _dict_get(self, index, keys):
        return self._seq[keys[index]]

    def _dict_set(self, index, keys, newvalue):
        self._seq[keys[index]] = newvalue

    def _list_iter(self):
        return _IInner(self, self._seq)

    def _list_get(self, index, _):
        return self._seq[index]

    def _list_set(self, index, _, newvalue):
        self._seq[index] = newvalue

if __name__ == '__main__':
    d = {'one': 1, 'two': 2, 'three': 3}
    l = [9, 8, 7]
    print 'modify dict %r' % d
    for i in IndirectIter(d):
        i.set(-i.get())
    print 'result: %r' % d
    print
    print 'modify list %r' % l
    for i in IndirectIter(l):
        i.set(-i.get())
    print 'result: %r' % l
Re: those darn exceptions
In article 96gb36fc6...@mid.individual.net, Gregory Ewing greg.ew...@canterbury.ac.nz wrote:

    Chris Torek wrote:

        Oops!  It turns out that os.kill() can raise OverflowError (at
        least in this version of Python, not sure what Python 3.x
        does).

    Seems to me that if this happens it indicates a bug in your code.
    It only makes sense to pass kill() something that you know to be
    the pid of an existing process, presumably one returned by some
    other system call.  So if kill() raises OverflowError, you *don't*
    want to catch and ignore it.  You want to find out about it, just
    as much as you want to find out about a TypeError, so you can
    track down the cause and fix it.

A bunch of you are missing the point here, perhaps because my original example was not the best, as it were.  (I wrote it on the fly; the actual code was elsewhere at the time.)

I do, indeed, want to find out about it.  But in this case what I want to find out is that the number I thought was a pid was not a pid, and I want to find that out early, catching the OverflowError in the function in question.

(The two applications here are a daemon and a daemon-status-checking program.  The daemon needs to see if another instance of itself is already running [*].  The status-checking program needs to see if the daemon is running [*].  Both open a pid file and read the contents.  The contents might be stale or trash.  I can check for trash because int(some_string) raises ValueError.  I can then check the now-valid pid via os.kill().  However, it turns out that one form of trash is a pid that does not fit within sys.maxint.  This was a surprise that turned up only in testing, and even then, only because I happened to try a ridiculously large value as one of my test cases.  It *should*, for some value of "should" :-) , have turned up much earlier, such as when running pylint.)

([*] The test does not have to be perfect, but it sure would be nice if it did not result in a Python stack dump. :-) )
Re: writable iterators?
In article iu00fs1...@news3.newsguy.com I wrote, in part:

    Another possible syntax:

        for item in container with key:

    which translates roughly to "bind both key and item to the value
    for lists, but bind key to the key and value for the value for
    dictionary-ish items".  Then ... the OP would write, e.g.:

        for elem in sequence with index:
            ...
            sequence[index] = newvalue

    which of course calls the usual container.__setitem__.  In this
    case the new protocol is to have iterators define a function that
    returns not just the next value in the sequence, but also an
    appropriate "key" argument to __setitem__.  For lists, this is
    just the index; for dictionaries, it is the key; for other
    containers, it is whatever they use for their keys.

I note I seem to have switched halfway through thinking about this from "value" to "index" for lists, and not written that. :-)

Here's a sample of a simple generator that does the trick for list, buffer, and dict:

    def indexed_seq(seq):
        """produce a pair key_or_index, value such that
        seq[key_or_index] is value initially; you can write on
        seq[key_or_index] to set a new value while this operates.

        Note that we don't allow tuple and string here since they
        are not writeable."""
        if isinstance(seq, (list, buffer)):
            for i, v in enumerate(seq):
                yield i, v
        elif isinstance(seq, dict):
            for k in seq:
                yield k, seq[k]
        else:
            raise TypeError("don't know how to index %s" % type(seq))

which shows that there is no need for a new syntax.  (Turning the above into an iterator, and handling container classes that have an __iter__ callable that produces an iterator that defines an appropriate index-and-value-getter, is left as an exercise. :-) )
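A Python 3 rendering of the same generator (buffer is gone in Python 3, so only list and dict are handled in this sketch), together with the OP's write-back loop:

```python
def indexed_seq(seq):
    """Yield (key_or_index, value) pairs such that seq[key_or_index]
    is value; assigning to seq[key_or_index] updates the container."""
    if isinstance(seq, list):
        yield from enumerate(seq)
    elif isinstance(seq, dict):
        for k in seq:
            yield k, seq[k]
    else:
        raise TypeError("don't know how to index %s" % type(seq))

# The same loop body works unchanged for both container kinds.
lst = [9, 8, 7]
for i, v in indexed_seq(lst):
    lst[i] = -v

d = {'one': 1, 'two': 2}
for k, v in indexed_seq(d):
    d[k] = -v
```

Replacing values for existing dict keys during iteration is safe; it is only adding or removing keys mid-iteration that raises RuntimeError.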
Re: How can I speed up a script that iterates over a large range (600 billion)?
    MASK = (1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0)
    MODULOS = frozenset((1, 7, 11, 13, 17, 19, 23, 29))

    # If we started counting from 7, we'd want:
    #   itertools.compress(itertools.count(7, 2), itertools.cycle(MASK))
    # But we start counting from q which means we need to clip off
    # the first ((q - 7) % 30) // 2 items:
    offset = ((q - 7) % 30) // 2
    for q in itertools.compress(itertools.count(q, 2),
            itertools.islice(itertools.cycle(MASK), offset, None, 1)):
        p = D.pop(q, None)
        if p is None:
            D[q * q] = q
            primes.primes.append(q)
            yield q
        else:
            twop = p + p
            x = q + twop
            while x in D or (x % 30) not in MODULOS:
                x += twop
            D[x] = p

def factors(num):
    """Return all the prime factors of the given number."""
    if num < 0:
        num = -num
    if num < 2:
        return
    for p in primes():
        q, r = divmod(num, p)
        while r == 0:
            yield p
            if q == 1:
                return
            num = q
            q, r = divmod(num, p)

if __name__ == '__main__':
    for arg in (sys.argv[1:] if len(sys.argv) > 1 else ['600851475143']):
        try:
            arg = int(arg)
        except ValueError, error:
            print error
        else:
            print '%d:' % arg,
            for fac in factors(arg):
                print fac,
                sys.stdout.flush()
            print
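Since the excerpt above is missing the top of the primes() generator it depends on (the D dictionary and the primes.primes cache are defined in code not shown here), a self-contained and much less clever trial-division version of factors() in Python 3 may help for comparison:

```python
def factors(num):
    """Yield the prime factors of num, smallest first (trial division)."""
    if num < 0:
        num = -num
    p = 2
    while p * p <= num:
        while num % p == 0:
            yield p
            num //= p
        p += 1 if p == 2 else 2  # try 2, then odd candidates only
    if num > 1:
        yield num                # whatever is left is prime
```

For the default argument in the excerpt this finds the same factorization: 600851475143 = 71 * 839 * 1471 * 6857.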
Re: How to get return values of a forked process
On Tue, Jun 21, 2011 at 12:26 PM, Ian ian.l...@rocketmail.com wrote:

    myForkedScript has code like this:

        if fail:
            os._exit(1)
        else:
            os._exit(os.EX_OK)

    Is using os._exit() the correct way to get a return value back to
    the main process?

The correct way, no, but it is a correct way (and cheaper than using a pipe to pickle and unpickle failure, the way the subprocess module does it, for instance).  In any case, you *should* call os._exit() either directly or indirectly after a successful fork but a failed exec.

On Jun 21, 1:54 pm, Ian Kelly ian.g.ke...@gmail.com wrote:

    sys.exit() is the preferred way.

Using sys.exit() after a fork() has other risks (principally, duplication of pending output when flushing write-mode streams), which is why os._exit() is provided.

    I thought the value 'n', passed in os._exit(n), would be the value
    I get returned.  In the case of a failure, I get 256 returned
    rather than 1.  According to the docs ... [snip documentation and
    description]  However, I would advise using the subprocess module
    for this instead of the os module (which is just low-level
    wrappers around system calls).

Indeed, subprocess gives you convenience, safety, and platform independence (at least across POSIX-and-Windows) with a relatively low cost.  As long as the cost is low enough (and it probably is) I agree with this.

In article d195a74d-e173-4168-8812-c03fc02e8...@fr19g2000vbb.googlegroups.com Ian ian.l...@rocketmail.com wrote:

    Where did you find the Unix docs you pasted in?  I didn't find it
    in the man pages.  Thank you.  Based on what you say, I will
    change my os._exit() to sys.exit().

Not sure where Ian Kelly's documentation came from, but note that on Unix, the os module also provides os.WIFSIGNALED, os.WTERMSIG, os.WIFEXITED, and os.WEXITSTATUS for dissecting the status integer returned from the various os.wait* calls.  Again, if you use the subprocess module, you are insulated from this sort of detail (which, as always, has both advantages and disadvantages).
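The 256-versus-1 puzzle is just the raw wait-status encoding: the exit code sits in the high byte of the status word, and os.WEXITSTATUS() extracts it. A minimal POSIX-only Python 3 sketch (os.fork() is unavailable on Windows):

```python
import os

# Fork a child that exits with raw code 1, then decode the status
# word that waitpid() hands back to the parent.
pid = os.fork()
if pid == 0:
    os._exit(1)                  # child: the "return value"

_, status = os.waitpid(pid, 0)
exited = os.WIFEXITED(status)    # did the child exit normally?
code = os.WEXITSTATUS(status)    # recovers the 1 passed to os._exit()
```

On Linux the raw status for a normal exit with code 1 is 256 (1 << 8), which is exactly the "256 rather than 1" the original poster saw.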
Re: those darn exceptions
On Tue, 21 Jun 2011 01:43:39 +0000, Chris Torek wrote:

    But how can I know a priori that os.kill() could raise
    OverflowError in the first place?

In article 4e006912$0$29982$c3e8da3$54964...@news.astraweb.com Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:

    You can't.  Even if you studied the source code, you couldn't be
    sure that it won't change in the future.  Or that somebody will
    monkey-patch os.kill, or a dependency, introducing a new
    exception.

Indeed.  However, if functions that know which exceptions they themselves can raise declare this (through an __exceptions__ attribute, for instance), then whoever changes the source or monkey-patches os.kill can also make the appropriate change to os.kill.__exceptions__.

    More importantly though, most functions are reliant on their
    argument.  You *cannot* tell what exceptions len(x) will raise,
    because that depends on what type(x).__len__ does -- and that
    could be anything.  So, in principle, any function could raise any
    exception.

Yes; this is exactly why you need a type-inference engine to make this work.  In this case, len() is more (though not quite exactly) like the following user-defined function:

    def len2(x):
        try:
            fn = x.__len__
        except AttributeError:
            raise TypeError("object of type %r has no len()" % type(x))
        return fn()

e.g.:

    >>> len(3)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: object of type 'int' has no len()
    >>> len2(3)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 5, in len2
    TypeError: object of type <type 'int'> has no len()

In this case, len would not have any __exceptions__ field (or if it does, it would not be a one-element tuple, but I currently think it makes more sense for many of the built-ins to resort to rules in the inference engine).

This is also the case for most operators, e.g., ordinary + (or operator.add) is syntactic sugar for:

    first_operand.__add__(second_operand)

or:

    second_operand.__radd__(first_operand)

depending on both operands' types and the first operand's __add__.

The general case is clearly unsolvable (being isomorphic to the halting problem), but this is not in itself an excuse for not attempting to solve more-specific cases.  A better excuse -- which may well be better enough :-) -- occurs when the specific cases that *can* be solved are so relatively rare that the approach degenerates into uselessness.  It is worth noting that the approach I have in mind does not survive pickling, which means a very large subset of Python code is indigestible to a pylint-like exception-inference engine.

    Another question -- is the list of exceptions part of the
    function's official API?  *All* of the exceptions listed, or only
    some of them?

All the ones directly raised.  What to do about invisible dependencies (such as those in len2() if len2 is invisible, e.g., coded in C rather than Python) is ... less obvious. :-)

    In general, you can't do this at compile-time, only at runtime.
    There's no point inspecting len.__exceptions__ at compile-time if
    len is a different function at runtime.

Right.  Which is why pylint is fallible ... yet pylint is still valuable.  At least, I find it so.  It misses a lot of important things -- it loses types across list operations, for instance -- but it catches enough to help.  Here is a made-up example based on actual errors I have found via pylint:

    """doc"""

    class Frob(object):
        """doc"""
        def __init__(self, arg1, arg2):
            self.arg1 = arg1
            self.arg2 = arg2

        def frob(self, nicate):
            """frobnicate the frob"""
            self.arg1 += nicate

        def quux(self):
            """return the frobnicated value"""
            example = self  # demonstrate that pylint is not using the *name*
            return example.argl  # typo, meant arg1

    ...

    $ pylint frob.py
    ************* Module frob
    E1101: 15:Frob.quux: Instance of 'Frob' has no 'argl' member

("Loses types across list operations" means that, e.g.:

    def quux(self):
        return [self][0].argl

hides the type, and hence the typo, from pylint.  At some point I intend to go in and modify it to track the element-types of list elements: in enough cases, a list's elements all have the same type, which means we can predict the type of list[i].  If a list contains mixed types, of course, we have to fall back to the failure-to-infer case.)

(This also shows that much real code might raise IndexError: any list subscript that is out of range does so.  So a lot of real functions *might* raise IndexError, etc., which is another argument that in real code, an exception inference engine will wind up concluding that every line might raise every exception.  Which might be true, but I still believe, for the moment, that a tool for inferring exceptions would have some value.)
Re: Boolean result of divmod
In article 261fc85a-ca6b-4520-93ed-27e78bc21...@y30g2000yqb.googlegroups.com Gnarlodious gnarlodi...@gmail.com wrote:

    What is the easiest way to get the first number as boolean?

        divmod(99.6, 30.1)

divmod returns a 2-tuple:

    >>> divmod(99.6, 30.1)
    (3.0, 9.2901)

Therefore, you can subscript the return value to get either element:

    >>> divmod(99.6, 30.1)[0]
    3.0

Thus, you can call bool() on the subscripted value to convert this to True-if-not-zero, False-if-zero:

    >>> bool(divmod(99.6, 30.1)[0])
    True
those darn exceptions
Exceptions are great, but...

Sometimes when calling a function, you want to catch some or even all the various exceptions it could raise.  What exceptions *are* those?  It can be pretty obvious.  For instance, the os.* modules raise OSError on errors.  The examples here are slightly silly until I reach the real code at the bottom, but perhaps one will get the point:

    >>> import os
    >>> os.kill(os.getpid(), 0)  # am I alive?
    >>> # yep, I am alive.
    ...
    [I'm not sure why the interpreter wants more after my comment here.]
    >>> os.kill(1, 0)  # is init still running?
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OSError: [Errno 1] Operation not permitted
    >>> # init is running, and I don't have permission to signal it
    ...
    >>> os.kill(12345, 0)  # what do we get for a pid that is NOT running?
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OSError: [Errno 3] No such process

So now I am ready to write my "is process pid running" function:

    import os, errno

    def is_running(pid):
        """Return True if the given pid is running, False if not."""
        try:
            os.kill(pid, 0)
        except OSError, err:
            # We get an EPERM error if the pid is running
            # but we are not allowed to signal it (even with
            # signal 0).  If we get any other error we'll assume
            # it's not running.
            if err.errno != errno.EPERM:
                return False
        return True

This function works great, and never raises an exception itself.  Or does it?

    >>> is_running(1)
    True
    >>> is_running(os.getpid())
    True
    >>> is_running(12345)
    False
    >>> is_running(9)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 3, in is_running
    OverflowError: long int too large to convert to int

Oops!  It turns out that os.kill() can raise OverflowError (at least in this version of Python, not sure what Python 3.x does).  Now, I could add, to is_running, the clause:

    except OverflowError:
        return False

(which is what I did in the real code).  But how can I know a priori that os.kill() could raise OverflowError in the first place?

This is not documented, as far as I can tell.  One might guess that os.kill() would raise TypeError for things that are not integers (this is the case), but presumably we do NOT want to catch that here.  For the same reason, I certainly do not want to put in a full-blown:

    except Exception:
        return False

It would be better just to note somewhere that OverflowError is one of the errors that os.kill() normally produces (and then, presumably, document just when this happens, so that, having noted that it can, one could make an educated guess).

Functions have a number of special __ attributes.  I think it might be reasonable to have all of the built-in functions, at least, have one more, perhaps spelled __exceptions__, that gives you a tuple of all the exceptions that the function might raise.  Imagine, then:

    >>> os.kill.__doc__
    'kill(pid, sig)\n\nKill a process with a signal.'

[this part exists]

    >>> os.kill.__exceptions__
    (<type 'exceptions.OSError'>, <type 'exceptions.TypeError'>,
     <type 'exceptions.OverflowError'>,
     <type 'exceptions.DeprecationWarning'>)

[this is my new proposed part]

With something like this, a pylint-like tool could compute the transitive closure of all the exceptions that could occur in any function, by using __exceptions__ (if provided) or recursively finding exceptions for all functions called, and doing a set-union.  You could then ask which exceptions can occur at any particular call site, and see if you have handled them, or at least, all the ones you intend to handle.

(The DeprecationWarning occurs if you pass a float to os.kill() -- which I would not want to catch.  Presumably the pylint-like tool, which might very well *be* pylint, would have a comment directive you would put in saying "I am deliberately allowing these exceptions to pass on to my caller", for the case where you are asking it to tell you which exceptions you may have forgotten to catch.)

User functions could set __exceptions__ for documentation purposes and/or for speeding up this pylint-like tool.  (Obviously, user-provided functions might raise exception classes that are only defined in user-provided code -- but to raise them, those functions have to include whatever code defines them, so I think this all just works.)

The key thing needed to make this work, though, is the base cases for system-provided code written in C, which pylint by definition cannot inspect to find a set of exceptions that might be raised.
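For reference, a Python 3 sketch of is_running() with the OverflowError clause added; Python 3 also gives the relevant OSError cases their own names (ProcessLookupError for ESRCH, PermissionError for EPERM), which makes the intent more readable:

```python
import errno
import os

def is_running(pid):
    """Python 3 sketch: True if pid names a running process."""
    try:
        os.kill(pid, 0)
    except OverflowError:
        # The "pid" does not even fit in a C integer, so it
        # certainly is not a valid process id.
        return False
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists; we merely may not signal it.
        return True
    except OSError as err:
        # Any other OS-level failure: treat EPERM as "running".
        return err.errno == errno.EPERM
    return True
```

In Python 3 os.kill() still raises OverflowError for integers that exceed the C pid type, so the extra clause is as necessary there as it was in Python 2.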
Re: those darn exceptions
In article mailman.211.1308626356.1164.python-l...@python.org Chris Angelico ros...@gmail.com wrote:

    Interesting concept of pulling out all possible exceptions.  Would
    be theoretically possible to build a table that keeps track of
    them, but automated tools may have problems:

        a=5; b=7; c=12
        d=1/(a+b-c)  # This could throw ZeroDivisionError
        if a+b>c:
            d=1/(a+b-c)  # This can't, because it's guarded.
        else:
            d=2

    And don't tell me to rewrite that with try/except, because it's
    not the same :)

I don't know if pylint is currently (or eventually :-) ) smart enough to realize that the if test here guarantees that a+b-c > 0 (if indeed it does guarantee it -- this depends on the types of a, b, and c and the operations invoked by the + and - operators here! -- but pylint *does* track all the types, to the extent that it can, so it has, in theory, enough information to figure this out for integers, at least).  If not, though, you could simply tell pylint not to complain here (via the usual "# pylint: disable=ID", presumably), rather than coding it as a try/except sequence.

    I'd be inclined to have comments about the exceptions that this
    can itself produce, but if there's exceptions that come from
    illogical arguments (like the TypeError above), then just ignore
    them and let them propagate.  If is_process("asdf") throws
    TypeError rather than returning False, I would call that
    acceptable behaviour.

Right, this is precisely what I want: the ability to determine which exceptions something might raise, catch some subset of them, and allow the remaining ones to propagate.  I can do the "catch subset, allow remainder to propagate", but the first step -- determine possible exceptions -- is far too difficult right now.  I have not found any documentation that points out that os.kill() can raise TypeError, OverflowError, and DeprecationWarning.  TypeError was not a *surprise*, but the other two were.

(And this is only os.kill().  What about, say, subprocess.Popen()?  Strictly speaking, type inference cannot help quite enough here, because the subprocess module does this:

    data = self._read_no_intr(errpipe_read, 1048576)
    # Exceptions limited to 1 MB
    os.close(errpipe_read)
    if data != "":
        self._waitpid_no_intr(self.pid, 0)
        child_exception = pickle.loads(data)
        raise child_exception

and the pickle.loads() can create any exception sent to it from the child, which can truly be any exception, due to catching all exceptions raised in preexec_fn, if there is one.  Pylint can't do type inference across the error-pipe between child and parent here.  However, it would suffice to set subprocess.__exceptions__ to some reasonable tuple, and leave the preexec_fn exceptions to the text documentation.

[Of course, strictly speaking, the fact that the read cuts off at 1 MB means that even the pickle.loads() call might fail!  But a megabyte of exception trace is probably plenty. :-) ])
Re: Improper creating of logger instances or a Memory Leak?
In article ebafe7b6-aa93-4847-81d6-12d396a4f...@j28g2000vbp.googlegroups.com foobar wjship...@gmail.com wrote:

I've run across a memory leak in a long running process which I can't determine if it's my issue or if it's the logger.

You do not say what version of python you are using, but on the other hand I do not know how much the logger code has evolved over time anyway. :-)

Each application thread gets a logger instance in its __init__() method via:

    self.logger = logging.getLogger('ivr-'+str(self.rand))

where self.rand is a suitably large random number to avoid collisions of the log file's name.

This instance will live forever (since the thread shares the main logging manager with all other threads).

----------------------------------------------------------------
class Manager:
    """
    There is [under normal circumstances] just one Manager instance,
    which holds the hierarchy of loggers.
    """
    def __init__(self, rootnode):
        """
        Initialize the manager with the root node of the logger hierarchy.
        """
        [snip]
        self.loggerDict = {}

    def getLogger(self, name):
        """
        Get a logger with the specified name (channel name), creating it
        if it doesn't yet exist. This name is a dot-separated hierarchical
        name, such as "a", "a.b", "a.b.c" or similar.

        If a PlaceHolder existed for the specified name [i.e. the logger
        didn't exist but a child of it did], replace it with the created
        logger and fix up the parent/child references which pointed to
        the placeholder to now point to the logger.
        """
        [snip]
        self.loggerDict[name] = rv
        [snip]
    [snip]

Logger.manager = Manager(Logger.root)
----------------------------------------------------------------

So you will find all the various ivr-* loggers in logging.Logger.manager.loggerDict[].

finally the last statements in the run() method are:

    filehandler.close()
    self.logger.removeHandler(filehandler)
    del self.logger  # this was added to try and force a clean up of the logger instances.

There appears to be no __del__ handler and nothing that allows removing a logger instance from the manager's loggerDict. Of course you could do this manually, e.g.:
...
    self.logger.removeHandler(filehandler)
    del logging.Logger.manager.loggerDict[self.logger.name]
    del self.logger  # optional

I am curious as to why you create a new logger for each thread. The logging module has thread synchronization in it, so that you can share one log (or several logs) amongst all threads, which is more typically what one wants.

-- In-Real-Life: Chris Torek, Wind River Systems Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603 email: gmail (figure it out) http://web.torek.net/torek/index.html -- http://mail.python.org/mailman/listinfo/python-list
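Both points are easy to check interactively; the logger names below ('app', 'ivr-demo') are made up for illustration:

```python
import logging

# getLogger() with the same name always returns the same instance, so
# all threads can safely share one logger...
a = logging.getLogger('app')
b = logging.getLogger('app')
print(a is b)   # True

# ...and every name ever requested stays registered in the manager's
# loggerDict, even after the caller drops its own reference.
log = logging.getLogger('ivr-demo')
del log
print('ivr-demo' in logging.Logger.manager.loggerDict)   # True
```

This is why a random logger name per thread accumulates entries forever, while one shared name does not.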
Re: import from environment path
In article 3a2b0261-ee10-40c0-8fad-342f186ee...@q30g2000yqb.googlegroups.com Guillaume Martel-Genest guillaum...@gmail.com wrote:

Here's my situation: I got a script a.py that needs to call b.py. The 2 scripts can't be in a same package. Script a.py knows the path of b.py relative to an environment variable B_PATH, let's say B_PATH/foo/b.py. The solution I found is to do the following:

    b_dir = os.path.join(os.environ['B_PATH'], 'foo')
    sys.path.append(b_dir)
    import b
    b.main()

Is it the right way to do it, should I use subprocess.call instead?

The right way depends on what you want to happen. Consider, e.g., the case where sys.path starts with:

    ['/some/where/here', '/some/where/there', ...]

and program a.py lives in /some/where/here. Suppose B_PATH is '/where/b/is'. The sys.path.append will leave sys.path set to:

    ['/some/where/here', '/some/where/there', ..., '/where/b/is']

If /some/where/there happens to contain a b.py, your "import b" will load /some/where/there/b.py rather than /where/b/is/b.py. Did you want that? Well, then, good! If not ... bad! :-)

Consider what happens if there is a bug in b.main(), or b.main() is missing entirely. Then "import b" works, but the call b.main() raises an exception directly in program a.py. Did you want that? Well, then, good! If not ... bad! :-)

You might also want to take a look at PEP 302: http://www.python.org/dev/peps/pep-0302/

If you use subprocess to run program B, it cannot affect program A in any way that program A does not allow. This gives you a lot more control, with the price you pay being that you need to open some kind of communications channel between the two programs if you want more than the simplest kinds of data transfer.

-- In-Real-Life: Chris Torek, Wind River Systems Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603 email: gmail (figure it out) http://web.torek.net/torek/index.html -- http://mail.python.org/mailman/listinfo/python-list
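The shadowing hazard is easy to reproduce with throwaway modules; the directory layout and the module name b_demo below are invented purely for the demonstration:

```python
import os
import sys
import tempfile

# Two directories, each holding a module named b_demo; which copy
# "import" picks depends only on sys.path order.
d_early = tempfile.mkdtemp()
d_late = tempfile.mkdtemp()
for d, tag in ((d_early, 'early'), (d_late, 'late')):
    with open(os.path.join(d, 'b_demo.py'), 'w') as f:
        f.write('WHICH = %r\n' % tag)

sys.path.insert(0, d_early)   # plays the role of /some/where/there
sys.path.append(d_late)       # plays the role of B_PATH/foo

import b_demo
print(b_demo.WHICH)   # 'early': the appended directory was shadowed
```

Using sys.path.insert(0, b_dir) instead of append() reverses the outcome, at the cost of letting B_PATH shadow everything else.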
Re: Best way to insert sorted in a list
In article c63e771c-8968-4d7a-9c69-b7fa6ff34...@35g2000prp.googlegroups.com SherjilOzair sherjiloz...@gmail.com wrote:

There are basically two ways to go about this. One is, to append the new value, and then sort the list. Another is to traverse the list, and insert the new value at the appropriate position. The second one's complexity is O(N), while the first one's is O(N * log N).

This is not quite right; see below.

Still, the second one works much better, because C code is being used instead of Python's. Still, being a programmer, using the first way (a.insert(x); a.sort()) does not feel right. What has the community to say about this? What is the best (fastest) way to insert sorted in a list?

In this case, the best way is most likely "don't do that at all". First, we should note that a python list() data structure is actually an array. Thus, you can locate the correct insertion point pretty fast, by using a binary or (better but not as generally applicable) interpolative search to find the proper insertion point. Having found that point, though, there is still the expense of the insertion, which requires making some room in the array-that-makes-the-list (I will use the name "a" as you did above):

    position = locate_place_for_insert(a, the_item)
    # The above is O(log n) for binary search,
    # O(log log n) for interpolative search, where
    # n is len(a).
    a.insert(position, the_item)
    # This is still O(n), alas.

Appending to the list is much faster, and if you are going to dump a set of new items in, you can do that with:

    # wrong way:
    # for item in large_list:
    #     a.append(item)

    # right way, but fundamentally still the same cost (constant
    # factor is much smaller due to built-in append())
    a.append(large_list)

If len(large_list) is m, this is O(m). Inserting each item in the right place would be O(m log (n + m)). But we still have to sort:

    a.sort()

This is O(log (n + m)), hence likely better than repeatedly inserting in the correct place.
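(For a single insertion, the stdlib already packages the "binary search, then insert" combination as bisect.insort; note the insert step is still linear, it just has a small constant:)

```python
import bisect

# bisect.insort does the O(log n) search and the O(n) array insert,
# both in C, keeping the list sorted.
a = [2, 3, 5, 8, 13]
bisect.insort(a, 6)
print(a)   # [2, 3, 5, 6, 8, 13]
```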
Depending on your data and other needs, though, it might be best to use a red-black tree, an AVL tree, or a skip list. You might also investigate radix sort, radix trees, and ternary search trees (again depending on your data). -- In-Real-Life: Chris Torek, Wind River Systems Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603 email: gmail (figure it out) http://web.torek.net/torek/index.html -- http://mail.python.org/mailman/listinfo/python-list
Re: Best way to insert sorted in a list
In article itgi3801...@news2.newsguy.com I wrote, in part:

Appending to the list is much faster, and if you are going to dump a set of new items in, you can do that with: [...]

In article mailman.96.1308348643.1164.python-l...@python.org Ethan Furman et...@stoneleaf.us wrote:

    a.append(large_list)
       ^- should be a.extend(large_list)

Er, right. Posted in haste (had to get out the door).

I also wrote:

    If len(large_list) is m, this is O(m). Inserting each item in the
    right place would be O(m log (n + m)). But we still have to sort:
    a.sort()

In article mailman.98.1308353648.1164.python-l...@python.org, Ian Kelly ian.g.ke...@gmail.com wrote:

    This is O(log (n + m)), hence likely better than repeatedly
    inserting in the correct place.

Surely you mean O((n + m) log (n + m)).

Er, maybe? (It depends on the relative values of m and n, and the underlying sort algorithm to some extent. Some algorithms are better at inserting a relatively small number of items into a mostly-sorted large list. As I recall, Shell sort does well with this.) But generally, yes. See "posted in haste" above. :-)

There are a lot of other options, such as sorting just the list of items to be inserted, which lets you do a single merge pass:

    # UNTESTED
    def merge_sorted(it1, it2, must_copy=True):
        """Merge two sorted lists/iterators it1 and it2.

        Roughly equivalent to sorted(list(it1) + list(it2)), except
        for attempts to be space-efficient.

        You can provide must_copy=False if the two iterators are
        already lists and can be destroyed for the purpose of
        creating the result.
        """
        # If it1 and it2 are deque objects, we don't need to
        # reverse them, as popping from the front is efficient.
        # If they are plain lists, popping from the end is
        # required.  If they are iterators or tuples we need
        # to make a list version anyway.  So:
        if must_copy:
            it1 = list(it1)
            it2 = list(it2)
        # Reverse sorted lists (it1 and it2 are definitely
        # lists now) so that we can pop off the end.
        it1.reverse()
        it2.reverse()
        # Now accumulate final sorted list.  Basically, this is:
        # take first (now last) item from each list, and put whichever
        # one is smaller into the result.  When either list runs
        # out, tack on the entire remaining list (whichever one is
        # non-empty -- if both are empty, the two extend ops are
        # no-ops, so we can just add both lists).
        #
        # Note that we have to re-reverse them to get
        # them back into forward order before extending.
        result = []
        while it1 and it2:
            # Note: I don't know if it might be faster
            # to .pop() each item and .append() the one we
            # did not want to pop after all.  This is just
            # an example, after all.
            last1 = it1[-1]
            last2 = it2[-1]
            if last2 < last1:
                result.append(last2)
                it2.pop()
            else:
                result.append(last1)
                it1.pop()
        it1.reverse()
        it2.reverse()
        result.extend(it1)
        result.extend(it2)
        return result

So, now if a is the original (sorted) list and b is the not-yet-sorted list of things to add:

    a = merge_sorted(a, sorted(b), must_copy=False)

will work, provided you are not required to do the merge in place. Use the usual slicing trick if that is necessary:

    a[:] = merge_sorted(a, sorted(b), must_copy=False)

If list b is already sorted, leave out the sorted() step. If list b is not sorted and is particularly long, use b.sort() to sort in place, rather than making a sorted copy.

-- In-Real-Life: Chris Torek, Wind River Systems Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603 email: gmail (figure it out) http://web.torek.net/torek/index.html -- http://mail.python.org/mailman/listinfo/python-list
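(For completeness: the stdlib's heapq.merge does this kind of single merge pass lazily, over any number of sorted inputs, without the reverse-and-pop bookkeeping:)

```python
import heapq

# heapq.merge yields the merged output lazily instead of building
# reversed working copies of its inputs.
a = [1, 4, 9]
b = [2, 3, 10]
merged = list(heapq.merge(a, b))
print(merged)   # [1, 2, 3, 4, 9, 10]
```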
Re: os.path and Path
Steven D'Aprano wrote: Why do you think there's no Path object in the standard library? *wink* In article mailman.16.1308239495.1164.python-l...@python.org Ethan Furman et...@stoneleaf.us wrote: Because I can't find one in either 2.7 nor 3.2, and every reference I've found has indicated that the other Path contenders were too all-encompassing. What I think Steven D'Aprano is suggesting here is that the general problem is too hard, and specific solutions too incomplete, to bother with. Your own specific solution might work fine for your case(s), but it is unlikely to work in general. I am not aware of any Python implementations for VMS, CMS, VM, EXEC-8, or other dinosaurs, but it would be ... interesting. Consider a typical VMS full pathname: DRA0:[SYS0.SYSCOMMON]FILE.TXT;3 The first part is the (literal) disk drive (a la MS-DOS A: or C: but slightly more general). The part in [square brackets] is the directory path. The extension (.txt) is limited to three characters, and the part after the semicolon is the file version number, so you can refer to a backup version. (Typically one would use a logical name like SYS$SYSROOT in place of the disk and/or directory-sequence, so as to paper over the overly-rigid syntax.) Compare with an EXEC-8 (now, apparently, OS 2200 -- I guess it IS still out there somewhere) file name: QUAL*FILE(cyclenumber) where cycle-numbers are relative, i.e., +0 means use the current file while +1 means create a new one and -1 means use the first backup. (However, one normally tied external file names to internal names before running a program, via the @USE statement.) The vile details are still available here: http://www.bitsavers.org/pdf/univac/1100/UE-637_1108execUG_1970.pdf (Those of you who have never had to deal with these machines, as I did in the early 1980s, should consider yourselves lucky. 
:-) ) -- In-Real-Life: Chris Torek, Wind River Systems Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603 email: gmail (figure it out) http://web.torek.net/torek/index.html -- http://mail.python.org/mailman/listinfo/python-list
Re: What is the most efficient way to compare similar contents in two lists?
In article mailman.188.1307988677.11593.python-l...@python.org Chris Angelico ros...@gmail.com wrote: If order and duplicates matter, then you want a completely different diff. I wrote one a while back, but not in Python. ... If order and duplicates matter, one might want to look into difflib. :-) -- In-Real-Life: Chris Torek, Wind River Systems Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603 email: gmail (figure it out) http://web.torek.net/torek/index.html -- http://mail.python.org/mailman/listinfo/python-list
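For instance, a minimal sketch: difflib's ndiff keeps both order and duplicates, flagging the repeated element as a deletion rather than treating the inputs as sets.

```python
import difflib

# Order and duplicates both matter to ndiff: the second 'b' is
# reported as a deletion, and 'd' as an insertion.
old = ['a', 'b', 'b', 'c']
new = ['a', 'b', 'c', 'd']
diff = list(difflib.ndiff(old, new))
print(diff)
```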
Re: Infinite recursion in __reduce__ when calling original base class reduce, why?
In article 4df669ea$0$49182$e4fe5...@news.xs4all.nl Irmen de Jong irmen.nos...@xs4all.nl wrote:

I've pasted my test code below. It works fine if 'substitute' is True, but as soon as it is set to False, it is supposed to call the original __reduce__ method of the base class. However, that seems to crash because of infinite recursion on Jython and IronPython and I don't know why. It works fine in CPython and Pypy.

In this particular case (no fancy inheritance going on), the base __reduce__ method would be object.__reduce__. Perhaps in those implementations, object.__reduce__ goes back to TestClass.__reduce__, rather than being appropriately magic.

I wonder if my understanding of __reduce__ is wrong, or that I've hit a bug in IronPython and Jython? Do I need to do something with __reduce_ex__ as well?

You should not *need* to; __reduce_ex__ is just there so that you can do something different for different versions of the pickle protocol (I believe). Nonetheless, there is something at least slightly suspicious here:

    import pickle

    class Substitute(object):
        def __init__(self, name):
            self.name = name
        def getname(self):
            return self.name

    class TestClass(object):
        def __init__(self, name):
            self.name = name
            self.substitute = True
        def getname(self):
            return self.name
        def __reduce__(self):
            if self.substitute:
                return Substitute, ("SUBSTITUTED:" + self.name,)
            else:
                # call the original __reduce__ from the base class
                return super(TestClass, self).__reduce__()  # crashes on ironpython/jython
    [snip]

In general, the way __reduce__ is written in other class implementations (as distributed with Python 2.5 at least) boils down to the very simple:

    def __reduce__(self):
        return self.__class__, (arg, um, ents)

For instance, consider a class with a piece that looks like this:

    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.giant_cached_state = None

    def make_parrot_move(self):
        if self.giant_cached_state is None:
            self._do_lots_of_computation()
        return self._quickstuff_using_cache()
Here, the Full Internal State is fairly long but the part that needs to be saved (or, for copy operations, copied -- but you can override this with __copy__ and __deepcopy__ members, if copying the cached state is a good idea) is quite short. Pickled instances need only save the name and value, not any of the computed cached stuff (if present). So:

    def __reduce__(self):
        return self.__class__, (self.name, self.value)

If you define this (and no __copy__ and no __deepcopy__), the pickler will save the name and value and call __init__ with the name and value arguments. The copy.copy and copy.deepcopy operations will also call __init__ with these arguments (unless you add __copy__(self) and __deepcopy__(self) functions).

So, it seems like in this case, you would want:

    def __reduce__(self):
        if self.substitute:
            return Substitute, ("SUBSTITUTED:" + self.name,)
        else:
            return self.__class__, (self.name,)

or if you want to be paranoid and only do a Substitute if self.__class__ is your own class:

        if type(self) == TestClass and self.substitute:
            return Substitute, ("SUBSTITUTED:" + self.name,)
        else:
            return self.__class__, (self.name,)

In CPython, if I import your code (saved in foo.py):

    >>> x = foo.TestClass("janet")
    >>> x
    <foo.TestClass object at 0x66290>
    >>> x.name
    'janet'
    >>> x.__reduce__()
    (<class 'foo.Substitute'>, ('SUBSTITUTED:janet',))
    >>> x.substitute = False
    >>> x.__reduce__()
    (<function _reconstructor at 0x70bf0>, (<class 'foo.TestClass'>,
    <type 'object'>, None), {'name': 'janet', 'substitute': False})

which is of course the same as:

    >>> object.__reduce__(x)
    (<function _reconstructor at 0x70bf0>, (<class 'foo.TestClass'>,
    <type 'object'>, None), {'name': 'janet', 'substitute': False})

which means that CPython's object.__reduce__() uses a smart fallback reconstructor. Presumably IronPython and Jython lack this.
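As a self-contained, runnable version of that cached-state pattern (the class and attribute names are the illustrative ones from the sketch, not any real API):

```python
import pickle

class Cached(object):
    """Pickle only the constructor arguments; the cache is rebuilt lazily."""
    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.giant_cached_state = None

    def __reduce__(self):
        # (callable, args): unpickling calls Cached(self.name, self.value),
        # so the cached state is deliberately not part of the pickle.
        return self.__class__, (self.name, self.value)

obj = Cached('spam', 42)
obj.giant_cached_state = list(range(1000))   # expensive but recomputable
clone = pickle.loads(pickle.dumps(obj))
print(clone.name, clone.value, clone.giant_cached_state)   # spam 42 None
```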
-- In-Real-Life: Chris Torek, Wind River Systems Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603 email: gmail (figure it out) http://web.torek.net/torek/index.html -- http://mail.python.org/mailman/listinfo/python-list
Re: parallel computations: subprocess.Popen(...).communicate()[0] does not work with multiprocessing.Pool
In article mailman.105.1307737402.11593.python-l...@python.org Hseu-Ming Chen hseum...@gmail.com wrote:

I am having an issue when making a shell call from within a multiprocessing.Process(). Here is the story: I tried to parallelize the computations in 800-ish Matlab scripts and then save the results to MySQL. The non-parallel/serial version has been running fine for about 2 years. However, in the parallel version via multiprocessing that I'm working on, it appears that the Matlab scripts have never been kicked off and nothing happened with subprocess.Popen. The debug printing below does not show up either.

I obviously do not have your code, and have not even tried this as an experiment in a simplified environment, but:

    import subprocess
    from multiprocessing import Pool

    def worker(DBrow, config):
        # run one Matlab script
        cmd1 = "/usr/local/bin/matlab ... myMatlab.1.m"
        subprocess.Popen([cmd1], shell=True,
                         stdout=subprocess.PIPE).communicate()[0]
        print "this does not get printed"
        ...

    # kick off parallel processing
    pool = Pool()
    for DBrow in DBrows:
        pool.apply_async(worker, (DBrow, config))
    pool.close()
    pool.join()

The multiprocessing code makes use of pipes to communicate between the various subprocesses it creates. I suspect these extra pipes are interfering with your subprocesses, when pool.close() waits for the Matlab script to do something with its copy of the pipes.

To make the subprocess module close them -- so that Matlab does not have them in the first place and hence pool.close() cannot get stuck there -- add close_fds=True to the Popen() call.

There could still be issues with competing wait() and/or waitpid() calls (assuming you are using a Unix-like system, or whatever the equivalent is for Windows) eating the wrong subprocess completion notifications, but that one is harder to solve in general :-) so if close_fds fixes things, it was just the pipes.
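The fix itself is one keyword argument. Here is a minimal stand-in, with a python -c child substituting for the Matlab script:

```python
import subprocess
import sys

# close_fds=True keeps any inherited descriptors -- such as
# multiprocessing's worker pipes -- out of the child process.
out = subprocess.Popen(
    [sys.executable, '-c', 'print("hello from child")'],
    stdout=subprocess.PIPE,
    close_fds=True,
).communicate()[0]
print(out.decode().strip())   # hello from child
```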
If close_fds does not fix things, you will probably need to defer the pool.close() step until after all the subprocesses complete. -- In-Real-Life: Chris Torek, Wind River Systems Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603 email: gmail (figure it out) http://web.torek.net/torek/index.html -- http://mail.python.org/mailman/listinfo/python-list
Re: how to avoid leading white spaces
On 03/06/2011 03:58, Chris Torek wrote:

This is a bit surprising, since both "s1 in s2" and re.search() could use a Boyer-Moore-based algorithm for a sufficiently-long fixed string, and the time required should be proportional to that needed to set up the skip table. The re.compile() gets to re-use the table every time.

In article mailman.2508.1307394262.9059.python-l...@python.org Ian hobso...@gmail.com wrote:

Is that true? My immediate thought is that Boyer-Moore would quickly give the number of characters to skip, but skipping them would be slow because UTF-8 encoded characters are variable sized, and the string would have to be walked anyway.

As I understand it, strings in python 3 are Unicode internally and (apparently) use wchar_t. Byte strings in python 3 are of course byte strings, not UTF-8 encoded. Or am I misunderstanding something.

Here's python 2.7 on a Linux box:

    >>> print sys.getsizeof('a'), sys.getsizeof('ab'), sys.getsizeof('abc')
    38 39 40
    >>> print sys.getsizeof(u'a'), sys.getsizeof(u'ab'), sys.getsizeof(u'abc')
    56 60 64

This implies that strings in Python 2.x are just byte strings (same as b"..." in Python 3.x) and never actually contain unicode; and unicode strings (same as "..." in Python 3.x) use 4-byte characters per that box's wchar_t.

-- In-Real-Life: Chris Torek, Wind River Systems Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603 email: gmail (figure it out) http://web.torek.net/torek/index.html -- http://mail.python.org/mailman/listinfo/python-list
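The variable-width point is easy to see directly: in a UTF-8 byte string there is no fixed byte count for "skip k characters", whereas a decoded str is a sequence of code points with O(1) indexing.

```python
# UTF-8 encodes different characters in different numbers of bytes,
# which is why byte-oriented skipping cannot simply count characters.
widths = {ch: len(ch.encode('utf-8')) for ch in u'a\xe9\u20ac'}  # a, e-acute, euro sign
print(widths)   # {'a': 1, '\xe9': 2, '\u20ac': 3} -- 1, 2, and 3 bytes
```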
Re: Validating string for FDQN
On Tue, Jun 7, 2011 at 3:23 PM, Nobody nob...@nowhere.com wrote: [1] If a hostname ends with a dot, it's fully qualified. [otherwise not, so you have to use the resolver] In article mailman.2521.1307425928.9059.python-l...@python.org, Chris Angelico ros...@gmail.com wrote: Outside of BIND files, when do you ever see a name that actually ends with a dot? I type them in this way sometimes, when poking at network issues. :-) -- In-Real-Life: Chris Torek, Wind River Systems Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603 email: gmail (figure it out) http://web.torek.net/torek/index.html -- http://mail.python.org/mailman/listinfo/python-list
Re: how to avoid leading white spaces
In article ef48ad50-da06-47a8-978a-47d6f4271...@d28g2000yqf.googlegroups.com ru...@yahoo.com ru...@yahoo.com wrote (in part): [mass snippage] What I mean is that I see regexes as being an extremely small, highly restricted, domain specific language targeted specifically at describing text patterns. Thus they do that job better than than trying to describe patterns implicitly with Python code. Indeed. Kernighan has often used / supported the idea of little languages; see: http://www.princeton.edu/~hos/frs122/precis/kernighan.htm In this case, regular expressions form a little language that is quite well suited to some lexical analysis problems. Since the language is (modulo various concerns) targeted at the right level, as it were, it becomes easy (modulo various concerns :-) ) to express the desired algorithm precisely yet concisely. On the whole, this is a good thing. The trick lies in knowing when it *is* the right level, and how to use the language of REs. On 06/03/2011 08:05 PM, Steven D'Aprano wrote: If regexes were more readable, as proposed by Wall, that would go a long way to reducing my suspicion of them. Suspicion seems like an odd term here. Still, it is true that something (whether it be use of re.VERBOSE, and whitespace-and-comments, or some New and Improved Syntax) could help. Dense and complex REs are quite powerful, but may also contain and hide programming mistakes. The ability to describe what is intended -- which may differ from what is written -- is useful. As an interesting aside, even without the re.VERBOSE flag, one can build complex, yet reasonably-understandable, REs in Python, by breaking them into individual parts and giving them appropriate names. (This is also possible in perl, although the perl syntax makes it less obvious, I think.) 
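For instance, something along these lines -- the pattern and the names here are invented for illustration, not taken from any particular program:

```python
import re

# Build one larger RE from small named pieces; each part can be read,
# tested, and documented on its own before being glued together.
octet = r'(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)'   # one decimal 0-255
ipv4 = r'\.'.join([octet] * 4)                   # four dot-separated octets
opt_port = r'(?::\d{1,5})?'                      # optional :port suffix
endpoint = re.compile(ipv4 + opt_port + r'$')

print(bool(endpoint.match('192.168.0.1:8080')))  # True
print(bool(endpoint.match('999.1.1.1')))         # False
```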
-- In-Real-Life: Chris Torek, Wind River Systems Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603 email: gmail (figure it out) http://web.torek.net/torek/index.html -- http://mail.python.org/mailman/listinfo/python-list
Re: float(nan) in set or as key
In article mailman.2438.1307133316.9059.python-l...@python.org Chris Angelico ros...@gmail.com wrote: Uhh, noob question here. I'm way out of my depth with hardware floating point. Isn't a signaling nan basically the same as an exception? Not exactly, but one could think of them as very similar. Elsethread, someone brought up the key distinction, which is that in hardware that implements IEEE arithmetic, you have two possibilities at pretty much all times: - op(args) causes an exception (and therefore does not deliver a result), or - op(args) delivers a result that may indicate exception-like lack of result. In both cases, a set of accrued exceptions flags accumulates the new exception, and a set of most recent exceptions flags tells you about the current exception. A set of exception enable flags -- which has all the same elements as current and accrued -- tells the hardware which exceptional results should trap. A number is NaN if it has all-1-bits for its exponent and at least one nonzero bit in its mantissa. (All-1s exponent, all-0s mantissa represents Infinity, of the sign specified by the sign bit.) For IEEE double precision floating point, there are 52 mantissa bits, so there are (2^52-1) different NaN bit patterns. One of those 52 bits is the please signal on use bit. A signalling NaN traps at (more or less -- details vary depending on FPU architecture) load time. However, there must necessarily (for OS and thread-library level context switching) be a method of saving the FPU state without causing an exception when loading a NaN bit pattern, even if the NaN has the signal bit set. Which would imply that the hardware did support exceptions (if it did indeed support IEEE floating point, which specifies signalling nan)? The actual hardware implementations (of which there are many) handle the niggling details differently. Some CPUs do not implement Infinity and NaN in hardware at all, delivering a trap to the OS on every use of an Inf-or-NaN bit pattern. 
The OS then has to emulate what the hardware specification says (if anything), and make it look as though the hardware did the job. Sometimes denorms are also done in software. Some implementations handle everything directly in hardware, and some of those get it wrong. :-) Often the OS has to fix up some special case -- for instance, the hardware might trap on every NaN and make software decide whether the bit pattern was a signalling NaN, and if so, whether user code should receive an exception. As I think John Nagle pointed out earlier, sometimes the hardware does support exceptions, but rather loosely, where the hardware delivers a morass of internal state and a vague indication that one or more exceptions happened somewhere near address A, leaving a huge pile of work for software. In Python, the decimal module gets everything either right or close-to-right per the (draft? final? I have not kept up with decimal FP standards) standard. Internal Python floating point, not quite so much. -- In-Real-Life: Chris Torek, Wind River Systems Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603 email: gmail (figure it out) http://web.torek.net/torek/index.html -- http://mail.python.org/mailman/listinfo/python-list
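(The double-precision bit layout described above -- all-1s exponent plus a nonzero mantissa -- can be checked from Python with the struct module:)

```python
import struct

# Reinterpret a double's bits as a 64-bit integer and pull out the
# IEEE-754 fields: 1 sign bit, 11 exponent bits, 52 mantissa bits.
bits = struct.unpack('<Q', struct.pack('<d', float('nan')))[0]
exponent = (bits >> 52) & 0x7FF
mantissa = bits & ((1 << 52) - 1)
print(exponent == 0x7FF, mantissa != 0)   # True True: NaN per the layout above
```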
Re: float(nan) in set or as key
On Mon, Jun 6, 2011 at 8:54 AM, Chris Torek nos...@torek.net wrote: A signalling NaN traps at (more or less -- details vary depending on FPU architecture) load time. On Mon, 06 Jun 2011 09:13:25 +1000, Chris Angelico wrote: Load. By this you mean the operation of taking a bit-pattern in RAM and putting it into a register? So, you can calculate 0/0, get a signalling NaN, and then save that into a memory variable, all without it trapping; and then it traps when you next perform an operation on that number? I mean, if you think of the FPU as working (in principle) with either just one or two registers and a load/store architecture, or a tiny little FPU-stack (the latter is in fact the case for Intel FPUs), with no optimization, you get a trap when you attempted to load-up the sNaN value in order to do some operation on it. For instance, if x is an sNaN, y = x + 1 turns into load x; load 1.0; add; store y and the trap occurs when you do load x. In article 4dec2ba6$0$29996$c3e8da3$54964...@news.astraweb.com, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: The intended behaviour is operations on quiet NANs should return NANs, but operations on signalling NANs should cause a trap, which can either be ignored, and converted into a quiet NAN, or treated as an exception. E.g. 
in Decimal:

    >>> import decimal
    >>> qnan = decimal.Decimal('nan')    # quiet NAN
    >>> snan = decimal.Decimal('snan')   # signalling NAN
    >>> 1 + qnan
    Decimal('NaN')
    >>> 1 + snan
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.1/decimal.py", line 1108, in __add__
        ans = self._check_nans(other, context)
      File "/usr/local/lib/python3.1/decimal.py", line 746, in _check_nans
        self)
      File "/usr/local/lib/python3.1/decimal.py", line 3812, in _raise_error
        raise error(explanation)
    decimal.InvalidOperation: sNaN

Moreover:

    >>> cx = decimal.getcontext()
    >>> cx
    Context(prec=28, rounding=ROUND_HALF_EVEN, Emin=-999999999,
    Emax=999999999, capitals=1, flags=[],
    traps=[DivisionByZero, Overflow, InvalidOperation])
    >>> cx.traps[decimal.InvalidOperation] = False
    >>> snan
    Decimal('sNaN')
    >>> 1 + snan
    Decimal('NaN')

so as you can see, by ignoring the InvalidOperation exception, we had our sNaN converted to a (regular, non-signalling, quiet) NaN, and 1 + NaN is still NaN.

(I admit that my mental model using "loads" can mislead a bit since:

    >>> cx.traps[decimal.InvalidOperation] = True   # restore trapping
    >>> also_snan = snan

A simple copy operation is not a "load" in this particular sense, and on most real hardware, one just uses an ordinary 64-bit integer memory-copying operation to copy FP bit patterns from one place to another.)

There is some good information on wikipedia: http://en.wikipedia.org/wiki/NaN

(Until I read this, I was not aware that IEEE now recommends that the quiet-vs-signal bit be 1-for-quiet 0-for-signal. I prefer the other way around, since you can then set memory to all-1-bits if it contains floating point numbers, and get exceptions if you refer to a value before setting it.)

-- In-Real-Life: Chris Torek, Wind River Systems Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603 email: gmail (figure it out) http://web.torek.net/torek/index.html -- http://mail.python.org/mailman/listinfo/python-list
Re: Multiprocessing.connection magic
In article mailman.2417.1307082948.9059.python-l...@python.org Claudiu Popa cp...@bitdefender.com wrote:

Hello guys, while working on a dispatcher using the multiprocessing.connection.Listener module I've stumbled upon some sort of magic trick that amazed me. How is this possible and what is the multiprocessing library doing in the background for this to work?

Most of Python's sharing routines (including multiprocessing send, in this case) use the pickle routines to package data for transport between processes. Thus, you can see the magic pretty simply:

Client, Python 2.6:

    >>> from multiprocessing.connection import Client
    >>> client = Client(("localhost", 8080))
    >>> import shutil
    >>> client.send(shutil.copy)

Here I just use pickle.dumps() to return (and print, since we are in the interpreter) the string representation that client.send() will send:

    >>> import pickle
    >>> import shutil
    >>> pickle.dumps(shutil.copy)
    'cshutil\ncopy\np0\n.'

Server, 3.2:

    >>> from multiprocessing.connection import Listener
    >>> listener = Listener(("localhost", 8080))
    >>> con = listener.accept()
    >>> data = con.recv()
    >>> data
    <function copy at 0x024611E0>
    >>> help(data)
    Help on function copy in module shutil:
    [snip]

On this end, the (different) version of python simply unpickles the byte stream. Starting a new python session (to get rid of any previous imports):

    $ python
    ...
    >>> import pickle
    >>> pickle.loads('cshutil\ncopy\np0\n.')
    <function copy at 0x86ef0>
    >>> help(_)
    Help on function copy in module shutil:
    ...
The real magic is in the unpickler, which has figured out how to access shutil.copy without importing shutil into the global namespace:

    >>> shutil
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'shutil' is not defined

but we can expose that magic as well, by feeding pickle.loads() a bad string:

    >>> pickle.loads('cNotAModule\nfunc\np0\n.')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/pickle.py", line 1374, in loads
        return Unpickler(file).load()
      File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/pickle.py", line 858, in load
        dispatch[key](self)
      File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/pickle.py", line 1090, in load_global
        klass = self.find_class(module, name)
      File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/pickle.py", line 1124, in find_class
        __import__(module)
    ImportError: No module named NotAModule

Note the rather total lack of security here -- in the receiver, by doing con.recv(), you are trusting the sender not to send you a dangerous or invalid pickle data stream. This is why the documentation includes the following:

    Warning: The Connection.recv() method automatically unpickles the
    data it receives, which can be a security risk unless you can
    trust the process which sent the message.  Therefore, unless the
    connection object was produced using Pipe() you should only use
    the recv() and send() methods after performing some sort of
    authentication.  See Authentication keys.

(i.e., do that :-) -- see the associated section on authentication)

-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list
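The by-reference behavior is easy to reproduce in a single interpreter, with no sockets at all:

```python
import pickle
import shutil

# Functions pickle as a tiny "module name + attribute name" reference,
# not as code; unpickling re-imports the module and looks the name up.
payload = pickle.dumps(shutil.copy)
restored = pickle.loads(payload)

# The lookup finds the very same function object.
assert restored is shutil.copy
```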
Re: float(nan) in set or as key
On 2011-06-02, Nobody nob...@nowhere.com wrote: (I note that Python actually raises an exception for 0.0/0.0). In article isasfm$inl$1...@reader1.panix.com Grant Edwards invalid@invalid.invalid wrote: IMHO, that's a bug. IEEE-754 states explicitly that 0.0/0.0 is NaN. Python claims it implements IEEE-754. Python got it wrong.

Indeed -- or at least, inconsistent. (Again, I would not mind at all if Python had a "raise exception on NaN result" mode *as well as* a "quietly make NaN" mode, perhaps using signalling vs quiet NaN to tell them apart in most cases, plus some sort of floating-point context control, for instance.)

Also, note that the convenience of NaN (e.g. not propagating from the untaken branch of a conditional) is only available for floating-point types. If it's such a good idea, why don't we have it for other types?

Mostly because for integers it's too late and there is no standard for it. For others, well:

    >>> import decimal
    >>> decimal.Decimal('nan')
    Decimal('NaN')
    >>> _ + 1
    Decimal('NaN')
    >>> decimal.setcontext(decimal.ExtendedContext)
    >>> print decimal.Decimal(1) / 0
    Infinity
    [etc]

(Note that you have to set the decimal context to one that does not produce a zero-divide exception, such as the pre-loaded decimal.ExtendedContext. On my one Python 2.7 system -- all the rest are earlier versions, with 2.5 the highest I can count on, and that only by upgrading it on the really old work systems -- I note that fractions.Fraction(0,0) raises a ZeroDivisionError, and there is no fractions.ExtendedContext or similar.)

The definition is entirely arbitrary.

I don't agree, but even if it was entirely arbitrary, that doesn't make the decision meaningless. IEEE-754 says it's True, and standards compliance is valuable. Each country's decision to drive on the right/left side of the road is entirely arbitrary, but once decided there's a huge benefit to everybody following the rule.

This analogy perhaps works better than expected.
Whenever I swap between Oz or NZ and the US-of-A, I have a brief mental clash that, if I am not careful, could result in various bad things. :-) -- In-Real-Life: Chris Torek, Wind River Systems Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603 email: gmail (figure it out) http://web.torek.net/torek/index.html -- http://mail.python.org/mailman/listinfo/python-list
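The inconsistency described above is easy to demonstrate side by side; this sketch uses only the stdlib decimal module:

```python
import decimal

# CPython's binary floats raise here instead of quietly returning NaN:
try:
    0.0 / 0.0
    raised = False
except ZeroDivisionError:
    raised = True

# decimal's ExtendedContext is "nonstop": IEEE-style results instead
# of exceptions.
decimal.setcontext(decimal.ExtendedContext)
inf_result = decimal.Decimal(1) / 0    # Infinity
nan_result = decimal.Decimal(0) / 0    # NaN
decimal.setcontext(decimal.DefaultContext)  # restore trapping behavior

assert raised
assert inf_result == decimal.Decimal('Infinity')
assert nan_result.is_nan()
```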
Re: how to avoid leading white spaces
On 2011-06-03, ru...@yahoo.com ru...@yahoo.com wrote: [prefers] re.split ('[ ,]', source) This is probably not what you want in dealing with human-created text: re.split('[ ,]', 'foo bar, spam,maps') ['foo', '', 'bar', '', 'spam', 'maps'] Instead, you probably want a comma followed by zero or more spaces; or, one or more spaces: re.split(r',\s*|\s+', 'foo bar, spam,maps') ['foo', 'bar', 'spam', 'maps'] or perhaps (depending on how you want to treat multiple adjacent commas) even this: re.split(r',+\s*|\s+', 'foo bar, spam,maps,, eggs') ['foo', 'bar', 'spam', 'maps', 'eggs'] although eventually you might want to just give in and use the csv module. :-) (Especially if you want to be able to quote commas, for instance.) ... With regexes the code is likely to be less brittle than a dozen or more lines of mixed string functions, indexes, and conditionals. In article 94svm4fe7...@mid.individual.net Neil Cerutti ne...@norwich.edu wrote: [lots of snippage] That is the opposite of my experience, but YMMV. I suspect it depends on how familiar the user is with regular expressions, their abilities, and their limitations. People relatively new to REs always seem to want to use them to count (to balance parentheses, for instance). People who have gone through the compiler course know better. :-) -- In-Real-Life: Chris Torek, Wind River Systems Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603 email: gmail (figure it out) http://web.torek.net/torek/index.html -- http://mail.python.org/mailman/listinfo/python-list
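The csv-module ending suggested above can be sketched concretely: skipinitialspace handles the space-after-comma case, and quoting handles embedded commas, which no single re.split() pattern can:

```python
import csv
import io

# A quoted field containing a comma survives intact.
row = next(csv.reader(io.StringIO('foo,"spam, maps", eggs'),
                      skipinitialspace=True))
assert row == ['foo', 'spam, maps', 'eggs']
```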
Re: how to avoid leading white spaces
In article 94ph22frh...@mid.individual.net Neil Cerutti ne...@norwich.edu wrote: Python's str methods, when they're sufficient, are usually more efficient. In article roy-e2fa6f.21571602062...@news.panix.com Roy Smith r...@panix.com replied: I was all set to say, "prove it!", when I decided to try an experiment. Much to my surprise, for at least one common case, this is indeed correct. [big snip]

    t1 = timeit.Timer("'laoreet' in text",
                      "text = '%s'" % text)
    t2 = timeit.Timer("pattern.search(text)",
                      "import re; pattern = re.compile('laoreet'); text = '%s'" % text)
    print t1.timeit()
    print t2.timeit()

    - ./contains.py
    0.990975856781
    1.91417002678
    -

This is a bit surprising, since both "s1 in s2" and re.search() could use a Boyer-Moore-based algorithm for a sufficiently-long fixed string, and the time required should be proportional to that needed to set up the skip table. The re.compile() gets to re-use the table every time. (I suppose "in" could as well, with some sort of cache of recently-built tables.)

Boyer-Moore search is roughly O(M/N) where M is the length of the text being searched and N is the length of the string being sought. (However, it depends on the form of the string, e.g., searching for "ababa" is not as good as searching for "abcde".) Python might be penalized by its use of Unicode here, since a Boyer-Moore table for a full 16-bit Unicode string would need 65536 entries (one per possible ord() value). However, if the string being sought is all single-byte values, a 256-element table suffices; re.compile(), at least, could scan the pattern and choose an appropriate underlying search algorithm.

There is an interesting article here as well: http://effbot.org/zone/stringlib.htm

-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list
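The skip-table idea can be sketched as a toy Boyer-Moore-Horspool search. This is an illustration only, not CPython's actual algorithm (the effbot article above describes the real one):

```python
def horspool_find(text, pattern):
    """Boyer-Moore-Horspool search: return the first index of
    pattern in text, or -1.  A teaching sketch, not stringlib."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # Skip table: on a mismatch, how far the alignment may shift,
    # keyed by the text character under the pattern's last position.
    skip = {c: m - i - 1 for i, c in enumerate(pattern[:-1])}
    i = m - 1
    while i < n:
        # Compare backwards from the aligned end position.
        j, k = m - 1, i
        while j >= 0 and text[k] == pattern[j]:
            j -= 1
            k -= 1
        if j < 0:
            return k + 1        # full match
        i += skip.get(text[i], m)
    return -1
```

Note the dict-based table: exactly the trade-off discussed in the follow-up post below, since a plain 2^16 or 2^32 indexed array is impractical for wide character sets.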
Re: how to avoid leading white spaces
In article is9ikg0...@news1.newsguy.com, Chris Torek nos...@torek.net wrote: Python might be penalized by its use of Unicode here, since a Boyer-Moore table for a full 16-bit Unicode string would need 65536 entries (one per possible ord() value). In article roy-751fac.23443902062...@news.panix.com Roy Smith r...@panix.com wrote: I'm not sure what you mean by full 16-bit Unicode string? Isn't unicode inherently 32 bit? Well, not exactly. As I understand it, Python is normally built with a 16-bit unicode character type though (using either UCS-2 or UTF-16 internally; but I admit I have been far too lazy to look up stuff like surrogates here :-) ). In any case, while I could imagine building a 2^16 entry jump table, clearly it's infeasible (with today's hardware) to build a 2^32 entry table. But, there's nothing that really requires you to build a table at all. If I understand the algorithm right, all that's really required is that you can map a character to a shift value. Right. See the URL I included for an example. The point here, though, is ... well: For an 8 bit character set, an indexed jump table makes sense. For a larger character set, I would imagine you would do some heuristic pre-processing to see if your search string consisted only of characters in one unicode plane and use that fact to build a table which only indexes that plane. Or, maybe use a hash table instead of a regular indexed table. Just so. You have to pay for one scan through the string to build a hash-table of offsets -- an expense similar to that for building the 256-entry 8-bit table, perhaps, depending on string length -- but then you pay again for each character looked-at, since: skip = hashed_lookup(table, this_char); is a more complex operation than: skip = table[this_char]; (where table is a simple array, hence the C-style semicolons: this is not Python pseudo-code :-) ). Hence, a penalty. 
Not as fast, but only slower by a small constant factor, which is not a horrendous price to pay in a fully i18n world :-) Indeed. -- In-Real-Life: Chris Torek, Wind River Systems Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603 email: gmail (figure it out) http://web.torek.net/torek/index.html -- http://mail.python.org/mailman/listinfo/python-list
Re: Updated blog post on how to use super()
Summary: super(cls, data) in a method gets you the next handler for a given class cls and an instance data that has derived from that class at some point. In Python 2 you must spell out the names of the class and instance (normally self) explicitly, while Python 3 grabs, at compile time, the class from the lexically enclosing class, and the instance from the first argument of the method that invokes super. The next handler depends on the instance's __mro__. If all your classes use at most single inheritance, the next handler in class Cls1 is easy to predict: class Cls1(Cls2): Any instance of Cls1 always has Cls2 as its next, so: def method(self, arg1, arg2): ... Cls2.method(self, arg1_mutated, arg2_mutated) ... works fine. But if you use multiple inheritance, the next method is much harder to predict. If you have a working super, you can use: super().method(self, arg1_mutated, arg2_mutated) and it will find the correct next method in all cases. In article is5qd7$t5b$1...@speranza.aioe.org Billy Mays no...@nohow.com wrote: What it does is clear to me, but why is it interesting or special isn't. This looks like a small feature that would be useful in a handful of cases. Indeed: it is useful when you have multiple inheritance, which for most programmers, is a handful of cases. However, provided you *have* the Py3k super() in the first place, it is also trivial and obviously-correct to write: super().method(...) whereas writing: NextClass.method(...) requires going up to the class definition to make sure that NextClass is indeed the next class, and hence -- while usually no more difficult to write -- less obviously-correct. Moreover, if you write the easy-to-write obviously-correct super().method, *your* class may now be ready for someone else to use in a multiple-inheritance (MI) situation. If you type in the not-as-obviously-correct NextClass.method, *your* class is definitely *not* ready for someone else to use in that MI situation. 
(I say may be ready for MI, because being fully MI ready requires several other code discipline steps. The point of super() -- at least when implemented nicely, as in Py3k -- is that it makes it easy -- one might even say super easy :-) -- to write your code such that it is obviously correct, and also MI-friendly.) -- In-Real-Life: Chris Torek, Wind River Systems Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603 email: gmail (figure it out) http://web.torek.net/torek/index.html -- http://mail.python.org/mailman/listinfo/python-list
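A minimal diamond illustrates why super().method() stays correct under MI while a hard-coded NextClass.method() does not (Python 3 syntax; the class names here are made up for the example):

```python
class Base:
    def method(self):
        return ['Base']

class Left(Base):
    def method(self):
        # "Next" is decided by the *instance's* MRO, not by Left's
        # declared base -- so this line needs no edits to be MI-ready.
        return ['Left'] + super().method()

class Right(Base):
    def method(self):
        return ['Right'] + super().method()

class Child(Left, Right):
    def method(self):
        return ['Child'] + super().method()

# For a Child instance, Left's "next" class is Right, not Base:
assert Child().method() == ['Child', 'Left', 'Right', 'Base']
# For a plain Left instance, it is Base:
assert Left().method() == ['Left', 'Base']
```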
Re: float(nan) in set or as key
Carl Banks wrote: For instance, say you are using an implementation that uses floating point, and you define a function that uses Newton's method to find a square root: def square_root(N,x=None): if x is None: x = N/2 for i in range(100): x = (x + N/x)/2 return x It works pretty well on your floating-point implementation. Now try running it on an implementation that uses fractions by default (Seriously, try running this function with N as a Fraction.) In article mailman.2376.1306950997.9059.python-l...@python.org Ethan Furman et...@stoneleaf.us wrote: Okay, will this thing ever stop? It's been running for 90 minutes now. Is it just incredibly slow? The numerator and denominator get very big, very fast. Try adding a bit of tracing: for i in range(100): x = (x + N/x) / 2 print 'refinement %d: %s' % (i + 1, x) and lo: square_root(fractions.Fraction(5,2)) refinement 1: 13/8 refinement 2: 329/208 refinement 3: 216401/136864 refinement 4: 93658779041/59235012928 refinement 5: 17543933782901678712641/11095757974628660884096 refinement 6: 615579225157677613558476890352854841917537921/389326486355976942712506162834130868382115072 refinement 7: 757875564891453502666431245010274191070178420221753088072252795554063820074969259096915201/479322593608746863553102599134385944371903608931825380820104910630730251583028097491290624 refinement 8: 1148750743719079498041767029550032831122597958315559446437317334336105389279028846671983328007126798344663678217310478873245910031311232679502892062001786881913873645733507260643841/726533762792931259056428876869998002853417255598937481942581984634876784602422528475337271599486688624425675701640856472886826490140251395415648899156864835350466583887285148750848 In the worst case, the number of digits in numerator and denominator could double on each pass, so if you start with 1 digit in each, you end with 2**100 in each. (You will run out of memory first unless you have a machine with more than 64 bits of address space. 
:-) ) -- In-Real-Life: Chris Torek, Wind River Systems Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603 email: gmail (figure it out) http://web.torek.net/torek/index.html -- http://mail.python.org/mailman/listinfo/python-list
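One way to keep Newton's method usable with Fractions is to prune each iterate. Fraction.limit_denominator() is a real method; the round count and the 10**30 limit below are arbitrary choices of mine:

```python
from fractions import Fraction

def square_root(N, x=None, rounds=10):
    """Newton's method, with each Fraction iterate pruned so the
    numerator/denominator cannot double in size every round."""
    if x is None:
        x = N / 2
    for _ in range(rounds):
        x = (x + N / x) / 2
        if isinstance(x, Fraction):
            # Keep the best approximation with a bounded denominator.
            x = x.limit_denominator(10**30)
    return x

r = square_root(Fraction(5, 2))
assert abs(r * r - Fraction(5, 2)) < Fraction(1, 10**20)
assert r.denominator <= 10**30   # no exponential digit growth
```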
Re: Unshelving the data?
In article 4433955b-7f54-400a-af08-1f58a75e7...@j31g2000yqe.googlegroups.com Uncle Ben bgr...@nycap.rr.com wrote: Shelving is a wonderfully simple way to get keyed access to a store of items. I'd like to maintain this cache though. Is there any way to remove a shelved key once it is hashed into the system?

    $ pydoc shelve
    ...
    To summarize the interface (key is a string, data is an arbitrary
    object):
    ...
    d[key] = data   # store data at key (overwrites old data if
                    # using an existing key)
    data = d[key]   # retrieve a COPY of the data at key (raise
                    # KeyError if no such key) -- NOTE that this
                    # access returns a *copy* of the entry!
    del d[key]      # delete data stored at key (raises KeyError
                    # if no such key)
    ...

Seems pretty straightforward. :-) Are you having some sort of problem with del?

-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list
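A quick round-trip shows del working on a shelf (the temporary path here is my own choice):

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'cache')

d = shelve.open(path)
d['spam'] = [1, 2, 3]        # store data at key
assert d['spam'] == [1, 2, 3]
del d['spam']                # remove the shelved key
assert 'spam' not in d
d['eggs'] = 42
d.close()

# Surviving keys persist across open/close.
d = shelve.open(path)
assert dict(d) == {'eggs': 42}
d.close()
```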
Re: float(nan) in set or as key
In article 4de3358b$0$29990$c3e8da3$54964...@news.astraweb.com Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: Better than a float method is a function which takes any number as argument: import math, fractions, decimal math.isnan(fractions.Fraction(2, 3)) False math.isnan(decimal.Decimal('nan')) True Ah, apparently someone's been using Larry Wall's time machine. :-) I should have looked at documentation. In my case, though: $ python Python 2.5.1 (r251:54863, Dec 16 2010, 14:12:43) [GCC 4.0.1 (Apple Inc. build 5465)] on darwin Type help, copyright, credits or license for more information. import math math.isnan Traceback (most recent call last): File stdin, line 1, in module AttributeError: 'module' object has no attribute 'isnan' You can even handle complex NANs with the cmath module: import cmath cmath.isnan(complex(1, float('nan'))) True Would it be appropriate to have isnan() methods for Fraction, Decimal, and complex, so that you do not need to worry about whether to use math.isnan() vs cmath.isnan()? (I almost never work with complex numbers so am not sure if the or behavior -- cmath.isinf and cmath.isnan return true if either real or complex part are Infinity or NaN respectively -- is appropriate in algorithms that might be working on any of these types of numbers.) It might also be appropriate to have trivial always-False isinf and isnan methods for integers. -- In-Real-Life: Chris Torek, Wind River Systems Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603 email: gmail (figure it out) http://web.torek.net/torek/index.html -- http://mail.python.org/mailman/listinfo/python-list
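Until such per-type methods exist, a small wrapper gives one uniform spelling. The name any_isnan and the dispatch rule are my own, and note that the complex case keeps cmath's either-part-is-NaN behavior:

```python
import cmath
import math
import decimal
from fractions import Fraction

def any_isnan(x):
    # Hypothetical helper: complex goes through cmath; everything
    # else (int, float, Fraction, quiet Decimal NaN) goes through
    # math.isnan(), which accepts anything convertible to float.
    if isinstance(x, complex):
        return cmath.isnan(x)
    return math.isnan(x)

assert any_isnan(float('nan'))
assert any_isnan(decimal.Decimal('nan'))
assert any_isnan(complex(1, float('nan')))
assert not any_isnan(Fraction(2, 3))
assert not any_isnan(7)
```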
Re: How to catch a line with Popen
Chris Torek wrote: In at least some versions of Python 2 [the file-type object iterators behave badly with pipes] (This may still be true in Python 3, I just have no experience with Py3k. "At least some versions of Python 2" means "the ones I have access to, and have tried". :-) )

In article is0d44$d7m$1...@speranza.aioe.org TheSaint nob...@nowhere.net.no wrote: I'm with P3k :P. However thank you for your guidelines. Last my attempt was to use a *for* p.wait(), as mentioned earlier.

If you have a process that has not yet terminated and that you must stop from your own Python program, calling the wait() method will wait forever (because you are now waiting for yourself, in effect -- waiting for yourself to terminate the other process). The only time to call p.wait() (or p.communicate(), which itself calls the wait() method) is when you believe the subprocess is on its way to terminating -- in this case, after you force it to do so.

That looks good enough. I noted some little delay for the first lines; mostly sure Popen assigns some buffer even if it is not set.

According to the documentation, the default buffer size of Python 2 is 0, which is passed to fdopen() and makes the resulting files unbuffered. I recall some sort of changes discussed for Py3k though.

Haven't you tried a perpetual ping? How would the line_at_a_time behave?

Since it is a generator that only requests another line when called, it should be fine. (Compare to the itertools cycle and repeat generators, for instance, which return an infinite sequence.)

-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list
Re: Error: child process close a socket inherited from parent
In article slrniu42cm.2s8.narkewo...@cnzuhnb904.ap.bm.net narke narkewo...@gmail.com wrote: As illustrated in the following simple sample:

    import sys
    import os
    import socket

    class Server:
        def __init__(self):
            self._listen_sock = None

        def _talk_to_client(self, conn, addr):
            text = 'The brown fox jumps over the lazy dog.\n'
            while True:
                conn.send(text)
                data = conn.recv(1024)
                if not data:
                    break
            conn.close()

        def listen(self, port):
            self._listen_sock = socket.socket(socket.AF_INET,
                                              socket.SOCK_STREAM)
            self._listen_sock.bind(('', port))
            self._listen_sock.listen(128)
            self._wait_conn()

        def _wait_conn(self):
            while True:
                conn, addr = self._listen_sock.accept()
                if os.fork() == 0:
                    self._listen_sock.close()  # line x
                    self._talk_to_client(conn, addr)
                else:
                    conn.close()

    if __name__ == '__main__':
        Server().listen(int(sys.argv[1]))

Unless I comment out the line x, I will get a 'Bad file descriptor' error when my TCP client program (e.g., telnet) closes the connection to the server. But as I understood, a child process can close an unused socket (file descriptor).

It can.

Do you know what's wrong here?

The problem turns out to be fairly simple. The routine listen() forks, and the parent process (with nonzero pid) goes into the else branch of _wait_conn(), hence closes the newly accepted socket and goes back to waiting on the accept() call, which is all just fine. Meanwhile, the child (with pid == 0) calls close() on the listening socket and then calls self._talk_to_client(). What happens when the client is done and closes his end? Well, take a look at the code in _talk_to_client(): it reaches the "if not data" clause and breaks out of its loop, and calls close() on the accepted socket ... and then returns to its caller, which is _wait_conn(). What does _wait_conn() do next? It has finished the "if" branch of the "while True:" loop, so it must skip the else branch and go around the loop again.
Which means its very next operation is to call accept() on the listening socket it closed just before it called self._talk_to_client(). If that socket is closed, you get an EBADF error raised. If not, the child and parent compete for the next incoming connection. -- In-Real-Life: Chris Torek, Wind River Systems Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603 email: gmail (figure it out) http://web.torek.net/torek/index.html -- http://mail.python.org/mailman/listinfo/python-list
Re: float(nan) in set or as key
Incidentally, note: $ python ... nan = float(nan) nan nan nan is nan True nan == nan False In article 4de1e3e7$0$2195$742ec...@news.sonic.net John Nagle na...@animats.com wrote: The correct answer to nan == nan is to raise an exception, because you have asked a question for which the answer is nether True nor False. Well, in some sense, the correct answer depends on which question you *meant* to ask. :-) Seriously, some (many?) instruction sets have two kinds of comparison instructions: one that raises an exception here, and one that does not. The correct semantics for IEEE floating point look something like this: 1/0 INF INF + 1 INF INF - INF NaN INF == INF unordered NaN == NaN unordered INF and NaN both have comparison semantics which return unordered. The FPU sets a bit for this, which most language implementations ignore. Again, this depends on the implementation. This is similar to (e.g.) the fact that on the MIPS, there are two different integer add instructions (addi and addiu): one raises an overflow exception, the other performs C unsigned style arithmetic (where, e.g., 0x + 1 = 0, in 32 bits). Python should raise an exception on unordered comparisons. Given that the language handles integer overflow by going to arbitrary-precision integers, checking the FPU status bits is cheap. I could go for that myself. But then you also need a don't raise exception but give me an equality test result operator (for various special-case purposes at least) too. Of course a simple classify this float as one of normal, subnormal, zero, infinity, or NaN operator would suffice here (along with the usual extract sign and differentiate between quiet and signalling NaN operations). -- In-Real-Life: Chris Torek, Wind River Systems Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603 email: gmail (figure it out) http://web.torek.net/torek/index.html -- http://mail.python.org/mailman/listinfo/python-list
Re: How to catch a line with Popen
In article irtj2o$h0m$1...@speranza.aioe.org TheSaint nob...@nowhere.net.no wrote: Chris Rebert wrote: I just suppose to elaborate the latest line, as soon it's written on the pipe, and print some result on the screen. Imagine something like:

    p = Popen(['ping', '-c40', 'www.google.com'], stdout=PIPE)
    for line in p.stdout:
        print(str(line).split()[7])

I'd like to see something like *time=54.4* This is just an example, where if we remove the -c40 on the command line, I'd expect to read the latest line(s), until the program will be killed.

In at least some versions of Python 2, file-like object "next" iterators do not work right with unbuffered (or line-buffered) pipe-file-objects. (This may or may not be fixed in Python 3.) A simple workaround is a little generator using readline():

    def line_at_a_time(fileobj):
        """Return one line at a time from a file-like object.
        Works around the iter behavior of pipe files in
        Python 2.x, e.g., instead of "for line in file" you can
        write "for line in line_at_a_time(file)"."""
        while True:
            line = fileobj.readline()
            if not line:
                return
            yield line

Adding this to your sample code gives something that works for me, provided I fiddle with it to make sure that the only lines examined are those with actual ping times:

    p = subprocess.Popen(["ping", "-c5", "www.google.com"],
                         stdout=subprocess.PIPE)
    for lineno, line in enumerate(line_at_a_time(p.stdout)):
        if 1 <= lineno <= 5:
            print line.split()[6]
        else:
            print line.rstrip('\n')
    p.wait()  # discard final result

(Presumably the enumerate() trick would not be needed in whatever you really use.)

-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list
Re: portable way of sending notifying a process
In article 4de183e7$0$26108$426a7...@news.free.fr News123 news1...@free.fr wrote: I'm looking for a portable way (Windows XP / Windows Vista and Linux) to send a signal from any Python script to another one (one signal would be enough).

This turns out to be pretty hard to do reliably-and-securely even *without* crossing the Windows / Linux barrier.

It seems that neither of the signals HUP / USR1 is implemented under Windows.

Signals are also very messy and easy to get wrong on Unix, with earlier Python versions missing a few key items to make them entirely reliable (such as the sigblock/sigsetmask/sigpause suite, and/or setting interrupt-vs-resume behavior on system calls).

What would be a lightweight portable way for one process to tell another to do something? The main requirement would be to have no CPU impact while waiting (thus no polling).

Your best bet here is probably to use sockets. Both systems have ways to create service sockets and to connect to a socket as a client. Of course, making these secure can be difficult: you must decide what sort of attack(s) could occur and how much effort to put into defending against them. (For instance, even if there is only a "wake up, I have done something you should look at" signal that you can transmit by connecting to a server and then closing the connection, what happens if someone inside or outside your local network decides to repeatedly poke that port in the hopes of causing a Denial of Service by making the server use lots of CPU time?)

If nothing exists I might just write a wrapper around pyinotify and (Tim Golden's code snippet allowing to watch a directory for file changes) http://timgolden.me.uk/python/win32_how_do_i/watch_directory_for_changes.html and use a marker file in a marker directory, but I wanted to be sure of not reinventing the wheel.

It really sounds like you are writing client/server code in which the client writes a file into a queue directory.
In this case, that may be the way to go -- or you could structure it as an actual client and server, i.e., the client connects to the server and writes the request directly (but then you have to decide about security considerations, which the OS's local file system may provide more directly). -- In-Real-Life: Chris Torek, Wind River Systems Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603 email: gmail (figure it out) http://web.torek.net/torek/index.html -- http://mail.python.org/mailman/listinfo/python-list
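A minimal version of the socket-based approach might look like the sketch below. The function names and the loopback-only binding are my own choices, and real code would add the authentication discussed above on top of this:

```python
import socket

def wait_for_wakeup(port):
    # Server side: block in the OS (no polling, zero CPU) until any
    # client connects; the connection itself is the whole "signal".
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(('127.0.0.1', port))    # loopback only: shrink the attack surface
    srv.listen(1)
    conn, _ = srv.accept()           # sleeps until poked
    conn.close()
    srv.close()

def send_wakeup(port):
    # Client side (normally a different process): connect and close.
    socket.create_connection(('127.0.0.1', port)).close()
```

Both halves use only the stdlib socket module, so the same code runs unchanged on Windows and Linux.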
Re: float(nan) in set or as key
In article 4de31635$0$29990$c3e8da3$54964...@news.astraweb.com, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: That's also completely wrong. The correct way to test for a NAN is with the IEEE-mandated function isnan(). The NAN != NAN trick is exactly that, a trick, used by programmers when their language or compiler doesn't support isnan().

Perhaps it would be reasonable to be able to do:

    x.isnan()

when x is a float.

Without support for isinf(), identifying an INF is just as hard as identifying a NAN, and yet their behaviour under equality is the complete opposite:

    >>> inf = float('inf')
    >>> inf == inf
    True

Fortunately:

    def isnan(x):
        return x != x

    _inf = float('inf')
    def isinf(x):
        return x == _inf
    del _inf

both do the trick here. I would like to have both modes (non-exception-ing and exception-ing) of IEEE-style float available in Python, and am not too picky about how they would be implemented or which one would be the default. Python could also paper over the brokenness of various actual implementations (where signalling vs quiet NaNs, and so on, do not quite work right in all cases), with some performance penalty on non-conformant hardware.

-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list
Re: float(nan) in set or as key
In article irv6ev01...@news1.newsguy.com I wrote, in part:

    _inf = float('inf')
    def isinf(x):
        return x == _inf
    del _inf

Oops, take out the del, or otherwise fix the obvious problem, e.g., perhaps:

    def isinf(x):
        return x == isinf._inf
    isinf._inf = float('inf')

(Of course, if something like this were adopted properly, it would all be in the base float type anyway.)

-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list
Re: Code Review
    find DIRLIST -ctime +N \( -type d -o -type f \) -exec rm -rf {} \;

but can also do a great deal more since (a) it has many other options than just -ctime, and (b) -exec will execute any arbitrary command.

---

    import os
    import time
    import shutil
    import argparse
    import sys

    def main():
        """main program: parse arguments, and clean out directories."""
        parser = argparse.ArgumentParser(
            description="Delete files and folders in a directory N days old",
            prog="directorycleaner")
        parser.add_argument("days", type=int,
            help="Numeric value: delete files and folders older than N days")
        parser.add_argument("directory", nargs="+",
            help="delete files and folders in this directory")
        args = parser.parse_args()
        for dirname in args.directory:
            clean_dir(dirname, args.days)

    def clean_dir(dirname, n_days):
        """Clean one directory of files / subdirectories older than
        the given number of days."""
        time_to_live = n_days * 86400  # 86400 = seconds-per-day
        current_time = time.time()
        try:
            contents = os.listdir(dirname)
        except OSError, err:
            sys.exit("can't read %s: %s" % (dirname, err))
        for filename in contents:
            # Get the path of the file name
            path = os.path.join(dirname, filename)
            # Get the creation time of the file
            # NOTE: this only works on Windows-like systems
            when_created = os.path.getctime(path)
            # If the file/directory has expired, remove it
            if when_created + time_to_live < current_time:
                if os.path.isfile(path):
                    print "os.remove(%s)" % path
                # It is not a file, it is a directory
                elif os.path.isdir(path):
                    print "shutil.rmtree(%s)" % path

    if __name__ == "__main__":
        main()

-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list
Re: Condition.wait(timeout) oddities
In article 94d1d127-b423-4bd4-853c-d92da9ac7...@glegroupsg2000goo.googlegroups.com
Floris Bruynooghe comp.lang.pyt...@googlegroups.com wrote:
I'm a little confused about the corner cases of Condition.wait() with a
timeout parameter in the threading module.  When looking at the code the
first thing that I don't quite get is that the timeout should never work
as far as I understand it.  .wait() always needs to return while holding
the lock, therefore it does an .acquire() on the lock in a finally
clause.  Thus pretty much ignoring the timeout value.

It does not do a straight acquire, it uses self._acquire_restore(),
which for a condition variable, does instead:

    self.__block.acquire()
    self.__count = count
    self.__owner = owner

(assuming that you did not override the lock argument or passed in a
threading.RLock() object as the lock), due to this bit of code in
_Condition.__init__():

    # If the lock defines _release_save() and/or _acquire_restore(),
    # these override the default implementations (which just call
    # release() and acquire() on the lock).  Ditto for _is_owned().
    [snippage]
    try:
        self._acquire_restore = lock._acquire_restore
    except AttributeError:
        pass

That is, the lock it holds is the one on the blocking lock (the __block
of the underlying RLock), which is the same one you had to hold in the
first place to call the .wait() function.

To put it another way, the lock that .wait() waits for is a new lock
allocated for the duration of the .wait() operation:

    waiter = _allocate_lock()
    waiter.acquire()
    self.__waiters.append(waiter)
    saved_state = self._release_save()
    ... here we wait for lock <waiter>, with timeout ...
    self._acquire_restore(saved_state)  # the last stmt is the finally
                                        # clause, I've just un-indented it

which is entirely different from the lock that .wait() re-acquires (and
which you held when you called .wait() initially) before it returns.
The second issue is that while looking around for this I found two bug
reports: http://bugs.python.org/issue1175933 and
http://bugs.python.org/issue10218.  Both are proposing to add a return
value indicating whether the .wait() timed out or not, similar to the
other .wait() methods in threading.  However the first was rejected
after some (seemingly inconclusive) discussion.

Tim Peters' reply seemed pretty conclusive to me.  :-)

While the latter had minimal discussion and was accepted without
reference to the earlier attempt.  Not sure if this was a process
oversight or what, but it does leave the situation confusing.  But
regardless I don't understand how the return value can be used
currently: yes you did time out but you're still promised to hold the
lock thanks to the .acquire() call on the lock in the finally block.

The return value is not generally useful for the reasons Tim Peters
noted originally.  Those are all still true even in the second
discussion.

In my small brain I just can't figure out how Condition.wait() can both
respect a timeout parameter and the promise to hold the lock on return.

Remember, two different locks. :-)  There is a lock on the state of the
condition variable itself, and then there is a lock on which one
actually waits.  On both entry to and return from .wait(), you (the
caller) hold the lock on the state of the condition variable, so you may
inspect it and proceed based on the result.  In between, you give up
that lock, so that other threads may obtain it and change the state of
the condition variable.

It seems to me that the only way to handle the timeout is to raise an
exception rather than return a value because when you get an exception
you can break the promise of holding the lock.

That *would* be a valid way to implement a timeout -- to return with the
condition variable lock itself no longer held -- but that would require
changing lots of other code structure.
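For what it's worth, Python 3.2 and later adopted exactly the accepted proposal: Condition.wait(timeout) returns False on timeout, and the caller still holds the outer condition lock on return. A small sketch that times out on purpose:

```python
import threading

cond = threading.Condition()

with cond:                          # take the condition's outer lock
    # No other thread will ever notify us, so this must time out.
    notified = cond.wait(timeout=0.05)
# wait() re-acquired the outer lock before returning; exiting the
# "with" block is what finally released it.
print(notified)   # False
```

This is the two-locks story in miniature: the timeout applies to the internal waiter lock, while the outer lock is always held again by the time wait() returns.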
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)   http://web.torek.net/torek/index.html
--
http://mail.python.org/mailman/listinfo/python-list
Re: Converting a set into list
Chris Torek nos...@torek.net wrote:

    >>> x = [3, 1, 4, 1, 5, 9, 2, 6]
    >>> list(set(x))

This might not be the best example since the result is sorted by
accident, while other list(set(...)) results are not.

In article Xns9EE772D313153duncanbooth@127.0.0.1, Duncan Booth
duncan.bo...@suttoncourtenay.org.uk wrote:
A minor change to your example makes it out of order even for integers:

    >>> x = [7, 8, 9, 1, 4, 1]
    >>> list(set(x))
    [8, 9, 1, 4, 7]

or for that matter:

    >>> list(set([3, 32, 4, 32, 5, 9, 2, 6]))
    [32, 2, 3, 4, 5, 6, 9]

Yes, but then it is no longer as easy as pi. :-)
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)   http://web.torek.net/torek/index.html
--
http://mail.python.org/mailman/listinfo/python-list
Re: Converting a set into list
In article 871v00j2bh@benfinney.id.au
Ben Finney ben+pyt...@benfinney.id.au wrote:
As pointed out: you already know how to create a set from an object;
creating a list from an object is very similar:

    list(set(aa))

But why are you doing that?  What are you trying to achieve?

I have no idea why someone *else* is doing that, but I have used this
very expression to unique-ize a list:

    >>> x = [3, 1, 4, 1, 5, 9, 2, 6]
    >>> x
    [3, 1, 4, 1, 5, 9, 2, 6]
    >>> list(set(x))
    [1, 2, 3, 4, 5, 6, 9]

Of course, this trick only works if all the list elements are hashable.
This might not be the best example since the result is sorted by
accident, while other list(set(...)) results are not.  Add sorted() or
.sort() if needed:

    >>> x = ['three', 'one', 'four', 'one', 'five']
    >>> x
    ['three', 'one', 'four', 'one', 'five']
    >>> list(set(x))
    ['four', 'five', 'three', 'one']
    >>> sorted(list(set(x)))
    ['five', 'four', 'one', 'three']
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)   http://web.torek.net/torek/index.html
--
http://mail.python.org/mailman/listinfo/python-list
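If first-seen order matters rather than sorted order, dicts (insertion-ordered as of Python 3.7) give a related one-liner for hashable elements:

```python
x = ['three', 'one', 'four', 'one', 'five']
# dict.fromkeys keeps the first occurrence of each key, in order
unique_in_order = list(dict.fromkeys(x))
print(unique_in_order)   # ['three', 'one', 'four', 'five']
```

Like list(set(...)), this requires hashable elements; unlike it, the result is deterministic across runs.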
Re: checking if a list is empty
In article 4dcab8bf$0$29980$c3e8da3$54964...@news.astraweb.com
Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:
When you call len(x) you don't care about the details of how to
calculate the length of x.  The object itself knows so that you don't
have to.  The same applies to truth testing.  I have a data type that is
an array of lists.  When you call "if len(x) > 0" on it, it will blow up
in your face, because len(x) returns a list of lengths like
[12, 0, 2, 5].  But if you say "if x", it will do the right thing.  You
don't need to care how to truth-test my data type, because it does it
for you.  By ignoring my type's interface, and insisting on doing the
truth-test by hand, you shoot yourself in the foot.

What this really points out is that "if x" and "if len(x) > 0" are
*different tests*.  Consider xml.etree.ElementTree Element objects.  The
documentation says, in part:

    In ElementTree 1.2 and earlier, the sequence behavior means that an
    element without any subelements tests as false (since it's an empty
    sequence), even if it contains text and attributes. ...  Note: This
    behavior is likely to change somewhat in ElementTree 1.3.  To write
    code that is compatible in both directions, use ... len(element) to
    test for non-empty elements.

In this case, when x is an Element, the result of bool(x) *could* mean
just "x has sub-elements", but it could also/instead mean "x has
sub-elements, text, or attributes".

The issue raised at the beginning of this thread was: which test is
better when x is a list, the test that invokes bool(x), or the test
that invokes len(x)?  There is no answer to that, any more than there
is to "which ice cream flavor is best". [%]

A more interesting question to ask, in any given bit of code, is
whether bool(x) or len(x) is more appropriate for *all* the types x
might take at that point, rather than whether one or the other is
better for lists, where the result is defined as equivalent.
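To make the difference concrete, here is a small invented type (call it `Chunk`, loosely modeled on the Element behavior quoted above) for which the two tests give different answers:

```python
class Chunk:
    """Truthy if it has items OR text; len() counts only items."""
    def __init__(self, items=(), text=""):
        self.items = list(items)
        self.text = text
    def __len__(self):              # used by len(x)
        return len(self.items)
    def __bool__(self):             # used by "if x"
        return bool(self.items) or bool(self.text)

c = Chunk(items=(), text="payload")
print(bool(c), len(c) > 0)   # True False: the two tests disagree
```

For plain lists bool(x) and len(x) > 0 always agree; types like this one are exactly why the question "which test?" has no single answer.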
(The biggest problem with answering that tends to be deciding what
types x might take.)

[% Chocolate with raspberry, or mint, or similar.]
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)   http://web.torek.net/torek/index.html
--
http://mail.python.org/mailman/listinfo/python-list
Re: checking if a list is empty
In article 4dc6a39a$0$29991$c3e8da3$54964...@news.astraweb.com
Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote:
In English, [the word "not"] negates a word or statement: "the cat is
not on the mat" -- "the cat is on the mat" is false.

As a mostly off topic aside, English is considerably more complicated
than that.  There are people who use the word "not" as a purely boolean
operator (a la computer languages), so that "the cat is not not on the
mat" means the cat IS on the mat, but others use double negation as a
form of intensifier, so that the phrase with multiple "not"s is simply
a more emphatic claim: the cat is really, truly, *definitely*, not on
that particular mat. :-)

In various other natural languages -- i.e., languages meant for
human-to-human communications, rather than for computers -- multiple
negatives are more often (or always?) intensifiers.  Some languages
have the idea of negative matching in much the same sense that English
has number [%] matching: "the cat is on the mat" and "the cats are on
the mat" are OK because the noun and verb numbers match, but neither
"the cats is on the mat" nor "the cat are on the mat" is correct.

[% Number here is really 1 vs not-1: no cats, one cat, two cats.]

Of course, there are descriptivists and prescriptivists, and many of
the latter claim that using multi-valued boolean logic in English is
nonstandard or invalid.  Many of those in turn will tell you that
"ain't" ain't good English.  Still, one should be aware of these forms
and their uses, in much the same way as one should be able to boldly
split infinitives. :-)

Moving back towards on-topic-ness:

As an operator, "not" negates a true value to a false value.

In mathematical Boolean algebra, there is only one true value and one
false value, conventionally called True/False or 1/0.  In non-Boolean
algebras, you can define other values.  In three-value logic, the
negation of True/False/Maybe is usually False/True/Maybe.
In fuzzy logic, the logic values are the uncountable infinity (that's a
technical term, not hyperbole) of real numbers between 0 and 1.

Or, to put it another way, before we can communicate clearly, we have
to pick out a set of rules.  Most computer languages do this pretty
well, and Python does a good (and reasonably conventional) job: Python
uses a boolean algebra where there are many ways of spelling the true
and false values.  The "not" operator returns the canonical bool
values:

    not <any true value>   returns False
    not <any false value>  returns True

Take note of the distinction between lower-case true/false, which are
adjectives, and True/False, which are objects of class bool.  (At least
as of current versions of Python -- in much older versions there was no
real distinction between booleans and type int, presumably a holdover
from C.)

[remainder snipped as I have nothing else to add]
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)   http://web.torek.net/torek/index.html
--
http://mail.python.org/mailman/listinfo/python-list
Re: What other languages use the same data model as Python?
John Nagle wrote:
A reasonable compromise would be that "is" is treated as "==" on
immutable objects.

(Note: I have no dog in this fight, I would be happy with a changed
"is" or with the current one -- leaky abstractions are fine with me,
provided I am told *when* they may -- or sometimes may not -- leak.
:-) )

On 5/5/2011 3:06 AM, Gregory Ewing wrote:
That wouldn't work for tuples, which can contain references to other
objects that are not immutable.

On Thu, May 5, 2011 at 9:41 AM, John Nagle na...@animats.com wrote:
Such tuples are still identical, even if they contain identical
references to immutable objects.

In article mailman.1196.1304613911.9059.python-l...@python.org
Ian Kelly ian.g.ke...@gmail.com wrote:

    >>> a = (1, 2, [3, 4, 5])
    >>> b = (1, 2, [3, 4, 5])
    >>> a == b
    True
    >>> a is b  # Using the proposed definition
    True

I believe that John Nagle's proposal would make "a is b" false, because
while a and b are both immutable, they contain *different* references
to *mutable* objects (thus failing the "identical references to
immutable objects" part of the claim).

On the other hand, should one do:

    L = [3, 4, 5]
    a = (1, 2, L)
    b = (1, 2, L)

then "a is b" should (I say) be True under the proposal -- even though
they contain (identical) references to *mutable* objects.

Loosely speaking, we would define the "is" relation as:

    (x is y) if and only if
        (id(x) == id(y) or
         (x is immutable and y is immutable and
          (for all components xi and yi of x, xi is yi)))

In this case, even if the tuples a and b have different id()s, we would
find that both have an immutable type, and both have components -- in
this case, numbered, subscriptable tuple elements, but instances of
immutable class types like decimal.Decimal would have dictionaries
instead -- and thus we would recursively apply the modified "is"
definition to each element.  (For tuples, the "all components" implies
that the lengths must be equal; for class instances, it implies that
they need to have is-equal attributes, etc.)
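The loose definition above can be sketched as code. Here `struct_is` is a made-up name, it handles only a few known-immutable built-ins, and (as discussed elsewhere in the thread) deciding true immutability in general is the hard part:

```python
def struct_is(x, y):
    # Identity always wins, exactly as with today's "is".
    if x is y:
        return True
    # Otherwise both must be the same (known-immutable) type.
    if type(x) is not type(y):
        return False
    if isinstance(x, tuple):
        # Recurse componentwise; lengths must match.
        return (len(x) == len(y) and
                all(struct_is(a, b) for a, b in zip(x, y)))
    if isinstance(x, (int, float, complex, str, bytes, frozenset)):
        return x == y
    return False   # mutable or unknown type: plain identity semantics

L = [3, 4, 5]
print(struct_is((1, 2, L), (1, 2, L)))       # True: identical mutable component
print(struct_is((1, 2, [3]), (1, 2, [3])))   # False: different lists
```

The two prints mirror the thread's examples: shared references to a mutable object pass, distinct-but-equal mutable objects fail.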
It's not entirely clear to me whether different immutable classes
(i.e., different types) but with identical everything-else should
compare equal under this modified "is".  I.e., today:

    $ cp /usr/lib/python2.?/decimal.py /tmp/deccopy.py
    $ python
    ...
    >>> sys.path.append('/tmp')
    >>> import decimal
    >>> import deccopy
    >>> x = decimal.Decimal('1')
    >>> y = deccopy.Decimal('1')
    >>> print x, y
    1 1
    >>> x == y
    False

and obviously "x is y" is currently False:

    >>> type(x)
    <class 'decimal.Decimal'>
    >>> type(y)
    <class 'deccopy.Decimal'>

However, even though the types differ, both x and y are immutable [%]
and obviously (because I copied the code) they have all the same
operations.  Since they were both created with the same starting value,
x and y will behave identically given identical treatment.  As such, it
might be reasonable to ask that "x is y" be True rather than False.

[% This is not at all obvious -- I have written an immutable class, and
it is pretty easy to accidentally mutate an instance inside the class
implementation.  There is nothing to prevent this in CPython, at least.
If there were a minor bug in the decimal.Decimal code such that
x.invoke_bug() modified x, then x would *not* be immutable, even though
it is intended to be.  (As far as I know there are no such bugs in
decimal.Decimal, it's just that I had them in my Money class.)]
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)   http://web.torek.net/torek/index.html
--
http://mail.python.org/mailman/listinfo/python-list
Re: What other languages use the same data model as Python?
In article GOmwp.13554$vp.9...@newsfe14.iad
harrismh777 harrismh...@charter.net wrote:
There may be some language somewhere that does pass-by-reference which
is not implemented under the hood as pointers, but I can't think of
any... 'cause like I've been saying, way down under the hood, we only
have direct and indirect memory addressing in today's processors. EOS.

There have been Fortran compilers that implemented modification of
variables via value-result rather than by-reference.  This is perhaps
best illustrated by some code fragments:

      SUBROUTINE FOO(X, Y)
      INTEGER X, Y
      ...
      X = 3
      Y = 4
      RETURN
      END

      SUBROUTINE BAR(A)
      INTEGER A
      CALL FOO(A, 0)
      RETURN
      END

might compile to the equivalent of the following C code:

    void foo(int *x0, int *y0) {
        int x = *x0, y = *y0;
        ...
        *x0 = x;
        *y0 = y;
    }

    void bar(int *a0) {
        int a = *a0;
        int temp = 0;
        foo(&a, &temp);
        *a0 = a;
    }

In order to allow both by-reference and value-result, Fortran forbids
the programmer to peek at the machinery.  That is, the following
complete program is invalid:

      SUBROUTINE PEEK(X)
      INTEGER X, GOTCHA
      COMMON /BLOCK/ GOTCHA
      PRINT *, 'INITIALLY GOTCHA = ', GOTCHA
      X = 4
      PRINT *, 'AFTER X=4 GOTCHA = ', GOTCHA
      RETURN
      END

      PROGRAM MAIN
      INTEGER GOTCHA
      COMMON /BLOCK/ GOTCHA
      GOTCHA = 3
      CALL PEEK(GOTCHA)
      PRINT *, 'FINALLY GOTCHA = ', GOTCHA
      STOP
      END

(It has been so long since I used Fortran that the above may not be
quite right in ways other than the one intended.  Please forgive small
errors. :-) )

The trick in subroutine peek is that it refers to both a global
variable (in Fortran, simulated with a common block) and a "dummy
variable" (as it is termed in Fortran) -- the parameter that aliases
the global variable -- in such a way that we can see *when* the change
happens.  If gotcha starts out set to 3, remains 3 after assignment to
x, and changes to 4 after peek() returns, then peek() effectively used
value-result to change the parameter.  If, on the other hand, gotcha
became 4 immediately after the assignment to x, then peek() effectively
used by-reference.
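(Python, for comparison, is neither by-reference nor value-result: it passes object references by value. A sketch of the same peek experiment, with a one-element list standing in for the COMMON block variable; the names here are invented for illustration:)

```python
GOTCHA = [3]            # one mutable cell, shared like a COMMON variable

def peek(x):
    # x is the very same list object as GOTCHA, so the change is
    # visible to the "global" at the moment of assignment -- like
    # by-reference, never deferred the way value-result would be.
    x[0] = 4
    return GOTCHA[0]    # observed *during* the call

seen_inside = peek(GOTCHA)
print(seen_inside, GOTCHA[0])   # 4 4
```

Had Python used value-result, `seen_inside` would still be 3 and GOTCHA would only become [4] after the call returned.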
The key take-away here is not so much the trick by which we peeked
inside the implementation (although peeking *is* useful in solving the
murder mystery we have after some program aborts with a core-dump or
what-have-you), but rather the fact that the Fortran language proper
forbids us from peeking at all.  By forbidding it -- by making the
program illegal -- the language provides implementors the freedom to
use *either* by-reference or value-result.  All valid Fortran programs
behave identically under either kind of implementation.

Like it or not, Python has similar defined as undefined grey areas: one
is not promised, for instance, whether the "is" operator is always True
for small integers that are equal (although it is in CPython), nor when
__del__ is called (if ever), and so on.  As with the Python-named-Monty,
we have "rigidly defined areas of doubt and uncertainty".  These exist
for good reasons: to allow different implementations.
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)   http://web.torek.net/torek/index.html
--
http://mail.python.org/mailman/listinfo/python-list
Re: What other languages use the same data model as Python?
In article iq1e0j02...@news2.newsguy.com I wrote, in part:
Like it or not, Python has similar defined as undefined grey areas: one
is not promised, for instance, whether the "is" operator is always True
for small integers that are equal (although it is in CPython), nor when
__del__ is called (if ever), and so on.  As with the Python-named-Monty,
we have "rigidly defined areas of doubt and uncertainty".  These exist
for good reasons: to allow different implementations.

Oops, attribution error: this comes from Douglas Adams rather than
Monty Python.
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)   http://web.torek.net/torek/index.html
--
http://mail.python.org/mailman/listinfo/python-list
Re: Popen Question
In article 891a9a80-c30d-4415-ac81-bddd0b564...@g13g2000yqj.googlegroups.com
moogyd moo...@yahoo.co.uk wrote:

    [sde:st...@lbux03 ~]$ python
    Python 2.6 (r26:66714, Feb 21 2009, 02:16:04)
    [GCC 4.3.2 [gcc-4_3-branch revision 141291]] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import os, subprocess
    >>> os.environ['MYVAR'] = 'myval'
    >>> p = subprocess.Popen(['echo', '$MYVAR'], shell=True)

Alain Ketterlin has already explained these to some extent.  Here is a
bit more.  This runs, underneath:

    ['/bin/sh', '-c', 'echo', '$MYVAR']

(with arguments expressed as a Python list).  /bin/sh takes the string
after '-c' as a command, and the remaining argument(s) if any are
assigned to positional parameters ($0, $1, etc).  If you replace the
command with something a little more explicit, you can see this:

    >>> p = subprocess.Popen(
    ...     [r'echo \$0=$0 \$1=$1', 'arg0', '$MYVAR'], shell=True)
    $0=arg0 $1=$MYVAR
    >>> p.wait()
    0

(I like to call p.communicate() or p.wait(), although p.communicate()
is pretty much a no-op if you have not done any redirecting.  Note that
p.communicate() does a p.wait() for you.)

    >>> p = subprocess.Popen(['echo', '$MYVAR'])
    $MYVAR

This time, as Alain noted, the shell does not get involved so no
variable expansion occurs.  However, you could do it yourself:

    >>> p = subprocess.Popen(['echo', os.environ['MYVAR']])
    myval
    >>> p.wait()
    0
    >>> p = subprocess.Popen('echo $MYVAR', shell=True)
    myval

(here /bin/sh does the expansion, because you invoked it)

    >>> p = subprocess.Popen('echo $MYVAR')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python2.6/subprocess.py", line 595, in __init__
        errread, errwrite)
      File "/usr/lib64/python2.6/subprocess.py", line 1106, in _execute_child
        raise child_exception
    OSError: [Errno 2] No such file or directory

This attempted to run the executable named 'echo $MYVAR'.  It did not
exist so the underlying exec (after the fork) failed.
The exception was passed back to the subprocess module, which raised it
in the parent for you to see.  If you were to create an executable
named 'echo $MYVAR' (including the blank and dollar sign) somewhere in
your path (or use an explicit path to it), it would run.  I will also
capture the actual output this time:

    $ cat '/tmp/echo $MYVAR'
    #! /usr/bin/awk NR>1{print}
    this is a self-printing file
    anything after the first line has NR > 1, so gets printed
    $ chmod +x '/tmp/echo $MYVAR'
    $ python
    Python 2.5.1 (r251:54863, Feb  6 2009, 19:02:12)
    [GCC 4.0.1 (Apple Inc. build 5465)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import subprocess
    >>> p = subprocess.Popen('/tmp/echo $MYVAR', stdout=subprocess.PIPE)
    >>> print p.communicate()[0]
    this is a self-printing file
    anything after the first line has NR > 1, so gets printed

    >>> p.returncode
    0

Incidentally, fun with #!: you can make self-renaming scripts:

    sh-3.2$ echo '#! /bin/mv' > /tmp/selfmove; chmod +x /tmp/selfmove
    sh-3.2$ ls /tmp/*move*
    /tmp/selfmove
    sh-3.2$ /tmp/selfmove /tmp/I_moved
    sh-3.2$ ls /tmp/*move*
    /tmp/I_moved
    sh-3.2$

or even self-removing scripts:

    sh-3.2$ echo '#! /bin/rm' > /tmp/rmme; chmod +x /tmp/rmme
    sh-3.2$ /tmp/rmme
    sh-3.2$ /tmp/rmme
    sh: /tmp/rmme: No such file or directory

(nothing to do with python, just the way #! interpreter lines work).
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)   http://web.torek.net/torek/index.html
--
http://mail.python.org/mailman/listinfo/python-list
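In Python 3 terms the same expansion distinctions look like this (a hedged sketch; it assumes a POSIX system with /bin/sh and echo on the PATH):

```python
import os
import subprocess

os.environ["MYVAR"] = "myval"

# No shell: the literal string "$MYVAR" reaches echo unexpanded.
literal = subprocess.run(["echo", "$MYVAR"],
                         capture_output=True, text=True).stdout.strip()

# Expand it ourselves, still with no shell involved.
expanded = subprocess.run(["echo", os.environ["MYVAR"]],
                          capture_output=True, text=True).stdout.strip()

# Hand the whole string to /bin/sh, which does the expansion.
via_shell = subprocess.run("echo $MYVAR", shell=True,
                           capture_output=True, text=True).stdout.strip()

print(literal, expanded, via_shell)   # $MYVAR myval myval
```

Note that the list-plus-shell=True form from the original session is still the same trap in Python 3: extra list items become $0, $1, ... of the shell, not arguments to echo.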
Re: Interaction btw unittest.assertRaises and __getattr__. Bug?
In article c38a5bb4-6087-453c-8873-f193927fd...@d8g2000yqf.googlegroups.com
Inyeol inyeol@gmail.com wrote:
[snippage below]

    import unittest

    class C():
        def __getattr__(self, name):
            raise AttributeError

    class Test(unittest.TestCase):
        def test_getattr(self):
            c = C()
            self.assertRaises(AttributeError, c.foo)

    unittest.main()

... or am I missing something obvious?

As Benjamin Peterson noted, the error occurs too soon, so that the
unittest code never has a chance to see it.  The "something obvious" is
to defer the evaluation just long enough:

    self.assertRaises(AttributeError, lambda: c.foo)
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)   http://web.torek.net/torek/index.html
--
http://mail.python.org/mailman/listinfo/python-list
Re: merge list of tuples with list
On Wed, Oct 20, 2010 at 1:33 PM, Daniel Wagner
brocki2...@googlemail.com wrote:
Any more efficient ways or suggestions are still welcome!

In article mailman.58.1287547882.2218.python-l...@python.org
James Mills prolo...@shortcircuit.net.au wrote:
Did you not see Paul Rubin's solution:

    >>> [x+(y,) for x,y in zip(a,b)]
    [(1, 2, 3, 7), (4, 5, 6, 8)]

I think this is much nicer and probably more efficient.

For a slight boost in Python 2.x, use itertools.izip() to avoid making
an actual list out of zip(a,b).  (In 3.x, plain zip() is already an
iterator rather than a list-result function.)

This method (Paul Rubin's) uses only a little extra storage, and almost
no extra when using itertools.izip() (or 3.x).  I think it is more
straightforward than multi-zip-ing (e.g., zip(*zip(*a) + [b])) as well.
The two-zip method needs list()-s in 3.x as well, making it clearer
where the copies occur:

    list(zip(*a)) makes the list [(1, 4), (2, 5), (3, 6)]
        [input value is still referenced via a so sticks around]

    [b] makes the tuple (7, 8) into the list [(7, 8)]
        [input value is still referenced via b so sticks around]

    + adds those two lists producing the list
    [(1, 4), (2, 5), (3, 6), (7, 8)]
        [the two input values are no longer referenced and are thus
         discarded]

    list(zip(*that)) makes the list [(1, 2, 3, 7), (4, 5, 6, 8)]
        [the input value -- the result of the addition in the next to
         last step -- is no longer referenced and thus discarded]

All these temporary results take up space and time.  The list
comprehension simply builds the final result, once.  Of course, I have
not used timeit to try this out.
:-)  Let's do that, just for fun (and to let me play with timeit from
the command line):

(I am not sure why I have to give the full path to the timeit.py source
here)

    sh-3.2$ python /System/Library/Frameworks/Python.framework/\
    Versions/2.5/lib/python2.5/timeit.py \
        'a=[(1,2,3),(4,5,6)];b=(7,8);[x+(y,) for x,y in zip(a,b)]'
    10 loops, best of 3: 2.55 usec per loop
    sh-3.2$ python [long path snipped] \
        'a=[(1,2,3),(4,5,6)];b=(7,8);[x+(y,) for x,y in zip(a,b)]'
    10 loops, best of 3: 2.56 usec per loop
    sh-3.2$ python [long path snipped] \
        'a=[(1,2,3),(4,5,6)];b=(7,8);zip(*zip(*a) + [b])'
    10 loops, best of 3: 3.84 usec per loop
    sh-3.2$ python [long path snipped] \
        'a=[(1,2,3),(4,5,6)];b=(7,8);zip(*zip(*a) + [b])'
    10 loops, best of 3: 3.85 usec per loop

Hence, even in 2.5 where zip makes a temporary copy of the list, the
list comprehension version is faster.  Adding an explicit use of
itertools.izip does help, but not much, with these short lists:

    sh-3.2$ python ... -s 'import itertools' \
        'a=[(1,2,3),(4,5,6)];b=(7,8);[x+(y,) for x,y in itertools.izip(a,b)]'
    10 loops, best of 3: 2.27 usec per loop
    sh-3.2$ python ... -s 'import itertools' \
        'a=[(1,2,3),(4,5,6)];b=(7,8);[x+(y,) for x,y in itertools.izip(a,b)]'
    10 loops, best of 3: 2.29 usec per loop

(It is easy enough to move the assignments to a and b into the -s
argument, but it makes relatively little difference since the list
comprehension and two-zip methods both have the same setup overhead.
The import, however, is pretty slow, so it is not good to repeat it on
every trip through the 10 loops -- on my machine it jumps to 3.7
usec/loop, almost as slow as the two-zip method.)
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)   http://web.torek.net/torek/index.html
--
http://mail.python.org/mailman/listinfo/python-list
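(For anyone re-running this today: timeit's Python API sidesteps the full-path problem entirely. Timings vary by machine, so none are claimed here; the snippet only shows the mechanics.)

```python
import timeit

setup = "a = [(1, 2, 3), (4, 5, 6)]; b = (7, 8)"

# Paul Rubin's list comprehension...
t_comp = timeit.timeit("[x + (y,) for x, y in zip(a, b)]",
                       setup=setup, number=100000)
# ...versus the two-zip method, with the list() calls 3.x requires.
t_zip = timeit.timeit("list(zip(*list(zip(*a)) + [b]))",
                      setup=setup, number=100000)

print(t_comp, t_zip)   # seconds for 100000 runs of each
```

Passing the setup code via the `setup` argument plays the same role as the command line's `-s` option.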
Re: Catching a SIGSEGV signal on an import
(I realize this is old but I am recovering from dental surgery and,
while on the Good Drugs for the pain, going through old stuff on
purpose :-) )

On Thu, 09 Sep 2010 05:23:14 -0700, Ryan wrote:
In general, is there anyway to catch a SIGSEGV on import?

In article pan.2010.09.09.21.20.26.16...@nowhere.com,
Nobody nob...@nowhere.com wrote:
No.  If SIGSEGV is raised, it often indicates that memory has been
corrupted.  At that point, you can't assume that the Python runtime is
still functional.

Indeed.  Still, there *is* a way to do this, should you choose to live
somewhat dangerously.  First, make a copy of the original process.
Using Unix as an example:

    pid = os.fork()
    if pid == 0:
        # child
        import untrustworthy
        os._exit(0)

The import will either succeed or fail.  If it fails with a SIGSEGV the
child process will die; if not, the child will move on to the next
statement and exit (using os._exit() to bypass exit handlers, since
this is a forked child etc).  The parent can then do a waitpid and see
whether the child was able to do the import.

The obvious flaw in this method is that something that causes Python to
die with a SIGSEGV when imported probably has some serious bugs in it,
and depending on the state of the importing process, these bugs might
not cause a problem immediately, but instead set time-bombs that will
go off later.  In this case, the child import will succeed and the
parent will then trust the import itself (note that you have to re-do
the same import in the parent as it is completely independent after the
fork()).

Still, if you are dead set on the idea, the test code below that I
threw together here may be helpful.

---
import os, signal, sys

pid = os.fork()
if pid == 0:
    # deliberately not checking len(sys.argv) nor using try --
    # this allows you to see what happens if you run "python t.py"
    # instead of "python t.py sig" or "python t.py fail" or
    # "python t.py ok", for instance.
    if sys.argv[1] == 'sig':
        os.kill(os.getpid(), signal.SIGSEGV)
    if sys.argv[1] == 'fail':
        os._exit(1)
    # Replace the above stuff with the untrustworthy import,
    # assuming you like the general idea.
    os._exit(0)

print 'parent: child =', pid
wpid, status = os.waitpid(pid, 0)
print 'wpid =', wpid, 'status =', status
if os.WIFSIGNALED(status):
    print 'child died from signal', os.WTERMSIG(status)
    if os.WCOREDUMP(status):
        print '(core dumped)'
elif os.WIFEXITED(status):
    print 'child exited with', os.WEXITSTATUS(status)
    # at this point the parent can repeat the import
else:
    print 'I am confused, maybe I got the wrong pid'
---

The same kind of thing can be done on other OSes, but all the details
will differ.
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)   http://web.torek.net/torek/index.html
--
http://mail.python.org/mailman/listinfo/python-list
Re: Simple logging example doesn't work!
, in that the # name to underlying file mapping can change in the presence of # a rename. Do not use this for security-issue operations.) def fd_is_open_to(fileno, filename): try: s2 = os.stat(filename) except OSError: return False s1 = os.fstat(fileno) return s1.st_dev == s2.st_dev and s1.st_ino == s2.st_ino errs = False # Configure our logs as directed. logconf = conf['logging'] # First, adjust stderr output level. We deliberately do # this before changing other log handers, so that new debug # messages printed here can be seen. (Maybe should do raise # now and lower later, but does not seem worth the effort.) level = logging.getLevelName(logconf['stderr-level'].upper()) g.stderr_logger.setLevel(level) # Gripe about old unsupported config, if needed. if conf['USE-FAST-LOGGER']: logger.error('FAST logger no longer supported') errs = True # Now set up syslog logger, if any. syslog_to = logconf['syslog-to'] if syslog_to: # Might be nice to remember previous syslog-to (if any) # and not create and delete handler if unchanged. (But # see comments elsewhere within this function.) addr = get_syslog_addr(syslog_to) logger.debug('syslog to: %s' % str(addr)) try: sh = logging.handlers.SysLogHandler(addr, logging.handlers.SysLogHandler.LOG_DAEMON) sh.setFormatter(logging.Formatter(g.syslog_format)) except IOError, e: logger.error('syslog-to: %s', e) errs = True sh = g.syslog_logger level = logging.getLevelName(logconf['syslog-level'].upper()) if sh: sh.setLevel(level) else: logger.debug('syslog logging suppressed') sh = None # And file logger, if any. 
filepath = logconf['file'] if filepath: if not os.path.isabs(filepath): newpath = os.path.join(conf['NODEMGR-BASE-PATH'], filepath) logger.warning('logging file=%s: relative path converted to %s', filepath, newpath) filepath = newpath logger.debug('filelog to: %s' % str(filepath)) mode = logconf['mode'] maxsize = logconf['max-size'] try: maxsize = utils.string_to_bytes(maxsize) except ValueError: logger.error('logging max-size=%s: not a valid size', maxsize) maxsize = 1 * 1024 * 1024 # 1 MB backup_count = logconf['backup-count'] level = logging.getLevelName(logconf['level'].upper()) # If mode is 'w' and maxsize==0, this will open an existing # file for writing, truncating it. If the existing file is # our own currently-open log file, this does the wrong thing: # we really only want any new level to apply. # # (If mode is 'a', it's harmless to re-open it, and if # maxsize0 the RotatingFileHandler changes the mode to 'a'. # In these cases we want to pick up any max-size or backup-count # changes as well.) fh = g.file_logger if mode == 'w' and maxsize == 0 else None if fh and fd_is_open_to(fh.stream.fileno(), filepath): pass # use it unchanged else: try: fh = logging.handlers.RotatingFileHandler(filepath, mode, maxsize, backup_count) fh.setFormatter(logging.Formatter(g.log_format)) except IOError, e: logger.error('log to file: %s', e) errs = True fh = g.file_logger if fh: fh.setLevel(level) else: logger.debug('file logging suppressed') fh = None if not errs: # Swap out syslog and file loggers last, so that any previous # logging about syslog logging and file logging goes to the # old loggers (if any). g.syslog_logger = swapout(g.syslog_logger, sh) g.file_logger = swapout(g.file_logger, fh) return errs -- In-Real-Life: Chris Torek, Wind River Systems Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603 email: gmail (figure it out) http://web.torek.net/torek/index.html -- http://mail.python.org/mailman/listinfo/python-list
Re: ANN: stats 0.1a calculator statistics for Python
In article <mailman.23.1287437081.15964.python-l...@python.org>
Vlastimil Brom <vlastimil.b...@gmail.com> wrote:
>2010/10/17 Steven D'Aprano <st...@remove-this-cybersource.com.au>:
>> http://pypi.python.org/pypi/stats
>
>Thanks for this useful module!
>I just wanted to report a marginal error triggered in the doctests:
>
>Failed example:
>    isnan(float('nan'))
>Exception raised:
>    Traceback (most recent call last):
>      File "C:\Python25\lib\doctest.py", line 1228, in __run
>        compileflags, 1) in test.globs
>      File "<doctest __main__.isnan[0]>", line 1, in <module>
>        isnan(float('nan'))
>    ValueError: invalid literal for float(): nan
>
>(python 2.5.4 on win XP; this might be OS specific; probably in the
>newer versions float() was updated; the tests on 2.6 and 2.7 are ok)

Indeed it was; in older versions float() just invoked the C library
routines, so float('nan') works on Mac OS X python 2.5, for instance,
but then you run into the fact that math.isnan() is only in 2.6 and
later :-)

Workaround, assuming an earlier "from math import *":

    try:
        isnan(0.0)
    except NameError:
        def isnan(x):
            return x != x

Of course you are still stuck with float('nan') failing on Windows.
I have no quick and easy workaround for that one.
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list
Re: Spreadsheet-style dependency tracking
            ...
            visit(r.S[c], nstack)
        nstack.pop()
        r.L.append(node)

# Build set S of all cells (r.S) which gives their dependencies.
# By indexing by cell, we can find cells from dependencies in visit().
for row in sheet:
    for cell in row:
        if cell:
            r.S[cell] = cell
            cell.visited = False

# Now simply (initial-)visit all the cells.
for cell in r.S.itervalues():
    visit(cell)

# Now r.L defines an evaluation order; it has at least one cycle in it
# if r.cycles is nonempty.
return (r.L, r.cycles)
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list
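[Editorial note: the truncated visit() above is a depth-first search that appends each node to the evaluation order r.L only after all of its dependencies, recording any cycle it closes. A self-contained sketch of the same idea, using plain dicts and invented names in place of the poster's r and cell objects:

```python
def toposort(deps):
    """Given a mapping node -> iterable of prerequisite nodes, return
    (order, cycles): an evaluation order in which every node follows
    its prerequisites, and a list of any dependency cycles found."""
    order, cycles = [], []
    visited = set()
    onstack = []  # current DFS path, for cycle reporting

    def visit(node):
        if node in visited:
            return
        if node in onstack:
            # A back edge: the path from the earlier occurrence of
            # node to here is a cycle.
            cycles.append(onstack[onstack.index(node):] + [node])
            return
        onstack.append(node)
        for dep in deps.get(node, ()):
            visit(dep)
        onstack.pop()
        visited.add(node)
        order.append(node)  # all prerequisites already emitted

    for node in deps:
        visit(node)
    return order, cycles
```

For an acyclic sheet such as {'c': ['b'], 'b': ['a'], 'a': []} this yields the order a, b, c with no cycles; a mutual dependency like {'a': ['b'], 'b': ['a']} is reported in cycles instead.]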
Re: How to implement retrying a lock tidily in Python?
In article <4imro7-ds6@chris.zbmc.eu> tinn...@isbd.co.uk wrote:
>I'm writing some code that writes to a mbox file and want to retry
>locking the mbox file a few times before giving up.
...
>    dest = mailbox.mbox(mbName, factory=None)
>    for tries in xrange(3):
>        try:
>            dest.lock()
>            #
>            # Do some stuff to the mbox file
>            #
>            dest.unlock()
>            break  # done what we need, carry on
>        except mailbox.ExternalClashError:
>            log("Destination locked, try " + str(tries))
>            time.sleep(1)  # and try again
...
>but this doesn't really work 'nicely' because the break after
>dest.unlock() takes me to the same place as running out of the
>number of tries in the for loop.

Seems to me the right place for this is a little "wrapper lock" as
it were:

    def retried_lock(self, max_attempts=3):
        for tries in xrange(max_attempts):
            try:
                self.lock()
                return  # got the lock
            except mailbox.ExternalClashError:
                # log and sleep here
                ...
        raise mailbox.ExternalClashError  # or whatever

and now instead of dest.lock() you just do a dest.retried_lock().
Plumbing (including fitting this in as a context manager so that you
can just do "with dest" or some such) is left as an exercise, :-)
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list
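[Editorial note: the context-manager plumbing "left as an exercise" might look roughly like the sketch below. ExternalClashError and FakeBox are stand-ins invented here for a self-contained example; real code would catch mailbox.ExternalClashError and wrap an actual mailbox.mbox object.

```python
import time
from contextlib import contextmanager

class ExternalClashError(Exception):
    """Stand-in for mailbox.ExternalClashError."""

@contextmanager
def locked(box, max_attempts=3, delay=1.0):
    # Retry box.lock() a few times, then give up; always unlock on exit.
    for tries in range(max_attempts):
        try:
            box.lock()
            break  # got the lock
        except ExternalClashError:
            time.sleep(delay)  # and try again
    else:
        raise ExternalClashError('still locked after %d tries' % max_attempts)
    try:
        yield box
    finally:
        box.unlock()

class FakeBox:
    """Toy mailbox whose lock() fails the first `failures` times."""
    def __init__(self, failures):
        self.failures = failures
        self.locked = False
    def lock(self):
        if self.failures > 0:
            self.failures -= 1
            raise ExternalClashError()
        self.locked = True
    def unlock(self):
        self.locked = False
```

With this, the "do some stuff" body becomes simply `with locked(dest): ...`, and both the success path and the gave-up path are unambiguous: the former falls off the end of the with block, the latter raises.]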
Re: EOF while scanning triple-quoted string literal
On 2010-10-15, Grant Edwards <inva...@invalid.invalid> wrote:
>How do you create a [Unix] file with a name that contains a NULL byte?

On 2010-10-15, Seebs <usenet-nos...@seebs.net> wrote:
>So far as I know, in canonical Unix, you don't -- the syscalls all
>work with something like C strings under the hood, meaning that no
>matter what path name you send, the first null byte actually
>terminates it.

In article <i9a84m$rp...@reader1.panix.com>
Grant Edwards <inva...@invalid.invalid> wrote:
>Yes, all of the Unix syscalls use NULL-terminated path parameters
>(AKA C strings).  What I don't know is whether the underlying
>filesystem code also uses NULL-terminated strings for filenames or
>if they have explicit lengths.  If the latter, there might be some
>way to bypass the normal Unix syscalls and actually create a file
>with a NULL in its name -- a file that then couldn't be accessed via
>the normal Unix system calls.
>
>My _guess_ is that the underlying filesystem code in most all Unices
>also uses NULL-terminated strings, but I haven't looked yet.

Multiple common on-disk formats (BSD's UFS variants and Linux's EXTs,
for instance) use counted strings, so it is possible -- via disk
corruption or similar -- to get "impossible" file names (those
containing either an embedded NUL or an embedded '/').  More
notoriously, earlier versions of NFS could create files with embedded
slashes when serving non-Unix clients.  These were easily removed with
the same non-Unix client, but not on the server! :-)

None of this has anything to do with the original problem, in which a
triple-quoted string is left to contain arbitrary binary data (up to,
of course, the closing triple-quote).  Should that arbitrary binary
data itself happen to include a triple-quote, this trivial encoding
technique will fail.  (And of course, as others have noted, it fails
on some systems that distinguish between text and binary file formats
in the first place.)
This is why using some text-friendly encoding scheme, such as base64, is a good idea. -- In-Real-Life: Chris Torek, Wind River Systems Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603 email: gmail (figure it out) http://web.torek.net/torek/index.html -- http://mail.python.org/mailman/listinfo/python-list
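[Editorial note: base64 output is drawn from a 64-character ASCII alphabet (letters, digits, '+', '/', plus '=' padding) that contains no quote characters, so the encoded form can never collide with a closing triple-quote:

```python
import base64

# Arbitrary binary data, deliberately including a triple-quote.
data = b'\x00\xff """ arbitrary binary, quotes and all'

text = base64.b64encode(data).decode('ascii')

# The encoded form contains no quote characters at all ...
assert '"' not in text and "'" not in text
# ... and decodes back to the original bytes exactly.
assert base64.b64decode(text) == data
```

The encoded text is also safe for systems that munge line endings in "text" files, since the alphabet excludes control characters.]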
Re: what happens to Popen()'s parent-side file descriptors?
In message <pan.2010.10.15.06.27.02.360...@nowhere.com>, Nobody wrote:
>Another gotcha regarding pipes: the reader only sees EOF once there
>are no writers, i.e. when the *last* writer closes their end.

In article <i9atra$j4...@lust.ihug.co.nz>
Lawrence D'Oliveiro <l...@geek-central.gen.new_zealand> wrote:
>Been there, been bitten by that.

Nobody mentioned the techniques of setting close_fds = True and
passing a preexec_fn that closes the extra pipe descriptors.  You can
also use fcntl.fcntl() to set the fcntl.FD_CLOEXEC flag on the
underlying file descriptors (this of course requires that you are
able to find them).

The subprocess module sets FD_CLOEXEC on the pipe it uses to pass
back a failure to exec, or even to reach the exec, e.g., due to an
exception during preexec_fn.  One could argue that perhaps it should
set FD_CLOEXEC on the parent's remaining pipe descriptors, once the
child is successfully started, if it created them (i.e., if the
corresponding arguments were PIPE).  In fact, thinking about it now,
I *would* argue that.
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list
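[Editorial note: the fcntl.fcntl() technique mentioned above is the standard read-modify-write of the descriptor flags; a minimal Unix-only sketch (the helper name is invented):

```python
import fcntl
import os

def set_cloexec(fd):
    # Mark fd close-on-exec: it is closed automatically in any child
    # that exec()s, without affecting the current process.
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)

r, w = os.pipe()
set_cloexec(w)
w_flags = fcntl.fcntl(w, fcntl.F_GETFD)  # FD_CLOEXEC bit now set
```

Worth noting for modern readers: since Python 3.4 (PEP 446) newly created file descriptors are non-inheritable by default, and subprocess defaults to close_fds=True, so this dance is mostly needed when targeting older Pythons or descriptors obtained from elsewhere.]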
Re: what happens to Popen()'s parent-side file descriptors?
... that any as-yet-uncollected fork()ed processes are eventually
waitpid()-ed for.

>Can anyone explain the treatment of the pipe FDs opened in the parent
>by Popen() to me or point me to some documentation?

The best documentation seems generally to be the source.  Fortunately
subprocess.py is written in Python.  (Inspecting C modules is less
straightforward. :-) )

>Also, does Popen.returncode contain only the child's exit code or
>does it also contain signal info like the return of os.wait()?
>Documentation on this is also unclear to me.

"A negative value -N indicates that the child was terminated by
signal N (Unix only)."  Again, the Python source is handy:

    def _handle_exitstatus(self, sts):
        if os.WIFSIGNALED(sts):
            self.returncode = -os.WTERMSIG(sts)
        elif os.WIFEXITED(sts):
            self.returncode = os.WEXITSTATUS(sts)
        else:
            # Should never happen
            raise RuntimeError("Unknown child exit status!")

The only things left out are the core-dump flag, and
stopped/suspended.  The latter should never occur as os.waitpid() is
called with only os.WNOHANG, not os.WUNTRACED (of course a process
being traced, stopping at a breakpoint, would mess this up, but
subprocess.Popen is not a debugger :-) ).  It might be nice to
capture os.WCOREDUMP(sts), though.

Also, while I was writing this, I discovered what appears to be a
buglet in _cleanup(), with regard to abandoned Unix processes that
terminate due to a signal.  Note that _handle_exitstatus() will set
self.returncode to (e.g.) -1 if the child exits due to SIGHUP.
The _cleanup() function, however, does this in part:

    if inst.poll(_deadstate=sys.maxint) >= 0:
        try:
            _active.remove(inst)

The Unix-specific poll() routine, however, reads:

        if self.returncode is None:
            try:
                pid, sts = os.waitpid(self.pid, os.WNOHANG)
                if pid == self.pid:
                    self._handle_exitstatus(sts)
            except os.error:
                if _deadstate is not None:
                    self.returncode = _deadstate
        return self.returncode

Hence if pid 12345 is abandoned (and thus on _active), and we
os.waitpid(12345, os.WNOHANG) and get a status that has a termination
signal, we set self.returncode to -N, and return that.  Hence
inst.poll returns (e.g.) -1 and we never attempt to remove it from
_active.  Now that its returncode is not None, though, every later
poll() will continue to return -1.  It seems it would be better to
have _cleanup() read:

    if inst.poll(_deadstate=sys.maxint) is not None:

(Note, this is python 2.5, which is what I have installed on my Mac
laptop, where I am writing this at the moment).
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list
Re: what happens to Popen()'s parent-side file descriptors?
>on so this is not necessary, but again, using the communicate
>function will close them for you.

In this case, though, I am not entirely sure subprocess is the right
hammer -- it mostly will give you portability to Windows (well, plus
the magic for preexec_fn and reporting exec failure).

Once again, peeking at the source is the trick :-) ... the arguments
you provide for stdin, stdout, and stderr are used thus:

    if stdin is None:
        pass
    elif stdin == PIPE:
        p2cread, p2cwrite = os.pipe()
    elif isinstance(stdin, int):
        p2cread = stdin
    else:
        # Assuming file-like object
        p2cread = stdin.fileno()

(this is repeated for stdout and stderr) and the resulting integer
file descriptors (or None if not applicable) are passed to
os.fdopen() on the parent side.  (On the child side, the code does
the usual shell-like dance to move the appropriate descriptors to 0
through 2.)
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list
Re: My first Python program
In article <slrnibboof.29uv.usenet-nos...@guild.seebs.net>
Seebs <usenet-nos...@seebs.net> wrote:
>> * raising `Exception` rather than a subclass of it is uncommon.
>
>Okay.  I did that as a quick fix when, finally having hit one of
>them, I found out that 'raise Error message' didn't work.  :)
>I'm a bit unsure as to how to pick the right subclass, though.

For exceptions, you have two choices:

  - pick some existing exception that seems to make sense, or
  - define your own.

The obvious cases for the former are things like ValueError or
IndexError.  Indeed, in many cases, you just let a work-step raise
these naturally:

    def frobulate(self, x):
        ...
        self.table[x] += ...  # raises IndexError when x out of range
        ...

For the latter, make a class that inherits from Exception.  In a
whole lot of cases a trivial/empty class suffices:

    class SoLongAndThanksForAllTheFish(Exception):
        pass

    def ...:
        ...
        if somecondition:
            raise SoLongAndThanksForAllTheFish()

Since Exception provides a base __init__() function, you can include
a string:

    raise SoLongAndThanksForAllTheFish('RIP DNA')

which becomes the .message field:

    >>> x = SoLongAndThanksForAllTheFish('RIP DNA')
    >>> x.message
    'RIP DNA'
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list
Re: My first Python program
In article <mailman.1673.1286992432.29448.python-l...@python.org>
Jonas H. <jo...@lophus.org> wrote:
>On 10/13/2010 06:48 PM, Seebs wrote:
>>Is it safe for me to assume that all my files will have been flushed
>>and closed?  I'd normally assume this, but I seem to recall that not
>>every language makes those guarantees.
>
>Not really.  Files will be closed when the garbage collector collects
>the file object, but you can't be sure the GC will run within the
>next N seconds/instructions or something like that.  So you should
>*always* make sure to close files after using them.  That's what
>context managers were introduced for.
>
>    with open('foobar') as fileobject:
>        do_something_with(fileobject)
>
>basically is equivalent to (simplified!)
>
>    fileobject = open('foobar')
>    try:
>        do_something_with(fileobject)
>    finally:
>        fileobject.close()
>
>So you can be sure `fileobject.close()` is called in *any* case.

Unfortunately "with" is newish and this code currently has to support
python 2.3 (if not even older versions).
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list
Re: Class-level variables - a scoping issue
In article <4cb14f8c$0$1627$742ec...@news.sonic.net>
John Nagle <na...@animats.com> wrote:
>Here's an obscure bit of Python semantics which is close to being
>a bug:

[assigning to instance of class creates an attribute within the
instance, thus obscuring the class-level version of the attribute]

This is sort of a feature, but one I have been reluctant to use: you
can define "default values" for instances within the class, and only
write instance-specific values into instances as needed.  This would
save space in various cases, for instance.

>Python protects global variables from similar confusion by making
>them read-only when referenced from an inner scope without a
>"global" statement.  But that protection isn't applied to
>class-level variables referenced through 'self'.  Perhaps it
>should be.

It's not really clear to me how one would distinguish between
accidental and deliberate creation of these variables, syntactically
speaking.  If you want direct, guaranteed access to the
class-specific variable, using __class__ is perhaps the Best Way
right now:

    >>> class K:
    ...     x = 42
    ...     def __init__(self): pass
    ...
    >>> inst = K()
    >>> inst.x  # just to show that we're getting K.x here
    42
    >>> inst.x = 'hah'
    >>> inst.x
    'hah'
    >>> inst.__class__.x
    42

One could borrow the "nonlocal" keyword to mean "I know that there is
potential confusion here between instance-specific attribute and
class-level attribute", but the implication seems backwards:

    nonlocal self.foo

implies that you want self.foo to be shorthand for
self.__class__.foo, not that you know that self.__class__.foo exists
but you *don't* want to use that.  If Python had explicit local
variable declarations, then:

    local self.foo

would be closer to the implied semantics here.

As it is, I think Python gets this pretty much right, and if you
think this is more a bug than a feature, you can always insert
"assert" statements in key locations, e.g.:

    assert 'foo' not in inst.__class__.__dict__, \
        'overwriting class var foo'

(you can even make that a function using introspection, although it
could get pretty hairy).
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list
Re: list parameter of a recursive function
In article <rsuun7-eml@rama.universe>
TP <tribulati...@paralleles.invalid> wrote:
>I have a function f that calls itself recursively.  It has a list as
>second argument, with default argument equal to None (and not [], as
>indicated at:
>http://www.ferg.org/projects/python_gotchas.html#contents_item_6 )
>
>This is the outline of my function:
>
>    def f(argument, some_list=None):
>        if some_list == None:
>            some_list = []
>        [...]  # creation of a new_argument
>        # the function is called recursively only if some condition
>        # is respected
>        if some_condition:
>            some_list.append(elem)
>            f(new_argument, some_list)
>        # otherwise, we have reached a leaf of a branch of the
>        # recursive tree (said differently, terminal condition has
>        # been reached for this branch)
>        print "Terminal condition"
>
>The problem is that when the terminal condition is reached, we
>return back to some other branch of the recursive tree, but
>some_list has the value obtained in the previous branch!

Yes, this is the way it is supposed to work. :-)

>So, it seems that there is only one some_list list for all the
>recursive tree.  To get rid of this behavior, I have been compelled
>to do at the beginning of f:
>
>    import copy from copy

[from copy import copy, rather]

>    some_list = copy(some_list)
>
>I suppose this is not a surprise to you: I am compelled to create a
>new some_list with the same content.

The above will work, or for this specific case, you can write:

    some_list = list(some_list)

which has the effect of making a shallow copy of an existing list:

    >>> base = [1, 2]
    >>> l1 = base
    >>> l2 = list(l1)
    >>> l1 is l2
    False
    >>> l1[0] is l2[0]
    True
    >>> base.append(3)
    >>> l1
    [1, 2, 3]
    >>> l2
    [1, 2]

but will also turn *any* iterator into a (new) list; the latter may
often be desirable.

>So, if I am right, all is happening as if the parameters of a
>function are always passed by address to the function.  Whereas in
>C, they are always passed by copy (which gives relevance to
>pointers).  Am I right?

Mostly.  Python distinguishes between mutable and immutable items.
Mutable items are always mutable, immutable items are never mutable,
and the mutability of arguments is attached to their fundamental
mutability rather than to their being passed as arguments.  This is
largely a point-of-view issue (although it matters a lot to people
writing compilers, for instance).

Note that if f() is *supposed* to be able to modify its second
parameter under some conditions, you would want to make the copy not
at the top of f() but rather further in, and in this case, that
would be trivial:

    def f(arg, some_list=None):
        if some_list is None:
            some_list = []
        ...
        if some_condition:
            # make copy of list and append something
            f(new_arg, some_list + [elem])
        elif other_condition:
            # continue modifying same list
            f(new_arg, some_list)
        ...

(Note: you can use also the fact that list1 + list2 produces a new
list to make a copy by writing x = x + [], or x = [] + x, but
x = list(x) is generally a better idea here.)
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list
Re: Inheritance and name clashes
        # ... here ... we're not using
        # that so I omit it
        func = None
    if func is not None:
        logger.debug('%s: %s%s', format_ipaddr(self.client_address),
                     method, str(params))
        try:
            return func(*params)
        except MgrError, e:
            # Given, e.g., MgrError(ValueError('bad value')),
            # send the corresponding exc_type / exc_val back
            # via xmlrpclib, which transforms it into a Fault.
            raise e.exc_type, e.exc_val
        except xmlrpclib.Fault:
            # Already a Fault, pass it back unchanged.
            raise
        except TypeError, e:
            # If the parameter count did not match, we will get
            # a TypeError with the traceback ending with our own
            # call at func(*params).  We want to pass that back,
            # rather than logging it.
            #
            # If the TypeError happened inside func() or one of
            # its sub-functions, the traceback will continue beyond
            # here, i.e., its tb_next will not be None.
            if sys.exc_info()[2].tb_next is None:
                raise
            # else fall through to error-logging code
        except:
            pass  # fall through to error-logging code
        # Any other exception is assumed to be a bug in the server.
        # Log a traceback for server debugging.
        # is logger.error exc_info thread-safe?  let's assume so
        logger.error('internal failure in %s', method, exc_info=True)
        # traceback.format_exc().rstrip()
        raise xmlrpclib.Fault(2000, 'internal failure in ' + method)
    else:
        logger.info('%s: bad request: %s%s',
                    format_ipaddr(self.client_address),
                    method, str(params))
        raise Exception('method %s is not supported' % method)

# Tests of the form:
#     c = new_class_object(params)
#     if c: ...
# are turned into calls to the class's __nonzero__ method.
# We don't do "if server:" in our own server code, but if we did
# this would get called, and it's reasonable to just define it as
# True.  Probably the existing SimpleXMLRPCServer (or one of its
# base classes) should have done this, but they did not.
#
# For whatever reason, the xml-rpc library routines also pass
# a client's __nonzero__ (on his server proxy connection) to us,
# which reaches our dispatcher above.  By registering this in
# our __init__, clients can do "if server:" to see if their
# connection is up.  It's a frill, I admit.
def __nonzero__(self):
    return True

def register_admin_function(self, f, name=None):
    ...

... more stuff snipped out ...

# --END-- threading XML RPC server code
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list
Re: partial sums problem
In article <i7trs4$9e...@reader1.panix.com>
kj <no.em...@please.post> wrote:
>The following attempt to get a list of partial sums fails:
>
>    >>> s = 0
>    >>> [((s += t) and s) for t in range(1, 10)]
>      File "<stdin>", line 1
>        [((s += t) and s) for t in range(1, 10)]
>               ^
>    SyntaxError: invalid syntax
>
>What's the best way to get a list of partial sums?

Well, define "best"; but curiously enough, I wrote this just a few
days ago for other purposes, so here you go, a slightly cleaned-up /
better documented version of what I wrote:

    def iaccumulate(vec, op):
        """Do an accumulative operation on a vector (any iterable,
        really).

        The result is a generator whose first call produces vec[0],
        second call produces (vec[0] op vec[1]), third produces
        ((vec[0] op vec[1]) op vec[2]), and so on.  Mostly useful
        with + and *, probably.
        """
        iterable = iter(vec)
        acc = iterable.next()
        yield acc
        for x in iterable:
            acc = op(acc, x)
            yield acc

    def cumsum(vec):
        """Return a list of the cumulative sums of a vector."""
        import operator
        return list(iaccumulate(vec, operator.add))
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html
-- 
http://mail.python.org/mailman/listinfo/python-list
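[Editorial note: the same generator in Python 3 spelling (next(iterable) instead of the old iterable.next() method), for readers trying it today; later Pythons also added itertools.accumulate, which does this directly:

```python
import operator

def iaccumulate(vec, op):
    # Generator of running results: vec[0], vec[0] op vec[1], ...
    iterable = iter(vec)
    acc = next(iterable)
    yield acc
    for x in iterable:
        acc = op(acc, x)
        yield acc

def cumsum(vec):
    # Partial sums of vec, as a list.
    return list(iaccumulate(vec, operator.add))

partial = cumsum(range(1, 10))  # [1, 3, 6, 10, 15, 21, 28, 36, 45]
```

Like the original, this works on any iterable, not just lists, and raises StopIteration-free results only when the input is non-empty.]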