Re: Writing a Carriage Return in Unicode
MRAB wrote: u'\u240D' isn't a carriage return (that's u'\r') but a symbol (a visible "CR" graphic) for carriage return. Windows programs normally expect lines to end with '\r\n'; just use u'\n' in programs and open the text files in text mode ('r' or 'w'). This is the one thing from standards that I believe Microsoft got right where others did not. The ASCII (American Standard Code for Information Interchange) standard end of line is _both_ carriage return (\r) _and_ line feed (\n) -- I believe in that order. The Unix operating system, in its enthusiasm to make _everything_ simpler (against Einstein's advice, "Everything should be made as simple as possible, but not simpler.") decided that end-of-line should be a simple line feed and not carriage return line feed. Before they made that decision, there was debate about the order of cr-lf or lf-cr, or inventing a new EOL character ('\037' == '\x1F' was the candidate). If you've actually typed on a physical typewriter, you know that moving the carriage back is a distinct operation from rolling the platen forward; both operations are accomplished when you push the carriage back using the bar, but you know they are distinct. Hell, MIT even had a "line starve" character that moved the cursor up (or rolled the platen back). Lots of people talk about "dos-mode files" and "windows files" as if Microsoft got it wrong; it did not -- Unix made up a convenient fiction and people went along with it. (And, yes, if Unix had been there first, their convention would, in fact, have been better.) So, sorry for venting, but I have been wanting to say this in public for years. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
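[A footnote on the text-mode advice above: the io module (new in Python 2.6, the default file machinery in 3.x) performs exactly this newline translation. A minimal sketch; the temp-file name is made up for the demo:]

```python
import io
import os
import tempfile

# Write a file containing one DOS line ending and one Unix line ending.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with io.open(path, 'wb') as f:
    f.write(b'one\r\ntwo\n')

# Text mode reads with universal newlines: both endings come back as '\n'.
with io.open(path, 'r') as f:
    text = f.read()

assert text == 'one\ntwo\n'
```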
Re: FYI: ConfigParser, ordered options, PEP 372 and OrderedDict + big thank you
Jonathan Fine wrote:... A big thanks to Armin Ronacher and Raymond Hettinger for PEP 372: Adding an ordered dictionary to collections ... I prototyped (in about an hour). I then thought - maybe someone has been down this path before So all that I want has been done already, and will be waiting for me when I move to Python3. So a big thank you is in order. And thank you for, having done that, not simply smiling because your work was lighter. Instead you described a great work path and handed an attaboy to a pair of people that richly deserve attaboys. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: TODO and FIXME tags
Martin P. Hellwig wrote: Ben Finney wrote: Chris Rebert writes: 2009/11/16 Yasser Almeida Hernández : How is the sintaxis for set the TODO and FIXME tags...? ... There's no widely-followed “syntax” for this convention, though. Except for _not_ doing what is suggested in those comments, which appears to be the biggest convention :-) Perhaps: "The comments are a directive to delete the comment if you happen to do this." --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: ZipFile - file adding API incomplete?
Glenn Maynard wrote: I want to do something fairly simple: read files from one ZIP and add them to another, so I can remove and replace files. This led me to a couple things that seem to be missing from the API. zip.write() only takes the filename and compression method, not a ZipInfo; writestr takes a ZipInfo but only accepts a string, not a file. Is there an API call I'm missing? (This seems like the fundamental API for adding files, that write and writestr should be calling.) Simple answer: it's not there in the API. Defining that API correctly is tricky, and fraught with issues about access to the ZipFile object (from both the same thread and from other threads) while it is mid-modification. Nonetheless, a carefully done API that addresses those issues would be valuable. If you do spend the time to get something reliable going, put it someplace public and I predict it will get use. The approach I fiddled with was: * Define calls to read _portions_ of the raw (compressed, encrypted, whatever) data. * Define a call that locks the ZipFile object and returns a write handle for a single new file. At that point the new file doesn't exist, but reading of other portions of the zip file is allowed. * Only on successful close of the "write handle" is the new directory written. Unfortunately, I never worked very hard at the directory entries, and I realize that the big flaw in this design is that from the moment you start overwriting the existing master directory until you write a new master at the end, you do not have a valid zip file. Also note that you'll have to research standards about _exactly_ what the main header should look like if you use particular features. My stuff did bzip compression as well, and the "find which bits mean what" research was where my process broke down. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
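[Until such an API exists, wholesale copying of members is already possible with read() plus the ZipInfo-taking writestr() -- it re-compresses each member, so it loses the raw-data efficiency sketched above, but it does the remove-and-replace job. A rough sketch using in-memory archives; the member names are invented for the demo:]

```python
import io
import zipfile

# Build a source archive in memory with two members.
src_buf = io.BytesIO()
with zipfile.ZipFile(src_buf, 'w') as src:
    src.writestr('keep.txt', b'kept data')
    src.writestr('drop.txt', b'doomed data')

# "Remove" drop.txt by copying everything else into a fresh archive,
# passing each ZipInfo through so dates and attributes survive.
dst_buf = io.BytesIO()
with zipfile.ZipFile(src_buf, 'r') as src:
    with zipfile.ZipFile(dst_buf, 'w') as dst:
        for info in src.infolist():
            if info.filename != 'drop.txt':
                dst.writestr(info, src.read(info.filename))

with zipfile.ZipFile(dst_buf, 'r') as check:
    names = check.namelist()
    kept = check.read('keep.txt')
```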
Re: python gui builders
me wrote: I have looked at the Tk stuff that is built into Python -> not acceptable. Such insightful analysis, and it is _so_ helpful in stating your needs. [a lot of guff about unacceptable things] What Python gui builder is well supported, does not require me to learn another framework/library, and can crank out stuff for multiple platforms ? Well, let's see. You want to do gui work without learning things. Good luck with that. If you discover how, I'd like to learn tensor analysis without using symbols or operations more complex than addition and subtraction. Maybe your groundwork can help me out with that. I must be in a really cranky mood today. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: overriding __getitem__ for a subclass of dict
Steve Howell wrote: ... Eventually, I realized that it was easier to just monkeypatch Django while I was in test mode to get a more direct hook into the behavior I was trying to monitor, and then I didn't need to bother with overriding __getitem__ or creating complicated wrapper objects Since nobody else has mentioned it, I'd point you at Mock objects: http://python-mock.sourceforge.net/ for another way to skin the cat that it sounds like has been biting you. They are surprisingly useful for exploratory and regression testing. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: Serious Privileges Problem: Please Help
Dave Angel wrote: Victor Subervi wrote: On Mon, Nov 9, 2009 at 2:30 PM, Victor Subervi wrote: On Mon, Nov 9, 2009 at 2:27 PM, Rami Chowdhury wrote: Hold everything. Apparently line-endings got mangled. What I don't ... What I've diagnosed as happening when a python script with Windows line-endings was posted on my server's cgi environment: The actual error seemed to be a failure to find the python interpreter, since some Unix shells take the shebang line to include the \r character that preceded the newline. Seems to me they could be more tolerant, since I don't think control characters are likely in the interpreter file name. You could work around this by creating a symlink (or even a hard link) to the python executable named "python\r". --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
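[A less exotic workaround than renaming the interpreter is to normalize the script's line endings before installing it; the carriage return after the shebang is precisely what confuses the shell's interpreter lookup. A hedged sketch with made-up script content:]

```python
# A CGI script saved with DOS line endings; the '\r' after the shebang
# is exactly what makes the shell look for "python\r".
raw = b'#!/usr/bin/env python\r\nprint("hi")\r\n'

# Strip the carriage returns so the shebang names a real executable.
fixed = raw.replace(b'\r\n', b'\n')

assert fixed.splitlines()[0] == b'#!/usr/bin/env python'
```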
Re: list comprehension problem
Terry Reedy wrote: What immutability has to do with identity is that 'two' immutable objects with the same value *may* actually be the same object, *depending on the particular version of a particular implementation*.

t1 = (1,2,3) # an immutable object
t2 = (1,2,3) # another immutable object

Whether or not this is 'another' object or the same object is irrelevant for all purposes except identity checking. It is completely up to the interpreter.

>>> t1 is t2
False

In this case, but it could have been True.

>>> t1 == t2
True

A more telling example:

>>> t1 = (1, 2) + (3,) # an immutable object
>>> t2 = (1,) + (2, 3) # another immutable object
>>> t1 is t2
False

Here you make obvious that (assuming an optimizer that is not far more aggressive than Python is used to), in order to make equal immutable values identical, you'd have to end each operation producing an immutable result with a search of all appropriately typed values for one that was equal. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: Another (simple) unicode question
John Machin wrote: On Oct 29, 10:02 pm, Rustom Mody wrote:... I thought of trying to port it to python3 but it barfs on some unicode related stuff (after running 2to3) which I am unable to wrap my head around. Can anyone direct me to what I should read to try to understand this? to which Jon replied with some good links to start, and then: In any case, it's a debugging problem, isn't it? Could you possibly consider telling us the error message, the traceback, a few lines of the 3.x code around where the problem is, and the corresponding 2.x lines? Are you using 3.1.1 and 2.6.4? Does your test work in 2.6? Also consider how 2to3 translates the problem section(s). --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: lambda forms within a loop
Michal Ostrowski wrote: ...

[a, b] = MakeLambda()
print a(10)
print b(10)

Here is yet another way to solve the problem:

import functools

def AddPair(x, q):
    return x + q

a, b = [functools.partial(AddPair, x) for x in [1, 2]]
print a(10)
print b(10)

Or even, since these are numbers:

a, b = [x.__add__ for x in [1, 2]]
print a(10)
print b(10)

--Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: IDLE python shell freezes after running show() of matplotlib
Forrest Sheng Bao wrote: I am having a weird problem on IDLE. After I plot something using show() of matplotlib, the python shell prompt in IDLE just freezes so that I cannot enter anything and no new ">>>" prompt shows up. I tried ctrl - C and it didn't work. I have to restart IDLE to use it again. My system is Ubuntu Linux 9.04. I used apt-get to install IDLE. You should really look at smart questions; I believe you have a problem, and that you have yet to imagine how to give enough information for someone else to help you. http://www.catb.org/~esr/faqs/smart-questions.html Hint: I don't know your CPU, python version, IDLE version, matplotlib version, nor do you provide a small code example that allows me to easily reproduce your problem (or not). --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: [ANN] Python(x,y) 2.6.3.0 released
Pierre Raybaut wrote: Hi all, I'm quite pleased (and relieved) to announce that Python(x,y) version 2.6.3.0 has been released. It is the first release based on Python 2.6 -- note that Python(x,y) version number will now follow the included Python version (Python(x,y) vX.Y.Z.N will be based on Python vX.Y.Z). Python(x,y) is a free Python distribution providing ready-to-use scientific development software for numerical computations, data analysis and data visualization based on the Python programming language, Qt graphical user interfaces (and development framework), the Eclipse integrated development environment and the Spyder interactive development environment. Its purpose is to help scientific programmers used to interpreted languages (such as MATLAB or IDL) or compiled languages (C/C++ or Fortran) to switch to Python. It is now available for Windows XP/Vista/7 (as well as for Ubuntu through the pythonxy-linux project -- note that included software may differ from the Windows version): http://www.pythonxy.com Major changes since v2.1.17: * Python 2.6.3 * Spyder 1.0.0 -- the Scientific PYthon Development EnviRonment, a powerful MATLAB-like development environment introducing exclusive features in the scientific Python community (http://packages.python.org/spyder/) * MinGW 4.4.0 -- including gcc 4.4.0 and gfortran * Pydev 1.5.0 -- now including the powerful code analysis features of Pydev Extensions (formerly available as a commercial extension to the free Pydev plugin) * Enthought Tool Suite 3.3.0 * PyQt 4.5.4 and PyQwt 5.2.0 * VTK 5.4.2 * ITK 3.16 -- Built for Python 2.6 thanks to the help of Charl Botha, DeVIDE (Delft Visualisation and Image processing Development Environment) Complete release notes: http://www.pythonxy.com/download.php - Pierre The really sad part is that you'll have to do 2.6.4.0 so soon. Actually, it is not so sad, since so little has changed (except, probably, the bits you have been struggling with).
Please _do_ check out the release candidate soonest (since it will become production _very_ soon) -- get to python dev immediately if you have problems with the release candidate. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: a simple unicode question
George Trojan wrote: Scott David Daniels wrote: ... And if you are unsure of the name to use: >>> import unicodedata >>> unicodedata.name(u'\xb0') 'DEGREE SIGN' > Thanks for all suggestions. It took me a while to find out how to > configure my keyboard to be able to type the degree sign. I prefer to > stick with pure ASCII if possible. > Where are the literals (i.e. u'\N{DEGREE SIGN}') defined? I found > http://www.unicode.org/Public/5.1.0/ucd/UnicodeData.txt > Is that the place to look? I thought the mention of unicodedata would make it clear:

>>> import sys, unicodedata
>>> for n in xrange(sys.maxunicode + 1):
...     try:
...         nm = unicodedata.name(unichr(n))
...     except ValueError:
...         pass
...     else:
...         if 'tortoise' in nm.lower():
...             print n, nm

--Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: a simple unicode question
Mark Tolonen wrote: Is there a better way of getting the degrees? It seems your string is UTF-8. \xc2\xb0 is UTF-8 for DEGREE SIGN. If you type non-ASCII characters in source code, make sure to declare the encoding the file is *actually* saved in:

# coding: utf-8
s = '''48° 13' 16.80" N'''
q = s.decode('utf-8')
# next line equivalent to previous two
q = u'''48° 13' 16.80" N'''
# couple ways to find the degrees
print int(q[:q.find(u'°')])
import re
print re.search(ur'(\d+)°', q).group(1)

Mark is right about the source, but you needn't write unicode source to process unicode data. Since nobody else mentioned my favorite way of writing unicode in ASCII, try:

IDLE 2.6.3
>>> s = '''48\xc2\xb0 13' 16.80" N'''
>>> q = s.decode('utf-8')
>>> degrees, rest = q.split(u'\N{DEGREE SIGN}')
>>> print degrees
48
>>> print rest
13' 16.80" N

And if you are unsure of the name to use:

>>> import unicodedata
>>> unicodedata.name(u'\xb0')
'DEGREE SIGN'

--Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: No module named os
smi...@home.com wrote: 'import site' failed; use -v for traceback Traceback (most recent call last): File "./setup.py", line 3, in import sys, os, glob ImportError: No module named os I'm trying to build a small program and I get the above error. I have had this error popup in the past while trying to build other programs. What can I do? Thanks

Go to a command line and type:

$ python -v setup.py

which will tell you which imports are tried in which order. If this doesn't make it painfully obvious, try:

$ python -v -v setup.py

which will tell you what locations are being checked for files. Normally you should: 1) tell us python version and which OS (and OS version) you are using. 2) include a pasted copy of exactly what did not work, along with the resulting output, and why you did not expect the output you got. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: Rules regarding a post about a commercial product
Ken Elkabany wrote: I am hoping to get feedback for a new, commercial platform that targets the python programming language and its users. The product is currently in a closed-beta and will be free for at least a couple months. After reviewing the only rules I could find (http://www.python.org/community/lists/), I wanted to ask one last time to make sure that such a post would be appropriate. You might want to go for comp.lang.python.announce. I am certain you are welcome if you don't spray the area with ads; see, for example, ActiveState's behavior. I trust that if you do start making real money from it, like ActiveState you'll help out the community that is giving you its support. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: When ‘super’ is not a good idea
Ben Finney wrote: Scott David Daniels wrote: ...

class Initialized(ClassBase):
    @classmethod
    def _init_class(class_):
        class_.a, class_.b = 1, 2
        super(Initialized, class_)._init_class()

Mea culpa: Here super is _not_ a good idea, […] Why is ‘super’ not a good idea here?

class Initialized(ClassBase):
    @classmethod
    def _init_class(class_):
        class_.a, class_.b = 1, 2
        ClassBase._init_class()

What makes this implementation better than the one using ‘super’? Well, it doesn't end with an error message :-) The reason for the error message is that super is built for instance methods, not class methods. You'd need a class method style super to get to "the next superclass in the __mro__ with an '_init_class' method." Personally I don't see the need. You could of course do it like this:

class MyOtherType(type):
    def __new__(class_, name, bases, dct):
        result = type.__new__(class_, name, bases, dct)
        result()._init_class()
        return result

class OtherClassBase(object):
    __metaclass__ = MyOtherType
    def _init_class(self):
        print 'initializing class'

class Initialized(OtherClassBase):
    def _init_class(self):
        self.__class__.a, self.__class__.b = 1, 2
        super(Initialized, self)._init_class()

This code is a problem because the point of this exercise is to do initialization _before_ building an instance (think of building tables used in __init__). Before you decide that super should simply check if the second arg to super is a subclass of the first arg, and operate differently in that case (as my first code naively did), realize there is a problem. I saw the problem in trying the code, and simply tacked in the proper parent call and ran off to work. Think about the fact that classes are now objects as well; a class itself has a class (type, or in these classes MyType or MyOtherType) with its own needs for super, and the combination would be a mess.
I'm certain you'd get inadvertent switches across the two subtype hierarchies, but that belief may just be my fear of the inevitable testing and debugging issues such an implementation would require. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: PIL : How to write array to image ???
Mart. wrote: On Oct 5, 5:14 pm, Martin wrote: On Oct 4, 10:16 pm, "Mart." wrote: On Oct 4, 9:47 am, Martin wrote: On Oct 3, 11:56 pm, Peter Otten <__pete...@web.de> wrote: Martin wrote: Dear group I'm trying to use PIL to write an array (a NumPy array to be exact) to an image. Piece of cake, but it comes out looking strange. I use the below mini code, that I wrote for the purpose. The print of a looks like expected:

[[ 200. 200. 200. ..., 0. 0. 0.]
 [ 200. 200. 200. ..., 0. 0. 0.]
 [ 200. 200. 200. ..., 0. 0. 0.]
 ...,
 [ 0. 0. 0. ..., 200. 200. 200.]
 [ 0. 0. 0. ..., 200. 200. 200.]
 [ 0. 0. 0. ..., 200. 200. 200.]]

But the image looks nothing like that. Please see the images on: http://hvidberg.net/Martin/temp/quat_col.png http://hvidberg.net/Martin/temp/quat_bw.png or run the code to see them locally. Please – what do I do wrong in the PIL part ??? :-? Martin

import numpy as np
from PIL import Image
from PIL import ImageOps

maxcol = 100
maxrow = 100
a = np.zeros((maxcol, maxrow), float)
for i in range(maxcol):
    for j in range(maxrow):
        if (i < (maxcol/2) and j < (maxrow/2)) or (i >= (maxcol/2) and j >= (maxrow/2)):
            a[i, j] = 200
        else:
            a[i, j] = 0
print a
pilImage = Image.fromarray(a, 'RGB')
pilImage.save('quat_col.png')
pilImage = ImageOps.grayscale(pilImage)
pilImage.save('quat_bw.png')

The PIL seems to copy the array contents directly from memory without any conversions or sanity check. In your example the float values determine the gray value of 8 consecutive pixels. If you want a[i,j] to become the color of the pixel (i, j) you have to use an array with a memory layout that is compatible to the Image. Here are a few examples:

import numpy
from PIL import Image

a = numpy.zeros((100, 100), numpy.uint8)
a[:50, :50] = a[50:, 50:] = 255
Image.fromarray(a).save("tmp1.png")

b = numpy.zeros((100, 100, 3), numpy.uint8)
b[:50, :50, :] = b[50:, 50:, :] = [255, 0, 0]
Image.fromarray(b).save("tmp2.png")

c = numpy.zeros((100, 100), numpy.uint32)
c[:50, :50] = c[50:, 50:] = 0xff808000
Image.fromarray(c, "RGBA").save("tmp3.png")

Peter

Thanks All - That helped a lot... The working code ended with:

imga = np.zeros((imgL.shape[1], imgL.shape[0]), np.uint8)
for ro in range(imgL.shape[1]):
    for co in range(imgL.shape[0]):
        imga[ro, co] = imgL[ro, co]
Image.fromarray(imga).save('_a' + str(lev) + '.png')

Without knowing how big your image is (can't remember if you said!). Perhaps rather than looping in the way you might in C for example, the numpy where might be quicker if you have a big image. Just a thought...

And a good thought too... I think what Martin is telling you is: Look to numpy to continue working on the array first.

byte_store = imgL.astype(np.uint8)
Image.fromarray(byte_store).save('_a%s.png' % lev)

--Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: 'Once' properties.
Scott David Daniels wrote: ... Look into metaclasses: ...

class Initialized(ClassBase):
    @classmethod
    def _init_class(class_):
        class_.a, class_.b = 1, 2
        super(Initialized, class_)._init_class()

Mea culpa: Here super is _not_ a good idea, and I had tried that and recoded, but cut and pasted the wrong code. I just noticed that I had done so this morning.

class Initialized(ClassBase):
    @classmethod
    def _init_class(class_):
        class_.a, class_.b = 1, 2
        ClassBase._init_class()

print Initialized.a, Initialized.b

Much better. There is probably a way to get to the MRO, but for now, this should do. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: 'Once' properties.
menomnon wrote: Does python have a ‘once’ (per class) feature? ‘Once’, as I’ve known it, is in Eiffel; maybe Java doesn't have it. The first time you instantiate a given class into an object it constructs, say, a dictionary containing static information. In my case static is information that may change once a week at the most, and there’s no need to be refreshing this data during a single running of the program (currently maybe 30 minutes). So you instantiate the same class into a second object, but instead of going to the databases again and recreating the same dictionary a second time, you get a pointer or reference to the one already created in the first object – copies into the second object, that is. And the dictionary, no matter how many instances of the object you make, is always the same one from the first object. So, as we put it, once per class and not per object. Saves on both time and space. Look into metaclasses:

class MyType(type):
    def __new__(class_, name, bases, dct):
        result = type.__new__(class_, name, bases, dct)
        result._init_class()
        return result

class ClassBase(object):
    __metaclass__ = MyType
    @classmethod
    def _init_class(class_):
        print 'initializing class'

class Initialized(ClassBase):
    @classmethod
    def _init_class(class_):
        class_.a, class_.b = 1, 2
        super(Initialized, class_)._init_class()

print Initialized.a, Initialized.b

--Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: Q: sort's key and cmp parameters
Paul Rubin wrote: I still have never understood why cmp was removed. Sure, key is more convenient a lot (or maybe most) of the time, but it's not always. Not just more convenient. cmp will always be N log N, in that _every_ comparison runs your function, while key is linear, in that it is run once per element. Most cases are more easily done with key, and it is a good idea to make the most accessible way to a sort be the most efficient one. In the rare case that you really want each comparison, the cmp-injection function will do nicely (and can be written as a recipe). In short, make the easy path the fast path, and more will use it; provide two ways, and the first that springs to mind is the one used. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
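[The cmp-injection recipe mentioned above amounts to wrapping every element in an object whose rich comparison defers to the cmp function -- essentially what functools.cmp_to_key later provided in 2.7/3.2. A sketch:]

```python
def cmp_to_key(cmp):
    # Wrap each element; sorted() only needs __lt__ to order the wrappers.
    class K(object):
        def __init__(self, obj):
            self.obj = obj
        def __lt__(self, other):
            return cmp(self.obj, other.obj) < 0
    return K

# A cmp function written the old way: negative, zero, or positive.
def numeric_cmp(a, b):
    return (a > b) - (a < b)

result = sorted([3, 1, 2], key=cmp_to_key(numeric_cmp))
assert result == [1, 2, 3]
```

Note that the cmp function still runs once per comparison; the wrapper only restores the old interface, not key's once-per-element economy.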
Re: Restarting IDLE without closing it
candide wrote: Hi I was wondering if there exists some way to clear memory of all objects created during a current IDLE session (with the same effect as if one starts an IDLE session). Thanks. Different than "Shell / Restart Shell (Ctrl+F6)"? Of course this doesn't work if you started Idle with the "-n" switch. -- http://mail.python.org/mailman/listinfo/python-list
Re: Idiom for "last word in a string"
Grant Edwards wrote: I recently ran across this construct for grabbing the last (whitespace delimited) word in a string: s.rsplit(None,1)[1] ... I've always done this: s.split()[-1] I was wondering what the advantage of the rsplit(None,1)[1] approach would be ... Others have pointed out the efficiency reason (asking the machine to do a pile of work that you intend to throw away). But nobody warned you: s.rsplit(None, 1)[-1] would be better in the case of 'single_word'.rsplit(None, 1) --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
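[The one-word pitfall just mentioned is easy to demonstrate:]

```python
s = 'grab the last word'
# Both spellings agree on multi-word strings.
assert s.rsplit(None, 1)[-1] == 'word'
assert s.split()[-1] == 'word'

# But a single word splits into a one-element list, so index [1]
# would raise IndexError while [-1] still works.
one = 'single_word'
assert one.rsplit(None, 1) == ['single_word']
last = one.rsplit(None, 1)[-1]
```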
Re: Want to call a method only once for unittest.TestCase--but not sure how?
Oltmans wrote: ... All of our unit tests are written using the built-in 'unittest' module. We've a requirement where we want to run a method only once for our unit tests > So I'm completely stumped as to how to create a method that will only > be called only once for Calculator class. Can you please suggest any > ideas? Any help will be highly appreciated. Thanks in advance. Just inherit your classes from something like (untested):

class FunkyTestCase(unittest.TestCase):
    needs_initial = True
    def initialize(self):
        self.__class__.needs_initial = False
    def setUp(self):
        if self.needs_initial:
            self.initialize()

And write your test classes like:

class Bump(FunkyTestCase):
    def initialize(self):
        super(Bump, self).initialize()
        print 'One time Action'
    ...

--Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: Detecting changes to a dict
Steven D'Aprano wrote: I'm pretty sure the answer to this is No, but I thought I'd ask just in case... Is there a fast way to see that a dict has been modified? ... Of course I can subclass dict to do this, but if there's an existing way, that would be better.

def mutating(method):
    def replacement(self, *args, **kwargs):
        try:
            return method(self, *args, **kwargs)
        finally:
            self.serial += 1
    replacement.__name__ = method.__name__
    return replacement

class SerializedDictionary(dict):
    def __init__(self, *args, **kwargs):
        self.serial = 0
        super(SerializedDictionary, self).__init__(*args, **kwargs)
    __setitem__ = mutating(dict.__setitem__)
    __delitem__ = mutating(dict.__delitem__)
    clear = mutating(dict.clear)
    pop = mutating(dict.pop)
    popitem = mutating(dict.popitem)
    setdefault = mutating(dict.setdefault)
    update = mutating(dict.update)

d = SerializedDictionary(whatever)

Then just check d.serial to see if there has been a change. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: string interpolation mystery in Python 2.6
Alan G Isaac wrote: George Brandl explained it to me this way: It's probably best explained with a bit of code:

>>> class C(object):
...     def __str__(self): return '[str]'
...     def __unicode__(self): return '[unicode]'
...
>>> "%s %s" % ('foo', C())
'foo [str]'
>>> "%s %s" % (u'foo', C())
u'foo [unicode]'

I.e., as soon as a Unicode element is interpolated into a string, further interpolations automatically request Unicode via __unicode__, if it exists. Even more fun (until you know what is going on):

>>> c = C()
>>> "%s %s %s" % (c, u'c', c)
u'[str] c [unicode]'

--Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: numpy NaN, not surviving pickle/unpickle?
Steven D'Aprano wrote: On Sun, 13 Sep 2009 17:58:14 -0500, Robert Kern wrote: Exactly -- there are 2**53 distinct floats on most IEEE systems, the vast majority of which might as well be "random". What's the point of caching numbers like 2.5209481723210079? Chances are it will never come up again in a calculation. You are missing a few orders of magnitude here; there are approx. 2 ** 64 distinct floats. 2 ** 53 is the mantissa of regular floats. There are 2**52 floats X where 1.0 <= X < 2.0. The number of "normal" floats is 2 ** 64 - 2 ** 52 + 1. The number including denormals and -0.0 is 2 ** 64 - 2 ** 53. There are approx. 2 ** 53 NaNs (half with the sign bit on). --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
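[The bit-pattern arithmetic above can be checked mechanically; a hedged back-of-envelope that counts IEEE-754 double patterns by their exponent and mantissa fields:]

```python
total = 2 ** 64      # all 64-bit patterns (sign, 11-bit exponent, 52-bit mantissa)
mantissa = 2 ** 52   # distinct mantissa-field values

# Exponent all ones, mantissa nonzero: a NaN (for each sign bit).
nans = 2 * (mantissa - 1)    # approx. 2 ** 53, half with the sign bit on
# Exponent all ones, mantissa zero: the two infinities.
infs = 2

# Everything else is finite: normals, denormals, and the two zeros.
finite = total - nans - infs
assert finite == 2 ** 64 - 2 ** 53   # matches the count quoted above
```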
Re: Multiple inheritance - How to call method_x in InheritedBaseB from method_x in InheritedBaseA?
The Music Guy wrote: ... def main(): ... class MyMixin(object):

This is a mistake. If Mixins inherit from CommonBase as well, no order of class definition can catch you out. If it doesn't, you can get yourself in trouble.

    def method_x(self, a, b, c):
        super(MyMixin, self).method_x(a, b, c)
        print "MyMixin.method_x(%s, %s, %s, %s)" % (repr(self), repr(a), repr(b), repr(c))

class CommonBase(object):
    def method_x(self, a, b, c):
        print "CommonBase.method_x(%s, %s, %s, %s)" % (repr(self), repr(a), repr(b), repr(c))

class BaseA(CommonBase): ...

Redoing this example for small prints:

def main():
    for n, class_ in enumerate(
            (BaseA, BaseB, BaseC, FooV, FooW, FooX, FooY, FooZ,
             BarW, BarX, BarY, BarZ)):
        instance = class_()
        instance.method_x(n, n * '-', hex(n * 13))
        print

class CommonBase(object):
    def method_x(self, a, b, c):
        # really, %r is the way to go.
        print "CommonBase.method_x(%r, %r, %r, %r)" % (self, a, b, c)
    def __repr__(self):
        # Just so we have a more compact repr
        return '%s.%s' % (self.__class__.__name__, id(self))

class Mixin(CommonBase):
    def method_x(self, a, b, c):
        super(Mixin, self).method_x(a, b, c)
        print "Mixin",

class MyMixin(CommonBase):
    def method_x(self, a, b, c):
        super(MyMixin, self).method_x(a, b, c)
        print "MyMixin",

class BaseA(CommonBase):
    def method_x(self, a, b, c):
        super(BaseA, self).method_x(a, b, c)
        print "BaseA",

class BaseB(CommonBase):
    def method_x(self, a, b, c):
        super(BaseB, self).method_x(a, b, c)
        print "BaseB",

class BaseC(CommonBase):
    pass

class FooV(Mixin, BaseA):
    def method_x(self, a, b, c):
        super(FooV, self).method_x(a, b, c)
        print "FooV",

class FooW(Mixin, MyMixin, BaseA):
    def method_x(self, a, b, c):
        super(FooW, self).method_x(a, b, c)
        print "FooW",

class FooX(MyMixin, BaseA):
    def method_x(self, a, b, c):
        super(FooX, self).method_x(a, b, c)
        print "FooX",

class FooY(MyMixin, BaseB):
    pass

class FooZ(MyMixin, BaseC):
    def method_x(self, a, b, c):
        super(FooZ, self).method_x(a, b, c)
        print "FooZ",

class BarW(Mixin, BaseA, MyMixin):
    def method_x(self, a, b, c):
        super(BarW, self).method_x(a, b, c)
        print "BarW",

class BarX(BaseA, MyMixin):
    def method_x(self, a, b, c):
        super(BarX, self).method_x(a, b, c)
        print "BarX",

class BarY(BaseB, MyMixin):
    def method_x(self, a, b, c):
        super(BarY, self).method_x(a, b, c)
        print "BarY",

class BarZ(BaseB, Mixin):
    def method_x(self, a, b, c):
        super(BarZ, self).method_x(a, b, c)
        print "BarZ",

>>> main()  # prints
CommonBase.method_x(BaseA.18591280, 0, '', '0x0') BaseA
...
CommonBase.method_x(FooZ.18478384, 7, '---', '0x5b') MyMixin FooZ
CommonBase.method_x(BarW.18480592, 8, '', '0x68') MyMixin BaseA Mixin BarW
...

If you make Mixin and MyMixin inherit from object you get:

CommonBase.method_x(BaseA.18613328, 0, '', '0x0') BaseA
...
CommonBase.method_x(FooZ.18480592, 7, '---', '0x5b') MyMixin FooZ
CommonBase.method_x(BarW.18591280, 8, '', '0x68') BaseA Mixin BarW
...

Note that in the BarW case (with object), not all mixins are called. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: [Tkinter] messed callbacks
Giacomo Boffi wrote: Giacomo Boffi writes: ... | def create_cb(a,b): | return lambda: output(a+'->'+b) | | def doit(fr,lst): | for c1,c2 in zip(lst[::2], lst[1::2]): | subframe=Frame(fr) | Label(subframe,text=c1+' <-> '+c2).pack(side='left',expand=1,fill='both') | Button(subframe,text='>',command=create_cb(c1,c2)).pack() | Button(subframe,text='<',command=create_cb(c2,c1)).pack() | subframe.pack(fill='x',expand=1) ... works ok, now i have to fully understand my previous error This is really why functools.partial exists. Now that you know what was going wrong, you can understand its value. You can accomplish the same thing as above with: from functools import partial ... def doit(fr,lst): for c1, c2 in zip(lst[::2], lst[1::2]): subframe = Frame(fr) Label(subframe, text=c1 + ' <-> ' + c2 ).pack(side='left', expand=1, fill='both') Button(subframe, text='>', command=partial(output, c1 + '->' + c2)).pack() Button(subframe, text='<', command=partial(output, c2 + '->' + c1)).pack() subframe.pack(fill='x', expand=1) ... Also note from PEP 8, spaces are cheap and make the code easier to read. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
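The "previous error" was almost certainly the classic late-binding closure bug: a bare lambda in a loop sees the loop variable's *final* value, while `partial` (or the `create_cb` helper) captures the value at creation time. A minimal sketch away from Tkinter, with a plain `output` stand-in for the real callback target:

```python
from functools import partial

def output(msg):
    # Stand-in for whatever the real callbacks do with their message.
    return msg

# A bare lambda closes over the *variable* c, so every callback
# sees whatever c held when the loop finished.
broken = [lambda: output(c) for c in 'abc']
assert [f() for f in broken] == ['c', 'c', 'c']

# partial binds the *value* at creation time, like create_cb did.
fixed = [partial(output, c) for c in 'abc']
assert [f() for f in fixed] == ['a', 'b', 'c']
```

The extra function call in `create_cb` works for the same reason: the argument is evaluated when the callback is built, not when it fires.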
Re: simple string question
D'Arcy J.M. Cain wrote: On Mon, 7 Sep 2009 15:29:23 +1000 "jwither" wrote: Given a string (read from a file) which contains raw escape sequences, (specifically, slash n), what is the best way to convert that to a parsed string, where the escape sequence has been replaced (specifically, by a NEWLINE token)? I don't know what your actual requirement is but maybe this fits: exec("print '%s'" % x) Lots of fun when preceded by: x = "'; sys.exit(); print 'b" or far nastier things. Exec is just as dangerous as eval. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
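A safer route for the original question is the escape-decoding codec, which interprets sequences like backslash-n without executing anything. In Python 2 this was spelled `s.decode('string_escape')`; the Unicode flavor looks like this (note `unicode_escape` assumes non-ASCII bytes are Latin-1, which is fine for plain ASCII input):

```python
import codecs

raw = r"line one\nline two"      # really contains backslash + 'n'
parsed = codecs.decode(raw, 'unicode_escape')

assert '\n' not in raw           # the source text has no real newline
assert parsed == "line one\nline two"
```

No quoting tricks in the input can make this run code, which is the whole point versus `exec`/`eval`.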
Re: Multiple inheritance - How to call method_x in InheritedBaseB from method_x in InheritedBaseA?
The Music Guy wrote: I have a peculiar problem that involves multiple inheritance and method calling. I have a bunch of classes, one of which is called MyMixin and doesn't inherit from anything. MyMixin expects that it will be inherited along with one of several other classes that each define certain functionality. ... ... This all appears fine at first, but ... One might be tempted to amend MyMixin's method_x so that it calls the parent's method_x before doing anything else: class MyMixin(object): def method_x(self, a, b, c): super(MyMixin, self).method_x(a, b, c) ... ...but of course, that will fail with an AttributeError because MyMixin's only superclass is object, which does not have a method_x. Here the fix below works. The only way I can think to solve the problem would be to implement a method_x for each Foo that calls the method_x for each of the bases: ... So, does anyone have an idea about how to remedy this, or at least work around it? The diamond inheritance stuff is meant to allow you to deal with exactly this issue. If you define a class, MixinBase, with do-nothing entries for all the methods you are inventing, and you make all of your Mixin classes (and your main class) inherit from MixinBase, you are guaranteed that all of the Mixins you use will be earlier on the method resolution order (mro in the docs) than MixinBase. If the set of actual methods is small and pervasive, I might even be tempted to rename MixinBase to "Object": >>> if 1: class MixinBase(object): '''Base for solving mixin strategy. Also a nice common place to describe the args and meaning.
''' def method_x(self, a, b, c): '''Suitable docstring''' print 'MixinBase' class MyMixin(MixinBase): def method_x(self, a, b, c): super(MyMixin, self).method_x(a, b, c) print 'Mixin' class BaseA(MixinBase): def method_x(self, a, b, c): super(BaseA, self).method_x(a, b, c) print 'BaseA' class BaseB(MixinBase): pass class BaseC(MixinBase): def method_x(self, a, b, c): super(BaseC, self).method_x(a, b, c) print 'BaseC' class FooX(MyMixin, BaseA): def method_x(self, a, b, c): super(FooX, self).method_x(a, b, c) print 'FooX' class FooY(MyMixin, BaseB): pass class FooZ(MyMixin, BaseC): def method_x(self, a, b, c): super(FooZ, self).method_x(a, b, c) print 'FooZ' >>> FooZ().method_x(1,2,3) MixinBase BaseC Mixin FooZ >>> FooY().method_x(1,2,3) MixinBase Mixin >>> FooX().method_x(1,2,3) MixinBase BaseA Mixin FooX >>> BaseA().method_x(1,2,3) MixinBase BaseA >>> --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: Usage of main()
Carl Banks wrote: On Sep 3, 11:39 pm, Simon Brunning wrote: 2009/9/4 Manuel Graune : How come the main()-idiom is not "the standard way" of writing a python-program (like e.g. in C)? Speaking for myself, it *is* the standard way to structure a script. I find it more readable, since I can put my main function at the very top where it's visible, with the classes and functions it makes use of following in some logical sequence. I suspect that this is the case for many real-world scripts. Perhaps it's mainly in books and demos where the extra stuff is left out so the reader can focus on what the writer is demonstrating? Speaking for myself, I almost never put any logic at the top level in anything other than tiny throwaway scripts. Top level is for importing, and defining functions, classes, and constants, and that's it. Even when doing things like preprocessing I'll define a function and call it rather than putting the logic at top-level. Sometimes I'll throw in an if-test at top level (for the kind of stuff I might choose an #if preprocessor statement in C for) but mostly I just put that in functions. If you structure your programs this way, you can get another speedup for frequently used programs. Create a little program consisting of: import actual_program actual_program.main() Your larger program will only be compiled once, and the dinky one compiles quickly. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
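For reference, here is the layout being described: nothing but definitions at top level, one guarded call at the bottom, so the module imports cleanly. That clean importability is also what makes the dinky-wrapper trick work, since only imported modules get a cached compile. A minimal sketch (names are illustrative):

```python
def helper(value):
    # Worker logic lives in functions, never at top level.
    return value * 2

def main():
    return helper(21)

# Runs only when executed as a script, never on import --
# which is what lets a tiny wrapper do "import this; this.main()".
if __name__ == '__main__':
    main()
```

Importing this module does nothing visible; the wrapper (or the command line) decides when `main()` actually runs.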
Re: possible attribute-oriented class
Ken Newton wrote: ... I would appreciate comments on this code. First, is something like this already done? Second, are there reasons for not doing this? ... class AttrClass(object): ... def __repr__(self): return "%s(%s)" % (self.__class__.__name__, self.__dict__.__repr__()) def __str__(self): ll = ['{'] for k,v in self.__dict__.iteritems(): ll.append("%s : %s" % (k, str(v))) return '\n'.join(ll) + '}' Yes, I've done stuff something like this (I use setattr / getattr rather than direct access to the __dict__). You'd do better to sort the keys before outputting them, so that you don't confuse the user by printing two similarly built parts in different orders. Personally, I'd filter the outputs to avoid names beginning with '_', as they may contribute to clutter without adding much information. An equality operator would be nice as well (don't bother with ordering though, you get lost in a twisty maze of definitions all different). --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
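Putting those suggestions together, sorted keys, filtering `_`-prefixed names, and a simple equality without ordering, a sketch of the amended class might look like this (the keyword-argument `__init__` is an addition for demonstration, not part of the original):

```python
class AttrClass(object):
    def __init__(self, **kwargs):
        for name, value in kwargs.items():
            setattr(self, name, value)

    def __str__(self):
        # Sort the keys so similarly built objects print alike, and
        # skip '_private' names to cut clutter.
        pairs = sorted((k, v) for k, v in vars(self).items()
                       if not k.startswith('_'))
        return '{%s}' % ', '.join('%s : %s' % kv for kv in pairs)

    def __eq__(self, other):
        # Equality only; ordering is deliberately left undefined.
        return isinstance(other, AttrClass) and vars(self) == vars(other)

a = AttrClass(beta=2, alpha=1, _cache=None)
assert str(a) == '{alpha : 1, beta : 2}'
assert a == AttrClass(alpha=1, beta=2, _cache=None)
```

`vars(self)` is just a readable spelling of `self.__dict__`; using `setattr`/`getattr` as suggested keeps the door open for properties later.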
Re: The future of Python immutability
John Nagle wrote: ... Suppose, for discussion purposes, we had general "immutable objects". Objects inherited from "immutableobject" instead of "object" would be unchangeable once "__init__" had returned. Where does this take us? Traditionally in Python we make that, "once __new__ had returned." --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
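The reason for the `__new__` convention: `__init__` receives an already-constructed object, so a truly immutable value must be fixed before `__init__` ever runs. The standard trick with the existing immutable types, sketched on `tuple`:

```python
class Point(tuple):
    # State is established in __new__; by the time __init__ could run,
    # a tuple's contents can no longer change.
    def __new__(cls, x, y):
        return super(Point, cls).__new__(cls, (x, y))

    @property
    def x(self):
        return self[0]

    @property
    def y(self):
        return self[1]

p = Point(3, 4)
assert (p.x, p.y) == (3, 4)
assert p == (3, 4)      # still a tuple underneath
```

A hypothetical `immutableobject` that freezes after `__init__` would be a genuinely different (and looser) contract than this one.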
Re: using queue
Tim Arnold wrote: "MRAB" wrote in message news:mailman.835.1251886213.2854.python-l...@python.org... I don't need that many threads; just create a few to do the work and let each do multiple chapters, something like this: a very pretty implementation with worker code: while True: chapter = self.chapter_queue.get() if chapter is None: # A None indicates that there are no more chapters. break chapter.compile() # Put back the None so that the next thread will also see it. self.chapter_queue.put(None) and loading like: for c in self.document.chapter_objects: chapter_queue.put(c) chapter_queue.put(None) ... # The threads will finish when they see the None in the queue. for t in thread_list: t.join() hi, thanks for that code. It took me a bit to understand what's going on, but I think I see it now. Still, I have two questions about it: (1) what's wrong with having each chapter in a separate thread? Too much going on for a single processor? Many more threads than cores and you spend a lot of your CPU switching tasks. (2) The None at the end of the queue...I thought t.join() would just work. Why do we need None? Because your workers aren't finished, they are running trying to get something more to do out of the queue. The t.join() would cause a deadlock w/o the None. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
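A self-contained version of that worker/sentinel pattern (Python 3 spells the module `queue`; Python 2 spelled it `Queue`), with squaring standing in for `chapter.compile()`. It shows why the single `None` must be re-posted: each worker consumes it, so it has to be put back for the remaining workers before the joins can succeed.

```python
import queue
import threading

def worker(q, results):
    while True:
        item = q.get()
        if item is None:       # sentinel: no more work
            q.put(None)        # re-post so the other workers see it too
            break
        results.append(item * item)   # stands in for chapter.compile()

q = queue.Queue()
results = []
threads = [threading.Thread(target=worker, args=(q, results))
           for _ in range(3)]
for t in threads:
    t.start()
for n in range(5):
    q.put(n)
q.put(None)                    # one sentinel; workers pass it along
for t in threads:
    t.join()                   # would deadlock without the sentinel
assert sorted(results) == [0, 1, 4, 9, 16]
```

Without the `q.put(None)` inside `worker`, only one thread would ever see the sentinel and the other two `join()` calls would hang forever.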
Re: Putting together a larger matrix from smaller matrices
Matjaz Bezovnik wrote: If you are using numpy (which it sounds like you are): IDLE 2.6.2 >>> import numpy as np >>> v = np.array([[0,1,2],[3,4,5],[6,7,8]], dtype=float) >>> v array([[ 0., 1., 2.], [ 3., 4., 5.], [ 6., 7., 8.]]) >>> w = np.array([[10,11,12],[13,14,15],[16,17,18]], dtype=float) >>> w array([[ 10., 11., 12.], [ 13., 14., 15.], [ 16., 17., 18.]]) >>> r = np.zeros((6,6)) >>> r array([[ 0., 0., 0., 0., 0., 0.], [ 0., 0., 0., 0., 0., 0.], [ 0., 0., 0., 0., 0., 0.], [ 0., 0., 0., 0., 0., 0.], [ 0., 0., 0., 0., 0., 0.], [ 0., 0., 0., 0., 0., 0.]]) >>> r[:3,:3] = v >>> r array([[ 0., 1., 2., 0., 0., 0.], [ 3., 4., 5., 0., 0., 0.], [ 6., 7., 8., 0., 0., 0.], [ 0., 0., 0., 0., 0., 0.], [ 0., 0., 0., 0., 0., 0.], [ 0., 0., 0., 0., 0., 0.]]) >>> r[3:,3:] = w >>> r array([[ 0., 1., 2., 0., 0., 0.], [ 3., 4., 5., 0., 0., 0.], [ 6., 7., 8., 0., 0., 0.], [ 0., 0., 0., 10., 11., 12.], [ 0., 0., 0., 13., 14., 15.], [ 0., 0., 0., 16., 17., 18.]]) >>> In general, make the right-sized array of zeros, and at various points assign to subranges of the result array: N = 3 result = np.zeros((len(parts) * N, len(parts) * N), dtype=float) for n, chunk in enumerate(parts): base = n * N result[base : base + N, base : base + N] = chunk --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: Numeric literals in other than base 10 - was Annoying octal notation
Piet van Oostrum wrote: Scott David Daniels (SDD) wrote: SDD> James Harris wrote:... Another option: 0.(2:1011), 0.(8:7621), 0.(16:c26b) where the three characters "0.(" begin the sequence. Comments? Improvements? SDD> I did a little interpreter where non-base 10 numbers SDD> (up to base 36) were: SDD> .7.100 == 64 (octal) SDD> .9.100 == 100 (decimal) SDD> .F.100 == 256 (hexadecimal) SDD> .1.100 == 4 (binary) SDD> .3.100 == 9 (trinary) SDD> .Z.100 == 46656 (base 36) I wonder how you wrote that interpreter, given that some answers are wrong. Obviously I started with a different set of examples and edited after starting to make a table that could be interpreted in each base. After doing that, I forgot to double check, and lo and behold .Z.1000 = 46656, while .Z.100 = 1296. Since it has been decades since I've had access to that interpreter, this is all from memory. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: Numeric literals in other than base 10 - was Annoying octal notation
James Harris wrote:... Another option: 0.(2:1011), 0.(8:7621), 0.(16:c26b) where the three characters "0.(" begin the sequence. Comments? Improvements? I did a little interpreter where non-base 10 numbers (up to base 36) were: .7.100 == 64 (octal) .9.100 == 100 (decimal) .F.100 == 256 (hexadecimal) .1.100 == 4 (binary) .3.100 == 9 (trinary) .Z.100 == 46656 (base 36) Advantages: Tokenizer can recognize chunks easily. Not visually too confusing, No issue of what base the base indicator is expressed in. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
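For comparison, Python already exposes run-time parsing in any base from 2 to 36 through `int()`, so whatever literal syntax wins, the values come out the same as in the table above:

```python
# int() accepts bases 2..36, with digits beyond 9 written as letters.
assert int('100', 8) == 64        # the .7.100 example above
assert int('100', 16) == 256      # the .F.100 example above
assert int('100', 3) == 9         # the .3.100 example above
assert int('1011', 2) == 11
assert int('c26b', 16) == 49771
```

Note `int('100', 36)` is 1296, which is the detail Piet's follow-up catches about the base-36 row of the table.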
Re: generate keyboard/mouse event under windows
Ray wrote: On Aug 19, 2:07 pm, yaka wrote: Read this and see if it helps: http://kvance.livejournal.com/985732.html is there a way to generate a 'true' keyboard event? (works like user pressed a key on keyboard) not send the 'send keyboard event to application' ? If there is such a spot, it is a major security weakness. You'd be able to automate password attacks. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: debugger
flagmino wrote: To get familiar with the debugger, I have loaded this program: import math def s1(x, y): a = (x + y) print("Answer from s1"), a return def s2(x, y): b = (x - y) print("This comes from s2"), b #print z print("call from s2: "), s1(x, y) return I am trying to debug: I press shift-F9 and F7. I end up in the interpreter where I enter s2 (1, 2). From that point if I press F7, the program restarts all over. If I press Enter, the program gets out of debug mode. Please help me figuring out how I can use the debugger. You are welcome to send a sound file if this is easier for you. Thanks ray You need to tell us: Which Python version (e.g. 2.6.2) Which "platform" (hardware & OS) (e.g. 64-bit AMD FreeBSD) Which debugger (e.g. Idle) What you expected to happen that did not, and why you expected it. or What happened and why you did not expect it. Often you can get lots of this information by going to your debugger window and doing Help // About, and going to your Python environment and typing: import sys print sys.version # cut the results and paste in your message as "sys.version says, "'2.6.2 (r262:71605, ...'" [don't do dots yourself] To understand more of why we need this on every question, see: http://www.mikeash.com/getting_answers.html or google for "smart questions". --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: flatten a list of list
Steven D'Aprano wrote: On Sun, 16 Aug 2009 02:47:42 -0700, Terry wrote: Is there a simple way (the pythonic way) to flatten a list of list? Chris' suggestion using itertools seems pretty good: from timeit import Timer setup = """\\ ... L = [ [None]*5000 for _ in xrange(%d) ] ... from itertools import chain ... """ Timer("list(chain.from_iterable(L))", setup % 4).repeat(number=1000) [0.61839914321899414, 0.61799716949462891, 0.62065696716308594] Timer("list(chain.from_iterable(L))", setup % 8).repeat(number=1000) [1.2618398666381836, 1.3385050296783447, 3.9113419055938721] Timer("list(chain.from_iterable(L))", setup % 16).repeat(number=1000) [3.1349358558654785, 4.8554730415344238, 5.431217987061] OK, it definitely helps to get a size estimate before building: >>> setup = """\\ L = [ [None]*5000 for _ in xrange(%d) ] import itertools class Holder(object): def __init__(self, list_of_lists): self._list = list_of_lists def __iter__(self): return itertools.chain.from_iterable(self._list) def __len__(self): return sum(len(x) for x in self._list) """ >>> timeit.Timer("list(Holder(L))", setup % 4).repeat(number=1000) [0.59912279353940789, 0.59505886921382967, 0.59474989139681611] >>> timeit.Timer("list(Holder(L))", setup % 8).repeat(number=1000) [1.1898235669617208, 1.194797383466323, 1.1945367358141823] >>> timeit.Timer("list(Holder(L))", setup % 16).repeat(number=1000) [2.4244464031043123, 2.4261885239604482, 2.4050011942858589] vs straight chain.from_iterable (on my machine): [0.7828263089303249, 0.79326171343005925, 0.80967664884783019] [1.499510971366476, 1.5263249938190455, 1.5599706107899181] [3.4427520816193109, 3.632409426337702, 3.5290488036887382] --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: unittest
Mag Gam wrote: I am writing an application which has many command line arguments. For example: foo.py -args "bar bee" I would like to create a test suit using unittest so when I add features to "foo.py" I don't want to break other things. I just heard about unittest and would love to use it for this type of thing. so my question is, when I do these tests do I have to code them into foo.py? I prefer having a footest.py which will run the regression tests. Any thoughts about this? TIA I avoid putting the tests in foo.py, simply because the bulk of my tests would make the code harder to read. So, no, unittest does not require that you code things into foo.py. You will find that you may bend your coding style within foo.py in order to make it more testable, but (if you do it right) that should also make the code clearer. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
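A minimal sketch of the separate-file arrangement (all names hypothetical): foo.py holds the code, footest.py imports it and holds the tests. Collapsed into one runnable block here, with a comment marking where the import would go:

```python
import unittest

# Stand-in for what would be "from foo import parse_args" in footest.py.
def parse_args(text):
    return text.split()

class TestFoo(unittest.TestCase):
    def test_parse_args(self):
        self.assertEqual(parse_args("bar bee"), ["bar", "bee"])

# footest.py would normally just end with unittest.main();
# here we run the suite in-process instead.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestFoo)
result = unittest.TextTestRunner(verbosity=0).run(suite)
assert result.wasSuccessful()
```

Keeping the `TestCase` classes out of foo.py is exactly the separation described above: foo.py stays readable, and the regression suite grows independently.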
Re: callable virtual method
Jean-Michel Pichavant wrote: Steven D'Aprano wrote: On Fri, 14 Aug 2009 18:49:26 +0200, Jean-Michel Pichavant wrote: Sorry guys (means guys *and* gals :op ), I realized I've not been able to describe precisely what I want to do. I'd like the base class to be virtual (aka abstract). However it may be abstract but it does not mean it cannot do some usefull stuff. Here is the schema of my abstract methods : class Interface(object): def method(self): # - # some common stuff executed here # - print 'hello world' # - # here shall stand child specific stuff (empty in the interface method) # - if self.__class__.method == Interface.method: raise NotImplementedError('You should have read the f** manual ! You must override this method.') Okay, so I want to sub-class your Interface class. As you said, the methods in the abstract class are still useful, so in my class, I don't need any extra functionality for some methods -- I'm happy with just the "common stuff". So I use normal OO techniques and over-ride just the methods I need to over-ride: Sometimes the base is doing cool stuff but incomplete stuff which requires knowledge only hold by the sub class. In my case the interface is a high level interface for a software that can run on multiple hardware platforms. Only the sub class has knowledge on how to operate the hardware, but no matter the hardware it still produces the same effect. Let's say I have 50 different hardwares, I'll have 50 sub classes of Interface with the 'start' method to define. It wouldn't be appropriate (OO programming)to write 50 times '_log.debug('Starting %s' % self)' in each child start method when the simple task of logging the call can be nicely handled by the base class. In the meantime, I must make sure the user, who is not a python guru in this case, has implemented the start method for his hardware, because only him knows how to effectively start this hardware. 
I don't want him to come to me saying, "I got no error, still my hardware does not start". You can then blame him for not reading the docs, but it will still be less expensive to throw a nice exception with an accurate feedback. [snip] class VerboseGoodChild(Interface): # forced to over-ride methods for no good reason Definitely no !! This is the purpose of an interface class: to force people to write these methods. They *are* required, if they were not, they would not belong to the Interface. JM But there _is_ one moment when you can check those things, then avoid checking thereafter: object creation. So you can complicate your __init__ (or __new__) with those checks that make sure you instantiate only fully defined subclasses: # obviously not tested except in concept: class Base(object_or_whatever): def __init__(self, ...): class_ = self.__class__ if class_ is Base: raise TypeError('Attempt to instantiate Base class') for name in 'one two three four'.split(): if getattr(class_, name) is getattr(Base, name): raise NotImplementedError( '%s implementation missing' % name) ... --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
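Fleshing that concept out into something runnable (the method names here are made up, and the original sketch was explicitly untested): the check fires at construction time, once, by asking whether the subclass still sees the base class's placeholder.

```python
class Base(object):
    def __init__(self):
        if self.__class__ is Base:
            raise TypeError('Attempt to instantiate Base class')
        for name in 'start stop'.split():
            # If lookup on the subclass still finds Base's own method,
            # the subclass forgot to override it.
            if getattr(self.__class__, name) is getattr(Base, name):
                raise NotImplementedError(
                    '%s implementation missing' % name)

    def start(self):
        raise NotImplementedError

    def stop(self):
        raise NotImplementedError

class Good(Base):
    def start(self):
        return 'started'
    def stop(self):
        return 'stopped'

class Bad(Base):
    def start(self):
        return 'started'        # note: no stop() override

assert Good().start() == 'started'
try:
    Bad()
    raised = False
except NotImplementedError as exc:
    raised = 'stop' in str(exc)
assert raised
```

The user who forgets a method gets the accurate, early exception argued for above, at `Bad()` time, not at first hardware access.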
Re: Format Code Repeat Counts?
MRAB wrote: Scott David Daniels wrote: MRAB wrote: The shortest I can come up with is: "[" + "][".join(letters) + "]" Maybe a golf shot: "][".join(letters).join("[]") Even shorter: "["+"][".join(letters)+"]" :-) I was going by PEP8 rules. ;-) --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: i Don't get why it makes trouble
azrael wrote: ... A lot of people are not aware of SQL injection. My friend from college asked me and a couple of other guys for Pen testing of a website. His SQL injection mistake made him an epic fail. And some people are unaware of the unofficial official Python citation for SQL injection explanations: http://xkcd.com/327/ --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: Format Code Repeat Counts?
MRAB wrote: The shortest I can come up with is: "[" + "][".join(letters) + "]" Maybe a golf shot: "][".join(letters).join("[]") --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
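Both spellings build the same string; the golf version just uses `join` twice, with the outer `join` treating the two characters of `"[]"` as the items to be joined (so the long separator lands between `[` and `]`):

```python
letters = 'abc'
golfed = "][".join(letters).join("[]")   # '[' + 'a][b][c' + ']'
plain = "[" + "][".join(letters) + "]"
assert golfed == plain == '[a][b][c]'
```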
Re: Programming by Contract
Charles Yeomans wrote: On Aug 11, 2009, at 3:30 PM, Ethan Furman wrote: Ethan Furman wrote: Greetings! I have seen posts about the assert statement and PbC (or maybe it was DbC), and I just took a very brief look at pycontract (http://www.wayforward.net/pycontract/) and now I have at least one question: Is this basically another way of thinking about unit testing, or is the idea of PbC more along the lines of *always* checking the input/output of functions to ensure they are correct? (*Constant vigilance!* as Prof Moody would say ;) I know asserts can be turned off, so they obviously won't work for the latter case, and having seen the sample of pycontract it seems it only does its thing during debugging. So is Design (Programming) by Contract a fancy way of saying "Document your inputs/outputs!" or is there more to it? ~Ethan~ Hmmm... Well, from the (apparently) complete lack of interest, I shall take away the (better?) documentation ideas and unit testing ideas, and not worry about the rest. :) Design by contract is complementary to unit testing (I notice that the author of PEP 316 appears confused about this). DbC is, roughly speaking, about explicit allocation of responsibility. Consider this contrived example. def foo(s): require(s is not None) //code ensure(hasattr(returnValue, '__iter__')) You might want two flags, REQUIRE_OFF and ENSURE_OFF, that control testing, and change the code above to: require(REQUIRE_OFF or s is not None) //code ensure(ENSURE_OFF or hasattr(returnValue, '__iter__')) Python has no good way to turn off argument calculation by manipulating function definition (at least that I know of). --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
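A sketch of that flag arrangement, with the conditions moved inside the helpers. Note the limitation named above still holds: the argument expressions passed to `require`/`ensure` are evaluated even when the flags disable the checks (unlike `assert`, which vanishes entirely under `python -O`). Names here follow the post's example, not any real library.

```python
REQUIRE_OFF = False
ENSURE_OFF = False

def require(condition, message='precondition failed'):
    # The caller has already evaluated the condition expression;
    # the flag only suppresses the raise.
    if not (REQUIRE_OFF or condition):
        raise AssertionError(message)

def ensure(condition, message='postcondition failed'):
    if not (ENSURE_OFF or condition):
        raise AssertionError(message)

def foo(s):
    require(s is not None)
    result = list(s)
    ensure(hasattr(result, '__iter__'))
    return result

assert foo('ab') == ['a', 'b']
try:
    foo(None)
    raised = False
except AssertionError:
    raised = True
assert raised
```

The explicit-allocation-of-responsibility reading: a `require` failure blames the caller, an `ensure` failure blames `foo` itself.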
Re: better way?
Pet wrote: On 11 Aug., 22:19, "Rami Chowdhury" wrote: Ah, my apologies, I must have been getting it confused with ON UPDATE [things]. Thanks for correcting me. On Tue, 11 Aug 2009 13:10:03 -0700, Matthew Woodcraft wrote: "Rami Chowdhury" writes: IIRC Postgres has had ON DUPLICATE KEY UPDATE functionality longer than MySQL... PostgreSQL does not have ON DUPLICATE KEY UPDATE. The SQL standard way to do what the OP wants is MERGE. PostgreSQL doesn't have that either. So, I'm doing it in the right way? What about building columns? map(lambda s: s + ' = %s', fields) Is that o.k.? Isn't t = [field + ' = %s' for field in fields] clearer than t = map(lambda s: s + ' = %s', fields) ? your call of course. I don't quite understand why you are building the SQL from data but constructing the arguments in source. I'd actually set the SQL up directly as a string, making both the SQL and Python more readable. To the original question, you could unconditionally perform queries vaguely like: UPDATE_SQL = '''UPDATE table ... WHERE id = %s AND location = %s;''' INSERT_SQL = '''INSERT INTO table(... WHERE NOT EXISTS(SELECT * FROM table WHERE id = %s AND location = %s;);''' I'd put the NOW() and constant args (like the 1) in the SQL itself. Then your code might become: row = (self.wl, name, location, id) self._execQuery(db, UPDATE_SQL, [row]) self._execQuery(db, INSERT_SQL, [row + (location, id)]) if _execQuery is like the standard Python DB interfaces. Having the SQL do the checking lets the DB check its index and use that result to control the operation, simplifying the Python code without significantly affecting the DB work needed. The "SELECT *" form in the EXISTS test is something DB optimizers look for, so don't fret about wasted data movement. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: Unrecognized escape sequences in string literals
Douglas Alan wrote: So, what's the one obvious right way to express "foo\zbar"? Is it "foo\zbar" or "foo\\zbar" And if it's the latter, what possible benefit is there in allowing the former? And if it's the former, why does Python echo the latter? Actually, if we were designing from fresh (with no C behind us), I might advocate for "\s" to be the escape sequence for a backslash. I don't particularly like that it is hard to see if the following string contains a tab: "abc\table". The string rules reflect C's rules, and I see little excuse for trying to change them now. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: Why all the __double_underscored_vars__?
kj wrote: ... I find it quite difficult to explain to my students (who are complete newcomers to programming) all the __underscored__ stuff that even rank noobs like them have to deal with. (Trust me, to most of them your reply to my post would be as clear as mud.) Believe me, it's not me who's bringing this stuff up: *they* specifically ask. That's precisely my point: it is *they* who somehow feel they can't avoid finding out about this stuff; they must run into such __arcana__ often enough to cause them to wonder. If at least some rank beginners (i.e. some of my students) feel this way, I suggest that some of this alleged __arcana__ should be demoted to a more mundane everyday status, without the scare-underscores. E.g. maybe there should be a built-in is_main(), or some such, so that beginners don't have to venture into the dark underworld of __name__ and "__main__". Do you know about Kirby Urner's technique of calling such symbols, "ribs," -- the access to the "stuff" Python is built from? One nice thing about Python is that you can experiment with what these "__ribs__" do without having to learn yet another language. It seems nice to me that you can use a rule that says, "stick to normal names and you don't have to worry about mucking with the way Python itself works, but if you are curious, look for those things and fiddle with them." --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: Bug or feature: double strings as one
Grant Edwards wrote: On 2009-08-07, durumdara wrote: In other languages, like Delphi (Pascal), Javascript, SQL, etc., I must concatenate the strings with some sign, like "+" or "||". In other languages like Ruby, awk, C, C++, etc. adjacent string constants are concatenated. I must learn this "etc." language, I hear it mentioned all the time :-) --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: how to overload operator "< <" (a < x < b)?
Benjamin Kaplan wrote: Python does not support compound comparisons like that. You have to do "a > b and b > c". Funny, my python does. This has been around a long time. I am not certain whether 1.5.2 did it, but "chained comparisons" have been around for a long time. >>> 'a'< 'd' <'z' True >>> 'a'< 'D' <'z' False --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: M2Crypto: How to generate subjectKeyIdentifier / authorityKeyIdentifier
Matthias Güntert wrote: M2Crypto has a couple of bugs open related that, with potential workarounds that I haven't yet deemed polished enough to checkin, but which might help you out: https://bugzilla.osafoundation.org/show_bug.cgi?id=7530 https://bugzilla.osafoundation.org/show_bug.cgi?id=12151 ... Generating the 'subjectKeyIdentifier': > ... def get_public_key_fingerprint(self): h = hashlib.new('sha1') h.update(self.keypair.as_der()) client_serial = h.hexdigest().upper() client_serial_hex = '' for byte in xrange(20): client_serial_hex += client_serial[byte*2] + client_serial[byte*2 +1] if byte < 19: client_serial_hex += ':' return client_serial_hex ... More tersely (code golf?): def get_public_key_fingerprint(self): digest = hashlib.sha1(self.keypair.as_der()).hexdigest().upper() return ':'.join(digest[pos : pos+2] for pos in range(0, 40, 2)) --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
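The terse version relies only on string slicing, so its shape can be checked without any real key material by hashing arbitrary bytes (the input here is a made-up stand-in for `self.keypair.as_der()`):

```python
import hashlib

digest = hashlib.sha1(b'example key material').hexdigest().upper()
fingerprint = ':'.join(digest[pos:pos + 2] for pos in range(0, 40, 2))

# A SHA-1 digest is 20 bytes: 40 hex characters plus 19 colons.
assert len(fingerprint) == 59
assert fingerprint.count(':') == 19
assert all(len(part) == 2 for part in fingerprint.split(':'))
```

The `range(0, 40, 2)` walks the hex string two characters at a time, which is exactly what the original `xrange(20)` loop did by hand.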
Re: Problem in installing PyGreSQL
Dennis Lee Bieber wrote: On Thu, 6 Aug 2009 16:00:15 +0530, "Thangappan.M" declaimed the following in gmane.comp.python.general: File "./setup.py", line 219, in finalize_options except (Warning, w): NameError: global name 'w' is not defined What would be the solution? Otherwise can you tell how to install DB-API on a Debian machine. Sorry... 1) I run on WinXP; 2) I don't build packages, relying on pre-built binaries; 3) I run MySQL. However, based upon the examples in the Tutorial, that line should not have the (, ). A parenthesised (tuple) is supposed to contain a list of exceptions, and the parameter to catch the exception specifics has to be outside the list. Best I can suggest is editing that particular line and removing the (, ) -- then try rebuilding. I'll also re-ask: All you are installing is the Python adapter to the database. DO YOU HAVE A RUNNING PostgreSQL server that you can connect to? Just to be a bit more explicit: Change file setup.py's line 219 from: >> except (Warning, w): to either (OK in Python 2.6 and greater): except Warning as w: or (works for Python 2.X): except Warning, w: --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: trouble with complex numbers
alex23 wrote: Piet van Oostrum wrote: That should be z += 0j Pardon my ignorance, but could anyone explain the rationale behind using 'j' to indicate the imaginary number (as opposed to the more intuitive 'i')? (Not that I've had much call to use complex numbers but I'm curious) I think it is explained in the complex math area, but basically EE types use j, math types use i for exactly the same thing. Since i is so frequently an index in CS, and there is another strong convention, why not let the EE types win? --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: Overlap in python
Jay Bird wrote: Hi everyone, I've been trying to figure out a simple algorithm on how to combine a list of parts that have 1D locations that overlap into a non-overlapping list. For example, here would be my input:

    part name  location
    a          5-9
    b          7-10
    c          3-6
    d          15-20
    e          18-23

And here is what I need for an output:

    part name  location
    c.a.b      3-10
    d.e        15-23

I've tried various methods, which all fail. Does anyone have an idea how to do this? Thank you very much! Jay

I once had to do this for finding nested block structure. The key for me was a sort order: (start, -final). Having not seen it here (though I looked a bit), here's one:

    class Entry(object):
        '''An entry is a name and range'''
        def __init__(self, line):
            self.name, startstop = line.split()
            start, stop = startstop.split('-')
            self.start, self.stop = int(start), int(stop)

    def combined_ranges(lines):
        '''Create Entries in "magic order", and produce ranges.

        The "magic order" makes the least element with the longest range
        first, so overlaps show up in head order, with the final tail
        first among equals.
        '''
        # Fill in our table (ignoring blank lines), then sort by magic order
        elements = [Entry(line) for line in lines if line.strip()]
        elements.sort(key=lambda e: (e.start, -e.stop))
        # Now produce resolved ranges.  Grab the start
        gen = iter(elements)
        first = gen.next()
        # For the remainder, combine or produce
        for v in gen:
            if v.start <= first.stop:
                # on overlap, merge in new element (may update stop)
                first.name += '.' + v.name
                if first.stop < v.stop:
                    first.stop = v.stop
            else:
                yield first
                first = v
        # And now produce the last element we covered
        yield first

    # Demo:
    sample = '''part name location
    a 5-9
    b 7-10
    c 3-6
    d 15-20
    e 18-23
    '''
    source = iter(sample.split('\n'))  # source of lines; an opened file?
    ignored = source.next()  # discard heading
    for interval in combined_ranges(source):
        print '%s %s-%s' % (interval.name, interval.start, interval.stop)

Prints:

    c.a.b 3-10
    d.e 15-23

--Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
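For readers on Python 3 (where gen.next() and the print statement are gone), the same idea can be sketched compactly; here the rows are assumed to be pre-parsed (name, start, stop) tuples rather than text lines:

```python
def merge_named_ranges(rows):
    """Merge overlapping (name, start, stop) ranges, joining names with '.'."""
    # The "magic order" from the post: sort by (start, -stop).
    entries = sorted(rows, key=lambda r: (r[1], -r[2]))
    name, start, stop = entries[0]
    for n, s, e in entries[1:]:
        if s <= stop:               # overlap: merge into the running range
            name += '.' + n
            stop = max(stop, e)
        else:
            yield name, start, stop
            name, start, stop = n, s, e
    yield name, start, stop

rows = [('a', 5, 9), ('b', 7, 10), ('c', 3, 6), ('d', 15, 20), ('e', 18, 23)]
print(list(merge_named_ranges(rows)))
# → [('c.a.b', 3, 10), ('d.e', 15, 23)]
```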
Re: Seeding the rand() Generator
Fred Atkinson wrote: How does one seed the rand() generator when retrieving random recordings in MySQL? It is not entirely clear what you are asking. If you are talking about MySQL's random number generator, you are talking in the wrong newsgroup. If you are talking about Python's, does this work?

    import random
    random.seed(123542552)

I'm not quite sure how you came to believe that Python controls MySQL, as opposed to using its services. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
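What seeding Python's generator actually buys you is reproducibility; a quick demonstration:

```python
# The same seed reproduces the same sequence, which is the usual reason
# to seed at all (repeatable "random" selections while testing).
import random

random.seed(123542552)
first = [random.random() for _ in range(3)]
random.seed(123542552)
again = [random.random() for _ in range(3)]
assert first == again
```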
Re: Predefined Variables
Piet van Oostrum wrote: Scott David Daniels (SDD) wrote: SDD> Stephen Cuppett (should have written in this order): "Fred Atkinson" wrote ... Is there a pre-defined variable that returns the GET line... os.environment('QUERY_STRING') SDD> Maybe you mean: SDD> os.environ['USER'] Let's take the best of both: os.environ['QUERY_STRING'] Sorry about that. I was testing the expression before posting, and I don't do that much cgi stuff. I forgot to restore the variable name. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: fast video encoding
gregorth wrote: for a scientific application I need to save a video stream to disc for further post processing. My cam can deliver 8bit grayscale images with resolution 640x480 with a framerate up to 100Hz, this is a data rate of 30MB/s. Writing the data uncompressed to disc hits the data transfer limits of my current system and creates huge files. Therefore I would like to use video compression, preferably fast and high quality to lossless encoding. Final file size is not that important. Well, it sounds like it better be enough to affect bandwidth. I am a novice with video encoding. I found that few codecs support gray scale images. Any hints to take advantage of the fact that I only have gray scale images? You might try to see if there is a primitive .MNG encoder around. That could give you lossless with perhaps enough compression to make you happy, and I'm sure it will handle the grayscale. .MNG is pictures only, but that doesn't hurt you in the least. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: Predefined Variables
Stephen Cuppett (should have written in this order): "Fred Atkinson" wrote ... Is there a pre-defined variable that returns the GET line (http://www.php.net/index.php?everythingafterthequestionmark) as a single variable (rather than individual variables)? > os.environment('QUERY_STRING') Maybe you mean: os.environ['USER'] --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: regex: multiple matching for one string
ru...@yahoo.com wrote: Nick Dumas wrote: On 7/23/2009 9:23 AM, Mark Lawrence wrote: scriptlear...@gmail.com wrote: For example, I have a string "#a=valuea;b=valueb;c=valuec;", and I will like to take out the values (valuea, valueb, and valuec). How do I do that in Python? The group method will only return the matched part. Thanks.

    p = re.compile('#a=*;b=*;c=*;')
    m = p.match(line)
    if m:
        print m.group(),

IMHO a regex for this is overkill; a combination of string methods such as split and find should suffice. You're saying that something like the following is better than the simple regex used by the OP? [untested]

    values = []
    parts = line.split(';')
    if len(parts) != 4:
        raise SomeError()
    for p, expected in zip(parts[-1], ('#a', 'b', 'c')):
        name, x, value = p.partition('=')
        if name != expected or x != '=':
            raise SomeError()
        values.append(value)
    print values[0], values[1], values[2]

I call straw man: [tested]

    line = "#a=valuea;b=valueb;c=valuec;"
    d = dict(single.split('=', 1) for single in line.split(';') if single)
    d['#a'], d['b'], d['c']

If you want checking code, add:

    if len(d) != 3:
        raise ValueError('Too many keys: %s in %r' % (sorted(d), line))

Blech, not in my book. The regex checks the format of the string, extracts the values, and does so very clearly. Further, it is easily adapted to other similar formats, or evolutionary changes in format. It is also (once one is familiar with regexes -- a useful skill outside of Python too) easier to get right (at least in a simple case like this.) The posted regex doesn't work; this might be homework, so I'll not fix the two problems. The fact that you did not see the failure weakens your claim of "does so very clearly." --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: Issue combining gzip and subprocess
Piet van Oostrum wrote: Scott David Daniels (SDD) schreef: ... SDD> Or even: SDD> proc = subprocess.Popen(['ls','-la'], stdout=subprocess.PIPE) SDD> with gzip.open(filename, 'w') as dest: SDD> for line in iter(proc.stdout, ''): SDD> f.write(line) If it would work. 1) with gzip... is not supported in Python < 3.1. 2) for line in iter(proc.stdout), i.e. no second argument. 3) dest <==> f should be the same identifier. Lesson: if you post code, either test it and copy verbatim from your test, or write a disclaimer. Totally chagrined and embarrassed. Chastisement absorbed and acknowledged. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: Issue combining gzip and subprocess
Piet van Oostrum wrote: ... f = gzip.open(filename, 'w') proc = subprocess.Popen(['ls','-la'], stdout=subprocess.PIPE) while True: line = proc.stdout.readline() if not line: break f.write(line) f.close() Or even: proc = subprocess.Popen(['ls','-la'], stdout=subprocess.PIPE) with gzip.open(filename, 'w') as dest: for line in iter(proc.stdout, ''): f.write(line) --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: Multiple versions of python
CCW wrote: On 21 July, 15:19, Dave Angel wrote: The other thing you may want to do in a batch file is to change the file associations so that you can run the .py file directly, without typing "python" or "pythonw" in front of it. The relevant Windows commands are: assoc and ftype And on a related note, you may want to edit the PATHEXT environment variable, to add .PY and .PYW Thanks for this - this way made a bit more sense to me. I've now got C:\commands with the 4 .bat files in, and C:\commands in my path. It all seems to work :) I think I've missed the point of the @ though - it doesn't seem to make any difference.. I'm also a bit confused with the associations - since I've got python 2.6 and 3.1, surely the command I type (python26 or python31) is the only way to force a script to be run using a specific interpreter at runtime without having to change the .bat file every time I want to run a script using 3.1 instead of 2.6? OK, for me currently:

    C:\> assoc .py
    .py=Python.File
    C:\> assoc .pyw
    .pyw=Python.NoConFile
    C:\> ftype Python.File
    Python.File="C:\Python31\python.exe" "%1" %*
    C:\> ftype Python.NoConFile
    Python.NoConFile="C:\Python31\pythonw.exe" "%1" %*

Now imagine instead that you've added:

    C:\> ftype Python31.File="C:\Python31\python.exe" "%1" %*
    C:\> ftype Python31.NoConFile="C:\Python31\pythonw.exe" "%1" %*
    C:\> ftype Python26.File="C:\Python26\python.exe" "%1" %*
    C:\> ftype Python26.NoConFile="C:\Python26\pythonw.exe" "%1" %*

Then you can do the following:

    C:\> assoc .py=Python26.File
    C:\> fumble.py
    C:\> assoc .py=Python31.File
    C:\> fumble.py

That is the basic idea, but at the moment I don't see a simple demo working for me. So, if you want to pursue this, you can probably get it to work. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: Mechanize not recognized by py2exe
OrcaSoul wrote: ...it's too late to name my first born after you, but I did have a cat named Gabriel - she was a great cat! Remember Gabriel in English uses a hard A, as in the horn player, not Gabrielle. I know because when I first read his posts I played the same trick in my head, and hence imagined a woman. I suspect it would come to irk one almost enough to become a Gabe. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: Multiple versions of python
ChrisW wrote: Hi, I have installed 2 versions of python on my Windows XP computer - I originally had 3.0.1, but then found that the MySQL module only supported 2.*, so I've now installed that. I have found that if I change the Windows Environment Variable path, then I can change the version of python called when I type 'python' into a command line. However, I'd like to be able to choose which version I use. I know that if I change C:\Python26\python.exe to C:\Python26\python2.exe and C:\Python30\python.exe to C:\Python26\python3.exe, then typing 'python2' or 'python3' will invoke the correct interpreter. However, is it safe just to rename the executable files? Is there a more elegant way to achieve the same task? I wouldn't rename them. You can, of course, copy them (so you have two executables), or you can pick a somedir on your path (I made a directory "C:\cmds" that I add to my path, but tastes vary).

    C:\> copy con \py25.cmd
    C:\Python25\python.exe %*
    ^Z
    C:\> copy con \py31.cmd
    C:\Python31\python.exe %*
    ^Z

I'd use the two-digit form, as that is where interface changes happen; trying code with py24, py25, py26 can be convenient. By the way, install Python 3.1 rather than 3.0; think of 3.0 as the alpha of the 3.X branches (it will get no love at all now that 3.1 is out). --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: Mutable Strings - Any libraries that offer this?
Steven D'Aprano wrote: On Mon, 20 Jul 2009 21:08:22 +1000, Ben Finney wrote: What is it you're trying to do that makes you search for a mutable string type? It's likely that a better approach can be found. When dealing with very large strings, it is wasteful to have to duplicate the entire string just to mutate a single character. However, when dealing with very large strings, it's arguably better to use the rope data structure instead. The general problem is that whether strings are mutable or not is an early language design decision, and few languages provide both. Mutable strings need lots of data copying to be safe when passing args to unknown functions; immutable strings need lots of copying for incremental changes. The rope is a great idea for some cases. I'd argue Python works better with immutable strings, because Python is too slow at per-character operations to be running up and down strings a character at a time, changing here and there. So it becomes more natural to deal with strings as chunks to pass around, and it is nice not to have to copy the strings when doing that passing around. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
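For completeness, the stdlib does offer one mutable string-like type; a small sketch:

```python
# bytearray is the closest built-in to a mutable string: in-place slice
# assignment, no whole-buffer copy for a small local change.
buf = bytearray(b'immutable')
buf[0:2] = b'MU'                  # mutate in place
assert bytes(buf) == b'MUmutable'
buf.extend(b'?')                  # grows like a list
assert bytes(buf) == b'MUmutable?'
```

It only covers byte text, though, which is part of why the mutable/immutable design decision keeps coming up for str.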
Re: UnicodeEncodeError: 'ascii' codec can't encode character u'\xb7' in position 13: ordinal not in range(128)
akhil1988 wrote: > Nobody-38 wrote: On Thu, 16 Jul 2009 15:43:37 -0700, akhil1988 wrote: ... In Python 3 you can't decode strings because they are Unicode strings and it doesn't make sense to decode a Unicode string. You can only decode encoded things which are byte strings. So you are mixing up byte strings and Unicode strings. ... I read a byte string from sys.stdin which needs to be converted to a unicode string for further processing. In 3.x, sys.stdin (stdout, stderr) are text streams, which means that they read and write Unicode strings, not byte strings. I cannot just remove the decode statement and proceed? This is what it looks like:

    for line in sys.stdin:
        line = line.decode('utf-8').strip()
        if line == '':
            #do something here

If I remove the decode statement, line == '' never gets true. Did you inadvertently remove the strip() as well? ... unintentionally I removed strip() I get this error now:

    File "./temp.py", line 488, in <module>
      main()
    File "./temp.py", line 475, in main
      for line in sys.stdin:
    File "/usr/local/lib/python3.1/codecs.py", line 300, in decode
      (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: invalid data

(1) Do not top post. (2) Try to fully understand the problem and proposed solution, rather than trying to get people to tell you just enough to get your code going. (3) The only way sys.stdin can possibly return unicode is to do some decoding of its own. Your job is to make sure it uses the correct decoding. So, if you know your source is always utf-8, try something like:

    import sys
    import io

    sys.stdin = io.TextIOWrapper(sys.stdin.detach(), encoding='utf8')
    for line in sys.stdin:
        line = line.strip()
        if line == '':
            #do something here

--Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
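The TextIOWrapper trick can be demonstrated without touching real stdin, by wrapping an in-memory byte stream standing in for it:

```python
# BytesIO plays the role of the raw (byte) stdin; TextIOWrapper turns
# it into a text stream that decodes utf-8 as you iterate.
import io

raw = io.BytesIO('café\nnext\n\n'.encode('utf-8'))
stream = io.TextIOWrapper(raw, encoding='utf-8')
lines = [line.strip() for line in stream]
assert lines == ['café', 'next', '']   # the blank line now compares equal to ''
```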
Re: missing 'xor' Boolean operator
Ethan Furman wrote: and returns the last object that is "true" This is a little suspect: _and_ returns the first object that is not "true", or the last object. or returns the first object that is "true" Similarly: _or_ returns the first object that is "true", or the last object. so should xor return the only object that is "true", else False/None? Xor has the problem that in two cases it can return neither of its args. Not has similar behavior in those cases, and we see it returns False or True. The Pythonic solution is therefore to use False.

    def xor(a, b):
        if a and b:
            return None
        elif a:
            return a
        elif b:
            return b
        else:
            return None

    def xor(a, b):
        if bool(a) == bool(b):
            return False
        else:
            return a or b

Side-effect counting in applications of bool(x) is ignored here. If minimizing side-effects is needed:

    def xor(a, b):
        if a:
            if not b:
                return a
        elif b:
            return b
        return False

--Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
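A quick check of the bool-comparison version against the four truthiness cases:

```python
def xor(a, b):
    if bool(a) == bool(b):
        return False
    return a or b

assert xor(0, 0) is False      # neither true: False
assert xor('x', '') == 'x'     # only first true: return it
assert xor('', [3]) == [3]     # only second true: return it
assert xor('x', 'y') is False  # both true: False
```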
Re: why did you choose the programming language(s)you currently use?
Aahz wrote: In article <4a5ccdd6$0$32679$9b4e6...@newsspool2.arcor-online.net>, Stefan Behnel wrote: Deep_Feelings wrote: So you have chosen programming language "x" so shall you tell us why you did so , and what negatives or positives it has ? *duck* Where do you get the duck programming language? It shares a type system with Python, of course. :-) --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: How to check if any item from a list of strings is in a big string?
Nobody wrote: On Tue, 14 Jul 2009 02:06:04 -0300, Gabriel Genellina wrote: Matt, how many words are you looking for, in how long a string ? Were you able to time any( substr in long_string ) against re.compile ( "|".join( list_items )) ? There is a known algorithm to solve specifically this problem (Aho-Corasick), a good implementation should perform better than R.E. (and better than the gen.expr. with the advantage of returning WHICH string matched) Aho-Corasick has the advantage of being linear in the length of the patterns, so the setup may be faster than re.compile(). The actual searching won't necessarily be any faster (assuming optimal implementations; I don't know how safe that assumption is). Having done a fast Aho-Corasick implementation myself, I can assure you that the actual searching can be incredibly fast. RE conversion usually goes to a slightly more general machine than the Aho-Corasick processing requires. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
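The middle ground the thread mentions is worth seeing concretely: a single alternation regex reports *which* word matched, which the any()/generator-expression form cannot (the word list here is a made-up example):

```python
import re

words = ['spam', 'ham', 'eggs']              # hypothetical search terms
pattern = re.compile('|'.join(re.escape(w) for w in words))

m = pattern.search('green eggs and ham')
assert m is not None and m.group() == 'eggs'  # first match, and its identity
assert pattern.search('tofu salad') is None   # no term present
```

A full Aho-Corasick automaton improves on this mainly when the pattern set is large, since its setup is linear in the total pattern length.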
Re: Proposal: Decimal literals in Python.
Tim Roberts wrote: My favorite notation for this comes from Ada, which allows arbitrary bases from 2 to 16, and allows for underscores within numeric literals:

    x23_bin : constant := 2#0001_0111#;
    x23_oct : constant := 8#27#;
    x23_dec : constant := 10#23#;
    x23_hex : constant := 16#17#;

And mine is one w/o the base-10 bias:

    .f.123  == 0x123
    .7.123  == 0o123
    .1.1101 == 0b1101

That is, .d. -- show the base by writing base-1 (the digit d) in the base itself. I actually built this into "OZ," an interpreter. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
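For comparison, Python eventually grew its own spellings of the same number (underscore grouping in literals arrived in Python 3.6, and int() accepts any base from 2 to 36):

```python
# 23 written in the bases from Tim's Ada example, in Python notation:
assert 0b1_0111 == 0o27 == 23 == 0x17
assert int('17', 16) == int('27', 8) == int('10111', 2) == 23
```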
Re: Efficient binary search tree stored in a flat array?
Piet van Oostrum wrote: Douglas Alan (DA) wrote: DA> On Jul 13, 3:57 pm, a...@pythoncraft.com (Aahz) wrote: Still, unless your list is large (more than thousands of elements), that's the way you should go. See the bisect module. Thing is, the speed difference between C and Python means the constant for insertion and deletion is very very small relative to bytecode speed. Keep in mind that Python's object/binding model means that you're shuffling pointers in the list rather than items. DA> Thank you. My question wasn't intended to be Python specific, though. DA> I am just curious for purely academic reasons about whether there is DA> such an algorithm. All the sources I've skimmed only seem to the DA> answer the question via omission. Which is kind of strange, since it DA> seems to me like an obvious question to ask. It may well be that there is no good simple solution, and people avoid writing about non-existent algorithms. I certainly cannot imagine trying to write an article that carefully covered ideas which don't have well-studied data structures available, and calling them out only to say, "we don't know how to do this well." If such an algorithm were simple and obvious, I dare say you'd be taught about it around the time you learn binary search. Of course you can take any BST algorithm and replace pointers by indices in the array and allocate new elements in the array. But then you need array elements to contain the indices for the children explicitly. And you lower your locality of reference (cache-friendliness). Note that list insert in Python, for example, is quite cache-friendly. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
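The pragmatic flat-array structure the thread keeps circling back to is simply a sorted list plus the bisect module; a minimal sketch:

```python
# Sorted list + bisect: O(log n) search, and insertion shifts pointers
# with a C-speed memmove -- the cache-friendliness noted above.
import bisect

items = []
for x in [5, 1, 4, 2, 3]:
    bisect.insort(items, x)        # binary search, then in-place insert
assert items == [1, 2, 3, 4, 5]
assert items[bisect.bisect_left(items, 4)] == 4   # O(log n) lookup
```

For lists up to thousands of elements this usually beats an explicit tree written in Python bytecode.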
Re: PDF: finding a blank image
DrLeif wrote: I have about 6000 PDF files which have been produced using a scanner with more being produced each day. The PDF files contain old paper records which have been taking up space. The scanner is set to detect when there is information on the backside of the page (duplex scan). The problem of course is it's not the always reliable and we wind up with a number of PDF files containing blank pages. What I would like to do is have python detect a "blank" pages in a PDF file and remove it. Any suggestions? I'd check into ReportLab's commercial product, it may well be easily capable of that. If no success, you might contact PJ at Groklaw, she has dealt with a _lot_ of PDFs (and knows people who deal with PDFs in bulk). --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: tough-to-explain Python
Steven D'Aprano wrote: Even *soup stock* fits the same profile as what Hendrik claims is almost unique to programming. On its own, soup stock is totally useless. But you make it, now, so you can you feed it into something else later on. Or instant coffee. I think I'll avoid coming to your house for a cup of coffee. :-) --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: hoe to build a patched socketmodule.c
jacopo mondi wrote: Roger Binns wrote: jacopo mondi wrote: Hi all, I need to patch socketmodule.c (the _socket module) in order to add support to an experimental socket family. You may find it considerably easier to use ctypes since that will avoid the need for any patching. You'll also be able to control how read and write are done (eg read vs recvfrom vs recvmsg vs readv). You can use os.fdopen to convert your raw file descriptor into a Python file object if appropriate. The typical Python way of dealing with this is an additional module, not a modified module placed back in the library. So, take the sources and edit, but change the module name. Even better is to figure out how to use _socket.pyd, to create a smaller _socketexpmodule.c and use that. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Sorry about that, the Counter class is there.
Scott David Daniels wrote: Raymond Hettinger wrote: [Scott David Daniels] def most_frequent(arr, N): ... In Py2.4 and later, see heapq.nlargest(). I should have remembered this one In Py3.1, see collections.Counter(data).most_common(n) This one is from Py3.2, I think. Oops -- egg all over my face. I thought I was checking with 3.1, and it was 2.6.2. I _did_ make an explicit check, just poorly. Again, apologies. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: finding most common elements between thousands of multiple arrays.
Raymond Hettinger wrote: [Scott David Daniels] def most_frequent(arr, N): ... In Py2.4 and later, see heapq.nlargest(). I should have remembered this one In Py3.1, see collections.Counter(data).most_common(n) This one is from Py3.2, I think. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: tough-to-explain Python
Steven D'Aprano wrote: On Fri, 10 Jul 2009 08:28:29 -0700, Scott David Daniels wrote: Steven D'Aprano wrote: Even *soup stock* fits the same profile as what Hendrik claims is almost unique to programming. On its own, soup stock is totally useless. But you make it, now, so you can you feed it into something else later on. Or instant coffee. I think I'll avoid coming to your house for a cup of coffee. :-) I meant the instant coffee powder is prepared in advance. It's useless on it's own, but later on you feed it into boiling water, add sugar and milk, and it's slightly less useless. I know, but the image of even a _great_ soup stock with instant coffee poured in, both appalled me and made me giggle. So, I thought I'd share. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: property using a classmethod
Emanuele D'Arrigo wrote:

    class MyClass(object):
        @classmethod
        def myClassMethod(self):
            print "ham"
        myProperty = property(myClassMethod, None, None)

... doesn't work and returns a TypeError: So, how do I do this? Ultimately all I want is a non-callable class-level attribute MyClass.myProperty that gives the result of MyClass.myClassMethod(). Properties affect instances, and classes are instances of types. What you want is a new metaclass:

    class MyType(type):
        @property
        def demo(class_):
            return class_.a + 3

    class MyClass(object):
        __metaclass__ = MyType
        a = 5

    print MyClass.a, MyClass.demo

--Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
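The same metaclass idea in Python 3 spelling, where the metaclass is passed as a keyword in the class header instead of a __metaclass__ attribute:

```python
class MyType(type):
    @property
    def demo(cls):
        # A property on the metaclass is looked up on the class itself.
        return cls.a + 3

class MyClass(metaclass=MyType):
    a = 5

assert MyClass.a == 5
assert MyClass.demo == 8   # non-callable class-level attribute, as requested
```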
Re: ISO library ref in printed form
kj wrote: Does anyone know where I can buy the Python library reference in printed form? (I'd rather not print the whole 1200+-page tome myself.) I'm interested in both/either 2.6 and 3.0. Personally, I'd get the new Beazley's Python Essential Reference, which is due out "real soon now," and then use the provided docs as an add-on. Also consider grabbing Gruet's "Python Quick Reference" page. When I was working in a printer site I printed the color version of Gruet's page two-sided; it was neither too bulky nor too sketchy for my needs (and he uses color to distinguish version-to-version changes). http://rgruet.free.fr/ Sadly, I no longer work there, so my copy is gone. :-( --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: Cleaning up after failing to contructing objects
brasse wrote: I have been thinking about how to write exception safe constructors in Python. By exception safe I mean a constructor that does not leak resources when an exception is raised within it. ... > As you can see this is less than straight forward. Is there some kind > of best practice that I'm not aware of? Not so tough. Something like this tweaked version of your example:

    class Foo(object):
        def __init__(self, name, fail=False):
            self.name = name
            if not fail:
                print '%s.__init__(%s)' % (type(self).__name__, name)
            else:
                print '%s.__init__(%s), FAIL' % (type(self).__name__, name)
                raise ValueError('Asked to fail: %r' % fail)

        def close(self):
            print '%s.close(%s)' % (type(self).__name__, self.name)

    class Bar(object):
        def __init__(self):
            unwind = []
            try:
                self.a = Foo('a')
                unwind.append(self.a)
                self.b = Foo('b', fail=True)
                unwind.append(self.b)
                ...
            except Exception, why:
                while unwind:
                    unwind.pop().close()
                raise

    bar = Bar()

--Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: Why is my code faster with append() in a loop than with a large list?
Piet van Oostrum wrote: Dave Angel (DA) wrote: DA> It would probably save some time to not bother storing the zeroes in the DA> list at all. And it should help if you were to step through a list of DA> primes, rather than trying every possible int. Or at least constrain DA> yourself to odd numbers (after the initial case of 2). ... (lines marked XXX below are the originals being replaced)

    # Based upon http://code.activestate.com/recipes/117119/
    D = {9: 6}  # contains composite numbers
    XXX Dlist = [2, 3]  # list of already generated primes
    Elist = [(2, 4), (3, 9)]  # list of primes and their squares

    XXX def sieve():
    XXX     '''generator that yields all prime numbers'''
    XXX     global D
    XXX     global Dlist
    def sieve2():
        '''generator that yields all primes and their squares'''
        # No need for global declarations, we alter, not replace
    XXX     for p in Dlist:
    XXX         yield p
    XXX     q = Dlist[-1]+2
        for pair in Elist:
            yield pair
        q = pair[0] + 2
        while True:
            if q in D:
                p = D[q]
                x = q + p
                while x in D:
                    x += p
                D[x] = p
            else:
    XXX         Dlist.append(q)
    XXX         yield q
    XXX         D[q*q] = 2*q
                square = q * q
                pair = q, square
                Elist.append(pair)
                yield pair
                D[square] = 2 * q
            q += 2

    def factorise(num):
        """Returns a list of prime factor powers.  For example:
        factorise(6) will return [2, 2] (the powers are returned one
        higher than the actual value) as in, 2^1 * 3^1 = 6."""
        powers = []
        power = 0
    XXX for factor in sieve():
        for factor, limit in sieve2():
            power = 0
            while num % factor == 0:
                power += 1
                num /= factor
    XXX     if power > 0:
            if power:  # good enough here, and faster
                # if you really want the factors then append((factor, power))
                powers.append(power+1)
    XXX     if num == 1:
    XXX         break
    XXX return powers
            if num < limit:
                if num > 1:
                    # if you really want the factors then append((num, 1))
                    powers.append(2)
                return powers

OK, that's a straightforward speedup, _but_:

    factorise(6) == [2, 2] == factorise(10) == factorise(15)

So I am not sure exactly what you are calculating. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
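For reference, a minimal form of the dictionary-based incremental sieve (ActiveState recipe 117119) that both versions above build on:

```python
from itertools import islice

def primes():
    """Yield primes indefinitely, keeping only needed composites."""
    D = {}            # maps a composite to the primes that witness it
    q = 2
    while True:
        if q not in D:
            yield q                      # q is prime
            D[q * q] = [q]               # first composite it will mark
        else:
            for p in D.pop(q):           # slide each witness forward
                D.setdefault(q + p, []).append(p)
        q += 1

assert list(islice(primes(), 10)) == [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```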
Re: finding most common elements between thousands of multiple arrays.
Peter Otten wrote: Scott David Daniels wrote: Scott David Daniels wrote: t = timeit.Timer('sum(part[:-1]==part[1:])', 'from __main__ import part') What happens if you calculate the sum in numpy? Try t = timeit.Timer('(part[:-1]==part[1:]).sum()', 'from __main__ import part') Good idea, I hadn't thought of adding numpy bools. (part[:-1]==part[1:]).sum() is only a slight improvement over len(part[part[:-1]==part[1:]]) when there are few elements, but it is almost twice as fast when there are a lot (reflecting the work of allocating and copying).

    >>> import numpy
    >>> import timeit
    >>> original = numpy.random.normal(0, 100, (1000, 1000)).astype(int)
    >>> data = original.flatten()
    >>> data.sort()
    >>> t = timeit.Timer('sum(part[:-1]==part[1:])', 'from __main__ import part')
    >>> u = timeit.Timer('len(part[part[:-1]==part[1:]])', 'from __main__ import part')
    >>> v = timeit.Timer('(part[:-1]==part[1:]).sum()', 'from __main__ import part')
    >>> part = data[::100]
    >>> (part[:-1]==part[1:]).sum()
    9390
    >>> t.repeat(3, 10)
    [0.56368281443587875, 0.55615057220961717, 0.55465764503594528]
    >>> u.repeat(3, 1000)
    [0.89576580263690175, 0.89276374511291579, 0.8937328626963108]
    >>> v.repeat(3, 1000)
    [0.24798598704592223, 0.24715431709898894, 0.24498979618920202]
    >>> part = original.flatten()[::100]
    >>> (part[:-1]==part[1:]).sum()
    27
    >>> t.repeat(3, 10)
    [0.57576898739921489, 0.56410158274297828, 0.56988248506445416]
    >>> u.repeat(3, 1000)
    [0.27312186325366383, 0.27315007913011868, 0.27214492344683094]
    >>> v.repeat(3, 1000)
    [0.28410342655297427, 0.28374053126867693, 0.28318990262732768]

Net result: go back to the former definition of candidates (a number, not the actual entries), but calculate that number as matches.sum(), not len(part[matches]). Now the latest version of this (compressed) code:

    > ...
    > sampled = data[::stride]
    > matches = sampled[:-1] == sampled[1:]
    > candidates = sum(matches)  # count identified matches
    > while candidates > N * 10:  # 10 -- heuristic
    >     stride *= 2  # heuristic increase
    >     sampled = data[::stride]
    >     matches = sampled[:-1] == sampled[1:]
    >     candidates = sum(matches)
    > while candidates < N * 3:  # heuristic slop for long runs
    >     stride //= 2  # heuristic decrease
    >     sampled = data[::stride]
    >     matches = sampled[:-1] == sampled[1:]
    >     candidates = sum(matches)
    > former = None
    > past = 0
    > for value in sampled[matches]:
    >     ...

is:

    ...
    sampled = data[::stride]
    matches = sampled[:-1] == sampled[1:]
    candidates = matches.sum()  # count identified matches
    while candidates > N * 10:  # 10 -- heuristic
        stride *= 2  # heuristic increase
        sampled = data[::stride]
        matches = sampled[:-1] == sampled[1:]
        candidates = matches.sum()
    while candidates < N * 3:  # heuristic slop for long runs
        stride //= 2  # heuristic decrease
        sampled = data[::stride]
        matches = sampled[:-1] == sampled[1:]
        candidates = matches.sum()
    former = None
    past = 0
    for value in sampled[matches]:
        ...

Now I think I can let this problem go, esp. since it was mclovin's problem in the first place. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: Creating alot of class instances?
Steven D'Aprano wrote: ... That's the Wrong Way to do it -- you're using a screwdriver to hammer a nail. Don't knock tool abuse (though I agree with you here). Sometimes tool abuse can produce good results. For example, using hammers to drive screws for temporary strong holds led to making better nails. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: Clarity vs. code reuse/generality
Andre Engels wrote: On Mon, Jul 6, 2009 at 9:44 AM, Martin Vilcans wrote: On Fri, Jul 3, 2009 at 4:05 PM, kj wrote: I'm will be teaching a programming class to novices, and I've run into a clear conflict between two of the principles I'd like to teach: code clarity vs. code reuse. I'd love your opinion about it. In general, code clarity is more important than reusability. Unfortunately, many novice programmers have the opposite impression. I have seen too much convoluted code written by beginners who try to make the code generic. Writing simple, clear, to-the-point code is hard enough as it is, even when not aiming at making it reusable. If in the future you see an opportunity to reuse the code, then and only then is the time to make it generic. Not just that, when you actually get to that point, making simple and clear code generic is often easier than making complicated-and-supposedly-generic code that little bit more generic that you need. First, a quote which took me a bit to find: Thomas William Körner paraphrasing Pólya and Szegő in A Companion to Analysis: Recalling that 'once is a trick, twice is a method, thrice is a theorem, and four times a theory,' we seek to codify this insight. Let us apply this insight: Suppose in writing code, we pretty much go with that. A method is something you notice, a theorem is a function, and a theory is a generalized function. Even though we like DRY ("don't repeat yourself") as a maxim, let it go the first time and wait until you see the pattern (a possible function). I'd go with a function first, a pair of functions, and only then look to abstracting the function. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: finding most common elements between thousands of multiple arrays.
Scott David Daniels wrote: ... Here's a heuristic replacement for my previous frequency code: I've tried to mark where you could fudge numbers if the run time is at all close. Boy, I cannot let go. I did a bit of a test checking the cost of calculating the number of discovered samples, and found after:

    import timeit
    import numpy
    original = numpy.random.uniform(0, 100, (1000, 1000)).astype(int)
    data = original.flatten()
    data.sort()
    part = data[::100]
    t = timeit.Timer('sum(part[:-1]==part[1:])',
                     'from __main__ import part')
    v = timeit.Timer('len(part[part[:-1]==part[1:]])',
                     'from __main__ import part')

I got:

    >>> t.repeat(3, 10)
    [0.58319842326318394, 0.57617574300638807, 0.57831819407238072]
    >>> v.repeat(3, 1000)
    [0.93933027801040225, 0.93704535073584339, 0.94096260837613954]

(note the repeat counts: 10 vs. 1000). So, len(part[mask]) is over 60X faster! I checked:

    >>> sum(part[:-1]==part[1:])
    9393
    >>> len(part[part[:-1]==part[1:]])
    9393

That's an awful lot of matches, so I retried with higher selectivity:

    data = original.flatten()  # no sorting, so runs missing
    part = data[::100]

    >>> t.repeat(3, 10)
    [0.58641335700485797, 0.58458854407490435, 0.58872594142576418]
    >>> v.repeat(3, 1000)
    [0.27352554584422251, 0.27375686015921019, 0.27433291102624935]

about 200X faster

    >>> len(part[part[:-1]==part[1:]])
    39
    >>> sum(part[:-1]==part[1:])
    39

So my new version of this (compressed) code:

    ...
    sampled = data[::stride]
    matches = sampled[:-1] == sampled[1:]
    candidates = sum(matches)  # count identified matches
    while candidates > N * 10:  # 10 -- heuristic
        stride *= 2  # heuristic increase
        sampled = data[::stride]
        matches = sampled[:-1] == sampled[1:]
        candidates = sum(matches)
    while candidates < N * 3:  # heuristic slop for long runs
        stride //= 2  # heuristic decrease
        sampled = data[::stride]
        matches = sampled[:-1] == sampled[1:]
        candidates = sum(matches)
    former = None
    past = 0
    for value in sampled[matches]:
        ...

is:
    ...
    sampled = data[::stride]
    candidates = sampled[sampled[:-1] == sampled[1:]]
    while len(candidates) > N * 10:  # 10 -- heuristic
        stride *= 2  # heuristic increase
        sampled = data[::stride]
        candidates = sampled[sampled[:-1] == sampled[1:]]
    while len(candidates) < N * 3:  # heuristic slop for long runs
        stride //= 2  # heuristic decrease
        sampled = data[::stride]
        candidates = sampled[sampled[:-1] == sampled[1:]]
    former = None
    past = 0
    for value in candidates:
        ...

This change is important, for we try several strides before settling on a choice, meaning the optimization can be valuable. This also means we could be pickier at choosing strides (try more values), since checking is cheaper than before. Summary: when dealing with numpy (or any bulk <-> individual values transitions), try several ways that you think are equivalent and _measure_. In the OODB work I did we called this "impedance mismatch," and it is likely some boundary transitions are _much_ faster than others. The sum case is one of them; I am getting numpy booleans back, rather than Python booleans, so conversions aren't taking the fast path. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
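The measurement above can be reproduced in miniature. This is a sketch, not the original benchmark: the array sizes are shrunk so it runs in a blink, and the boolean mask is applied to part[:-1] because current numpy requires the mask length to match the indexed length (the post's part[mask] spelling relied on older numpy behavior).

```python
import numpy as np

# Shrunken stand-in for the post's 1000x1000 array; sizes are made up.
data = np.sort(np.random.randint(0, 100, 10000))
part = data[::10]                  # strided sample, still sorted
mask = part[:-1] == part[1:]       # True where adjacent samples match

# Two "equivalent" ways to count the matches:
slow_count = sum(mask)             # Python-level sum: one numpy/Python
                                   # boundary crossing per element
fast_count = len(part[:-1][mask])  # stays inside numpy until the len()

assert slow_count == fast_count    # same answer, very different cost
```

Timing the two lines with timeit (as the post does) shows the boundary-crossing cost; the exact speedup varies by machine and numpy version.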
Re: finding most common elements between thousands of multiple arrays.
mclovin wrote: On Jul 4, 12:51 pm, Scott David Daniels wrote: mclovin wrote: OK then. I will try some of the strategies here but I guess things aren't looking too good. I need to run this over a dataset that someone pickled. I need to run this 480,000 times so you can see my frustration. So it doesn't need to be "real time" but it would be nice if it was done sorting this month. Is there a "best guess" strategy where it is not 100% accurate but much faster? Well, I timed a run of a version of mine, and the scan is approx 5X longer than the copy-and-sort. Time arr_of_arr.flatten().sort() to see how quickly the copy and sort happens. So you could try a variant exploiting the following property: If you know the minimum length of a run that will be in the top 25, then the value for each of the most-frequent run entries must show up at positions n * stride and (n + 1) * stride (for some n). That should drastically reduce the scan cost, as long as stride is reasonably large:

    sum(flattened[:-stride:stride] == flattened[stride::stride]) == 1000

So there are only 1000 points to investigate. With any distribution other than uniform, that should go _way_ down. So just pull out those points, use bisect to get their frequencies, and feed those results into the heap accumulation. --Scott David Daniels I don't quite understand what you are saying but I know this: the times the most common element appears varies greatly. Sometimes it appears over 1000 times, and sometimes it appears less than 50. It all depends on the density of the arrays I am analyzing. Here's a heuristic replacement for my previous frequency code: I've tried to mark where you could fudge numbers if the run time is at all close.

    def frequency(arr_of_arr, N, stride=100):
        '''Produce (freq, value) pairs for data in arr_of_arr.

        Tries to produce > N pairs.  stride is a guess at half the
        length of the shortest run in the top N runs.
        '''
        # if the next two lines are too slow, this whole approach is toast
        data = arr_of_arr.flatten()  # big allocation
        data.sort()  # a couple of seconds for 25 million ints
        # stride is a length forcing examination of a run.
        sampled = data[::stride]
        # Note this is a view into data, and is still sorted.
        # We know that any run of length 2 * stride - 1 in data _must_
        # have consecutive entries in sampled.  Compare them "in parallel".
        matches = sampled[:-1] == sampled[1:]
        # matches is True or False for stride-separated values from sampled
        candidates = sum(matches)  # count identified matches
        # while candidates is huge, keep trying with a larger stride
        while candidates > N * 10:  # 10 -- heuristic
            stride *= 2  # heuristic increase
            sampled = data[::stride]
            matches = sampled[:-1] == sampled[1:]
            candidates = sum(matches)
        # if we got too few, move stride down:
        while candidates < N * 3:  # heuristic slop for long runs
            stride //= 2  # heuristic decrease
            sampled = data[::stride]
            matches = sampled[:-1] == sampled[1:]
            candidates = sum(matches)
        # Here we have a "nice" list of candidates that is likely
        # to include every run we actually want.  sampled[matches] is
        # the sorted list of candidate values.  It may have duplicates.
        former = None
        past = 0
        # In the loop here we only use sampled to pick the values we
        # then go find in data.  We avoid checking the same value twice.
        for value in sampled[matches]:
            if value == former:
                continue  # A long run: multiple matches in sampled
            former = value  # Make sure we only try this one once
            # find the beginning of the run
            start = bisect.bisect_left(data, value, past)
            # find the end of the run (we know it is at least stride long)
            past = bisect.bisect_right(data, value, start + stride)
            yield past - start, value  # produce frequency, value

--Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
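A compressed, runnable sketch of the heuristic above. This is not the original code: the sample data is invented, the stride-adjustment loops are dropped, the signature is simplified, and the mask is applied to sampled[:-1] as current numpy requires (the post's sampled[matches] spelling relied on older numpy behavior).

```python
import bisect
import heapq

import numpy as np

def frequency(arr, stride=100):
    """Yield (count, value) for values whose runs straddle the stride grid."""
    data = arr.flatten()              # copy, so the sort is safe
    data.sort()
    sampled = data[::stride]          # every stride-th item, still sorted
    matches = sampled[:-1] == sampled[1:]
    former, past = None, 0
    for value in sampled[:-1][matches]:
        if value == former:
            continue                  # long run: several matches in sampled
        former = value
        start = bisect.bisect_left(data, value, past)   # run begins here
        past = bisect.bisect_right(data, value, start)  # run ends here
        yield past - start, value

# Invented data: 7 and 3 dominate, plus a thin spread of other values.
arr = np.array([7] * 500 + [3] * 400 + list(range(100)))
top = heapq.nlargest(2, frequency(arr, stride=100))
assert top == [(501, 7), (401, 3)]   # 7 and 3 also appear once in range(100)
```

Feeding the generator straight into heapq.nlargest is the "heap accumulation" the thread keeps referring to.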
Re: Clarity vs. code reuse/generality
Paul Rubin wrote: Invalid input data is not considered impossible and doesn't imply a broken program, so assert statements are not the appropriate way to check for it. I like to use a function like

    def check(condition, msg="data error"):
        if not condition:
            raise ValueError, msg
    ...
    check(x >= 0, "invalid x")  # raises ValueError if x is negative
    y = sqrt(x)

And I curse such uses, since I don't get to see the troublesome value, or why it is troublesome. In the above case, y = sqrt(x) at least raises ValueError('math domain error'), which is more information than you are providing. How about:

    ...
    if x < 0:
        raise ValueError('x = %r not allowed (negative)' % x)
    ...

--Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
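A hypothetical wrapper (the name sqrt_checked is mine, not from the thread) showing the style argued for above in Python 3 spelling: the error message carries the value that failed, so the caller is not left cursing.

```python
import math

def sqrt_checked(x):
    """Reject bad input with a message that shows the offending value."""
    if x < 0:
        # The troublesome value appears in the exception text.
        raise ValueError('x = %r not allowed (negative)' % x)
    return math.sqrt(x)

assert sqrt_checked(16) == 4.0

try:
    sqrt_checked(-2)
except ValueError as err:
    assert '-2' in str(err)   # the bad value is visible in the message
else:
    raise AssertionError('expected a ValueError')
```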
Re: finding most common elements between thousands of multiple arrays.
mclovin wrote: OK then. I will try some of the strategies here but I guess things aren't looking too good. I need to run this over a dataset that someone pickled. I need to run this 480,000 times so you can see my frustration. So it doesn't need to be "real time" but it would be nice if it was done sorting this month. Is there a "best guess" strategy where it is not 100% accurate but much faster? Well, I timed a run of a version of mine, and the scan is approx 5X longer than the copy-and-sort. Time arr_of_arr.flatten().sort() to see how quickly the copy and sort happens. So you could try a variant exploiting the following property: If you know the minimum length of a run that will be in the top 25, then the value for each of the most-frequent run entries must show up at positions n * stride and (n + 1) * stride (for some n). That should drastically reduce the scan cost, as long as stride is reasonably large. For my uniformly distributed 0..1024 values in a 5K x 5K array: about 2.5 sec to flatten and sort; about 15 sec to run one of my heapish thingies; the least frequency encountered: 24716. So, with stride chosen accordingly,

    sum(flattened[:-stride:stride] == flattened[stride::stride]) == 1000

So there are only 1000 points to investigate. With any distribution other than uniform, that should go _way_ down. So just pull out those points, use bisect to get their frequencies, and feed those results into the heap accumulation. --Scott David Daniels -- http://mail.python.org/mailman/listinfo/python-list
Re: Reversible Debugging
Dave Angel wrote: Scott David Daniels wrote: Patrick Sabin wrote: Horace Blegg schrieb: You might consider using a VM with 'save-points'. You run the program (in a debugger/ida/what have you) to a certain point (logical point would be if/ifelse/else statements, etc) and save the VM state. Once you've saved, you continue. If you find the path you've taken isn't what you are after, you can reload a previous save point and start over, trying a different path the next time. That was my idea to implement it. I thought of taking snapshots of the current state every time a "unredoable instruction", e.g random number generation, is done. Remember, storing into a location is destruction. Go over a list of VM instructions and see how many of them are undoable. Read his suggested approach more carefully. He's not "undoing" anything. He's rolling back to the save-point, and then stepping forward to the desired spot. Right, I did misread "unredoable" as "undoable." However, I suspect a surprising amount of stuff is "unredoable" -- iff the random number generator counts as one of those things. The random number seeder is unredoable with empty args, but running the generator once seeded is predictable (by design). If you don't capture the random number state as part of your "snapshot," _lots_ of C space storage will be in the same class, and you are stuck finding the exceptional "safe to use" cases, rather than the exceptional "unsafe to use." Similarly, system calls about time or _any_ callback (when and where executed) create snapshot points, and I suspect roll forwards will be relatively short. In fact, in some sense the _lack_ of a callback is unredoable. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
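As the post notes, a seeded generator is predictable by design; in Python the generator's state can be captured and restored, which is exactly the kind of snapshot a roll-back-and-replay debugger needs. A minimal sketch (the specific seed and draw counts are made up):

```python
import random

random.seed(42)                  # seeded, hence predictable by design
before = [random.random() for _ in range(3)]

snapshot = random.getstate()     # capture the full generator state
later = [random.random() for _ in range(3)]

random.setstate(snapshot)        # roll back to the snapshot...
replayed = [random.random() for _ in range(3)]

assert replayed == later         # ...and the draws redo exactly
assert before != later           # distinct draws, so not a coincidence
```

The same idea generalizes: anything whose state can be serialized at the snapshot point (RNG state, stream positions, callback bookkeeping) becomes "redoable" on the roll-forward.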
Re: finding most common elements between thousands of multiple arrays.
Vilya Harvey wrote: 2009/7/4 Andre Engels: On Sat, Jul 4, 2009 at 9:33 AM, mclovin wrote: Currently I need to find the most common elements in thousands of arrays within one large array (around 2 million instances with ~70k unique elements)... Try flattening the arrays into a single large array & sorting it. Then you can just iterate over the large array counting as you go; you only ever have to insert into the dict once for each value and there are no lookups in the dict. Actually the next step is to maintain a min-heap as you run down the sorted array. Something like:

    import numpy as np
    import heapq

    def frequency(arr):
        '''Generate frequency-value pairs from a numpy array'''
        clustered = arr.flatten()  # copy (so can safely sort)
        clustered.sort()  # Bring identical values together
        scanner = iter(clustered)
        last = scanner.next()
        count = 1
        for el in scanner:
            if el == last:
                count += 1
            else:
                yield count, last
                last = el
                count = 1
        yield count, last

    def most_frequent(arr, N):
        '''Return the top N (freq, val) elements in arr'''
        counted = frequency(arr)  # get an iterator for freq-val pairs
        heap = []
        # First, just fill up the array with the first N distinct
        for i in range(N):
            try:
                heap.append(counted.next())
            except StopIteration:
                break  # If we run out here, no need for a heap
        else:
            # more to go, switch to a min-heap, and replace the least
            # element every time we find something better
            heapq.heapify(heap)
            for pair in counted:
                if pair > heap[0]:
                    heapq.heapreplace(heap, pair)
        return sorted(heap, reverse=True)  # put most frequent first

--Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: Reversible Debugging
Patrick Sabin wrote: Horace Blegg schrieb: You might consider using a VM with 'save-points'. You run the program (in a debugger/ida/what have you) to a certain point (logical point would be if/ifelse/else statements, etc) and save the VM state. Once you've saved, you continue. If you find the path you've taken isn't what you are after, you can reload a previous save point and start over, trying a different path the next time. That was my idea to implement it. I thought of taking snapshots of the current state every time a "unredoable instruction", e.g random number generation, is done. Remember, storing into a location is destruction. Go over a list of VM instructions and see how many of them are undoable. -- http://mail.python.org/mailman/listinfo/python-list
Re: question of style
upwestdon wrote: if not (self.higher and self.lower): return self.higher or self.lower self.lower = 0 self.higher = 123 ??? More than just None is False -- http://mail.python.org/mailman/listinfo/python-list
Re: Sequence splitting
Steven D'Aprano wrote: I've never needed such a split function, and I don't like the name, and the functionality isn't general enough. I'd prefer something which splits the input sequence into as many sublists as necessary, according to the output of the key function. Something like itertools.groupby(), except it runs through the entire sequence and collates all the elements with identical keys.

    splitby(range(10), lambda n: n%3)
    => [(0, [0, 3, 6, 9]), (1, [1, 4, 7]), (2, [2, 5, 8])]

Your split() would be nearly equivalent to this with a key function that returns a Boolean. Well, here is my go at doing the original with iterators:

    def splitter(source, test=bool):
        a, b = itertools.tee((x, test(x)) for x in source)
        return (data for data, decision in a if decision), (
                data for data, decision in b if not decision)

This has the advantage that it can operate on infinite lists. For something like splitby for grouping, I seem to need to know the cases up front:

    def _make_gen(particular, src):
        return (x for x, c in src if c == particular)

    def splitby(source, cases, case):
        '''Produce a dict of generators for case(el) for el in source'''
        decided = itertools.tee(((x, case(x)) for x in source), len(cases))
        return dict((c, _make_gen(c, src))
                    for c, src in zip(cases, decided))

example:

    def classify(n):
        '''Least prime factor of a few'''
        for prime in [2, 3, 5, 7]:
            if n % prime == 0:
                return prime
        return 0

    for k, g in splitby(range(50), (2, 3, 5, 7, 0), classify).items():
        print('%s: %s' % (k, list(g)))

    0: [1, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
    2: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48]
    3: [3, 9, 15, 21, 27, 33, 39, 45]
    5: [5, 25, 35]
    7: [7, 49]

--Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
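The splitter above can be exercised like this; a small usage sketch (the even/odd test is invented for the demo):

```python
import itertools

def splitter(source, test=bool):
    """Split source into (matching, non-matching) lazy streams."""
    a, b = itertools.tee((x, test(x)) for x in source)
    return ((data for data, decision in a if decision),
            (data for data, decision in b if not decision))

evens, odds = splitter(range(10), lambda n: n % 2 == 0)
evens_list, odds_list = list(evens), list(odds)
assert evens_list == [0, 2, 4, 6, 8]
assert odds_list == [1, 3, 5, 7, 9]
```

Note that tee buffers whatever one stream has produced but the other has not yet consumed, so draining one side of an infinite source entirely before touching the other would buffer without bound.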
Re: Direct interaction with subprocess - the curse of blocking I/O
'f_read' for 'timeout_select' see module 'select'
        """
        text_lines = ''  # for result accumulation
        while True:  # as long as there are bytes to read
            rlist, wlist, xlist = select.select([f_read], [], [],
                                                timeout_select)
            DPRINT("rlist=%r, wlist=%r, xlist=%r", rlist, wlist, xlist)
            if rlist:
                text_read = f_read.readline()  # get a line
                DPRINT("after read/readline text_read:%r, len=%s",
                       text_read, len(text_read))
                if text_read:  # there were some bytes
                    text_lines += text_read
                    DPRINT("text_lines:%r", text_lines)
                    continue  # Got some chars, keep going.
            break  # Nothing new found, let's get out.
        return text_lines or None

--Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: question of style
Duncan Booth wrote: Simon Forman wrote: ... if self.higher is self.lower is None: return ... As a matter of style however I wouldn't use the shorthand to run two 'is' comparisons together, I'd write that out in full if it was actually needed here. Speaking only to the style issue, when I've wanted to do something like that, I find:

    if self.higher is None is self.lower:
        ...

more readable, by making clear they are both being compared to a constant, rather than compared to each other. More often, I've used code like:

    if x is not None is not y:
        ...

since I am usually working on non-defaulting cases in the body. I find the form above simpler to read than:

    if x is not None and y is not None:
        ...

I do draw the line at two, though, and with three or more I'll paren-up a list of parallel comparisons:

    if (x is not None
            and y is not None
            and z is not None):
        ...

--Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
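The shorthand works because Python chains comparisons pairwise; a tiny sketch (the names x and y are placeholders) making the equivalence explicit:

```python
x, y = None, 5

# A chained comparison evaluates pairwise, left to right:
# "x is None is not y" means "(x is None) and (None is not y)".
chained = x is None is not y
spelled_out = (x is None) and (None is not y)
assert chained is spelled_out is True
```

Which is exactly why the style question is about readability rather than behavior: the two spellings are guaranteed to agree.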
Re: Basic question from pure beginner
Charles Yeomans wrote: Let me offer a bit of editing. Finally, I'd remove correct_password_given from the loop test, and replace it with a break statement when the correct password is entered.

    password = "qwerty"
    correct_password_given = False
    attemptcount = 0
    MaxAttempts = 3
    while attemptcount < MaxAttempts:
        guess = raw_input("Enter your password: ")
        guess = str(guess)
        if guess != password:
            print "Access Denied"
            attemptcount = attemptcount + 1
        else:
            print "Password Confirmed"
            correct_password_given = True
            break

And even simpler:

    PASSWORD = "qwerty"
    MAXRETRY = 3
    for attempt in range(MAXRETRY):
        if raw_input('Enter your password: ') == PASSWORD:
            print 'Password confirmed'
            break  # this exits the for loop
        print 'Access denied: attempt %s of %s' % (attempt + 1, MAXRETRY)
    else:
        # The else of a for statement is not executed for breaks,
        # so it indicates the end of testing without a match.
        raise SystemExit  # Or whatever you'd rather do.

--Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: Multi thread reading a file
Gabriel Genellina wrote: ...

    def convert(in_queue, out_queue):
        while True:
            row = in_queue.get()
            if row is None:
                break
            # ... convert row
            out_queue.put(converted_line)

These loops work well with the two-argument version of iter, which is easy to forget, but quite useful to have in your bag of tricks:

    def convert(in_queue, out_queue):
        for row in iter(in_queue.get, None):
            # ... convert row
            out_queue.put(converted_line)

--Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
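A self-contained sketch of the two-argument iter form, in Python 3 spelling (the queue contents are made up; in Python 2 the module is named Queue):

```python
from queue import Queue

q = Queue()
for item in [1, 2, 3, None]:   # None is the sentinel ending the stream
    q.put(item)

# iter(callable, sentinel) calls q.get() repeatedly until it returns
# the sentinel, which is consumed but not yielded.
collected = list(iter(q.get, None))
assert collected == [1, 2, 3]
```

The same trick reads a file in fixed-size chunks: iter(lambda: f.read(8192), '').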
Re: invoking a method from two superclasses
> Mitchell L Model wrote: Sorry, after looking over some other responses, I went back and re-read your reply. I'm just making sure here, but: > Scott David Daniels wrote: Below compressed for readability in comparison:

    class A:
        def __init__(self): super().__init__(); print('A')
    class B:
        def __init__(self): super().__init__(); print('B')
    class C(A, B):
        def __init__(self): super().__init__(); print('C')
    C()

And, if you are doing it with a message not available in object (renamed to disambiguate later discussion):

    class root:
        def prints(self): print('root')  # or pass if you prefer
    class D(root):
        def prints(self): super().prints(); print('D')
    class E(root):
        def prints(self): super().prints(); print('E')
    class F(D, E):
        def prints(self): super().prints(); print('F')
    F().prints()

What I was missing is that each path up to and including the top of the diamond must include a definition of the method, along with super() calls to move the method calling on its way up. Actually, not really true. In the F-through-root example, any of the "prints" methods except that on root may be deleted and the whole thing works fine. The rootward (closer to object) end must contain the method in question, possibly only doing a pass as the action, and _not_ calling super. The other methods (those in D, E, and F above) are all optional (you can freely comment out those methods where you like), but each should call super() in its body. Note that you can also add a class:

    class G(E, D):
        def prints(self): super().prints(); print('G')
    G().prints()

Also note that the inheritances above can be "eventually inherits from" as well as direct inheritance. Is this what the documentation means by "cooperative multiple inheritance"? Yes, the phrase is meant to convey "No magic (other than super itself) is involved in causing the other methods to be invoked." If you want "all" prints methods called, make sure all but the last method do super calls.
Of course, any method that doesn't do the super call will be the last by definition (no flow control then), but by putting the non-forwarding version at or below the lower point of the diamond, the mro order guarantees that you will have a "good" place to stop. Think of the mro order as a solution to the partial-order constraints that a class must appear before any of its direct superclasses, and (by implication) after any of its subclasses. In your correction of my example, if you remove super().__init__ from B.__init__ the results aren't affected, because object.__init__ doesn't do anything and B comes after A in C's mro. However, if you remove super().__init__ from A.__init__, it stops the "supering" process dead in its tracks. Removing the super from B.__init__ means that you don't execute object.__init__. It turns out that object does nothing in its __init__, but without knowing that, removing the super from B.__init__ is also a mistake. So, you may well already have it all right, but as long as I'm putting in the effort to get the "operational rules about using super" out, I thought I'd fill in this last little bit. --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
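The D/E/F example from the thread, rewritten hypothetically to collect output in a list instead of printing, makes the mro-driven call order checkable (the out parameter is my addition, not from the original posts):

```python
class Root:
    def prints(self, out):
        out.append('root')       # no super() call: the chain ends here

class D(Root):
    def prints(self, out):
        super().prints(out)
        out.append('D')

class E(Root):
    def prints(self, out):
        super().prints(out)
        out.append('E')

class F(D, E):
    def prints(self, out):
        super().prints(out)
        out.append('F')

# The mro puts each class before its bases: F -> D -> E -> Root -> object.
assert [c.__name__ for c in F.__mro__] == ['F', 'D', 'E', 'Root', 'object']

out = []
F().prints(out)
# super() walks the mro, so Root appends first, then control unwinds
# back down: E, then D, then F.
assert out == ['root', 'E', 'D', 'F']
```

Note that D's super() call reaches E, a class D knows nothing about; that dispatch through the instance's mro, rather than through D's own bases, is the "cooperative" part of cooperative multiple inheritance.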