[ANN] ftputil 2.2 released
ftputil 2.2 is now available from
http://ftputil.sschwarzer.net/download .

Changes since version 2.1
-------------------------

- Results of stat calls (also indirect calls, i. e. listdir,
  isdir/isfile/islink, exists, getmtime etc.) are now cached and
  reused. This results in remarkable speedups for many use cases.
  Thanks to Evan Prodromou for his permission to add his lrucache
  module under ftputil's license.

- The current directory is also locally cached, resulting in further
  speedups.

- It's now possible to write and plug in custom parsers for directory
  formats which ftputil doesn't support natively.

- File-like objects generated via ``FTPHost.file`` now support the
  iterator protocol (for line in some_file: ...).

- The documentation has been updated accordingly. Read it under
  http://ftputil.sschwarzer.net/trac/wiki/Documentation .

Possible incompatibilities:

- This release requires at least Python 2.3. (Previous releases
  worked with Python versions from 2.1 up.)

- The method ``FTPHost.set_directory_format`` has been removed,
  since the directory format (Unix or MS) is set automatically. (The
  new method ``set_parser`` is a different animal since it takes
  a parser object to parse foreign formats, not a string.)

What is ftputil?
----------------

ftputil is a high-level FTP client library for the Python programming
language. ftputil implements a virtual file system for accessing FTP
servers, that is, it can generate file-like objects for remote files.
The library supports many functions similar to those in the os,
os.path and shutil modules. ftputil has convenience functions for
conditional uploads and downloads, and handles FTP clients and
servers in different timezones.

License
-------

ftputil 2.2 is Open Source software, released under the revised BSD
license (see http://www.opensource.org/licenses/bsd-license.php ).

Stefan

--
Dr.-Ing. Stefan Schwarzer
SSchwarzer.com - Softwareentwicklung für Technik und Wissenschaft
http://sschwarzer.com
--
http://mail.python.org/mailman/listinfo/python-announce-list
Support the Python Software Foundation: http://www.python.org/psf/donations.html
Re: Use a Thread to reload a Module?
Gregory Piñero [EMAIL PROTECTED] wrote:

> To: Hendrik van Rooyen [EMAIL PROTECTED]
>
> On 12/24/06, Hendrik van Rooyen [EMAIL PROTECTED] wrote:
>> Gregory Piñero [EMAIL PROTECTED] wrote:
>> ...
>>
>> open(filename[,flag='c'[,protocol=None[,writeback=False[,binary=None]]]])
>>
>> Open a persistent dictionary. The filename specified is the base
>> filename for the underlying database. As a side-effect, an extension
>> may be added to the filename and more than one file may be created.
>> By default, the underlying database file is opened for reading and
>> writing. The optional flag parameter has the same interpretation as
>> the flag parameter of anydbm.open.
>>
>> hth - Hendrik
>
> So how is that better than using marshal as I am now? Is it faster to
> load? Perhaps I could do speed tests to compare.
>
> -Greg

I doubt it - originally I asked the question about a persistent dict
because I thought that the scheme of periodically updating your live
data could fruitfully be replaced by a trickle updating scheme to keep
the stuff in better sync, rather than taking big lumps at a time...

This may or may not be feasible - it depends on what you are actually
doing, and on whether it can in fact be updated a little bit at a time
instead of in a large lump at longer intervals.

- Hendrik

--
http://mail.python.org/mailman/listinfo/python-list
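
A minimal sketch of the "trickle updating" idea with a shelve-backed persistent dict. It assumes current Python 3 (the thread itself predates it), and the file name and keys are made up for illustration; it is not the poster's actual setup.

```python
import os
import shelve
import tempfile

# Hypothetical location for the persistent dictionary.
workdir = tempfile.mkdtemp()
base = os.path.join(workdir, "live_data")

# First batch of values, written one key at a time.
with shelve.open(base) as db:
    db["price:widget"] = 9.99
    db["price:gadget"] = 24.50

# Later: a single value changes, so only that key is rewritten
# instead of dumping and reloading the whole data set.
with shelve.open(base) as db:
    db["price:widget"] = 10.49

# Any later reader sees the merged result.
with shelve.open(base) as db:
    snapshot = dict(db)

print(snapshot)
```

Whether this beats a periodic marshal dump depends on how often values change and how large the data set is, as the reply says.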
Re: method names in __slots__ ??
John Machin wrote in
news:1167008799.074885.250770@73g2000cwn.googlegroups.com
in comp.lang.python:

> Given a = Adder(),
>     a.tally = 0
> gets AttributeError: 'Adder' object attribute 'tally' is read-only
>     a.notinslots = 1
> gets AttributeError: 'Adder' object attribute 'notinslots' is read-only
>
> So is there some magic class-fu going down here, or is this just a
> waste of memory space in the instances?

Haven't you, with your 2 examples above, answered your own question?

Clearly from your example it doesn't make any difference if you add
a class attribute to the slots; one way or another it's as if you
hadn't put it in there in the first place.

This will give the same error, which shows it's about class attributes
and not just methods:

    class Adder(object):
        __slots__ = [ 'class_name' ]
        class_name = 3

    a = Adder()
    a.class_name = 2

It would seem that the interpreter removes any names it finds as class
attribute names from the list it finds in __slots__ before it creates
the instance.

Of course if my guessing above isn't good enough, we could look at
the documentation:

http://docs.python.org/ref/slots.html#l2h-218

    __slots__ are implemented at the class level by creating
    descriptors (3.4.2) for each variable name. As a result, class
    attributes cannot be used to set default values for instance
    variables defined by __slots__; otherwise, the class attribute
    would overwrite the descriptor assignment.

So it's that the __slots__ assignment makes the descriptors, and then
the subsequent method definitions and class attribute bindings remove
them.

Rob.
--
http://www.victim-prime.dsl.pipex.com/
Re: regular expression
Rad [Visual C# MVP] wrote: On Sun, 24 Dec 2006 16:36:31 +0100, Stef Mientki wrote: Dustan wrote: Kleine Aap wrote: Asper Faner wrote: I seem to always have hard time understaing how this regular expression works, especially how on earth do people bring it up as part of computer programming language. Natural language processing seems not enough to explain by the way. Why no eliminate it ? Hi folks, fairly new to the list(Python is my first programming language, so I'm fairly new to the world of programming too)but this is a question I've been wondering about since I started learning about the re module. Are regular expressions what makes mark up languages interpretable by webbrowsers? -- http://mail.python.org/mailman/listinfo/python-list
Re: Why does Python never add itself to the Windows path?
Ben Sizer wrote: I've installed several different versions of Python across several different versions of MS Windows, and not a single time was the Python directory or the Scripts subdirectory added to the PATH environment variable. Personally, I hate Windows applications that add themselves to the PATH. So much crap gets put in there that I don't even use the default system PATH and just set my own explicitly. Every time, I've had to go through and add this by hand, to have something resembling a usable Python installation. If you're installing multiple versions of Python on one machine you're going to have to do this anyways to ensure the version of Python you want first in the path actually is first. No such problems on Linux, whether it be Mandrake/Mandriva, Fedora Core, or Kubuntu. Linux distributions normally install themselves somewhere that's normally in the path already. I suppose you can do the same thing on Windows if you want, just choose to install Python into directory that's already in your path. Though installing to something like C:\WINDOWS\SYSTEM32 is probably not a good idea. Ross Ridge -- http://mail.python.org/mailman/listinfo/python-list
Re: regular expression
[EMAIL PROTECTED] wrote: Rad [Visual C# MVP] wrote: On Sun, 24 Dec 2006 16:36:31 +0100, Stef Mientki wrote: Dustan wrote: Kleine Aap wrote: Asper Faner wrote: I seem to always have hard time understaing how this regular expression works, especially how on earth do people bring it up as part of computer programming language. Natural language processing seems not enough to explain by the way. Why no eliminate it ? Hi folks, fairly new to the list(Python is my first programming language, so I'm fairly new to the world of programming too)but this is a question I've been wondering about since I started learning about the re module. Are regular expressions what makes mark up languages interpretable by webbrowsers? I don't actually know the answer, but my educated guess: Regular expressions are just the simplest way to parse any text, but there are other ways. Webbrowsers most likely depend on regular expressions just because it's a relatively quick and easy way to interpret the language. -- http://mail.python.org/mailman/listinfo/python-list
Re: How to stop program when threads is sleeping
many_years_after wrote:

> Hi, pythoners:
>
> There is a problem I couldn't solve. I start a thread in my program.
> The thread will do something before executing time.sleep(). When the
> user gives a signal to the main thread (such as clicking the 'end'
> button or closing the window), the thread should end its running.
> But how to end the thread when it's sleeping? I set a flag for the
> thread, but it doesn't work.
>
> I also thought to put 'time.sleep()' in the main thread. But I think
> the main thread will not respond to the user's action because it is
> executing sleep().
>
> Any ideas? Thanks.

The thread should not sleep - just take short cat naps. The main
thread should then set some kind of counter, and the worker thread
should check the counter between naps...
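
The "cat nap" suggestion could look roughly like the sketch below. It assumes current Python 3, and the names (state, worker, the nap length) are illustrative; a simple stop flag stands in for the counter mentioned in the reply.

```python
import threading
import time

# Shared state; the main thread flips "stop" when the user asks to quit.
state = {"stop": False}

def worker(total_sleep=5.0, nap=0.05):
    """Wait total_sleep seconds in short naps, giving up early on request."""
    deadline = time.monotonic() + total_sleep
    while time.monotonic() < deadline:
        if state["stop"]:
            return              # asked to stop: leave between naps
        time.sleep(nap)         # short nap instead of one long sleep

t = threading.Thread(target=worker)
start = time.monotonic()
t.start()

time.sleep(0.2)                 # stands in for the user clicking 'end'
state["stop"] = True
t.join()
elapsed = time.monotonic() - start

print("worker stopped after %.2f s" % elapsed)
```

The worker notices the request within one nap, so the nap length sets the trade-off between responsiveness and wake-ups.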
Re: One module per class, bad idea?
Carl Banks wrote: Kent Johnson wrote: Carl Banks wrote: Now, I think this is the best way to use modules, but you don't need to use modules to do get higher-level organization; you could use packages instead. It's a pain if you're working on two different classes in the same system you have to keep switching files; but I guess some people prefer to switch files rather than to scroll for some reason. That would be me. I strongly prefer to switch files rather than scroll. I use an editor that makes it easy to switch files. For me it is much easier to switch between files than to scroll between two parts of a file, and I don't lose my place when I switch back. I like to be able to see things side by side. Man, I don't know you do it. Say I'm sitting there concentrating on programming something, and I see that I'll have to make a change in another file. All of a sudden, I have to recall some filename out of thin air. Totally breaks my train of thought, sometimes I space out trying to think of it because I have to cold-start an entirely different part of my brain. It's less of a mental distraction to just scroll. But then to go back to where you were, you have to scroll back and find your place. For me, just a click or keystroke to restore the last file with the cursor or selection exactly where I left it. And if I am going back and forth between the two, each switch is equally easy after the first (opening the file). (BTW, any decent editor will let you view different positions of the same file side-by-side.) Right, at a cost of showing you half as much of the one you care about. Anyway, I'm not trying to convince anyone to change, just pointing out that there are different styles of editing that make sense to those who use them, if not to outside observers ;-) Kent -- http://mail.python.org/mailman/listinfo/python-list
Re: How to stop program when threads is sleeping
On Sun, 2006-12-24 at 22:55 -0800, many_years_after wrote:

> Hi, pythoners:
>
> There is a problem I couldn't solve. I start a thread in my program.
> The thread will do something before executing time.sleep(). When the
> user gives a signal to the main thread (such as clicking the 'end'
> button or closing the window), the thread should end its running.
> But how to end the thread when it's sleeping? I set a flag for the
> thread, but it doesn't work.

Is the thread supposed to do some additional work after being woken
up? If not, there is no point in going to sleep in the first place and
the thread should just terminate when it has completed its task. If
yes, I'd use a threading.Event object to .wait() on in the sub-thread
rather than putting it to sleep, and then .set() the event object in
the main thread when it's time to wake up the sub-thread.

Hope this helps,

Carsten.
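
A minimal sketch of the threading.Event approach described above, assuming current Python 3. The variable names and the 60-second timeout are illustrative only.

```python
import threading
import time

wake_up = threading.Event()
log = []

def worker():
    log.append("working")
    # Blocks for up to 60 seconds, but returns as soon as the
    # event is set by another thread.
    was_set = wake_up.wait(timeout=60.0)
    log.append("woken early" if was_set else "timed out")

t = threading.Thread(target=worker)
start = time.monotonic()
t.start()

time.sleep(0.1)     # stands in for the user clicking 'end'
wake_up.set()       # wakes the sub-thread immediately
t.join()
elapsed = time.monotonic() - start

print(log, "after %.2f s" % elapsed)
```

Unlike time.sleep(), the wait() call can be interrupted from the main thread, so no polling loop is needed.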
Re: method names in __slots__ ??
Rob Williscroft wrote: John Machin wrote in news:1167008799.074885.250770@ 73g2000cwn.googlegroups.com in comp.lang.python: Given a = Adder(), a.tally = 0 gets AttributeError: 'Adder' object attribute 'tally' is read-only a.notinslots = 1 gets AttributeError: 'Adder' object attribute 'notinslots' is read-only So is there some magic class-fu going down here, or is this just a waste of memory space in the instances? Haven't you, with your 2 examples above, answered your own question ? No. Clearly from your example it doesn't make any difference if you add a class attribute to the slots, one way or another its as if you hadn't put it in there in the first place. Clearly? Not so. It takes up memory. A list of 1 million Adder instances takes up about 68 Mb (Python 2.5 on Windows XP). With the method names removed from the __slots__, it takes only about 44 Mb. [For comparison: with no __slots__ at all, it takes about 180 Mb] This will give the same error, which shows its about class attributes and not just methods: class Adder(object): __slots__ = [ 'class_name' ] class_name = 3 a = Adder() a.class_name = 2 It would seem that the interpreter removes any names it finds as class attribute names from the list it finds in __slots__ before it creates the instance. It doesn't seem so to me. If it did that, the memory usage would not increase. Of course if my guessing above isn't good enough, we could look at the documentation: http://docs.python.org/ref/slots.html#l2h-218 __slots__ are implemented at the class level by creating descriptors (3.4.2) for each variable name. As a result, class attributes cannot be used to set default values for instance variables defined by __slots__; otherwise, the class attribute would overwrite the descriptor assignment. I have read that, before I posted. Asides: (1) It would be useful if it stated the empirically determined fact that the result is that the class attribute is thusly made read-only. 
(2) The second sentence is not a model of clarity. In any case I can't see how the paragraph gives any support for your next statement: So its that the __slots__ assignment makes the descriptors and then the subsiquent method defenitions and class attribute bindings remove them. Errrmmm ... if the descriptors are removed, how is it that the behaviour is read-only? Cheers, John -- http://mail.python.org/mailman/listinfo/python-list
Re: method names in __slots__ ??
John Machin wrote in news:[EMAIL PROTECTED] in comp.lang.python:

> Rob Williscroft wrote:
>> John Machin wrote in
>> news:1167008799.074885.250770@73g2000cwn.googlegroups.com
>> in comp.lang.python:
>>> Given a = Adder(),
>>>     a.tally = 0
>>> gets AttributeError: 'Adder' object attribute 'tally' is read-only
>>>     a.notinslots = 1
>>> gets AttributeError: 'Adder' object attribute 'notinslots' is
>>> read-only
>>>
>>> So is there some magic class-fu going down here, or is this just a
>>> waste of memory space in the instances?
>>
>> Haven't you, with your 2 examples above, answered your own question?
>
> No.
>
>> Clearly from your example it doesn't make any difference if you add
>> a class attribute to the slots; one way or another it's as if you
>> hadn't put it in there in the first place.
>
> Clearly? Not so. It takes up memory. A list of 1 million Adder
> instances takes up about 68 Mb (Python 2.5 on Windows XP). With the
> method names removed from the __slots__, it takes only about 44 Mb.
> [For comparison: with no __slots__ at all, it takes about 180 Mb]

    68 - 44 = 24
    24 / 4 = 6

So that's 6 pointers (24 bytes per instance, at 4 bytes per pointer)
for 5 methods, probably 5 pointers and 4 bytes rounded up to the
nearest allocation unit.

So the slots in the instance are staying around, even though they are
no longer accessible (see below).

[snip]

>> It would seem that the interpreter removes any names it finds as
>> class attribute names from the list it finds in __slots__ before it
>> creates the instance.
>
> It doesn't seem so to me. If it did that, the memory usage would not
> increase.

It was a guess, and an incorrect guess, but that's why I quoted the
docs below.

>> Of course if my guessing above isn't good enough, we could look at
>> the documentation:
>>
>> http://docs.python.org/ref/slots.html#l2h-218
>>
>>     __slots__ are implemented at the class level by creating
>>     descriptors (3.4.2) for each variable name. As a result, class
>>     attributes cannot be used to set default values for instance
>>     variables defined by __slots__; otherwise, the class attribute
>>     would overwrite the descriptor assignment.
>
> I have read that, before I posted. Asides:
> (1) It would be useful if it stated the empirically determined fact
> that the result is that the class attribute is thusly made read-only.
> (2) The second sentence is not a model of clarity.
>
> In any case I can't see how the paragraph gives any support for your
> next statement:
>
>> So it's that the __slots__ assignment makes the descriptors, and
>> then the subsequent method definitions and class attribute bindings
>> remove them.
>
> Errrmmm ... if the descriptors are removed, how is it that the
> behaviour is read-only?

The descriptors are part of the class object; they are removed when
the class attributes are rebound. Further rebinding of the class
attributes will work fine:

    Adder.tally = 0

They are not assignable in the instance, as the class descriptors that
would have forwarded the assignment to the instance's slots have been
replaced.

The memory usage is higher because the slots in the instance are still
there even though the descriptors that would allow them to be assigned
have been removed.

Rob.
--
http://www.victim-prime.dsl.pipex.com/
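
A small sketch of the behaviour discussed in this thread, as seen on current CPython 3. That is an assumption on my part: the thread is about Python 2.5, and as far as I know later versions reject the name clash when the class is defined instead of silently keeping a dead slot. The Adder class here is a cut-down stand-in, not John Machin's original.

```python
class Adder(object):
    __slots__ = ['total']          # one real slot

    def add(self, n):              # 'add' lives on the class, not in a slot
        self.total = self.total + n

a = Adder()
a.total = 0
a.add(5)

# The slot is exposed through a descriptor stored on the class.
slot_descriptor = Adder.__dict__['total']

# Assigning to a name that exists only on the class fails: the instance
# has no __dict__ and no slot called 'add'.
try:
    a.add = None
    outcome = "assigned"
except AttributeError:
    outcome = "AttributeError"

# Listing a name in __slots__ and also binding it in the class body is
# refused when the class statement runs (current CPython behaviour).
try:
    class Clash(object):
        __slots__ = ['class_name']
        class_name = 3
    clash_outcome = "class created"
except ValueError:
    clash_outcome = "ValueError"

print(a.total, outcome, clash_outcome)
```

On the interpreter discussed in the thread the second class statement apparently succeeded, which is what left unused slots in every instance.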
Re: Elliptic Curve Library
Jaap Spies wrote: Mike Tammerman wrote: I need an elliptic curve library that can be used by python. I googled but couldn't find a one. I'll appreciate, if you could show me. You could look at http://sage.scipy.org/sage/ http://sage.scipy.org/sage/features.html Jaap Sorry, don't know about those areas. Hope the other reply is of help. Vasudev www.dancingbison.com -- http://mail.python.org/mailman/listinfo/python-list
ANNOUNCE: Mod_python 3.3.0b (Beta)
The Apache Software Foundation and The Apache HTTP Server Project are pleased to announce the 3.3.0b (Beta) release of mod_python. Version 3.3.0b of mod_python features several new functions and attributes providing better access to apache internals, as well as many bug fixes and various performance and security improvements. A detailed description of the changes is available in Appendix A of the mod_python manual, also available here http://www.modpython.org/live/mod_python-3.3.0b/doc-html/app-changes-from-3.2.10.html Beta releases are NOT considered stable and usually contain bugs. This release is intended to solicit widespread testing of the code. We strongly recommend that you try out your existing applications and experiment with new features in a non-production environment using this version and report any problems you may encounter so that they can be addressed before the final release. Preferred method of reporting problems is the mod_python user list [EMAIL PROTECTED] Mod_python 3.3.0b is available for download from: http://httpd.apache.org/modules/python-download.cgi For more information about mod_python visit http://www.modpython.org/ Regards, The Apache mod_python team. -- http://mail.python.org/mailman/listinfo/python-list
Re: Unescaping URLs in Python
Lawrence D'Oliveiro wrote:

> In message [EMAIL PROTECTED], John Nagle wrote:
>> Here's a URL from a link on the home page of a major company.
>>
>> <a href="/adsk/servlet/index?siteID=123112&amp;id=1860142">About Us</a>
>>
>> What's the appropriate Python function to call to unescape a URL
>> which might contain things like that?
>
> Just use any HTML-parsing library. I think the standard Python
> HTMLParser will do the trick, provided there aren't any errors in
> the HTML.

I'm using BeautifulSoup, because I need to process real world HTML.
At least by default, it doesn't unescape URLs like that. Nor, on the
output side, does it escape standalone & characters, as in text like
"Sales & Advertising Department".

But there are various BeautifulSoup options; more on this later.

John Nagle
Re: One module per class, bad idea?
Kent Johnson wrote: Carl Banks wrote: Kent Johnson wrote: Carl Banks wrote: Now, I think this is the best way to use modules, but you don't need to use modules to do get higher-level organization; you could use packages instead. It's a pain if you're working on two different classes in the same system you have to keep switching files; but I guess some people prefer to switch files rather than to scroll for some reason. That would be me. I strongly prefer to switch files rather than scroll. I use an editor that makes it easy to switch files. For me it is much easier to switch between files than to scroll between two parts of a file, and I don't lose my place when I switch back. I like to be able to see things side by side. Man, I don't know you do it. Say I'm sitting there concentrating on programming something, and I see that I'll have to make a change in another file. All of a sudden, I have to recall some filename out of thin air. Totally breaks my train of thought, sometimes I space out trying to think of it because I have to cold-start an entirely different part of my brain. It's less of a mental distraction to just scroll. But then to go back to where you were, you have to scroll back and find your place. See, I find that to be a lot less of a mental disruption than recalling a filename on the spot. For me, just a click or keystroke to restore the last file with the cursor or selection exactly where I left it. And if I am going back and forth between the two, each switch is equally easy after the first (opening the file). Ok, but doesn't your editor have bookmarks? (I don't use them, because remembering a bookmark name is the same mental disruption for me as remembering a filename. Sometimes I use an interactive search will get me to where I want to go if it's more than a screen or two.) (BTW, any decent editor will let you view different positions of the same file side-by-side.) Right, at a cost of showing you half as much of the one you care about. 
I presume if you're looking at two different files side-by-side it's at the same cost? Anyway, I'm not trying to convince anyone to change, just pointing out that there are different styles of editing that make sense to those who use them, if not to outside observers ;-) That's fine; I'm not knocking anyone's style. But maybe you should just leave it at, I just prefer small files, and cease with the editor-based arguments. To be sure, there are probably editors out there that can load several files into the same buffer, which would mean I could avoid recalling filenames even when editing multiple files. (In fact, I think I'll look for such a solution the next time I find myself editing Java.) It's really not about the editor; I just think that the module is the best place for higher-level organization. But the package will do. Carl Banks -- http://mail.python.org/mailman/listinfo/python-list
Re: Unescaping URLs in Python
John Nagle wrote:

> What's the appropriate Python function to call to unescape a URL
> which might contain things like that?
>
> xml.sax.saxutils.unescape()
>
> Will this interfere with the usual % type escapes in URLs?

Nope, and urllib.unquote() can be used to translate URL escapes
manually.

Jeffrey
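
A short sketch of the two steps named above, written for current Python 3, where urllib.unquote() lives at urllib.parse.unquote(). The first href assumes the entity in the quoted link was "&amp;"; the second URL is made up to show both kinds of escape together.

```python
from urllib.parse import unquote
from xml.sax.saxutils import unescape

# href as it would appear in the page source (entity assumed to be &amp;).
raw_href = "/adsk/servlet/index?siteID=123112&amp;id=1860142"
href = unescape(raw_href)

# unescape() only touches &amp;, &lt; and &gt; by default, so
# percent-escapes pass through unchanged...
mixed = unescape("/search?q=a%20b&amp;lang=en")

# ...until they are decoded explicitly.
decoded = unquote(mixed)

print(href)
print(mixed)
print(decoded)
```

Keeping the two decoding steps separate matters when a URL is passed on to another request, since the percent-escapes usually need to stay in place.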
Re: Why does Python never add itself to the Windows path?
[EMAIL PROTECTED] wrote: I don't seem to have any problem running python programs regardless of where they are. My platform is windows xp and I have run both 2.4 and 2.5 more details about what version of windows you are running might be helpfull I don't think the Windows version is relevant. I did point out that this happens across different incarnations of Windows (98SE and XP the 2 I have on hand to test), and that the problem wasn't specifically about running python programs. Basically if you go to a command prompt and type python, it won't do anything on a plain Python install on Windows. Try it on Linux, and probably Mac too, and it'll do something useful. Similarly, if you install a Python package that adds to the scripts directory, you can typically expect to run those scripts from the command line without having to use the full path - not on Windows. -- Ben Sizer -- http://mail.python.org/mailman/listinfo/python-list
Re: Elliptic Curve Library
Mike Tammerman [EMAIL PROTECTED] (MT) wrote: MT Hi, MT I need an elliptic curve library that can be used by python. I googled MT but couldn't find a one. I'll appreciate, if you could show me. OpenSSL contains elliptic curve stuff (donated by SUN). M2Crypto is a Python interface to SSL. -- Piet van Oostrum [EMAIL PROTECTED] URL: http://www.cs.uu.nl/~piet [PGP 8DAE142BE17999C4] Private email: [EMAIL PROTECTED] -- http://mail.python.org/mailman/listinfo/python-list
Re: ANNOUNCE: Mod_python 3.3.0b (Beta)
How long does it usually take for these things to make their way into
the Fedora (or other distro) repositories?

Gregory (Grisha) Trubetskoy wrote:

> The Apache Software Foundation and The Apache HTTP Server Project are
> pleased to announce the 3.3.0b (Beta) release of mod_python.
>
> Version 3.3.0b of mod_python features several new functions and
> attributes providing better access to apache internals, as well as
> many bug fixes and various performance and security improvements. A
> detailed description of the changes is available in Appendix A of the
> mod_python manual, also available here
> http://www.modpython.org/live/mod_python-3.3.0b/doc-html/app-changes-from-3.2.10.html
>
> Beta releases are NOT considered stable and usually contain bugs.
> This release is intended to solicit widespread testing of the code.
> We strongly recommend that you try out your existing applications and
> experiment with new features in a non-production environment using
> this version and report any problems you may encounter so that they
> can be addressed before the final release.
>
> Preferred method of reporting problems is the mod_python user list
> [EMAIL PROTECTED]
>
> Mod_python 3.3.0b is available for download from:
> http://httpd.apache.org/modules/python-download.cgi
>
> For more information about mod_python visit http://www.modpython.org/
>
> Regards,
> The Apache mod_python team.
Re: Why does Python never add itself to the Windows path?
Ross Ridge wrote: Ben Sizer wrote: I've installed several different versions of Python across several different versions of MS Windows, and not a single time was the Python directory or the Scripts subdirectory added to the PATH environment variable. Personally, I hate Windows applications that add themselves to the PATH. So much crap gets put in there that I don't even use the default system PATH and just set my own explicitly. Personally I hate programs that ask to be installed to the root folder of my hard drive, but Python suggests that as a default too. ;) In an ideal world, Python should operate pretty much the same across all platforms. Unfortunately, as it stands, you need to have different instructions for running things on Windows. eg. The standard python setup.py install invocation isn't going to do a damn thing unless you've fixed up the path beforehand. The same goes for python ez_setup.py, another common favourite. The scripts directory is important too: TurboGears installs a tg-admin script which you're supposed to run from your project's directory: which on Windows means you need to type something like c:\python24\scripts\tg-admin each time. Half of the people who develop on Mac and Linux don't realise or acknowledge this. and so the instructions for using their packages don't work for the average person new to Python who probably just ran the Windows installer program and thought that would suffice. Linux distributions normally install themselves somewhere that's normally in the path already. I suppose you can do the same thing on Windows if you want, just choose to install Python into directory that's already in your path. Though installing to something like C:\WINDOWS\SYSTEM32 is probably not a good idea. The Windows way is typically to install things in Program Files and then point things there as necessary. Installing it the Linux way would just cause a different set of problems. 
Adding it to the PATH variable is not going to cause problems for the vast majority of people, and it's far easier to edit up the PATH to remove an entry you don't want, than to move an installed program from one place to another. -- Ben Sizer -- http://mail.python.org/mailman/listinfo/python-list
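
As an aside, whether the interpreter's directory is on the search path can be checked from Python itself. The sketch below is only an illustration under my own assumptions (current Python 3; the helper name directory_on_path is made up), not something proposed in the thread.

```python
import os
import sys

def directory_on_path(directory, path_value=None):
    """Return True if directory is one of the entries in a PATH-style string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    wanted = os.path.normcase(os.path.normpath(directory))
    for entry in path_value.split(os.pathsep):
        # Skip empty entries; compare in normalised form so that a
        # trailing separator (or letter case on Windows) does not matter.
        if entry and os.path.normcase(os.path.normpath(entry)) == wanted:
            return True
    return False

# Directory holding the running interpreter, e.g. c:\python24 on Windows.
python_dir = os.path.dirname(sys.executable)
print(python_dir, directory_on_path(python_dir))
```

On a default Windows install of the kind described above, this would be expected to print False for the interpreter directory.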
Re: Website Capture
Not sure what OS you are on, but a2ps is one way you could probably do this: http://www.infres.enst.fr/~demaille/a2ps/delegations.html That will get you as far as PostScript and I imagine it is pretty straightforward from there to get things into a gif. Jonathan Curran wrote: On Monday 25 December 2006 00:57, [EMAIL PROTECTED] wrote: Hi, I want to capture a web site into gif image using Python. I installed the PIL if it can help. It can't be done with just Python PIL, that much is for sure. First of all you need some sort of module/program/(whatever you want to call it) that can render a given webpage. Then you would proceed to capturing the graphical output of that module/program/... Maybe what you should look into are graphical toolkits that possibly provide a HTML widget from which you can get a graphical display. WxWidgets/WxPython is one option, and the only other I can think of is GTK/pyGTK. I hope this helps a little, and Merry Christmas! - Jonathan Curran -- http://mail.python.org/mailman/listinfo/python-list
Re: regular expression
On Mon, 25 Dec 2006 06:17:00 -0800, [EMAIL PROTECTED] wrote:

> Hi folks, fairly new to the list (Python is my first programming
> language, so I'm fairly new to the world of programming too) but this
> is a question I've been wondering about since I started learning
> about the re module. Are regular expressions what makes mark up
> languages interpretable by webbrowsers?

Web browsers have to render HTML, which implies they must be able to
parse and interpret at least one markup language. _How_ they parse the
markup is up to the browser developers.

Since regular expressions are very good at certain types of text
parsing, and are widely available, it is probable that regular
expressions are used in some (many? all?) markup parsers, simply
because they are widely available and are a good tool for some (but
not all) parsing tasks. For example, Python's xmllib module uses reg
exps to parse xml.

Essentially, a regular expression engine is a super-charged find
command on steroids.

But that doesn't mean that reg exps are the only tool for the job. A
markup parser doesn't necessarily need to use regexes.

--
Steven.
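
To illustrate the "super-charged find command" point, here is a small sketch (current Python 3, made-up sample markup) of regular expressions picking tokens out of a fragment of HTML. It only finds tags; it does not check nesting or handle malformed input, which is where a real parser needs more than regexes.

```python
import re

markup = '<p>Hello <b>world</b>, see <a href="/docs">the docs</a>.</p>'

# Names of opening tags: '<' followed directly by a letter.
opening_tags = re.findall(r"<([a-zA-Z][a-zA-Z0-9]*)", markup)

# Crude text extraction: drop anything that looks like a tag.
text_only = re.sub(r"<[^>]+>", "", markup)

print(opening_tags)
print(text_only)
```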
Re: Generating all permutations from a regexp
On 12/22/06, Fredrik Lundh [EMAIL PROTECTED] wrote:

> BJörn Lindqvist wrote:
>> With regexps you can search for strings matching it. For example,
>> given the regexp: foobar\d\d\d. foobar123 would match. I want to
>> do the reverse, from a regexp generate all strings that could match
>> it.
>>
>> The regexp: [A-Z]{3}\d{3} should generate the strings AAA000,
>> AAA001, AAA002 ... AAB000, AAB001 ... ZZZ999.
>>
>> Is this possible to do? Obviously, for some regexps the set of
>> matches is unbounded (a list of everything that matches * would be
>> very impractical), but how would you do it for simple regexps like
>> the one above?
>
> here's a start:
>
> http://mail.python.org/pipermail/python-list/2001-August/102739.html

Thank you! Who would have known that there is a sre_parse module...
makes things quite a bit easier.

--
mvh Björn
Re: Generating all permutations from a regexp
With some help from the sre_parse module (and the Python Cookbook), here is the completed module:

# -*- coding: utf-8 -*-
import itertools
from sre_constants import *
import sre_parse
import string

category_chars = {
    CATEGORY_DIGIT : string.digits,
    CATEGORY_SPACE : string.whitespace,
    CATEGORY_WORD  : string.digits + string.letters + '_'
    }

def unique_extend(res_list, list):
    for item in list:
        if item not in res_list:
            res_list.append(item)

def handle_any(val):
    """
    This is different from normal regexp matching. It only matches
    printable ASCII characters.
    """
    return string.printable

def handle_branch((tok, val)):
    all_opts = []
    for toks in val:
        opts = permute_toks(toks)
        unique_extend(all_opts, opts)
    return all_opts

def handle_category(val):
    return list(category_chars[val])

def handle_in(val):
    out = []
    for tok, val in val:
        out += handle_tok(tok, val)
    return out

def handle_literal(val):
    return [chr(val)]

def handle_max_repeat((min, max, val)):
    """
    Handle a repeat token such as {x,y} or ?.
    """
    subtok, subval = val[0]

    if max > 5000:
        # max is the number of cartesian join operations needed to be
        # carried out. More than 5000 consumes way too much memory.
        raise ValueError("Too many repetitions requested (%d)" % max)

    optlist = handle_tok(subtok, subval)

    iterlist = []
    for x in range(min, max + 1):
        joined = join([optlist] * x)
        iterlist.append(joined)

    return (''.join(it) for it in itertools.chain(*iterlist))

def handle_range(val):
    lo, hi = val
    return (chr(x) for x in range(lo, hi + 1))

def handle_subpattern(val):
    return list(permute_toks(val[1]))

def handle_tok(tok, val):
    """
    Returns a list of strings of possible permutations for this regexp
    token.
    """
    handlers = {
        ANY        : handle_any,
        BRANCH     : handle_branch,
        CATEGORY   : handle_category,
        LITERAL    : handle_literal,
        IN         : handle_in,
        MAX_REPEAT : handle_max_repeat,
        RANGE      : handle_range,
        SUBPATTERN : handle_subpattern}
    try:
        return handlers[tok](val)
    except KeyError, e:
        fmt = "Unsupported regular expression construct: %s"
        raise ValueError(fmt % tok)

def permute_toks(toks):
    """
    Returns a generator of strings of possible permutations for this
    regexp token list.
    """
    # join() iterates every sequence but the first more than once, so
    # generator results (e.g. from handle_max_repeat) are turned into
    # lists here; otherwise they would be exhausted after one pass.
    lists = [list(handle_tok(tok, val)) for tok, val in toks]
    return (''.join(it) for it in join(lists))

def join(iterlist):
    """
    Cartesian join as an iterator of the supplied sequences. Borrowed
    from:
    http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/302478
    """
    def rloop(seqin, comb):
        if seqin:
            for item in seqin[0]:
                newcomb = comb + [item]
                for item in rloop(seqin[1:], newcomb):
                    yield item
        else:
            yield comb
    return rloop(iterlist, [])

## PUBLIC API

def ipermute(p):
    toks = [tok_n_val for tok_n_val in sre_parse.parse(p)]
    return permute_toks(toks)

def permute(p):
    return list(ipermute(p))

Used like this:

from permute import ipermute

for s in ipermute('[A-Z]\d'):
    print s

Almost all regular expression constructs are supported except for '*' (which in the Python sre_parse implementation matches 0 to 65535 times), '+' and '^'. Non-ASCII characters don't work, but I think that is a limitation in the sre_parse module. It works by first parsing the regexp to string sequences so that [A-Z] becomes ['A', 'B', ... 'Z'], \d becomes ['0', ... , '9']. Then a Cartesian join is applied on all string sequences to get all possible permutations of them all.

Suggestions for improvements very much appreciated.

-- 
mvh Björn
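The Cartesian join step described in the post can also be sketched with itertools.product, which entered the standard library after this thread was written. This is only a minimal sketch in current Python, assuming the per-token character sequences are built by hand; `expand` is a hypothetical helper name and not part of the posted module.

```python
import itertools
import string

def expand(sequences):
    """Lazily yield every string formed by picking one item per sequence."""
    # product() copies each input into a tuple first, so generators
    # are safe to pass in as well.
    for combo in itertools.product(*sequences):
        yield ''.join(combo)

# Hand-built token sequences for the regexp [A-Z]\d
# (the posted module derives these from sre_parse instead).
sequences = [string.ascii_uppercase, string.digits]

matches = list(expand(sequences))
print(len(matches))   # 26 letters times 10 digits
print(matches[:3])
```

Because product() is lazy, iterating over expand() directly avoids holding all results in memory, which matters for larger patterns.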
Re: Generating all permutations from a regexp
Paul McGuire [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
> On Dec 22, 8:30 am, BJörn Lindqvist [EMAIL PROTECTED] wrote:
> > With regexps you can search for strings matching it. For example, given the regexp: foobar\d\d\d. foobar123 would match. I want to do the reverse, from a regexp generate all strings that could match it.
> >
> > The regexp: [A-Z]{3}\d{3} should generate the strings AAA000, AAA001, AAA002 ... AAB000, AAB001 ... ZZZ999.
> >
> > Is this possible to do?
>
> Here is a first cut at your problem (http://pyparsing-public.wikispaces.com/space/showimage/invRegex.py). I used pyparsing to identify repeatable ranges within a regex, then attached generator-generating classes to parse actions for each type of regex element. Some limitations:
> - unbounded '*' and '+' repetition is not allowed
> - only supports \d, \w, and \s macros

Download the latest version of this file. It is now importable as its own module, with the invert method that takes a regexp string and returns a generator that yields all the possible matching strings. This file also includes a simple count method, which returns the number of elements returned by a generator (as opposed to calling len(list(invert(...))), which generates an intermediate list just to invoke len on it).

The reg exp features that have been added are:
- alternation using '|'
- min-max repetition using {min,max} format
- '.' wildcard character

Also fixed some repetition bugs, where foobar{2} was treated like (foobar){2} - now both cases are handled correctly.

-- Paul
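The idea of counting a generator's items without building an intermediate list can be illustrated as follows. This is a minimal sketch of the technique in current Python, not the code from invRegex.py; the function body is an assumption about how such a count could be written.

```python
def count(gen):
    """Count the items an iterable yields without storing them."""
    return sum(1 for _ in gen)

# A generator stands in for invert(...) here.
squares = (n * n for n in range(1000))
print(count(squares))
```

Note that counting consumes the generator, so a fresh one is needed to see the items afterwards.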
Re: Generating all permutations from a regexp
On 23 Dec 2006 04:23:09 -0800, Chris Johnson [EMAIL PROTECTED] wrote:
> BJörn Lindqvist wrote:
> > With regexps you can search for strings matching it. For example, given the regexp: foobar\d\d\d. foobar123 would match. I want to do the reverse, from a regexp generate all strings that could match it.
> >
> > The regexp: [A-Z]{3}\d{3} should generate the strings AAA000, AAA001, AAA002 ... AAB000, AAB001 ... ZZZ999.
> >
> > Is this possible to do? Obviously, for some regexps the set of matches is unbounded (a list of everything that matches * would be very impractical), but how would you do it for simple regexps like the one above?
>
> For a very small number of characters, it would be feasible. For any finite number of characters, it would be possible (though it wouldn't take much to take longer than the age of the universe). For reference, in your simple example, you have 17,576,000 matching strings.
>
> I'm curious as to why you would wish to do this. I certainly understand considering hard problems for their own sake, but when I formulate them, there's always some impetus that makes me say "Huh. Now I wonder..."

I have a thousand use cases in mind. For example:

1. Generate sentences for a bot:

ipermute("(I|You|He|She|It|They) do( not)? (dis)?like (you|him|her|me|they)")

Generates many grammatically incorrect sentences but you get the point.

2. Generate URLs to web resources:

The following should generate URLs to all c.l.p digests from the mail archive:

def download_clp():
    year_re = "(199\d|200[0-6])"
    months = ["January", "February", "March", "April", "May", "June",
              "July", "August", "September", "October", "November",
              "December"]
    month_re = '(' + '|'.join(months) + ')'
    fmt = "http://mail\.python\.org/pipermail/python-list/%s-%s\.txt"
    url_re = fmt % (year_re, month_re)
    for x in ipermute(url_re):
        print "Downloading", x
        # code to download here

The same approach could be used to download all threads in a forum for example, or all articles on Slashdot.

3. Visualising regular expressions. I think it would be easier to write regular expressions if you could see what kind of data they would match.

4. Port scanning:

ip_tuple = "(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])"
for x in ipermute("192\.168\." + "\.".join([ip_tuple] * 2)):
    scan_ip(x)

-- 
mvh Björn
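The figure of 17,576,000 matching strings quoted in this message, and the size of the port-scanning example, both follow from multiplying the number of choices at each position. A small sketch of that arithmetic in current Python (math.prod needs Python 3.8 or later); the variable names are illustrative only.

```python
import string
from math import prod

# One entry per character position of [A-Z]{3}\d{3}
choices = [len(string.ascii_uppercase)] * 3 + [len(string.digits)] * 3
total = prod(choices)
print(total)   # 26**3 * 10**3

# Alternatives of (\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]):
# 0-9, 10-99, 100-199, 200-249, 250-255
octet = 10 + 9 * 10 + 10 * 10 + 5 * 10 + 6
addresses = octet ** 2   # two generated octets after the fixed 192.168. prefix
print(octet, addresses)
```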
Re: some OT: how to solve this kind of problem in our program?
[EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
> Using Psyco this version is much faster, you can test it on your PC compared to the other one (the whole running time, Psyco compilation too):
> Psyco is unable to speed up generator functions, so you have to return true lists.
> Giving the func to the permutation function, you can avoid a lot of list copying and unpacking.
>
> try:
>     import psyco
>     psyco.full()
> except ImportError:
>     pass
>
> d0, d1 = 1, 2
>
> def func(p):
>     a0,a1,a2,b0,b1,b2,c0,c1,c2 = p
>     # do application evaluation here
>     b1b2 = 10*b1+b2
>     a1a2 = 10*a1+a2
>     c1c2 = 10*c1+c2
>     if d1*a0*b1b2*c1c2 + d1*b0*a1a2*c1c2 + d1*c0*a1a2*b1b2 \
>        == d0*a1a2*b1b2*c1c2:
>         return sorted( [[a0, a1, a2], [b0, b1, b2], [c0, c1, c2]] )
>     else:
>         return None
>
> def accepted_permutations(alist, func):
>     # func must return None for the unacceptable results
>     # Algorithm from Phillip Paul Fuchs, modified
>     result = []
>     items = alist[:]
>     n = len(alist)
>     p = range(n+1)
>     i = 1
>     r = func(alist)
>     if r is not None: result.append(r)
>     while i < n:
>         p[i] -= 1
>         if i & 1:
>             j = p[i]
>         else:
>             j = 0
>         alist[j], alist[i] = alist[i], alist[j]
>         r = func(alist)
>         if r is not None: result.append(r)
>         i = 1
>         while p[i] == 0:
>             p[i] = i
>             i += 1
>     return result
>
> def main():
>     result = []
>     for aresult in accepted_permutations(range(1, 10), func):
>         if aresult not in result:
>             result.append(aresult)
>             [[a0, a1, a2], [b0, b1, b2], [c0, c1, c2]] = aresult
>             print ' %0d %0d %0d %0d' % (a0, b0, c0, d0)
>             print '--- + --- + --- = ---'
>             print ' %0d%0d%0d%0d%0d%0d %0d'%(a1,a2,b1,b2,c1,c2,d1)
>             print
>
> main()
>
> Bye,
> bearophile

Nice and neat. I guess what appeals to me is that this is essentially a brute force approach. Instead of a goal-seeking constraint solver, this just brute force tries every possible permutation. Of course, this breaks down quickly when the size of the input list grows large, but the OP was trying to work with permutations of the digits 1-9 using an unwieldy nesting of for loops and set manipulations. Using permutations is no more or less smart of an algorithm than in the original post, it just cleans up the for loops and the set arithmetic.

For example, for all the complexity in writing Sudoku solvers, there are fewer than 3.3 million possible permutations of 9 rows of the digits 1-9, and far fewer permutations that match the additional column and box constraints. Why not just compute the set of valid solutions, and compare an input mask with these?

-- Paul
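The brute-force search discussed here can be written more compactly with itertools.permutations, which was added to the standard library after this thread. The following is a minimal sketch in current Python, not bearophile's Psyco-oriented version; `solve` and its parameters are hypothetical names, and the equation is the cross-multiplied form used in the quoted func().

```python
from itertools import permutations

def solve(d0=1, d1=2):
    """Find digit triples with a0/a1a2 + b0/b1b2 + c0/c1c2 == d0/d1,
    using each of the digits 1-9 exactly once."""
    found = set()
    for a0, a1, a2, b0, b1, b2, c0, c1, c2 in permutations(range(1, 10)):
        a, b, c = 10 * a1 + a2, 10 * b1 + b2, 10 * c1 + c2
        # Cross-multiplied so that only integer arithmetic is needed.
        if d1 * (a0 * b * c + b0 * a * c + c0 * a * b) == d0 * a * b * c:
            # Sort the three fractions so reorderings are stored once.
            found.add(tuple(sorted([(a0, a1, a2), (b0, b1, b2), (c0, c1, c2)])))
    return sorted(found)

for solution in solve():
    print(solution)
```

Each tuple reads (numerator, tens digit, units digit). The search still visits all 9! = 362,880 orderings, so it is the same brute-force idea with less bookkeeping.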
BeautifulSoup vs. loose & chars
I've been parsing existing HTML with BeautifulSoup, and occasionally hit content which has something like "Design & Advertising", that is, an "&" instead of an "&amp;". Is there some way I can get BeautifulSoup to clean those up? There are various parsing options related to "&" handling, but none of them seem to do quite the right thing.

If I write the BeautifulSoup parse tree back out with "prettify", the loose "&" is still in there. So the output is rejected by XML parsers. Which is why this is a problem. I need valid XML out, even if what went in wasn't quite valid.

John Nagle
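One possible workaround, which is not a BeautifulSoup feature, is to escape loose ampersands in the markup string before or after it passes through the parser. A minimal sketch using only the re module; `BARE_AMP` and `escape_bare_ampersands` are hypothetical names, and the pattern only checks that an ampersand is followed by something shaped like a character reference, not that the entity name is a real one.

```python
import re

# An ampersand that does not start a named (&amp;), decimal (&#38;)
# or hexadecimal (&#x26;) character reference.
BARE_AMP = re.compile(
    r'&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)')

def escape_bare_ampersands(markup):
    """Replace each loose '&' with '&amp;', leaving existing references alone."""
    return BARE_AMP.sub('&amp;', markup)

print(escape_bare_ampersands('<b>Design & Advertising</b> &amp; more'))
```

Text such as "AT&T;" would be left untouched because it looks like an entity, so this is a heuristic rather than a full fix.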
Re: Why does Python never add itself to the Windows path?
Ben Sizer wrote:
> I've installed several different versions of Python across several different versions of MS Windows, and not a single time was the Python directory or the Scripts subdirectory added to the PATH environment variable. Every time, I've had to go through and add this by hand, to have something resembling a usable Python installation. No such problems on Linux, whether it be Mandrake/Mandriva, Fedora Core, or Kubuntu. So why is the Windows install half-crippled by default? I just rediscovered this today when trying to run one of the Turbogears scripts, but this has puzzled me for years now.

Well, after Python is installed on a Windows platform, files with the extension .py or .pyw are automatically associated with python or pythonw. If a Python script is double-clicked, or something like "sth.py" is entered in the cmd box, the Python interpreter is called automatically. I don't see any problem or inconvenience with this.
Re: Why does Python never add itself to the Windows path?
WaterWalk wrote:
> Ben Sizer wrote:
> > I've installed several different versions of Python across several different versions of MS Windows, and not a single time was the Python directory or the Scripts subdirectory added to the PATH environment variable. Every time, I've had to go through and add this by hand, to have something resembling a usable Python installation. No such problems on Linux, whether it be Mandrake/Mandriva, Fedora Core, or Kubuntu. So why is the Windows install half-crippled by default? I just rediscovered this today when trying to run one of the Turbogears scripts, but this has puzzled me for years now.
>
> Well, after Python is installed on a Windows platform, files with the extension .py or .pyw are automatically associated with python or pythonw. If a Python script is double-clicked, or something like "sth.py" is entered in the cmd box, the Python interpreter is called automatically. I don't see any problem or inconvenience with this.

On the command line, entering "python" does not run the Python interpreter (unless you modify the settings yourself).
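Whether a given installation is reachable through PATH can be checked from inside Python itself. This is only a rough sketch in current Python, started for example by double-clicking a script; `interpreter_dir_on_path` is a hypothetical helper name, and the Scripts subdirectory mentioned in the thread would need the same check separately.

```python
import os
import sys

def interpreter_dir_on_path():
    """Return (directory of the running interpreter, True if PATH lists it)."""
    exe_dir = os.path.dirname(os.path.abspath(sys.executable))
    entries = os.environ.get('PATH', '').split(os.pathsep)
    # normcase() makes the comparison case-insensitive on Windows.
    normalized = [os.path.normcase(os.path.abspath(e)) for e in entries if e]
    return exe_dir, os.path.normcase(exe_dir) in normalized

directory, listed = interpreter_dir_on_path()
print(directory, listed)
```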
Re: textwrap.dedent replaces tabs?
Frederic Rentsch wrote:
> If this works, good for you. I can't say I understand your objective. (You dedent common leading tabs, except if preceded by common leading spaces (?)).

I dedent common leading whitespace, and tabs aren't equivalent to spaces. E.g. if some text is indented exclusively with tabs, then the leading tabs are stripped appropriately. If some other text is indented with common leading spaces, those are stripped appropriately. If the text to be stripped has some lines starting with spaces and others starting with tabs, there are no /common/ leading whitespace characters, and thus nothing is stripped.

> Neither do I understand the existence of indentations made up of tabs mixed with spaces, but that is another topic.

At one point it was a fairly common cry in the How To Indent Python discussions. Maybe that cry has faded.

> I have been wasting a lot of time with things of this nature coding away before forming a clear conception in my mind of what my code was supposed to accomplish. Sounds stupid.

Doesn't sound stupid, but there are in fact some fairly straightforward methods that can be put in place to alleviate that problem.

> The encounter with the devil in the details can be put off but not avoided. Best to get it over with from the start and write an exhaustive formal description of the problem. Follows an exhaustive formal description of the rules for its solution.

Good lord, that's an amazingly 1970s way to look at programming! Modern software engineering practices have in some ways made these problems go away.

> In other words, coding should be the translation of a logical system into a language a machine understands. It should not be the construction of the logical system. This, anyway, is the conclusion I have arrived at, to my advantage I believe.

To each their own, eh? I've been doing this a long time and have found that it is by far superior (for me) to refine the logical system as it is being implemented, as long as the business rules are encoded in such a way as to disallow the programmer from straying beyond them.

My unit tests are far from exhaustive, but with code this simple it didn't seem terribly important since I was doing it more as a proof of concept, proving that I could do this sort of thing in not-many-more-lines-than-the-original-code-that-does-not-operate-to-its-published-specification.

-tom!
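The three cases described in this reply (tabs only, spaces only, and a mix with no common prefix) can be tried against textwrap.dedent as it behaves in current Python releases, where tabs and spaces are both whitespace but are not treated as equal. This is a small illustration of that documented behaviour, not the replacement code discussed in the thread.

```python
import textwrap

tabs_only = "\tdef f():\n\t\treturn 1\n"
spaces_only = "    def f():\n        return 1\n"
mixed = "\tdef f():\n        return 1\n"

# One common leading tab is stripped from every line.
print(repr(textwrap.dedent(tabs_only)))
# Four common leading spaces are stripped from every line.
print(repr(textwrap.dedent(spaces_only)))
# A tab and a space share no common prefix, so nothing is stripped.
print(repr(textwrap.dedent(mixed)))
```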
Re: ANNOUNCE: Mod_python 3.3.0b (Beta)
derekl00 wrote:
> Gregory (Grisha) Trubetskoy wrote:
> > The Apache Software Foundation and The Apache HTTP Server Project are pleased to announce the 3.3.0b (Beta) release of mod_python.
>
> How long does it usually take for these things to make their way into the Fedora (or other distro) repositories?

Given that this is a beta, I would hope that the distro people don't get on to it quickly or at all. It was a right pain the last time a beta of mod_python was released and RPMs got distributed, as it took us a lot longer to get the beta version out of the pipeline when the non-beta was actually released.

Thus, if you really want to try out this beta, compile it from source code, otherwise wait for the official non-beta release. If you don't like compiling from source, then you possibly aren't the sort of person who should be using this beta version in the first place, especially when compilation on different platforms is always a part of what we want tested in releasing a beta.

Graham
Re: How to stop program when threads is sleeping
Carsten Haese wrote:
> On Sun, 2006-12-24 at 22:55 -0800, many_years_after wrote:
> > Hi, pythoners:
> >
> > There is a problem I couldn't solve. I start a thread in my program. The thread will do something before executing time.sleep(). When the user gives a signal to the main thread (such as clicking the 'end' button or closing the window), the thread should stop running. But how do I end the thread while it's sleeping? I set a flag for the thread, but it doesn't work.
>
> Is the thread supposed to do some additional work after being woken up? If not, there is no point in going to sleep in the first place and the thread should just terminate when it has completed its task. If yes, I'd use a threading.Event object to .wait() on in the sub-thread rather than putting it to sleep, and then .set() the event object in the main thread when it's time to wake up the sub-thread.
>
> Hope this helps,
>
> Carsten.

Well, I did not express myself clearly. What I mean is that the thread waits for some time after doing some tasks. I want to know whether there is any way to end the thread, or to make it break out of the wait. I use time.sleep() to make the thread wait.

Thanks.
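Carsten's suggestion of waiting on a threading.Event instead of calling time.sleep() can be sketched as follows. This is a minimal sketch in current Python; the names `stop_requested` and `worker` and the 60-second interval are illustrative assumptions, and the short sleep in the main thread merely stands in for the user clicking 'end'.

```python
import threading
import time

stop_requested = threading.Event()

def worker(interval):
    while not stop_requested.is_set():
        # ... do one round of work here ...
        # Wait up to `interval` seconds, but return at once if the
        # event is set in the meantime.
        stop_requested.wait(interval)

thread = threading.Thread(target=worker, args=(60.0,), daemon=True)
thread.start()

time.sleep(0.1)          # stand-in for the user clicking 'end'
stop_requested.set()     # wakes the waiting worker immediately
thread.join(timeout=5.0)
print(thread.is_alive())
```

The worker leaves its wait as soon as set() is called, so the program does not have to sit out the remainder of the interval.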
Can Python help?
On my website I allow users to upload files. I would like a user to see how much time is left before a file is uploaded. So, I would like to have a progress bar during a file upload. Can Python help me with that? Or how can a progress bar be made?

Thank you for ideas.

La.
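On the Python side, progress can only be measured if the upload is read in pieces rather than all at once. The sketch below shows that part alone, assuming the server code can read the upload as a stream and knows its total size (for example from the request's Content-Length header); `copy_with_progress` and `report` are hypothetical names, and how the figures reach the browser is a separate problem not covered here.

```python
import io

def copy_with_progress(src, dst, total_size, report, chunk_size=64 * 1024):
    """Copy src to dst in chunks, calling report(bytes_done, total_size)
    after each chunk."""
    done = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        done += len(chunk)
        report(done, total_size)
    return done

# In-memory streams stand in for the upload and the destination file.
payload = b'x' * 200000
updates = []
copy_with_progress(io.BytesIO(payload), io.BytesIO(), len(payload),
                   lambda done, total: updates.append(100 * done // total))
print(updates)
```

An estimate of the time left could be derived from the same numbers by dividing the remaining bytes by the observed transfer rate.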
How to suppress the output of an external module ?
Hi,

I'm writing a program which imports an external module written in C and calls a function provided by the module to do my job. But the function produces a lot of output to stdout, and this consumes most of the running time.

My question is, is there a way to suppress the output produced by the function and hence make my program run faster? It's too complicated for me to modify the source code and recompile the external module.

Any hints will be greatly appreciated.

Regards,

xiaojf
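Output written by C code bypasses sys.stdout, so one common approach is to redirect the underlying file descriptor for the duration of the call. The following is a minimal sketch in current Python under that assumption; `suppress_c_stdout` is a hypothetical name, and os.write() merely stands in for the C function, which is not shown in the post.

```python
import os
import sys
from contextlib import contextmanager

@contextmanager
def suppress_c_stdout():
    """Temporarily point file descriptor 1 at the null device."""
    sys.stdout.flush()                     # keep pending Python-level output
    saved_fd = os.dup(1)                   # remember the real stdout
    devnull_fd = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull_fd, 1)             # writes to fd 1 now vanish
        yield
    finally:
        os.dup2(saved_fd, 1)               # restore the real stdout
        os.close(devnull_fd)
        os.close(saved_fd)

with suppress_c_stdout():
    os.write(1, b"this line is discarded\n")   # stand-in for the C call
print("visible again")
```

If the C library buffers its output through stdio, part of it may only be flushed after the descriptor has been restored, so whether this fully silences a given module has to be tried out.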
[ python-Bugs-1605110 ] logging %(module)s reporting wrong modules
Bugs item #1605110, was opened at 2006-11-29 01:29
Message generated for change (Comment added) made by sf-robot
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1605110&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Python Library
Group: None
Status: Closed
Resolution: Invalid
Priority: 5
Private: No
Submitted By: Pieter Zieschang (mad-marty)
Assigned to: Vinay Sajip (vsajip)
Summary: logging %(module)s reporting wrong modules

Initial Comment:
I recently upgraded from python 2.4.2 to 2.4.4 and the logging seems to be working wrong now. I have a formatter which uses %(module)s and %(filename)s, and they point to the wrong file/module.

I have some plugins in .py files, which mainly have one class derived from threading.Thread. Those classes' logging calls will now log as

2006-11-29 10:17:50 - threading.py - threading - INFO - ...

instead of

2006-11-29 10:17:50 - myplugin.py - myplugin - INFO - ...

----------------------------------------------------------------------

Comment By: SourceForge Robot (sf-robot)
Date: 2006-12-25 19:20

Message:
Logged In: YES
user_id=1312539
Originator: NO

This Tracker item was closed automatically by the system. It was previously set to a Pending status, and the original submitter did not respond within 14 days (the time period specified by the administrator of this Tracker).

----------------------------------------------------------------------

Comment By: Vinay Sajip (vsajip)
Date: 2006-12-11 07:05

Message:
Logged In: YES
user_id=308438
Originator: NO

I'm not sure this should be treated as a logging bug - after all, psyco is not part of standard Python and logging is only tested as a part of standard Python. Possibly this should be logged under psyco rather than Python logging. Meanwhile, if time permits I will take a look at this.

----------------------------------------------------------------------

Comment By: Pieter Zieschang (mad-marty)
Date: 2006-12-01 17:09

Message:
Logged In: YES
user_id=1269426
Originator: YES

Hi, after some investigation, I think I found the source. Just add 'import psyco; psyco.full()' to test.py after the imports and you get the same problem with your example. It seems logging is not compatible with the way psyco creates proxy functions. Could be that sys._getframe returns something different. - just a guess. But it works with the old logging.

Is there any other information you may want?

----------------------------------------------------------------------

Comment By: Vinay Sajip (vsajip)
Date: 2006-11-30 01:18

Message:
Logged In: YES
user_id=308438
Originator: NO

I need more information. For example (N.B. lines may wrap, please adjust if copy/pasting the code below):

#-- test.py
import module
import logging

logging.basicConfig(level=logging.DEBUG,
                    format="%(relativeCreated)-6d %(module)s %(filename)s %(lineno)d - %(message)s")

logging.getLogger("test").debug("Test starting, about to start thread...")
threads = module.start()
for t in threads:
    t.join()
logging.getLogger("test").debug("All done.")
#-- test.py ends

#-- module.py
import logging
import threading
import random
import time

class MyThread(threading.Thread):
    def run(self):
        loops = 5
        while True:
            logging.getLogger("module").debug("Running in thread: %s",
                threading.currentThread().getName())
            time.sleep(random.random())
            loops -= 1
            if loops < 0:
                break

class MyOtherThread(threading.Thread):
    def run(self):
        loops = 5
        while True:
            logging.getLogger("module").debug("Running in thread: %s",
                threading.currentThread().getName())
            time.sleep(random.random())
            loops -= 1
            if loops < 0:
                break

def start():
    t1 = MyThread(name="Thread One")
    t2 = MyOtherThread(name="Thread Two")
    t1.start()
    t2.start()
    return t1, t2
#-- module.py ends

When I run test, I get the following output:

15     test test.py 7 - Test starting, about to start thread...
15     module module.py 11 - Running in thread: Thread One
15     module module.py 22 - Running in thread: Thread Two
327    module module.py 11 - Running in thread: Thread One
343    module module.py 22 - Running in thread: Thread Two
655    module module.py 11 - Running in thread: Thread One
780    module module.py 22 - Running in thread: Thread Two
1000   module module.py 11 - Running in thread: Thread One
1546   module module.py 22 - Running in thread: Thread Two
1890   module module.py 11 - Running in thread: Thread One
2046   module module.py 11 - Running in thread: Thread One
2218   module module.py 22 - Running in thread: Thread Two
2562   module module.py 22 - Running in thread: Thread Two
3187   test