[Python-announce] logmerger 0.8.0 released
logmerger 0.8.0
===============

New features:
- Added --inline command line option to view merged logs in a single inline column instead of side-by-side columns (side-by-side is the default)
- Added jump feature to move by a number of lines or by a time period in microseconds, milliseconds, seconds, minutes, hours, or days

Fixes:
- Fixed type annotations that broke running logmerger on Python 3.9.

Screenshot: https://github.com/ptmcg/logmerger/blob/main/static/log1_log2_merged_tui_lr.jpg?raw=true

Use logmerger to view multiple log files, merged side-by-side with a common timeline using timestamps from the input files:
- merges ASCII log files
- detects various formats of timestamps
- detects multiline log messages
- merges .gz files without previously gunzip'ing
- merges .pcap files
- merges .csv files

Browse the merged logs using a textual-based TUI:
- vertical scrolling
- horizontal scrolling
- search/find next/find previous
- jump by number of lines or by time interval
- go to line
- go to timestamp

The TUI runs in a plain terminal window, so it can be run over a regular SSH session.

Installation
------------
Install from PyPI: pip install logmerger
For PCAP merging support: pip install logmerger[pcap]

GitHub repo: https://github.com/ptmcg/logmerger

___
Python-announce-list mailing list -- python-announce-list@python.org
To unsubscribe send an email to python-announce-list-le...@python.org
https://mail.python.org/mailman3/lists/python-announce-list.python.org/
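The core idea behind merging logs on a common timeline can be sketched with the stdlib's heapq.merge. This is a simplified illustration of the concept, not logmerger's actual implementation, and the ISO timestamp format assumed here is just for the example:

```python
from datetime import datetime
from heapq import merge

def parse_line(source, line):
    # assume each line starts with a 19-char ISO timestamp,
    # e.g. "2023-07-01 12:00:01 message text"
    ts = datetime.fromisoformat(line[:19])
    return ts, source, line

log_a = ["2023-07-01 12:00:01 server starting",
         "2023-07-01 12:00:05 listening on port 80"]
log_b = ["2023-07-01 12:00:03 client connecting"]

# merge the two already-sorted logs into a single timeline, keyed on timestamp
timeline = merge((parse_line("a", ln) for ln in log_a),
                 (parse_line("b", ln) for ln in log_b),
                 key=lambda rec: rec[0])

for ts, source, line in timeline:
    print(source, line)
```

Because both inputs are already in timestamp order, heapq.merge interleaves them lazily without sorting the whole set, which is why this pattern scales to large log files.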
[Python-announce] logmerger 0.7.0
logmerger 0.7.0
===============

Screenshot: https://github.com/ptmcg/logmerger/blob/main/static/log1_log2_merged_tui_lr.jpg?raw=true

Use logmerger to view multiple log files, merged side-by-side with a common timeline using timestamps from the input files:
- merges ASCII log files
- detects various formats of timestamps
- detects multiline log messages
- merges .gz files without previously gunzip'ing
- merges .pcap files
- merges .csv files

Browse the merged logs using a textual-based TUI:
- vertical scrolling
- horizontal scrolling
- search/find next/find previous
- go to line
- go to timestamp

The TUI runs in a plain terminal window, so it can be run over a regular SSH session.

Installation
------------
Install from PyPI: pip install logmerger
For PCAP merging support: pip install logmerger[pcap]

GitHub repo: https://github.com/ptmcg/logmerger
[Python-announce] pyparsing 3.1.1 released
Thanks everyone for the great feedback on the 3.1.0 release! Caught some glaring regressions that slipped through my test suite. Just published version 3.1.1: https://github.com/pyparsing/pyparsing/releases/tag/3.1.1

- Fixed regression in Word(min), reported by Ricardo Coccioli, good catch! (Issue #502)
- Fixed bug in bad exception messages raised by Forward expressions. PR submitted by Kyle Sunden, thanks for your patience and collaboration on this (#493).
- Fixed regression in SkipTo, where ignored expressions were not checked when looking for the target expression. Reported by catcombo, Issue #500.
- Fixed type annotation for enable_packrat, PR submitted by Mike Urbach, thanks! (Issue #498)
- Some general internal code cleanup. (Instigated by Michal Čihař, Issue #488)
[Python-announce] pyparsing 3.1.0 released
After several alpha and beta releases, I've finally pushed out version 3.1.0 of pyparsing. Here are the highlights:

NOTE: In the future release 3.2.0, use of many of the pre-PEP8 methods (such as `ParserElement.parseString`) will start to raise `DeprecationWarnings`. 3.2.0 should get released some time later in 2023. I currently plan to completely drop the pre-PEP8 methods in pyparsing 4.0, though we won't see that release until at least late 2023 if not 2024. So there is plenty of time to convert existing parsers to the new function names before the old functions are completely removed. (Big help from Devin J. Pohly in structuring the code to enable this peaceful transition.)

Version 3.2.0 will also discontinue support for Python versions 3.6 and 3.7.

Version 3.1.0 - June, 2023
--------------------------

API CHANGES
-----------
- A slight change has been implemented when unquoting a quoted string parsed using the `QuotedString` class. Formerly, when unquoting and processing whitespace markers such as \t and \n, these substitutions would occur first, and then any additional '\' escaping would be done on the resulting string. This would parse "\\n" as a backslash followed by a newline character. Now escapes and whitespace markers are all processed in a single pass working left to right, so the quoted string "\\n" gets unquoted to "\n" (a backslash followed by "n"). Fixes issue #474 raised by jakeanq, thanks!
- Reworked the `delimited_list` function into the new `DelimitedList` class. `DelimitedList` has the same constructor interface as `delimited_list`, and in this release `delimited_list` changes from a function to a synonym for `DelimitedList`. `delimited_list` and the older `delimitedList` method will be deprecated in a future release, in favor of `DelimitedList`.
- `ParserElement.validate()` is deprecated. It predates the support for left-recursive parsers, and was prone to false positives (warning that a grammar was invalid when it was in fact valid). It will be removed in a future pyparsing release.
In its place, developers should use debugging and analytical tools such as `ParserElement.set_debug()` and `ParserElement.create_diagram()`. (Raised in Issue #444, thanks Andrea Micheli!)

NEW FEATURES AND ENHANCEMENTS
-----------------------------
- `Optional(expr)` may now be written as `expr | ""`. This will make this code:

      "{" + Optional(Literal("A") | Literal("a")) + "}"

  writable as:

      "{" + (Literal("A") | Literal("a") | "") + "}"

  Some related changes implemented as part of this work:
  - `Literal("")` now internally generates an `Empty()` (and no longer raises an exception)
  - `Empty` is now a subclass of `Literal`

  Suggested by Antony Lee (issue #412), PR (#413) by Devin J. Pohly.

- Added new class method `ParserElement.using_each`, to simplify code that creates a sequence of `Literals`, `Keywords`, or other `ParserElement` subclasses. For instance, to define suppressible punctuation, you would previously write:

      LPAR, RPAR, LBRACE, RBRACE, SEMI = map(Suppress, "(){};")

  You can now write:

      LPAR, RPAR, LBRACE, RBRACE, SEMI = Suppress.using_each("(){};")

  `using_each` will also accept optional keyword args, which it will pass through to the class initializer. Here is an expression for single-letter variable names that might be used in an algebraic expression:

      algebra_var = MatchFirst(
          Char.using_each(string.ascii_lowercase, as_keyword=True)
      )

- Added new builtin `python_quoted_string`, which will match any form of single-line or multiline quoted string defined in Python. (Inspired by discussion with Andreas Schörgenhumer in Issue #421.)
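The single-pass, left-to-right unquoting behavior that the QuotedString change above describes can be illustrated in plain Python. This is a sketch of the semantics only, not pyparsing's implementation, and it handles just a couple of markers for brevity:

```python
def unescape_single_pass(s):
    """Process backslash escapes and whitespace markers in one
    left-to-right pass, so r"\\n" becomes a literal backslash
    followed by "n" rather than a backslash-newline."""
    out = []
    i = 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s):
            nxt = s[i + 1]
            if nxt == "\\":
                out.append("\\")   # escaped backslash wins first
            elif nxt == "n":
                out.append("\n")   # whitespace marker
            elif nxt == "t":
                out.append("\t")
            else:
                out.append(nxt)    # unknown escape: keep the char
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)

print(repr(unescape_single_pass(r"\\n")))  # '\\n' - backslash then "n"
```

Because the scan consumes the leading `\\` as a complete escape before it ever sees the `n`, the `\n` whitespace substitution can no longer "steal" the second backslash, which is exactly the ambiguity the two-pass approach suffered from.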
- Extended `expr[]` notation for repetition of `expr` to accept a slice, where the slice's stop value indicates a `stop_on` expression:

      test = "BEGIN aaa bbb ccc END"
      BEGIN, END = Keyword.using_each("BEGIN END".split())
      body_word = Word(alphas)
      expr = BEGIN + Group(body_word[...:END]) + END
      # equivalent to
      # expr = BEGIN + Group(ZeroOrMore(body_word, stop_on=END)) + END
      print(expr.parse_string(test))

  Prints:

      ['BEGIN', ['aaa', 'bbb', 'ccc'], 'END']

- Added named field "url" to `pyparsing.common.url`, returning the entire parsed URL string.
- Added bool `embed` argument to `ParserElement.create_diagram()`. When passed as True, the resulting diagram will omit the `DOCTYPE`, `HEAD`, and `BODY` tags so that it can be embedded in other HTML source. (Useful when embedding a call to `create_diagram()` in a PyScript HTML page.)
- Added `recurse` argument to `ParserElement.set_debug` to set the debug flag on an expression and all of its sub-expressions. Requested by multimeric in Issue #399.
- Added '·' (Unicode MIDDLE DOT) to the set of Latin1.identbodychars.
- `ParseResults` now has a new method `deepcopy()`, in addition to the current `copy()` method. `copy()` only makes a shallow copy - any contained `ParseResults` are copied as references -
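Outside pyparsing, the `stop_on` behavior shown in the BEGIN/END example above resembles itertools.takewhile over a token stream. A rough stdlib analogue, not pyparsing itself:

```python
from itertools import takewhile

test = "BEGIN aaa bbb ccc END"
tokens = test.split()

# collect body words after BEGIN, stopping at the END keyword
assert tokens[0] == "BEGIN"
body = list(takewhile(lambda tok: tok != "END", tokens[1:]))

print(["BEGIN", body, "END"])  # ['BEGIN', ['aaa', 'bbb', 'ccc'], 'END']
```

The difference is that pyparsing's `body_word[...:END]` checks the stop expression as a real parser (with keyword boundaries, whitespace skipping, etc.), whereas takewhile only compares whole pre-split tokens.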
[Python-announce] Pyparsing 3.1.0b2 released (final beta!)
I just pushed release 3.1.0b2 of pyparsing: 3.1.0 with some fixes to bugs that came up in the past few weeks - testing works!

If your project uses pyparsing, please please *please* download this beta release (using "pip install -U pyparsing==3.1.0b2") and open any compatibility issues you might have at the pyparsing GitHub repo (https://github.com/pyparsing/pyparsing). In the absence of any dealbreakers, I'll make the final release in June.

You can view the changes here: https://github.com/pyparsing/pyparsing/blob/master/CHANGES
[Python-announce] Pyparsing 3.1.0b1 released
I just pushed release 3.1.0b1 of pyparsing. 3.1.0 will include support for Python 3.12, and will be the last release to support 3.6 and 3.7.

If your project uses pyparsing, *please* download this beta release (using "pip install -U pyparsing==3.1.0b1") and open any compatibility issues you might have at the pyparsing GitHub repo (https://github.com/pyparsing/pyparsing).

You can view the changes here: https://github.com/pyparsing/pyparsing/blob/master/CHANGES
[issue27822] Fail to create _SelectorTransport with unbound socket
Paul McGuire added the comment:

Patch file attached.

--
keywords: +patch
Added file: http://bugs.python.org/file44182/ptm_27822.patch

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27822>
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue27746] ResourceWarnings in test_asyncio
Paul McGuire added the comment:

Ok, I will submit as a separate issue.

--
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27746>
___
[issue27822] Fail to create _SelectorTransport with unbound socket
Paul McGuire added the comment:

(issue applies to both 3.5.2 and 3.6)

--
versions: +Python 3.5
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27822>
___
[issue27746] ResourceWarnings in test_asyncio
Paul McGuire added the comment:

I was about to report this same issue - I get the error message even though I explicitly call transport.close():

    C:\Python35\lib\asyncio\selector_events.py:582: ResourceWarning: unclosed transport <_SelectorDatagramTransport closing fd=232>

It looks like the _sock attribute of the Transport subclasses must be set to None in their close() methods. (The presence of a non-None _sock is used elsewhere as an indicator of whether the transport has been closed or not.)

--
nosy: +Paul McGuire
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27746>
___
[issue27822] Fail to create _SelectorTransport with unbound socket
Paul McGuire added the comment:

To clarify how I'm using a socket without a bound address: I am specifying the destination address in the call to transport.sendto(), so there is no address on the socket itself, hence getsockname() fails.

--
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27822>
___
[issue27822] Fail to create _SelectorTransport with unbound socket
New submission from Paul McGuire:

In writing a simple UDP client using asyncio, I tripped over a call to getsockname() in the _SelectorTransport class in asyncio/selector_events.py:

    def __init__(self, loop, sock, protocol, extra=None, server=None):
        super().__init__(extra, loop)
        self._extra['socket'] = sock
        self._extra['sockname'] = sock.getsockname()

Since this is a sending-only client, the socket does not get bound to an address. On Linux, this is not a problem; getsockname() will return ('0.0.0.0', 0) for IPv4, ('::', 0, 0, 0) for IPv6, and so on. But on Windows, a socket that is not bound to an address will raise this error when getsockname() is called:

    OSError: [WinError 10022] An invalid argument was supplied

This forces me to write a wrapper for the socket to intercept getsockname() and return None. In asyncio/proactor_events.py, this is guarded against, with this code in the _ProactorSocketTransport class:

    try:
        self._extra['sockname'] = sock.getsockname()
    except (socket.error, AttributeError):
        if self._loop.get_debug():
            logger.warning("getsockname() failed on %r", sock, exc_info=True)

Please add similar guarding code to the _SelectorTransport class in asyncio/selector_events.py.

--
components: asyncio
messages: 273290
nosy: Paul McGuire, gvanrossum, haypo, yselivanov
priority: normal
severity: normal
status: open
title: Fail to create _SelectorTransport with unbound socket
type: behavior
versions: Python 3.6
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27822>
___
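The guard requested above can be demonstrated standalone with a small helper (the helper name `safe_sockname` is hypothetical; the proactor code quoted in the report is the model):

```python
import socket

def safe_sockname(sock):
    """Return the socket's bound address if available, else None,
    mirroring the try/except guard used in asyncio/proactor_events.py.
    socket.error is an alias of OSError in Python 3, which covers
    WinError 10022 on Windows."""
    try:
        return sock.getsockname()
    except (OSError, AttributeError):
        return None

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# unbound socket: an all-zeros address on Linux, None on Windows
print(safe_sockname(s))
s.close()
```

This is exactly why the guard matters: the same call that harmlessly returns ('0.0.0.0', 0) on one platform raises on another, so constructor code that caches sockname must tolerate the failure.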
Re: Python re to extract useful information from each line
Here is a first shot at a pyparsing parser for these lines:

    from pyparsing import *

    (SET, POLICY, ID, FROM, TO, NAT, SRC, DST, IP, PORT, SCHEDULE,
     LOG, PERMIT, ALLOW, DENY) = map(
        CaselessKeyword,
        "SET,POLICY,ID,FROM,TO,NAT,SRC,DST,IP,PORT,SCHEDULE,LOG,PERMIT,ALLOW,DENY".split(','))

    integer = Word(nums)
    ipAddr = Combine(integer + ('.' + integer) * 3)
    quotedString.setParseAction(removeQuotes)

    logParser = (SET + POLICY + ID + integer("id") + FROM + quotedString("from_")
                 + TO + quotedString("to_") + quotedString("service"))

I run this with:

    for line in """\
    1- set policy id 1000 from Untrust to Trust Any 1.1.1.1 HTTP nat dst ip 10.10.10.10 port 8000 permit log
    2- set policy id 5000 from Trust to Untrust Any microsoft.com HTTP nat src permit schedule 14August2014 log
    3- set policy id 7000 from Trust to Untrust Users Any ANY nat src dip-id 4 permit log
    4- set policy id 7000 from Trust to Untrust servers Any ANY deny
    """.splitlines():
        line = line.strip()
        if not line:
            continue
        print (integer + '-' + logParser).parseString(line).dump()
        print

Getting:

    ['1', '-', 'SET', 'POLICY', 'ID', '1000', 'FROM', 'Untrust', 'TO', 'Trust', 'Any']
    - from_: Untrust
    - id: 1000
    - service: Any
    - to_: Trust

    ['2', '-', 'SET', 'POLICY', 'ID', '5000', 'FROM', 'Trust', 'TO', 'Untrust', 'Any']
    - from_: Trust
    - id: 5000
    - service: Any
    - to_: Untrust

    ['3', '-', 'SET', 'POLICY', 'ID', '7000', 'FROM', 'Trust', 'TO', 'Untrust', 'Users']
    - from_: Trust
    - id: 7000
    - service: Users
    - to_: Untrust

    ['4', '-', 'SET', 'POLICY', 'ID', '7000', 'FROM', 'Trust', 'TO', 'Untrust', 'servers']
    - from_: Trust
    - id: 7000
    - service: servers
    - to_: Untrust

Pyparsing adds Optional classes so that you can include expressions for pieces that might be missing, like:

    ... + Optional(NAT + (SRC | DST)) + ...

-- Paul
--
https://mail.python.org/mailman/listinfo/python-list
Re: python command not working
On Friday, August 14, 2015 at 6:13:37 AM UTC-5, sam.h...@gmail.com wrote:
> On Wednesday, April 22, 2009 at 8:36:21 AM UTC+1, David Cournapeau wrote:
> > On Wed, Apr 22, 2009 at 4:20 PM, 83nini 83n...@gmail.com wrote:
> > > Hi guys, I'm new to python, i downloaded version 2.5, opened windows (vista) command line and wrote python, this should take me to the python snip
>
> You can do it easily by adding the Python path (in my case C:\Python27) to your system PATH.

This thread is 6 years old; the OP has probably gone on to other things...
Who is using littletable?
littletable is a little module I knocked together a few years ago, found it sort of useful, so uploaded to SF and PyPI. The download traffic at SF is very light, as I expected, but PyPI shows 3000 downloads in the past month! Who *are* all these people?

In my own continuing self-education, it is interesting to see overlap in the basic goals of littletable and the much more widely known pandas module (with littletable being more lightweight/freestanding, not requiring numpy, but correspondingly not as snappy).

I know Adam Sah uses (or at least used to use) littletable as an in-memory product catalog for his website Buyer's Best Friend (http://www.bbfdirect.com/). Who else is out there, and what enticed you to use this little module?

-- Paul
ANN: pyparsing 2.0.2 released
I'm happy to announce a new release of pyparsing, version 2.0.2. This release contains some small enhancements and some bugfixes.

Change summary:
---------------
- Extended the expr(name) shortcut (same as expr.setResultsName(name)) to accept expr() as a shortcut for expr.copy().
- Added locatedExpr(expr) helper, to decorate any returned tokens with their location within the input string. Adds the results names locn_start and locn_end to the output parse results.
- Added pprint() method to ParseResults, to simplify troubleshooting and prettified output. Now instead of importing the pprint module and then writing pprint.pprint(result), you can just write result.pprint(). This method also accepts additional positional and keyword arguments (such as indent, width, etc.), which get passed through directly to the pprint method (see http://docs.python.org/2/library/pprint.html#pprint.pprint).
- Removed deprecation warnings when using '<<' for Forward expression assignment. '<<=' is still preferred, but '<<' will be retained for cases where the '<<=' operator is not suitable (such as in defining lambda expressions).
- Expanded argument compatibility for classes and functions that take list arguments, to now accept generators as well.
- Extended list-like behavior of ParseResults, adding support for append and extend. NOTE: if you have existing applications using these names as results names, you will have to access them using dict-style syntax: res['append'] and res['extend']
- ParseResults emulates the change in list vs. iterator semantics for methods like keys(), values(), and items(). Under Python 2.x, these methods will return lists; under Python 3.x, they will return iterators.
- ParseResults now has a method haskeys() which returns True or False depending on whether any results names have been defined. This simplifies testing for the existence of results names under Python 3.x, which returns keys() as an iterator, not a list.
- ParseResults now supports both list and dict semantics for pop(). If passed no argument or an integer argument, it will use list semantics and pop tokens from the list of parsed tokens. If passed a non-integer argument (most likely a string), it will use dict semantics and pop the corresponding value from any defined results names. A second default-return-value argument is supported, just as in dict.pop().
- Fixed bug in markInputline - thanks for reporting this, Matt Grant!
- Cleaned up my unit test environment, now runs with Python 2.6 and 3.3.

Download pyparsing 2.0.2 at http://sourceforge.net/projects/pyparsing/, or use 'easy_install pyparsing'. You can also access pyparsing's epydoc documentation online at http://packages.python.org/pyparsing/. The pyparsing Wiki is at http://pyparsing.wikispaces.com.

-- Paul

Pyparsing is a pure-Python class library for quickly developing recursive-descent parsers. Parser grammars are assembled directly in the calling Python code, using classes such as Literal, Word, OneOrMore, Optional, etc., combined with operators '+', '|', and '^' for And, MatchFirst, and Or. No separate code-generation or external files are required. Pyparsing can be used in many cases in place of regular expressions, with a shorter learning curve and greater readability and maintainability. Pyparsing comes with a number of parsing examples, including:
- Hello, World!
  (English, Korean, Greek, and Spanish (new))
- chemical formulas
- Verilog parser
- Google protobuf parser
- time expression parser/evaluator
- configuration file parser
- web page URL extractor
- 5-function arithmetic expression parser
- subset of CORBA IDL
- chess portable game notation
- simple SQL parser
- Mozilla calendar file parser
- EBNF parser/compiler
- Python value string parser (lists, dicts, tuples, with nesting) (safe alternative to eval)
- HTML tag stripper
- S-expression parser
- macro substitution preprocessor

--
https://mail.python.org/mailman/listinfo/python-announce-list
Support the Python Software Foundation: http://www.python.org/psf/donations/
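The dual list/dict pop() semantics described in the ParseResults changes above can be modeled with a small class that dispatches on the argument type. This is a simplified illustration of the idea, not ParseResults itself:

```python
class TokensWithNames:
    """Toy container holding a token list plus named results,
    with pop() dispatching on int (list) vs. str (dict) arguments."""

    def __init__(self, tokens, names=None):
        self.tokens = list(tokens)
        self.names = dict(names or {})

    def pop(self, key=-1, *default):
        if isinstance(key, int):
            return self.tokens.pop(key)        # list semantics
        return self.names.pop(key, *default)   # dict semantics, optional default

t = TokensWithNames(["1", "+", "2"], {"op": "+"})
print(t.pop())                 # "2" - list-style pop of the last token
print(t.pop("op"))             # "+" - dict-style pop by results name
print(t.pop("missing", None))  # None - dict-style pop with a default
```

Dispatching on isinstance(key, int) is what lets a single pop() serve both personalities without ambiguity, since results names are (almost always) strings.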
[ANN] pyparsing 2.0.1 released - compatible with Python 2.6 and later
In my releasing of Pyparsing 1.5.7/2.0.0 last November, I started to split supported Python versions: 2.x on the Pyparsing 1.5.x track, and 3.x on the Pyparsing 2.x track. Unfortunately, this caused a fair bit of pain for many current users of Python 2.6 and 2.7 (especially those using libs dependent on pyparsing), as the default installed pyparsing version using easy_install or pip would be the incompatible-to-them pyparsing 2.0.0.

I hope I have rectified (or at least improved) this situation with the latest release of pyparsing 2.0.1. Version 2.0.1 takes advantage of the cross-major-version compatibility that was planned into Python, wherein many of the new features of Python 3.x were made available in Python 2.6 and 2.7. By avoiding the one usage of 'nonlocal' (a Python 3.x feature not available in any Python 2.x release), I've been able to release pyparsing 2.0.1 in a form that will work for all those using Python 2.6 and later. (If you are stuck on version 2.5 or earlier of Python, then you still have to explicitly download the 1.5.7 version of pyparsing.)

This release also includes a bugfix to the new '<<=' operator, so that '<<' for attachment of parser definitions to Forward instances can be deprecated in favor of '<<='.

Hopefully, most current users using pip and easy_install can now just install pyparsing 2.0.1, and it will be sufficiently version-aware to function under all Pythons 2.6 and later.

Thanks for your continued support and interest in pyparsing!

-- Paul McGuire
Re: Problem installing Pyparsing
Pyparsing 2.0.1 fixes this incompatibility, and should work with all versions of Python 2.6 and later.

-- Paul
ANNOUNCE: pyparsing 1.5.7/2.0.0
With the release of version 2.0.0/1.5.7, pyparsing has now officially switched to Python 3.x support as its default installation environment. Python 2.x users can install the latest 1.5.7 release. (If you're using easy_install, do easy_install pyparsing==1.5.7.)

I'm taking this opportunity to do some minor API tweaking too, renaming some operators and method names that I got wrong earlier (the old operators and methods are still there for now for compatibility, but they are deprecated and will be removed in a future release):

- Added new operator '<<=', which will eventually replace '<<' for storing the contents of a Forward(). '<<=' does not have the same operator precedence problems that '<<' does.
- 'operatorPrecedence' is being renamed 'infixNotation' as a better description of what this helper function creates. 'operatorPrecedence' is deprecated, and will be dropped entirely in a future release.

Several bug-fixes are included, plus several new examples, *and* an awesome example submitted by Luca DellOlio, for parsing ANTLR grammar definitions and implementing them with pyparsing objects.

---
Pyparsing wiki: pyparsing.wikispaces.com

SVN checkout:
(latest)
    svn checkout https://pyparsing.svn.sourceforge.net/svnroot/pyparsing/trunk pyparsing
(1.5.x branch)
    svn checkout https://pyparsing.svn.sourceforge.net/svnroot/pyparsing/branches/pyparsing_1.5.x pyparsing
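The precedence problem with pyparsing's Forward '<<' operator can be seen with a toy class: Python's '<<' binds tighter than '|', so an expression like `fwd << a | b` groups as `(fwd << a) | b`, silently dropping the `| b` alternative from the Forward. The augmented-assignment form avoids this because the whole right-hand side is evaluated before the assignment. A minimal sketch (not pyparsing code, just operator-overloading classes to show the grouping):

```python
class Expr:
    def __init__(self, name):
        self.name = name
        self.target = None

    def __or__(self, other):
        # a | b builds a combined alternative expression
        return Expr("({} | {})".format(self.name, other.name))

    def __lshift__(self, other):
        # fwd << x stores x's definition; binds tighter than |
        self.target = other.name
        return self

    def __ilshift__(self, other):
        # fwd <<= x: statement form, RHS fully evaluated first
        self.target = other.name
        return self

fwd, a, b = Expr("fwd"), Expr("a"), Expr("b")

fwd << a | b        # grouped as (fwd << a) | b -- only 'a' is stored!
print(fwd.target)   # a

fwd <<= a | b       # 'a | b' is evaluated before the assignment
print(fwd.target)   # (a | b)
```

This is why `<<=` could be promoted as the preferred spelling: as a statement it cannot be embedded in a larger expression, so there is no surrounding operator for precedence to interact with.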
ANN: pyparsing 1.5.6 released!
After about 10 months, there is a new release of pyparsing, version 1.5.6. This release contains some small enhancements, some bugfixes, and some new examples. Most notably, this release includes the first public release of the Verilog parser. I have tired of restricting this parser for commercial use, and so I am distributing it under the same license as pyparsing, with the request that if you use it for commercial use, please make a commensurate donation to your local Red Cross.

Change summary:
---------------
- Cleanup of parse action normalizing code, to be more version-tolerant, and robust in the face of future Python versions - much thanks to Raymond Hettinger for this rewrite!
- Removal of exception caching, addressing a memory leak condition in Python 3. Thanks to Michael Droettboom and the Cape Town PUG for their analysis and work on this problem!
- Fixed bug when using packrat parsing, where a previously parsed expression would duplicate subsequent tokens - reported by Frankie Ribery on stackoverflow, thanks!
- Added 'ungroup' helper method, to address token grouping done implicitly by And expressions, even if only one expression in the And actually returns any text - also inspired by a stackoverflow discussion with Frankie Ribery!
- Fixed bug in srange, which accepted escaped hex characters of the form '\0x##', but should be '\x##'. Both forms will be supported for backwards compatibility.
- Enhancement to countedArray, accepting an optional expression to be used for matching the leading integer count - proposed by Mathias on the pyparsing mailing list, good idea!
- Added the Verilog parser to the provided set of examples, under the MIT license. While this frees up this parser for any use, if you find yourself using it for a commercial purpose, please consider making a charitable donation as described in the parser's header.
- Added the excludeChars argument to the Word class, to simplify defining a word composed of all characters in a large range except for one or two. Suggested by JesterEE on the pyparsing wiki.
- Added optional overlap parameter to scanString, to return overlapping matches found in the source text.
- Updated oneOf internal regular expression generation, with improved parse-time performance.
- Slight performance improvement in transformString, removing empty strings from the list of string fragments built while scanning the source text, before calling ''.join. Especially useful when using transformString to strip out selected text.
- Enhanced form of using the expr('name') style of results naming, in lieu of calling setResultsName. If name ends with an '*', then this is equivalent to expr.setResultsName('name', listAllMatches=True).
- Fixed up the internal list flattener to use iteration instead of recursion, to avoid stack overflow when transforming large files.
- Added other new examples:
  . protobuf parser - parses Google's protobuf language
  . btpyparse - a BibTeX parser contributed by Matthew Brett, with test suite test_bibparse.py (thanks, Matthew!)
  . groupUsingListAllMatches.py - demo using trailing '*' for results names

Download pyparsing 1.5.6 at http://sourceforge.net/projects/pyparsing/, or use 'easy_install pyparsing'. You can also access pyparsing's epydoc documentation online at http://packages.python.org/pyparsing/. The pyparsing Wiki is at http://pyparsing.wikispaces.com.

-- Paul

Pyparsing is a pure-Python class library for quickly developing recursive-descent parsers. Parser grammars are assembled directly in the calling Python code, using classes such as Literal, Word, OneOrMore, Optional, etc., combined with operators '+', '|', and '^' for And, MatchFirst, and Or. No separate code-generation or external files are required.
Pyparsing can be used in many cases in place of regular expressions, with a shorter learning curve and greater readability and maintainability. Pyparsing comes with a number of parsing examples, including:
- Hello, World! (English, Korean, Greek, and Spanish (new))
- chemical formulas
- Verilog parser
- Google protobuf parser
- time expression parser/evaluator
- configuration file parser
- web page URL extractor
- 5-function arithmetic expression parser
- subset of CORBA IDL
- chess portable game notation
- simple SQL parser
- Mozilla calendar file parser
- EBNF parser/compiler
- Python value string parser (lists, dicts, tuples, with nesting) (safe alternative to eval)
- HTML tag stripper
- S-expression parser
- macro substitution preprocessor
[ANN]littletable 0.3 release
Announcing the 0.3 release of littletable (the module formerly known as dulce). This version includes (with much help from Colin McPhail - thanks, Colin!):
- support for namedtuples as table objects
- Python 3 compatibility
- Table.pivot(), to summarize record counts by 1 or 2 table attributes

littletable (formerly dulce) is a simple ORM-like wrapper for managing collections of Python objects like relational tables. No schema definition is used; instead, table columns are introspected from the attributes of objects inserted into the table, and inferred from index and query parameters.

Tables can be:
- indexed
- queried
- joined
- pivoted
- imported from/exported to .CSV files

Also, every query or join returns a new full-fledged littletable Table - no distinction of Tables vs. DataSets vs. RecordSets vs. whatever. So it is easy to build up a complex database analysis from a succession of joins and queries.

littletable is a simple environment for experimenting with tables, joins, and indexing, with a minimum of startup overhead.

You can download littletable at http://sourceforge.net/projects/littletable/ - the HTML docs can be viewed at http://packages.python.org/littletable/.
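The pivot idea above (record counts by one or two table attributes) can be sketched with the stdlib's collections.Counter over simple dict records. This is an illustration of the concept, not littletable's API:

```python
from collections import Counter

records = [
    {"region": "east", "status": "open"},
    {"region": "east", "status": "closed"},
    {"region": "west", "status": "open"},
    {"region": "east", "status": "open"},
]

# pivot on one attribute: record counts per region
by_region = Counter(r["region"] for r in records)

# pivot on two attributes: record counts per (region, status) cell
by_cell = Counter((r["region"], r["status"]) for r in records)

print(by_region)                   # Counter({'east': 3, 'west': 1})
print(by_cell[("east", "open")])   # 2
```

A pivot is really just a grouped count keyed on attribute tuples; a real table object adds the row/column presentation on top of this counting step.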
ANN: dulce 0.1 - in-memory schema-less relational database
dulce is a syntactic sweet wrapper for managing collections of Python objects like relational tables. No schema definition is used; instead table columns are introspected from the attributes of objects inserted into the table, and inferred from index and query parameters. dulce's Tables can be: - indexed - queried - joined - imported from/exported to .CSV files Also, every query or join returns a new full-fledged dulce Table - no distinction of Tables vs. DataSets vs. RecordSets vs. whatever. So it is easy to build up a complex database analysis from a succession of joins and queries. dulce is a simple environment for experimenting with tables, joins, and indexing, with a minimum of startup overhead. You can download dulce at http://sourceforge.net/projects/pythondulce/ - htmldocs can be viewed at http://ptmcg.zapto.org/dulce/htmldoc/index.html. -- Paul -- http://mail.python.org/mailman/listinfo/python-list
Re: mutate dictionary or list
On Sep 7, 7:05 am, Baba raoul...@gmail.com wrote: Hi I am working on an exercise which requires me to write a function that will check if a given word can be found in a given dictionary (the hand).

    def is_valid_word(word, hand, word_list):
        """
        Returns True if word is in the word_list and is entirely
        composed of letters in the hand. Otherwise, returns False.
        Does not mutate hand or word_list.
        """

I don't understand this part: "Does not mutate hand or word_list"

I would re-read your exercise description. hand is *not* a dictionary, but is most likely a list of individual letters. word_list, too, is probably *not* a dictionary, but a list of valid words (although this does bear a resemblance to what people in everyday life call a dictionary). Where did you get the idea that there was a dictionary in this problem? The "Does not mutate hand or word_list" line is a constraint: you are not allowed to update the hand or word_list arguments. For instance, you must not call word_list.sort() in order to search for the given word using some sort of binary search. You must not determine whether all the letters in word come from hand by modifying the hand list (like dropping letters from hand as they are found in word). There are ways to copy arguments if you use a destructive process on their contents, so that the original stays unmodified - but that sounds like part of the exercise for you to learn about. -- Paul -- http://mail.python.org/mailman/listinfo/python-list
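[Editorial example - a minimal sketch (modern Python 3, using collections.Counter; the variable names and sample data are purely illustrative) of a non-mutating check along the lines described above:]

```python
from collections import Counter

def is_valid_word(word, hand, word_list):
    # Membership test and Counter() both leave their arguments untouched,
    # so neither hand nor word_list is mutated.
    if word not in word_list:
        return False
    available = Counter(hand)   # counts letters without modifying hand
    needed = Counter(word)
    return all(available[ch] >= needed[ch] for ch in needed)

hand = ["f", "i", "n", "e", "x"]   # hand as a list of letters
words = ["fine", "fun"]            # word_list as a list of valid words
print(is_valid_word("fine", hand, words))   # True
print(is_valid_word("fun", hand, words))    # False - no "u" in hand
print(hand)                                 # unchanged
```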
Re: Tag parsing in python
On Aug 28, 11:23 pm, Paul McGuire pt...@austin.rr.com wrote: On Aug 28, 11:14 am, agnibhu dee...@gmail.com wrote: Hi all, I'm a newbie in python. I'm trying to create a library for parsing certain keywords. For example say I've key words like abc: bcd: cde: like that... So the user may use like abc: How are you bcd: I'm fine cde: ok So I've to extract the How are you and I'm fine and ok..and assign them to abc:, bcd: and cde: respectively.. There may be combination of keywords introduced in future. like abc: xy: How are you So new keywords qualifying the other keywords so on.. I got to thinking more about your keywords-qualifying-keywords example, and I thought this would be a good way to support locale-specific tags. I also thought how one might want to have tags within tags, to be substituted later, requiring an abc:: escaped form of abc:, so that the tag is substituted with the value of tag abc: as a late binding. Wasn't too hard to modify what I posted yesterday, and now I rather like it. 
-- Paul

# tag_substitute.py
from pyparsing import (Combine, Word, alphas, FollowedBy, Group, OneOrMore,
                       empty, SkipTo, LineEnd, Optional, Forward, MatchFirst,
                       Literal, And, replaceWith)

tag = Combine(Word(alphas) + ~FollowedBy("::") + ":")
tag_defn = (Group(OneOrMore(tag))("tag") + empty +
            SkipTo(tag | LineEnd())("body") + Optional(LineEnd().suppress()))

# now combine macro detection with substitution
macros = {}
macro_substitution = Forward()

def make_macro_sub(tokens):
    # unescape '::' and substitute any embedded tags
    tag_value = macro_substitution.transformString(tokens.body.replace("::", ":"))
    # save this tag and value (or overwrite previous)
    macros[tuple(tokens.tag)] = tag_value
    # define overall macro substitution expression
    macro_substitution << MatchFirst(
        [(Literal(k[0]) if len(k) == 1 else And([Literal(kk) for kk in k]))
             .setParseAction(replaceWith(v))
         for k, v in macros.items()]) + ~FollowedBy(tag)
    # return empty string, so macro definitions don't show up in final
    # expanded text
    return ""

tag_defn.setParseAction(make_macro_sub)

# define pattern for macro scanning
scan_pattern = macro_substitution | tag_defn

sorry = """\
nm: Dave
sorry: en: I'm sorry, nm::, I'm afraid I can't do that.
sorry: es: Lo siento nm::, me temo que no puedo hacer eso.
Hal said, sorry: en:
Hal dijo, sorry: es:
"""

print scan_pattern.transformString(sorry)

Prints:

Hal said, I'm sorry, Dave, I'm afraid I can't do that.
Hal dijo, Lo siento Dave, me temo que no puedo hacer eso.

-- http://mail.python.org/mailman/listinfo/python-list
Re: Tag parsing in python
On Aug 28, 11:14 am, agnibhu dee...@gmail.com wrote: Hi all, I'm a newbie in python. I'm trying to create a library for parsing certain keywords. For example say I've key words like abc: bcd: cde: like that... So the user may use like abc: How are you bcd: I'm fine cde: ok So I've to extract the How are you and I'm fine and ok..and assign them to abc:, bcd: and cde: respectively.. There may be combination of keywords introduced in future. like abc: xy: How are you So new keywords qualifying the other keywords so on.. So I would like to know the python way of doing this. Is there any library already existing for making my work easier. ? ~ Agnibhu

Here's how pyparsing can parse your keyword/tags:

from pyparsing import (Combine, Word, alphas, Group, OneOrMore,
                       empty, SkipTo, LineEnd)

text1 = """\
abc: How are you
bcd: I'm fine
cde: ok"""
text2 = "abc: xy: How are you"

tag = Combine(Word(alphas) + ":")
tag_defn = Group(OneOrMore(tag))("tag") + empty + SkipTo(tag | LineEnd())("body")

for text in (text1, text2):
    print text
    for td in tag_defn.searchString(text):
        print td.dump()
    print

Prints:

abc: How are you
bcd: I'm fine
cde: ok
[['abc:'], 'How are you']
- body: How are you
- tag: ['abc:']
[['bcd:'], "I'm fine"]
- body: I'm fine
- tag: ['bcd:']
[['cde:'], 'ok']
- body: ok
- tag: ['cde:']

abc: xy: How are you
[['abc:', 'xy:'], 'How are you']
- body: How are you
- tag: ['abc:', 'xy:']

Now here's how to further use pyparsing to actually use those tags as substitution macros:

from pyparsing import Forward, MatchFirst, Literal, And, replaceWith, FollowedBy

# now combine macro detection with substitution
macros = {}
macro_substitution = Forward()

def make_macro_sub(tokens):
    macros[tuple(tokens.tag)] = tokens.body
    # define macro substitution
    macro_substitution << MatchFirst(
        [(Literal(k[0]) if len(k) == 1 else And([Literal(kk) for kk in k]))
             .setParseAction(replaceWith(v))
         for k, v in macros.items()]) + ~FollowedBy(tag)
    return ""

tag_defn.setParseAction(make_macro_sub)

scan_pattern = macro_substitution | tag_defn

test_text = (text1 + "\nBob said, 'abc:?' I said, 'bcd:.'" +
             text2 + "\nThen Bob said 'abc: xy:?'")
print test_text
print scan_pattern.transformString(test_text)

Prints:

abc: How are you
bcd: I'm fine
cde: ok
Bob said, 'abc:?' I said, 'bcd:.'abc: xy: How are you
Then Bob said 'abc: xy:?'

Bob said, 'How are you?' I said, 'I'm fine.'
Then Bob said 'How are you?'

-- http://mail.python.org/mailman/listinfo/python-list
Re: Is '[' a function or an operator or an language feature?
On Jul 16, 12:01 pm, Peng Yu pengyu...@gmail.com wrote: I mean to get the man page for '[' like in the following code. x=[1,2,3] But help('[') doesn't seem to give the above usage. ### Mutable Sequence Types ** List objects support additional operations that allow in-place modification of the object. Other mutable sequence types (when added to the language) should also support these operations. Strings and tuples are immutable sequence types: such objects cannot be modified once created. The following operations are defined on mutable sequence types (where *x* is an arbitrary object): ... ## I then checked help('LISTLITERALS'), which gives some description that is available from the language reference. So '[' in x=[1,2,3] is considered as a language feature rather than a function or an operator? List displays * A list display is a possibly empty series of expressions enclosed in square brackets: list_display ::= [ [expression_list | list_comprehension] ] list_comprehension ::= expression list_for list_for ::= for target_list in old_expression_list [list_iter] old_expression_list ::= old_expression [(, old_expression)+ [,]] list_iter ::= list_for | list_if list_if ::= if old_expression [list_iter] . ### -- Regards, Peng Also look for __getitem__ and __setitem__, these methods defined on your own container classes will allow you to write myobject['x'] and have your own custom lookup code get run. -- Paul -- http://mail.python.org/mailman/listinfo/python-list
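[Editorial example - as a quick follow-on to the __getitem__/__setitem__ suggestion above, here is a toy container (the class and key names are purely illustrative) whose square-bracket indexing runs custom lookup code:]

```python
class CaseInsensitiveDict:
    """Toy container: defining __getitem__/__setitem__ makes square-bracket
    indexing on instances run our own lookup code."""
    def __init__(self):
        self._data = {}
    def __setitem__(self, key, value):   # runs for: obj[key] = value
        self._data[key.lower()] = value
    def __getitem__(self, key):          # runs for: obj[key]
        return self._data[key.lower()]

d = CaseInsensitiveDict()
d["Name"] = "Peng"
print(d["name"], d["NAME"])   # both spellings find the same entry
```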
Re: nicer way to remove prefix of a string if it exists
On Jul 13, 6:49 pm, News123 news1...@free.fr wrote: I wondered about a potentially nicer way of removing a prefix of a string if it exists. Here is an iterator solution:

from itertools import izip

def trim_prefix(prefix, s):
    # note: assumes len(s) >= len(prefix); izip stops at the shorter
    # iterable, so a too-short s would compare equal to a partial prefix
    i1, i2 = iter(prefix), iter(s)
    if all(c1 == c2 for c1, c2 in izip(i1, i2)):
        return ''.join(i2)
    return s

print trim_prefix("ABC", "ABCDEFGHI")
print trim_prefix("ABC", "SLFJSLKFSLJFLSDF")

Prints:

DEFGHI
SLFJSLKFSLJFLSDF

-- Paul -- http://mail.python.org/mailman/listinfo/python-list
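[Editorial example - for comparison, a plain startswith/slice sketch (Python 3 shown; since Python 3.9 the stdlib also offers str.removeprefix) that sidesteps the iterator subtleties:]

```python
def trim_prefix(prefix, s):
    # slicing past the end of a string is safe, so no length check is needed
    return s[len(prefix):] if s.startswith(prefix) else s

print(trim_prefix("ABC", "ABCDEFGHI"))         # DEFGHI
print(trim_prefix("ABC", "SLFJSLKFSLJFLSDF"))  # SLFJSLKFSLJFLSDF
print(trim_prefix("ABC", "AB"))                # AB - prefix not fully present
```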
Re: The real problem with Python 3 - no business case for conversion (was I strongly dislike Python 3)
On Jul 6, 3:30 am, David Cournapeau courn...@gmail.com wrote: On Tue, Jul 6, 2010 at 4:30 AM, D'Arcy J.M. Cain da...@druid.net wrote: One thing that would be very useful is how to maintain something that works on 2.x and 3.x, but not limiting yourself to 2.6. Giving up versions below 2.6 is out of the question for most projects with a significant userbase IMHO. As such, the idea of running the python 3 warnings is not so useful IMHO - unless it could be made to work better for Python 2.x < 2.6, but I am not sure the idea even makes sense.

This is exactly how I felt about my support for pyparsing: that I was trying to continue to provide support for 2.3 users, up through 3.x users, with a single code base. (This would actually have been possible if I had been willing to introduce a performance penalty for Python 2 users, but performance is such a critical issue for parsing I couldn't justify it to myself.) This meant that I had to constrain my implementation, while trying to incorporate forward-looking support features (such as __bool__ and __dir__), which have no effect on older Python versions, but support additions in newer Pythons. I just couldn't get through on the python-dev list that I couldn't just upgrade my code to 2.6 and then use 2to3 to keep in step across the 2-3 chasm, as this would leave behind my faithful pre-2.6 users. Here are some of the methods I used:
- No use of sets. Instead I defined a very simple set simulation using dict keys, which could be interchanged with set for later versions.
- No generator expressions, only list comprehensions.
- No use of decorators. BUT, pyparsing includes a decorator method, traceParseAction, which can be used by users with later Pythons as @traceParseAction in their own code.
- No print statements. As pyparsing is intended to be an internal module, it does no I/O as part of its function - it only processes a given string, and returns a data structure.
- Python 2-3 compatible exception syntax. 
This may have been my trickiest step. The change of syntax for except from:

    except ExceptionType, ex:

to:

    except ExceptionType as ex:

is completely forward and backward incompatible. The workaround is to rewrite as:

    except ExceptionType:
        ex = sys.exc_info()[1]

which works just fine in 2.x and 3.x. However, there is a slight performance penalty in doing this, and pyparsing uses exceptions as part of its grammar success/failure signalling and backtracking; I've used this technique everywhere I can get away with it, but there is one critical spot where I can't use it, so I have to keep 2 code bases with slight differences between them.
- Implement __bool__, followed by __nonzero__ = __bool__. This will give you boolean support for your classes in 2.3-3.1.
- Implement __dir__, which is unused by old Pythons, but supports customization of dir() output for your own classes.
- Implement __len__, __contains__, __iter__ and __reversed__ for container classes.
- No ternary expressions. Not too difficult really, there are several well-known workarounds for this, either by careful use of and's and or's, or using the bool-as-int to return the value from (falseValue,trueValue)[condition].
- Define a version-sensitive portion of your module, to define synonyms for constants that changed name between versions. Something like:

    _PY3K = sys.version_info[0] > 2
    if _PY3K:
        _MAX_INT = sys.maxsize
        basestring = str
        _str2dict = set
        alphas = string.ascii_lowercase + string.ascii_uppercase
    else:
        _MAX_INT = sys.maxint
        range = xrange
        _str2dict = lambda strg: dict([(c, 0) for c in strg])
        alphas = string.lowercase + string.uppercase

The main body of my code uses range throughout (for example), and with this definition I get the iterator behavior of xrange regardless of Python version. 
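[Editorial example - a tiny sketch of the version-neutral exception idiom described above (the function name and messages are purely illustrative); the same source parses and runs on both Python 2 and Python 3:]

```python
import sys

def parse_int(s):
    try:
        return int(s)
    except ValueError:           # no ", ex" or " as ex" - both 2.x and 3.x accept this
        ex = sys.exc_info()[1]   # element [1] is the exception *instance* itself
        return "failed: %s" % ex

print(parse_int("42"))           # 42
print(parse_int("forty-two"))    # failed: invalid literal for int() ...
```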
In the end I still have 2 source files, one for Py2 and one for Py3, but there is only a small and manageable number of differences between them, and I expect at some point I will move forward to supporting Py3 as my primary target version. But personally I think this overall Python 2-3 migration process is moving along at a decent rate, and I should be able to make my switchover in another 12-18 months. But in the meantime, I am still able to support all versions of Python NOW, and I plan to continue doing so (albeit support for 2.x versions will eventually mean continuing to offer a frozen feature set, with minimal bug-fixing if any). I realize that pyparsing is a simple-minded module in comparison to others: it is pure Python, so it has no issues with C extensions; it does no I/O, so print-as-statement vs. print-as-function is not an issue; it imports few other modules, so the ones it does import have not been dropped in Py3; and overall it is only a few thousand lines of code. But I just offer this post as a concrete data point in this discussion. -- Paul --
Re: GAE + recursion limit
Does anyone have any clue what that might be? Why the problem is on GAE (even when run locally), when command line run works just fine (even with recursion limit decreased)? Can't explain why you see different behavior on GAE vs. local, but it is unusual for a small translator to flirt with recursion limit. I don't usually see parsers come close to this with fewer than 40 or 50 sub-expressions. You may have some left-recursion going on. Can you post your translator somewhere, perhaps on pastebin, or on the pyparsing wiki Discussion page (pyparsing.wikispaces.com)? -- Paul -- http://mail.python.org/mailman/listinfo/python-list
Re: automate minesweeper with python
On Jun 30, 6:39 pm, Jay jayk...@yahoo.com wrote: I would like to create a python script that plays the Windows game minesweeper. The python code logic and running minesweeper are not problems. However, seeing the 1-8 in the minesweeper map and clicking on squares is. I have no idea how to proceed. You can interact with a Windows application using pywinauto (http://pywinauto.openqa.org/). Sounds like a fun little project - good luck! -- Paul -- http://mail.python.org/mailman/listinfo/python-list
[ANN] pyparsing 1.5.3 released
I'm happy to announce that a new release of pyparsing is now available, version 1.5.3. It has been almost a year and a half since 1.5.2 was released, but pyparsing has remained pretty stable. I believe I have cleaned up the botch-job I made in version 1.5.2 of trying to support both Python 2.x and Python 3.x. This new release will handle it by:
- providing version-specific binary installers for Windows users
- using version-adaptive code in the source distribution to use the correct version of pyparsing.py for the current Python distribution
This release also includes a number of small bug-fixes, plus some very interesting new examples. Here is the high-level summary of what's new in pyparsing 1.5.3:
- === NOTE: API CHANGE!!! === With this release, and henceforward, the pyparsing module is imported as pyparsing on both Python 2.x and Python 3.x versions.
- Fixed up setup.py to auto-detect Python version and install the correct version of pyparsing - suggested by Alex Martelli, thanks, Alex! (and my apologies to all those who struggled with those spurious installation errors caused by my earlier fumblings!)
- Fixed bug on Python3 when using parseFile, getting bytes instead of a str from the input file.
- Fixed subtle bug in originalTextFor, if followed by significant whitespace (like a newline) - discovered by Francis Vidal, thanks!
- Fixed very sneaky bug in Each, in which Optional elements were not completely recognized as optional - found by Tal Weiss, thanks for your patience.
- Fixed off-by-1 bug in line() method when the first line of the input text was an empty line. Thanks to John Krukoff for submitting a patch!
- Fixed bug in transformString if grammar contains Group expressions, thanks to patch submitted by barnabas79, nice work!
- Fixed bug in originalTextFor in which trailing comments or otherwise ignored text got slurped in with the matched expression. 
Thanks to michael_ramirez44 on the pyparsing wiki for reporting this just in time to get into this release!
- Added better support for summing ParseResults, see the new example, parseResultsSumExample.py.
- Added support for composing a Regex using a compiled RE object; thanks to my new colleague, Mike Thornton!
- In version 1.5.2, I changed the way exceptions are raised in order to simplify the stacktraces reported during parsing. An anonymous user posted a bug report on SF that this behavior makes it difficult to debug some complex parsers, or parsers nested within parsers. In this release I've added a class attribute ParserElement.verbose_stacktrace, with a default value of False. If you set this to True, pyparsing will report stacktraces using the pre-1.5.2 behavior.
- Some interesting new examples, including a number of parsers related to parsing C source code:
. pymicko.py, a MicroC compiler submitted by Zarko Zivanov. (Note: this example is separately licensed under the GPLv3, and requires Python 2.6 or higher.) Thank you, Zarko!
. oc.py, a subset C parser, using the BNF from the 1996 Obfuscated C Contest.
. select_parser.py, a parser for reading SQLite SELECT statements, as specified at http://www.sqlite.org/lang_select.html; this goes into much more detail than the simple SQL parser included in pyparsing's source code
. stateMachine2.py, a modified version of stateMachine.py submitted by Matt Anderson, that is compatible with Python versions 2.7 and above - thanks so much, Matt!
. excelExpr.py, a *simplistic* first-cut at a parser for Excel expressions, which I originally posted on comp.lang.python in January, 2010; beware, this parser omits many common Excel cases (addition of numbers represented as strings, references to named ranges)
. cpp_enum_parser.py, a nice little parser posted by Mark Tolonen on comp.lang.python in August, 2009 (redistributed here with Mark's permission). Thanks a bunch, Mark!
. 
partial_gene_match.py, a sample I posted to Stackoverflow.com, implementing a special variation on Literal that does close matching, up to a given number of allowed mismatches. The application was to find matching gene sequences, with allowance for one or two mismatches. . tagCapture.py, a sample showing how to use a Forward placeholder to enforce matching of text parsed in a previous expression. . matchPreviousDemo.py, simple demo showing how the matchPreviousLiteral helper method is used to match a previously parsed token. Download pyparsing 1.5.3 at http://sourceforge.net/projects/pyparsing/. You can also access pyparsing's epydoc documentation online at http://packages.python.org/pyparsing/. The pyparsing Wiki is at http://pyparsing.wikispaces.com. -- Paul Pyparsing is a pure-Python class library for quickly developing recursive-descent parsers. Parser grammars are assembled directly in the
Need some Python 3 help
I was teetering on the brink of releasing Pyparsing 1.5.3 (with some nice new examples and goodies), when I saw that I had recently introduced a bug in the Python 3 compatible version. Here is the stacktrace as reported on SF:

Traceback (most recent call last):
  File "testcase.py", line 11, in <module>
    result = exp.parseFile("./pyparsing_py3.py")
  File "/data/projekte/parsing/pyparsing/pyparsing_py3.py", line 1426, in parseFile
    return self.parseString(file_contents, parseAll)
  File "/data/projekte/parsing/pyparsing/pyparsing_py3.py", line 1068, in parseString
    loc, tokens = self._parse( instring, 0 )
  File "/data/projekte/parsing/pyparsing/pyparsing_py3.py", line 935, in _parseNoCache
    preloc = self.preParse( instring, loc )
  File "/data/projekte/parsing/pyparsing/pyparsing_py3.py", line 893, in preParse
    while loc < instrlen and instring[loc] in wt:
TypeError: 'in <string>' requires string as left operand, not int

In this section of code, instring is a string, loc is an int, and wt is a string. Any clues why instring[loc] would be evaluating as int? (I am unfortunately dependent on the kindness of strangers when it comes to testing my Python 3 code, as I don't have a Py3 environment installed.) Thanks, -- Paul -- http://mail.python.org/mailman/listinfo/python-list
Re: Need some Python 3 help
On May 25, 8:58 pm, Benjamin Peterson benja...@python.org wrote: Paul McGuire ptmcg at austin.rr.com writes: In this section of code, instring is a string, loc is an int, and wt is a string. Any clues why instring[loc] would be evaluating as int? (I am unfortunately dependent on the kindness of strangers when it comes to testing my Python 3 code, as I don't have a Py3 environment installed.) Indexing bytes in Python 3 gives an integer. Hrmm, I had a sneaking hunch this might be the issue. But then I don't know how this code *ever* worked in Python 3, as it is chock full of indexed references into the string being parsed. And yet, I've had other folks test and confirm that pyparsing_py3 *does* work on Python 3. It is a puzzle. -- Paul -- http://mail.python.org/mailman/listinfo/python-list
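[Editorial example - a short Python 3 demonstration of the behavior Benjamin describes (the variable names echo the ones in the traceback above, but the values are made up):]

```python
instring = b"abc"          # a file opened in binary mode yields bytes, not str
wt = " \t"

ch = instring[0]
print(type(ch))            # <class 'int'> - indexing bytes gives an integer
print(instring[0:1])       # b'a' - slicing, by contrast, preserves the bytes type

try:
    ch in wt               # int-in-str is exactly the TypeError from the traceback
except TypeError as ex:
    print(ex)

# decoding to str restores character-at-a-time behavior
s = instring.decode("ascii")
print(s[0] in wt)          # False - but no TypeError
```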
condition and True or False
While sifting through some code looking for old "x and y or z" code that might better be coded using "y if x else z", I came across this puzzler:

    x = <boolean expression> and True or False

What is "and True or False" adding to this picture? The boolean expression part is already evaluating to a boolean, so I don't understand why a code author would feel compelled to beat this one over the head with the additional "and True or False". I did a little code Googling and found a few other Python instances of this, but also many Lua instances. I'm not that familiar with Lua; is this a practice that one who uses Lua frequently might carry over to Python, not realizing that the added "and True or False" is redundant? Other theories? -- Paul -- http://mail.python.org/mailman/listinfo/python-list
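[Editorial example - a quick illustration of the point above: the "and True or False" tail is a no-op for a genuine boolean, and only does anything when the expression is merely truthy/falsy, where it acts like bool():]

```python
a, b = 3, 5

# For an expression that already yields a bool, "and True or False" changes nothing:
x = (a < b) and True or False
print(x, x == (a < b))        # True True

# For a merely truthy/falsy value, the chain coerces to an actual bool:
name = ""
y = name and True or False    # "" is falsy, so ("" and True) is "", then ("" or False) is False
print(y, y == bool(name))     # False True
```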
Re: Usable street address parser in Python?
On Apr 17, 2:23 pm, John Nagle na...@animats.com wrote: Is there a usable street address parser available? There are some bad ones out there, but nothing good that I've found other than commercial products with large databases. I don't need 100% accuracy, but I'd like to be able to extract street name and street number for at least 98% of US mailing addresses. There's pyparsing, of course. There's a street address parser as an example at http://pyparsing.wikispaces.com/file/view/streetAddressParser.py. It's not very good. It gets all of the following wrong:
1500 Deer Creek Lane (Parses Creek as a street type)
186 Avenue A (NYC street)
2081 N Webb Rd (Parses N Webb as a street name)
2081 N. Webb Rd (Parses N as street name)
1515 West 22nd Street (Parses West as name)
2029 Stierlin Court (Street names starting with St misparse.)
Some special cases that don't work, unsurprisingly:
P.O. Box 33170
The Landmark @ One Market, Suite 200
One Market, Suite 200
One Market

Please take a look at the updated form of this parser. It turns out there actually *were* some bugs in the old form, plus there was no provision for PO Boxes, avenues that start with Avenue instead of ending with them, or house numbers spelled out as words. The only one I consider a special case is the support for Avenue X instead of X Avenue - support for the rest was added in a fairly general way. With these bug fixes, I hope this improves your hit rate. (There are also some simple attempts at adding apt/suite numbers, and APO and AFP in addition to PO boxes - if not exactly what you need, the means to extend to support other options should be pretty straightforward.) -- Paul -- http://mail.python.org/mailman/listinfo/python-list
Re: Confused by slash/escape in regexp
On Apr 11, 5:43 pm, andrew cooke and...@acooke.org wrote: Is the third case here surprising to anyone else? It doesn't make sense to me...

Python 2.6.2 (r262:71600, Oct 24 2009, 03:15:21)
[GCC 4.4.1 [gcc-4_4-branch revision 150839]] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from re import compile
>>> p1 = compile('a\x62c')
>>> p1.match('abc')
<_sre.SRE_Match object at 0x7f4e8f93d578>
>>> p2 = compile('a\\x62c')
>>> p2.match('abc')
<_sre.SRE_Match object at 0x7f4e8f93d920>
>>> p3 = compile('a\\\x62c')
>>> p3.match('a\\bc')
>>> p3.match('abc')
>>> p3.match('a\\\x62c')

Curious/confused, Andrew

Here is your same session, but using raw string literals:

Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from re import compile
>>> p1 = compile(r'a\x62c')
>>> p1.match(r'abc')
<_sre.SRE_Match object at 0x00A04BB8>
>>> p2 = compile(r'a\\x62c')
>>> p2.match(r'abc')
>>> p3 = compile(r'a\\\x62c')
>>> p3.match(r'a\\bc')
>>> p3.match(r'abc')
>>> p3.match(r'a\\\x62c')

So I would say the surprise isn't that case 3 didn't match, but that case 2 matched. Unless I just don't get what you were testing, not being an RE wiz. -- Paul -- http://mail.python.org/mailman/listinfo/python-list
Re: Tough sorting problem: or, I'm confusing myself
On Apr 9, 10:03 am, david jensen dmj@gmail.com wrote: Hi all, I'm trying to find a good way of doing the following: Each n-tuple in combinations( range( 2 ** m ), n ) has a corresponding value n-tuple (call them scores for clarity later). I'm currently storing them in a dictionary, by doing:

res = {}
for i in itertools.combinations(range(2**m), n):
    res[i] = getValues(i)   # getValues() is computationally expensive

For each (n-1)-tuple, I need to find the two numbers that have the highest scores versus them. I know this isn't crystal clear, but hopefully an example will help: with m=n=3, looking at only the (1, 3) case, assuming:

getValues( (1, 2, 3) ) == ( -200, 125, 75 )   # this contains the highest other score, where 2 scores 125
getValues( (1, 3, 4) ) == ( 50, -50, 0 )
getValues( (1, 3, 5) ) == ( 25, 300, -325 )
getValues( (1, 3, 6) ) == ( -100, 0, 100 )    # this contains the second-highest, where 6 scores 100
getValues( (1, 3, 7) ) == ( 80, -90, 10 )
getValues( (1, 3, 8) ) == ( 10, -5, -5 )

I'd like to return ( (2, 125), (6, 100) ). The most obvious (to me) way to do this would be not to generate the res dictionary at the beginning, but just to go through each combinations( range( 2**m ), n-1 ) and try every possibility... this will test each combination n times, however, and generating those values is expensive. [e.g. (1,2,3)'s scores will be generated when finding the best possibilities for (1,2), (1,3) and (2,3)]

Add a memoizing decorator to getValues, so that repeated calls will do lookups instead of repeated calculations. -- Paul -- http://mail.python.org/mailman/listinfo/python-list
Re: Python and Regular Expressions
On Apr 10, 8:38 pm, Paul Rubin <no.em...@nospam.invalid> wrote:
> The impression that I have (from a distance) is that Pyparsing is a good
> interface abstraction with a kludgy and slow implementation. That the
> implementation uses regexps just goes to show how kludgy it is. One hopes
> that someday there will be a more serious implementation, perhaps using
> llvm-py (I wonder whatever happened to that project, by the way) so that
> your parser script will compile to executable machine code on the fly.

I am definitely flattered that pyparsing stirs up so much interest, and among such a distinguished group. But I have to take some umbrage at Paul Rubin's left-handed compliment, "Pyparsing is a good interface abstraction with a kludgy and slow implementation," especially since he forms his opinions from a distance. I actually *did* put some thought into what I wanted in pyparsing before designing it, and this forms a chapter of Getting Started with Pyparsing (available here as a free online excerpt: http://my.safaribooksonline.com/9780596514235/what_makes_pyparsing_so_special#X2ludGVybmFsX0ZsYXNoUmVhZGVyP3htbGlkPTk3ODA1OTY1MTQyMzUvMTYmaW1hZ2VwYWdlPTE2), the "Zen of Pyparsing" as it were. My goals were:

- build parsers using explicit constructs (such as words, groups, repetition, alternatives), vs. expression encoding using specialized character sequences, as found in regexen
- easy parser construction from primitive elements to complex groups and alternatives, using Python's operator overloading for ease of direct implementation of parsers using ordinary Python syntax; include mechanisms for defining recursive parser expressions
- implicit skipping of whitespace between parser elements
- results returned not just as a list of strings, but as a rich data object, with access to parsed fields by name or by list index, taking interfaces from both dicts and lists for natural adoption into common Python idioms
- no separate code-generation steps, a la lex/yacc
- support for parse-time callbacks, for specialized token handling, conversion, and/or construction of data structures
- 100% pure Python, to be runnable on any platform that supports Python
- liberal licensing, to permit easy adoption into any user's projects anywhere

So raw performance really didn't even make my short-list, beyond the obvious "should be tolerably fast enough." I have found myself reading posts on c.l.py with wording like "I'm trying to parse blah-blah and I've been trying for hours/days to get this regex working." For kicks, I'd spend 5-15 minutes working up a working pyparsing solution, which *does* run comparatively slowly, perhaps taking a few minutes to process the poster's data file. But the net solution is developed and running in under half an hour, which to me seems like an overall gain compared to hours of fruitless struggling with backslashes and regex character sequences. On top of which, the pyparsing solutions are still readable when I come back to them weeks or months later, instead of staring at some line-noise regex and scratching my head wondering what it was for. And sometimes "comparatively slowly" means that it runs 50x slower than a compiled method that runs in 0.02 seconds - that's still getting the job done in just 1 second.
And is the internal use of regexes within pyparsing really a kludge? Why? They are almost completely hidden from the parser developer. And yet by using compiled regexes, I retain the portability of 100% Python while leveraging the compiled speed of the re engine.

It does seem that there have been many posts of late (either on c.l.py or in related posts on Stackoverflow) where the OP is trying either to scrape content from HTML, or to parse some type of recursive expression. HTML scrapers implemented using re's are terribly fragile, since HTML in the wild often contains little surprises (unexpected whitespace; upper/lower case inconsistencies; tag attributes in unpredictable order; attribute values with double, single, or no quotation marks) which completely frustrate any re-based approach. Granted, there are times when an re-parsing-of-HTML endeavor *isn't* futile or doomed from the start - the OP may be working with a very restricted set of HTML, generated from some other script so that the output is very consistent. Unfortunately, this poster usually gets thrown under the same "you'll never be able to parse HTML with re's" bus. I can't explain the surge in these posts, other than to wonder if we aren't just seeing a skewed sample - that is, the many cases where people *are* successfully using re's to solve their text extraction problems aren't getting posted to c.l.py, since no one posts questions they already have the answers to.

So don't be too dismissive of pyparsing, Mr. Rubin. I've gotten many e-mails, wiki, and forum posts from Python users at all levels of the expertise scale, saying that pyparsing has helped them to be very productive in one or another aspect of creating a
Re: Dynamically growing an array to implement a stack
On Apr 8, 3:21 pm, M. Hamed <mhels...@hotmail.com> wrote:
> I have trouble with some Python concept. The fact that you can not
> assign to a non-existent index in an array. For example:
>     a = [0,1]
>     a[2] =      (generates an error)
> I can use a.append(2) but that always appends to the end. Sometimes I
> want to use this array as a stack, and hence my indexing logic would be
> something like: if you are already at the end (based on your stack
> pointer): use append() then index (and inc your pointer); if not: index
> directly (and inc your stack pointer) ???

The stack pointer is *always* at the end - except don't actually keep a real stack pointer, let list do it for you. Call append to push a value onto the end, and pop to pull it off. Or if you are really stuck on push/pop commands for a stack, do this:

>>> class stack(list):
...     push = list.append
...
>>> ss = stack()
>>> ss.push('x')
>>> ss.push('Y')
>>> ss
['x', 'Y']
>>> ss.pop()
'Y'
>>> ss.pop()
'x'
>>> ss.pop()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: pop from empty list

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list
Re: Recommend Commercial graphing library
On Apr 6, 11:05 am, AlienBaby <matt.j.war...@gmail.com> wrote:
> The requirement for a commercial license comes down to being restricted
> to not using any open source code. If it's an open source license it
> can't be used in our context.

You may be misunderstanding this issue. I think you are equating open source with the GPL, which is the open source license that requires applications that use GPL'd code to also open their source. There are many other open source licenses, such as Berkeley (BSD), MIT, and LGPL, that are more permissive in what they allow, up to and in some cases including full inclusion within a closed-source commercial product. You might also contact the supplier of the open source code you are interested in, and perhaps pay a modest fee to obtain a commercial license.

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list
Re: s-expression parser in python
On Apr 6, 7:02 pm, James Stroud <nospamjstroudmap...@mbi.ucla.edu> wrote:
> Hello All, I want to use an s-expression based configuration file format
> for a python program I'm writing. Does anyone have a favorite parser?

The pyparsing wiki includes this parser on its Examples page: http://pyparsing.wikispaces.com/file/view/sexpParser.py. This parser is also described in more detail in the pyparsing e-book from O'Reilly. This parser is based on the BNF defined here: http://people.csail.mit.edu/rivest/Sexp.txt. I should think Ron Rivest would be the final authority on S-expression syntax, but this BNF omits '!', '<', and '>' as valid punctuation characters, and does not support free-standing floats and ints as tokens. Still, you can extend the pyparsing parser (such is the goal of pyparsing, to make these kinds of extensions easy, as the source material or BNF or requirements change out from underneath you) by inserting these changes:

    real = Regex(r"[+-]?\d+\.\d*([eE][+-]?\d+)?").setParseAction(
        lambda tokens: float(tokens[0]))
    token = Word(alphanums + "-./_:*+=!<>")
    simpleString = real | decimal | raw | token | base64_ | hexadecimal | qString

And voila! Your test string parses as:

    [['and', ['or', ['>', 'uid', 1000], ['!=', 'gid', 20]], ['>', 'quota', 5000.0]]]

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list
Re: vars().has_key() question about how working .
On Apr 4, 3:42 am, catalinf...@gmail.com wrote:
> Hi everyone. My question is: why is vars().has_key('b') False? I was
> expecting to see True because it is a variable ... Thanks

Yes, 'b' is a var, but only within the scope of something(). See how this is different:

>>> def sth():
...     b = 25
...     print 'b' in vars()
...
>>> sth()
True

(Also, has_key() is the old-style way to test for key existence in a dict, and is kept around for compatibility with old code, but the preferred method now is to use 'in'.)

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list
Re: C-style static variables in Python?
On Apr 1, 5:34 pm, kj <no.em...@please.post> wrote:
> When coding C I have often found static local variables useful for
> doing once-only run-time initializations. For example:

Here is a decorator to make a function self-aware, giving it a "this" variable that points to itself, which you could then initialize from outside with static flags or values:

    from functools import wraps

    def self_aware(fn):
        @wraps(fn)
        def fn_(*args):
            return fn(*args)
        fn_.__globals__["this"] = fn_
        return fn_

    @self_aware
    def foo():
        this.counter += 1
        print this.counter

    foo.counter = 0

    foo()
    foo()
    foo()

Prints:

    1
    2
    3

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list
Re: Generating text from a regular expression
On Mar 31, 5:49 am, Nathan Harmston iwanttobeabad...@googlemail.com wrote: Hi everyone, I have a slightly complicated/medium sized regular expression and I want to generate all possible words that it can match (to compare performance of regex against an acora based matcher). The pyparsing wiki Examples page includes this regex inverter: http://pyparsing.wikispaces.com/file/view/invRegex.py From the module header: # Supports: # - {n} and {m,n} repetition, but not unbounded + or * repetition # - ? optional elements # - [] character ranges # - () grouping # - | alternation -- Paul -- http://mail.python.org/mailman/listinfo/python-list
Re: file seek is slow
This is a pretty tight loop:

    for i in xrange(100):
        f1.seek(0)

But there is still a lot going on, some of which you can lift out of the loop. The easiest I can think of is the lookup of the 'seek' attribute on the f1 object. Try this:

    f1_seek = f1.seek
    for i in xrange(100):
        f1_seek(0)

How does that help your timing?

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list
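The effect of hoisting the attribute lookup can be checked with `timeit` (here `io.BytesIO` stands in for the OP's real file object, so the snippet is self-contained):

```python
import timeit

setup = "import io\nf1 = io.BytesIO(b'some data')"

# method lookup inside the loop vs. a pre-bound method
plain = timeit.timeit("f1.seek(0)", setup=setup, number=100000)
hoisted = timeit.timeit("f1_seek(0)", setup=setup + "\nf1_seek = f1.seek",
                        number=100000)
print(plain, hoisted)
```

The hoisted version typically shaves a small constant off each iteration, though for real files the dominant cost is usually the seek itself.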
Re: Problem with regular expression
On Mar 7, 4:32 am, Joan Miller <pelok...@gmail.com> wrote:
> I would like to convert the first string to upper case, but this
> regular expression is not matching the first string between quotes.

Is using pyparsing overkill? Probably. But your time is valuable, and pyparsing lets you knock this out in less time than it probably took to write your original post.

Use pyparsing's pre-defined expression sglQuotedString to match your entry key in quotes:

    key = sglQuotedString

Add a parse action to convert to uppercase:

    key.setParseAction(lambda tokens: tokens[0].upper())

Now define the rest of your entry value (be sure to add the negative lookahead so we *don't* match your 'foo' entry):

    entry = key + ":" + ~Literal("{")

If I put your original test cases into a single string named 'data', I can now use transformString to convert all of your keys that don't point to '{'ed values:

    print entry.transformString(data)

Giving me:

    # string to non-matching
    'foo': {

    # strings to matching
    'BAR': 'bar2'
    'BAR': None
    'BAR': 0
    'BAR': True

Here's the whole script:

    from pyparsing import sglQuotedString, Literal

    key = sglQuotedString
    key.setParseAction(lambda tokens: tokens[0].upper())
    entry = key + ":" + ~Literal("{")
    print entry.transformString(data)

And I'll bet that if you come back to this code in 3 months, you'll still be able to figure out what it does!

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list
Re: How to efficiently extract information from structured text file
On Feb 17, 7:38 pm, Steven D'Aprano ste...@remove.this.cybersource.com.au wrote: On Wed, 17 Feb 2010 17:13:23 -0800, Jonathan Gardner wrote: And once you realize that every program is really a compiler, then you have truly mastered the Zen of Programming in Any Programming Language That Will Ever Exist. In the same way that every tool is really a screwdriver. -- Steven The way I learned this was: - Use the right tool for the right job. - Every tool is a hammer. -- Paul -- http://mail.python.org/mailman/listinfo/python-list
Re: How to efficiently extract information from structured text file
On Feb 16, 5:48 pm, Imaginationworks <xiaju...@gmail.com> wrote:
> Hi, I am trying to read object information from a text file (approx.
> 30,000 lines) with the following format, each line corresponds to a
> line in the text file. Currently, the whole file was read into a string
> list using readlines(), then a for loop searches for the "= {" and "};"
> to determine the Object, SubObject, and SubSubObject.

If you open(filename).read() this file into a variable named data, the following pyparsing parser will pick out your nested brace expressions:

    from pyparsing import *

    EQ,LBRACE,RBRACE,SEMI = map(Suppress, "={};")
    ident = Word(alphas, alphanums)

    contents = Forward()
    defn = Group(ident + EQ + Group(LBRACE + contents + RBRACE + SEMI))
    contents << ZeroOrMore(defn | ~(LBRACE|RBRACE) + Word(printables))

    results = defn.parseString(data)
    print results

Prints:

    [['Object1', ['...',
        ['SubObject1', ['', ['SubSubObject1', ['...']]]],
        ['SubObject2', ['', ['SubSubObject21', ['...']]]],
        ['SubObjectN', ['', ['SubSubObjectN', ['...']]]]]]]

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list
Re: pyparsing wrong output
On Feb 12, 6:41 pm, Gabriel Genellina <gagsl-...@yahoo.com.ar> wrote:
> En Fri, 12 Feb 2010 10:41:40 -0300, Eknath Venkataramani
> <eknath.i...@gmail.com> escribió:
>> I am trying to write a parser in pyparsing. Help Me.
>> http://paste.pocoo.org/show/177078/ is the code and this is the input
>> file: http://paste.pocoo.org/show/177076/. I get output as:
>> <generator object at 0xb723b80c>
>
> There is nothing wrong with pyparsing here. scanString() returns a
> generator, like this:
>
> py> g = (x for x in range(20) if x % 3 == 1)
> py> g
> <generator object <genexpr> at 0x00E50D78>
>
> Unfortunately, your grammar doesn't match the input text, so your
> generator doesn't return anything.

I think you are taking a sort of brute-force approach to this problem, and you need to think a little more abstractly. You can't just pick a fragment and then write an expression for it, and then the next, and then stitch them together - well, you *can*, but it helps to think both abstract and concrete at the same time.

With the exception of your one key of \', this is a pretty basic recursive grammar. Recursive grammars are a little complicated to start with, so I'll start with a non-recursive part, and I'll work more bottom-up or inside-out. Let's start by looking at these items:

    count = 8,
    baajaar = 0.87628353,
    kiraae = 0.02341598,
    lii = 0.02178813,
    adr = 0.01978462,
    gyiimn = 0.01765590,

Each item has a name (which you called eng, so I'll keep that expression), a '=', and *something*. In the end, we won't really care about the '=' strings; they aren't really part of the keys or the associated values, they are just delimiting strings - important during parsing, but afterwards we don't really care about them. So we'll start with a pyparsing expression for this:

    keyval = eng + Suppress('=') + something

Sometimes the something is an integer, sometimes it's a floating point number.
I'll define some more generic forms for these than your original number, and a separate expression for a real number:

    integer = Combine(Optional('-') + Word(nums))
    realnum = Combine(Optional('-') + Word(nums) + '.' + Word(nums))

When we parse for these two, we need to be careful to check for a realnum before an integer, so that we don't accidentally parse the leading '3' of 3.1415 as the integer 3:

    something = realnum | integer

So now we can parse this fragment using a delimitedList expression (which takes care of the intervening commas, and also suppresses them from the results):

    filedata = """count = 8, baajaar = 0.87628353, kiraae = 0.02341598,
        lii = 0.02178813, adr = 0.01978462, gyiimn = 0.01765590"""
    print delimitedList(keyval).parseString(filedata)

Gives:

    ['count', '8', 'baajaar', '0.87628353', 'kiraae', '0.02341598',
     'lii', '0.02178813', 'adr', '0.01978462', 'gyiimn', '0.01765590']

Right off the bat, we see that we want a little more structure to these results, so that the keys and values are grouped naturally by the parser. The easy way to do this is with Group, as in:

    keyval = Group(eng + Suppress('=') + something)

With this one change, we now get:

    [['count', '8'], ['baajaar', '0.87628353'], ['kiraae', '0.02341598'],
     ['lii', '0.02178813'], ['adr', '0.01978462'], ['gyiimn', '0.01765590']]

Now we need to add the recursive part of your grammar.
A nested input looks like:

    confident = {
        count = 4,
        trans = {
            ashhvsht = 0.75100505,
            phraarmnbh = 0.08341708,
        },
    },

So in addition to integers and reals, our something could also be a nested list of keyvals:

    something = realnum | integer | (lparen + delimitedList(keyval) + rparen)

This is *almost* right, with just a couple of tweaks:
- the list of keyvals may have a comma after the last item before the closing '}'
- we really want to suppress the opening and closing braces (lparen and rparen)
- for similar structure reasons, we'll enclose the list of keyvals in a Group to retain the data hierarchy

    lparen, rparen = map(Suppress, "{}")
    something = realnum | integer | Group(lparen + delimitedList(keyval) +
                                          Optional(',') + rparen)

The recursive problem is that we have defined keyval using something, and something using keyval. You can't do that in Python. So we use the pyparsing class Forward to forward-declare something:

    something = Forward()
    keyval = Group(eng + Suppress('=') + something)

To define the Forward-declared something, we use the '<<' shift operator:

    something << (realnum | integer | Group(lparen + delimitedList(keyval) +
                                            Optional(',') + rparen))

Our grammar now looks like:

    lparen, rparen = map(Suppress, "{}")
    something = Forward()
    keyval = Group(eng + Suppress('=') + something)
    something << (realnum | integer | Group(lparen + delimitedList(keyval) +
                                            Optional(',') + rparen))

To parse your entire input file, use a delimitedList(keyval) results =
Re: How to measure elapsed time under Windows?
On Feb 10, 2:24 am, Dennis Lee Bieber wlfr...@ix.netcom.com wrote: On Tue, 9 Feb 2010 21:45:38 + (UTC), Grant Edwards inva...@invalid.invalid declaimed the following in gmane.comp.python.general: Doesn't work. datetime.datetime.now has granularity of 15-16ms. Intervals much less that that often come back with a delta of 0. A delay of 20ms produces a delta of either 15-16ms or 31-32ms WinXP uses an ~15ms time quantum for task switching. Which defines the step rate of the wall clock output... http://www.eggheadcafe.com/software/aspnet/35546579/the-quantum-was-n...http://www.eggheadcafe.com/software/aspnet/32823760/how-do-you-set-ti... http://www.lochan.org/2005/keith-cl/useful/win32time.html -- Wulfraed Dennis Lee Bieber KD6MOG wlfr...@ix.netcom.com HTTP://wlfraed.home.netcom.com/ Gabriel Genellina reports that time.clock() uses Windows' QueryPerformanceCounter() API, which has much higher resolution than the task switcher's 15ms. QueryPerformanceCounter's resolution is hardware-dependent; using the Win API, and a little test program, I get this value on my machine: Frequency is 3579545 ticks/sec Resolution is 0.279365114840015 microsecond/tick -- Paul -- http://mail.python.org/mailman/listinfo/python-list
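On current Python versions the same high-resolution counter is exposed portably as `time.perf_counter()`; a quick sketch (not from the original thread) to observe its step size, in the same spirit as the QueryPerformanceCounter measurement above:

```python
import time

# gather distinct counter readings, then look at the gaps between them;
# the smallest gap bounds the observable resolution from above
ticks = sorted({time.perf_counter() for _ in range(1000)})
deltas = [b - a for a, b in zip(ticks, ticks[1:])]
print("smallest observed step: %.9f sec" % min(deltas))
```

On most hardware this prints a value far below the ~15 ms task-switching quantum discussed above.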
Re: How to measure elapsed time under Windows?
On Feb 9, 10:10 am, Grant Edwards <inva...@invalid.invalid> wrote:
> Is there another way to measure small periods of elapsed time (say in
> the 1-10ms range)?

I made repeated calls to time.clock() in a generator expression, which is as fast a loop as I can think of in Python. Then I computed the successive time deltas to see if any granularities jumped out. Here are the results:

    import time
    from itertools import groupby

    # get about 1000 different values of time.clock()
    ts = set(time.clock() for i in range(1000))

    # sort in ascending order
    ts = sorted(ts)

    # compute diffs between adjacent time values
    diffs = [j-i for i,j in zip(ts[:-1], ts[1:])]

    # sort and group
    diffs.sort()
    diffgroups = groupby(diffs)

    # print the distribution of time differences in microseconds
    for i in diffgroups:
        print "%3d %12.6f" % (len(list(i[1])), i[0]*1e6)

    ...
     25     2.234921
     28     2.234921
    242     2.514286
    506     2.514286
     45     2.793651
    116     2.793651
      1     3.073016
      8     3.073016
      6     3.352381
      4     3.631746
      3     3.92
      1     3.92
      5     4.190477
      2     4.469842
      1     6.146033
      1     8.660319
      1     9.79
      1    10.895239
      1    11.174605
      1    24.304765
      1    41.904767

There seems to be a step size of about .28 microseconds. So I would guess time.clock() has enough resolution. But also beware of the overhead of the calls to clock() - using timeit, I find that each call takes about 2 microseconds (consistent with the smallest time difference in the above data set).

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list
Re: How to print all expressions that match a regular expression
On Feb 6, 1:36 pm, hzh...@gmail.com wrote:
> Hi, I am a fresh man with python. I know there are regular expressions
> in Python. What I need is that, given a particular regular expression,
> output all the matches. For example, given "[1|2|3]{2}" as the regular
> expression, the program should output all 9 matches, i.e., 11 12 13 21
> 22 23 31 32 33. Is there any well-written routine in Python or
> third-party program to do this? If there isn't, could somebody make some
> suggestions on how to write it myself? Thanks. Zhuo

Please check out this example on the pyparsing wiki, invRegex.py: http://pyparsing.wikispaces.com/file/view/invRegex.py. This code implements a generator that returns successive matching strings for the given regex. Running it, I see that you actually have a typo in your example:

>>> print list(invert("[1|2|3]{2}"))
['11', '1|', '12', '13', '|1', '||', '|2', '|3', '21', '2|', '22', '23', '31', '3|', '32', '33']

I think you meant either "[123]{2}" or "(1|2|3){2}".

>>> print list(invert("[123]{2}"))
['11', '12', '13', '21', '22', '23', '31', '32', '33']

>>> print list(invert("(1|2|3){2}"))
['11', '12', '13', '21', '22', '23', '31', '32', '33']

Of course, as other posters have pointed out, this inverter does not accept regexen with unbounded repetition characters '+' or '*', but '?' and {min,max} notation will work. Even '.' is supported, although this can generate a large number of return values. Of course, you'll also have to install pyparsing to get this to work.

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list
Re: parsing an Excel formula with the re module
I never represented that this parser would handle any and all Excel formulas! But I should hope the basic structure of a pyparsing solution might help the OP add some of the other features you cited, if necessary. It's actually pretty common to take an incremental approach in making such a parser, and so here are some of the changes that you would need to make based on the deficiencies you pointed out:

Functions can have a variable number of arguments, of any kind of expression:

    statFunc = lambda name: CaselessKeyword(name) + LPAR + delimitedList(expr) + RPAR

Sheet name could also be a quoted string:

    sheetRef = Word(alphas, alphanums) | QuotedString("'", escQuote="''")

Add boolean literal support:

    boolLiteral = oneOf("TRUE FALSE")
    operand = numericLiteral | funcCall | boolLiteral | cellRange | cellRef

These small changes are enough to extend the parser to successfully handle the test2a, 2b, and 3a cases. (I'll add this to the pyparsing wiki examples, as it looks like it is a good start on a familiar but complex expression.)

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list
Re: parsing an Excel formula with the re module
On Jan 5, 1:49 pm, Tim Chase <python.l...@tim.thechases.com> wrote:
> vsoler wrote:
>> Hence, I need to parse Excel formulas. Can I do it by means only of re
>> (regular expressions)? I know that for simple formulas such as =3*A7+5
>> it is indeed possible. What about complex formulas that include
>> functions, sheet names and possibly other *.xls files?
>
> Where things start getting ugly is when you have nested function calls,
> such as
>     =if(Sum(A1:A25)>42,Min(B1:B25), if(Sum(C1:C25)>3.14, (Min(C1:C25)+3)*18,Max(B1:B25)))
> Regular expressions don't do well with nested parens (especially
> arbitrarily-nesting-depth such as are possible), so I'd suggest going
> for a full-blown parsing solution like pyparsing. If you have fair
> control over what can be contained in the formulas and you know they
> won't contain nested parens/functions, you might be able to formulate
> some sort of "kinda, sorta, maybe parses some forms of formulas" regexp.
> -tkc

This might give the OP a running start:

    from pyparsing import (CaselessKeyword, Suppress, Word, alphas,
        alphanums, nums, Optional, Group, oneOf, Forward, Regex,
        operatorPrecedence, opAssoc, dblQuotedString)

    test1 = "=3*A7+5"
    test2 = "=3*Sheet1!$A$7+5"
    test3 = ("=if(Sum(A1:A25)>42,Min(B1:B25),"
             "if(Sum(C1:C25)>3.14, (Min(C1:C25)+3)*18,Max(B1:B25)))")

    EQ,EXCL,LPAR,RPAR,COLON,COMMA,DOLLAR = map(Suppress, '=!():,$')

    sheetRef = Word(alphas, alphanums)
    colRef = Optional(DOLLAR) + Word(alphas, max=2)
    rowRef = Optional(DOLLAR) + Word(nums)
    cellRef = Group(Optional(sheetRef + EXCL)("sheet") + colRef("col") +
                    rowRef("row"))
    cellRange = (Group(cellRef("start") + COLON + cellRef("end"))("range")
                 | cellRef)

    expr = Forward()

    COMPARISON_OP = oneOf("< = > >= <= != <>")
    condExpr = expr + COMPARISON_OP + expr

    ifFunc = (CaselessKeyword("if") + LPAR +
              Group(condExpr)("condition") + COMMA +
              expr("if_true") + COMMA +
              expr("if_false") + RPAR)
    statFunc = lambda name: CaselessKeyword(name) + LPAR + cellRange + RPAR
    sumFunc = statFunc("sum")
    minFunc = statFunc("min")
    maxFunc = statFunc("max")
    aveFunc = statFunc("ave")
    funcCall = ifFunc | sumFunc | minFunc | maxFunc | aveFunc

    multOp = oneOf("* /")
    addOp = oneOf("+ -")
    numericLiteral = Regex(r"\-?\d+(\.\d+)?")
    operand = numericLiteral | funcCall | cellRange | cellRef
    arithExpr = operatorPrecedence(operand,
        [
        (multOp, 2, opAssoc.LEFT),
        (addOp, 2, opAssoc.LEFT),
        ])

    textOperand = dblQuotedString | cellRef
    textExpr = operatorPrecedence(textOperand,
        [
        ('&', 2, opAssoc.LEFT),
        ])
    expr << (arithExpr | textExpr)

    import pprint
    for test in (test1, test2, test3):
        print test
        pprint.pprint( (EQ + expr).parseString(test).asList() )
        print

Prints:

    =3*A7+5
    [[['3', '*', ['A', '7']], '+', '5']]

    =3*Sheet1!$A$7+5
    [[['3', '*', ['Sheet1', 'A', '7']], '+', '5']]

    =if(Sum(A1:A25)>42,Min(B1:B25), if(Sum(C1:C25)>3.14, (Min(C1:C25)+3)*18,Max(B1:B25)))
    ['if',
     ['sum', [['A', '1'], ['A', '25']], '>', '42'],
     'min', [['B', '1'], ['B', '25']],
     'if',
     ['sum', [['C', '1'], ['C', '25']], '>', '3.14'],
     [['min', [['C', '1'], ['C', '25']], '+', '3'], '*', '18'],
     'max', [['B', '1'], ['B', '25']]]

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list
Re: Regex help needed!
On Dec 21, 5:38 am, Oltmans <rolf.oltm...@gmail.com> wrote:
> Hello, everyone. I've a string that looks something like
>     lksjdfls <div id ='amazon_345343'> kdjff lsdfs </div> sdjfls
>     <div id = amazon_35343433>sdfsd</div><div id='amazon_8898'>welcome</div>
> From above string I need the digits within the ID attribute. For
> example, required output from above string is
> - 35343433
> - 345343
> - 8898
> I've written this regex that's kind of working:
>     re.findall("\w+\s*\W+amazon_(\d+)", str)

The issue with using regexen for parsing HTML is that you often get surprised by attributes that you never expected, or out of order, or with weird or missing quotation marks, or tags or attributes that are in upper/lower case. BeautifulSoup is one tool to use for HTML scraping; here is a pyparsing example, with hopefully descriptive comments:

    from pyparsing import makeHTMLTags, ParseException

    src = """lksjdfls <div id ='amazon_345343'> kdjff lsdfs </div> sdjfls
    <div id = amazon_35343433>sdfsd</div><div id='amazon_8898'>welcome</div>
    hello, my age is 86 years old and I was born in 1945. Do you know
    that PI is roughly 3.1443534534534534534"""

    # use makeHTMLTags to return an expression that will match
    # HTML <div> tags, including attributes, upper/lower case, etc.
    # (makeHTMLTags will return expressions for both
    # opening and closing tags, but we only care about the
    # opening one, so just use the [0]th returned item)
    div = makeHTMLTags("div")[0]

    # define a parse action to filter only for div tags
    # with the proper id form
    def filterByIdStartingWithAmazon(tokens):
        if not tokens.id.startswith("amazon_"):
            raise ParseException(
                "must have id attribute starting with 'amazon_'")

    # define a parse action that will add a pseudo-
    # attribute 'amazon_id', to make it easier to get the
    # numeric portion of the id after the leading 'amazon_'
    def makeAmazonIdAttribute(tokens):
        tokens["amazon_id"] = tokens.id[len("amazon_"):]

    # attach parse action callbacks to the div expression -
    # these will be called during parse time
    div.setParseAction(filterByIdStartingWithAmazon, makeAmazonIdAttribute)

    # search through the input string for matching divs,
    # and print out their amazon_id's
    for divtag in div.searchString(src):
        print divtag.amazon_id

Prints:

    345343
    35343433
    8898

--
http://mail.python.org/mailman/listinfo/python-list
Re: How to create a docstring for a module?
On Dec 6, 7:43 am, Steven D'Aprano <st...@remove-this-cybersource.com.au> wrote:
> On Sun, 06 Dec 2009 06:34:17 -0600, Tim Chase wrote:
>> I've occasionally wanted something like this, and have found that it
>> can be done by manually assigning to __doc__ (either at the
>> module-level or classes) which can make some documentation bits a
>> little easier:
>
> Unfortunately, and surprisingly, assigning to __doc__ doesn't work with
> new-style classes.
> -- Steven

Fortunately, in the OP's case, he isn't trying to do this with a class, but with a module. For me, assigning to __doc__ at the module level works in defining a docstring for pyparsing, at least for Py2.5.

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list
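A quick illustration that module-level `__doc__` assignment works, using a synthetic module object (the module name here is hypothetical); the same assignment works as a plain `__doc__ = ...` statement at the top of a .py file:

```python
import types

# a stand-in module created programmatically, so the example is
# self-contained; a real module's __doc__ can be assigned the same way
mod = types.ModuleType("mymodule")
mod.__doc__ = "mymodule - a generated docstring"
print(mod.__doc__)
```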
Re: trouble with regex?
On Oct 8, 11:42 am, MRAB <pyt...@mrabarnett.plus.com> wrote:
> inhahe wrote:
>> Can someone tell me why this doesn't work?
>>     colorre = re.compile(
>>         '('
>>         '^'
>>         '|'
>>         '(?:'
>>         '\x0b(?:10|11|12|13|14|15|0\\d|\\d)'
>>         '(?:'
>>         ',(?:10|11|12|13|14|15|0\\d|\\d)'
>>         ')?'
>>         ')'
>>         ')(.*?)')
>> I'm trying to extract mirc color codes.

You might find this site interesting for generating REs for numeric ranges: http://utilitymill.com/utility/Regex_For_Range

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list
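As an illustration of what such a generated range pattern looks like, here is a compact equivalent (a sketch, not the OP's exact regex) of the 0-15 alternation used for mIRC color codes:

```python
import re

# 0-15, optionally zero-padded; the two-digit branch comes first so
# that "15" is not consumed as just "1"
num = r"(?:1[0-5]|0?\d)"

# control char, foreground number, optional ",background" number
colorre = re.compile("\x0b(%s)(?:,(%s))?" % (num, num))

m = colorre.match("\x0b4,12colored text")
print(m.groups())  # ('4', '12')
```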
Re: bug with itertools.groupby?
On Oct 6, 6:06 pm, Kitlbast vlad.shevche...@gmail.com wrote: grouped acc: 61 grouped acc: 64 grouped acc: 61 am I doing something wrong? sort first, then groupby. -- http://mail.python.org/mailman/listinfo/python-list
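`groupby` only merges *adjacent* equal keys, which is why the unsorted input above yields the 61 group twice - a minimal illustration of the sort-first fix:

```python
from itertools import groupby

data = [61, 64, 61]

# ungrouped: three runs, because the 61s are not adjacent
runs = [(k, len(list(g))) for k, g in groupby(data)]
print(runs)     # [(61, 1), (64, 1), (61, 1)]

# sort first, then group: each key appears exactly once
grouped = [(k, len(list(g))) for k, g in groupby(sorted(data))]
print(grouped)  # [(61, 2), (64, 1)]
```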
Re: Regular expression to structure HTML
On Oct 2, 12:10 am, 504cr...@gmail.com wrote:
> I'm kind of new to regular expressions, and I've spent hours trying to
> finesse a regular expression to build a substitution. What I'd like to
> do is extract data elements from HTML and structure them so that they
> can more readily be imported into a database.

Oy! If I had a nickel for every misguided coder who tried to scrape HTML with regexes...

Some reasons why RE's are no good at parsing HTML:
- tags can be mixed case
- tags can have whitespace in many unexpected places
- tags with no body can combine opening and closing tag with a '/' before the closing '>', as in <BR/>
- tags can have attributes that you did not expect (like <BR CLEAR=ALL>)
- attributes can occur in any order within the tag
- attribute names can also be in unexpected upper/lower case
- attribute values can be enclosed in double quotes, single quotes, or even (surprise!) NO quotes

For HTML that is machine-generated, you *may* be able to make some page-specific assumptions. But if edited by human hands, or if you are trying to make a generic page scraper, RE's will never cut it.

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list
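A sketch of a more tolerant approach using the standard library's HTML parser (shown with the Python 3 module name `html.parser`; in the Python 2 of this thread it lived in the `HTMLParser` module), which absorbs the case, ordering, and quoting variations listed above:

```python
from html.parser import HTMLParser

class BreakFinder(HTMLParser):
    """Collect the attributes of every <br> tag, however it is written."""
    def __init__(self):
        super().__init__()
        self.breaks = []

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive already lowercased
        if tag == "br":
            self.breaks.append(dict(attrs))

    def handle_startendtag(self, tag, attrs):
        # treat combined <BR/> forms the same as <br>
        self.handle_starttag(tag, attrs)

p = BreakFinder()
# mixed case, combined open/close tag, unquoted attribute value
p.feed("<BR/> text <br Clear=ALL> more")
print(p.breaks)  # [{}, {'clear': 'ALL'}]
```

The parser, not a hand-rolled regex, deals with the whitespace, casing, and quoting corner cases.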
Re: Are min() and max() thread-safe?
On Sep 16, 11:33 pm, Steven D'Aprano ste...@remove.this.cybersource.com.au wrote: I have two threads, one running min() and the other running max() over the same list. I'm getting some mysterious results which I'm having trouble debugging. Are min() and max() thread-safe, or am I doing something fundamentally silly by having them walk over the same list simultaneously? If you are calculating both min and max of a sequence, here is an algorithm that can cut your comparisons by 25% - for objects with rich/time-consuming comparisons, that can really add up.

import sys
if sys.version[0] == "2":
    range = xrange

def minmax(seq):
    if not seq:
        return None, None
    min_ = seq[0]
    max_ = seq[0]
    seqlen = len(seq)
    start = seqlen % 2
    for i in range(start, seqlen, 2):
        a, b = seq[i], seq[i+1]
        if a > b:
            a, b = b, a
        if a < min_:
            min_ = a
        if b > max_:
            max_ = b
    return min_, max_

With this test code, I verified that the comparison count drops from 2*len to 1.5*len:

if __name__ == "__main__":
    import sys
    if sys.version[0] == "2":
        range = xrange
    import random

    def minmax_bf(seq):
        # brute force, just call min and max on sequence
        return min(seq), max(seq)

    testseq = [random.random() for i in range(100)]
    print minmax_bf(testseq)
    print minmax(testseq)

    class ComparisonCounter(object):
        tally = 0
        def __init__(self, obj):
            self.obj = obj
        def __cmp__(self, other):
            ComparisonCounter.tally += 1
            return cmp(self.obj, other.obj)
        def __getattr__(self, attr):
            return getattr(self.obj, attr)
        def __str__(self):
            return str(self.obj)
        def __repr__(self):
            return repr(self.obj)

    testseq = [ComparisonCounter(random.random()) for i in range(10001)]
    print minmax_bf(testseq)
    print ComparisonCounter.tally
    ComparisonCounter.tally = 0
    print minmax(testseq)
    print ComparisonCounter.tally

Plus, now that you are finding both min and max in a single pass through the sequence, you can wrap this in a lock to make sure of the atomicity of your results.
(Just for grins, I also tried sorting the list and returning elements 0 and -1 for min and max - I got numbers of comparisons in the range of 12X to 15X the length of the sequence.) -- Paul -- http://mail.python.org/mailman/listinfo/python-list
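[Editor's note: a Python 3 rendition of the pairwise idea above, checked against the builtins.]

```python
import random

def minmax(seq):
    # one comparison orders each pair, then each element is compared
    # against only one of min/max: ~1.5*len comparisons instead of 2*len
    if not seq:
        return None, None
    lo = hi = seq[0]
    for i in range(len(seq) % 2, len(seq) - 1, 2):
        a, b = seq[i], seq[i + 1]
        if a > b:
            a, b = b, a
        if a < lo:
            lo = a
        if b > hi:
            hi = b
    return lo, hi

data = [random.random() for _ in range(101)]
assert minmax(data) == (min(data), max(data))
```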
Re: Retrieve url's of all jpegs at a web page URL
On Sep 15, 11:32 pm, Stefan Behnel stefan...@behnel.de wrote: Also untested:

from lxml import html
doc = html.parse(page_url)
doc.make_links_absolute(page_url)
urls = [img.src for img in doc.xpath('//img')]

Then use e.g. urllib2 to save the images. Looks similar to what a pyparsing approach would look like:

from pyparsing import makeHTMLTags, htmlComment
import urllib

html = urllib.urlopen(url).read()
imgTag = makeHTMLTags("img")[0]
imgTag.ignore(htmlComment)
urls = [img.src for img in imgTag.searchString(html)]

-- Paul -- http://mail.python.org/mailman/listinfo/python-list
Re: Where regexs listed for Python language's tokenizer/lexer?
On Sep 12, 1:10 am, Chris Seberino cseber...@gmail.com wrote: Where are the regexs listed for the Python language's tokenizer/lexer? If I'm not mistaken, the grammar is not sufficient to specify the language - you also need to specify the regexs that define the tokens, right? ...where is that? I think the OP is asking for the regexs that define the terminals referenced in the Python grammar, similar to those found in yacc token definitions. He's not implying that there are regexs that implement the whole grammar. -- Paul -- http://mail.python.org/mailman/listinfo/python-list
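[Editor's note: the stdlib tokenize module implements those token definitions; its source (Lib/tokenize.py) contains the regex fragments themselves. A quick sketch of it in action:]

```python
import io
import tokenize

# tokenize the source of a one-line statement and name each token
src = "total = price * 1.08  # with tax"
tokens = [(tokenize.tok_name[tok.type], tok.string)
          for tok in tokenize.generate_tokens(io.StringIO(src).readline)]
print(tokens)
```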
Re: Something confusing about non-greedy reg exp match
On Sep 6, 11:23 pm, Ben Finney ben+pyt...@benfinney.id.au wrote: George Burdell gburde...@gmail.com writes: I want to find every occurrence of money, and for each occurrence, I want to scan back to the first occurrence of hello. How can this be done? By recognising the task: not expression matching, but lexing and parsing. For which you might find the ‘pyparsing’ library of use URL:http://pyparsing.wikispaces.com/. Even pyparsing has to go through some gyrations to do this sort of "match, then back up" parsing. Here is my solution:

>>> from pyparsing import SkipTo, originalTextFor
>>> expr = originalTextFor("hello" + SkipTo("money", failOn="hello", include=True))
>>> print expr.searchString('hello how are you hello funny money')
[['hello funny money']]

SkipTo is analogous to the OP's .*?, but the failOn attribute adds the logic "if this string is found before matching the target string, then fail". So pyparsing scans through the string, matches the first "hello", attempts to skip to the next occurrence of "money", but finds another "hello" first, so this parse fails. Then the scan continues until the next "hello" is found, and this time, SkipTo successfully finds "money" without first hitting a "hello". I then had to wrap the whole thing in a helper method originalTextFor, otherwise I get an ugly grouping of separate strings. So I still don't really have any kind of "back up after matching" parsing, I just turned this into a qualified forward match. One could do a similar thing with a parse action. If you could attach some kind of validating function to a field within a regex, you could have done the same thing there. -- Paul -- http://mail.python.org/mailman/listinfo/python-list
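[Editor's note: for comparison, a plain-regex analogue of failOn - a "tempered dot" that refuses to cross another "hello" on its way to "money".]

```python
import re

# (?:(?!hello).)*? consumes characters lazily, but only at positions
# where another "hello" does not begin
pat = re.compile(r"hello(?:(?!hello).)*?money")

print(pat.findall("hello how are you hello funny money"))
# ['hello funny money']
```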
Re: Smallest float different from 0.0?
On Sep 7, 9:47 am, kj no.em...@please.post wrote: Is there some standardized way (e.g. some official module of such limit constants) to get the smallest positive float that Python will regard as distinct from 0.0? TIA! kj You could find it for yourself:

>>> for i in range(400):
...     if 10**-i == 0:
...         print i
...         break
...
324

-- Paul -- http://mail.python.org/mailman/listinfo/python-list
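[Editor's note: since Python 2.6 there is also an official module of limit constants, which answers the "standardized way" part of the question directly.]

```python
import sys

# the documented limits, rather than probing with a loop
print(sys.float_info.min)   # smallest positive *normalized* float, ~2.2e-308

# denormals go smaller still: 5e-324 is the last value distinct from 0.0
assert 5e-324 > 0.0
assert 1e-324 == 0.0
```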
Re: Creating slice notation from string
On Sep 2, 4:55 pm, bvdp b...@mellowood.ca wrote: I'm trying to NOT create a parser to do this and I'm sure that it's easy if I could only see the light! Well, this is a nice puzzler, better than a sudoku. Maybe a quick parser with pyparsing will give you some guidance on how to do this without a parser library:

from pyparsing import *

# relevant punctuation, suppress after parsing
LBR,RBR,COLON = map(Suppress, "[]:")

# expression to parse numerics and convert to int's
integer = Regex(r"-?\d+").setParseAction(lambda t: int(t[0]))

# first try, almost good enough, but wrongly parses [2] -> [2::]
sliceExpr = (LBR + Optional(integer, default=None) +
             Optional(COLON + Optional(integer, default=None), default=None) +
             Optional(COLON + Optional(integer, default=None), default=None) +
             RBR)

# better, this version special-cases [n] -> [n:n+1]
# otherwise, just create slice from parsed int's
singleInteger = integer + ~FollowedBy(COLON)
singleInteger.setParseAction(lambda t: [t[0], t[0]+1])
sliceExpr = (LBR +
             (singleInteger |
              Optional(integer, default=None) +
              Optional(COLON + Optional(integer, default=None), default=None) +
              Optional(COLON + Optional(integer, default=None), default=None)) +
             RBR)

# attach parse action to convert parsed int's to a slice
sliceExpr.setParseAction(lambda t: slice(*t.asList()))

tests = """\
[2]
[2:3]
[2:]
[2::2]
[-1:-1:-1]
[:-1]
[::-1]
[:]""".splitlines()

testlist = range(10)
for t in tests:
    parsedSlice = sliceExpr.parseString(t)[0]
    print t, parsedSlice, testlist[parsedSlice]

Prints:

[2] slice(2, 3, None) [2]
[2:3] slice(2, 3, None) [2]
[2:] slice(2, None, None) [2, 3, 4, 5, 6, 7, 8, 9]
[2::2] slice(2, None, 2) [2, 4, 6, 8]
[-1:-1:-1] slice(-1, -1, -1) []
[:-1] slice(None, -1, None) [0, 1, 2, 3, 4, 5, 6, 7, 8]
[::-1] slice(None, None, -1) [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
[:] slice(None, None, None) [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Yes, it is necessary to handle the special case of a slice that is really just a single index.
If your list of parsed integers has only a single value n, then the slice constructor creates a slice (None,n,None). What you really want, if you want everything to create a slice, is to get slice(n,n+1,None). That is what the singleInteger special case does in the pyparsing parser. -- Paul -- http://mail.python.org/mailman/listinfo/python-list
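[Editor's note: for the "NOT create a parser" wish in the original question, here is a hedged sketch using only str.split, with the same single-index special case.]

```python
def parse_slice(text):
    # "[2]" -> slice(2, 3) to mimic single-index behavior; otherwise
    # split on ":" and pad with None up to the three slice fields
    body = text.strip()[1:-1]
    if ":" not in body:
        n = int(body)
        return slice(n, n + 1)
    parts = [int(p) if p else None for p in body.split(":")]
    parts += [None] * (3 - len(parts))
    return slice(*parts)

testlist = list(range(10))
print(testlist[parse_slice("[2::2]")])   # [2, 4, 6, 8]
print(testlist[parse_slice("[::-1]")])   # the list reversed
```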
Re: Is behavior of += intentional for int?
On Aug 30, 2:33 am, Derek Martin c...@pizzashack.org wrote: THAT is why Python's behavior with regard to numerical objects is not intuitive, and frankly bizzare to me, and I dare say to others who find it so. Yes, that's right. BIZZARE. Can't we all just get along? I think the question boils down to "where is the object?". In this statement: a = 3 which is the object, a or 3? There exist languages (such as C++) that allow you to override the '=' assignment as a class operator, so that I could create a class where I decided that assigning an integer value to it applies some application logic, probably the setting of some fundamental attribute. In that language, 'a' is the object, and 3 is a value being assigned to it. This can cause some consternation when a reader (or worse, maintainer) isn't familiar with my code, sees this simple assignment, and figures that they can use 'a' elsewhere as a simple integer, with some surprising or disturbing results. Python just doesn't work that way. Python binds values to names. Always. In Python, = is not and never could be a class operator. In any Python statement LHS = RHS, LHS is always a name, and it is being bound to some object found by evaluating the right hand side, RHS. The bit of confusion here is that the in-place operators like +=, -=, etc. are something of a misnomer - obviously a *name* can't be incremented or decremented (unlike a pointer in C or C++). One has to see that these are really shortcuts for LHS = LHS + RHS, and once again, our LHS is just a name getting bound to the result of LHS + RHS. Is this confusing, or non-intuitive? Maybe. Do you want to write code in Python? Get used to it. It is surprising how many times we think things are intuitive when we really mean they are familiar. For long-time C and Java developers, it is intuitive that variables are memory locations, and switching to Python's name model for them is non-intuitive.
As for your quibble that "3 is not an object", I'm afraid that may be your own personal set of blinders. Integer constants as objects is not unique to Python; you can see it in other languages - Smalltalk and Ruby are two that I know personally. Ruby implements a loop using this interesting notation:

3.times do
    ...do something...
end

Of course, it is a core idiom of the language, and if I adopted Ruby, I would adopt its idioms and object model. Is it any odder that 3 is an object than that the string literal "Hello, World!" is an object? Perhaps we are just not reminded of it so often, because Python's int class defines no methods that are not __ special methods (type dir(3) at the Python prompt). So we never see any Python code referencing a numeric literal and immediately calling a method on it, as in Ruby's simple loop construct. But we do see methods implemented on str like split(), and so "about above across after against".split() gives me a list of the English prepositions that begin with "a". We see this kind of thing often enough, we get accustomed to the objectness of string literals. It gets to be so familiar, it eventually seems intuitive. You yourself mentioned that intuition is subjective - unfortunately, the intuitiveness of a feature is often tied to its value as a coding concept, and so statements of non-intuitiveness can be interpreted as a slant against the virtue of that concept, or even against the language itself. Once we accept that 3 is an object, we clearly have to stipulate that there can be no changes allowed to it. 3 must *always* have the value of the integer between 2 and 4. So our language invokes the concept that some classes create instances that are immutable. For a Python long-timer like Mr. D'Aprano, I don't think he even consciously thinks about this kind of thing any more; his intuition has aligned with the Python stars, so he extrapolates from the OP's suggestion to the resulting aberrant behavior, as he posted it.
You can dispute and rail at this core language concept if you like, but I think the more entrenched you become in the position that '3 is an object' is bizarre, the less enjoyable your Python work will be. -- Paul -- http://mail.python.org/mailman/listinfo/python-list
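[Editor's note: the name-binding story above can be seen directly with id(), in a short sketch.]

```python
a = 1
id_before = id(a)
a += 1                     # rebinds the name to a *different* int object
assert id(a) != id_before

L = [1]
id_before = id(L)
L += [2]                   # list.__iadd__ mutates in place: same object
assert id(L) == id_before
print(L)  # [1, 2]
```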
Re: Is behavior of += intentional for int?
On Aug 30, 5:42 am, Paul McGuire pt...@austin.rr.com wrote: Python binds values to names. Always. In Python, = is not and never could be a class operator. In Python, any expression of LHS = RHS, LHS is always a name, and in this statement it is being bound to some object found by evaluating the right hand side, RHS. An interesting side note, and one that could be granted to the OP, is that Python *does* support the definition of class operator overrides for in-place assignment operators like += (by defining a method __iadd__). This is how numpy's values accomplish their mutability. It is surprising how many times we think things are intuitive when we really mean they are familiar. Of course, just as I was typing my response, Steve D'Aprano beat me to the punch. Maybe it's time we added a new acronym to this group's ongoing discussions: PDWTW, or Python doesn't work that way. -- Paul -- http://mail.python.org/mailman/listinfo/python-list
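[Editor's note: a hypothetical sketch of the __iadd__ override mentioned above - returning self is what keeps "+=" in-place for a mutable class.]

```python
class Accumulator:
    def __init__(self):
        self.total = 0

    def __iadd__(self, n):
        self.total += n
        return self        # returning self keeps the name bound to this object

acc = Accumulator()
marker = id(acc)
acc += 5
acc += 7
assert id(acc) == marker   # still the same object after two "+=" statements
print(acc.total)  # 12
```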
Re: Is behavior of += intentional for int?
On Aug 29, 7:45 am, zaur szp...@gmail.com wrote:

Python 2.6.2 (r262:71600, Apr 16 2009, 09:17:39) [GCC 4.0.1 (Apple Computer, Inc. build 5250)] on darwin
Type "copyright", "credits" or "license()" for more information.
>>> a=1
>>> x=[a]
>>> id(a)==id(x[0])
True
>>> a+=1
>>> a
2
>>> x[0]
1

I thought that += should only change the value of the int object. But += creates a new one. Is this intentional?

ints are immutable. But your logic works fine with a mutable object, like a list:

>>> a = [1]
>>> x = [a]
>>> print id(a) == id(x[0])
True
>>> a += [1]
>>> print a
[1, 1]
>>> print x[0]
[1, 1]

What exactly are you trying to do? -- Paul -- http://mail.python.org/mailman/listinfo/python-list
Re: regexp help
On Aug 27, 1:15 pm, Bakes ba...@ymail.com wrote: If I were using the code: (?P<data>[0-9]+) to get an integer between 0 and 9, how would I allow it to register negative integers as well? With that + sign in there, you will actually match more than a single integer between 0 and 9... -- Paul -- http://mail.python.org/mailman/listinfo/python-list
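[Editor's note: the direct answer to the question - an optional leading minus - sketched with a hypothetical input string.]

```python
import re

# the asker's named group, extended with an optional "-" before the digits
pat = re.compile(r"(?P<data>-?[0-9]+)")

print(pat.search("offset=-42").group("data"))  # -42
print(pat.search("offset=317").group("data"))  # 317
```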
Re: python fast HTML data extraction library
On Jul 22, 5:43 pm, Filip pink...@gmail.com wrote: My library, rather than parsing the whole input into a tree, processes it like a char stream with regular expressions. Filip - In general, parsing HTML with re's is fraught with easily-overlooked deviations from the norm. But since you have stepped up to the task, here are some comments on your re's:

# You should use raw string literals throughout, as in:
# blah_re = re.compile(r'sljdflsflds')
# (note the leading r before the string literal). Raw string literals
# really help keep your re expressions clean, so that you don't ever
# have to double up any '\' characters.

# Attributes might be enclosed in single quotes, or not enclosed in any quotes at all.
attr_re = re.compile('([\da-z]+?)\s*=\s*\"(.*?)\"', re.DOTALL | re.UNICODE | re.IGNORECASE)

# Needs re.IGNORECASE, and can have tag attributes, such as <BR CLEAR=ALL>
line_break_re = re.compile('<br\/?>', re.UNICODE)

# What about HTML entities defined using hex syntax, such as &#...; ?
amp_re = re.compile('&(?![a-z]+?\;)', re.UNICODE | re.IGNORECASE)

How would you extract data from a table? For instance, how would you extract the data entries from the table at this URL: http://tf.nist.gov/tf-cgi/servers.cgi ? This would be a good example snippet for your module documentation. Try extracting all of the <a href=...>sldjlsfjd</a> links from yahoo.com, and see how much of what you expect actually gets matched. Good luck! -- Paul -- http://mail.python.org/mailman/listinfo/python-list
Re: Override a method but inherit the docstring
On Jul 16, 8:01 pm, Ben Finney ben+pyt...@benfinney.id.au wrote: Howdy all, The following is a common idiom:

class FooGonk(object):
    def frobnicate(self):
        """Frobnicate this gonk."""
        basic_implementation(self.wobble)

class BarGonk(FooGonk):
    def frobnicate(self):
        special_implementation(self.warble)

The docstring for ‘FooGonk.frobnicate’ is, intentionally, perfectly applicable to the ‘BarGonk.frobnicate’ method also. Yet in overriding the method, the original docstring is not associated with it. What is the most Pythonic, DRY-adherent, and preferably least-ugly approach to override a method, but have the same docstring on both methods? Two ideas come to mind, the decorator way and the metaclass way. I am not a guru at either, but these two examples work:

# the decorator way
def inherit_docstring_from(cls):
    def docstring_inheriting_decorator(fn):
        fn.__doc__ = getattr(cls, fn.__name__).__doc__
        return fn
    return docstring_inheriting_decorator

class FooGonk(object):
    def frobnicate(self):
        """Frobnicate this gonk."""
        basic_implementation(self.wobble)

class BarGonk(FooGonk):
    @inherit_docstring_from(FooGonk)
    def frobnicate(self):
        special_implementation(self.warble)

bg = BarGonk()
help(bg.frobnicate)

Prints:

Help on method frobnicate in module __main__:

frobnicate(self) method of __main__.BarGonk instance
    Frobnicate this gonk.

Using a decorator in this manner requires repeating the super class name. Perhaps there is a way to get the bases of BarGonk, but I don't think so, because at the time that the decorator is called, BarGonk is not yet fully defined.
# The metaclass way
from types import FunctionType

class DocStringInheritor(type):
    def __new__(meta, classname, bases, classDict):
        newClassDict = {}
        for attributeName, attribute in classDict.items():
            if type(attribute) == FunctionType:
                # look through bases for matching function by name
                for baseclass in bases:
                    if hasattr(baseclass, attributeName):
                        basefn = getattr(baseclass, attributeName)
                        if basefn.__doc__:
                            attribute.__doc__ = basefn.__doc__
                            break
            newClassDict[attributeName] = attribute
        return type.__new__(meta, classname, bases, newClassDict)

class FooGonk2(object):
    def frobnicate(self):
        """Frobnicate this gonk."""
        basic_implementation(self.wobble)

class BarGonk2(FooGonk2):
    __metaclass__ = DocStringInheritor
    def frobnicate(self):
        special_implementation(self.warble)

bg = BarGonk2()
help(bg.frobnicate)

Prints:

Help on method frobnicate in module __main__:

frobnicate(self) method of __main__.BarGonk2 instance
    Frobnicate this gonk.

This metaclass will walk the list of bases until the desired superclass method is found AND if that method has a docstring and only THEN does it attach the superdocstring to the derived class method. Please use carefully, I just did the metaclass thing by following Michael Foord's Metaclass tutorial (http://www.voidspace.org.uk/python/articles/metaclasses.shtml), I may have missed a step or two. -- Paul -- http://mail.python.org/mailman/listinfo/python-list
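[Editor's note: on Python 3.6+, the __init_subclass__ hook can do the metaclass's job with less machinery; a sketch, not the original post's code.]

```python
class FooGonk:
    def frobnicate(self):
        """Frobnicate this gonk."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # copy down any missing docstrings from the nearest base class
        for name, attr in cls.__dict__.items():
            if callable(attr) and not attr.__doc__:
                for base in cls.__mro__[1:]:
                    parent = base.__dict__.get(name)
                    if parent is not None and getattr(parent, "__doc__", None):
                        attr.__doc__ = parent.__doc__
                        break

class BarGonk(FooGonk):
    def frobnicate(self):
        pass

print(BarGonk.frobnicate.__doc__)  # Frobnicate this gonk.
```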
Re: c++ Source Code for acm 2004-2005 problems
On Jul 12, 5:24 pm, Davood Vahdati davoodvahdati2...@gmail.com wrote: Dear Sirs And Madams : it is an Acm programming competition Questions in year 2004-2005 . could you please solve problems is question ? I Wan't C++ Source Code program About this questions OR Problems . thank you for your prompt attention to this matter huge chunk of OT content snipped looking for the Python content in this post... hmm, nope, didn't find any... I guess the OP tried on a C++ newsgroup and got told to do his own homework, so he came here instead? -- http://mail.python.org/mailman/listinfo/python-list
Re: Examples of Python driven Microsoft UI Automation wanted
On Jul 9, 1:09 pm, DuaneKaufman duane.kauf...@gmail.com wrote: The application I wish to interact with is not my own, but an ERP system GUI front-end. I have used pywinauto to drive a Flash game running inside of an Internet Explorer browser - that's pretty GUI! -- Paul -- http://mail.python.org/mailman/listinfo/python-list
Re: parsing times like 5 minutes ago?
On Jul 6, 7:21 pm, m...@pixar.com wrote: I'm looking for something like Tcl's [clock scan] command which parses human-readable time strings such as:

% clock scan "5 minutes ago"
1246925569
% clock scan "tomorrow 12:00"
1246993200
% clock scan "today + 1 fortnight"
1248135628

Does any such package exist for Python? Many TIA! Mark -- Mark Harrison Pixar Animation Studios

I've been dabbling with such a parser with pyparsing - here is my progress so far: http://pyparsing.wikispaces.com/UnderDevelopment It parses these test cases:

today
tomorrow
yesterday
in a couple of days
a couple of days from now
a couple of days from today
in a day
3 days ago
3 days from now
a day ago
now
10 minutes ago
10 minutes from now
in 10 minutes
in a minute
in a couple of minutes
20 seconds ago
in 30 seconds
20 seconds before noon
20 seconds before noon tomorrow
noon
midnight
noon tomorrow

-- Paul -- http://mail.python.org/mailman/listinfo/python-list
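[Editor's note: a toy fragment of such a parser - nowhere near the grammar linked above - handling just "N minutes ago" and "in N minutes", to show the shape of the problem. Names and the base time are hypothetical.]

```python
import re
from datetime import datetime, timedelta

def when(text, now):
    # two hard-coded phrase shapes; a real grammar would generalize these
    if m := re.fullmatch(r"in (\d+) minutes?", text):
        return now + timedelta(minutes=int(m.group(1)))
    if m := re.fullmatch(r"(\d+) minutes? ago", text):
        return now - timedelta(minutes=int(m.group(1)))
    raise ValueError("unrecognized: " + text)

base = datetime(2009, 7, 6, 12, 0)
print(when("in 10 minutes", base))   # 2009-07-06 12:10:00
```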
Re: Code that ought to run fast, but can't due to Python limitations.
On Jul 5, 3:12 am, Hendrik van Rooyen m...@microcorp.co.za wrote: Use a dispatch dict, and have each state return the next state. Then you can use strings representing state names, and everybody will be able to understand the code. toy example, not tested, nor completed: protocol = {start:initialiser,hunt:hunter,classify:classifier,other states} def state_machine(): next_step = protocol[start]() while True: next_step = protocol[next_step]() I've just spent about an hour looking over this code, with a few comments to inject to the thread here: - To all those suggesting the OP convert to a dispatch table, be assured that this code is well aware of this idiom. It is used HEAVILY at a macro level, picking through the various HTML states (starting a tag, reading attributes, reading body, etc.). There still are a number of cascading if-elif's within some of these states, and some of them *may* be candidates for further optimization. - There is an underlying HTMLInputStream that seems to be doing some unnecessary position bookkeeping (positionLine and positionCol). Commenting this out increases my test speed by about 13%. In my ignorance, I may be removing some important behavior, but this does not seem to be critical as I tested against a few megs of HTML source. Before blaming the tokenizer for everything, there may be more performance to be wrung from the input stream processor. For that matter, I would guess that about 90% of all HTML files that this code would process would easily fit in memory - in that case, the stream processing (and all of the attendant if I'm not at the end of the current chunk code) could be skipped/removed entirely. - The HTMLInputStream's charsUntil code is an already-identified bottleneck, and some re enhancements have been applied here to help out. - Run-time construction of tuple literals where the tuple members are constants can be lifted out. 
emitCurrentToken rebuilds this tuple every time it is called (which is a lot!):

    if (token["type"] in (tokenTypes["StartTag"], tokenTypes["EndTag"], tokenTypes["EmptyTag"])):

Move this tuple literal into a class constant (or if you can tolerate it, a default method argument to gain LOAD_FAST benefits - sometimes optimization isn't pretty). - These kinds of optimizations are pretty small, and only make sense if they are called frequently. Tallying which states are called in my test gives the following list in decreasing frequency. Such a list would help guide your further tuning efforts:

    tagNameState                               194848
    dataState                                  182179
    attributeNameState                         116507
    attributeValueDoubleQuotedState            114931
    tagOpenState                               105556
    beforeAttributeNameState                    58612
    beforeAttributeValueState                   58216
    afterAttributeValueState                    58083
    closeTagOpenState                           50547
    entityDataState                              1673
    attributeValueSingleQuotedState              1098
    commentEndDashState                           372
    markupDeclarationOpenState                    370
    commentEndState                               364
    commentStartState                             362
    commentState                                  362
    selfClosingStartTagState                      359
    doctypePublicIdentifierDoubleQuotedState      291
    doctypeSystemIdentifierDoubleQuotedState      247
    attributeValueUnQuotedState                   191
    doctypeNameState                               32
    beforeDoctypePublicIdentifierState             16
    afterDoctypePublicIdentifierState              14
    afterDoctypeNameState                           9
    doctypeState                                    8
    beforeDoctypeNameState                          8
    afterDoctypeSystemIdentifierState               6
    afterAttributeNameState                         5
    commentStartDashState                           2
    bogusCommentState                               2

For instance, I wouldn't bother doing much tuning of the bogusCommentState. Anything called fewer than 50,000 times in this test doesn't look like it would be worth the trouble. -- Paul (Thanks to those who suggested pyparsing as an alternative, but I think this code is already beyond pyparsing in a few respects. For one thing, this code works with an input stream, in order to process large HTML files; pyparsing *only* works with an in-memory string.
This code can also take advantage of some performance short cuts, knowing that it is parsing HTML; pyparsing's generic classes can't do that.) -- http://mail.python.org/mailman/listinfo/python-list
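[Editor's note: the two hoists suggested above - a constant tuple moved out of the hot path, and a default-argument alias for LOAD_FAST access - sketched with hypothetical names.]

```python
# built once at module (or class) level, instead of on every call
EMIT_TYPES = ("StartTag", "EndTag", "EmptyTag")

def emits_token(token, _types=EMIT_TYPES):
    # the default argument makes _types a fast local lookup inside
    # the function, and the tuple is never rebuilt per call
    return token["type"] in _types

print(emits_token({"type": "StartTag"}))  # True
print(emits_token({"type": "Comment"}))   # False
```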
Re: How to insert string in each match using RegEx iterator
On Jun 9, 11:13 pm, 504cr...@gmail.com 504cr...@gmail.com wrote: By what method would a string be inserted at each instance of a RegEx match? Some might say that using a parsing library for this problem is overkill, but let me just put this out there as another data point for you. Pyparsing (http://pyparsing.wikispaces.com) supports callbacks that allow you to embellish the matched tokens, and create a new string containing the modified text for each match of a pyparsing expression. Hmm, maybe the code example is easier to follow than the explanation...

from pyparsing import Word, nums, Regex

# an integer is a 'word' composed of numeric characters
integer = Word(nums)

# or use this if you prefer
integer = Regex(r'\d+')

# attach a parse action to prefix 'INSERT ' before the matched token
integer.setParseAction(lambda tokens: "INSERT " + tokens[0])

# use transformString to search through the input, applying the
# parse action to all matches of the given expression
test = '123 abc 456 def 789 ghi'
print integer.transformString(test)

# prints
# INSERT 123 abc INSERT 456 def INSERT 789 ghi

I offer this because often the simple examples that get posted are just the barest tip of the iceberg of what the poster eventually plans to tackle. Good luck in your Pythonic adventure! -- Paul -- http://mail.python.org/mailman/listinfo/python-list
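[Editor's note: for completeness, the same transformation with only the stdlib re module - a replacement callable does the insertion at each match.]

```python
import re

test = "123 abc 456 def 789 ghi"

# re.sub accepts a function: it is called once per match and its
# return value is spliced into the result string
result = re.sub(r"\d+", lambda m: "INSERT " + m.group(0), test)
print(result)  # INSERT 123 abc INSERT 456 def INSERT 789 ghi
```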
Re: random number including 1 - i.e. [0,1]
On Jun 9, 11:23 pm, Esmail ebo...@hotmail.com wrote: Here is part of the specification of an algorithm I'm implementing that shows the reason for my original query: vid = w * vid + c1 * rand( ) * ( pid – xid ) + c2 * Rand( ) * (pgd –xid ) (1a) xid = xid + vid (1b) where c1 and c2 are two positive constants, rand() and Rand() are two random functions in the range [0,1], ^ and w is the inertia weight. It is entirely possible that the documentation you have for the original rand() and Rand() functions have misstated their range. In my experience, rand() functions that I have worked with have always been [0,1). -- Paul -- http://mail.python.org/mailman/listinfo/python-list
Re: random number including 1 - i.e. [0,1]
On Jun 9, 4:33 pm, Esmail ebo...@hotmail.com wrote: Hi, random.random() will generate a random value in the range [0, 1). Is there an easy way to generate random values in the range [0, 1]? I.e., including 1? Are you trying to generate a number in the range [0,n] by multiplying a random function that returns [0,1] * n? If so, then you want to do this using: int(random.random()*(n+1)) This will give equal chance of getting any number from 0 to n. If you had a function that returned a random in the range [0,1], then multiplying by n and then truncating would give only the barest sliver of a chance of giving the value n. You could try rounding, but then you get this skew: 0 for values [0, 0.5) (width of 0.5) 1 for value [0.5, 1.5) (width of 1) ... n for value [n-0.5, n] (width of ~0.50001) Still not a uniform die roll. You have only about 1/2 the probability of getting 0 or n as any other value. If you want to perform a fair roll of a 6-sided die, you would start with int(random.random() * 6). This gives a random number in the range [0,5], with each value of the same probability. How to get our die roll that goes from 1 to 6? Add 1. Thus: die_roll = lambda : int(random.random() * 6) + 1 Or for a n-sided die: die_roll = lambda n : int(random.random() * n) + 1 This is just guessing on my part, but otherwise, I don't know why you would care if random.random() returned values in the range [0,1) or [0,1]. -- Paul -- http://mail.python.org/mailman/listinfo/python-list
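[Editor's note: the stdlib already wraps the int(random()*n)+1 arithmetic described above.]

```python
import random

roll = random.randint(1, 6)        # inclusive on *both* ends: a fair d6
assert 1 <= roll <= 6

# randrange(n) is the uniform [0, n-1] equivalent of int(random.random() * n)
assert 0 <= random.randrange(6) < 6
```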
Re: networking simulator on python
On Jun 8, 7:18 pm, Ala shaib...@ymail.com wrote: Hello everyone. I plan on starting to write a network simulator on python for testing a modified version of TCP. I am wondering if a python network simulator exists? Also, if anyone tried using simpy for doing a simulation. Thank you There was an article on just this topic in the April issue of Python Magazine. -- Paul -- http://mail.python.org/mailman/listinfo/python-list
Re: I need help building a data structure for a state diagram
On May 24, 1:16 pm, Matthew Wilson m...@tplus1.com wrote: I'm working on a really simple workflow for my bug tracker. I want filed bugs to start in an UNSTARTED status. From there, they can go to STARTED. I just wrote an article for the April issue of Python Magazine on how to add embedded DSL code to your Python scripts using Python's imputil module, and I used a state pattern for my example. Two state machine examples I used to illustrate the work were a traffic light and a library book checkin/checkout. The traffic light state machine is just a simple cycle through the 3 light states. But the library book state machine is more complex (your bug tracking example made me think of it), with different transitions allowed from one state into multiple different states. Here is how the code looks for these examples:

==
# trafficLight.pystate
statemachine TrafficLight:
    Red -> Green
    Green -> Yellow
    Yellow -> Red

Red.carsCanGo = False
Yellow.carsCanGo = True
Green.carsCanGo = True

# ... other class definitions for state-specific behavior ...

==
# trafficLightDemo.py

# add support for .pystate files, with
# embedded state machine DSL
import stateMachine

import trafficLight

tlight = trafficLight.Red()
while 1:
    print tlight, "GO" if tlight.carsCanGo else "STOP"
    tlight.delay()
    tlight = tlight.next_state()

==
# libraryBook.pystate
statemachine BookCheckout:
    New -(create)-> Available
    Available -(reserve)-> Reserved
    Available -(checkout)-> CheckedOut
    Reserved -(cancel)-> Available
    Reserved -(checkout)-> CheckedOut
    CheckedOut -(checkin)-> Available
    CheckedOut -(renew)-> CheckedOut

You don't need to adopt this whole DSL implementation, but the article might give you some other ideas. -- Paul -- http://mail.python.org/mailman/listinfo/python-list
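[Editor's note: the traffic-light machine above rendered as plain dispatch dicts, without the DSL layer - often enough for a simple workflow like the OP's bug statuses.]

```python
# transitions and per-state data as two lookup tables
NEXT = {"Red": "Green", "Green": "Yellow", "Yellow": "Red"}
CARS_CAN_GO = {"Red": False, "Yellow": True, "Green": True}

state, trace = "Red", []
for _ in range(4):
    trace.append((state, "GO" if CARS_CAN_GO[state] else "STOP"))
    state = NEXT[state]

print(trace)
# [('Red', 'STOP'), ('Green', 'GO'), ('Yellow', 'GO'), ('Red', 'STOP')]
```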
Re: slice iterator?
On May 8, 11:14 pm, Ned Deily n...@acm.org wrote: In article 7xprejoswg@ruckus.brouhaha.com, Paul Rubin http://phr...@nospam.invalid wrote: Ross ross.j...@gmail.com writes: I have a really long list that I would like segmented into smaller lists. Let's say I had a list a = [1,2,3,4,5,6,7,8,9,10,11,12] and I wanted to split it into groups of 2 or groups of 3 or 4, etc. Is there a way to do this without explicitly defining new lists? That question comes up so often it should probably be a standard library function. Anyway, here is an iterator, if that's what you want:

>>> from itertools import islice
>>> a = range(12)
>>> xs = iter(lambda x=iter(a): list(islice(x,3)), [])
>>> print list(xs)
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]

Of course, as the saying goes, there's more than one way to do it ;-) python2.6 itertools introduces the izip_longest function and the grouper recipe (http://docs.python.org/library/itertools.html):

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

-- Ned Deily, n...@acm.org

Here's a version that works pre-2.6:

>>> grouper = lambda iterable,size,fill=None : zip(*[(iterable+[fill,]*(size-1))[i::size] for i in range(size)])
>>> a = range(12)
>>> grouper(a,6)
[(0, 1, 2, 3, 4, 5), (6, 7, 8, 9, 10, 11)]
>>> grouper(a,5)
[(0, 1, 2, 3, 4), (5, 6, 7, 8, 9), (10, 11, None, None, None)]
>>> grouper(a,3)
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11)]

-- Paul -- http://mail.python.org/mailman/listinfo/python-list
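[Editor's note: the same recipe in Python 3 spelling - izip_longest became itertools.zip_longest.]

```python
from itertools import zip_longest

def grouper(n, iterable, fillvalue=None):
    # n copies of the *same* iterator: zip_longest pulls n items per tuple
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

print(list(grouper(3, range(7))))
# [(0, 1, 2), (3, 4, 5), (6, None, None)]
```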
Re: string processing question
On Apr 30, 11:55 am, Kurt Mueller m...@problemlos.ch wrote:

> Hi, on a Linux system and python 2.5.1 I have the following behaviour
> which I do not understand:
>
> case 1
> python -c 'a="ä"; print a ; print a.center(6,"-") ; b=unicode(a, "utf8"); print b.center(6,"-")'
> ä
> --ä--
> --ä---

Weird. What happens if you change the second print statement to:

    print b.center(6,u"-")

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list
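The asymmetry comes from Python 2 byte strings: 'ä' typed in a UTF-8 terminal is a two-byte str, so center(6) pads it as if it were two characters wide, while the unicode version is one character. The Python 3 equivalent makes the distinction explicit (a sketch, not the original session):

```python
# Python 3 separates text from bytes, which explains the odd centering.
text = "ä"                     # one character
data = text.encode("utf-8")    # two bytes

assert len(text) == 1
assert len(data) == 2

centered_text = text.center(6, "-")
# 6 characters wide: "--ä---"
centered_data = data.center(6, b"-")
# 6 bytes wide, but only 5 glyphs on screen: b"--\xc3\xa4--"
```

In Python 2, str.center counted bytes, so the multi-byte 'ä' stole one column of padding; that is exactly the "--ä--" vs "--ä---" difference in the session above.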
Re: if statement, with function inside it: if (t = Test()) == True:
On Apr 24, 5:00 am, GC-Martijn gcmart...@gmail.com wrote:

> Hello, I'm trying to do an if statement with a function inside it. I
> want to use that variable inside that if block, without defining it.
>
>     def Test():
>         return 'Vla'
>
> I'm searching for something like this:
>
>     if (t = Test()) == 'Vla':
>         print t # Vla

Here is a thread from 3 weeks ago on this very topic, with a couple of proposed solutions.

http://groups.google.com/group/comp.lang.python/browse_frm/thread/9f8e79fa28d69905/e934c73ee3c2dbc2?hl=enq=

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list
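For readers finding this thread later: Python 3.8 added assignment expressions (the "walrus" operator), which do exactly what the original poster asks. A sketch:

```python
def Test():
    return 'Vla'

# Python 3.8+ assignment expression: binds t inside the condition itself.
if (t := Test()) == 'Vla':
    result = t
else:
    result = None

# result == 'Vla'
```

Before 3.8, the usual workarounds were the ones in the linked thread: assign on the line before the if, or wrap the match in a small holder object.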
ANN: pyparsing 1.5.2 released!
Well, it has been about 6 months since the release of pyparsing 1.5.1, and there have been no new functional enhancements to pyparsing. I take this as a further sign that pyparsing is reaching a development/maturity plateau. With the help of the pyparsing community, there are some compatibility upgrades, and a few bug fixes. The major news is compatibility with Python 3 and IronPython 2.0.1. Here is the high-level summary of what's new in pyparsing 1.5.2:

- Removed __slots__ declaration on ParseBaseException, for compatibility with IronPython 2.0.1. Raised by David Lawler on the pyparsing wiki, thanks David!

- Added pyparsing_py3.py module, so that Python 3 users can use pyparsing by changing their pyparsing import statement to:

      import pyparsing_py3

  Thanks for help from Patrick Laban and his friend Geremy Condra on the pyparsing wiki.

- Fixed bug in SkipTo/failOn handling - caught by eagle eye cpennington on the pyparsing wiki!

- Fixed second bug in SkipTo when using the ignore constructor argument, reported by Catherine Devlin, thanks!

- Fixed obscure bug reported by Eike Welk when using a class as a ParseAction with an errant __getitem__ method.

- Simplified exception stack traces when reporting parse exceptions back to caller of parseString or parseFile - thanks to a tip from Peter Otten on comp.lang.python.

- Changed behavior of scanString to avoid infinitely looping on expressions that match zero-length strings. Prompted by a question posted by ellisonbg on the wiki.

- Enhanced classes that take a list of expressions (And, Or, MatchFirst, and Each) to accept generator expressions also. This can be useful when generating lists of alternative expressions, as in this case, where the user wanted to match any repetitions of '+', '*', '#', or '.', but not mixtures of them (that is, match '+++', but not '+-+'):

      codes = "+*#."
      format = MatchFirst(Word(c) for c in codes)

  Based on a problem posed by Denis Spir on the Python tutor list.
- Added new example eval_arith.py, which extends the example simpleArith.py to actually evaluate the parsed expressions.

Download pyparsing 1.5.2 at http://sourceforge.net/projects/pyparsing/. The pyparsing Wiki is at http://pyparsing.wikispaces.com

-- Paul

Pyparsing is a pure-Python class library for quickly developing recursive-descent parsers. Parser grammars are assembled directly in the calling Python code, using classes such as Literal, Word, OneOrMore, Optional, etc., combined with operators '+', '|', and '^' for And, MatchFirst, and Or. No separate code-generation or external files are required. Pyparsing can be used in many cases in place of regular expressions, with a shorter learning curve and greater readability and maintainability. Pyparsing comes with a number of parsing examples, including:

- Hello, World! (English, Korean, Greek, and Spanish (new))
- chemical formulas
- configuration file parser
- web page URL extractor
- 5-function arithmetic expression parser
- subset of CORBA IDL
- chess portable game notation
- simple SQL parser
- Mozilla calendar file parser
- EBNF parser/compiler
- Python value string parser (lists, dicts, tuples, with nesting) (safe alternative to eval)
- HTML tag stripper
- S-expression parser
- macro substitution preprocessor

--
http://mail.python.org/mailman/listinfo/python-announce-list
Support the Python Software Foundation: http://www.python.org/psf/donations.html
Re: Help improve program for parsing simple rules
On Apr 16, 10:57 am, prueba...@latinmail.com wrote:

> Another interesting task for those that are looking for some
> interesting problem: I inherited some rule system that checks
> programmers' program outputs, and it needs to be ported: given some
> simple rules and the values, it has to determine if the program is
> still working correctly and give the details of what the values are.
> If you have a better idea of how to do this kind of parsing please
> chime in. I am using tokenize but that might be more complex than it
> needs to be. This is what I have come up with so far:

I've been meaning to expand on pyparsing's simpleArith.py example for a while, to include the evaluation of the parsed tokens. Here is the online version, http://pyparsing.wikispaces.com/file/view/eval_arith.py, it will be included in version 1.5.2 (coming shortly). I took the liberty of including your rule set as a list of embedded test cases.

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list
Re: Help improve program for parsing simple rules
On Apr 17, 10:43 am, John Machin sjmac...@lexicon.net wrote:

> I don't see how it can handle the chained relop in the last two
> testcases e.g. '0.00 LE A LE 4.00' -- unless relops are chained by
> default in your parser.

John -

First of all, to respect precedence of operations, higher level precedences are parsed and grouped first. If you left off the parse actions and just printed out the parse tree created by the example (using asList()), for A + B * C you would get ['A', '+', ['B', '*', 'C']]. If you expand that test case to A + B * C + D, you would get ['A', '+', ['B', '*', 'C'], '+', 'D']. This is counter to the conventional infix parser that would create [['A', '+', ['B', '*', 'C']], '+', 'D'], in which binary operators typically return 'operand' 'operator' 'operand' triples, and either operand might be a nested parse tree.

As it happens, when using pyparsing's operatorPrecedence helper, *all* binary operators at the same precedence level are actually parsed in a single chain. This is why you see this logic in EvalAddOp.eval:

    def eval(self):
        sum = self.value[0].eval()
        for op,val in operatorOperands(self.value[1:]):
            if op == '+':
                sum += val.eval()
            if op == '-':
                sum -= val.eval()
        return sum

operatorOperands is a little generator that returns operator-operand pairs, beginning at the second (that is, the 1th) token in the list. You can't just do the simple evaluation of operand1 operator operand2; you have to build up the sum by first evaluating operand1, and then iterating over the operator-operand pairs in the rest of the list. Same thing for the multiplication operators.

For the comparison operators, things are a little more involved.
operand1 operator1 operand2 operator2 operand3 (as in "0.00 LE A LE 4.00") has to evaluate as

    op1 operator1 op2 AND op2 operator2 op3

So EvalComparisonOp's eval method looks like:

    def eval(self):
        val1 = self.value[0].eval()
        ret = True
        for op,val in operatorOperands(self.value[1:]):
            fn = EvalComparisonOp.opMap[op]
            val2 = val.eval()
            ret = ret and fn(val1,val2)
            val1 = val2
        return ret

The first term is evaluated and stored in val1. Then each comparison-op/operand pair is extracted, the operand is eval()'ed and stored in val2, and the comparison method that is mapped to the comparison op is called using val1 and val2. Then, to move on to the next comparison, val2 is stored into val1, and then we iterate to the next comparison-operand pair. In fact, not only does this handle "0.00 LE A LE 4.00", but it could also evaluate "0.00 LE A LE 4.00 LE E D".

(I see that I should actually do some short-circuiting here - if ret is false after calling fn(val1,val2), I should just break out at that point. I'll have that fixed in the online version shortly.)

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list
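The chained-comparison walk described above can be seen in a minimal stand-alone form. This is a sketch with hypothetical names (op_map, eval_chain, a flat token list), not pyparsing's actual code, and it includes the short-circuit break mentioned at the end of the post:

```python
import operator

# Stand-alone sketch of the chained-comparison evaluation described above.
# The token layout (value, op, value, op, value, ...) is illustrative.
op_map = {"LE": operator.le, "LT": operator.lt,
          "GE": operator.ge, "GT": operator.gt, "EQ": operator.eq}

def eval_chain(tokens):
    val1 = tokens[0]
    ret = True
    for i in range(1, len(tokens), 2):
        fn = op_map[tokens[i]]     # comparison function for this operator
        val2 = tokens[i + 1]
        ret = ret and fn(val1, val2)
        if not ret:                # short-circuit, as the post suggests
            break
        val1 = val2                # carry the right operand forward
    return ret

assert eval_chain([0.00, "LE", 2.5, "LE", 4.00]) is True
assert eval_chain([0.00, "LE", 5.0, "LE", 4.00]) is False
```

Note how val1 is overwritten with val2 each pass, so each operand is evaluated only once even though it appears in two comparisons, which is also why side effects are not repeated.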
Re: Help improve program for parsing simple rules
On Apr 17, 1:26 pm, Aaron Brady castiro...@gmail.com wrote:

> Hi, not to offend; I don't know your background.

Courtesy on Usenet!!! I'm going to go buy a lottery ticket!

Not to worry, I'm a big boy. People have even called my baby ugly, and I manage to keep my blood pressure under control.

> One thing I like about Python is it and the docs are careful about
> short-circuiting conditions. ISTR that C left some of those details
> up to the compiler at one point.
>
> >>> def f():
> ...     print( 'in f' )
> ...     return 10
> ...
> >>> 0 < f() < 20
> in f
> True
> >>> 0 < f() and f() < 20
> in f
> in f
> True
>
> Therefore, if op{n} has side effects, 'op1 operator1 op2 AND op2
> operator2 op3' is not equivalent to 'op1 optor1 op2 optor2 op3'.

Interesting point, but I don't remember that A < B < C is valid C syntax; are you perhaps thinking of a different language? By luck, my implementation of EvalComparisonOp.eval does in fact capture the post-eval value of op2, so that if its evaluation caused any side effects, they would not be repeated.

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list
Re: question about xrange performance
On Apr 17, 1:39 pm, _wolf wolfgang.l...@gmail.com wrote:

> can it be that a simple diy-class outperforms a python built-in by a
> factor of 180? is there something i have done the wrong way?
> omissions, oversights? do other people get similar figures? cheers

I wouldn't say you are outperforming xrange until your class also supports:

    for i in xxrange( 1, 2 ):
        # do something with i

Wouldn't be difficult, but you're not there yet.

And along the lines with MRAB's comments, xrange is not really intended for "in" testing; it is there for iteration over a range without constructing the list of range elements first, which one notices right away when looping over xrange(1e8) vs. range(1e8). Your observation is especially useful to keep in mind as Python 3 now imbues range with xrange behavior, so if you have code that tests "blah in range(blee,bloo)", you will get similarly poor results.

And of course, you are cheating a bit with your xxrange "in" test, since you aren't really verifying that the number is actually in the given list, you are just testing against the extrema, and relying on your in-built knowledge that xrange (as you are using it) contains all the intermediate values. Compare to testing with xrange(1,100,2) and you'll find that 10 is *not* in this range, even though 1 <= 10 < 100. (Extending xxrange to do this as well is also not difficult.)

One might wonder why you are even writing code to test for existence in a range list, when "blee <= blah < bloo" is obviously going to outperform this kind of code.

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list
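The closing point is easy to demonstrate on current Python, where range has the old xrange behavior (and, unlike the 2.x versions discussed here, answers "in" tests arithmetically rather than by iteration). A quick semantic sketch, timings omitted:

```python
# Python 3 range behaves like the old xrange; membership for int
# arguments is computed from start/stop/step, not by scanning.
r = range(1, 100, 2)

assert 9 in r            # odd and inside the bounds
assert 10 not in r       # inside the bounds, but not on the step
assert 1 <= 10 < 100     # the bounds check alone says nothing about step
```

So the bounds comparison and the membership test really are different questions; for stepped ranges only the membership test accounts for the stride.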
Re: Help improve program for parsing simple rules
On Apr 17, 2:40 pm, prueba...@latinmail.com wrote:

> On Apr 17, 11:26 am, Paul McGuire pt...@austin.rr.com wrote:
> > On Apr 16, 10:57 am, prueba...@latinmail.com wrote:
> > > Another interesting task for those that are looking for some
> > > interesting problem: I inherited some rule system that checks
> > > programmers' program outputs, and it needs to be ported: given
> > > some simple rules and the values, it has to determine if the
> > > program is still working correctly and give the details of what
> > > the values are. If you have a better idea of how to do this kind
> > > of parsing please chime in. I am using tokenize but that might be
> > > more complex than it needs to be. This is what I have come up
> > > with so far:
> >
> > I've been meaning to expand on pyparsing's simpleArith.py example
> > for a while, to include the evaluation of the parsed tokens. Here
> > is the online version,
> > http://pyparsing.wikispaces.com/file/view/eval_arith.py, it will be
> > included in version 1.5.2 (coming shortly). I took the liberty of
> > including your rule set as a list of embedded test cases.
> >
> > -- Paul
>
> That is fine with me. I don't know how feasible it is for me to use
> pyparsing for this project considering I don't have admin access on
> the box that is eventually going to run this. To add insult to injury
> Python is in the version 2-3 transition (I really would like to push
> the admins to install 3.1 by the end of the year before the amount of
> code written by us gets any bigger) meaning that any third party
> library is an additional burden on the future upgrade. I can't
> remember if pyparsing is pure Python. If it is I might be able to
> include it alongside my code if it is not too big.

It *is* pure Python, and consists of a single source file for the very purpose of ease-of-inclusion. A number of projects include their own versions of pyparsing for version compatibility management; matplotlib is one that comes to mind.
The upcoming version 1.5.2 download includes a pyparsing_py3.py file for Python 3 compatibility, I should have that ready for users to download *VERY SOON NOW*! -- Paul -- http://mail.python.org/mailman/listinfo/python-list
Re: regex alternation problem
On Apr 17, 4:49 pm, Jesse Aldridge jessealdri...@gmail.com wrote:

> import re
>
> s1 = "I am an american"
> s2 = "I am american an "
>
> for s in [s1, s2]:
>     print re.findall(" (am|an) ", s)
>
> # Results:
> # ['am']
> # ['am', 'an']
>
> I want the results to be the same for each string. What am I doing
> wrong?

Does it help if you expand your RE to its full expression, with '_'s where the blanks go:

    _am_ or _an_

Now look for these in "I_am_an_american". After the first "_am_" is processed, findall picks up at the leading 'a' of 'an', and there is no leading blank, so no match. If you search through "I_am_american_an_", both am and an have surrounding spaces, so both match.

Instead of using explicit spaces, try using '\b', meaning word break:

    >>> import re
    >>> re.findall(r"\b(am|an)\b", "I am an american")
    ['am', 'an']
    >>> re.findall(r"\b(am|an)\b", "I am american an")
    ['am', 'an']

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list
Re: regex alternation problem
On Apr 17, 5:28 pm, Paul McGuire pt...@austin.rr.com wrote:

> -- Paul
>
> Your find pattern includes (and consumes) a leading AND trailing
> space around each word. In the first string "I am an american", there
> is a leading and trailing space around "am", but the trailing space
> for "am" is the leading space for "an", so "an"

Oops, sorry, ignore debris after sig...
--
http://mail.python.org/mailman/listinfo/python-list
Re: Automatically generating arithmetic operations for a subclass
On Apr 14, 4:09 am, Steven D'Aprano ste...@remove.this.cybersource.com.au wrote:

> I have a subclass of int where I want all the standard arithmetic
> operators to return my subclass, but with no other differences:
>
>     class MyInt(int):
>         def __add__(self, other):
>             return self.__class__(super(MyInt, self).__add__(other))
>         # and so on for __mul__, __sub__, etc.
>
> My quick-and-dirty count of the __magic__ methods that need to be
> overridden comes to about 30. That's a fair chunk of unexciting
> boilerplate.

Something like this maybe?

    def takesOneArg(fn):
        try:
            fn(1)
        except TypeError:
            return False
        else:
            return True

    class MyInt(int):
        pass

    template = ("MyInt.__%s__ = lambda self, other: "
                "self.__class__(super(MyInt, self).__%s__(other))")

    fns = [fn for fn in dir(int)
           if fn.startswith('__') and takesOneArg(getattr(1, fn))]
    print fns
    for fn in fns:
        exec(template % (fn, fn))

Little harm in this usage of exec, since it is your own code that you are running.

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list
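The same trick can be done without exec by building each wrapper in a closure and installing it with setattr. A sketch in modern Python, with an abbreviated hand-picked method list rather than the dir(int) probe above:

```python
# exec-free variant of the same idea, sketched in Python 3.
# The dunder list is abbreviated for illustration.
class MyInt(int):
    pass

def _make_wrapper(name):
    # closure captures the method name; no string templating needed
    def wrapper(self, other):
        return self.__class__(getattr(super(MyInt, self), name)(other))
    return wrapper

for name in ("__add__", "__sub__", "__mul__", "__floordiv__"):
    setattr(MyInt, name, _make_wrapper(name))

x = MyInt(5) + MyInt(3)
assert isinstance(x, MyInt) and x == 8
```

The closure version keeps the generated methods inspectable and avoids quoting headaches, at the cost of naming the dunders explicitly (or reusing a takesOneArg-style probe to collect them).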
Re: safe eval of moderately simple math expressions
On Apr 11, 2:41 am, Aaron Brady castiro...@gmail.com wrote: Why do I get the feeling that the authors of 'pyparsing' are out of breath? What kind of breathlessness do you mean? I'm still breathing, last time I checked. The-rumors-of-my-demise-have-been-greatly-exaggerated'ly yours, -- Paul -- http://mail.python.org/mailman/listinfo/python-list
Re: safe eval of moderately simple math expressions
On Apr 9, 10:56 am, Joel Hedlund joel.hedl...@gmail.com wrote:

> Hi all! I'm writing a program that presents a lot of numbers to the
> user, and I want to let the user apply moderately simple arithmetic
> to these numbers.

Joel -

Take a look at the examples page on the pyparsing wiki (http://pyparsing.wikispaces.com/Examples). Look at the examples fourFn.py and simpleArith.py for some expression parsers that you could extend to support whatever math builtins you wish. Since you would be doing your own parsing and eval code, you could be sure that no dangerous code was being run, just simple arithmetic.

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list
Re: Best way to extract from regex in if statement
On Apr 3, 9:26 pm, Paul Rubin http://phr...@nospam.invalid wrote:

> bwgoudey bwgou...@gmail.com writes:
> > elif re.match("^DATASET:\s*(.+)", line):
> >     m = re.match("^DATASET:\s*(.+)", line)
> >     print m.group(1)
>
> Sometimes I like to make a special class that saves the result:
>
>     class Reg(object):
>         # illustrative code, not tested
>         def match(self, pattern, line):
>             self.result = re.match(pattern, line)
>             return self.result

I took this a little further, *and* lightly tested it too. Since this idiom makes repeated references to the input line, I added that to the constructor of the matching class. By using __call__, I made the created object callable, taking the RE expression as its lone argument and returning a boolean indicating match success or failure. The result of the re.match call is saved in self.matchresult. By using __getattr__, the created object proxies for the results of the re.match call. I think the resulting code looks pretty close to the original C or Perl idiom of cascading "elif (c=re_expr_match(...))" blocks.

(I thought about caching previously seen REs, or adding support for compiled REs instead of just strings - after all, this idiom usually occurs in a loop while iterating over some large body of text. It turns out that the re module already caches previously compiled REs, so I left my caching out in favor of that already being done in the std lib.)
-- Paul

    import re

    class REmatcher(object):
        def __init__(self, sourceline):
            self.line = sourceline

        def __call__(self, regexp):
            self.matchresult = re.match(regexp, self.line)
            self.success = self.matchresult is not None
            return self.success

        def __getattr__(self, attr):
            return getattr(self.matchresult, attr)

This test:

    test = """\
    ABC
    123
    xyzzy
    Holy Hand Grenade
    Take the pebble from my hand, Grasshopper""".replace("    ", "")

    outfmt = "'%s' is %s [%s]"
    for line in test.splitlines():
        matchexpr = REmatcher(line)
        if matchexpr(r"\d+$"):
            print outfmt % (line, "numeric", matchexpr.group())
        elif matchexpr(r"[a-z]+$"):
            print outfmt % (line, "lowercase", matchexpr.group())
        elif matchexpr(r"[A-Z]+$"):
            print outfmt % (line, "uppercase", matchexpr.group())
        elif matchexpr(r"([A-Z][a-z]*)(\s[A-Z][a-z]*)*$"):
            print outfmt % (line, "a proper word or phrase", matchexpr.group())
        else:
            print outfmt % (line, "something completely different", "...")

Produces:

    'ABC' is uppercase [ABC]
    '123' is numeric [123]
    'xyzzy' is lowercase [xyzzy]
    'Holy Hand Grenade' is a proper word or phrase [Holy Hand Grenade]
    'Take the pebble from my hand, Grasshopper' is something completely different [...]

--
http://mail.python.org/mailman/listinfo/python-list
Re: python needs leaning stuff from other language
On Apr 3, 11:48 pm, Tim Wintle tim.win...@teamrubber.com wrote:

>     del mylist[:]
> *or*
>     mylist[:] = []
> *or*
>     mylist = []
>
> which, although semantically similar, are different as far as the
> interpreter is concerned (since two of them create a new list):

Only the last item creates a new list of any consequence. The first two retain the original list and delete or discard the items in it. A temporary list gets created in the 2nd option, and is then used to assign new contents to mylist's [:] slice - so yes, technically, a new list *is* created in the case of this option. But mylist does not get bound to it as in the 3rd case. In case 2, mylist's binding is unchanged, and the temporary list gets GC'ed almost immediately.

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list
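The binding difference is easy to see with a second name bound to the same list; a quick sketch:

```python
# del a[:] and a[:] = [] mutate the existing list; a = [] rebinds the name.
a = [1, 2, 3]
alias = a                # second name for the same list object

del a[:]                 # empties the list in place
assert a is alias and alias == []

a = [1, 2, 3]
alias = a
a[:] = []                # slice assignment: also clears in place
assert a is alias and alias == []

a = [1, 2, 3]
alias = a
a = []                   # rebinds the name; the old list is untouched
assert a is not alias and alias == [1, 2, 3]
```

This is exactly why the first two forms are the ones to use when other code holds references to the list.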
Re: List of paths
On Apr 1, 3:57 am, Nico Grubert nicogrub...@gmail.com wrote:

> Dear Python developers,
>
> I have the following (sorted) list:
>
>     ['/notebook', '/notebook/mac', '/notebook/mac/macbook',
>      '/notebook/mac/macbookpro', '/notebook/pc', '/notebook/pc/lenovo',
>      '/notebook/pc/hp', '/notebook/pc/sony', '/desktop',
>      '/desktop/pc/dell', '/desktop/mac/imac', '/server/hp/proliant',
>      '/server/hp/proliant/385', '/server/hp/proliant/585']
>
> I want to remove all paths x from the list if there is a path y in
> the list which is a prefix of x, so x.startswith(y) is true. The list
> I want to have is:
>
>     ['/notebook', '/desktop', '/server/hp/proliant']
>
> Any idea how I can do this in Python? Thanks in advance, Nico

    paths = ['/notebook', '/notebook/mac', '/notebook/mac/macbook',
             '/notebook/mac/macbookpro', '/notebook/pc', '/notebook/pc/lenovo',
             '/notebook/pc/hp', '/notebook/pc/sony', '/desktop',
             '/desktop/pc/dell', '/desktop/mac/imac', '/server/hp/proliant',
             '/server/hp/proliant/385', '/server/hp/proliant/585']

    seen = set()
    basepaths = [seen.add(s) or s for s in paths
                 if not any(s.startswith(ss) for ss in seen)]

gives:

    ['/notebook', '/desktop', '/server/hp/proliant']

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list
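One caveat with a bare startswith test: a sibling like '/notebook2' would be swallowed by '/notebook'. A slightly stricter sketch (illustrative data, not the original list) only treats y as a parent of x when x is y plus a full '/'-separated component:

```python
# Stricter prefix filter: '/notebook2' must survive '/notebook'.
paths = ['/notebook', '/notebook/mac', '/notebook2',
         '/desktop', '/desktop/pc/dell',
         '/server/hp/proliant', '/server/hp/proliant/385']

kept = []
for p in sorted(paths):              # parents sort before their children
    if not any(p == k or p.startswith(k + '/') for k in kept):
        kept.append(p)

# kept == ['/desktop', '/notebook', '/notebook2', '/server/hp/proliant']
```

Sorting first guarantees every parent is seen before its children, so the single pass suffices; the `p == k` clause also drops exact duplicates.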