[Python-announce] logmerger 0.8.0 released

2023-12-07 Thread Paul McGuire
logmerger 0.8.0
===

New features:
- Added --inline command line option to view merged logs in a single inline 
column instead of side-by-side columns (side-by-side is the default)
- Added jump feature to move by number of lines or by a time period in 
microseconds, milliseconds, seconds, minutes, hours or days
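
The time-period jump suggests specs like "500ms" or "2h". As an illustration only (the unit suffixes and `parse_jump` helper below are assumptions based on the description, not logmerger's actual option syntax), such a spec could be turned into a `timedelta`:

```python
import re
from datetime import timedelta

# Unit suffixes assumed from the feature description above,
# not necessarily logmerger's real syntax.
UNITS = {
    "us": "microseconds",
    "ms": "milliseconds",
    "s": "seconds",
    "m": "minutes",
    "h": "hours",
    "d": "days",
}

def parse_jump(spec):
    """Parse a jump spec like '500ms' or '2h' into a timedelta."""
    m = re.fullmatch(r"(\d+)(us|ms|s|m|h|d)", spec.strip())
    if m is None:
        raise ValueError("invalid jump spec: %r" % spec)
    return timedelta(**{UNITS[m.group(2)]: int(m.group(1))})

print(parse_jump("2h"))   # 2:00:00
```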

Fixes:
- Fixed type annotations that broke running logmerger on Python 3.9.

Screenshot:
https://github.com/ptmcg/logmerger/blob/main/static/log1_log2_merged_tui_lr.jpg?raw=true


Use logmerger to view multiple log files, merged side-by-side with a common 
timeline using timestamps from the input files.
- merge ASCII log files
- detects various formats of timestamps
- detects multiline log messages
- merge .gz files without previously gunzip'ing
- merge .pcap files
- merge .csv files
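
The merge itself can be pictured as a k-way merge of timestamped entries. A stdlib-only sketch of the idea (an illustration of the concept, not logmerger's actual code):

```python
import heapq
from datetime import datetime

def merge_logs(*logs):
    """Merge sorted (timestamp, source, message) streams into one timeline."""
    # Each log is an iterable of (timestamp, source, message) tuples,
    # already sorted by timestamp; heapq.merge interleaves them lazily.
    yield from heapq.merge(*logs, key=lambda entry: entry[0])

log_a = [(datetime(2023, 12, 7, 10, 0, 0), "a.log", "starting up"),
         (datetime(2023, 12, 7, 10, 0, 5), "a.log", "ready")]
log_b = [(datetime(2023, 12, 7, 10, 0, 2), "b.log", "connected")]

for ts, source, msg in merge_logs(log_a, log_b):
    print(ts, source, msg)
```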

Browse the merged logs using a textual-based TUI:
- vertical scrolling
- horizontal scrolling
- search/find next/find previous
- jump by number of lines or by time interval
- go to line
- go to timestamp

The TUI runs in a plain terminal window, so it can be run over a regular SSH session.


Installation
------------

Install from PyPI:

pip install logmerger

For PCAP merging support:

pip install logmerger[pcap]


Github repo: https://github.com/ptmcg/logmerger
___
Python-announce-list mailing list -- python-announce-list@python.org
To unsubscribe send an email to python-announce-list-le...@python.org
https://mail.python.org/mailman3/lists/python-announce-list.python.org/
Member address: arch...@mail-archive.com


[Python-announce] logmerger 0.7.0

2023-10-08 Thread Paul McGuire
logmerger 0.7.0
===

Screenshot:
https://github.com/ptmcg/logmerger/blob/main/static/log1_log2_merged_tui_lr.jpg?raw=true


Use logmerger to view multiple log files, merged side-by-side with a common 
timeline using timestamps from the input files.

- merge ASCII log files
  - detects various formats of timestamps
  - detects multiline log messages
- merge .gz files without previously gunzip'ing
- merge .pcap files
- merge .csv files

Browse the merged logs using a textual-based TUI:
- vertical scrolling
- horizontal scrolling 
- search/find next/find previous
- go to line
- go to timestamp

The TUI runs in a plain terminal window, so it can be run over a regular SSH session.


Installation
------------

Install from PyPI:

pip install logmerger

For PCAP merging support:

pip install logmerger[pcap]


Github repo:  https://github.com/ptmcg/logmerger



[Python-announce] pyparsing 3.1.1 released

2023-07-30 Thread Paul McGuire
Thanks everyone for the great feedback on the 3.1.0 release! Caught some 
glaring regressions that slipped through my test suite. Just published version 
3.1.1: https://github.com/pyparsing/pyparsing/releases/tag/3.1.1

- Fixed regression in Word(min), reported by Ricardo Coccioli, good catch! 
(Issue #502)

- Fixed bug in bad exception messages raised by Forward expressions. PR 
submitted by Kyle Sunden, thanks for your patience and collaboration on this 
(#493).

- Fixed regression in SkipTo, where ignored expressions were not checked when 
looking for the target expression. Reported by catcombo, Issue #500.

- Fixed type annotation for enable_packrat, PR submitted by Mike Urbach, 
thanks! (Issue #498)

- Some general internal code cleanup. (Instigated by Michal Čihař, Issue #488)


[Python-announce] pyparsing 3.1.0 released

2023-06-18 Thread Paul McGuire
After several alpha and beta releases, I've finally pushed out version 3.1.0 of 
pyparsing. Here are the highlights:

NOTE: In the future release 3.2.0, use of many of the pre-PEP8 methods (such as 
`ParserElement.parseString`) will start to raise `DeprecationWarnings`. 3.2.0 
should get released some time later in 2023. I currently plan to completely 
drop the pre-PEP8 methods in pyparsing 4.0, though we won't see that release 
until at least late 2023 if not 2024. So there is plenty of time to convert 
existing parsers to the new function names before the old functions are 
completely removed. (Big help from Devin J. Pohly in structuring the code to 
enable this peaceful transition.)

Version 3.2.0 will also discontinue support for Python versions 3.6 and 3.7.


Version 3.1.0 - June, 2023
--

API CHANGES
---

- A slight change has been implemented when unquoting a quoted string parsed 
using the `QuotedString` class. Formerly, when unquoting and processing 
whitespace markers such as \t and \n, these substitutions would occur first, 
and then any additional '\' escaping would be done on the resulting string. 
This would parse "\\n" as a backslash followed by a newline. Now escapes and whitespace markers are 
all processed in a single pass working left to right, so the quoted string 
"\\n" would get unquoted to "\n" (a backslash followed by "n"). Fixes issue 
#474 raised by jakeanq, thanks!
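
  The two strategies can be contrasted with a simplified stdlib sketch (these helpers are hypothetical illustrations; pyparsing's real implementation differs):

```python
def unquote_two_pass(s):
    """Old behavior: whitespace markers first, then backslash escapes."""
    s = s.replace("\\t", "\t").replace("\\n", "\n")  # markers first
    return s.replace("\\\\", "\\")                   # then '\' escapes

def unquote_one_pass(s):
    """New behavior: one left-to-right pass handles escapes and markers."""
    out, i = [], 0
    markers = {"t": "\t", "n": "\n", "\\": "\\"}
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s) and s[i + 1] in markers:
            out.append(markers[s[i + 1]])
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)

raw = r"\\n"  # three characters: backslash, backslash, 'n'
print(repr(unquote_two_pass(raw)))  # backslash + newline (old behavior)
print(repr(unquote_one_pass(raw)))  # backslash + 'n'    (new behavior)
```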

- Reworked `delimited_list` function into the new `DelimitedList` class. 
`DelimitedList` has the same constructor interface as `delimited_list`, and in 
this release, `delimited_list` changes from a function to a synonym for 
`DelimitedList`. `delimited_list` and the older `delimitedList` method will be 
deprecated in a future release, in favor of `DelimitedList`.

- `ParserElement.validate()` is deprecated. It predates the support for 
left-recursive parsers, and was prone to false positives (warning that a 
grammar was invalid when it was in fact valid).  It will be removed in a future 
pyparsing release. In its place, developers should use debugging and analytical 
tools, such as `ParserElement.set_debug()` and 
`ParserElement.create_diagram()`. (Raised in Issue #444, thanks Andrea Micheli!)


NEW FEATURES AND ENHANCEMENTS
-

- `Optional(expr)` may now be written as `expr | ""`

  This will make this code:

  "{" + Optional(Literal("A") | Literal("a")) + "}"

  writable as:

  "{" + (Literal("A") | Literal("a") | "") + "}"

  Some related changes implemented as part of this work:
  - `Literal("")` now internally generates an `Empty()` (and no longer raises 
an exception)
  - `Empty` is now a subclass of `Literal`

  Suggested by Antony Lee (issue #412), PR (#413) by Devin J. Pohly.

- Added new class method `ParserElement.using_each`, to simplify code that 
creates a sequence of `Literals`, `Keywords`, or other `ParserElement` 
subclasses.

  For instance, to define suppressible punctuation, you would previously write:

  LPAR, RPAR, LBRACE, RBRACE, SEMI = map(Suppress, "(){};")

  You can now write:

  LPAR, RPAR, LBRACE, RBRACE, SEMI = Suppress.using_each("(){};")

  `using_each` will also accept optional keyword args, which it will pass 
through to the class initializer. Here is an expression for single-letter 
variable names that might be used in an algebraic expression:

  algebra_var = MatchFirst(
      Char.using_each(string.ascii_lowercase, as_keyword=True)
  )

- Added new builtin `python_quoted_string`, which will match any form of 
single-line or multiline quoted strings defined in Python. (Inspired by 
discussion with Andreas Schörgenhumer in Issue #421.)

- Extended `expr[]` notation for repetition of `expr` to accept a slice, where 
the slice's stop value indicates a `stop_on` expression:

  test = "BEGIN aaa bbb ccc END"
  BEGIN, END = Keyword.using_each("BEGIN END".split())
  body_word = Word(alphas)

  expr = BEGIN + Group(body_word[...:END]) + END
  # equivalent to
  # expr = BEGIN + Group(ZeroOrMore(body_word, stop_on=END)) + END

  print(expr.parse_string(test))

  Prints:

  ['BEGIN', ['aaa', 'bbb', 'ccc'], 'END']

- Added named field "url" to `pyparsing.common.url`, returning the entire 
parsed URL string.

- Added bool `embed` argument to `ParserElement.create_diagram()`. When passed 
as True, the resulting diagram will omit the `<html>`, `<head>`, and 
`<body>` tags so that it can be embedded in other HTML source. (Useful when 
embedding a call to `create_diagram()` in a PyScript HTML page.)

- Added `recurse` argument to `ParserElement.set_debug` to set the debug flag 
on an expression and all of its sub-expressions. Requested by multimeric in 
Issue #399.

- Added '·' (Unicode MIDDLE DOT) to the set of Latin1.identbodychars.

- `ParseResults` now has a new method `deepcopy()`, in addition to the current 
`copy()` method. `copy()` only makes a shallow copy - any contained 
`ParseResults` are copied as references - while `deepcopy()` also copies any 
contained `ParseResults`.

[Python-announce] Pyparsing 3.1.0b2 released (final beta!)

2023-05-20 Thread Paul McGuire
I just pushed release 3.1.0b2 of pyparsing. 3.1.0 with some fixes to bugs that 
came up in the past few weeks - testing works!

If your project uses pyparsing, please please *please* download this beta 
release (using "pip install -U pyparsing==3.1.0b2") and open any compatibility 
issues you might have at the pyparsing GitHub repo 
(https://github.com/pyparsing/pyparsing).

In the absence of any dealbreakers, I'll make the final release in June.

You can view the changes here: 
https://github.com/pyparsing/pyparsing/blob/master/CHANGES


[Python-announce] Pyparsing 3.1.0b1 released

2023-04-10 Thread Paul McGuire
I just pushed release 3.1.0b1 of pyparsing. 3.1.0 will include support for 
python 3.12, and will be the last release to support 3.6 and 3.7.

If your project uses pyparsing, *please* download this beta release (using "pip 
install -U pyparsing==3.1.0b1") and open any compatibility issues you might 
have at the pyparsing GitHub repo (https://github.com/pyparsing/pyparsing).

You can view the changes here: 
https://github.com/pyparsing/pyparsing/blob/master/CHANGES


[issue27822] Fail to create _SelectorTransport with unbound socket

2016-08-21 Thread Paul McGuire

Paul McGuire added the comment:

Patch file attached.

--
keywords: +patch
Added file: http://bugs.python.org/file44182/ptm_27822.patch

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27822>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue27746] ResourceWarnings in test_asyncio

2016-08-21 Thread Paul McGuire

Paul McGuire added the comment:

Ok, I will submit as a separate issue.

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27746>
___



[issue27822] Fail to create _SelectorTransport with unbound socket

2016-08-21 Thread Paul McGuire

Paul McGuire added the comment:

(issue applies to both 3.5.2 and 3.6)

--
versions: +Python 3.5

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27822>
___



[issue27746] ResourceWarnings in test_asyncio

2016-08-21 Thread Paul McGuire

Paul McGuire added the comment:

I was about to report this same issue - I get the error message even though I 
explicitly call transport.close(): 

C:\Python35\lib\asyncio\selector_events.py:582: ResourceWarning: unclosed 
transport <_SelectorDatagramTransport closing fd=232>

It looks like the _sock attribute of the Transport subclasses must be set to 
None in their close() methods. (The presence of a non-None _sock is used 
elsewhere as an indicator of whether the transport has been closed or not.)

--
nosy: +Paul McGuire

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27746>
___



[issue27822] Fail to create _SelectorTransport with unbound socket

2016-08-21 Thread Paul McGuire

Paul McGuire added the comment:

To clarify how I'm using a socket without a bound address, I am specifying the 
destination address in the call to transport.sendto(), so there is no address 
on the socket itself, hence getsockname() fails.
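
The scenario - an unbound UDP sender that supplies the destination per call - can be reproduced with a short stdlib sketch:

```python
import socket

# A receiver bound to an ephemeral local port, so we have a destination.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))

# The sender is never bound: the destination address is passed to sendto(),
# so the socket carries no local address until the OS assigns one implicitly.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"ping", receiver.getsockname())

data, addr = receiver.recvfrom(64)
print(data)  # b'ping'
sender.close()
receiver.close()
```

On Linux, calling sender.getsockname() before the first send returns ('0.0.0.0', 0); on Windows, per the report above, it raises the WinError 10022 OSError.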

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27822>
___



[issue27822] Fail to create _SelectorTransport with unbound socket

2016-08-21 Thread Paul McGuire

New submission from Paul McGuire:

In writing a simple UDP client using asyncio, I tripped over a call to 
getsockname() in the _SelectorTransport class in asyncio/selector_events.py.


    def __init__(self, loop, sock, protocol, extra=None, server=None):
        super().__init__(extra, loop)
        self._extra['socket'] = sock
        self._extra['sockname'] = sock.getsockname()

Since this is a sending-only client, the socket does not get bound to an 
address. On Linux, this is not a problem; getsockname() will return ('0.0.0.0', 
0) for IPV4, ('::', 0, 0, 0) for IPV6, and so on. But on Windows, a socket that 
is not bound to an address will raise this error when getsockname() is called:

OSError: [WinError 10022] An invalid argument was supplied

This forces me to write a wrapper for the socket to intercept getsockname() and 
return None.
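
Such a wrapper might look like the following (a hypothetical sketch, not the exact code from the report):

```python
import socket

class SocknameSafeSocket:
    """Delegate everything to a real socket, but never let getsockname() raise."""

    def __init__(self, sock):
        self._sock = sock

    def getsockname(self):
        try:
            return self._sock.getsockname()
        except OSError:
            # Unbound socket on Windows: report no address instead of raising.
            return None

    def __getattr__(self, name):
        # Forward all other attribute access to the wrapped socket.
        return getattr(self._sock, name)
```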

In asyncio/proactor_events.py, this is guarded against, with this code in the 
_ProactorSocketTransport class:

    try:
        self._extra['sockname'] = sock.getsockname()
    except (socket.error, AttributeError):
        if self._loop.get_debug():
            logger.warning("getsockname() failed on %r",
                           sock, exc_info=True)


Please add similar guarding code to the _SelectorTransport class in 
asyncio/selector_events.py.

--
components: asyncio
messages: 273290
nosy: Paul McGuire, gvanrossum, haypo, yselivanov
priority: normal
severity: normal
status: open
title: Fail to create _SelectorTransport with unbound socket
type: behavior
versions: Python 3.6

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27822>
___



Re: Python re to extract useful information from each line

2015-08-19 Thread Paul McGuire
Here is a first shot at a pyparsing parser for these lines:

from pyparsing import *
SET,POLICY,ID,FROM,TO,NAT,SRC,DST,IP,PORT,SCHEDULE,LOG,PERMIT,ALLOW,DENY = map(
    CaselessKeyword,
    "SET,POLICY,ID,FROM,TO,NAT,SRC,DST,IP,PORT,SCHEDULE,LOG,PERMIT,ALLOW,DENY".split(','))

integer = Word(nums)
ipAddr = Combine(integer + ('.'+integer)*3)
quotedString.setParseAction(removeQuotes)

logParser = (SET + POLICY + ID + integer('id') +
             FROM + quotedString('from_') +
             TO + quotedString('to_') + quotedString('service'))


I run this with:

for line in """\
    1- set policy id 1000 from "Untrust" to "Trust" "Any" "1.1.1.1" "HTTP" nat dst ip 10.10.10.10 port 8000 permit log
    2- set policy id 5000 from "Trust" to "Untrust" "Any" "microsoft.com" "HTTP" nat src permit schedule "14August2014" log
    3- set policy id 7000 from "Trust" to "Untrust" "Users" "Any" "ANY" nat src dip-id 4 permit log
    4- set policy id 7000 from "Trust" to "Untrust" "servers" "Any" "ANY" deny
    """.splitlines():
    line = line.strip()
    if not line: continue
    print (integer + '-' + logParser).parseString(line).dump()
    print

Getting:

['1', '-', 'SET', 'POLICY', 'ID', '1000', 'FROM', 'Untrust', 'TO', 'Trust', 
'Any']
- from_: Untrust
- id: 1000
- service: Any
- to_: Trust

['2', '-', 'SET', 'POLICY', 'ID', '5000', 'FROM', 'Trust', 'TO', 'Untrust', 
'Any']
- from_: Trust
- id: 5000
- service: Any
- to_: Untrust

['3', '-', 'SET', 'POLICY', 'ID', '7000', 'FROM', 'Trust', 'TO', 'Untrust', 
'Users']
- from_: Trust
- id: 7000
- service: Users
- to_: Untrust

['4', '-', 'SET', 'POLICY', 'ID', '7000', 'FROM', 'Trust', 'TO', 'Untrust', 
'servers']
- from_: Trust
- id: 7000
- service: servers
- to_: Untrust


Pyparsing includes an Optional class so that you can add expressions for pieces 
that might be missing, like ... + Optional(NAT + (SRC | DST)) + ...

-- Paul
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: python command not working

2015-08-17 Thread Paul McGuire
On Friday, August 14, 2015 at 6:13:37 AM UTC-5, sam.h...@gmail.com wrote:
> On Wednesday, April 22, 2009 at 8:36:21 AM UTC+1, David Cournapeau wrote:
>> On Wed, Apr 22, 2009 at 4:20 PM, 83nini <83n...@gmail.com> wrote:
>>> Hi guys,
>>>
>>> I'm new to python, i downloaded version 2.5, opened windows (vista)
>>> command line and wrote python, this should take me to the python

<snip>

> You can do it easily by adding the Python path (in my case C:\Python27) to
> your system PATH.

This thread is > 6 years old, OP has probably gone on to other things...


Who is using littletable?

2015-08-17 Thread Paul McGuire
littletable is a little module I knocked together a few years ago, found it 
sort of useful, so uploaded to SF and PyPI.  The download traffic at SF is very 
light, as I expected, but PyPI shows > 3000 downloads in the past month!  Who 
*are* all these people?

In my own continuing self-education, it is interesting to see overlap in the 
basic goals in littletable, and the much more widely known pandas module (with 
littletable being more lightweight/freestanding, not requiring numpy, but 
correspondingly not as snappy).

I know Adam Sah uses (or at least used to use) littletable as an in-memory 
product catalog for his website Buyer's Best friend 
(http://www.bbfdirect.com/).  Who else is out there, and what enticed you to 
use this little module?

-- Paul
-- 
https://mail.python.org/mailman/listinfo/python-list


ANN: pyparsing 2.0.2 released

2014-04-19 Thread Paul McGuire
I'm happy to announce a new release of pyparsing, version 2.0.2. 
This release contains some small enhancements and some bugfixes.


Change summary:
---
- Extended expr(name) shortcut (same as expr.setResultsName(name))
  to accept expr() as a shortcut for expr.copy().

- Added locatedExpr(expr) helper, to decorate any returned tokens
  with their location within the input string. Adds the results names
  locn_start and locn_end to the output parse results.

- Added pprint() method to ParseResults, to simplify troubleshooting
  and prettified output. Now instead of importing the pprint module
  and then writing pprint.pprint(result), you can just write
  result.pprint().  This method also accepts additional positional and
  keyword arguments (such as indent, width, etc.), which get passed 
  through directly to the pprint method 
  (see http://docs.python.org/2/library/pprint.html#pprint.pprint).

- Removed deprecation warnings when using '<<' for Forward expression
  assignment. '<<=' is still preferred, but '<<' will be retained
  for cases where the '<<=' operator is not suitable (such as in defining
  lambda expressions).

- Expanded argument compatibility for classes and functions that
  take list arguments, to now accept generators as well.

- Extended list-like behavior of ParseResults, adding support for
  append and extend. NOTE: if you have existing applications using
  these names as results names, you will have to access them using
  dict-style syntax: res['append'] and res['extend']

- ParseResults emulates the change in list vs. iterator semantics for
  methods like keys(), values(), and items(). Under Python 2.x, these
  methods will return lists, under Python 3.x, these methods will 
  return iterators.

- ParseResults now has a method haskeys() which returns True or False
  depending on whether any results names have been defined. This simplifies
  testing for the existence of results names under Python 3.x, which 
  returns keys() as an iterator, not a list.

- ParseResults now supports both list and dict semantics for pop().
  If passed no argument or an integer argument, it will use list semantics
  and pop tokens from the list of parsed tokens. If passed a non-integer
  argument (most likely a string), it will use dict semantics and 
  pop the corresponding value from any defined results names. A
  second default return value argument is supported, just as in 
  dict.pop().

- Fixed bug in markInputline, thanks for reporting this, Matt Grant!

- Cleaned up my unit test environment, now runs with Python 2.6 and 
  3.3.



Download pyparsing 2.0.2 at http://sourceforge.net/projects/pyparsing/,
or use 'easy_install pyparsing'. You can also access pyparsing's 
epydoc documentation online at http://packages.python.org/pyparsing/.

The pyparsing Wiki is at http://pyparsing.wikispaces.com.

-- Paul


Pyparsing is a pure-Python class library for quickly developing
recursive-descent parsers.  Parser grammars are assembled directly in
the calling Python code, using classes such as Literal, Word,
OneOrMore, Optional, etc., combined with operators '+', '|', and '^'
for And, MatchFirst, and Or.  No separate code-generation or external
files are required.  Pyparsing can be used in many cases in place of
regular expressions, with shorter learning curve and greater
readability and maintainability.  Pyparsing comes with a number of
parsing examples, including:
- Hello, World! (English, Korean, Greek, and Spanish(new))
- chemical formulas
- Verilog parser
- Google protobuf parser
- time expression parser/evaluator
- configuration file parser
- web page URL extractor
- 5-function arithmetic expression parser
- subset of CORBA IDL
- chess portable game notation
- simple SQL parser
- Mozilla calendar file parser
- EBNF parser/compiler
- Python value string parser (lists, dicts, tuples, with nesting)
  (safe alternative to eval)
- HTML tag stripper
- S-expression parser
- macro substitution preprocessor
-- 
https://mail.python.org/mailman/listinfo/python-announce-list

Support the Python Software Foundation:
http://www.python.org/psf/donations/


[ANN] pyparsing 2.0.1 released - compatible with Python 2.6 and later

2013-07-21 Thread Paul McGuire
In my releasing of Pyparsing 1.5.7/2.0.0 last November, I started to split 
supported Python versions: 2.x to the Pyparsing 1.5.x track, and 3.x to the 
Pyparsing 2.x track. Unfortunately, this caused a fair bit of pain for many 
current users of Python 2.6 and 2.7 (especially those using libs dependent on 
pyparsing), as the default installed pyparsing version using easy_install or 
pip would be the incompatible-to-them pyparsing 2.0.0.

I hope I have rectified (or at least improved) this situation with the latest 
release of pyparsing 2.0.1. Version 2.0.1 takes advantage of the 
cross-major-version compatibility that was planned into Python, wherein many of 
the new features of Python 3.x were made available in Python 2.6 and 2.7. By 
avoiding the one usage of ‘nonlocal’ (a Python 3.x feature not available in any 
Python 2.x release), I’ve been able to release pyparsing 2.0.1 in a form that 
will work for all those using Python 2.6 and later. (If you are stuck on 
version 2.5 or earlier of Python, then you still have to explicitly download 
the 1.5.7 version of pyparsing.)

This release also includes a bugfix to the new '<<=' operator, so that '<<' for 
attachment of parser definitions to Forward instances can be deprecated in 
favor of '<<='.

Hopefully, most current users using pip and easy_install can now just install 
pyparsing 2.0.1, and it will be sufficiently version-aware to function under 
all Pythons 2.6 and later.

Thanks for your continued support and interest in pyparsing!

-- Paul McGuire




Re: Problem installing Pyparsing

2013-07-20 Thread Paul McGuire
Pyparsing 2.0.1 fixes this incompatibility, and should work with all versions 
of Python 2.6 and later.

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


ANNOUNCE: pyparsing 1.5.7/2.0.0

2012-12-16 Thread Paul McGuire
With the release of version 2.0.0/1.5.7, pyparsing has now officially switched 
to Python 3.x support as its default installation environment. Python 2.x users 
can install the latest 1.5.7 release. (If you're using easy_install, do 
easy_install pyparsing==1.5.7.)

I'm taking this opportunity to do some minor API tweaking too, renaming some 
operators and method names that I got wrong earlier (the old operators and 
methods are still there for now for compatibility, but they are deprecated to 
be removed in a future release):

- Added new operator '<<=', which will eventually replace '<<' for 
  storing the contents of a Forward(). '<<=' does not have the same
  operator precedence problems that '<<' does.

- 'operatorPrecedence' is being renamed 'infixNotation' as a better
  description of what this helper function creates. 'operatorPrecedence'
  is deprecated, and will be dropped entirely in a future release.

Several bug-fixes are included, plus several new examples, *and* an awesome 
example submitted by Luca Dell'Olio, for parsing ANTLR grammar definitions and 
implementing them with pyparsing objects.

---
Pyparsing wiki: pyparsing.wikispaces.com
SVN checkout: 
  (latest) svn checkout 
https://pyparsing.svn.sourceforge.net/svnroot/pyparsing/trunk pyparsing
  (1.5.x branch) svn checkout 
https://pyparsing.svn.sourceforge.net/svnroot/pyparsing/branches/pyparsing_1.5.x
 pyparsing
  


ANN: pyparsing 1.5.6 released!

2011-07-01 Thread Paul McGuire
After about 10 months, there is a new release of pyparsing, version
1.5.6.  This release contains some small enhancements, some bugfixes,
and some new examples.

Most notably, this release includes the first public release of the
Verilog parser.  I have tired of restricting this parser for
commercial use, and so I am distributing it under the same license as
pyparsing, with the request that if you use it for commercial use,
please make a commensurate donation to your local Red Cross.

Change summary:
---
- Cleanup of parse action normalizing code, to be more version-tolerant,
  and robust in the face of future Python versions - much thanks to
  Raymond Hettinger for this rewrite!

- Removal of exception cacheing, addressing a memory leak condition
  in Python 3. Thanks to Michael Droettboom and the Cape Town PUG for
  their analysis and work on this problem!

- Fixed bug when using packrat parsing, where a previously parsed
  expression would duplicate subsequent tokens - reported by Frankie
  Ribery on stackoverflow, thanks!

- Added 'ungroup' helper method, to address token grouping done
  implicitly by And expressions, even if only one expression in the
  And actually returns any text - also inspired by stackoverflow
  discussion with Frankie Ribery!

- Fixed bug in srange, which accepted escaped hex characters of the
  form '\0x##', but should be '\x##'.  Both forms will be supported
  for backwards compatibility.

- Enhancement to countedArray, accepting an optional expression to be
  used for matching the leading integer count - proposed by Mathias on
  the pyparsing mailing list, good idea!

- Added the Verilog parser to the provided set of examples, under the
  MIT license.  While this frees up this parser for any use, if you
find
  yourself using it in a commercial purpose, please consider making a
  charitable donation as described in the parser's header.

- Added the excludeChars argument to the Word class, to simplify defining
  a word composed of all characters in a large range except for one or
  two. Suggested by JesterEE on the pyparsing wiki.

- Added optional overlap parameter to scanString, to return overlapping
  matches found in the source text.

- Updated oneOf internal regular expression generation, with improved
  parse time performance.

- Slight performance improvement in transformString, removing empty
  strings from the list of string fragments built while scanning the
  source text, before calling ''.join.  Especially useful when using
  transformString to strip out selected text.

- Enhanced form of using the expr('name') style of results naming,
  in lieu of calling setResultsName.  If name ends with an '*', then
  this is equivalent to
expr.setResultsName('name',listAllMatches=True).

- Fixed up internal list flattener to use iteration instead of recursion,
  to avoid stack overflow when transforming large files.

- Added other new examples:
  . protobuf parser - parses Google's protobuf language
  . btpyparse - a BibTex parser contributed by Matthew Brett,
with test suite test_bibparse.py (thanks, Matthew!)
  . groupUsingListAllMatches.py - demo using trailing '*' for results
names


Download pyparsing 1.5.6 at http://sourceforge.net/projects/pyparsing/,
or use 'easy_install pyparsing'. You can also access pyparsing's
epydoc documentation online at http://packages.python.org/pyparsing/.

The pyparsing Wiki is at http://pyparsing.wikispaces.com.

-- Paul


Pyparsing is a pure-Python class library for quickly developing
recursive-descent parsers.  Parser grammars are assembled directly in
the calling Python code, using classes such as Literal, Word,
OneOrMore, Optional, etc., combined with operators '+', '|', and '^'
for And, MatchFirst, and Or.  No separate code-generation or external
files are required.  Pyparsing can be used in many cases in place of
regular expressions, with shorter learning curve and greater
readability and maintainability.  Pyparsing comes with a number of
parsing examples, including:
- Hello, World! (English, Korean, Greek, and Spanish(new))
- chemical formulas
- Verilog parser
- Google protobuf parser
- time expression parser/evaluator
- configuration file parser
- web page URL extractor
- 5-function arithmetic expression parser
- subset of CORBA IDL
- chess portable game notation
- simple SQL parser
- Mozilla calendar file parser
- EBNF parser/compiler
- Python value string parser (lists, dicts, tuples, with nesting)
  (safe alternative to eval)
- HTML tag stripper
- S-expression parser
- macro substitution preprocessor
-- 
http://mail.python.org/mailman/listinfo/python-list


[ANN] littletable 0.3 release

2010-10-24 Thread Paul McGuire
Announcing the 0.3 release of littletable (the module formerly known
as dulce).  This version includes (thanks to much help from Colin
McPhail -- thanks, Colin!):
- support for namedtuples as table objects
- Python 3 compatibility
- Table.pivot() to summarize record counts by 1 or 2 table attributes

littletable (formerly dulce) is a simple ORM-like wrapper for managing
collections of Python objects like relational tables.  No schema
definition is used; instead table columns are introspected from the
attributes of objects inserted into the table, and inferred from index
and query parameters.  Tables can be:
- indexed
- queried
- joined
- pivoted
- imported from/exported to .CSV files

Also, every query or join returns a new full-fledged littletable Table
- no distinction of Tables vs. DataSets vs. RecordSets vs. whatever.
So it is easy to build up a complex database analysis from a
succession of joins and queries.

littletable is a simple environment for experimenting with tables,
joins, and indexing, with a minimum of startup overhead.  You can
download littletable at http://sourceforge.net/projects/littletable/ -
htmldocs can be viewed at http://packages.python.org/littletable/.


ANN: dulce 0.1 - in-memory schema-less relational database

2010-10-17 Thread Paul McGuire
dulce is a "syntactic sweet" wrapper for managing collections of
Python objects like relational tables.  No schema definition is used;
instead table columns are introspected from the attributes of objects
inserted into the table, and inferred from index and query
parameters.  dulce's Tables can be:
- indexed
- queried
- joined
- imported from/exported to .CSV files

Also, every query or join returns a new full-fledged dulce Table - no
distinction of Tables vs. DataSets vs. RecordSets vs. whatever.  So it
is easy to build up a complex database analysis from a succession of
joins and queries.

dulce is a simple environment for experimenting with tables, joins,
and indexing, with a minimum of startup overhead.  You can download
dulce at http://sourceforge.net/projects/pythondulce/ - htmldocs can
be viewed at http://ptmcg.zapto.org/dulce/htmldoc/index.html.

-- Paul


Re: mutate dictionary or list

2010-09-08 Thread Paul McGuire
On Sep 7, 7:05 am, Baba raoul...@gmail.com wrote:
 Hi

 I am working on an exercise which requires me to write a funtion that
 will check if a given word can be found in a given dictionary (the
 hand).

 def is_valid_word(word, hand, word_list):
     """
     Returns True if word is in the word_list and is entirely
     composed of letters in the hand. Otherwise, returns False.
     Does not mutate hand or word_list.
     """

 I don't understand this part: "Does not mutate hand or word_list"


I would re-read your exercise description.  "hand" is *not* a
dictionary, but is most likely a list of individual letters.
"word_list" too is probably *not* a dictionary, but a list of valid
words (although this does bear a resemblance to what people in
everyday life call a dictionary).  Where did you get the idea that
there was a dictionary in this problem?

The "Does not mutate hand or word_list." is a constraint that you are
not allowed to update the hand or word_list arguments.  For instance,
you must not call word_list.sort() in order to search for the given
word using some sort of binary search.  You must not determine if all
the letters in word come from hand by modifying the hand list (like
dropping letters from hand as they are found in word).  There are ways
to copy arguments if you use a destructive process on their contents,
so that the original stays unmodified - but that sounds like part of
the exercise for you to learn about.
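For illustration only (this sketch assumes hand is a list of letters and word_list a list of words, as described above, and is not the exercise's official solution), a non-mutating check can be built on the standard library's collections.Counter:

```python
from collections import Counter

def is_valid_word(word, hand, word_list):
    """Return True if word is in word_list and can be spelled entirely
    from the letters in hand; neither hand nor word_list is mutated."""
    if word not in word_list:
        return False
    hand_counts = Counter(hand)  # copies letter frequencies; hand untouched
    word_counts = Counter(word)
    return all(hand_counts[letter] >= count
               for letter, count in word_counts.items())

hand = list("aabbit")
words = ["bait", "rabbit"]
print(is_valid_word("bait", hand, words))    # True
print(is_valid_word("rabbit", hand, words))  # False: no 'r' in hand
print(hand == list("aabbit"))                # True: hand was not mutated
```

Because Counter builds its own frequency table, the original hand list survives unchanged, which is exactly the constraint the exercise is stating.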

-- Paul


Re: Tag parsing in python

2010-08-29 Thread Paul McGuire
On Aug 28, 11:23 pm, Paul McGuire pt...@austin.rr.com wrote:
 On Aug 28, 11:14 am, agnibhu dee...@gmail.com wrote:





  Hi all,

  I'm a newbie in python. I'm trying to create a library for parsing
  certain keywords.
  For example say I've key words like abc: bcd: cde: like that... So the
  user may use like
  "abc: How are you bcd: I'm fine cde: ok"

  So I've to extract the "How are you" and "I'm fine" and "ok"..and
  assign them to "abc:", "bcd:" and "cde:" respectively.. There may be
  combinations of keywords introduced in future, like "abc: xy: How are
  you"
  So new keywords qualifying the other keywords so on..

I got to thinking more about your keywords-qualifying-keywords
example, and I thought this would be a good way to support
locale-specific tags.  I also thought how one might want to have tags
within tags, to be substituted later, requiring a "abc::" escaped form
of "abc:", so that the tag is substituted with the value of tag "abc:"
as a late binding.

Wasn't too hard to modify what I posted yesterday, and now I rather
like it.

-- Paul


# tag_substitute.py

from pyparsing import (Combine, Word, alphas, FollowedBy, Group, OneOrMore,
    empty, SkipTo, LineEnd, Optional, Forward, MatchFirst, Literal,
    And, replaceWith)

tag = Combine(Word(alphas) + ~FollowedBy("::") + ":")
tag_defn = (Group(OneOrMore(tag))("tag") + empty
    + SkipTo(tag | LineEnd())("body") + Optional(LineEnd().suppress()))


# now combine macro detection with substitution
macros = {}
macro_substitution = Forward()
def make_macro_sub(tokens):
    # unescape '::' and substitute any embedded tags
    tag_value = macro_substitution.transformString(
        tokens.body.replace("::", ":"))

    # save this tag and value (or overwrite previous)
    macros[tuple(tokens.tag)] = tag_value

    # define overall macro substitution expression
    macro_substitution << MatchFirst(
        [(Literal(k[0]) if len(k) == 1
            else And([Literal(kk) for kk in k])).setParseAction(replaceWith(v))
         for k, v in macros.items()]) + ~FollowedBy(tag)

    # return empty string, so macro definitions don't show up in final
    # expanded text
    return ""

tag_defn.setParseAction(make_macro_sub)

# define pattern for macro scanning
scan_pattern = macro_substitution | tag_defn


sorry = """\
nm: Dave
sorry: en: I'm sorry, nm::, I'm afraid I can't do that.
sorry: es: Lo siento nm::, me temo que no puedo hacer eso.
Hal said, sorry: en:
Hal dijo, sorry: es: """
print scan_pattern.transformString(sorry)

Prints:

Hal said, I'm sorry, Dave, I'm afraid I can't do that.
Hal dijo, Lo siento Dave, me temo que no puedo hacer eso.


Re: Tag parsing in python

2010-08-28 Thread Paul McGuire
On Aug 28, 11:14 am, agnibhu dee...@gmail.com wrote:
 Hi all,

 I'm a newbie in python. I'm trying to create a library for parsing
 certain keywords.
 For example say I've key words like abc: bcd: cde: like that... So the
 user may use like
 "abc: How are you bcd: I'm fine cde: ok"

 So I've to extract the "How are you" and "I'm fine" and "ok"..and
 assign them to "abc:", "bcd:" and "cde:" respectively.. There may be
 combinations of keywords introduced in future, like "abc: xy: How are
 you"
 So new keywords qualifying the other keywords so on..
 So I would like to know the python way of doing this. Is there any
 library already existing for making my work easier?

 ~
 Agnibhu

Here's how pyparsing can parse your keyword/tags:

from pyparsing import (Combine, Word, alphas, Group, OneOrMore, empty,
    SkipTo, LineEnd)

text1 = "abc: How are you bcd: I'm fine cde: ok"
text2 = "abc: xy: How are you"

tag = Combine(Word(alphas) + ":")
tag_defn = (Group(OneOrMore(tag))("tag") + empty
    + SkipTo(tag | LineEnd())("body"))

for text in (text1, text2):
    print text
    for td in tag_defn.searchString(text):
        print td.dump()
    print

Prints:

abc: How are you bcd: I'm fine cde: ok
[['abc:'], 'How are you']
- body: How are you
- tag: ['abc:']
[['bcd:'], "I'm fine"]
- body: I'm fine
- tag: ['bcd:']
[['cde:'], 'ok']
- body: ok
- tag: ['cde:']

abc: xy: How are you
[['abc:', 'xy:'], 'How are you']
- body: How are you
- tag: ['abc:', 'xy:']



Now here's how to further use pyparsing to actually use those tags as
substitution macros:

from pyparsing import (Forward, MatchFirst, Literal, And, replaceWith,
    FollowedBy)

# now combine macro detection with substitution
macros = {}
macro_substitution = Forward()
def make_macro_sub(tokens):
    macros[tuple(tokens.tag)] = tokens.body

    # define macro substitution
    macro_substitution << MatchFirst(
        [(Literal(k[0]) if len(k) == 1
            else And([Literal(kk) for kk in k])).setParseAction(replaceWith(v))
         for k, v in macros.items()]) + ~FollowedBy(tag)

    return ""
tag_defn.setParseAction(make_macro_sub)

scan_pattern = macro_substitution | tag_defn

test_text = (text1 + "\nBob said, 'abc:?' I said, 'bcd:.'" + text2
    + "\nThen Bob said 'abc: xy:?'")

print test_text
print scan_pattern.transformString(test_text)


Prints:

abc: How are you bcd: I'm fine cde: ok
Bob said, 'abc:?' I said, 'bcd:.'abc: xy: How are you
Then Bob said 'abc: xy:?'

Bob said, 'How are you?' I said, 'I'm fine.'
Then Bob said 'How are you?'



Re: Is '[' a function or an operator or a language feature?

2010-07-17 Thread Paul McGuire
On Jul 16, 12:01 pm, Peng Yu pengyu...@gmail.com wrote:
 I mean to get the man page for '[' like in the following code.

 x=[1,2,3]

 But help('[') doesn't seem to give the above usage.

 ###
 Mutable Sequence Types
 **

 List objects support additional operations that allow in-place
 modification of the object. Other mutable sequence types (when added
 to the language) should also support these operations. Strings and
 tuples are immutable sequence types: such objects cannot be modified
 once created. The following operations are defined on mutable sequence
 types (where *x* is an arbitrary object):
 ...
 ##

 I then checked help('LISTLITERALS'), which gives some description that
 is available from the language reference. So '[' in x=[1,2,3] is
 considered as a language feature rather than a function or an
 operator?

 
 List displays
 *

 A list display is a possibly empty series of expressions enclosed in
 square brackets:

     list_display        ::= "[" [expression_list | list_comprehension] "]"
     list_comprehension  ::= expression list_for
     list_for            ::= "for" target_list "in" old_expression_list [list_iter]
     old_expression_list ::= old_expression [("," old_expression)+ [","]]
     list_iter           ::= list_for | list_if
     list_if             ::= "if" old_expression [list_iter]
 .
 ###
 --
 Regards,
 Peng

Also look for __getitem__ and __setitem__; these methods, defined on
your own container classes, will allow you to write myobject['x'] and
have your own custom lookup code get run.
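A minimal sketch of those hooks (the LoggedDict class here is hypothetical, purely to illustrate how bracket syntax dispatches to the dunder methods):

```python
class LoggedDict(object):
    """Toy mapping that runs custom code on every bracket access."""
    def __init__(self):
        self._data = {}
        self.reads = 0
    def __getitem__(self, key):
        # obj['x'] on the right-hand side of an expression calls this
        self.reads += 1
        return self._data[key]
    def __setitem__(self, key, value):
        # obj['x'] = value calls this
        self._data[key] = value

d = LoggedDict()
d['x'] = 42      # invokes __setitem__
print(d['x'])    # invokes __getitem__, prints 42
print(d.reads)   # 1
```

So `[` in an indexing expression is neither a function nor an operator you can look up directly; it is syntax that the interpreter translates into these method calls.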

-- Paul


Re: nicer way to remove prefix of a string if it exists

2010-07-14 Thread Paul McGuire
On Jul 13, 6:49 pm, News123 news1...@free.fr wrote:
 I wondered about a potentially nicer way of removing a prefix of a
 string if it exists.


Here is an iterator solution:

from itertools import izip

def trim_prefix(prefix, s):
i1,i2 = iter(prefix),iter(s)
if all(c1==c2 for c1,c2 in izip(i1,i2)):
return ''.join(i2)
return s

print trim_prefix(ABC,ABCDEFGHI)
print trim_prefix(ABC,SLFJSLKFSLJFLSDF)


Prints:

DEFGHI
SLFJSLKFSLJFLSDF
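In modern Python the same idea is a two-liner with str.startswith (and from Python 3.9 onward, str.removeprefix does it directly); a sketch:

```python
def trim_prefix(prefix, s):
    # slice off the prefix only when the string actually starts with it
    return s[len(prefix):] if s.startswith(prefix) else s

print(trim_prefix("ABC", "ABCDEFGHI"))         # DEFGHI
print(trim_prefix("ABC", "SLFJSLKFSLJFLSDF"))  # SLFJSLKFSLJFLSDF
```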


-- Paul


Re: The real problem with Python 3 - no business case for conversion (was I strongly dislike Python 3)

2010-07-07 Thread Paul McGuire
On Jul 6, 3:30 am, David Cournapeau courn...@gmail.com wrote:
 On Tue, Jul 6, 2010 at 4:30 AM, D'Arcy J.M. Cain da...@druid.net wrote:

 One thing that would be very useful is how to maintain something that
 works on 2.x and 3.x, but not limiting yourself to 2.6. Giving up
 versions below 2.6 is out of the question for most projects with a
 significant userbase IMHO. As such, the idea of running the python 3
 warnings is not so useful IMHO - unless it could be made to work
 better for python 2.x < 2.6, but I am not sure the idea even makes
 sense.

This is exactly how I felt about my support for pyparsing, that I was
trying continue to provide support for 2.3 users, up through 3.x
users, with a single code base.  (This would actually have been
possible if I had been willing to introduce a performance penalty for
Python 2 users, but performance is such a critical issue for parsing I
couldn't justify it to myself.)  This meant that I had to constrain my
implementation, while trying to incorporate forward-looking support
features (such as __bool__ and __dir__), which have no effect on older
Python versions, but support additions in newer Pythons.  I just
couldn't get through on the python-dev list that I couldn't just
upgrade my code to 2.6 and then use 2to3 to keep in step across the
2-3 chasm, as this would leave behind my faithful pre-2.6 users.

Here are some of the methods I used:

- No use of sets.  Instead I defined a very simple set simulation
using dict keys, which could be interchanged with set for later
versions.

- No generator expressions, only list comprehensions.

- No use of decorators.  BUT, pyparsing includes a decorator method,
traceParseAction, which can be used by users with later Pythons as
@traceParseAction in their own code.

- No print statements.  As pyparsing is intended to be an internal
module, it does no I/O as part of its function - it only processes a
given string, and returns a data structure.

- Python 2-3 compatible exception syntax.  This may have been my
trickiest step.  The change of syntax for except from

except ExceptionType, ex:

to:

except ExceptionType as ex:

is completely forward and backward incompatible.  The workaround is to
rewrite as:

except ExceptionType:
    ex = sys.exc_info()[1]

which works just fine in 2.x and 3.x.  However, there is a slight
performance penalty in doing this, and pyparsing uses exceptions as
part of its grammar success/failure signalling and backtracking; I've
used this technique everywhere I can get away with it, but there is
one critical spot where I can't use it, so I have to keep 2 code bases
with slight differences between them.
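The version-agnostic spelling can be exercised with a small self-contained sketch (the parse_int function is hypothetical, just to demonstrate the pattern; note that sys.exc_info()[1] is the exception instance itself):

```python
import sys

def parse_int(text):
    try:
        return int(text)
    except ValueError:
        # version-agnostic: avoids both the 2.x-only "except E, ex"
        # and the 2.6+ "except E as ex" syntax
        ex = sys.exc_info()[1]
        return "failed: %s" % type(ex).__name__

print(parse_int("42"))     # 42
print(parse_int("forty"))  # failed: ValueError
```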

- Implement __bool__, followed by __nonzero__ = __bool__.  This will
give you boolean support for your classes in 2.3-3.1.

- Implement __dir__, which is unused by old Pythons, but supports
customization of dir() output for your own classes.

- Implement __len__, __contains__, __iter__ and __reversed__ for
container classes.

- No ternary expressions.  Not too difficult really, there are several
well-known workarounds for this, either by careful use of and's and
or's, or using the bool-as-int to return the value from
(falseValue,trueValue)[condition].
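The tuple-indexing workaround relies on False/True being usable as the integers 0/1; a quick sketch (sign_label is a made-up example):

```python
def sign_label(n):
    # tuple-indexing substitute for: "non-negative" if n >= 0 else "negative"
    # (n >= 0) evaluates to False/True, which indexes the tuple as 0/1
    return ("negative", "non-negative")[n >= 0]

print(sign_label(3))   # non-negative
print(sign_label(-2))  # negative
```

Unlike a true conditional expression, both tuple elements are always evaluated, so this workaround is unsafe when either alternative has side effects or can raise.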

- Define a version-sensitive portion of your module, to define
synonyms for constants that changed name between versions.  Something
like:

_PY3K = sys.version_info[0] > 2
if _PY3K:
    _MAX_INT = sys.maxsize
    basestring = str
    _str2dict = set
    alphas = string.ascii_lowercase + string.ascii_uppercase
else:
    _MAX_INT = sys.maxint
    range = xrange
    _str2dict = lambda strg : dict( [(c,0) for c in strg] )
    alphas = string.lowercase + string.uppercase

The main body of my code uses range throughout (for example), and with
this definition I get the iterator behavior of xrange regardless of
Python version.


In the end I still have 2 source files, one for Py2 and one for Py3,
but there is only a small and manageable number of differences between
them, and I expect at some point I will move forward to supporting Py3
as my primary target version.  But personally I think this overall
Python 2-3 migration process is moving along at a decent rate, and I
should be able to make my switchover in another 12-18 months.  But in
the meantime, I am still able to support all versions of Python NOW,
and I plan to continue doing so (albeit support for 2.x versions
will eventually mean continue to offer a frozen feature set, with
minimal bug-fixing if any).

I realize that pyparsing is a simple-minded module in comparison to
others: it is pure Python, so it has no issues with C extensions; it
does no I/O, so print-as-statement vs. print-as-function is not an
issue; and it imports few other modules, so the ones it does have not
been dropped in Py3; and overall it is only a few thousand lines of
code.  But I just offer this post as a concrete data point in this
discussion.

-- Paul

Re: GAE + recursion limit

2010-07-02 Thread Paul McGuire
 Does anyone have any clue what that might be?
 Why the problem is on GAE (even when run locally), when command line
 run works just fine (even with recursion limit decreased)?

Can't explain why you see different behavior on GAE vs. local, but it
is unusual for a small translator to flirt with recursion limit.  I
don't usually see parsers come close to this with fewer than 40 or 50
sub-expressions.  You may have some left-recursion going on.  Can you
post your translator somewhere, perhaps on pastebin, or on the
pyparsing wiki Discussion page (pyparsing.wikispaces.com)?

-- Paul


Re: automate minesweeper with python

2010-06-30 Thread Paul McGuire
On Jun 30, 6:39 pm, Jay jayk...@yahoo.com wrote:
 I would like to create a python script that plays the Windows game
 minesweeper.

 The python code logic and running minesweeper are not problems.
 However, seeing the 1-8 in the minesweeper map and clicking on
 squares is. I have no idea how to proceed.

You can interact with a Windows application using pywinauto (http://
pywinauto.openqa.org/).

Sounds like a fun little project - good luck!

-- Paul


[ANN] pyparsing 1.5.3 released

2010-06-25 Thread Paul McGuire
I'm happy to announce that a new release of pyparsing is now available,
version 1.5.3.  It has been almost a year and a half since 1.5.2 was
released, but pyparsing has remained pretty stable.

I believe I have cleaned up the botch-job I made in version 1.5.2 of
trying to support both Python 2.x and Python 3.x.  This new release
will handle it by:
- providing version-specific binary installers for Windows users
- using version-adaptive code in the source distribution to use the
  correct version of pyparsing.py for the current Python distribution

This release also includes a number of small bug-fixes, plus some very
interesting new examples.


Here is the high-level summary of what's new in pyparsing 1.5.3:

- === NOTE:  API CHANGE!!! ===
  With this release, and henceforward, the pyparsing module is
  imported as pyparsing on both Python 2.x and Python 3.x versions.

- Fixed up setup.py to auto-detect Python version and install the
  correct version of pyparsing - suggested by Alex Martelli,
  thanks, Alex! (and my apologies to all those who struggled with
  those spurious installation errors caused by my earlier
  fumblings!)

- Fixed bug on Python3 when using parseFile, getting bytes instead of
  a str from the input file.

- Fixed subtle bug in originalTextFor, if followed by
  significant whitespace (like a newline) - discovered by
  Francis Vidal, thanks!

- Fixed very sneaky bug in Each, in which Optional elements were
  not completely recognized as optional - found by Tal Weiss, thanks
  for your patience.

- Fixed off-by-1 bug in line() method when the first line of the
  input text was an empty line. Thanks to John Krukoff for submitting
  a patch!

- Fixed bug in transformString if grammar contains Group expressions,
  thanks to patch submitted by barnabas79, nice work!

- Fixed bug in originalTextFor in which trailing comments or otherwise
  ignored text got slurped in with the matched expression.  Thanks to
  michael_ramirez44 on the pyparsing wiki for reporting this just in
  time to get into this release!

- Added better support for summing ParseResults, see the new example,
  parseResultsSumExample.py.

- Added support for composing a Regex using a compiled RE object;
  thanks to my new colleague, Mike Thornton!

- In version 1.5.2, I changed the way exceptions are raised in order
  to simplify the stacktraces reported during parsing.  An anonymous
  user posted a bug report on SF that this behavior makes it difficult
  to debug some complex parsers, or parsers nested within parsers. In
  this release I've added a class attribute ParserElement.verbose_stacktrace,
  with a default value of False. If you set this to True, pyparsing will
  report stacktraces using the pre-1.5.2 behavior.

- Some interesting new examples, including a number of parsers related
  to parsing C source code:

  . pymicko.py, a MicroC compiler submitted by Zarko Zivanov.
(Note: this example is separately licensed under the GPLv3,
and requires Python 2.6 or higher.)  Thank you, Zarko!

  . oc.py, a subset C parser, using the BNF from the 1996 Obfuscated C
Contest.

  . select_parser.py, a parser for reading SQLite SELECT statements,
    as specified at http://www.sqlite.org/lang_select.html; this goes
    into much more detail than the simple SQL parser included in
    pyparsing's source code

  . stateMachine2.py, a modified version of stateMachine.py submitted
by Matt Anderson, that is compatible with Python versions 2.7 and
above - thanks so much, Matt!

  . excelExpr.py, a *simplistic* first-cut at a parser for Excel
    expressions, which I originally posted on comp.lang.python in January,
    2010; beware, this parser omits many common Excel cases (addition of
    numbers represented as strings, references to named ranges)

  . cpp_enum_parser.py, a nice little parser posted by Mark Tolonen on
    comp.lang.python in August, 2009 (redistributed here with Mark's
    permission).  Thanks a bunch, Mark!

  . partial_gene_match.py, a sample I posted to Stackoverflow.com,
    implementing a special variation on Literal that does close matching,
    up to a given number of allowed mismatches.  The application was to
    find matching gene sequences, with allowance for one or two mismatches.

  . tagCapture.py, a sample showing how to use a Forward placeholder to
    enforce matching of text parsed in a previous expression.

  . matchPreviousDemo.py, simple demo showing how the matchPreviousLiteral
    helper method is used to match a previously parsed token.


Download pyparsing 1.5.3 at http://sourceforge.net/projects/pyparsing/.
You can also access pyparsing's epydoc documentation online at
http://packages.python.org/pyparsing/.

The pyparsing Wiki is at http://pyparsing.wikispaces.com.

-- Paul



[ANN] pyparsing 1.5.3 released

2010-06-24 Thread Paul McGuire

Need some Python 3 help

2010-05-25 Thread Paul McGuire
I was teetering on the brink of releasing Pyparsing 1.5.3 (with some
nice new examples and goodies), when I saw that I had recently
introduced a bug in the Python 3 compatible version.  Here is the
stacktrace as reported on SF:

Traceback (most recent call last):
  File "testcase.py", line 11, in <module>
    result = exp.parseFile("./pyparsing_py3.py")
  File "/data/projekte/parsing/pyparsing/pyparsing_py3.py", line 1426, in parseFile
    return self.parseString(file_contents, parseAll)
  File "/data/projekte/parsing/pyparsing/pyparsing_py3.py", line 1068, in parseString
    loc, tokens = self._parse( instring, 0 )
  File "/data/projekte/parsing/pyparsing/pyparsing_py3.py", line 935, in _parseNoCache
    preloc = self.preParse( instring, loc )
  File "/data/projekte/parsing/pyparsing/pyparsing_py3.py", line 893, in preParse
    while loc < instrlen and instring[loc] in wt:
TypeError: 'in string' requires string as left operand, not int

In this section of code, instring is a string, loc is an int, and wt
is a string.  Any clues why instring[loc] would be evaluating as int?
(I am unfortunately dependent on the kindness of strangers when it
comes to testing my Python 3 code, as I don't have a Py3 environment
installed.)

Thanks,
-- Paul


Re: Need some Python 3 help

2010-05-25 Thread Paul McGuire
On May 25, 8:58 pm, Benjamin Peterson benja...@python.org wrote:
 Paul McGuire ptmcg at austin.rr.com writes:

  In this section of code, instring is a string, loc is an int, and wt
  is a string.  Any clues why instring[loc] would be evaluating as int?
  (I am unfortunately dependent on the kindness of strangers when it
  comes to testing my Python 3 code, as I don't have a Py3 environment
  installed.)

 Indexing bytes in Python 3 gives an integer.

Hrmm, I had a sneaking hunch this might be the issue.  But then I
don't know how this code *ever* worked in Python 3, as it is chock
full of indexed references into the string being parsed.  And yet,
I've had other folks test and confirm that pyparsing_py3 *does* work
on Python 3.  It is a puzzle.
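That difference is easy to demonstrate on Python 3, along with a miniature reproduction of the reported TypeError (a likely root cause would be the input file being read in binary mode, making instring bytes rather than str):

```python
b = b"abc"
s = "abc"

print(b[0])  # 97 -- indexing bytes yields the integer byte value
print(s[0])  # a  -- indexing str yields a one-character string

# the failing comparison from the traceback, in miniature:
# an int on the left of `in <str>` raises TypeError
try:
    b[0] in " \t"
    print("no error")
except TypeError as ex:
    print("TypeError:", ex)
```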

-- Paul


condition and True or False

2010-05-02 Thread Paul McGuire
While sifting through some code looking for old "x and y or z" code
that might better be coded using "y if x else z", I came across this
puzzler:

x = <boolean expression> and True or False

What is the "and True or False" adding to this picture?  The boolean
expression part is already evaluating to a boolean, so I don't
understand why a code author would feel compelled to beat this one
over the head with the additional "and True or False".

I did a little code Googling and found a few other Python instances of
this, but also many Lua instances.  I'm not that familiar with Lua; is
this a practice that one who uses Lua frequently might carry over to
Python, not realizing that the added "and True or False" is redundant?

Other theories?
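For what it's worth, the coercion the idiom performs can be seen when the left operand is merely truthy rather than already a bool; in that case "and True or False" behaves like bool():

```python
items = [1, 2, 3]

x = items and True or False  # a truthy non-bool is coerced to the bool True
y = bool(items)              # the clearer, equivalent spelling

print(x, y)                  # True True
print([] and True or False)  # False: the falsy operand short-circuits 'and'
```

So it is only redundant when the expression on the left already yields a real bool, as in the snippet quoted above.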

-- Paul


Re: Usable street address parser in Python?

2010-04-19 Thread Paul McGuire
On Apr 17, 2:23 pm, John Nagle na...@animats.com wrote:
    Is there a usable street address parser available?  There are some
 bad ones out there, but nothing good that I've found other than commercial
 products with large databases.  I don't need 100% accuracy, but I'd like
 to be able to extract street name and street number for at least 98% of
 US mailing addresses.

 There's pyparsing, of course. There's a street address parser as an
 example at http://pyparsing.wikispaces.com/file/view/streetAddressParser.py.
 It's not very good.  It gets all of the following wrong:

         1500 Deer Creek Lane    (Parses "Creek" as a street type)
         186 Avenue A            (NYC street)
         2081 N Webb Rd          (Parses "N Webb" as a street name)
         2081 N. Webb Rd         (Parses "N" as street name)
         1515 West 22nd Street   (Parses "West" as name)
         2029 Stierlin Court     (Street names starting with "St" misparse.)

 Some special cases that don't work, unsurprisingly.
         P.O. Box 33170
         The Landmark @ One Market, Suite 200
         One Market, Suite 200
         One Market


Please take a look at the updated form of this parser.  It turns out
there actually *were* some bugs in the old form, plus there was no
provision for PO Boxes, avenues that start with "Avenue" instead of
ending with it, or house numbers spelled out as words.  The only one
I consider a special case is the support for "Avenue X" instead of
"X Avenue" - adding support for the rest was added in a fairly general
are also some simple attempts at adding apt/suite numbers, and APO and
FPO in addition to PO boxes - if not exactly what you need, the means
to extend to support other options should be pretty straightforward.)

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Confused by slash/escape in regexp

2010-04-11 Thread Paul McGuire
On Apr 11, 5:43 pm, andrew cooke and...@acooke.org wrote:
 Is the third case here surprising to anyone else?  It doesn't make
 sense to me...

 Python 2.6.2 (r262:71600, Oct 24 2009, 03:15:21)
 [GCC 4.4.1 [gcc-4_4-branch revision 150839]] on linux2
 Type "help", "copyright", "credits" or "license" for more information.
 >>> from re import compile
 >>> p1 = compile('a\x62c')
 >>> p1.match('abc')
 <_sre.SRE_Match object at 0x7f4e8f93d578>
 >>> p2 = compile('a\\x62c')
 >>> p2.match('abc')
 <_sre.SRE_Match object at 0x7f4e8f93d920>
 >>> p3 = compile('a\\\x62c')
 >>> p3.match('a\\bc')
 >>> p3.match('abc')
 >>> p3.match('a\\\x62c')

 Curious/confused,
 Andrew

Here is your same session, but using raw string literals:

Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from re import compile
>>> p1 = compile(r'a\x62c')
>>> p1.match(r'abc')
<_sre.SRE_Match object at 0x00A04BB8>
>>> p2 = compile(r'a\\x62c')
>>> p2.match(r'abc')
>>> p3 = compile(r'a\\\x62c')
>>> p3.match(r'a\\bc')
>>> p3.match(r'abc')
>>> p3.match(r'a\\\x62c')


So I would say the surprise isn't that case 3 didn't match, but that
case 2 matched.

Unless I just don't get what you were testing, not being an RE wiz.
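
For readers following along today, the two layers of escaping at work here
can be verified with a short stdlib snippet (my restatement of the sessions
above, runnable on current Python):

```python
import re

# Layer 1: Python string literals. '\x62' *is* the character 'b';
# '\\x62' is the four characters backslash, 'x', '6', '2'.
assert 'a\x62c' == 'abc'
assert 'a\\x62c' == r'a\x62c'

# Layer 2: the regex engine. In the pattern a\x62c, \x62 is re's own
# hex escape for 'b', so both of these match the literal string 'abc':
assert re.match('a\x62c', 'abc')      # pattern is literally abc
assert re.match('a\\x62c', 'abc')     # re decodes \x62 -> 'b'

# 'a\\\x62c' is the four characters a, \, b, c; as a pattern that reads
# a\bc, and \b is re's word-boundary assertion - so nothing here matches:
assert re.match('a\\\x62c', 'abc') is None
assert re.match('a\\\x62c', 'a\\bc') is None
```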

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Tough sorting problem: or, I'm confusing myself

2010-04-11 Thread Paul McGuire
On Apr 9, 10:03 am, david jensen dmj@gmail.com wrote:
 Hi all,

 I'm trying to find a good way of doing the following:

 Each n-tuple in combinations( range( 2 ** m ), n ) has a corresponding
 value n-tuple (call them scores for clarity later). I'm currently
 storing them in a dictionary, by doing:

 
 res={}
 for i in itertools.combinations( range( 2**m ) , n):
     res[ i ] = getValues( i )    # getValues() is computationally
 expensive
 

 For each (n-1)-tuple, I need to find the two numbers that have the
 highest scores versus them. I know this isn't crystal clear, but
 hopefully an example will help: with m=n=3:

 Looking at only the (1, 3) case, assuming:
 getValues( (1, 2, 3) ) == ( -200, 125, 75 )    # this contains the
 highest other score, where 2 scores 125
 getValues( (1, 3, 4) ) == ( 50, -50, 0 )
 getValues( (1, 3, 5) ) == ( 25, 300, -325 )
 getValues( (1, 3, 6) ) == ( -100, 0, 100 )    # this contains the
 second-highest, where 6 scores 100
 getValues( (1, 3, 7) ) == ( 80, -90, 10  )
 getValues( (1, 3, 8) ) == ( 10, -5, -5 )

 I'd like to return ( (2, 125), (6, 100) ).

 The most obvious (to me) way to do this would be not to generate the
 res dictionary at the beginning, but just to go through each
 combinations( range( 2**m), n-1) and try every possibility... this
 will test each combination n times, however, and generating those
 values is expensive. [e.g. (1,2,3)'s scores will be generated when
 finding the best possibilities for (1,2), (1,3) and (2,3)]


Add a memoizing decorator to getValues, so that repeated calls will do
lookups instead of repeated calculations.
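
That suggestion can be sketched with functools.lru_cache; in this
illustration get_values is a cheap hypothetical stand-in for the OP's
expensive getValues(), and the call counter shows the second pass costs
nothing:

```python
from functools import lru_cache
from itertools import combinations

calls = 0

@lru_cache(maxsize=None)
def get_values(combo):          # hypothetical stand-in for getValues()
    global calls
    calls += 1
    return sum(combo)           # placeholder for the expensive computation

# Score every 3-tuple once...
for c in combinations(range(8), 3):
    get_values(c)
first_pass = calls              # C(8, 3) == 56 expensive calls

# ...then "re-score" while scanning every pair: each repeat is a cache hit.
for pair in combinations(range(8), 2):
    for extra in range(8):
        if extra not in pair:
            get_values(tuple(sorted(pair + (extra,))))

assert calls == first_pass      # no additional expensive calls were made
```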

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python and Regular Expressions

2010-04-10 Thread Paul McGuire
On Apr 10, 8:38 pm, Paul Rubin no.em...@nospam.invalid wrote:
 The impression that I have (from a distance) is that Pyparsing is a good
 interface abstraction with a kludgy and slow implementation.  That the
 implementation uses regexps just goes to show how kludgy it is.  One
 hopes that someday there will be a more serious implementation, perhaps
 using llvm-py (I wonder whatever happened to that project, by the way)
 so that your parser script will compile to executable machine code on
 the fly.

I am definitely flattered that pyparsing stirs up so much interest,
and among such a distinguished group. But I have to take some umbrage
at Paul Rubin's left-handed compliment,  Pyparsing is a good
interface abstraction with a kludgy and slow implementation,
especially since he forms his opinions from a distance.

I actually *did* put some thought into what I wanted in pyparsing
before designing it, and this forms this chapter of Getting Started
with Pyparsing (available here as a free online excerpt:
http://my.safaribooksonline.com/9780596514235/what_makes_pyparsing_so_special#X2ludGVybmFsX0ZsYXNoUmVhZGVyP3htbGlkPTk3ODA1OTY1MTQyMzUvMTYmaW1hZ2VwYWdlPTE2),
the Zen of Pyparsing as it were. My goals were:

- build parsers using explicit constructs (such as words, groups,
repetition, alternatives), vs. expression encoding using specialized
character sequences, as found in regexen

- easy parser construction from primitive elements to complex groups
and alternatives, using Python's operator overloading for ease of
direct implementation of parsers using ordinary Python syntax; include
mechanisms for defining recursive parser expressions

- implicit skipping of whitespace between parser elements

- results returned not just as a list of strings, but as a rich data
object, with access to parsed fields by name or by list index, taking
interfaces from both dicts and lists for natural adoption into common
Python idioms

- no separate code-generation steps, a la lex/yacc

- support for parse-time callbacks, for specialized token handling,
conversion, and/or construction of data structures

- 100% pure Python, to be runnable on any platform that supports
Python

- liberal licensing, to permit easy adoption into any user's projects
anywhere

So raw performance really didn't even make my short-list, beyond the
obvious should be tolerably fast enough.

I have found myself reading posts on c.l.py with wording like I'm
trying to parse blah-blah and I've been trying for hours/days to get
this regex working.  For kicks, I'd spend 5-15 minutes working up a
working pyparsing solution, which *does* run comparatively slowly,
perhaps taking a few minutes to process the poster's data file.  But
the net solution is developed and running in under 1/2 an hour, which
to me seems like an overall gain compared to hours of fruitless
struggling with backslashes and regex character sequences.  On top of
which, the pyparsing solutions are still readable when I come back to
them weeks or months later, instead of staring at some line-noise
regex and just scratch my head wondering what it was for.  And
sometimes comparatively slowly means that it runs 50x slower than a
compiled method that runs in 0.02 seconds - that's still getting the
job done in just 1 second.

And is the internal use of regexes with pyparsing really a kludge?
Why? They are almost completely hidden from the parser developer. And
yet by using compiled regexes, I retain the portability of 100% Python
while leveraging the compiled speed of the re engine.

It does seem that there have been many posts of late (either on c.l.py
or the related posts on Stackoverflow) where the OP is trying to
either scrape content from HTML, or parse some type of recursive
expression.  HTML scrapers implemented using re's are terribly
fragile, since HTML in the wild often contains little surprises
(unexpected whitespace; upper/lower case inconsistencies; tag
attributes in unpredictable order; attribute values with double,
single, or no quotation marks) which completely frustrate any re-based
approach.  Granted, there are times when an re-parsing-of-HTML
endeavor *isn't* futile or doomed from the start - the OP may be
working with a very restricted set of HTML, generated from some other
script so that the output is very consistent. Unfortunately, this
poster usually gets thrown under the same you'll never be able to
parse HTML with re's bus. I can't explain the surge in these posts,
other than to wonder if we aren't just seeing a skewed sample - that
is, the many cases where people *are* successfully using re's to solve
their text extraction problems aren't getting posted to c.l.py, since
no one posts questions they already have the answers to.

So don't be too dismissive of pyparsing, Mr. Rubin. I've gotten many e-
mails, wiki, and forum posts from Python users at all levels of the
expertise scale, saying that pyparsing has helped them to be very
productive in one or another aspect of creating a 

Re: Dynamically growing an array to implement a stack

2010-04-08 Thread Paul McGuire
On Apr 8, 3:21 pm, M. Hamed mhels...@hotmail.com wrote:
 I have trouble with one Python concept: the fact that you cannot
 assign to a non-existent index in an array. For example:

 a = [0,1]
 a[2] = 2   # generates an IndexError

 I can use a.append(2) but that always appends to the end. Sometimes I
 want to use this array as a stack and hence my indexing logic would be
 something like:

 If you are already at the end (based on your stack pointer):
       use append() then index (and inc your pointer)
 if not:
       index directly (and inc your stack pointer)

??? The stack pointer is *always* at the end - except don't actually
keep a real stack pointer; let the list do it for you.  Call append to
push a value onto the end, and pop to pull it off.  Or if you are
really stuck on push/pop commands for a stack, do this:

>>> class stack(list):
...   push = list.append
...
>>> ss = stack()
>>> ss.push('x')
>>> ss.push('Y')
>>> ss
['x', 'Y']
>>> ss.pop()
'Y'
>>> ss.pop()
'x'
>>> ss.pop()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: pop from empty list

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Recommend Commercial graphing library

2010-04-06 Thread Paul McGuire
On Apr 6, 11:05 am, AlienBaby matt.j.war...@gmail.com wrote:
 The requirement for a commercial license comes down to being
 restricted to not using any open source code. If it's an open source
 license it can't be used in our context.

You may be misunderstanding this issue; I think you are equating "open
source" with GPL, which is the open source license that requires
applications that use it to also open their source.  There are many
other open source licenses, such as Berkeley, MIT, and LGPL, that are
more permissive in what they allow, up to and in some cases including
full inclusion within a closed-source commercial product.  You might
also contact the supplier of the open source code you are interested
in, and perhaps pay a modest fee to obtain a commercial license.

-- Paul

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: s-expression parser in python

2010-04-06 Thread Paul McGuire
On Apr 6, 7:02 pm, James Stroud nospamjstroudmap...@mbi.ucla.edu
wrote:
 Hello All,

 I want to use an s-expression based configuration file format for a
 python program I'm writing. Does anyone have a favorite parser?


The pyparsing wiki includes this parser on its Examples page:
http://pyparsing.wikispaces.com/file/view/sexpParser.py.  This parser
is also described in more detail in the pyparsing e-book from
O'Reilly.

This parser is based on the BNF defined here:
http://people.csail.mit.edu/rivest/Sexp.txt.  I should think Ron Rivest
would be the final authority on S-expression syntax, but this BNF omits
'!', '<', and '>' as valid punctuation characters, and does not support
free-standing floats and ints as tokens.

Still, you can extend the pyparsing parser (such is the goal of
pyparsing, to make these kinds of extensions easy, as the source
material or BNF or requirements change out from underneath you) by
inserting these changes:

real = Regex(r"[+-]?\d+\.\d*([eE][+-]?\d+)?").setParseAction(lambda
tokens: float(tokens[0]))
token = Word(alphanums + "-./_:*+=<>!")
simpleString = real | decimal | raw | token | base64_ | hexadecimal |
qString

And voila!  Your test string parses as:

[['and',
  ['or', ['>', 'uid', 1000], ['!=', 'gid', 20]],
  ['<', 'quota', 5000.0]]]
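
For contrast, the underlying idea can be sketched as a bare-bones
recursive-descent parser in pure Python - my illustration only, far less
capable than the pyparsing/Rivest grammar (no quoted strings, base64, or
hex blocks):

```python
import re

def parse_sexp(text):
    # split into parens and bare atoms
    tokens = re.findall(r'\(|\)|[^\s()]+', text)
    pos = 0

    def atom(tok):
        # convert numeric-looking atoms, leave the rest as strings
        for cast in (int, float):
            try:
                return cast(tok)
            except ValueError:
                pass
        return tok

    def read():
        nonlocal pos
        tok = tokens[pos]; pos += 1
        if tok == '(':
            out = []
            while tokens[pos] != ')':
                out.append(read())
            pos += 1            # consume the closing ')'
            return out
        return atom(tok)

    return read()

result = parse_sexp("(and (or (> uid 1000) (!= gid 20)) (< quota 5e3))")
assert result == ['and', ['or', ['>', 'uid', 1000], ['!=', 'gid', 20]],
                  ['<', 'quota', 5000.0]]
```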

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: vars().has_key() question about how working .

2010-04-04 Thread Paul McGuire
On Apr 4, 3:42 am, catalinf...@gmail.com catalinf...@gmail.com
wrote:
 Hi everyone .
 My question is: why is vars().has_key('b') False?
 I expected to see True because it is a variable ...
 Thanks

Yes, 'b' is a var, but only within the scope of something().  See how
this is different:

>>> def sth():
...   b = 25
...   print 'b' in vars()
...
>>> sth()
True

(Also, has_key() is the old-style way to test for key existence in a
dict, and is kept around for compatibility with old code, but the
preferred method now is to use 'in'.)

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: C-style static variables in Python?

2010-04-02 Thread Paul McGuire
On Apr 1, 5:34 pm, kj no.em...@please.post wrote:
 When coding C I have often found static local variables useful for
 doing once-only run-time initializations.  For example:


Here is a decorator to make a function self-aware, giving it a "this"
variable that points to itself, which you could then initialize from
outside with static flags or values:

from functools import wraps

def self_aware(fn):
    @wraps(fn)
    def fn_(*args):
        return fn(*args)
    fn_.__globals__["this"] = fn_
    return fn_

@self_aware
def foo():
    this.counter += 1
    print this.counter

foo.counter = 0

foo()
foo()
foo()


Prints:

1
2
3
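
In current Python (where __globals__ injection is brittle and print is a
function), a simpler route to C-style static state is to hang an attribute
on the function object itself - a sketch of an alternative, not the
decorator above:

```python
def counting():
    # the function's own attribute serves as the C-style static variable
    counting.counter += 1
    return counting.counter

counting.counter = 0   # once-only initialization, done at definition time

print(counting(), counting(), counting())   # -> 1 2 3
```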

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Generating text from a regular expression

2010-03-31 Thread Paul McGuire
On Mar 31, 5:49 am, Nathan Harmston iwanttobeabad...@googlemail.com
wrote:
 Hi everyone,

 I have a slightly complicated/medium sized regular expression and I
 want to generate all possible words that it can match (to compare
 performance of regex against an acora based matcher).

The pyparsing wiki Examples page includes this regex inverter:
http://pyparsing.wikispaces.com/file/view/invRegex.py

From the module header:
# Supports:
# - {n} and {m,n} repetition, but not unbounded + or * repetition
# - ? optional elements
# - [] character ranges
# - () grouping
# - | alternation

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: file seek is slow

2010-03-09 Thread Paul McGuire
This is a pretty tight loop:

for i in xrange(100):
   f1.seek(0)

But there is still a lot going on, some of which you can lift out of
the loop.  The easiest I can think of is the lookup of the 'seek'
attribute on the f1 object.  Try this:

f1_seek = f1.seek
for i in xrange(100):
   f1_seek(0)

How does that help your timing?
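
A self-contained way to measure the suggestion, using timeit and an
in-memory file so nothing touches disk (my sketch; absolute numbers will
vary by machine):

```python
import timeit

setup = "import io; f1 = io.BytesIO(b'x' * 1024)"

# attribute looked up on every iteration
plain = timeit.timeit("f1.seek(0)", setup=setup, number=100000)

# bound method hoisted out of the loop
hoisted = timeit.timeit("f1_seek(0)", setup=setup + "; f1_seek = f1.seek",
                        number=100000)

print("plain:   %.4fs" % plain)
print("hoisted: %.4fs" % hoisted)   # usually somewhat faster, though modestly
```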

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Problem with regular expression

2010-03-07 Thread Paul McGuire
On Mar 7, 4:32 am, Joan Miller pelok...@gmail.com wrote:
 I would like to convert the first string to upper case. But this regular
 expression is not matching the first string between quotes.

Is using pyparsing overkill?  Probably.  But your time is valuable,
and pyparsing lets you knock this out in less time than it probably
took to write your original post.


Use pyparsing's pre-defined expression sglQuotedString to match your
entry key in quotes:

key = sglQuotedString

Add a parse action to convert to uppercase:

key.setParseAction(lambda tokens:tokens[0].upper())

Now define the rest of your entry value (be sure to add the negative
lookahead so we *don't* match your "foo" entry):

entry = key + ":" + ~Literal("{")

If I put your original test cases into a single string named 'data', I
can now use transformString to convert all of your keys that don't
point to '{'ed values:

print entry.transformString(data)

Giving me:

# string to non-matching
'foo': {

# strings to matching
'BAR': 'bar2'
'BAR': None
'BAR': 0
'BAR': True

Here's the whole script:

from pyparsing import sglQuotedString, Literal

key = sglQuotedString
key.setParseAction(lambda tokens:tokens[0].upper())
entry = key + ":" + ~Literal("{")

print entry.transformString(data)

And I'll bet that if you come back to this code in 3 months, you'll
still be able to figure out what it does!
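
For comparison, the same transformation is possible with the stdlib re
module and a negative lookahead - a sketch that assumes the OP's data
looks like the test cases above:

```python
import re

data = """\
'foo': {
'bar': 'bar2'
'bar': None
'bar': 0
'bar': True
"""

# Uppercase a single-quoted key only when the value after ':' is not '{'
result = re.sub(r"'([^']+)'(?=\s*:\s*[^{\s])",
                lambda m: "'%s'" % m.group(1).upper(),
                data)
print(result)
```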

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to efficiently extract information from structured text file

2010-02-18 Thread Paul McGuire
On Feb 17, 7:38 pm, Steven D'Aprano
ste...@remove.this.cybersource.com.au wrote:
 On Wed, 17 Feb 2010 17:13:23 -0800, Jonathan Gardner wrote:
  And once you realize that every program is really a compiler, then you
  have truly mastered the Zen of Programming in Any Programming Language
  That Will Ever Exist.

 In the same way that every tool is really a screwdriver.

 --
 Steven

The way I learned this was:
- Use the right tool for the right job.
- Every tool is a hammer.

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to efficiently extract information from structured text file

2010-02-17 Thread Paul McGuire
On Feb 16, 5:48 pm, Imaginationworks xiaju...@gmail.com wrote:
 Hi,

 I am trying to read object information from a text file (approx.
 30,000 lines) with the following format, each line corresponds to a
 line in the text file.  Currently, the whole file was read into a
 string list using readlines(), then use a for loop to search for the
 "= {" and "};" markers to determine the Object, SubObject, and SubSubObject.

If you open(filename).read() this file into a variable named data, the
following pyparsing parser will pick out your nested brace
expressions:

from pyparsing import *

EQ,LBRACE,RBRACE,SEMI = map(Suppress,"={};")
ident = Word(alphas, alphanums)
contents = Forward()
defn = Group(ident + EQ + Group(LBRACE + contents + RBRACE + SEMI))

contents << ZeroOrMore(defn | ~(LBRACE|RBRACE) + Word(printables))

results = defn.parseString(data)

print results

Prints:

[
 ['Object1',
   ['...',
['SubObject1',
  ['',
['SubSubObject1',
  ['...']
]
  ]
],
['SubObject2',
  ['',
   ['SubSubObject21',
 ['...']
   ]
  ]
],
['SubObjectN',
  ['',
   ['SubSubObjectN',
 ['...']
   ]
  ]
]
   ]
 ]
]

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: pyparsing wrong output

2010-02-12 Thread Paul McGuire
On Feb 12, 6:41 pm, Gabriel Genellina gagsl-...@yahoo.com.ar
wrote:
 En Fri, 12 Feb 2010 10:41:40 -0300, Eknath Venkataramani  
 eknath.i...@gmail.com escribió:

  I am trying to write a parser in pyparsing.
  Help me. http://paste.pocoo.org/show/177078/ is the code and this is
  the input file: http://paste.pocoo.org/show/177076/.
  I get output as:
  <generator object at 0xb723b80c>

 There is nothing wrong with pyparsing here. scanString() returns a  
 generator, like this:

 py> g = (x for x in range(20) if x % 3 == 1)
 py> g
 <generator object <genexpr> at 0x00E50D78>

Unfortunately, your grammar doesn't match the input text, so your
generator doesn't return anything.

I think you are taking a sort of brute-force approach to this problem,
and you need to think a little more abstractly.  You can't just pick a
fragment and then write an expression for it, and then the next, and
then stitch them together - well, you *can*, but it helps to think both
abstract and concrete at the same time.

With the exception of your one key of "\'", this is a pretty basic
recursive grammar.  Recursive grammars are a little complicated to
start with, so I'll start with a non-recursive part.  And I'll work
more bottom-up or inside-out.

Let's start by looking at these items:

count = 8,
baajaar = 0.87628353,
kiraae = 0.02341598,
lii = 0.02178813,
adr = 0.01978462,
gyiimn = 0.01765590,

Each item has a name (which you called "eng", so I'll keep that
expression), a '=', and *something*.  In the end, we won't really care
about the '=' strings, they aren't really part of the keys or the
associated values, they are just delimiting strings - they are
important during parsing, but afterwards we don't really care about
them.  So we'll start with a pyparsing expression for this:

keyval = eng + Suppress('=') + something

Sometimes, the something is an integer, sometimes it's a floating
point number.  I'll define some more generic forms for these than your
original number, and a separate expression for a real number:

integer = Combine(Optional('-') + Word(nums))
realnum = Combine(Optional('-') + Word(nums) + '.' + Word(nums))

When we parse for these two, we need to be careful to check for a
realnum before an integer, so that we don't accidentally parse the
leading 3 of 3.1415 as the integer 3.

something = realnum | integer

So now we can parse this fragment using a delimitedList expression
(which takes care of the intervening commas, and also suppresses them
from the results:

filedata = """
count = 8,
baajaar = 0.87628353,
kiraae = 0.02341598,
lii = 0.02178813,
adr = 0.01978462,
gyiimn = 0.01765590,
"""
print delimitedList(keyval).parseString(filedata)

Gives:
['count', '8', 'baajaar', '0.87628353', 'kiraae', '0.02341598',
 'lii', '0.02178813', 'adr', '0.01978462', 'gyiimn', '0.01765590']

Right off the bat, we see that we want a little more structure to
these results, so that the keys and values are grouped naturally by
the parser.  The easy way to do this is with Group, as in:

keyval = Group(eng + Suppress('=') + something)

With this one change, we now get:

[['count', '8'], ['baajaar', '0.87628353'],
 ['kiraae', '0.02341598'], ['lii', '0.02178813'],
 ['adr', '0.01978462'], ['gyiimn', '0.01765590']]

Now we need to add the recursive part of your grammar.  A nested input
looks like:

confident = {
  count = 4,
  trans = {
ashhvsht = 0.75100505,
phraarmnbh = 0.08341708,
},
},

So in addition to integers and reals, our something could also be a
nested list of keyvals:

something = realnum | integer | (lparen + delimitedList(keyval) +
rparen)

This is *almost* right, with just a couple of tweaks:
- the list of keyvals may have a comma after the last item before the
closing '}'
- we really want to suppress the opening and closing braces (lparen
and rparen)
- for similar structure reasons, we'll enclose the list of keyvals in
a Group to retain the data hierarchy

lparen,rparen = map(Suppress, "{}")
something = realnum | integer |
Group(lparen + delimitedList(keyval) + Optional(',') + rparen)

The recursive problem is that we have defined keyval using something,
and something using keyval.  You can't do that in Python.  So we use
the pyparsing class Forward to forward declare something:

something = Forward()
keyval = Group(eng + Suppress('=') + something)

To define something as a Forward, we use the '<<' shift operator:

something << (realnum | integer |
Group(lparen + delimitedList(keyval) + Optional(',') +
rparen))

Our grammar now looks like:

lparen,rparen = map(Suppress, "{}")

something = Forward()
keyval = Group(eng + Suppress('=') + something)
something << (realnum | integer |
Group(lparen + delimitedList(keyval) + Optional(',') +
rparen))

To parse your entire input file, use a delimitedList(keyval)

results = 

Re: How to measure elapsed time under Windows?

2010-02-10 Thread Paul McGuire
On Feb 10, 2:24 am, Dennis Lee Bieber wlfr...@ix.netcom.com wrote:
 On Tue, 9 Feb 2010 21:45:38 + (UTC), Grant Edwards
 inva...@invalid.invalid declaimed the following in
 gmane.comp.python.general:

  Doesn't work.  datetime.datetime.now has granularity of
  15-16ms.

  Intervals much less that that often come back with a delta of
  0.  A delay of 20ms produces a delta of either 15-16ms or
  31-32ms

         WinXP uses an ~15ms time quantum for task switching. Which defines
 the step rate of the wall clock output...

 http://www.eggheadcafe.com/software/aspnet/35546579/the-quantum-was-n...http://www.eggheadcafe.com/software/aspnet/32823760/how-do-you-set-ti...

 http://www.lochan.org/2005/keith-cl/useful/win32time.html
 --
         Wulfraed         Dennis Lee Bieber               KD6MOG
         wlfr...@ix.netcom.com     HTTP://wlfraed.home.netcom.com/

Gabriel Genellina reports that time.clock() uses Windows'
QueryPerformanceCounter() API, which has much higher resolution than
the task switcher's 15ms.  QueryPerformanceCounter's resolution is
hardware-dependent; using the Win API, and a little test program, I
get this value on my machine:
Frequency is 3579545 ticks/sec
Resolution is 0.279365114840015 microsecond/tick
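
In modern Python the same high-resolution counter is exposed directly:
time.perf_counter() wraps it, and time.get_clock_info() reports the
resolution Python sees (a present-day footnote, not part of the original
thread; values are machine-dependent):

```python
import time

info = time.get_clock_info("perf_counter")
print("resolution: %g s" % info.resolution)
print("monotonic:  %s" % info.monotonic)

t0 = time.perf_counter()
t1 = time.perf_counter()
print("back-to-back delta: %g s" % (t1 - t0))
```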

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to measure elapsed time under Windows?

2010-02-09 Thread Paul McGuire
On Feb 9, 10:10 am, Grant Edwards inva...@invalid.invalid wrote:
 Is there another way to measure small periods of elapsed time
 (say in the 1-10ms range)?


I made repeated calls to time.clock() in a generator expression, which
is as fast a loop as I can think of in Python.  Then I computed the
successive time deltas to see if any granularities jumped out.  Here
are the results:

>>> import time
>>> from itertools import groupby

>>> # get about 1000 different values of time.clock()
>>> ts = set(time.clock() for i in range(1000))

>>> # sort in ascending order
>>> ts = sorted(ts)

>>> # compute diffs between adjacent time values
>>> diffs = [j-i for i,j in zip(ts[:-1],ts[1:])]

>>> # sort and group
>>> diffs.sort()
>>> diffgroups = groupby(diffs)

>>> # print the distribution of time differences in microseconds
>>> for i in diffgroups: print "%3d %12.6f" % (len(list(i[1])), i[0]*1e6)
...
 25 2.234921
 28 2.234921
242 2.514286
506 2.514286
 45 2.793651
116 2.793651
  1 3.073016
  8 3.073016
  6 3.352381
  4 3.631746
  3 3.92
  1 3.92
  5 4.190477
  2 4.469842
  1 6.146033
  1 8.660319
  1 9.79
  1    10.895239
  1    11.174605
  1    24.304765
  1    41.904767

There seems to be a step size of about .28 microseconds.  So I would
guess time.clock() has enough resolution.  But also beware of the
overhead of the calls to clock() - using timeit, I find that each call
takes about 2 microseconds (consistent with the smallest time
difference in the above data set).
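
The same probe, rewritten for Python 3 with time.perf_counter() (a
present-day sketch; the printed distribution will differ by machine):

```python
import time
from itertools import groupby

# about 1000 distinct clock readings, then the step-size distribution
ts = sorted(set(time.perf_counter() for i in range(1000)))
diffs = sorted(j - i for i, j in zip(ts, ts[1:]))

# count of occurrences, delta in microseconds
for delta, grp in groupby(diffs):
    print("%3d %12.6f" % (len(list(grp)), delta * 1e6))
```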

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to print all expressions that match a regular expression

2010-02-07 Thread Paul McGuire
On Feb 6, 1:36 pm, hzh...@gmail.com hzh...@gmail.com wrote:
 Hi,

 I am new to Python. I know there are regular expressions in
 Python. What I need is that given a particular regular expression,
 output all the matches. For example, given “[1|2|3]{2}” as the regular
 expression, the program should output all 9 matches, i.e., 11 12 13
 21 22 23 31 32 33.

 Is there any well-written routine in Python or third-party program to
 do this? If there isn't, could somebody make some suggestions on how
 to write it myself?

 Thanks.

 Zhuo

Please check out this example on the pyparsing wiki, invRegex.py:
http://pyparsing.wikispaces.com/file/view/invRegex.py.  This code
implements a generator that returns successive matching strings for
the given regex.  Running it, I see that you actually have a typo in
your example.

>>> print list(invert("[1|2|3]{2}"))
['11', '1|', '12', '13', '|1', '||', '|2', '|3', '21', '2|', '22',
'23', '31', '3|', '32', '33']

I think you meant either "[123]{2}" or "(1|2|3){2}".

>>> print list(invert("[123]{2}"))
['11', '12', '13', '21', '22', '23', '31', '32', '33']

>>> print list(invert("(1|2|3){2}"))
['11', '12', '13', '21', '22', '23', '31', '32', '33']

Of course, as other posters have pointed out, this inverter does not
accept regexen with unbounded multiple characters '+' or '*', but '?'
and {min,max} notation will work.  Even '.' is supported, although
this can generate a large number of return values.

Of course, you'll also have to install pyparsing to get this to work.

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: parsing an Excel formula with the re module

2010-01-14 Thread Paul McGuire
I never represented that this parser would handle any and all Excel
formulas!  But I should hope the basic structure of a pyparsing
solution might help the OP add some of the other features you cited,
if necessary. It's actually pretty common to take an incremental
approach in making such a parser, and so here are some of the changes
that you would need to make based on the deficiencies you pointed out:

functions can have a variable number of arguments, of any kind of
expression
- statFunc = lambda name : CaselessKeyword(name) + LPAR + delimitedList
(expr) + RPAR

sheet name could also be a quoted string
- sheetRef = Word(alphas, alphanums) | QuotedString("'", escQuote="''")

add boolean literal support
- boolLiteral = oneOf("TRUE FALSE")
- operand = numericLiteral | funcCall | boolLiteral | cellRange |
cellRef

These small changes are enough to extend the parser to successfully
handle the test2a, 2b, and 3a cases.  (I'll add this to the pyparsing
wiki examples, as it looks like it is a good start on a familiar but
complex expression.)

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: parsing an Excel formula with the re module

2010-01-13 Thread Paul McGuire
On Jan 5, 1:49 pm, Tim Chase python.l...@tim.thechases.com wrote:
 vsoler wrote:
  Hence, I need to parse Excel formulas. Can I do it by means only of re
  (regular expressions)?

  I know that for simple formulas such as =3*A7+5 it is indeed
  possible. What about complex for formulas that include functions,
  sheet names and possibly other *.xls files?

 Where things start getting ugly is when you have nested function
 calls, such as

    =if(Sum(A1:A25)>42,Min(B1:B25), if(Sum(C1:C25)>3.14,
 (Min(C1:C25)+3)*18,Max(B1:B25)))

 Regular expressions don't do well with nested parens (especially
 arbitrarily-nesting-depth such as are possible), so I'd suggest
 going for a full-blown parsing solution like pyparsing.

 If you have fair control over what can be contained in the
 formulas and you know they won't contain nested parens/functions,
 you might be able to formulate some sort of "kinda, sorta, maybe
 parses some forms of formulas" regexp.

 -tkc

This might give the OP a running start:

from pyparsing import (CaselessKeyword, Suppress, Word, alphas,
alphanums, nums, Optional, Group, oneOf, Forward, Regex,
operatorPrecedence, opAssoc, dblQuotedString)

test1 = "=3*A7+5"
test2 = "=3*Sheet1!$A$7+5"
test3 = "=if(Sum(A1:A25)>42,Min(B1:B25)," \
 "if(Sum(C1:C25)>3.14, (Min(C1:C25)+3)*18,Max(B1:B25)))"

EQ,EXCL,LPAR,RPAR,COLON,COMMA,DOLLAR = map(Suppress, '=!():,$')
sheetRef = Word(alphas, alphanums)
colRef = Optional(DOLLAR) + Word(alphas,max=2)
rowRef = Optional(DOLLAR) + Word(nums)
cellRef = Group(Optional(sheetRef + EXCL)("sheet") + colRef("col") +
rowRef("row"))

cellRange = (Group(cellRef("start") + COLON + cellRef("end"))
("range")
| cellRef )

expr = Forward()

COMPARISON_OP = oneOf("< = > >= <= != <>")
condExpr = expr + COMPARISON_OP + expr

ifFunc = (CaselessKeyword("if") +
  LPAR +
  Group(condExpr)("condition") +
  COMMA + expr("if_true") +
  COMMA + expr("if_false") + RPAR)
statFunc = lambda name : CaselessKeyword(name) + LPAR + cellRange +
RPAR
sumFunc = statFunc("sum")
minFunc = statFunc("min")
maxFunc = statFunc("max")
aveFunc = statFunc("ave")
funcCall = ifFunc | sumFunc | minFunc | maxFunc | aveFunc

multOp = oneOf("* /")
addOp = oneOf("+ -")
numericLiteral = Regex(r"\-?\d+(\.\d+)?")
operand = numericLiteral | funcCall | cellRange | cellRef
arithExpr = operatorPrecedence(operand,
[
(multOp, 2, opAssoc.LEFT),
(addOp, 2, opAssoc.LEFT),
])

textOperand = dblQuotedString | cellRef
textExpr = operatorPrecedence(textOperand,
[
('&', 2, opAssoc.LEFT),
])
expr << (arithExpr | textExpr)

import pprint
for test in (test1, test2, test3):
    print test
    pprint.pprint( (EQ + expr).parseString(test).asList() )
    print


Prints:

=3*A7+5
[[['3', '*', ['A', '7']], '+', '5']]

=3*Sheet1!$A$7+5
[[['3', '*', ['Sheet1', 'A', '7']], '+', '5']]

=if(Sum(A1:A25)>42,Min(B1:B25), if(Sum(C1:C25)>3.14, (Min(C1:C25)+3)
*18,Max(B1:B25)))
['if',
 ['sum', [['A', '1'], ['A', '25']], '>', '42'],
 'min',
 [['B', '1'], ['B', '25']],
 'if',
 ['sum', [['C', '1'], ['C', '25']], '>', '3.14'],
 [['min', [['C', '1'], ['C', '25']], '+', '3'], '*', '18'],
 'max',
 [['B', '1'], ['B', '25']]]


-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regex help needed!

2009-12-22 Thread Paul McGuire
On Dec 21, 5:38 am, Oltmans rolf.oltm...@gmail.com wrote:
 Hello,. everyone.

 I've a string that looks something like
 """
 lksjdfls <div id ='amazon_345343'> kdjff lsdfs </div> sdjfls <div id
 =   "amazon_35343433">sdfsd</div><div id='amazon_8898'>welcome</div>
 """

 From above string I need the digits within the ID attribute. For
 example, required output from above string is
 - 35343433
 - 345343
 - 8898

 I've written this regex that's kind of working
 re.findall("\w+\s*\W+amazon_(\d+)",str)


The issue with using regexen for parsing HTML is that you often get
surprised by attributes that you never expected, or out of order, or
with weird or missing quotation marks, or tags or attributes that are
in upper/lower case.  BeautifulSoup is one tool to use for HTML
scraping, here is a pyparsing example, with hopefully descriptive
comments:


from pyparsing import makeHTMLTags,ParseException

src = """
lksjdfls <div id ='amazon_345343'> kdjff lsdfs </div> sdjfls <div id
=   "amazon_35343433">sdfsd</div><div id='amazon_8898'>welcome</div>
hello, my age is 86 years old and I was born in 1945. Do you know
that
PI is roughly 3.1443534534534534534 """

# use makeHTMLTags to return an expression that will match
# HTML div tags, including attributes, upper/lower case,
# etc. (makeHTMLTags will return expressions for both
# opening and closing tags, but we only care about the
# opening one, so just use the [0]th returned item
div = makeHTMLTags("div")[0]

# define a parse action to filter only for div tags
# with the proper id form
def filterByIdStartingWithAmazon(tokens):
    if not tokens.id.startswith("amazon_"):
        raise ParseException(
          "must have id attribute starting with 'amazon_'")

# define a parse action that will add a pseudo-
# attribute 'amazon_id', to make it easier to get the
# numeric portion of the id after the leading 'amazon_'
def makeAmazonIdAttribute(tokens):
    tokens["amazon_id"] = tokens.id[len("amazon_"):]

# attach parse action callbacks to the div expression -
# these will be called during parse time
div.setParseAction(filterByIdStartingWithAmazon,
 makeAmazonIdAttribute)

# search through the input string for matching divs,
# and print out their amazon_id's
for divtag in div.searchString(src):
    print divtag.amazon_id


Prints:

345343
35343433
8898

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to create a docstring for a module?

2009-12-06 Thread Paul McGuire
On Dec 6, 7:43 am, Steven D'Aprano st...@remove-this-
cybersource.com.au wrote:
 On Sun, 06 Dec 2009 06:34:17 -0600, Tim Chase wrote:
  I've occasionally wanted something like this, and have found that it can
  be done by manually assigning to __doc__ (either at the module-level or
  classes) which can make some documentation bits a little easier:

 Unfortunately, and surprisingly, assigning to __doc__ doesn't work with
 new-style classes.

 --
 Steven

Fortunately, in the OP's case, he isn't trying to do this with a
class, but with a module.  For me, assigning to __doc__ at the module
level works in defining a docstring for pyparsing, at least for Py2.5.

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: trouble with regex?

2009-10-08 Thread Paul McGuire
On Oct 8, 11:42 am, MRAB pyt...@mrabarnett.plus.com wrote:
 inhahe wrote:
  Can someone tell me why this doesn't work?

  colorre = re.compile ('('
                          '^'
                         '|'
                          '(?:'
                             '\x0b(?:10|11|12|13|14|15|0\\d|\\d)'
                             '(?:'
                                ',(?:10|11|12|13|14|15|0\\d|\\d)'
                             ')?'
                          ')'
                        ')(.*?)')

  I'm trying to extract mirc color codes.


You might find this site interesting (http://utilitymill.com/utility/
Regex_For_Range) to generate RE's for numeric ranges.

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: bug with itertools.groupby?

2009-10-07 Thread Paul McGuire
On Oct 6, 6:06 pm, Kitlbast vlad.shevche...@gmail.com wrote:

 grouped acc:  61
 grouped acc:  64
 grouped acc:  61

 am I doing something wrong?

sort first, then groupby.
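
The symptom in the quoted output (the same key showing up in more than
one group) follows directly from groupby's contract: it only merges
*adjacent* equal keys, so unsorted input yields one group per run.  A
minimal sketch in today's Python:

```python
from itertools import groupby

data = [61, 64, 61]

# groupby on unsorted input: one group per *run* of equal keys
runs = [k for k, _ in groupby(data)]
print(runs)      # [61, 64, 61] -- 61 appears twice

# sort first, then groupby: one group per distinct key
grouped = [k for k, _ in groupby(sorted(data))]
print(grouped)   # [61, 64]
```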
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Regular expression to structure HTML

2009-10-02 Thread Paul McGuire
On Oct 2, 12:10 am, 504cr...@gmail.com 504cr...@gmail.com wrote:
 I'm kind of new to regular expressions, and I've spent hours trying to
 finesse a regular expression to build a substitution.

 What I'd like to do is extract data elements from HTML and structure
 them so that they can more readily be imported into a database.

Oy! If I had a nickel for every misguided coder who tried to scrape
HTML with regexes...

Some reasons why RE's are no good at parsing HTML:
- tags can be mixed case
- tags can have whitespace in many unexpected places
- tags with no body can combine opening and closing tag with a '/'
before the closing '>', as in <BR/>
- tags can have attributes that you did not expect (like <BR
CLEAR="ALL">)
- attributes can occur in any order within the tag
- attribute names can also be in unexpected upper/lower case
- attribute values can be enclosed in double quotes, single quotes, or
even (surprise!) NO quotes

For HTML that is machine-generated, you *may* be able to make some
page-specific assumptions.  But if edited by human hands, or if you
are trying to make a generic page scraper, RE's will never cut it.
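
To make the pitfalls above concrete, here is a small sketch in
today's Python (the stdlib parser was named HTMLParser in the Python 2
of this thread, html.parser now); the input mixes single-quoted,
unquoted, and upper-case markup, which defeats a naive regex but not a
real HTML parser:

```python
import re
from html.parser import HTMLParser

src = "<div id='amazon_1'>x</div><DIV ID=amazon_2>y</DIV>"

# a regex written for lowercase, double-quoted ids misses both variants
naive = re.findall(r'<div id="amazon_(\d+)">', src)
print(naive)     # []

class IdCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.ids = []
    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive already lowercased,
        # regardless of quoting style in the source
        if tag == "div":
            for name, value in attrs:
                if name == "id" and value and value.startswith("amazon_"):
                    self.ids.append(value[len("amazon_"):])

p = IdCollector()
p.feed(src)
print(p.ids)     # ['1', '2']
```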

-- Paul

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Are min() and max() thread-safe?

2009-09-17 Thread Paul McGuire
On Sep 16, 11:33 pm, Steven D'Aprano
ste...@remove.this.cybersource.com.au wrote:
 I have two threads, one running min() and the other running max() over
 the same list. I'm getting some mysterious results which I'm having
 trouble debugging. Are min() and max() thread-safe, or am I doing
 something fundamentally silly by having them walk over the same list
 simultaneously?


If you are calculating both min and max of a sequence, here is an
algorithm that can cut your comparisons by 25% - for objects with rich/
time-consuming comparisons, that can really add up.

import sys
if sys.version[0] == "2":
    range = xrange

def minmax(seq):
    if not seq:
        return None, None
    min_ = seq[0]
    max_ = seq[0]
    seqlen = len(seq)
    start = seqlen % 2
    for i in range(start, seqlen, 2):
        a, b = seq[i], seq[i+1]
        if a > b:
            a, b = b, a
        if a < min_:
            min_ = a
        if b > max_:
            max_ = b
    return min_, max_

With this test code, I verified that the comparison count drops from
2*len to 1.5*len:

if __name__ == "__main__":

    import sys
    if sys.version[0] == "2":
        range = xrange

    import random

    def minmax_bf(seq):
        # brute force, just call min and max on sequence
        return min(seq), max(seq)

    testseq = [random.random() for i in range(100)]

    print minmax_bf(testseq)
    print minmax(testseq)

    class ComparisonCounter(object):
        tally = 0
        def __init__(self, obj):
            self.obj = obj
        def __cmp__(self, other):
            ComparisonCounter.tally += 1
            return cmp(self.obj, other.obj)
        def __getattr__(self, attr):
            return getattr(self.obj, attr)
        def __str__(self):
            return str(self.obj)
        def __repr__(self):
            return repr(self.obj)

    testseq = [ComparisonCounter(random.random()) for i in range(10001)]

    print minmax_bf(testseq)
    print ComparisonCounter.tally
    ComparisonCounter.tally = 0

    print minmax(testseq)
    print ComparisonCounter.tally


Plus, now that you are finding both min and max in a single pass
through the sequence, you can wrap this in a lock to make sure of the
atomicity of your results.

(Just for grins, I also tried sorting the list and returning elements
0 and -1 for min and max - I got numbers of comparisons in the range
of 12X to 15X the length of the sequence.)
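
To sketch the lock-wrapping idea in today's Python (the one-pass scan
below is a simplified stand-in for the pair-comparison version above,
and the lock discipline assumes all writers take the same lock):

```python
import threading

_lock = threading.Lock()

def minmax(seq):
    # simplified single-pass min/max
    it = iter(seq)
    try:
        lo = hi = next(it)
    except StopIteration:
        return None, None
    for x in it:
        if x < lo:
            lo = x
        elif x > hi:
            hi = x
    return lo, hi

def minmax_atomic(seq):
    # holding the lock for the whole scan guarantees that min and
    # max are computed from the same snapshot of the shared list;
    # any mutating thread must acquire the same lock
    with _lock:
        return minmax(seq)

print(minmax_atomic([3, 1, 4, 1, 5]))   # (1, 5)
```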

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Retrieve url's of all jpegs at a web page URL

2009-09-15 Thread Paul McGuire
On Sep 15, 11:32 pm, Stefan Behnel stefan...@behnel.de wrote:
 Also untested:

         from lxml import html

         doc = html.parse(page_url)
         doc.make_links_absolute(page_url)

         urls = [ img.src for img in doc.xpath('//img') ]

 Then use e.g. urllib2 to save the images.

Looks similar to what a pyparsing approach would look like:

from pyparsing import makeHTMLTags, htmlComment

import urllib
html = urllib.urlopen(url).read()

imgTag = makeHTMLTags("img")[0]
imgTag.ignore(htmlComment)

urls = [ img.src for img in imgTag.searchString(html) ]

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Where regexs listed for Python language's tokenizer/lexer?

2009-09-12 Thread Paul McGuire
On Sep 12, 1:10 am, Chris Seberino cseber...@gmail.com wrote:
 Where regexs listed for Python language's tokenizer/lexer?

 If I'm not mistaken, the grammar is not sufficient to specify the
 language
 you also need to specify the regexs that define the tokens
 right?..where is that?

I think the OP is asking for the regexs that define the terminals
referenced in the Python grammar, similar to those found in yacc token
definitions.  He's not implying that there are regexs that implement
the whole grammar.
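
For reference, CPython's own lexer for this purpose is mirrored in the
stdlib tokenize module, which is assembled from module-level regex
strings (e.g. tokenize.Number and tokenize.Name, undocumented but
readable in Lib/tokenize.py).  A quick sketch in today's Python:

```python
import io
import tokenize

src = "x = 3.14 + y"

# tokenize the source and drop the bookkeeping tokens
toks = [(tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(io.StringIO(src).readline)
        if tok.type not in (tokenize.NEWLINE, tokenize.ENDMARKER,
                            tokenize.ENCODING)]
print(toks)
# [('NAME', 'x'), ('OP', '='), ('NUMBER', '3.14'), ('OP', '+'), ('NAME', 'y')]
```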

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Something confusing about non-greedy reg exp match

2009-09-07 Thread Paul McGuire
On Sep 6, 11:23 pm, Ben Finney ben+pyt...@benfinney.id.au wrote:
 George Burdell gburde...@gmail.com writes:
  I want to find every occurrence of "money", and for each
  occurrence, I want to scan back to the first occurrence
  of "hello". How can this be done?

 By recognising the task: not expression matching, but lexing and
 parsing. For which you might find the ‘pyparsing’ library of use
 URL:http://pyparsing.wikispaces.com/.


Even pyparsing has to go through some gyrations to do this sort of
match, then backup parsing.  Here is my solution:

>>> from pyparsing import SkipTo, originalTextFor
>>> expr = originalTextFor("hello" + SkipTo("money", failOn="hello",
...     include=True))
>>> print expr.searchString('hello how are you hello funny money')
[['hello funny money']]


SkipTo is analogous to the OP's .*?, but the failOn attribute adds the
logic "if this string is found before matching the target string, then
fail".  So pyparsing scans through the string, matches the first
"hello", attempts to skip to the next occurrence of "money", but finds
another "hello" first, so this parse fails.  Then the scan continues
until the next "hello" is found, and this time, SkipTo successfully
finds "money" without first hitting a "hello".  I then had to wrap the
whole thing in a helper method originalTextFor, otherwise I get an
ugly grouping of separate strings.

So I still don't really have any kind of backup after matching
parsing, I just turned this into a qualified forward match.  One could
do a similar thing with a parse action.  If you could attach some kind
of validating function to a field within a regex, you could have done
the same thing there.
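
For completeness, re *can* express the same qualified forward match
without any backup, using what regex folks call a "tempered dot" -- a
sketch:

```python
import re

# "hello", then any run of characters none of which starts another
# "hello", lazily extended up to the nearest "money"
pat = re.compile(r'hello(?:(?!hello).)*?money')

s = 'hello how are you hello funny money'
matches = pat.findall(s)
print(matches)   # ['hello funny money']
```

The `(?!hello).` piece fails as soon as the skip region would swallow
another "hello", which is exactly the failOn logic above.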

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Smallest float different from 0.0?

2009-09-07 Thread Paul McGuire
On Sep 7, 9:47 am, kj no.em...@please.post wrote:
 Is there some standardized way (e.g. some official module of such
 limit constants) to get the smallest positive float that Python
 will regard as distinct from 0.0?

 TIA!

 kj

You could find it for yourself:

>>> for i in range(400):
...     if 10**-i == 0:
...         print i
...         break
...
324
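
In later Pythons the limits are also exposed directly in
sys.float_info; note the loop above actually finds the *denormal*
limit, which sits well below the smallest normalized float:

```python
import sys

# smallest positive *normalized* IEEE-754 double
print(sys.float_info.min)   # 2.2250738585072014e-308

# denormals go further down; 5e-324 is the smallest positive float
tiny = 5e-324
print(tiny > 0.0)           # True
print(tiny / 2 == 0.0)      # True -- halving it underflows to zero
```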

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Creating slice notation from string

2009-09-02 Thread Paul McGuire
On Sep 2, 4:55 pm, bvdp b...@mellowood.ca wrote:
 I'm trying to NOT create a parser to do this  and I'm sure that
 it's easy if I could only see the light!


Well, this is a nice puzzler, better than a sudoku.  Maybe a quick
parser with pyparsing will give you some guidance on how to do this
without a parser library:

from pyparsing import *

# relevant punctuation, suppress after parsing
LBR,RBR,COLON = map(Suppress,"[]:")

# expression to parse numerics and convert to int's
integer = Regex(r"-?\d+").setParseAction(lambda t: int(t[0]))

# first try, almost good enough, but wrongly parses [2] -> [2::]
sliceExpr = ( LBR + Optional(integer,default=None) +
Optional(COLON + Optional(integer,default=None),
default=None) +
Optional(COLON + Optional(integer,default=None),
default=None) +
RBR )

# better, this version special-cases [n] -> [n:n+1]
# otherwise, just create slice from parsed int's
singleInteger = integer + ~FollowedBy(COLON)
singleInteger.setParseAction(lambda t : [t[0],t[0]+1])

sliceExpr = ( LBR +
(singleInteger |
Optional(integer,default=None) +
Optional(COLON + Optional(integer,default=None),
default=None) +
Optional(COLON + Optional(integer,default=None),
default=None)
) +
  RBR )

# attach parse action to convert parsed int's to a slice
sliceExpr.setParseAction(lambda t: slice(*(t.asList())))


tests = """\
[2]
[2:3]
[2:]
[2::2]
[-1:-1:-1]
[:-1]
[::-1]
[:]""".splitlines()

testlist = range(10)
for t in tests:
    parsedSlice = sliceExpr.parseString(t)[0]
    print t, parsedSlice, testlist[parsedSlice]


Prints:

[2] slice(2, 3, None) [2]
[2:3] slice(2, 3, None) [2]
[2:] slice(2, None, None) [2, 3, 4, 5, 6, 7, 8, 9]
[2::2] slice(2, None, 2) [2, 4, 6, 8]
[-1:-1:-1] slice(-1, -1, -1) []
[:-1] slice(None, -1, None) [0, 1, 2, 3, 4, 5, 6, 7, 8]
[::-1] slice(None, None, -1) [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
[:] slice(None, None, None) [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


Yes, it is necessary to handle the special case of a slice that is
really just a single index.  If your list of parsed integers has only
a single value n, then the slice constructor creates a slice
(None,n,None).  What you really want, if you want everything to create
a slice, is to get slice(n,n+1,None).  That is what the singleInteger
special case does in the pyparsing parser.
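
Since the OP wanted to avoid a parser library, the same special case
can also be handled with plain string splitting -- a sketch in today's
Python (note a negative single index like "[-1]" would still need
extra care):

```python
def parse_slice(text):
    # "[2]" -> slice(2, 3); "[2::2]" -> slice(2, None, 2); "[:]" -> full slice
    body = text.strip()[1:-1]          # drop the surrounding brackets
    if ':' not in body:
        n = int(body)                  # single index: [n] -> [n:n+1]
        return slice(n, n + 1)
    parts = [int(p) if p else None for p in body.split(':')]
    return slice(*parts)

data = list(range(10))
print(data[parse_slice('[2]')])        # [2]
print(data[parse_slice('[2::2]')])     # [2, 4, 6, 8]
print(data[parse_slice('[::-1]')])     # reversed list
```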

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is behavior of += intentional for int?

2009-08-30 Thread Paul McGuire
On Aug 30, 2:33 am, Derek Martin c...@pizzashack.org wrote:
 THAT is why Python's behavior with regard to numerical objects is
 not intuitive, and frankly bizzare to me, and I dare say to others who
 find it so.

 Yes, that's right.  BIZZARE.


Can't we all just get along?

I think the question boils down to "where is the object?".  In this
statement:

a = 3

which is the object, a or 3?

There exist languages (such as C++) that allow you to override the '='
assignment as a class operator.  So that I could create a class where
I decided that assigning an integer value to it applies some
application logic, probably the setting of some fundamental
attribute.  In that language, 'a' is the object, and 3 is a value
being assigned to it.  This can cause some consternation when a reader
(or worse, maintainer) isn't familiar with my code, sees this simple
assignment, and figures that they can use 'a' elsewhere as a simple
integer, with some surprising or disturbing results.

Python just doesn't work that way.

Python binds values to names. Always. In Python, = is not and never
could be a class operator.  In Python, any expression of LHS = RHS,
LHS is always a name, and in this statement it is being bound to some
object found by evaluating the right hand side, RHS.

The bit of confusion here is that the in-place operators like +=, -=,
etc. are something of a misnomer - obviously a *name* can't be
incremented or decremented (unlike a pointer in C or C++).  One has to
see that these are really shortcuts for LHS = LHS + RHS, and once
again, our LHS is just a name getting bound to the result of LHS +
RHS.  Is this confusing, or non-intuitive? Maybe. Do you want to write
code in Python? Get used to it.  It is surprising how many times we
think things are intuitive when we really mean they are familiar.
For long-time C and Java developers, it is intuitive that variables
are memory locations, and switching to Python's name model for them is
non-intuitive.

As for your quibble about "3 is not an object", I'm afraid that may be
your own personal set of blinders.  Integer constants as objects is
not unique to Python, you can see it in other languages - Smalltalk
and Ruby are two that I know personally.  Ruby implements a loop using
this interesting notation:

3.times do
   ...do something...
end

Of course, it is a core idiom of the language, and if I adopted Ruby,
I would adopt its idioms and object model.

Is it any odder that 3 is an object than that the string literal
"Hello, World!" is an object?  Perhaps we are just not reminded of it
so often, because Python's int class defines no methods that are not
__ special methods (type "dir(3)" at the Python prompt).  So we
never see any Python code referencing a numeric literal and
immediately calling a method on it, as in Ruby's simple loop
construct.  But we do see methods implemented on str like split(), and
so "about above across after against".split() gives me a list of the
English prepositions that begin with a. We see this kind of thing
often enough, we get accustomed to the objectness of string literals.
It gets to be so familiar, it eventually seems intuitive.  You
yourself mentioned that intuition is subjective - unfortunately, the
intuitiveness of a feature is often tied to its value as a coding
concept, and so statements of non-intuitiveness can be interpreted as
a slant against the virtue of that concept, or even against the
language itself.

Once we accept that 3 is an object, we clearly have to stipulate that
there can be no changes allowed to it.  3 must *always* have the value
of the integer between 2 and 4.  So our language invokes the concept
that some classes create instances that are immutable.

For a Python long-timer like Mr. D'Aprano, I don't think he even
consciously thinks about this kind of thing any more; his intuition
has aligned with the Python stars, so he extrapolates from the OP's
suggestion to the resulting aberrant behavior, as he posted it.

You can dispute and rail at this core language concept if you like,
but I think the more entrenched you become in the position that '3 is
an object' is bizarre, the less enjoyable your Python work will be.

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is behavior of += intentional for int?

2009-08-30 Thread Paul McGuire
On Aug 30, 5:42 am, Paul McGuire pt...@austin.rr.com wrote:
 Python binds values to names. Always. In Python, = is not and never
 could be a class operator.  In Python, any expression of LHS = RHS,
 LHS is always a name, and in this statement it is being bound to some
 object found by evaluating the right hand side, RHS.

An interesting side note, and one that could be granted to the OP, is
that Python *does* support the definition of class operator overrides
for in-place assignment operators like += (by defining a method
__iadd__).  This is how numpy's values accomplish their mutability.
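
A minimal sketch of such an __iadd__ override (the class here is
invented for illustration): by mutating in place and returning self,
+= rebinds the name to the *same* object, so the aliasing behavior the
OP expected actually happens:

```python
class Accumulator:
    def __init__(self, value=0):
        self.value = value
    def __iadd__(self, other):
        # mutate in place and return self, so "a += n" rebinds
        # a to the very same object
        self.value += other
        return self

a = Accumulator(1)
x = [a]
a += 1
print(a is x[0], x[0].value)   # True 2
```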

 It is surprising how many times we
 think things are intuitive when we really mean they are familiar.
Of course, just as I was typing my response, Steve D'Aprano beat me to
the punch.

Maybe it's time we added a new acronym to this group's ongoing
discussions: PDWTW, or "Python doesn't work that way".

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is behavior of += intentional for int?

2009-08-29 Thread Paul McGuire
On Aug 29, 7:45 am, zaur szp...@gmail.com wrote:
 Python 2.6.2 (r262:71600, Apr 16 2009, 09:17:39)
 [GCC 4.0.1 (Apple Computer, Inc. build 5250)] on darwin
 Type "copyright", "credits" or "license()" for more information.
 >>> a=1
 >>> x=[a]
 >>> id(a)==id(x[0])
 True
 >>> a+=1
 >>> a
 2
 >>> x[0]
 1

 I thought that += should only change the value of the int object. But
 += create new.
 Is this intentional?

ints are immutable.  But your logic works fine with a mutable object,
like a list:

>>> a = [1]
>>> x = [a]
>>> print id(a) == id(x[0])
True
>>> a += [1]
>>> print a
[1, 1]
>>> print x[0]
[1, 1]

What exactly are you trying to do?

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: regexp help

2009-08-27 Thread Paul McGuire
On Aug 27, 1:15 pm, Bakes ba...@ymail.com wrote:
 If I were using the code:

 (?P<data>[0-9]+)

 to get an integer between 0 and 9, how would I allow it to register
 negative integers as well?

With that + sign in there, you will actually get an integer from 0 to
9...

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python fast HTML data extraction library

2009-07-22 Thread Paul McGuire
On Jul 22, 5:43 pm, Filip pink...@gmail.com wrote:

 My library, rather than parsing the whole input into a tree, processes
 it like a char stream with regular expressions.


Filip -

In general, parsing HTML with re's is fraught with easily-overlooked
deviations from the norm.  But since you have stepped up to the task,
here are some comments on your re's:

# You should use raw string literals throughout, as in:
# blah_re = re.compile(r'sljdflsflds')
# (note the leading r before the string literal).  raw string
literals
# really help keep your re expressions clean, so that you don't ever
# have to double up any '\' characters.

# Attributes might be enclosed in single quotes, or not enclosed in
any quotes at all.
attr_re = re.compile('([\da-z]+?)\s*=\s*\"(.*?)\"', re.DOTALL |
re.UNICODE | re.IGNORECASE)

# Needs re.IGNORECASE, and can have tag attributes, such as <BR
# CLEAR=ALL>
line_break_re = re.compile('<br\/?>', re.UNICODE)

# what about HTML entities defined using hex syntax, such as &#x..;
amp_re = re.compile('\&(?![a-z]+?\;)', re.UNICODE | re.IGNORECASE)

How would you extract data from a table?  For instance, how would you
extract the data entries from the table at this URL:
http://tf.nist.gov/tf-cgi/servers.cgi ?  This would be a good example
snippet for your module documentation.

Try extracting all of the <a href=...>sldjlsfjd</a> links from
yahoo.com, and see how much of what you expect actually gets matched.

Good luck!

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Override a method but inherit the docstring

2009-07-16 Thread Paul McGuire
On Jul 16, 8:01 pm, Ben Finney ben+pyt...@benfinney.id.au wrote:
 Howdy all,

 The following is a common idiom::

     class FooGonk(object):
         def frobnicate(self):
             """ Frobnicate this gonk. """
             basic_implementation(self.wobble)

     class BarGonk(FooGonk):
         def frobnicate(self):
             special_implementation(self.warble)

 The docstring for ‘FooGonk.frobnicate’ is, intentionally, perfectly
 applicable to the ‘BarGonk.frobnicate’ method also. Yet in overriding
 the method, the original docstring is not associated with it.

 What is the most Pythonic, DRY-adherent, and preferably least-ugly
 approach to override a method, but have the same docstring on both
 methods?


Two ideas come to mind, the decorator way and the metaclass way.  I am
not a guru at either, but these two examples work:

# the decorator way
def inherit_docstring_from(cls):
    def docstring_inheriting_decorator(fn):
        fn.__doc__ = getattr(cls, fn.__name__).__doc__
        return fn
    return docstring_inheriting_decorator


class FooGonk(object):
    def frobnicate(self):
        """ Frobnicate this gonk. """
        basic_implementation(self.wobble)


class BarGonk(FooGonk):
    @inherit_docstring_from(FooGonk)
    def frobnicate(self):
        special_implementation(self.warble)

bg = BarGonk()
help(bg.frobnicate)

Prints:
Help on method frobnicate in module __main__:

frobnicate(self) method of __main__.BarGonk instance
Frobnicate this gonk.


Using a decorator in this manner requires repeating the super class
name.  Perhaps there is a way to get the bases of BarGonk, but I don't
think so, because at the time that the decorator is called, BarGonk is
not yet fully defined.



# The metaclass way

from types import FunctionType

class DocStringInheritor(type):
    def __new__(meta, classname, bases, classDict):
        newClassDict = {}
        for attributeName, attribute in classDict.items():
            if type(attribute) == FunctionType:
                # look through bases for matching function by name
                for baseclass in bases:
                    if hasattr(baseclass, attributeName):
                        basefn = getattr(baseclass, attributeName)
                        if basefn.__doc__:
                            attribute.__doc__ = basefn.__doc__
                            break

            newClassDict[attributeName] = attribute

        return type.__new__(meta, classname, bases, newClassDict)

class FooGonk2(object):
    def frobnicate(self):
        """ Frobnicate this gonk. """
        basic_implementation(self.wobble)


class BarGonk2(FooGonk2):
    __metaclass__ = DocStringInheritor
    def frobnicate(self):
        special_implementation(self.warble)

bg = BarGonk2()
help(bg.frobnicate)

Prints:

Help on method frobnicate in module __main__:

frobnicate(self) method of __main__.BarGonk2 instance
Frobnicate this gonk.


This metaclass will walk the list of bases until the desired
superclass method is found AND if that method has a docstring and only
THEN does it attach the superdocstring to the derived class method.

Please use carefully, I just did the metaclass thing by following
Michael Foord's Metaclass tutorial (http://www.voidspace.org.uk/python/
articles/metaclasses.shtml), I may have missed a step or two.

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: c++ Source Code for acm 2004-2005 problems

2009-07-12 Thread Paul McGuire
On Jul 12, 5:24 pm, Davood Vahdati davoodvahdati2...@gmail.com
wrote:
 Dear Sirs And Madams :

 it is an Acm programming competition Questions in year 2004-2005 .
 could you please solve problems is question ? I  Wan't C++ Source Code
 program About this questions OR Problems . thank you for your prompt
 attention to this matter

 2 1
 4 3
 5 1
 4 2

huge chunk of OT content snipped

looking for the Python content in this post...

m, nope, didn't find any...

I guess the OP tried on a C++ newsgroup and got told to do his own
homework, so he came here instead?
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Examples of Python driven Microsoft UI Automation wanted

2009-07-09 Thread Paul McGuire
On Jul 9, 1:09 pm, DuaneKaufman duane.kauf...@gmail.com wrote:
 The application I wish to interact with is not my own, but an ERP
 system GUI front-end.


I have used pywinauto to drive a Flash game running inside of an
Internet Explorer browser - that's pretty GUI!

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: parsing times like 5 minutes ago?

2009-07-07 Thread Paul McGuire
On Jul 6, 7:21 pm, m...@pixar.com wrote:
 I'm looking for something like Tcl's [clock scan] command which parses
 human-readable time strings such as:

     % clock scan "5 minutes ago"
     1246925569
     % clock scan "tomorrow 12:00"
     1246993200
     % clock scan "today + 1 fortnight"
     1248135628

 Does any such package exist for Python?

 Many TIA!
 Mark

 --
 Mark Harrison
 Pixar Animation Studios

I've been dabbling with such a parser with pyparsing - here is my
progress so far: http://pyparsing.wikispaces.com/UnderDevelopment

It parses these test cases:

today
tomorrow
yesterday
in a couple of days
a couple of days from now
a couple of days from today
in a day
3 days ago
3 days from now
a day ago
now
10 minutes ago
10 minutes from now
in 10 minutes
in a minute
in a couple of minutes
20 seconds ago
in 30 seconds
20 seconds before noon
20 seconds before noon tomorrow
noon
midnight
noon tomorrow
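
For anyone who only needs a couple of these forms, a far smaller
re-plus-timedelta sketch covers the simplest cases (this toy is *not*
the pyparsing grammar linked above -- it only understands
"<N> <unit>s ago" and "in <N> <unit>s"):

```python
import re
from datetime import datetime, timedelta

UNITS = {'second': 1, 'minute': 60, 'hour': 3600, 'day': 86400}

def clock_scan(phrase, now=None):
    # toy parser: "<N> <unit>s ago" and "in <N> <unit>s" only
    now = now or datetime.now()
    m = re.match(r'(?:in\s+)?(\d+)\s+(second|minute|hour|day)s?(\s+ago)?$',
                 phrase.strip())
    if not m:
        raise ValueError('unsupported phrase: %r' % phrase)
    delta = timedelta(seconds=int(m.group(1)) * UNITS[m.group(2)])
    return now - delta if m.group(3) else now + delta

base = datetime(2009, 7, 7, 12, 0, 0)
print(clock_scan('5 minutes ago', base))   # 2009-07-07 11:55:00
print(clock_scan('in 2 days', base))       # 2009-07-09 12:00:00
```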


-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Code that ought to run fast, but can't due to Python limitations.

2009-07-05 Thread Paul McGuire
On Jul 5, 3:12 am, Hendrik van Rooyen m...@microcorp.co.za wrote:

 Use a dispatch dict, and have each state return the next state.
 Then you can use strings representing state names, and
 everybody will be able to understand the code.

 toy example, not tested, nor completed:

 protocol = {"start":initialiser, "hunt":hunter, "classify":classifier,
 <other states>}

 def state_machine():
     next_step = protocol["start"]()
     while True:
         next_step = protocol[next_step]()
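
A runnable fleshing-out of that quoted toy, in today's Python -- the
state names come from the sketch but the transition logic and the
shared context dict are invented for illustration:

```python
def initialiser(ctx):
    ctx['count'] = 0
    return 'hunt'

def hunter(ctx):
    ctx['count'] += 1
    return 'classify' if ctx['count'] >= 3 else 'hunt'

def classifier(ctx):
    return 'done'

# the dispatch dict: state name -> state function
protocol = {'start': initialiser, 'hunt': hunter, 'classify': classifier}

def state_machine():
    ctx = {}
    state = 'start'
    while state != 'done':
        state = protocol[state](ctx)
    return ctx

print(state_machine())   # {'count': 3}
```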


I've just spent about an hour looking over this code, with a few
comments to inject to the thread here:

- To all those suggesting the OP convert to a dispatch table, be
assured that this code is well aware of this idiom.  It is used
HEAVILY at a macro level, picking through the various HTML states
(starting a tag, reading attributes, reading body, etc.).  There still
are a number of cascading if-elif's within some of these states, and
some of them *may* be candidates for further optimization.

- There is an underlying HTMLInputStream that seems to be doing some
unnecessary position bookkeeping (positionLine and positionCol).
Commenting this out increases my test speed by about 13%.  In my
ignorance, I may be removing some important behavior, but this does
not seem to be critical as I tested against a few megs of HTML
source.  Before blaming the tokenizer for everything, there may be
more performance to be wrung from the input stream processor.  For
that matter, I would guess that about 90% of all HTML files that this
code would process would easily fit in memory - in that case, the
stream processing (and all of the attendant "if I'm not at the end of
the current chunk" code) could be skipped/removed entirely.  For

- The HTMLInputStream's charsUntil code is an already-identified
bottleneck, and some re enhancements have been applied here to help
out.

- Run-time construction of tuple literals where the tuple members are
constants can be lifted out.  emitCurrentToken rebuilds this tuple
every time it is called (which is a lot!):

if (token["type"] in (tokenTypes["StartTag"], tokenTypes
["EndTag"], tokenTypes["EmptyTag"])):

Move this tuple literal into a class constant (or if you can tolerate
it, a default method argument to gain LOAD_FAST benefits - sometimes
optimization isn't pretty).

- These kinds of optimizations are pretty small, and only make sense
if they are called frequently.  Tallying which states are called in my
test gives the following list in decreasing frequency.  Such a list
would help guide your further tuning efforts:

tagNameState                               194848
dataState                                  182179
attributeNameState                         116507
attributeValueDoubleQuotedState            114931
tagOpenState                               105556
beforeAttributeNameState                    58612
beforeAttributeValueState                   58216
afterAttributeValueState                    58083
closeTagOpenState                           50547
entityDataState                              1673
attributeValueSingleQuotedState              1098
commentEndDashState                           372
markupDeclarationOpenState                    370
commentEndState                               364
commentStartState                             362
commentState                                  362
selfClosingStartTagState                      359
doctypePublicIdentifierDoubleQuotedState      291
doctypeSystemIdentifierDoubleQuotedState      247
attributeValueUnQuotedState                   191
doctypeNameState                               32
beforeDoctypePublicIdentifierState             16
afterDoctypePublicIdentifierState              14
afterDoctypeNameState                           9
doctypeState                                    8
beforeDoctypeNameState                          8
afterDoctypeSystemIdentifierState               6
afterAttributeNameState                         5
commentStartDashState                           2
bogusCommentState                               2

For instance, I wouldn't bother doing much tuning of the
bogusCommentState.  Anything called fewer than 50,000 times in this
test doesn't look like it would be worth the trouble.


-- Paul

(Thanks to those who suggested pyparsing as an alternative, but I
think this code is already beyond pyparsing in a few respects.  For
one thing, this code works with an input stream, in order to process
large HTML files; pyparsing *only* works with an in-memory string.
This code can also take advantage of some performance short cuts,
knowing that it is parsing HTML; pyparsing's generic classes can't do
that.)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to insert string in each match using RegEx iterator

2009-06-10 Thread Paul McGuire
On Jun 9, 11:13 pm, 504cr...@gmail.com 504cr...@gmail.com wrote:
 By what method would a string be inserted at each instance of a RegEx
 match?


Some might say that using a parsing library for this problem is
overkill, but let me just put this out there as another data point for
you.  Pyparsing (http://pyparsing.wikispaces.com) supports callbacks
that allow you to embellish the matched tokens, and create a new
string containing the modified text for each match of a pyparsing
expression.  Hmm, maybe the code example is easier to follow than the
explanation...


from pyparsing import Word, nums, Regex

# an integer is a 'word' composed of numeric characters
integer = Word(nums)

# or use this if you prefer
integer = Regex(r'\d+')

# attach a parse action to prefix 'INSERT ' before the matched token
integer.setParseAction(lambda tokens: "INSERT " + tokens[0])

# use transformString to search through the input, applying the
# parse action to all matches of the given expression
test = '123 abc 456 def 789 ghi'
print integer.transformString(test)

# prints
# INSERT 123 abc INSERT 456 def INSERT 789 ghi


I offer this because often the simple examples that get posted are
just the barest tip of the iceberg of what the poster eventually plans
to tackle.

Good luck in your Pythonic adventure!
-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: random number including 1 - i.e. [0,1]

2009-06-10 Thread Paul McGuire
On Jun 9, 11:23 pm, Esmail ebo...@hotmail.com wrote:
 Here is part of the specification of an algorithm I'm implementing that
 shows the reason for my original query:

 vid = w * vid + c1 * rand() * (pid - xid) + c2 * Rand() * (pgd - xid)  (1a)

 xid = xid + vid (1b)

 where c1 and c2 are two positive constants,
 rand() and Rand() are two random functions in the range [0,1],
 ^
 and w is the inertia weight.

It is entirely possible that the documentation you have for the
original rand() and Rand() functions have misstated their range.  In
my experience, rand() functions that I have worked with have always
been [0,1).

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: random number including 1 - i.e. [0,1]

2009-06-09 Thread Paul McGuire
On Jun 9, 4:33 pm, Esmail ebo...@hotmail.com wrote:
 Hi,

 random.random() will generate a random value in the range [0, 1).

 Is there an easy way to generate random values in the range [0, 1]?
 I.e., including 1?


Are you trying to generate a number in the range [0,n] by multiplying
a random function that returns [0,1] * n?  If so, then you want to do
this using: int(random.random()*(n+1))  This will give equal chance of
getting any number from 0 to n.

If you had a function that returned a random in the range [0,1], then
multiplying by n and then truncating would give only the barest sliver
of a chance of giving the value n.  You could try rounding, but then
you get this skew:

0 for values [0, 0.5) (width of 0.5)
1 for value [0.5, 1.5) (width of 1)
...
n for value [n-0.5, n] (width of ~0.50001)

Still not a uniform die roll.  You have only about 1/2 the probability
of getting 0 or n as any other value.

If you want to perform a fair roll of a 6-sided die, you would start
with int(random.random() * 6).  This gives a random number in the
range [0,5], with each value of the same probability.  How to get our
die roll that goes from 1 to 6?  Add 1.  Thus:

  die_roll = lambda : int(random.random() * 6) + 1

Or for a n-sided die:

  die_roll = lambda n : int(random.random() * n) + 1

This is just guessing on my part, but otherwise, I don't know why you
would care if random.random() returned values in the range [0,1) or
[0,1].
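To see the endpoint skew concretely, here is a quick Python 3
simulation (a sketch; the comparison margin below is just a loose
statistical bound, not an exact figure):

```python
import random
from collections import Counter

random.seed(42)          # deterministic for the demo
N = 100_000

# truncating a [0, 1) random scaled by 6: uniform over 0..5, never 6
trunc = Counter(int(random.random() * 6) for _ in range(N))

# rounding instead: the endpoint bins 0 and 6 are only half-width,
# so each is hit roughly half as often as the interior values
rounded = Counter(round(random.random() * 6) for _ in range(N))
```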

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: networking simulator on python

2009-06-08 Thread Paul McGuire
On Jun 8, 7:18 pm, Ala shaib...@ymail.com wrote:
 Hello everyone.

 I plan on starting to write a network simulator on python for testing a
 modified version of TCP.

 I am wondering if a python network simulator exists? Also, if anyone
 tried using simpy for doing a simulation.

 Thank you

There was an article on just this topic in the April issue of Python
Magazine.

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: I need help building a data structure for a state diagram

2009-05-25 Thread Paul McGuire
On May 24, 1:16 pm, Matthew Wilson m...@tplus1.com wrote:
 I'm working on a really simple workflow for my bug tracker.  I want
 filed bugs to start in an UNSTARTED status.  From there, they can go to
 STARTED.


I just wrote an article for the April issue of Python Magazine on how
to add embedded DSL code to your Python scripts using Python's imputil
module, and I used a state pattern for my example.  Two state machine
examples I used to illustrate the work were a traffic light and a
library book checkin/checkout.  The traffic light state machine is
just a simple cycle through the 3 light states.  But the library book
state machine is more complex (your bug tracking example made me think
of it), with different transitions allowed one state into multiple
different states.  Here is how the code looks for these examples:

==
# trafficLight.pystate
statemachine TrafficLight:
Red -> Green
Green -> Yellow
Yellow -> Red

Red.carsCanGo= False
Yellow.carsCanGo = True
Green.carsCanGo  = True

# ... other class definitions for state-specific behavior ...

==
# trafficLightDemo.py

# add support for .pystate files, with
# embedded state machine DSL
import stateMachine

import trafficLight

tlight = trafficLight.Red()
while 1:
print tlight, "GO" if tlight.carsCanGo else "STOP"
tlight.delay()
tlight = tlight.next_state()


==
# libraryBook.pystate

statemachine BookCheckout:
New   -(create)->   Available
Available -(reserve)->  Reserved
Available -(checkout)-> CheckedOut
Reserved  -(cancel)->   Available
Reserved  -(checkout)-> CheckedOut
CheckedOut -(checkin)->  Available
CheckedOut -(renew)->   CheckedOut


You don't need to adopt this whole DSL implementation, but the article
might give you some other ideas.
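If the full DSL is more than you need, the same workflow can be
captured with a plain transition table. A minimal sketch (the
'finish' and 'abandon' events are invented for illustration; only
UNSTARTED and STARTED come from the original post):

```python
# transition table: state -> {event: next state}
WORKFLOW = {
    'UNSTARTED': {'start': 'STARTED'},
    'STARTED':   {'finish': 'FINISHED', 'abandon': 'UNSTARTED'},
    'FINISHED':  {},
}

def fire(state, event):
    # look up the next state, rejecting transitions the table omits
    try:
        return WORKFLOW[state][event]
    except KeyError:
        raise ValueError('cannot %r from %s' % (event, state))
```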

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: slice iterator?

2009-05-08 Thread Paul McGuire
On May 8, 11:14 pm, Ned Deily n...@acm.org wrote:
 In article 7xprejoswg@ruckus.brouhaha.com,
  Paul Rubin http://phr...@nospam.invalid wrote:





  Ross ross.j...@gmail.com writes:
   I have a really long list that I would like segmented into smaller
   lists. Let's say I had a list a = [1,2,3,4,5,6,7,8,9,10,11,12] and I
   wanted to split it into groups of 2 or groups of 3 or 4, etc. Is there
   a way to do this without explicitly defining new lists?

  That question comes up so often it should probably be a standard
  library function.

  Anyway, here is an iterator, if that's what you want:
      from itertools import islice
      a = range(12)
      xs = iter(lambda x=iter(a): list(islice(x,3)), [])
      print list(xs)
     [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
  Of course, as the saying goes, there's more than one way to do it ;-)

 python2.6 itertools introduces the izip_longest function and the grouper
 recipe http://docs.python.org/library/itertools.html:

 def grouper(n, iterable, fillvalue=None):
     "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
     args = [iter(iterable)] * n
     return izip_longest(fillvalue=fillvalue, *args)

 --
  Ned Deily,
  n...@acm.org- Hide quoted text -

 - Show quoted text -

Here's a version that works pre-2.6:

>>> grouper = lambda iterable,size,fill=None : \
...     zip(*[(iterable+[fill,]*(size-1))[i::size] for i in range(size)])
>>> a = range(12)
>>> grouper(a,6)
[(0, 1, 2, 3, 4, 5), (6, 7, 8, 9, 10, 11)]
>>> grouper(a,5)
[(0, 1, 2, 3, 4), (5, 6, 7, 8, 9), (10, 11, None, None, None)]
>>> grouper(a,3)
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11)]
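For the record, in current Python 3 the same padded grouping is spelled
with itertools.zip_longest; a sketch (note the argument order here
differs from the 2.6 recipe quoted above):

```python
from itertools import zip_longest

def grouper(iterable, size, fill=None):
    # one shared iterator repeated `size` times; zip_longest pads the tail
    args = [iter(iterable)] * size
    return list(zip_longest(*args, fillvalue=fill))
```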

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list


Re: string processing question

2009-04-30 Thread Paul McGuire
On Apr 30, 11:55 am, Kurt Mueller m...@problemlos.ch wrote:
 Hi,

 on a Linux system and python 2.5.1 I have the
 following behaviour which I do not understand:

 case 1: python -c 'a="ä"; print a ; print a.center(6,"-") ; b=unicode(a,
 "utf8"); print b.center(6,"-")'

 ä
 --ä--
 --ä---



Weird.  What happens if you change the second print statement to:

print b.center(6,u-)

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list


Re: if statement, with function inside it: if (t = Test()) == True:

2009-04-24 Thread Paul McGuire
On Apr 24, 5:00 am, GC-Martijn gcmart...@gmail.com wrote:
 Hello,

 I'm trying to do a if statement with a function inside it.
 I want to use that variable inside that if loop , without defining it.

 def Test():
     return 'Vla'

 I searching something like this:

 if (t = Test()) == 'Vla':
     print t # Vla


Here is a thread from 3 weeks ago on this very topic, with a couple of
proposed solutions.

http://groups.google.com/group/comp.lang.python/browse_frm/thread/9f8e79fa28d69905/e934c73ee3c2dbc2?hl=enq=
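(In modern Python this is now built in: the := assignment expression,
added in Python 3.8, does exactly what the poster asked for. A sketch,
with test() standing in for the poster's Test():)

```python
def test():
    # hypothetical stand-in for the poster's Test()
    return 'Vla'

# bind and compare in one expression
if (t := test()) == 'Vla':
    result = t
else:
    result = None
```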

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list


ANN: pyparsing 1.5.2 released!

2009-04-20 Thread Paul McGuire
Well, it has been about 6 months since the release of pyparsing 1.5.1,
and there have been no new functional enhancements to pyparsing.  I
take
this as a further sign that pyparsing is reaching a development/
maturity
plateau.

With the help of the pyparsing community, there are some
compatibility
upgrades, and few bug fixes.  The major news is compatibility with
Python 3 and IronPython 2.0.1.  Here is the high-level summary of
what's
new in pyparsing 1.5.2:

- Removed __slots__ declaration on ParseBaseException, for
  compatibility with IronPython 2.0.1.  Raised by David
  Lawler on the pyparsing wiki, thanks David!

- Added pyparsing_py3.py module, so that Python 3 users can use
  pyparsing by changing their pyparsing import statement to:

  import pyparsing_py3

  Thanks for help from Patrick Laban and his friend Geremy
  Condra on the pyparsing wiki.

- Fixed bug in SkipTo/failOn handling - caught by eagle eye
  cpennington on the pyparsing wiki!

- Fixed second bug in SkipTo when using the ignore constructor
  argument, reported by Catherine Devlin, thanks!

- Fixed obscure bug reported by Eike Welk when using a class
  as a ParseAction with an errant __getitem__ method.

- Simplified exception stack traces when reporting parse
  exceptions back to caller of parseString or parseFile - thanks
  to a tip from Peter Otten on comp.lang.python.

- Changed behavior of scanString to avoid infinitely looping on
  expressions that match zero-length strings.  Prompted by a
  question posted by ellisonbg on the wiki.

- Enhanced classes that take a list of expressions (And, Or,
  MatchFirst, and Each) to accept generator expressions also.
  This can be useful when generating lists of alternative
  expressions, as in this case, where the user wanted to match
  any repetitions of '+', '*', '#', or '.', but not mixtures
  of them (that is, match '+++', but not '+-+'):

  codes = "+*#."
  format = MatchFirst(Word(c) for c in codes)

  Based on a problem posed by Denis Spir on the Python tutor
  list.

- Added new example eval_arith.py, which extends the example
  simpleArith.py to actually evaluate the parsed expressions.


Download pyparsing 1.5.2 at http://sourceforge.net/projects/pyparsing/.
The pyparsing Wiki is at http://pyparsing.wikispaces.com

-- Paul


Pyparsing is a pure-Python class library for quickly developing
recursive-descent parsers.  Parser grammars are assembled directly in
the calling Python code, using classes such as Literal, Word,
OneOrMore, Optional, etc., combined with operators '+', '|', and '^'
for And, MatchFirst, and Or.  No separate code-generation or external
files are required.  Pyparsing can be used in many cases in place of
regular expressions, with shorter learning curve and greater
readability and maintainability.  Pyparsing comes with a number of
parsing examples, including:
- Hello, World! (English, Korean, Greek, and Spanish(new))
- chemical formulas
- configuration file parser
- web page URL extractor
- 5-function arithmetic expression parser
- subset of CORBA IDL
- chess portable game notation
- simple SQL parser
- Mozilla calendar file parser
- EBNF parser/compiler
- Python value string parser (lists, dicts, tuples, with nesting)
  (safe alternative to eval)
- HTML tag stripper
- S-expression parser
- macro substitution preprocessor
--
http://mail.python.org/mailman/listinfo/python-announce-list

Support the Python Software Foundation:
http://www.python.org/psf/donations.html


Re: Help improve program for parsing simple rules

2009-04-17 Thread Paul McGuire
On Apr 16, 10:57 am, prueba...@latinmail.com wrote:
 Another interesting task for those that are looking for some
 interesting problem:
 I inherited some rule system that checks for programmers program
 outputs that to be ported: given some simple rules and the values it
 has to determine if the program is still working correctly and give
 the details of what the values are. If you have a better idea of how
 to do this kind of parsing please chime in. I am using tokenize but
 that might be more complex than it needs to be. This is what I have
 come up so far:

I've been meaning to expand on pyparsing's simpleArith.py example for
a while, to include the evaluation of the parsed tokens.  Here is the
online version, http://pyparsing.wikispaces.com/file/view/eval_arith.py,
it will be included in version 1.5.2 (coming shortly).  I took the
liberty of including your rule set as a list of embedded test cases.

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list


Re: Help improve program for parsing simple rules

2009-04-17 Thread Paul McGuire
On Apr 17, 10:43 am, John Machin sjmac...@lexicon.net wrote:

 I don't see how it can handle the chained relop in the last two
 testcases e. g. '0.00 LE A LE 4.00' -- unless relops are chained by
 default in your parser.


John -

First of all, to respect precedence of operations, higher level
precedences are parsed and grouped first.  If you left off the parse
actions and just printed out the parse tree created by the example
(using asList()), for A + B * C you would get ['A', '+', [ 'B', '*',
'C' ]].  If you expand that test case to A + B * C + D, you would
get ['A', '+', [ 'B', '*', 'C' ], '+', 'D' ].  This is counter to the
conventional infix parser that would create [['A', '+', [ 'B', '*',
'C' ]], '+', 'D' ], in which binary operators typicaly return
'operand' 'operator' 'operand' triples, and either operand might be a
nested parse tree.

As it happens, when using pyparsing's operatorPrecedence helper, *all*
binary operators at the same precedence level are actually parsed in a
single chain.

This is why you see this logic in EvalAddOp.eval:

def eval(self):
    sum = self.value[0].eval()
    for op, val in operatorOperands(self.value[1:]):
        if op == '+':
            sum += val.eval()
        if op == '-':
            sum -= val.eval()
    return sum

operatorOperands is a little generator that returns operator-operand
pairs, beginning at the second (that is, the 1th) token in the
list.  You can't just do the simple evaluation of operand1 operator
operand2, you have to build up the sum by first evaluating operand1,
and then iterating over the operator-operand pairs in the rest of the
list.  Same thing for the muliplication operators.

For the comparison operators, things are a little more involved.
operand1 operator1 operand2 operator2 operand3 (as in 0.00 LE A LE
4.00) has to evaluate as

op1 operator1 op2 AND op2 operator2 op3

So EvalComparisonOp's eval method looks like:

def eval(self):
    val1 = self.value[0].eval()
    ret = True
    for op, val in operatorOperands(self.value[1:]):
        fn = EvalComparisonOp.opMap[op]
        val2 = val.eval()
        ret = ret and fn(val1, val2)
        val1 = val2
    return ret

The first term is evaluated and stored in val1.  Then each
comparisonop-operand pair is extracted, the operand is eval()'ed and
stored in val2, and the comparison method that is mapped to the
comparisonop is called using val1 and val2.  Then, to move on to do
the next comparison, val2 is stored into val1, and the we iterate to
the next comparison-operand pair.  In fact, not only does this handle
"0.00 LE A LE 4.00", but it could also evaluate "0.00 LE A LE 4.00 LE
E < D".  (I see that I should actually do some short-circuiting here -
if ret is false after calling fn(val1,val2), I should just break out
at that point.  I'll have that fixed in the online version shortly.)
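Stripped of the pyparsing machinery, the chained-comparison evaluation
with the short-circuit added looks like this (a plain-Python sketch,
not the actual EvalComparisonOp code; the OPS table and operator names
are illustrative):

```python
import operator

OPS = {"LT": operator.lt, "LE": operator.le,
       "GT": operator.gt, "GE": operator.ge, "EQ": operator.eq}

def eval_chained(first, pairs):
    # evaluate val0 op1 val1 op2 val2 ..., bailing out on the first
    # comparison that fails (the short-circuit mentioned above)
    val1 = first
    for op, val2 in pairs:
        if not OPS[op](val1, val2):
            return False
        val1 = val2          # right operand becomes next left operand
    return True
```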

-- Paul

--
http://mail.python.org/mailman/listinfo/python-list


Re: Help improve program for parsing simple rules

2009-04-17 Thread Paul McGuire
On Apr 17, 1:26 pm, Aaron Brady castiro...@gmail.com wrote:
 Hi, not to offend; I don't know your background.  

Courtesy on Usenet!!!  I'm going to go buy a lottery ticket!

Not to worry, I'm a big boy.  People have even called my baby ugly,
and I manage to keep my blood pressure under control.

 One thing I like
 about Python is it and the docs are careful about short-circuiting
 conditions.  ISTR that C left some of those details up to the compiler
 at one point.

  def f():

 ...     print( 'in f' )
 ...     return 10
 ... 0f()20

 in f
 True 0f() and f()20

 in f
 in f
 True

 Therefore, if op{n} has side effects, 'op1 operator1 op2 AND op2
 operator2 op3' is not equivalent to 'op1 optor1 op2 optor2 op3'.

Interesting point, but I don't remember that A < B < C is valid C
syntax, are you perhaps thinking of a different language?

By luck, my implementation of EvalComparisonOp.eval does in fact
capture the post-eval value of op2, so that if its evaluation caused
any side effects, they would not be repeated.

-- Paul

--
http://mail.python.org/mailman/listinfo/python-list


Re: question about xrange performance

2009-04-17 Thread Paul McGuire
On Apr 17, 1:39 pm, _wolf wolfgang.l...@gmail.com wrote:

 can it be that a simple diy-class outperforms a python built-in by a
 factor of 180? is there something i have done the wrong way?
 omissions, oversights? do other people get similar figures?

 cheers

I wouldn't say you are outperforming xrange until your class also
supports:

for i in xxrange( 1, 2 ):
# do something with i

Wouldn't be difficult, but you're not there yet.

And along the lines with MRAB's comments, xrange is not really
intended for "in" testing; it is there for iteration over a range
without constructing the list of range elements first, which one
notices right away when looping over xrange(1e8) vs. range(1e8).

Your observation is especially useful to keep in mind as Python 3 now
imbues range with xrange behavior, so if you have code that tests
"blah in range(blee,bloo)", you will get similar poor results.

And of course, you are cheating a bit with your xxrange "in" test,
since you aren't really verifying that the number is actually in the
given list, you are just testing against the extrema, and relying on
your in-built knowledge that xrange (as you are using it) contains all
the intermediate values.  Compare to testing with xrange(1,100,2) and
you'll find that 10 is *not* in this range, even though 1 <= 10 <
100.  (Extending xxrange to do this as well is also possible.)

One might wonder why you are even writing code to test for existence
in a range list, when blee = blah  bloo is obviously going to
outperform this kind of code.
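A membership test that actually honors the step only needs a
__contains__ with a modulus check; a sketch of such a class (XXRange
here is hypothetical, not the original poster's code):

```python
class XXRange:
    """Range-like sketch: O(1) membership that honors start/stop/step."""
    def __init__(self, start, stop, step=1):
        self.start, self.stop, self.step = start, stop, step

    def __contains__(self, x):
        # inside the bounds AND on the step grid
        return (self.start <= x < self.stop
                and (x - self.start) % self.step == 0)

    def __iter__(self):
        i = self.start
        while i < self.stop:
            yield i
            i += self.step

r = XXRange(1, 100, 2)   # the odd numbers 1..99
```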

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list


Re: Help improve program for parsing simple rules

2009-04-17 Thread Paul McGuire
On Apr 17, 2:40 pm, prueba...@latinmail.com wrote:
 On Apr 17, 11:26 am, Paul McGuire pt...@austin.rr.com wrote:





  On Apr 16, 10:57 am, prueba...@latinmail.com wrote:

   Another interesting task for those that are looking for some
   interesting problem:
   I inherited some rule system that checks for programmers program
   outputs that to be ported: given some simple rules and the values it
   has to determine if the program is still working correctly and give
   the details of what the values are. If you have a better idea of how
   to do this kind of parsing please chime in. I am using tokenize but
   that might be more complex than it needs to be. This is what I have
   come up so far:

  I've been meaning to expand on pyparsing's simpleArith.py example for
  a while, to include the evaluation of the parsed tokens.  Here is the
  online version,http://pyparsing.wikispaces.com/file/view/eval_arith.py,
  it will be included in version 1.5.2 (coming shortly).  I took the
  liberty of including your rule set as a list of embedded test cases.

  -- Paul

 That is fine with me. I don't know how feasible it is for me to use
 pyparsing for this project considering I don't have admin access on
 the box that is eventually going to run this. To add insult to injury
 Python is in the version 2-3 transition (I really would like to push
 the admins to install 3.1 by the end of the year before the amount of
 code written by us gets any bigger) meaning that any third party
 library is an additional burden on the future upgrade. I can't
 remember if pyparsing is pure Python. If it is I might be able to
 include it alongside my code if it is not too big.- Hide quoted text -

 - Show quoted text -

It *is* pure Python, and consists of a single source file for the very
purpose of ease-of-inclusion.  A number of projects include their own
versions of pyparsing for version compatibility management, matplotlib
is one that comes to mind.

The upcoming version 1.5.2 download includes a pyparsing_py3.py file
for Python 3 compatibility, I should have that ready for users to
download *VERY SOON NOW*!

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list


Re: regex alternation problem

2009-04-17 Thread Paul McGuire
On Apr 17, 4:49 pm, Jesse Aldridge jessealdri...@gmail.com wrote:
 import re

 s1 = "I am an american"

 s2 = "I am american an "

 for s in [s1, s2]:
     print re.findall(" (am|an) ", s)

 # Results:
 # ['am']
 # ['am', 'an']

 ---

 I want the results to be the same for each string.  What am I doing
 wrong?

Does it help if you expand your RE to its full expression, with '_'s
where the blanks go:

_am_ or _an_

Now look for these in "I_am_an_american".  After the first "_am_" is
processed, findall picks up at the leading 'a' of 'an', and there is
no leading blank, so no match.  If you search through
"I_am_american_an_", both "am" and "an" have surrounding spaces, so
both match.

Instead of using explicit spaces, try using '\b' meaning word break:

>>> import re
>>> re.findall(r"\b(am|an)\b", "I am an american")
['am', 'an']
>>> re.findall(r"\b(am|an)\b", "I am american an")
['am', 'an']

-- Paul




http://mail.python.org/mailman/listinfo/python-list


Re: regex alternation problem

2009-04-17 Thread Paul McGuire
On Apr 17, 5:28 pm, Paul McGuire pt...@austin.rr.com wrote:
 -- Paul

 Your find pattern includes (and consumes) a leading AND trailing space
 around each word.  In the first string I am an american, there is a
 leading and trailing space around am, but the trailing space for
 am is the leading space for an, so  an - Hide quoted text -

Oops, sorry, ignore debris after sig...
--
http://mail.python.org/mailman/listinfo/python-list


Re: Automatically generating arithmetic operations for a subclass

2009-04-14 Thread Paul McGuire
On Apr 14, 4:09 am, Steven D'Aprano
ste...@remove.this.cybersource.com.au wrote:
 I have a subclass of int where I want all the standard arithmetic
 operators to return my subclass, but with no other differences:

 class MyInt(int):
     def __add__(self, other):
         return self.__class__(super(MyInt, self).__add__(other))
     # and so on for __mul__, __sub__, etc.

 My quick-and-dirty count of the __magic__ methods that need to be over-
 ridden comes to about 30. That's a fair chunk of unexciting boilerplate.


Something like this maybe?

def takesOneArg(fn):
    try:
        fn(1)
    except TypeError:
        return False
    else:
        return True

class MyInt(int): pass

template = "MyInt.__%s__ = lambda self, other: self.__class__(super(MyInt, self).__%s__(other))"
fns = [fn for fn in dir(int) if fn.startswith('__') and takesOneArg(getattr(1, fn))]
print fns
for fn in fns:
    exec(template % (fn, fn))


Little harm in this usage of exec, since it is your own code that you
are running.
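If exec feels uncomfortable anyway, the same effect can be had with
setattr and a closure over an explicit list of operator names; a
Python 3 sketch (the _ARITH list is illustrative, not exhaustive, and
MyInt(result) would truncate if an operation ever returned a float):

```python
_ARITH = ('add radd sub rsub mul rmul floordiv rfloordiv '
          'mod rmod pow rpow lshift rshift and or xor').split()

class MyInt(int):
    pass

def _wrap(name):
    int_method = getattr(int, name)
    def method(self, other):
        result = int_method(self, other)
        # leave NotImplemented alone so Python's fallback still works
        return result if result is NotImplemented else MyInt(result)
    return method

# install a wrapped version of each arithmetic dunder on the subclass
for _name in ['__%s__' % n for n in _ARITH]:
    setattr(MyInt, _name, _wrap(_name))
```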

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list


Re: safe eval of moderately simple math expressions

2009-04-11 Thread Paul McGuire
On Apr 11, 2:41 am, Aaron Brady castiro...@gmail.com wrote:

 Why do I get the feeling that the authors of 'pyparsing' are out of
 breath?


What kind of breathlessness do you mean?  I'm still breathing, last
time I checked.

The-rumors-of-my-demise-have-been-greatly-exaggerated'ly yours,
-- Paul


--
http://mail.python.org/mailman/listinfo/python-list


Re: safe eval of moderately simple math expressions

2009-04-09 Thread Paul McGuire
On Apr 9, 10:56 am, Joel Hedlund joel.hedl...@gmail.com wrote:
 Hi all!

 I'm writing a program that presents a lot of numbers to the user, and I
 want to let the user apply moderately simple arithmentics to these
 numbers.

Joel -

Take a look at the examples page on the pyparsing wiki (http://
pyparsing.wikispaces.com/Examples).  Look at the examples fourFn.py
and simpleArith.py for some expression parsers that you could extend
to support whatever math builtins you wish.  Since you would be doing
your own parsing and eval code, you could be sure that no dangerous
code was being run, just simple arithmetic.

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list


Re: Best way to extract from regex in if statement

2009-04-04 Thread Paul McGuire
On Apr 3, 9:26 pm, Paul Rubin http://phr...@nospam.invalid wrote:
 bwgoudey bwgou...@gmail.com writes:
 elif re.match("^DATASET:\s*(.+)", line):
         m = re.match("^DATASET:\s*(.+)", line)
         print m.group(1)

 Sometimes I like to make a special class that saves the result:

   class Reg(object):   # illustrative code, not tested
      def match(self, pattern, line):
         self.result = re.match(pattern, line)
         return self.result

I took this a little further, *and* lightly tested it too.

Since this idiom makes repeated references to the input line, I added
that to the constructor of the matching class.

By using __call__, I made the created object callable, taking the RE
expression as its lone argument and returning a boolean indicating
match success or failure.  The result of the re.match call is saved in
self.matchresult.

By using __getattr__, the created object proxies for the results of
the re.match call.

I think the resulting code looks pretty close to the original C or
Perl idiom of cascading elif (c=re_expr_match(...)) blocks.

(I thought about cacheing previously seen REs, or adding support for
compiled REs instead of just strings - after all, this idiom usually
occurs in a loop while iterating of some large body of text.  It turns
out that the re module already caches previously compiled REs, so I
left my cacheing out in favor of that already being done in the std
lib.)

-- Paul

import re

class REmatcher(object):
    def __init__(self, sourceline):
        self.line = sourceline
    def __call__(self, regexp):
        self.matchresult = re.match(regexp, self.line)
        self.success = self.matchresult is not None
        return self.success
    def __getattr__(self, attr):
        return getattr(self.matchresult, attr)


This test:

test = """\
ABC
123
xyzzy
Holy Hand Grenade
Take the pebble from my hand, Grasshopper
"""

outfmt = "'%s' is %s [%s]"
for line in test.splitlines():
    matchexpr = REmatcher(line)
    if matchexpr(r"\d+$"):
        print outfmt % (line, "numeric", matchexpr.group())
    elif matchexpr(r"[a-z]+$"):
        print outfmt % (line, "lowercase", matchexpr.group())
    elif matchexpr(r"[A-Z]+$"):
        print outfmt % (line, "uppercase", matchexpr.group())
    elif matchexpr(r"([A-Z][a-z]*)(\s[A-Z][a-z]*)*$"):
        print outfmt % (line, "a proper word or phrase",
                        matchexpr.group())
    else:
        print outfmt % (line, "something completely different", "...")

Produces:
'ABC' is uppercase [ABC]
'123' is numeric [123]
'xyzzy' is lowercase [xyzzy]
'Holy Hand Grenade' is a proper word or phrase [Holy Hand Grenade]
'Take the pebble from my hand, Grasshopper' is something completely
different [...]
--
http://mail.python.org/mailman/listinfo/python-list


Re: python needs leaning stuff from other language

2009-04-04 Thread Paul McGuire
On Apr 3, 11:48 pm, Tim Wintle tim.win...@teamrubber.com wrote:
 del mylist[:]
 * or *
 mylist[:] = []
 * or *
 mylist = []

 which, although semantically similar, are different as far as the
 interpreter is concerned (since two of them create a new list):


Only the last item creates a new list of any consequence.  The first
two retain the original list and delete or discard the items in it.  A
temporary list gets created in the 2nd option, and is then used to
assign new contents to mylist's [:] slice - so yes, technically, a new
list *is* created in the case of this option. But mylist does not get
bound to it as in the 3rd case.  In case 2, mylist's binding is
unchanged, and the temporary list gets GC'ed almost immediately.
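(A small sketch of the binding difference described above - the first two
forms mutate the existing list, which any alias can observe, while the
third rebinds the name to a fresh list:)

```python
a = [1, 2, 3]
alias = a
del a[:]           # clears the existing list in place
assert alias == [] and alias is a

b = [1, 2, 3]
alias = b
b[:] = []          # slice assignment also mutates in place
assert alias == [] and alias is b

c = [1, 2, 3]
alias = c
c = []             # rebinds c; the alias still holds the old list
assert alias == [1, 2, 3] and alias is not c
```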

-- Paul


Re: List of paths

2009-04-01 Thread Paul McGuire
On Apr 1, 3:57 am, Nico Grubert nicogrub...@gmail.com wrote:
 Dear Python developers

 I have the following (sorted) list.
 ['/notebook',
   '/notebook/mac',
   '/notebook/mac/macbook',
   '/notebook/mac/macbookpro',
   '/notebook/pc',
   '/notebook/pc/lenovo',
   '/notebook/pc/hp',
   '/notebook/pc/sony',
   '/desktop',
   '/desktop/pc/dell',
   '/desktop/mac/imac',
   '/server/hp/proliant',
   '/server/hp/proliant/385',
   '/server/hp/proliant/585'
 ]

 I want to remove all paths x from the list if there is a path y in the
 list which is part of x, so that x.startswith(y) is true.

 The list I want to have is:
 ['/notebook', '/desktop', '/server/hp/proliant']

 Any idea how I can do this in Python?

 Thanks in advance
 Nico

paths = ['/notebook',
  '/notebook/mac',
  '/notebook/mac/macbook',
  '/notebook/mac/macbookpro',
  '/notebook/pc',
  '/notebook/pc/lenovo',
  '/notebook/pc/hp',
  '/notebook/pc/sony',
  '/desktop',
  '/desktop/pc/dell',
  '/desktop/mac/imac',
  '/server/hp/proliant',
  '/server/hp/proliant/385',
  '/server/hp/proliant/585'
]

seen = set()
basepaths = [seen.add(s) or s for s in paths
             if not any(s.startswith(ss) for ss in seen)]

gives:

['/notebook', '/desktop', '/server/hp/proliant']
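(The same filter as an explicit loop, for readability - a sketch, not a
drop-in replacement.  Checking against "b + '/'" also guards against
accidental prefix collisions such as '/note' matching '/notebook2', which
a bare startswith test would conflate:)

```python
paths = [
    '/notebook',
    '/notebook/mac',
    '/notebook/mac/macbook',
    '/notebook/mac/macbookpro',
    '/notebook/pc',
    '/notebook/pc/lenovo',
    '/notebook/pc/hp',
    '/notebook/pc/sony',
    '/desktop',
    '/desktop/pc/dell',
    '/desktop/mac/imac',
    '/server/hp/proliant',
    '/server/hp/proliant/385',
    '/server/hp/proliant/585',
]

basepaths = []
for p in paths:
    # keep p only if no already-kept path is an ancestor of it
    # (relies on ancestors appearing before descendants, as here)
    if not any(p == b or p.startswith(b + '/') for b in basepaths):
        basepaths.append(p)

print(basepaths)  # ['/notebook', '/desktop', '/server/hp/proliant']
```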

-- Paul

