Re: regexp weirdness (bug?)

2005-04-06 Thread André Malo
* Fredrik Lundh wrote:

 Sergey Schetinin wrote:
 
 it's line #159 here, but it did work! thanks. so it IS a bug?
 
 sure looks like one.  please report it here:
 
 http://sourceforge.net/tracker/?group_id=5470&atid=105470

done :)

nd
-- 
Gates's behavior had proven to me that I need not count on him and his
two companions -- Karl May, Winnetou III

Something new in the West: http://pub.perlig.de/books.html#apache2
--
http://mail.python.org/mailman/listinfo/python-list


Re: regexp weirdness (bug?)

2005-04-05 Thread André Malo
* Sergey Schetinin wrote:

 Here's the session log:
 
 >>> _re_pair="(?(plus).|-)"
 >>> _re1=("(?P<plus>\+)"+_re_pair)
 >>> _re2=("((?P<plus>\+))"+_re_pair)
 >>> _re3=("(?:(?P<plus>\+))"+_re_pair)
 >>> _re4="(%s)"%_re3
 >>> import re
 >>> print [re.match(_re, "+a") and 'match' for _re in [_re1, _re2, _re3, _re4]]
 ['match', None, 'match', None]
 
 this is not the expected behaviour. all these patterns should match,
 right?

No, I suppose they shouldn't compile.
_re_pair should be (?P=plus).
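
For reference, the two constructs under discussion do different things. The following is a minimal sketch in current Python syntax, assuming the group name "plus" from the session above; the sample strings are made up for illustration. (?P=plus) is a backreference, while (?(plus)yes|no), added in Python 2.4, picks a branch depending on whether the group took part in the match.

```python
import re

# Backreference: (?P=plus) must match the same text that group "plus" captured.
backref = re.compile(r"(?P<plus>\+)(?P=plus)")

# Conditional: if group "plus" took part in the match, require any
# character ("."); otherwise require a literal "-".
conditional = re.compile(r"(?P<plus>\+)?(?(plus).|-)")

backref_hits = [bool(backref.match(s)) for s in ("++", "+a")]
conditional_hits = [bool(conditional.match(s)) for s in ("+a", "-", "a")]

print(backref_hits)
print(conditional_hits)
```

The backreference only accepts a repeated "+", whereas the conditional accepts "+" followed by any character, or a lone "-".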

nd


Re: re module non-greedy matches broken

2005-04-05 Thread André Malo
* lothar wrote:

 a non-greedy match - as implicitly defined in the documentation - is a
 match in which there is no proper substring in the return which could also
 match the regex.

Your argument starts at the wrong place. The documentation doesn't
define the behaviour; it tries to describe it (incorrectly, as already said).

 you are skirting the issue as to why a matcher should not be able to
 return a non-greedy match.
 
 there is no theoretical reason why it can not be done.

I'm sure you can explain the theory behind regexps in order to substantiate
your statement.

*shrug*

nd


Re: re module non-greedy matches broken

2005-04-05 Thread André Malo
* lothar wrote:

As already said by Georg, regexes are the wrong tool for such tasks, but
anyway...

 give an re to find every innermost table element:

<table(?:\s[^>]*)?>[^<]*(?:(?!</table|<table(?:\s[^>]*)?>)<[^<]*)*</table>

 give an re to find every <pre> element directly followed by an <a>
 element:

<pre(?:\s[^>]*)?>[^<]*(?:(?!</pre|<pre(?:\s[^>]*)?>)<[^<]*)*</pre>(?=<a[\s>])

They are written more generally than your samples require. Depending on the
data to be expected, they can be written a bit shorter, but this is left as
an exercise for the reader.
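
A minimal sketch of how the two patterns behave, assuming the archive stripped the angle brackets and that they belong where shown here. The sample HTML strings and the names INNERMOST_TABLE and PRE_BEFORE_A are made up for illustration.

```python
import re

# Assumed reconstruction: a <table> element that contains no nested <table>.
INNERMOST_TABLE = re.compile(
    r"<table(?:\s[^>]*)?>[^<]*"
    r"(?:(?!</table|<table(?:\s[^>]*)?>)<[^<]*)*"
    r"</table>"
)

# Assumed reconstruction: a <pre> element immediately followed by an <a> tag.
PRE_BEFORE_A = re.compile(
    r"<pre(?:\s[^>]*)?>[^<]*"
    r"(?:(?!</pre|<pre(?:\s[^>]*)?>)<[^<]*)*"
    r"</pre>(?=<a[\s>])"
)

nested = '<table id="outer"><tr><td><table>inner</table></td></tr></table>'
innermost = INNERMOST_TABLE.findall(nested)

mixed = '<pre>one</pre><a href="#">x</a><pre>two</pre><p>y</p>'
pre_hits = PRE_BEFORE_A.findall(mixed)

print(innermost)
print(pre_hits)
```

The negative lookahead stops the inner loop at any opening or closing tag of the same element, so the match can never span a nested element.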

HTH, nd


Re: regexp weirdness (bug?)

2005-04-05 Thread André Malo
* Fredrik Lundh wrote:

 André Malo wrote:
 
 No, I suppose they shouldn't compile.
 _re_pair should be (?P=plus).
 
 the (?(NAME)RE|RE) form was added in 2.4.
 
 looks like a bug to me; the "plus" group is set to "+" in all four cases,
 so the final pattern should match.  but I might be missing something...

Uh, yeah, I'm not up to date and still using 2.3 ;)
However, I've investigated a bit in the sre code and found this in
sre_compile.py:

   150          elif op is GROUPREF_EXISTS:
   151              emit(OPCODES[op])
   152              emit((av[0]-1)*2)
   153              skipyes = _len(code); emit(0)
   154              _compile(code, av[1], flags)

Sergey, can you please change line 152 in your sre_compile.py to

emit(av[0]-1)

and see if it works then? (The matcher also multiplies by 2, so this is
most likely the bug.) But that's just theory, since I don't have
Python 2.4 installed to test it :)
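
The suspected double scaling can be illustrated with a toy model. This is only a sketch. It assumes that the matcher doubles the emitted argument to index a flat list of group marks (two entries per group), that the session's four patterns are the ones listed here, and the helper names are made up. Under those assumptions, group 1 lands on index 0 either way, so only patterns where "plus" is group 2 would break.

```python
import re

def marks_index_buggy(group):
    """Index reached when compiler and matcher both double (group - 1)."""
    emitted = (group - 1) * 2   # what line 152 emits in the quoted source
    return emitted * 2          # the matcher doubles again (per the post)

def marks_index_fixed(group):
    """Index reached when only the matcher doubles."""
    emitted = group - 1         # the proposed change to line 152
    return emitted * 2

# The four patterns from the session, as assumed above.
pair = r"(?(plus).|-)"
patterns = [
    r"(?P<plus>\+)" + pair,
    r"((?P<plus>\+))" + pair,
    r"(?:(?P<plus>\+))" + pair,
    r"((?:(?P<plus>\+))" + pair + r")",
]

# Group number that the name "plus" refers to in each pattern.
plus_groups = [re.compile(p).groupindex["plus"] for p in patterns]

# On an interpreter without the bug, every pattern should match "+a".
matches = [re.match(p, "+a") is not None for p in patterns]

for p, g in zip(patterns, plus_groups):
    print(g, marks_index_buggy(g), marks_index_fixed(g), p)
```

In this model the second and fourth patterns, where "plus" is group 2, are the ones whose index changes, which lines up with the two None results in the reported session.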

nd


Re: re module non-greedy matches broken

2005-04-03 Thread André Malo
* lothar wrote:

 re:
 4.2.1 Regular Expression Syntax
 http://docs.python.org/lib/re-syntax.html
 
   *?, +?, ??
   Adding "?" after the qualifier makes it perform the match in non-greedy
   or minimal fashion; as few characters as possible will be matched.
 
 the regular expression module fails to perform non-greedy matches as
 described in the documentation: more than as few characters as possible
 are matched.
 
 this is a bug and it needs to be fixed.

The documentation is just incomplete. Non-greedy regexps still start
matching at the leftmost possible position. So instead of the longest match
starting there, you get the shortest match starting there. One may consider
this a documentation bug, yes.
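
A small sketch of that behaviour, with a made-up sample string: the non-greedy match is the shortest one beginning at the leftmost possible start, not the shortest matching substring overall.

```python
import re

text = "a_a_c_c"

# Greedy: longest match from the leftmost starting position.
greedy = re.search(r"a.*c", text).group()

# Non-greedy: shortest match from the SAME leftmost starting position.
lazy = re.search(r"a.*?c", text).group()

print(greedy)
print(lazy)
```

The substring "a_c" would also satisfy the pattern, but it starts later in the string, so the engine never gets to it.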

nd
-- 
# André Malo, http://www.perlig.de/ #


Re: testing -- what to do for testing code with behaviour dependant upon which files exist?

2005-04-02 Thread André Malo
* Brian van den Broek wrote:

 The relevant part of the validation method code looks like:
 
     # self.universe_files is a list of file paths
     non_existent_files = [x for x in self.universe_files
                           if not os.path.isfile(x)]
     if non_existent_files:
         raise Files_dont_existError(non_existent_files)
 
 I can test the custom error class just fine, but I don't see how to
 test the validation method itself.

The logic is simple -- you don't want to test os.path.isfile, so mock it.
Just encapsulate the os.path.isfile call in its own method, which your test
can then override.
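
One way this could look, as a sketch in current Python syntax. The class names Universe and FakeUniverse, the method name _isfile, and the file names are hypothetical; FilesDontExistError stands in for the Files_dont_existError class from the question.

```python
import os.path

class FilesDontExistError(Exception):
    """Stand-in for the custom error class from the question."""

class Universe:
    """Hypothetical owner of the validation method."""

    def __init__(self, universe_files):
        self.universe_files = universe_files

    def _isfile(self, path):
        # The only place that touches the file system.
        return os.path.isfile(path)

    def validate(self):
        missing = [p for p in self.universe_files if not self._isfile(p)]
        if missing:
            raise FilesDontExistError(missing)

class FakeUniverse(Universe):
    """Test double: pretends that only the listed files exist."""
    existing = {"a.txt"}

    def _isfile(self, path):
        return path in self.existing

def missing_files(paths):
    """Run validation against the fake and report what it complained about."""
    try:
        FakeUniverse(paths).validate()
    except FilesDontExistError as exc:
        return exc.args[0]
    return []

print(missing_files(["a.txt", "b.txt"]))
```

The test then exercises only the list-building and raising logic, without depending on which files happen to exist on disk.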

nd
-- 
# André Malo, http://pub.perlig.de/ #