[issue38119] resource tracker destroys shared memory segments when other processes should still have valid access

2021-04-04 Thread Steve Newcomb


Steve Newcomb  added the comment:

Sometimes a leak is exactly what's wanted, i.e. a standing block of shared 
memory that allows sharing processes to come and go ad libitum.  I mention this 
because I haven't seen anyone mention it explicitly.  

While turicas's monkeypatch covers the use case in which a manually-named block 
of shared memory is intended to remain indefinitely, it would be best if future 
versions of shared_memory allowed for such a use case, too.  It shouldn't be 
necessary to monkeypatch in order to have a standing block of shared memory 
remain ready and waiting even when nobody's using it.

--
nosy: +steve.newcomb

___
Python tracker 
<https://bugs.python.org/issue38119>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43704] ShareableList() raises TypeError when passing "name" keyword

2021-04-04 Thread Steve Newcomb


Steve Newcomb  added the comment:

The problem was the documentation, which I think needs some improvement.  I'll 
suggest some improvements when I understand things a little better.

For the record, it turned out that SharedMemoryManager was irrelevant, as were 
sockets.  That makes sense since memory can't be shared across a network, but 
the doc nevertheless implies that the socket interface is available.  I don't 
see why.  ShareableList is a class in shared_memory.py and is a function name, 
despite its capitalization, in managers.py, with a different signature.  That's 
massively confusing in combination with the foregoing.  In retrospect, I should 
have started by paying most of my attention to the documentation's numpy 
example, even though numpy is irrelevant to my problem and the example was more 
work to sort through than the other, simpler examples.
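
The difference between the two ShareableList signatures can be checked directly 
with inspect.  This is only a sketch: the helper name parameter_names is made up 
for illustration, and it assumes a platform where multiprocessing.shared_memory 
is available.

```python
import inspect
from multiprocessing import shared_memory
from multiprocessing.managers import SharedMemoryManager


def parameter_names(func):
    """Return the parameter names of a callable, leaving out 'self'.

    Hypothetical helper for this sketch; not part of the standard library.
    """
    return [p for p in inspect.signature(func).parameters if p != "self"]


# The class in shared_memory.py takes an optional keyword-only "name".
class_params = parameter_names(shared_memory.ShareableList.__init__)

# The method of the same name on SharedMemoryManager has its own signature.
manager_params = parameter_names(SharedMemoryManager.ShareableList)

print("shared_memory.ShareableList:", class_params)
print("SharedMemoryManager.ShareableList:", manager_params)
```

If the two printed lists differ in whether they contain "name", that would 
account for the TypeError reported in this issue.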

With all that resolved in my mind, I immediately ran aground on 
https://bugs.python.org/issue38119.  In that discussion, Guido notes that 
[automatic] garbage collection is hard, and I would add that automatic garbage 
collection is especially hard to deal with when it's not wanted.  I'm attaching 
the script I wrote in order to satisfy myself that turicas's monkeypatch (see 
issue38119) allows me to create a standing block of shared memory and to unlink 
it only when I want to.
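
For readers who want the basic shape of a named block that is unlinked only on 
request, here is a minimal sketch.  It is not the attached script and not 
turicas's monkeypatch.  It runs in a single process, so it does not exercise 
the cross-process resource tracker behaviour discussed in issue38119.  The 
function names and the segment name are made up for illustration.

```python
import os
from multiprocessing import shared_memory


def create_block(name, payload):
    """Create a named segment, copy payload into it, and return the handle."""
    shm = shared_memory.SharedMemory(name=name, create=True, size=len(payload))
    shm.buf[:len(payload)] = payload
    return shm


def read_block(name, length):
    """Attach to an existing segment by name and return a copy of its bytes."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        return bytes(shm.buf[:length])
    finally:
        shm.close()  # detach this handle only; the segment is not unlinked


def destroy_block(shm):
    """Release the handle and explicitly remove the segment."""
    shm.close()
    shm.unlink()


if __name__ == "__main__":
    # Hypothetical segment name, made unique per process for the demo.
    demo_name = "standing_demo_%d" % os.getpid()
    owner = create_block(demo_name, b"hello")
    print(read_block(demo_name, 5))
    destroy_block(owner)
```

close() and unlink() are separate steps in this API, which is what makes an 
explicit "unlink only when I want to" policy expressible at all.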

--
resolution:  -> not a bug
Added file: https://bugs.python.org/file49934/5.py

___
Python tracker 
<https://bugs.python.org/issue43704>
___



[issue43704] ShareableList() raises TypeError when passing "name" keyword

2021-04-02 Thread Steve Newcomb


Steve Newcomb  added the comment:

And again with 3.8.8, with the same result.

I also tried just using the same shared memory manager again within the same 
process, just as shown in the documentation.  Same result:

TypeError: ShareableList() got an unexpected keyword argument 'name'

I must be missing something too obvious for me to see it?  Or has ShareableList 
not worked at least since 3.8.8?

--

___
Python tracker 
<https://bugs.python.org/issue43704>
___



[issue43704] ShareableList() raises TypeError when passing "name" keyword

2021-04-02 Thread Steve Newcomb


Steve Newcomb  added the comment:

I just tried the same thing on Python-3.10.0a6.  Same behavior.

--

___
Python tracker 
<https://bugs.python.org/issue43704>
___



[issue43704] ShareableList() raises TypeError when passing "name" keyword

2021-04-02 Thread Steve Newcomb


New submission from Steve Newcomb :

This is especially weird because the Python source code says:

class ShareableList:
[...]
   def __init__(self, sequence=None, *, name=None):

--
components: Library (Lib)
files: shareableListBug.txt
messages: 390080
nosy: steve.newcomb
priority: normal
severity: normal
status: open
title: ShareableList() raises TypeError when passing "name" keyword
type: behavior
versions: Python 3.9
Added file: https://bugs.python.org/file49928/shareableListBug.txt

___
Python tracker 
<https://bugs.python.org/issue43704>
___



[issue35496] left-to-right violation in match order

2018-12-16 Thread Steve Newcomb


Steve Newcomb  added the comment:

I'm very grateful for your time and attention, and sorry to have distracted 
you.  You're correct when you say:  

Steven D'Aprano: ...the rightmost alternative matches from position 1 of the 
text, while the leftmost alternative doesn't match until position 8. So 
starting from position 0, the IPV6 check matches first, and so wins.
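
The rule quoted here can be seen with a much smaller pattern than the one in 
the attached script.  This is a toy sketch with made-up input strings, not the 
IPv6 pattern itself.

```python
import re

# Same start position: the left alternative is tried first and wins.
same_start = re.search(r"a|ab", "ab")

# Different start positions: the text is scanned left to right, so the
# alternative that can match at the earliest position wins, whichever
# side of "|" it is written on.
earlier_start = re.search(r"b|ab", "xab")

print(same_start.group(0), same_start.span())
print(earlier_start.group(0), earlier_start.span())
```

In the second search the right-hand alternative matches at position 1, before 
the left-hand one could match at position 2, so left-to-right ordering among 
alternatives never comes into play.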

I see now that what I was trying to do is simply not possible. I was looking 
for a way to do a kind of hat trick: to keep a matched substring (":::") 
out of matchObject.group(0).  I guess I just don't get to do that.  

It would be a nice feature to add: a "consume-and-forget" or "suppress" 
extension group type. Non-capturing groups forget about themselves, but they 
don't suppress their matched contents.  It's a nice thing to be able to do 
because some software accepts regular expressions as configuration items but 
doesn't allow configuration of selection among the groups that may appear 
within it.  (I admit there aren't many occasions when suppression of substrings 
from group(0) is really necessary, but I think they do occur.)
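
A small sketch of the distinction drawn above, using a made-up input string.  
It also shows a fixed-width lookbehind, an existing re feature that may cover 
some of these cases because the asserted prefix is not consumed.  It is not a 
general "suppress" group: Python's re requires lookbehind patterns to have a 
fixed width.

```python
import re

text = "addr:::beef"

# A non-capturing group creates no numbered group, but the text it matched
# is still part of group(0).
noncapturing = re.search(r"(?::::)(?P<tail>[0-9a-f]+)", text)

# A lookbehind only asserts that ":::" precedes the match; the prefix is
# not consumed, so it stays out of group(0).
lookbehind = re.search(r"(?<=:::)[0-9a-f]+", text)

print(repr(noncapturing.group(0)), repr(noncapturing.group("tail")))
print(repr(lookbehind.group(0)))
```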

--
stage:  -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue35496>
___



[issue35496] left-to-right violation in match order

2018-12-14 Thread Steve Newcomb


New submission from Steve Newcomb :

Documentation for the re module insists that matches are made left-to-right 
within the alternatives delimited by an "or" (|) group.  I seem to have found a 
case where the rightmost alternative is matched unless it (and only it) is 
commented out.  See attached script, which is self-explanatory.

--
files: left-to-right_violation_in_python3_re_match.py
messages: 331838
nosy: steve.newcomb
priority: normal
severity: normal
status: open
title: left-to-right violation in match order
type: behavior
versions: Python 3.6
Added file: 
https://bugs.python.org/file47997/left-to-right_violation_in_python3_re_match.py

___
Python tracker 
<https://bugs.python.org/issue35496>
___



[issue28490] inappropriate OS.Error "Invalid cross-device link"

2016-10-20 Thread Steve Newcomb

Steve Newcomb added the comment:

Oops.  My bad.  There was a symlink in one of those paths.  The message is 
valid.
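
A rough way to check for this situation before calling os.rename() is to 
compare the device ids of the two parent directories after resolving 
symlinks.  This is a sketch only: the helper names are made up, and equal 
st_dev values are a heuristic rather than a guarantee that the rename will 
succeed.

```python
import os
import tempfile


def parent_device(path):
    """Return the device id of the directory holding path, symlinks resolved."""
    parent = os.path.dirname(os.path.abspath(path))
    return os.stat(os.path.realpath(parent)).st_dev


def same_filesystem(src, dst):
    """Heuristic: True when both paths resolve to directories on one device."""
    return parent_device(src) == parent_device(dst)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "a.txt")
        dst = os.path.join(tmp, "b.txt")
        open(src, "w").close()
        print(same_filesystem(src, dst))
```

Printing os.path.realpath() of both paths is often enough on its own to reveal 
a symlink that sends one of them to another device.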

--
resolution:  -> not a bug
status: open -> closed

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue28490>
___



[issue28490] inappropriate OS.Error "Invalid cross-device link"

2016-10-20 Thread Steve Newcomb

New submission from Steve Newcomb:

os.rename() raises OSError with a misleading message saying "cross-device" when 
no cross-device activity is involved.  

Here, running on Ubuntu 16.04.1 using an ext4 filesystem, both filepaths are 
in the same filesystem, and the error is evidently due to the fact that a file 
already exists at the target path:

(Pdb) os.path.isfile( '/persist/nobackup/backupDisks/d38BasLijPupBak/d38-backup.20161020/d38-_,.,_home2_,.,_rack/.Xauthority')
True
(Pdb) os.path.isfile( '/persist/nobackup/backupDisks/d38BasLijPupBak/d38-20161020/home2/rack/.Xauthority')
True
(Pdb) os.rename( '/persist/nobackup/backupDisks/d38BasLijPupBak/d38-backup.20161020/d38-_,.,_home2_,.,_rack/.Xauthority', '/persist/nobackup/backupDisks/d38BasLijPupBak/d38-20161020/home2/rack/.Xauthority')
*** OSError: [Errno 18] Invalid cross-device link: '/persist/nobackup/backupDisks/d38BasLijPupBak/d38-backup.20161020/d38-_,.,_home2_,.,_rack/.Xauthority' -> '/persist/nobackup/backupDisks/d38BasLijPupBak/d38-20161020/home2/rack/.Xauthority'

--
components: IO
messages: 279061
nosy: steve.newcomb
priority: normal
severity: normal
status: open
title: inappropriate OS.Error "Invalid cross-device link"
versions: Python 3.5

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue28490>
___



[issue11632] difflib.unified_diff loses context

2016-10-18 Thread Steve Newcomb

Steve Newcomb added the comment:

Context reporting is still buggy in Python 3.5.2:

>>> [ x for x in difflib.unified_diff( "'on'\n", "'on'\n\n\n")]
['--- \n', '+++ \n', '@@ -3,3 +3,5 @@\n', ' n', " '", ' \n', '+\n', '+\n']
>>> import sys
>>> sys.version
'3.5.2 (default, Sep 10 2016, 08:21:44) \n[GCC 5.4.0 20160609]'
>>> 
(compiled under Ubuntu 16.04.1 LTS)
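
For comparison, here is the same input passed as lists of lines.  
difflib.unified_diff() takes sequences of lines, and a str passed directly is 
iterated one character at a time, which is why the output above consists of 
single characters.  This sketch only shows the line-based call; it does not 
settle whether the behaviour originally reported in this issue is a bug.

```python
import difflib

# Split each text into lines, keeping the line endings.
old_lines = "'on'\n".splitlines(keepends=True)
new_lines = "'on'\n\n\n".splitlines(keepends=True)

diff = list(difflib.unified_diff(old_lines, new_lines))

for line in diff:
    print(repr(line))
```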

--
nosy: +steve.newcomb

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue11632>
___



[issue27898] regexp performance degradation between 2.7.6 and 2.7.12

2016-09-02 Thread Steve Newcomb

Steve Newcomb added the comment:

On 08/30/2016 12:46 PM, Raymond Hettinger wrote:
> Raymond Hettinger added the comment:
>
> It would be helpful if you ... make a small set of regular expressions that 
> demonstrate the performance regression.
>
Done.  Attachments:

test.py : Code that exercises re.sub() and outputs a profile report.

test_output_2.7.6.txt : Output of test.py under Python 2.7.6.

test_output_2.7.12.txt : Output of test.py under Python 2.7.12.

p17-188.htm -- test data: public information from the U.S. Internal 
Revenue Service.

Equivalent hardware was used in both cases.

The outputs show that 2.7.12's re.sub() takes 1.2 times as long as 
2.7.6's.  It's a significant difference, but...

...it was not the dramatic degradation I expected to find in this 
exercise.  Therefore I attempted to tease what I was looking for out of 
the profile stats I already uploaded to this site, made from actual 
production runs.  My attempts are all found in an hg repository that can 
be downloaded from 
sftp://s...@coolheads.com//files/py-re-perform-276-2712 using password 
bysIe20H .

I do not feel the latter work took me where I wanted to go, and I think 
the reason is that, at least for purposes of our application, Python 
2.7.12 has been extensively refactored since Python 2.7.6.  So it's 
an apples-to-oranges comparison, apparently.  Still, the performance 
difference for re.sub() is quite dramatic, and re.sub() is the only 
comparable function whose performance dramatically worsened: in our 
application, 2.7.12's re.sub() takes 3.04 times as long as 2.7.6's.

The good news, of course, is that the performance of the other 
*comparable* functions largely improved, often dramatically.  But 
at least in our application, it doesn't come close to making up for the 
degradation in re.sub().

My by-the-gut bottom line: somebody who really knows the re module 
should take a deep look at re.sub().  Why would re.sub(), unlike all 
others, take so much longer to run, while *every* other function in the 
re module gets (often much) faster?  It feels like there's a bug 
somewhere in re.sub().

Steve Newcomb

--
Added file: http://bugs.python.org/file44335/test.py
Added file: http://bugs.python.org/file44336/test_output_2.7.6.txt
Added file: http://bugs.python.org/file44337/p17-188.htm
Added file: http://bugs.python.org/file44338/test_output_2.7.12.txt

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27898>
___

#!/usr/bin/env python2

import codecs, profile, os, re, sys

hrefRE = re.compile(
''.join(
[
r'href=',
r'(?P["\'])',
r'(?P',
r'.*?',
r')',
r'(?=quote)',
],
),
)
###
onePathSegmentMS = ''.join(
[
r'(?P<_pathSeg>',
r'(',
r'/?',
r'(',
r'(?!',
r'[ \t\r\n]+',
r'$',
r')',
u'[^%s]' % ( re.escape( r'/?#')),
r')+',
r'|',
r'/',
r')',
r')',
],
)
onePathSegmentRE = re.compile( onePathSegmentMS)

###
uriMS = r''.join(
(
r'(?P',  ## leading whitespace is OK and ignorable; see http://dev.w3.org/html5/spec-LC/urls.html
r'[ \t\r\n]+',
r')?',
r'(',
r'(?P',
r'https?',  
r')',
r':\/{0,2}',   ## accounts for encountered error: only 0 or 1 slash instead of 2
r')?',
r'(?P',
r'(?P',
r'(',
r'(?P<_userinfo>',
r'[^%s]+' % ( re.escape( r'@/[:?#')),
r')',
re.escape( '@'),
r')?',
r')',
r'(?P',
r'(?P',
re.escape( r'['),
r')?',
r'(',
r'(?P',
r'(',
r'[0-9]{1,3}%s' % ( re.escape( r'.')),
r'){3}',
r'[0-9]{1,3}',
r')',
r'|',
r'(?P',
r'(',
r'[0-9A-Fa-f]{0,4}%s' % ( re.escape( ':')),
r'){1,7}',
r'[0-9A-Fa-f]{0,4}',
r')',
r'|',
r'(?P',
r'(',
r'[^%s]+?' % ( re.escape( r']:/?#')),  ## this may have dots
r'\.',
r')+',
r'(?P',  ## top-level domain, e.g. &quo

[issue27898] regexp performance degradation between 2.7.6 and 2.7.12

2016-09-01 Thread Steve Newcomb

Steve Newcomb added the comment:

On 09/01/2016 05:01 PM, Steve Newcomb wrote:
>
>> The outputs show that 2.7.12's re.sub() takes 1.2 times as long as 
>> 2.7.6's.  It's a significant difference, but...
>>
>> ...it was not the dramatic degradation I expected to find in this 
>> exercise.
On second (third?) thought, the degree of degradation could easily 
depend on the source data being processed.  Maybe test.py does, in fact, 
demonstrate the problem, but the test data I used (p17-188.htm) do not 
demonstrate a terribly severe case of the problem.

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27898>
___



[issue27898] regexp performance degradation between 2.7.6 and 2.7.12

2016-09-01 Thread Steve Newcomb

Steve Newcomb added the comment:

Oops.  The correct url is sftp://coolheads.com/files/py-re-perform-276v2712/

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27898>
___



[issue27898] regexp performance degradation between 2.7.6 and 2.7.12

2016-08-30 Thread Steve Newcomb

Steve Newcomb added the comment:

On 08/30/2016 01:24 PM, Serhiy Storchaka wrote:
> Serhiy Storchaka added the comment:
>
> According to your profile results all re functions are 2.5-4 times faster 
> under 2.7.12 than under 2.7.6. May be I misinterpret it?
I can't explain the profiler's report.  I'm kind of glad that you, too, 
find it baffling.  Is it possible that the profiler doesn't actually 
work predictably in the multiprocessing context?  If so, one thing I can 
*easily* do is to disable multiprocessing in that code and see what the 
profiler reports are then.  It will take all night, but I'm beginning to 
think it would be worthwhile, because it might point the finger of blame 
at either the multiprocessing module or the re module, but not both at once.

(I originally provided a "disable multiprocessing" capability in that 
code in order to use the Python debugger with it.  It would kind of make 
sense if the profiler had limitations similar to those of the debugger.)
>
> Note that 96-99% of time (2847.099 of 2980.718 seconds under 2.7.6 and 
> 4474.890 of 4519.872 seconds under 2.7.12) is spent in posix.waitpid. The 
> rest of time is larger under 2.7.6 (2980.718 - 2847.099 = 133.619) than under 
> 2.7.12 (4519.872 - 4474.890 = 44.982).
Yeah, I'm beginning to wonder if those strange statistics, too, are 
artifacts of using a single-process profiler in a multiprocessing context.

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27898>
___



[issue27898] regexp performance degradation between 2.7.6 and 2.7.12

2016-08-30 Thread Steve Newcomb

Steve Newcomb added the comment:

On 08/30/2016 12:46 PM, Raymond Hettinger wrote:
> Raymond Hettinger added the comment:
>
> It would be helpful if you could run "hg bisect" with your set-up to isolate 
> the change that causes the problem.
I don't think I understand you.  There's no difference in the Python 
code we're using in both cases.  The only differences, AFAIK, are in the 
Python interpreter and in the Linux distribution.  I'm not qualified to 
analyze the differences in the latter items.
> Alternatively, make a small set of regular expressions that demonstrate 
> the performance regression.
It will be hard to do that, because the code is so complex, and because 
debugging in the multiprocessing context is so hairy. Still, it's the 
only approach I can think of, too.  Sigh.  I'm thinking about it.
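
One possible shape for such a small, single-process test is a timeit loop 
around one compiled pattern.  This is a sketch under stated assumptions, not 
the production code: the pattern, the group names quote and target, and the 
sample text are all made up.  It is written for Python 3, although the report 
concerns 2.7.

```python
import re
import timeit

# Hypothetical pattern: an href attribute whose closing quote must match
# its opening quote.
pattern = re.compile(r'href=(?P<quote>["\'])(?P<target>.*?)(?P=quote)')

# Made-up sample text, repeated to give the regex engine some work.
sample = '<a href="http://example.com/page">link</a>\n' * 1000


def rewrite_links():
    """Replace every href target with '#', keeping the original quote."""
    return pattern.sub(r'href=\g<quote>#\g<quote>', sample)


if __name__ == "__main__":
    seconds = timeit.timeit(rewrite_links, number=20)
    print("20 runs of pattern.sub(): %.4f s" % seconds)
```

Running the same file under both interpreters on the same machine would give 
a like-for-like figure for re.sub() without multiprocessing in the picture.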

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27898>
___



[issue27898] regexp performance degradation between 2.7.6 and 2.7.12

2016-08-30 Thread Steve Newcomb

New submission from Steve Newcomb:

Our most regular-expression-processing-intensive Python 2.7 code takes 2.5x 
more execution time in 2.7.12 than it did in 2.7.6.  I discovered this after 
upgrading from Ubuntu 14.04 to Ubuntu 16.04.  Basically this code runs 
thousands of compiled regular expressions on thousands of texts.  Both the 
multiprocessing module and the re module are heavily used.

See attached profiler outputs, which look quite different in several respects.  
I used the profiling module to profile the same Python code, processing the 
same data, using the same hardware, under both Ubuntu 14.04 (Python 2.7.6) and 
Ubuntu 16.04 (Python 2.7.12).  

It is striking, for example, that cPickle.load appears so prominently in the 
2.7.12 profile -- a fact which appears to implicate the multiprocessing module 
somehow.  But I suspect that the re module is more likely the main source of 
the problem, because the execution times of other production steps -- steps 
that do not call the multiprocessing module -- also appear to be extended to a 
degree that is roughly proportional to the amount of regular expression 
processing done in those other steps.

I will happily provide any further information I can.  Any insights about this 
surprisingly severe performance degradation would be welcome.

--
files: profiles_2.7.6_vs_2.7.12
messages: 273932
nosy: steve.newcomb
priority: normal
severity: normal
status: open
title: regexp performance degradation between 2.7.6 and 2.7.12
Added file: http://bugs.python.org/file44277/profiles_2.7.6_vs_2.7.12

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27898>
___



[issue15956] backreference to named group does not work

2012-09-18 Thread Steve Newcomb

Steve Newcomb added the comment:

> But this way exists: (?P=startquote) is what you want.

I know how I missed it: I searched for "backref" in the documentation.  I did 
not find it in the discussion of the pattern language, because that word does 
not appear where ?P= is discussed.

> contributions are welcome.

See attached brief patch for the documentation.  It changes the example, adds a 
table of the three processing contexts in which named groups can be referenced, 
and accounts for users who, like me, may search for "backref".  (I tested 
everything.  I think it's correct.)
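
The three contexts can be sketched in a few lines.  This is an illustration 
with made-up group names and input strings, not the text of the attached 
patch.

```python
import re

# Context 1, inside the pattern itself: (?P=quote) must match the same
# text that the group named "quote" matched earlier.
pattern = re.compile(r"(?P<quote>['\"])(?P<body>.*?)(?P=quote)")

# Context 2, on the match object: refer to the group by name.
match = pattern.search("say 'hello' now")

# Context 3, in the repl argument of sub(): \g<body> refers to the group.
replaced = pattern.sub(r"[\g<body>]", "say 'hello' and \"bye\"")

# The closing quote differs from the opening quote, so nothing matches.
mismatched = pattern.search("'oops\"")

print(match.group("quote"), match.group("body"))
print(replaced)
print(mismatched)
```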

Thanks again for the advice, Amaury.

--
Added file: http://bugs.python.org/file27217/patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15956
___



[issue15956] backreference to named group does not work

2012-09-18 Thread Steve Newcomb

Steve Newcomb added the comment:

> I preferred the previous example <id> because it's not obvious what 
> \042\047 is. 

Yeah, but the example I wrote has an in-pattern backreference and a real reason 
to use one.

In the attached patch, I have changed [\042\047] to [\'\"].  That's certainly 
clearer for everyone who has not memorized the ASCII table in octal!  (Oops.)

> And a bullet list would be less heavyweight IMO.

Well... I rejected that choice because there would be no clarifying columnar 
distinction between contexts and syntaxes.  Personally, I think the table is 
clearer.  It makes it easier for users to find what they need to know.

> (Also please use diff -u; without context, the patch cannot be applied 
> automatically)

Oops.  Attached.

--
Added file: http://bugs.python.org/file27219/patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15956
___



[issue15956] backreference to named group does not work

2012-09-17 Thread Steve Newcomb

New submission from Steve Newcomb:

The '\\g<startquote>' in the below does not work:

>>> repr( re.compile( '<!ENTITY[ \\011\\012\\015]+\\%[ \\011\\012\\015]*(?P<entityName>[A-Za-z][A-Za-z0-9\\.\\-\\_\\:]*)[ \\011\\012\\015]*(?P<startquote>[\\042\\047])(?P<entityText>.+?)\\g<startquote>[ \\011\\012\\015]*\\>', re.IGNORECASE | re.DOTALL).search( '<!ENTITY % m.mixedContent "( #PCDATA | i | b)">'))
'None'

In the following, the '\\g<startquote>' has been replaced by '\\2'.  It works.

>>> repr( re.compile( '<!ENTITY[ \\011\\012\\015]+\\%[ \\011\\012\\015]*(?P<entityName>[A-Za-z][A-Za-z0-9\\.\\-\\_\\:]*)[ \\011\\012\\015]*(?P<startquote>[\\042\\047])(?P<entityText>.+?)\\2[ \\011\\012\\015]*\\>', re.IGNORECASE | re.DOTALL).search( '<!ENTITY % m.mixedContent "( #PCDATA | i | b)">'))
'<_sre.SRE_Match object at 0x7f77503d1918>'

Either this feature is broken or the re module documentation is somehow 
misleading me.

(Yes, I know there is an XML error in the above.  That's because it's SGML.)

--
components: Regular Expressions
messages: 170605
nosy: ezio.melotti, mrabarnett, steve.newcomb
priority: normal
severity: normal
status: open
title: backreference to named group does not work
type: behavior
versions: Python 2.7

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15956
___



[issue15956] backreference to named group does not work

2012-09-17 Thread Steve Newcomb

Steve Newcomb added the comment:

I have re-read the documentation on re.sub().  Even now, now that I understand 
that the \g<groupname> syntax applies to the repl argument only, I cannot see 
how the documentation can be understood that way.  The paragraph in which the 
explanation of the \g<groupname> syntax appears does not mention the repl 
argument by name, and neither does the preceding paragraph. 

The paragraph before the preceding paragraph is about the pattern argument, not 
the repl argument, and it consists entirely of the words, "The pattern may be a 
string or an RE object." 

So I don't see how the explanation of the \g<groupname> syntax can be 
understood as applying only to the repl argument, even though you have now 
informed me that that's the case (which is helpful to know -- thanks!).  
Indeed, the paragraph that explains the \g<groupname> syntax *still* appears to 
me to be discussing the pattern argument.  And it even mentions the ?P<name> 
syntax, which can only appear in a pattern, not in a repl, in the very same 
sentence as the \g<groupname> syntax, even though those two syntactic features 
appear in *different* expression languages, and no single expression language 
has both of them.  

So there is no clear indication that it is discussing two different expression 
languages.  Indeed, another syntactic feature, \groupnumber, also discussed in 
the same paragraph, *is* found in both expression languages, so it's even more 
confusing to a person who knows that both ?P<groupname> and \groupnumber 
appear in the pattern expression language.  There is nothing in the 
documentation that would inform a person (such as myself) that the 
\g<groupname> syntax is not also part of the pattern expression language, just 
as the other two features are.

(And why isn't \g<groupname> part of the pattern language, anyway, or at least 
some way to refer to a match made in a previous *named* group?  It would be 
very convenient to be able to do that, particularly when using a 
dynamically-created regexp to parse strings delimited with a choice of 
delimiters that must match at both ends.)

In other words, this documentation could be beneficially improved.

--
resolution: invalid -> 
status: closed -> open

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15956
___