[issue39949] truncating match in regular expression match objects repr

2020-06-18 Thread Seth Troisi


Seth Troisi  added the comment:

I was thinking about how to add the end quote and found these weird cases:
  >>> "asdf'asdf'asdf"
  "asdf'asdf'asdf"
  >>> "asdf\"asdf\"asdf"
  'asdf"asdf"asdf'
  >>> "asdf\"asdf'asdf"
  'asdf"asdf\'asdf'

This means that len(s) + 2 (or + 3 for bytes) != len(repr(s)) in general,
e.g.

>>> s = "\"''''''"
>>> s
'"\'\'\'\'\'\''
>>> len(s)
7
>>> len(repr(s))
15
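
A small sketch of the same observation, using a hypothetical helper 
repr_overhead (not part of any patch) to show how many characters repr() adds 
for a few inputs:

```python
def repr_overhead(value):
    """Return how many characters repr() adds on top of len(value)."""
    return len(repr(value)) - len(value)


samples = ["asdf", "asdf'asdf", 'asdf"asdf', "\"''''''", b"asdf"]
for sample in samples:
    # Only strings that contain both quote styles need escaped quotes,
    # which is where the overhead grows past the usual 2 (or 3 for bytes).
    print(repr(sample), len(sample), len(repr(sample)), repr_overhead(sample))
```

The overhead is only constant when no character in the string needs escaping, 
so a length check on the raw string cannot predict the length of its repr.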

This can lead to a dangling partial escape at the end of the repr:
  >>> re.match(".*", "a"*48 + "'\"")
  <_sre.SRE_Match object; span=(0, 50), 
match='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\>


This means I'll need to rethink len(group0) >= 48 as the condition for 
truncation (a string of length 30 can already be truncated by %.50R if enough 
of its characters need escaping).

Maybe it makes sense to write group0 to a temp string, check whether that is 
truncated, and extract the quote character from it
OR
PyUnicode_FromFormat('%R', group0[:50]) # avoids a trailing escape character 
('\') but might be longer than 50 characters
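
A rough Python sketch of how the two ideas could be combined (this is not the 
C patch; truncated_repr and its limit parameter are hypothetical names): take 
the repr of a prefix so no escape is ever cut in half, and shrink the prefix 
until the result plus '...' fits the budget.

```python
def truncated_repr(group0, limit=50):
    """Return repr(group0), or a shortened repr plus '...' if it is too long."""
    full = repr(group0)
    if len(full) <= limit:
        return full
    # Shrink the source prefix until its complete repr plus '...' fits.
    # Taking repr() of a prefix keeps escapes and the closing quote intact.
    n = min(len(group0), limit)
    while n > 0 and len(repr(group0[:n])) + 3 > limit:
        n -= 1
    return repr(group0[:n]) + "..."


tricky = "a" * 48 + "'\""
print(truncated_repr(tricky))

# repr of a 50-character slice alone can be much longer than 50 characters.
worst = '"' + "'" * 59
print(len(repr(worst[:50])))
```

The loop costs a few extra repr() calls, but it sidesteps having to guess the 
quote character or detect a half-written escape.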

--

___
Python tracker 
<https://bugs.python.org/issue39949>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39949] truncating match in regular expression match objects repr

2020-06-16 Thread Seth Troisi


Seth Troisi  added the comment:

@matpi

The current behavior is for the right quote not to appear. I kept this 
behavior but am happy to consider changing that.

See the linked patch for examples

--




[issue39949] truncating match in regular expression match objects repr

2020-06-16 Thread Seth Troisi


Change by Seth Troisi :


--
keywords: +patch
pull_requests: +20100
stage: needs patch -> patch review
pull_request: https://github.com/python/cpython/pull/20922




[issue39949] truncating match in regular expression match objects repr

2020-06-16 Thread Seth Troisi


Seth Troisi  added the comment:

I didn't propose a patch before because I was unsure of the decision. Now that 
there is a +1 from Raymond I'll work on a patch and some documentation. 
Expect a patch within the week.

--




[issue39949] truncating match in regular expression match objects repr

2020-03-24 Thread Seth Troisi


Change by Seth Troisi :


--
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed




[issue39949] truncating match in regular expression match objects repr

2020-03-12 Thread Seth Troisi


New submission from Seth Troisi :

Following on https://bugs.python.org/issue17087

Today I was mystified by why a regex wasn't working.

>>> import re
>>> re.match(r'.{10}', 'A'*49+'B')
<_sre.SRE_Match object; span=(0, 10), match='AAAAAAAAAA'>

>>> re.match(r'.{49}', 'A'*49+'B')
<_sre.SRE_Match object; span=(0, 49), 
match='AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA>

>>> re.match(r'.{50}', 'A'*49+'B')
<_sre.SRE_Match object; span=(0, 50), 
match='AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA>

I was confused about why the B wasn't matching in the third example. It is 
matching; it just doesn't fit on the line in the interactive debugger, so it 
isn't shown.
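
A quick way to confirm this without relying on the repr is to look at the 
match object directly:

```python
import re

m = re.match(r'.{50}', 'A' * 49 + 'B')

# The repr cuts the match text off, but the match itself is complete.
print(m.span())
print(len(m.group(0)))
print(m.group(0)[-1])
```

The span and the last character of group(0) show that the B is part of the 
match even though the repr never displays it.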


My suggestion would be to truncate the match (in the repr) and append '...' 
when its right quote wouldn't show.


With short matches (or exactly enough space) there would be no change:

>>> import string
>>> re.match(r'.{48}', string.ascii_letters)
<_sre.SRE_Match object; span=(0, 48), 
match='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUV'>

When not all of the match can be displayed, the current output (first) would 
become the truncated form (second):

>>> re.match(r'.{49}', string.ascii_letters)
<_sre.SRE_Match object; span=(0, 49), 
match='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVW>
<_sre.SRE_Match object; span=(0, 49), 
match='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRS'...>
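
A minimal Python sketch of the suggested rule, assuming a 50-character budget 
for the match text and input with no characters that need escaping (the name 
proposed_match_text is hypothetical; the real repr is produced in C):

```python
import string


def proposed_match_text(text, width=50):
    """Sketch of the suggested match=... text for escape-free input."""
    full = repr(text)
    if len(full) <= width:
        # The closing quote fits, so nothing changes.
        return full
    # Keep room for the closing quote and '...' (4 characters in total).
    return full[:width - 4] + full[0] + "..."


print(proposed_match_text(string.ascii_letters[:48]))
print(proposed_match_text(string.ascii_letters[:49]))
```

This reproduces the two examples above; strings containing quotes or other 
escaped characters would need more care than this simple slice.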


I'm happy to help out by writing tests or the implementation if folks think 
this is a good idea.

I couldn't think of other examples (urllib maybe?) in Python of how this is 
handled, but I could look for some if that would help.

--
components: Library (Lib)
messages: 364052
nosy: Seth.Troisi, serhiy.storchaka
priority: normal
severity: normal
status: open
title: truncating match in regular expression match objects repr
type: enhancement




[issue39318] NamedTemporaryFile could cause double-close on an fd if _TemporaryFileWrapper throws

2020-01-13 Thread Seth Troisi


Change by Seth Troisi :


--
nosy: +Seth.Troisi

___
Python tracker 
<https://bugs.python.org/issue39318>
___



[issue12162] Documentation about re \number

2011-05-23 Thread Seth Troisi

Seth Troisi  added the comment:

Given David Murray's input I think the example would be best done as 

>>> re.search(r'(\w+) \1', "can you do the can can?")  # Matches the duplicate can
<_sre.SRE_Match object at ...>


I want to stress that the documentation is not wrong but confusing, especially 
for someone unfamiliar with regular expressions.

--

___
Python tracker 
<http://bugs.python.org/issue12162>
___



[issue12162] Documentation about re \number

2011-05-23 Thread Seth Troisi

New submission from Seth Troisi :

It would be nice to clarify re documentation on how to use \number.

The current documentation lists three half examples:
"(.+) \1 matches 'the the' or '55 55', but not 'the end' (note the space after 
the group)."

This is rather confusing (at least to me) as it might be assumed that
re.search("(.+) \1", "the the") would return a match, which it does not.

A better example would be re.search("(\w+) \\1", "the the") which does match.

The other confusing portion is the requirement of the second "\" to make it 
match.

I would think that a quick example below the text would help.

>>> re.search("(\w+) \\1", "can you do the can can?")  # \\1 matches the second can at the end of the sentence
<_sre.SRE_Match object at ...>
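
To spell out why the second backslash matters, here is a short comparison of 
the three spellings. In an ordinary string literal "\1" is the control 
character chr(1), so the regex engine never sees a backreference:

```python
import re

text = "can you do the can can?"

# "\1" in a normal string literal is chr(1), not a backreference.
no_match = re.search("(.+) \1", text)

# Doubling the backslashes, or using a raw string, passes \1 through to re.
escaped = re.search("(\\w+) \\1", text)
raw = re.search(r"(\w+) \1", text)

print(no_match)
print(escaped.group(0))
print(raw.group(0))
```

The raw-string form is usually the easiest to read, since the pattern looks 
the same in the source as it does to the regex engine.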

This is my first python issue and if I have misfiled or left out some 
information please tell me how to proceed.

--
assignee: docs@python
components: Documentation
messages: 136708
nosy: Seth.Troisi, docs@python
priority: normal
severity: normal
status: open
title: Documentation about re \number
type: behavior
