[issue2746] ElementTree ProcessingInstruction uses character entities in content

2009-10-24 Thread Tom Lynn

Changes by Tom Lynn:


--
nosy: +tlynn

___
Python tracker 
<http://bugs.python.org/issue2746>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1859] textwrap doesn't linebreak on "\n"

2009-11-19 Thread Tom Lynn

Tom Lynn  added the comment:

This bug should be re-opened, since there is definitely a bug here.  
I think the patch was incorrectly rejected.

If I can expand palfrey's example:

from textwrap import TextWrapper

T = TextWrapper(replace_whitespace=False, width=75)
text = '''\
a a a a a
b b b b b
c c c c c
d d d d d
e e e e e'''
for line in T.wrap(text):
    print(line)

Python 2.5 textwrap turns it into:

a a a a a
b b b b b
c c
c c c
d d d d d
e e e e
e

That can't be right.  palfrey's patch leaves the input unchanged, which 
seems correct to me.  I think Guido guessed wrong here: the docs for 
replace_whitespace say:

  If true, each whitespace character (as defined by string.whitespace)
  remaining after tab expansion will be replaced by a single space

The text should therefore not be reflowed in this case since 
replace_whitespace=False.  palfrey's patch seems correct to me.

It can be made to reflow to the full width by editing palfrey's patch, 
but that would disagree with the docs and break code.
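
Until this is resolved, one workaround is to wrap each input line on its 
own, so that existing newlines always survive.  This is only a minimal 
sketch, not palfrey's patch; wrap_preserving_newlines is a made-up helper 
name.

```python
from textwrap import TextWrapper


def wrap_preserving_newlines(text, width=75):
    """Wrap each existing line separately so explicit newlines are kept."""
    wrapper = TextWrapper(replace_whitespace=False, width=width)
    lines = []
    for line in text.splitlines():
        # wrap() returns [] for an empty line; keep it as a blank line.
        lines.extend(wrapper.wrap(line) or [""])
    return lines
```

Short lines come back unchanged and only over-long lines are split.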

--
nosy: +tlynn

___
Python tracker 
<http://bugs.python.org/issue1859>
___



[issue1708652] Exact matching

2010-09-17 Thread Tom Lynn

Tom Lynn  added the comment:

I don't know whether it should stand, I'm somewhere around 0 on it myself. So I 
guess that means it shouldn't, since it's easier to add features than remove 
them. The problem is that once you're aware of the need for it you need it less.

In case other people are +1, I'll note that "exact" isn't a very nice name 
either, not being a verb. "exact_match" is a bit long but probably better (and 
better than "match_exact").

--

___
Python tracker 
<http://bugs.python.org/issue1708652>
___



[issue1708652] Exact matching

2010-09-18 Thread Tom Lynn

Tom Lynn  added the comment:

I'm still unsure.  I think this confusion does cause bugs in real-world code.  
Perhaps more prominence for \A and \Z in the docs?  There's already a section 
comparing regexps starting '^' with match under "Matching vs Searching".

The problem is basically that ^ and $ have weird semantics but are better 
recognised than \A and \Z.  Looking over the docs again I see that the docs for 
$ are still misleading, in a way that's related to this issue:

foo matches both 'foo' and 'foobar', while the regular
expression foo$ matches only 'foo'.

"foo$ matches only 'foo' (out of 'foo' and 'foobar')" is the correct 
interpretation of that, but it's easy to read it as "foo$ means 
exact_match('foo')", which is the misconception I was hoping to put to rest 
with this (foo$ also matches the 'foo' part of 'foo\nbar', even with flags=0).

--

___
Python tracker 
<http://bugs.python.org/issue1708652>
___



[issue1708652] Exact matching

2010-09-18 Thread Tom Lynn

Tom Lynn  added the comment:

Actually, looking at the second part of the docs for $ (on "foo.$") makes me 
think the main motivating case here may be a bug in re.match::

>>> re.match('foo$', 'foo\n\n')
>>> re.match('foo$', 'foo\n')
<_sre.SRE_Match object at 0x00A98678>

Shortening an input string shouldn't ever cause it to match, should it?

--

___
Python tracker 
<http://bugs.python.org/issue1708652>
___



[issue1708652] Exact matching

2010-09-18 Thread Tom Lynn

Tom Lynn  added the comment:

Oh dear, I'm wrong on two fronts (I wish Roundup had post editing).

a) foo$ doesn't match the 'foo' part of 'foo\nbar' as I stated above, but does 
match the 'foo' part of 'foo\n'.
b) Obviously shortening an input string can cause it to match.  It's still 
weird though.
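
The corrected behaviour can be checked directly.  A small sketch 
(dollar_vs_Z is a made-up name, used only for illustration):

```python
import re


def dollar_vs_Z():
    """Report which (pattern, subject) pairs match with re.match, flags=0."""
    cases = [
        (r"foo$", "foo"),
        (r"foo$", "foo\n"),      # $ also matches just before a final newline
        (r"foo$", "foo\n\n"),
        (r"foo$", "foo\nbar"),
        (r"foo\Z", "foo"),
        (r"foo\Z", "foo\n"),     # \Z only matches at the very end
    ]
    return {case: re.match(*case) is not None for case in cases}
```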

--

___
Python tracker 
<http://bugs.python.org/issue1708652>
___



[issue1708652] Exact matching

2010-09-18 Thread Tom Lynn

Tom Lynn  added the comment:

(Sorry to comment on a closed issue, it was closed as I was writing this.)  
It's not that I'm not convinced of the need, just not of the solution.  I still 
think that there are problems here:

a) forgetting any \Z or $ terminator to .match() is easy,
b) $ is easily misunderstood (and not just by me) and I suspect commonly 
dangerously misused in validation routines as a result,
c) '(?:%s)\Z' % regexp is noisy, combines two less-understood features, and 
makes simple regexps hard to read,
d) '(?:%s)\Z' % regexp.pattern requires recompilation of the regexp.

I think another method is probably the best solution to these, but it may have 
too much cost (though I'm not sure what that cost would be).
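
For reference, the workaround in c) and d) amounts to something like the 
following.  It is a sketch of the idiom only, not a proposed implementation; 
exact_match is the hypothetical name discussed earlier in this issue.  
(Python 3.4 and later ship re.fullmatch(), which covers the same need.)

```python
import re


def exact_match(pattern, string, flags=0):
    """Match only if the whole string matches pattern (hypothetical helper)."""
    # The non-capturing group anchors alternations as a whole; \Z, unlike $,
    # refuses a trailing newline.  The pattern is recompiled on every call,
    # which is the cost mentioned in point d).
    return re.match(r"(?:%s)\Z" % pattern, string, flags)
```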

Largely orthogonally, I'd like to see \Z encouraged over $ in the docs, and 
preferably a version of this table (probably under Matching vs Searching), 
corrected if I'm wrong of course:

NON-MULTILINE:
'^' is equivalent to '\A'
'$' is equivalent to '(?:\Z|(?=\n\Z))'

MULTILINE:
'^' is equivalent to '(?:\A|(?<=\n))'
'$' is equivalent to '(?:\Z|(?=\n))'

But the docs already try to express the above table (or its correction) in 
English, so you may feel it wouldn't add anything, in which case I'd still like 
to see any corrections for my own edification if possible.
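
One way to check the table is to compare where each pair of patterns matches 
over a handful of strings.  This is a rough sketch: the sample strings are 
arbitrary and the names are made up, so it is a spot check rather than a 
proof.

```python
import re

# (short form, claimed equivalent, flags), taken from the table above.
EQUIVALENTS = [
    (r"^", r"\A", 0),
    (r"$", r"(?:\Z|(?=\n\Z))", 0),
    (r"^", r"(?:\A|(?<=\n))", re.MULTILINE),
    (r"$", r"(?:\Z|(?=\n))", re.MULTILINE),
]
SAMPLES = ["", "foo", "foo\n", "foo\n\n", "foo\nbar", "\nfoo", "a\n\nb\n"]


def positions(pattern, text, flags=0):
    """Return every offset at which the zero-width pattern matches."""
    return [m.start() for m in re.finditer(pattern, text, flags)]


def table_holds():
    """True if each pair matches at the same offsets in every sample."""
    return all(
        positions(short, text, flags) == positions(long, text, flags)
        for short, long, flags in EQUIVALENTS
        for text in SAMPLES
    )
```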

--

___
Python tracker 
<http://bugs.python.org/issue1708652>
___



[issue1859] textwrap doesn't linebreak on "\n"

2010-11-23 Thread Tom Lynn

Tom Lynn  added the comment:

I've also been attempting to look into this and came up with an almost 
identical patch, which is promising:
https://bitbucket.org/tlynn/issue1859/diff/textwrap.py?diff2=041c9deb90a2&diff1=f2c093077fbf

I missed the wordsep_simple_re though.

Testing it is the hard part.  I've got a few examples that could become tests 
in that repository, but it's far from conclusive.

One corner case I found is trailing whitespace becoming a blank line:

>>> from textwrap import TextWrapper
>>> T = TextWrapper(replace_whitespace=False, drop_whitespace=False, width=9)
>>> T.wrap('x'*9 + ' \nfoo')
['x', ' ', 'foo']

I think it's fine.  drop_whitespace=True removes the blank line, and those who 
really want drop_whitespace=False can remove the blank lines easily.

--

___
Python tracker 
<http://bugs.python.org/issue1859>
___



[issue1708652] Exact matching

2008-10-13 Thread Tom Lynn

Tom Lynn <[EMAIL PROTECTED]> added the comment:

Yes, that's right. The binary aspect of it was something of a red
herring, I'm afraid, although I still think that (or parsing in general)
is an important use case. The prime motivation is that it's easy to
either forget the '\Z' or to use '$' instead, which both cause subtle
bugs. An exact() method might help to avoid that.

___
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1708652>
___



[issue5021] doctest.testfile should set __name__, can't use namedtuple

2009-01-21 Thread Tom Lynn

New submission from Tom Lynn:

This file fails when run with doctest.testfile::

  >>> print __name__
  __builtin__
  >>> print globals()['__name__']  # fails with KeyError: __name__
  __builtin__

"__builtin__" is probably not a good value, but more importantly, this 
means that you can't use namedtuples in text file doctests, since 
namedtuple() inspects the calling frame::

  >>> from namedtuple import namedtuple
  >>> t = namedtuple('fred', 'x')  # fails

(I presume this is the same for "from collections import namedtuple", 
but I've not tested with 2.6+.)

A workaround is to add this line at the start of the test::

   >>> __name__ = 'test'
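
The same workaround can be applied from outside the test file by passing 
globs to doctest.testfile().  A rough sketch in Python 3 syntax; the file 
name "example.txt" and the module name 'test' are arbitrary choices:

```python
import doctest
import os
import tempfile

# Contents of a hypothetical text-file doctest.
DOC = """
>>> print(__name__)
test
>>> from collections import namedtuple
>>> namedtuple('fred', 'x')(1)
fred(x=1)
"""


def run_text_doctest():
    """Write DOC to a temporary file and run it with __name__ predefined."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "example.txt")
        with open(path, "w") as f:
            f.write(DOC)
        return doctest.testfile(
            path,
            module_relative=False,
            globs={"__name__": "test"},  # what the in-file workaround sets
            verbose=False,
        )
```

The return value is a (failed, attempted) pair.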

--
components: Library (Lib)
messages: 80322
nosy: tlynn
severity: normal
status: open
title: doctest.testfile should set __name__, can't use namedtuple
type: feature request
versions: Python 2.5

___
Python tracker 
<http://bugs.python.org/issue5021>
___



[issue5022] doctest should allow running tests with "python -m doctest"

2009-01-21 Thread Tom Lynn

New submission from Tom Lynn:

It would be good to be able to do something like::

  $ python -m doctest foo.py
  $ python -m doctest --text foo.txt bar.txt

(or preferably some command line options design which could handle 
both .py and .txt files).

--
components: Library (Lib)
messages: 80323
nosy: tlynn
severity: normal
status: open
title: doctest should allow running tests with "python -m doctest"
type: feature request
versions: Python 2.5

___
Python tracker 
<http://bugs.python.org/issue5022>
___



[issue5079] time.ctime docs refer to "time tuple" for default

2009-01-27 Thread Tom Lynn

New submission from Tom Lynn:

The docs for time.ctime() (quoted below) seem to have been copied from 
time.asctime(). They refer to a time tuple and localtime(), where they 
should refer to seconds and time().

Current docs::

ctime(seconds) -> string

Convert a time in seconds since the Epoch to a string in local time.
This is equivalent to asctime(localtime(seconds)). When the time 
tuple is not present, current time as returned by localtime() is 
used.
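
The equivalence the docstring describes can be checked like this (a small 
sketch; ctime_matches_asctime is a made-up name):

```python
import time


def ctime_matches_asctime(seconds):
    """Check ctime(seconds) against asctime(localtime(seconds))."""
    # ctime() takes seconds since the Epoch, not a time tuple.  With no
    # argument it uses the current time as returned by time.time().
    return time.ctime(seconds) == time.asctime(time.localtime(seconds))
```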

--
messages: 80644
nosy: tlynn
severity: normal
status: open
title: time.ctime docs refer to "time tuple" for default

___
Python tracker 
<http://bugs.python.org/issue5079>
___



[issue5079] time.ctime docs refer to "time tuple" for default

2009-01-27 Thread Tom Lynn

Changes by Tom Lynn:


--
components: +Library (Lib)
type:  -> feature request
versions: +Python 2.5, Python 3.0

___
Python tracker 
<http://bugs.python.org/issue5079>
___



[issue1079] decode_header does not follow RFC 2047

2009-02-03 Thread Tom Lynn

Tom Lynn  added the comment:

The only difference between the two regexps is that the email/header.py
version looks for::

  (?=[ \t]|$)   # whitespace or the end of the string

at the end (with re.MULTILINE, so $ also matches '\n').

To expand on "There is nothing about that thing in RFC 2047", it says::

   IMPORTANT: 'encoded-word's are designed to be recognized as 'atom's
   by an RFC 822 parser.

RFC 822 says::

   atom        =  1*<any CHAR except specials, SPACE and CTLs>
  ...
   specials    =  "(" / ")" / "<" / ">" / "@"  ; Must be in quoted-
               /  "," / ";" / ":" / "\" / <">  ;  string, to use
               /  "." / "[" / "]"              ;  within a word.

So an example of mis-parsing is::

   >>> import email.header
   >>> h = '=?utf-8?q?=E2=98=BA?=(unicode white smiling face)'
   >>> email.header.decode_header(h)
   [('=?utf-8?q?=E2=98=BA?=(unicode white smiling face)', None)]

The correct result would be::

   >>> email.header.decode_header(h)
   [('\xe2\x98\xba', 'utf-8'), ('(unicode white smiling face)', None)]

which is what you get if you insert a space before the '(' in h.
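
The effect of the trailing lookahead can be shown in isolation.  The 
patterns below are a simplified sketch modelled on the description above, 
not copied from email/header.py, and all the names are made up:

```python
import re

# Simplified encoded-word pattern: =?charset?encoding?encoded-text?=
ENCODED_WORD = r"=\?(?P<charset>[^?]*?)\?(?P<encoding>[qQbB])\?(?P<encoded>.*?)\?="

# With the lookahead: must be followed by whitespace or end of line.
STRICT = re.compile(ENCODED_WORD + r"(?=[ \t]|$)", re.MULTILINE)
# Without it: an encoded-word may be followed directly by a special.
LENIENT = re.compile(ENCODED_WORD)


def find_encoded_text(pattern, header):
    """Return the encoded text of the first encoded-word, or None."""
    m = pattern.search(header)
    return m.group("encoded") if m else None
```

With h as above, STRICT finds nothing while LENIENT finds '=E2=98=BA'; after 
inserting a space before the '(' both find it.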

--
nosy: +tlynn

___
Python tracker 
<http://bugs.python.org/issue1079>
___



[issue4958] email/header.py ecre regular expression issue

2009-02-03 Thread Tom Lynn

Tom Lynn  added the comment:

Duplicates issue1047.

--
nosy: +tlynn

___
Python tracker 
<http://bugs.python.org/issue4958>
___



[issue4958] email/header.py ecre regular expression issue

2009-02-03 Thread Tom Lynn

Tom Lynn  added the comment:

Oops, duplicates issue 1079 even.

___
Python tracker 
<http://bugs.python.org/issue4958>
___



[issue4491] email.Header.decode_header() doesn't work if encoded-word was separeted by CRLF

2009-02-03 Thread Tom Lynn

Tom Lynn  added the comment:

Duplicates issue1079.

--
nosy: +tlynn

___
Python tracker 
<http://bugs.python.org/issue4491>
___



[issue1631394] sre module has misleading docs

2008-01-25 Thread Tom Lynn

Tom Lynn added the comment:

Thanks for fixing this. I now also note that (?<=...), (?http://bugs.python.org/file9284/undoc-patch.txt

_
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1631394>
_



[issue1631394] sre module has misleading docs

2008-01-26 Thread Tom Lynn

Tom Lynn added the comment:

Nice changes to the wording. (For the record: it's r60316 in fact.)

_
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1631394>
_



[issue2670] urllib2 build_opener() fails if two handlers use the same default base class

2008-04-22 Thread Tom Lynn

New submission from Tom Lynn <[EMAIL PROTECTED]>:

urllib2.py:424 (Py 2.4) or urllib2.py:443 (Py 2.5) in build_opener()::

    skip = []
    for klass in default_classes:
        for check in handlers:
            if inspect.isclass(check):
                if issubclass(check, klass):
                    skip.append(klass)
            elif isinstance(check, klass):
                skip.append(klass)
    for klass in skip:
        default_classes.remove(klass)

This can cause klass to be appended to skip multiple times,
which then causes an exception in the final line quoted above.

skip should be a set (and append changed to add), or "if klass
not in skip:" should be added before each "skip.append(klass)".
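
A sketch of the set-based variant, pulled out into a standalone function so 
it can be run on its own.  remove_overridden is a made-up name; in urllib2 
this logic is inline in build_opener():

```python
import inspect


def remove_overridden(default_classes, handlers):
    """Return default_classes minus any class covered by a handler."""
    skip = set()  # a set, so a class matched twice is recorded only once
    for klass in default_classes:
        for check in handlers:
            if inspect.isclass(check):
                if issubclass(check, klass):
                    skip.add(klass)
            elif isinstance(check, klass):
                skip.add(klass)
    return [klass for klass in default_classes if klass not in skip]
```

With a list, two handlers deriving from the same default class put that 
class in skip twice, and the second list.remove() call raises ValueError.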

--
components: Library (Lib)
messages: 65683
nosy: tlynn
severity: normal
status: open
title: urllib2 build_opener() fails if two handlers use the same default base class
type: behavior
versions: Python 2.4, Python 2.5

__
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue2670>
__



[issue15858] tarfile missing entries due to omitted uid/gid fields

2012-09-03 Thread Tom Lynn

New submission from Tom Lynn:

The tarfile module silently truncates the list of entries when reading a tar 
file if it sees an entry with a uid/gid field containing only spaces/NULs.  I 
got such a tarball from Java Maven/plexus-archiver.  I don't know whether they 
write such fields deliberately, but it seems reasonable to me, especially since 
they were providing the user/group names textually.

I'd like to see two fixes - a None/-1/0 value for the uid/gid and not silently 
swallowing HeaderErrors in TarFile.next() (or at least documenting why it's 
being done).  0 would be consistent with the default value when writing, but 
None seems more honest.  -1 seems hard to defend.

Only tested on silly Python versions (2.6, PyPy-1.8), sorry.  It's what I've 
got to hand, but going by the hg trunk I think this issue applies to recent 
Python too.

--
components: Library (Lib)
messages: 169799
nosy: tlynn
priority: normal
severity: normal
status: open
title: tarfile missing entries due to omitted uid/gid fields
type: behavior
versions: 3rd party, Python 2.6

___
Python tracker 
<http://bugs.python.org/issue15858>
___



[issue15858] tarfile missing entries due to omitted uid/gid fields

2012-09-04 Thread Tom Lynn

Tom Lynn added the comment:

I think the default has to be 0 for consistency with how other empty numeric 
fields are handled.

In theory spaces and NULs are supposed to be equivalent terminators in numeric 
fields, but I've just noticed that plexus-archiver is also using leading spaces 
rather than leading zeros (against the spec), e.g. ' 10422\x00 '.  tarfile 
currently supports this, which I think is good, so I think the right approach 
is to lstrip(' ') fields and then treat space as a terminator.
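
Roughly what I have in mind, as a standalone sketch.  parse_octal_field is a 
made-up name, this is not tarfile's own parsing code, and it ignores the GNU 
base-256 encoding for large values:

```python
def parse_octal_field(field):
    """Parse a tar header numeric field given as bytes."""
    # Tolerate leading spaces used as padding instead of leading zeros.
    text = field.lstrip(b" ")
    # Treat NUL and space as equivalent terminators.
    for terminator in (b"\x00", b" "):
        text = text.split(terminator, 1)[0]
    # An empty field (only spaces/NULs) falls back to 0.
    return int(text.decode("ascii"), 8) if text else 0
```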

--

___
Python tracker 
<http://bugs.python.org/issue15858>
___



[issue15858] tarfile missing entries due to omitted uid/gid fields

2012-09-05 Thread Tom Lynn

Tom Lynn added the comment:

See attached bad.tar.

$ less bad.tar | cat
drwxr-xr-x 0/0   0 2012-09-05 20:04 foo/
-rw-rw-r-- uname/gname   0 2012-09-05 20:04 foo/a
$ python -c 'import tarfile; print(tarfile.open("bad.tar").getnames())'
['foo']
$ python -c 'import tarfile, patch; patch.patch_tarfile(); print(tarfile.open("bad.tar").getnames())'
['foo', 'foo/a']

I'm only allowed to attach one file via the tracker web UI, so patch.py will 
follow.

Creation code for bad.tar, largely for my benefit:

import java.io.FileOutputStream;
import java.io.IOException;
import org.codehaus.plexus.archiver.tar.TarOutputStream;
import org.codehaus.plexus.archiver.tar.TarEntry;

class TarTest {
    public static void main(String[] args) throws IOException {
        FileOutputStream fos = new FileOutputStream("bad.tar");
        TarOutputStream tos = new TarOutputStream(fos);

        TarEntry entry = new TarEntry("foo/");
        entry.setMode(16877); // 0o40755
        entry.setUserId(0);
        entry.setGroupId(0);
        entry.setUserName("");
        entry.setGroupName("");
        tos.putNextEntry(entry);

        TarEntry entry2 = new TarEntry("foo/a");
        entry2.setMode(33204); // 0o100664
        entry2.setUserId(-1);  // XXX: dodgy
        entry2.setGroupId(-1); // XXX: dodgy
        entry2.setUserName("uname");
        entry2.setGroupName("gname");
        tos.putNextEntry(entry2);

        tos.close();
        fos.close();
    }
}

--
Added file: http://bugs.python.org/file27129/bad.tar

___
Python tracker 
<http://bugs.python.org/issue15858>
___



[issue15858] tarfile missing entries due to omitted uid/gid fields

2012-09-05 Thread Tom Lynn

Tom Lynn added the comment:

patch.py attached - what I'm using as a workaround at the moment.

--
Added file: http://bugs.python.org/file27130/patch.py

___
Python tracker 
<http://bugs.python.org/issue15858>
___



[issue15858] tarfile missing entries due to omitted uid/gid fields

2015-12-08 Thread Tom Lynn

Tom Lynn added the comment:

I think issue24514 (fixed in Py2.7.11) is a duplicate of this issue.

--

___
Python tracker 
<http://bugs.python.org/issue15858>
___



[issue19560] PEP 8 operator precedence across parens

2013-11-12 Thread Tom Lynn

New submission from Tom Lynn:

PEP 8 currently has::

  Yes::

  ...
  c = (a+b) * (a-b)

  No::

  ...
  c = (a + b) * (a - b)

That looks wrong to me -- surely the parens are a sufficient
precedence hint, and don't need further squashing inside?
This will be worse with any non-trivial example.  I suspect
it may also lead to silly complications in code formatting tools.

This was changed by Guido as part of a reversion in issue 16239,
but I wonder whether that example was intended to be included?

--
assignee: docs@python
components: Documentation
messages: 202687
nosy: docs@python, tlynn
priority: normal
severity: normal
status: open
title: PEP 8 operator precedence across parens
type: enhancement

___
Python tracker 
<http://bugs.python.org/issue19560>
___



[issue19560] PEP 8 operator precedence across parens

2013-11-12 Thread Tom Lynn

Tom Lynn added the comment:

FWIW, this pair of examples differs from the others in this section
as they were both explicitly okayed in the first version of PEP 8
<http://hg.python.org/peps/rev/4c31c25bdc03?revcount=120>::

- Use your better judgment for the insertion of spaces around
  arithmetic operators.  Always be consistent about whitespace on
  either side of a binary operator.  Some examples:

  i = i+1
  submitted = submitted + 1
  x = x*2 - 1
  hypot2 = x*x + y*y
  c = (a+b) * (a-b)
  c = (a + b) * (a - b)

My guess is that this is still the intention?

--

___
Python tracker 
<http://bugs.python.org/issue19560>
___



[issue15858] tarfile missing entries due to omitted uid/gid fields

2014-01-15 Thread Tom Lynn

Tom Lynn added the comment:

The secondary issue, which the patch doesn't address, is that TarFile.next() 
seems unpythonic; it treats any {Invalid,Empty,Truncated}HeaderError after 
offset 0 as EOF rather than propagating the exception.  It looks deliberate, 
but I'm not sure why it would be done like that or if it should be changed.

--

___
Python tracker 
<http://bugs.python.org/issue15858>
___