[issue13987] Handling of broken markup in HTMLParser on 2.7

2012-02-15 Thread Roundup Robot

Roundup Robot devn...@psf.upfronthosting.co.za added the comment:

New changeset 11a31eb5da93 by Ezio Melotti in branch '2.7':
#13987: HTMLParser is now able to handle EOFs in the middle of a construct.
http://hg.python.org/cpython/rev/11a31eb5da93

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue13987
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue13987] Handling of broken markup in HTMLParser on 2.7

2012-02-15 Thread Roundup Robot

Roundup Robot devn...@psf.upfronthosting.co.za added the comment:

New changeset 3d7904e3f4b9 by Ezio Melotti in branch '2.7':
#13987: HTMLParser is now able to handle malformed start tags.
http://hg.python.org/cpython/rev/3d7904e3f4b9

--




[issue13987] Handling of broken markup in HTMLParser on 2.7

2012-02-15 Thread Ezio Melotti

Ezio Melotti ezio.melo...@gmail.com added the comment:

This should be fixed now.
The first two chunks of the attached patch have been committed in the two 
changesets linked in the previous messages.  The third chunk about the end tag 
has been fixed as part of #13933.  The error previously raised by unknown_decl 
has been removed in 4743a3a1e669.  More fixes have been backported as part of 
#13960.
2.7 should now behave like 3.2 in non-strict mode.

--
resolution:  -> fixed
stage: patch review -> committed/rejected
status: open -> closed




[issue13987] Handling of broken markup in HTMLParser on 2.7

2012-02-10 Thread Ezio Melotti

New submission from Ezio Melotti ezio.melo...@gmail.com:

The attached patch fixes a few problems with HTMLParser on 2.7.
Instead of raising an error when invalid markup is detected, the parser now 
consumes the invalid input and proceeds.  This patch is a partial backport of 
#1486713.
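
For illustration only, here is a minimal sketch of what "consumes the invalid
input and proceeds" looks like from the caller's side. The EventCollector
class, the collect_events helper and the BROKEN sample string are hypothetical
names made up for this example, not part of the patch. Exactly which events are
reported for the broken parts may differ between Python versions; the point is
only that feed() and close() return instead of raising.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7


class EventCollector(HTMLParser):
    """Hypothetical helper: records the events the parser reports."""

    def __init__(self):
        # Explicit base-class call: on 2.7 HTMLParser is an old-style
        # class, so super() cannot be used there.
        HTMLParser.__init__(self)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("starttag", tag))

    def handle_endtag(self, tag):
        self.events.append(("endtag", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def collect_events(markup):
    """Feed markup to a fresh parser and return the recorded events."""
    parser = EventCollector()
    parser.feed(markup)
    parser.close()  # flush whatever is still buffered at EOF
    return parser.events


# Sample input (an assumption for this sketch): a malformed start tag
# followed by a construct that is cut off by EOF.
BROKEN = '<p>ok</p><a href="x"junk>link</a><div class="unfinished'

events = collect_events(BROKEN)
for event in events:
    print(event)
```

The well-formed prefix is reported as usual; how the malformed remainder is
split into events is left to the parser and is not something this sketch
relies on.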

After this, two more patches will follow.
The first will get rid of errors raised while parsing declarations and should 
also solve #13576:
     def unknown_decl(self, data):
-        self.error("unknown declaration: %r" % (data,))
+        pass

The second will take care of bogus comments (see #13960).
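
As a rough sketch of the unknown_decl change from the caller's point of view:
once the hook no longer raises, a subclass can override it to record unknown
declarations, or simply ignore them. TolerantParser and the sample markup are
hypothetical names chosen for this example. Whether the parser routes a given
"<![...]>" section to unknown_decl at all depends on the Python version, so
the sketch only relies on parsing continuing past it.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7


class TolerantParser(HTMLParser):
    """Hypothetical subclass that records unknown declarations."""

    def __init__(self):
        # Explicit base-class call so the sketch also works on 2.7,
        # where HTMLParser is an old-style class.
        HTMLParser.__init__(self)
        self.declarations = []
        self.start_tags = []

    def unknown_decl(self, data):
        # Record the declaration instead of raising an error.
        self.declarations.append(data)

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


parser = TolerantParser()
# Sample input (an assumption): MS-style conditional sections around a tag.
parser.feed('<![if !IE]><p>text</p><![endif]>')
parser.close()

print(parser.start_tags)
print(parser.declarations)
```

Calling parser.unknown_decl("anything") directly is also safe with this
override, which makes the hook easy to exercise in isolation.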

Once this is done HTMLParser should be able to parse (almost) everything.  I'm 
planning to commit this before the release of 2.7.3.

--
assignee: ezio.melotti
components: Library (Lib)
files: issue13987.diff
keywords: patch
messages: 153043
nosy: benjamin.peterson, eric.araujo, ezio.melotti, r.david.murray
priority: normal
severity: normal
stage: patch review
status: open
title: Handling of broken markup in HTMLParser on 2.7
type: behavior
versions: Python 2.7
Added file: http://bugs.python.org/file24475/issue13987.diff




[issue13987] Handling of broken markup in HTMLParser on 2.7

2012-02-10 Thread Eli Bendersky

Changes by Eli Bendersky eli...@gmail.com:


--
nosy: +eli.bendersky
