[issue13987] Handling of broken markup in HTMLParser on 2.7
Roundup Robot devn...@psf.upfronthosting.co.za added the comment: New changeset 11a31eb5da93 by Ezio Melotti in branch '2.7': #13987: HTMLParser is now able to handle EOFs in the middle of a construct. http://hg.python.org/cpython/rev/11a31eb5da93 -- nosy: +python-dev ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue13987 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13987] Handling of broken markup in HTMLParser on 2.7
Roundup Robot devn...@psf.upfronthosting.co.za added the comment: New changeset 3d7904e3f4b9 by Ezio Melotti in branch '2.7': #13987: HTMLParser is now able to handle malformed start tags. http://hg.python.org/cpython/rev/3d7904e3f4b9
[issue13987] Handling of broken markup in HTMLParser on 2.7
Ezio Melotti ezio.melo...@gmail.com added the comment:

This should be fixed now. The first two chunks of the attached patch have been committed in the two changesets linked in the previous messages. The third chunk, about the end tag, has been fixed as part of #13933. The error previously raised by unknown_decl was removed in 4743a3a1e669. More fixes have been backported as part of #13960. 2.7 should now behave like 3.2 non-strict.

--
resolution:  -> fixed
stage: patch review -> committed/rejected
status: open -> closed
[issue13987] Handling of broken markup in HTMLParser on 2.7
New submission from Ezio Melotti ezio.melo...@gmail.com:

The attached patch fixes a few problems with HTMLParser on 2.7. Instead of raising an error when invalid markup is detected, the parser now consumes the invalid input and proceeds. This patch is a partial backport of #1486713.

After this, two more patches will follow. The first will get rid of errors raised while parsing declarations and should also solve #13576:

     def unknown_decl(self, data):
    -    self.error("unknown declaration: %r" % (data,))
    +    pass

The second will take care of bogus comments (see #13960). Once this is done, HTMLParser should be able to parse (almost) everything. I'm planning to commit this before the release of 2.7.3.

--
assignee: ezio.melotti
components: Library (Lib)
files: issue13987.diff
keywords: patch
messages: 153043
nosy: benjamin.peterson, eric.araujo, ezio.melotti, r.david.murray
priority: normal
severity: normal
stage: patch review
status: open
title: Handling of broken markup in HTMLParser on 2.7
type: behavior
versions: Python 2.7

Added file: http://bugs.python.org/file24475/issue13987.diff
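To illustrate the tolerant behaviour described in this report, here is a minimal sketch that feeds three kinds of broken markup to the parser. The class name EventCollector, the helper parse, and the sample strings are hypothetical and are not taken from issue13987.diff. Assuming an interpreter that includes these changes (or a recent Python 3, where the parser is always non-strict), each call should return without raising. How the broken tail of each sample is reported may differ between versions, so the sketch relies only on the well-formed prefix.

```python
try:
    from html.parser import HTMLParser   # Python 3
except ImportError:
    from HTMLParser import HTMLParser    # Python 2.7


class EventCollector(HTMLParser):
    """Hypothetical helper: records callbacks so we can inspect what was parsed."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def parse(markup):
    """Feed markup to a fresh parser, flush it, and return the recorded events."""
    parser = EventCollector()
    parser.feed(markup)
    parser.close()  # forces the parser to deal with any unfinished construct
    return parser.events


# Illustrative inputs only; each starts with a well-formed <p> element.
broken_samples = [
    "<p>before</p><a href='unterminated",      # EOF in the middle of a construct
    "<p>before</p><a $>after",                 # malformed start tag
    "<p>before</p><!bogus declaration>after",  # unknown declaration
]

for sample in broken_samples:
    print(parse(sample))
```

On an interpreter without the fixes, one would instead expect some of these inputs to raise HTMLParseError, which is the behaviour this issue set out to remove.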
[issue13987] Handling of broken markup in HTMLParser on 2.7
Changes by Eli Bendersky eli...@gmail.com: -- nosy: +eli.bendersky