[issue670664] HTMLParser.py - more robust SCRIPT tag parsing

2011-11-01 Thread Roundup Robot
Roundup Robot devn...@psf.upfronthosting.co.za added the comment: New changeset 0a5eb57d5876 by Ezio Melotti in branch '2.7': #670664: Fix HTMLParser to correctly handle the content of ``<script>...</script>`` and ``<style>...</style>``. http://hg.python.org/cpython/rev/0a5eb57d5876 New changeset

2011-11-01 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: Fixed, thanks to everyone who contributed to this over the years! -- resolution: -> fixed; stage: commit review -> committed/rejected; status: open -> closed

2011-10-31 Thread Éric Araujo
Éric Araujo mer...@netwok.org added the comment:
-def set_cdata_mode(self):
+def set_cdata_mode(self, elem):
Looks like an incompatible behavior change. Is it only an internal method that will never affect users’ code (even subclasses)?
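
A minimal sketch of the compatibility concern, assuming Python 3's html.parser, where the parser calls set_cdata_mode() with the tag name as the committed change does. LegacyParser and feed_legacy are hypothetical names: a subclass that still overrides the old zero-argument signature keeps working until the parser meets a <script> tag, and then fails with a TypeError.

```python
from html.parser import HTMLParser


class LegacyParser(HTMLParser):
    """Hypothetical subclass written against the old, argument-less signature."""

    def set_cdata_mode(self):  # old signature: no element-name parameter
        super().set_cdata_mode("script")


def feed_legacy(markup):
    """Feed markup to LegacyParser; return the exception type raised, or None."""
    parser = LegacyParser()
    try:
        parser.feed(markup)
        parser.close()
    except TypeError as exc:
        return type(exc)
    return None


print(feed_legacy("<p>no script here</p>"))       # the override is never called
print(feed_legacy("<script>var x = 1;</script>"))  # parser passes the tag name in
```

Whether this matters in practice depends on whether anyone overrides the undocumented method, which is the question the thread goes on to discuss.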

2011-10-31 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: I think it's internal. While it's not explicitly mentioned in the source, the method is not documented and I don't think people subclassed it. All it does is change the regex used to parse the data, and if someone needs to change

2011-10-30 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: Attached a new patch with a few more tests and minor refactoring. -- keywords: +needs review; stage: patch review -> commit review; versions: +Python 2.7; Added file: http://bugs.python.org/file23553/issue670664.diff

2011-10-29 Thread Ezio Melotti
Changes by Ezio Melotti ezio.melo...@gmail.com: -- assignee: -> ezio.melotti

2011-10-03 Thread Fred L. Drake, Jr.
Changes by Fred L. Drake, Jr. f...@fdrake.net: -- nosy: -fdrake

2011-08-08 Thread Chris Palmer
Changes by Chris Palmer ch...@isecpartners.com: -- nosy: -cpalmer

2011-08-01 Thread Alexander
Alexander b3n...@yandex.ru added the comment:
> It sounds like the early consensus on python-dev is that html5 support is a good thing.
Yeah... But wait another 8 years until these guys decide that there are enough tests and other cool stuff.

2011-08-01 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment:
> Yeah... But wait another 8 years until these guys decide that there are enough tests and other cool stuff.
Which guys are you talking about? Granted, this issue has been around for a long time... but now that we have a patch that seems ok

2011-08-01 Thread Matt Basta
Matt Basta bastaw...@gmail.com added the comment: Seeing as everyone seems pretty satisfied with the 2.7 version, I'd be happy to put together a patch for Python 3 as well. To confirm, though, this fix is NOT going behind the strict parameter, correct?

2011-07-30 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: As I said somewhere else, the only use case I can think of where the 'strict' flag is useful is validation, but AFAIK even in strict mode it's possible to parse non-valid documents, so I agree it's pretty useless. Moving to HTML5 and

2011-07-29 Thread Antoine Pitrou
Antoine Pitrou pit...@free.fr added the comment: I also think this is a bug that should be fixed. Not being able to parse real-world HTML is a nuisance. I agree with Ezio's review comments about the custom regex. -- assignee: fdrake -> (unassigned); nosy: +pitrou; stage: -> patch review

2011-07-29 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: It sounds like the early consensus on python-dev is that html5 support is a good thing. I'm happy with that. I presume that means the 'strict' keyword in 3.x becomes strict-per-html5, and possibly useless :)

2011-07-28 Thread Éric Araujo
Éric Araujo mer...@netwok.org added the comment: HTML5 being a spec that builds on HTML 4.01 and real-world ways to deal with non-compliant input, I don’t object to fixes that follow the HTML5 spec. Regarding backward compatibility, we can break it if we decide that the behavior we’re

2011-07-28 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: Unless someone else has picked it up, BeautifulSoup is no longer an issue since its author has abandoned it. That doesn't change the fact that IMO it would be nice for our library to handle input generously.

2011-07-27 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: I left a review about your patch on rietveld, including a description of what I think is going on there (the patch lacks some context and it's not easy to figure out how everything works there). I also did some tests with and without the

2011-07-27 Thread Éric Araujo
Éric Araujo mer...@netwok.org added the comment: Ezio wrote:
> myhp.feed('<script><p>foo</p></script>')
> data: '<p>foo'  # where's the </p>?
http://www.w3.org/TR/html4/types#type-cdata says: Although the STYLE and SCRIPT elements use CDATA for their data model, for these elements, CDATA must be
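
Ezio's example can be replayed against a current html.parser with a small recording subclass. This is a sketch assuming Python 3 (EventRecorder and record are hypothetical names); adjacent data chunks are merged because the parser does not promise to deliver text in a single handle_data() call.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Record parser callbacks as (event, value) tuples."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Text may arrive in several chunks; merge consecutive ones.
        if self.events and self.events[-1][0] == "data":
            self.events[-1] = ("data", self.events[-1][1] + data)
        else:
            self.events.append(("data", data))


def record(markup):
    """Parse markup and return the list of recorded events."""
    parser = EventRecorder()
    parser.feed(markup)
    parser.close()
    return parser.events


print(record("<script><p>foo</p></script>"))
```

With the fix applied, <p> and </p> stay inside the script's data instead of being reported as a start and an end tag.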

2011-07-27 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: It's not buggy, but it is also not helpful. This kind of thing is what we introduced the 'strict' parameter for. And indeed I believe we've fixed some of these cases thereby. So any additional fixes should go into non-strict mode in

2011-07-27 Thread Matt Basta
Matt Basta bastaw...@gmail.com added the comment:
> So I think the example is invalid (should escape the ), and that HTMLParser is not buggy.
On the other hand, the HTML5 spec clearly dictates otherwise: http://www.w3.org/TR/html5/syntax.html#cdata-rcdata-restrictions The text in raw text and
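
The HTML5 restriction linked above can be expressed as a small check. This is a sketch based on the wording of that section of the spec (violates_raw_text_rule is a hypothetical helper, not part of html.parser): text inside a raw text element must not contain "</" followed by the element's tag name, matched ASCII case-insensitively, and then a tab, LF, FF, CR, space, ">" or "/".

```python
import re


def violates_raw_text_rule(text, tag):
    """Return True if `text` cannot appear verbatim inside <tag>...</tag>."""
    # "</" + tag name (ASCII case-insensitive) + one of: tab, LF, FF, CR,
    # space, ">" or "/".
    pattern = re.compile(
        r"</" + re.escape(tag) + r"[\t\n\f\r />]",
        re.IGNORECASE | re.ASCII,
    )
    return pattern.search(text) is not None


print(violates_raw_text_rule("<p>foo</p>", "script"))       # other end tags are fine
print(violates_raw_text_rule("x = '</script>';", "script"))  # closes the element early
```

Under this rule Ezio's <p>foo</p> is legal script content, which is why an HTML5-style parser should pass it through as data.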

2011-07-27 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: Yes, but we don't claim to support HTML5 yet. The best way to support HTML5 is probably a topic for python-dev.

2011-07-27 Thread Matt Basta
Matt Basta bastaw...@gmail.com added the comment:
> Yes, but we don't claim to support HTML5 yet.
There's also no claim in the docs or the source that HTMLParser specifically adheres to HTML4, either. Ideally, the parser should strive for parity with the functionality of major web browsers,

2011-07-27 Thread R. David Murray
R. David Murray rdmur...@bitdance.com added the comment: I thought HTML4 conformance was documented somewhere, but I could be wrong. HTML5, from what I understand (I haven't read the spec), is explicitly or implicitly following what browsers really do exactly because nobody conformed to

2011-07-27 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: IIRC we have been following what browsers do in other cases already. There were also some discussions about supporting HTML5 (see e.g. #7311 and #3) and the strict vs non-strict mode introduced in Python 3. Note that changing the way

2011-07-26 Thread Matt Basta
Matt Basta bastaw...@gmail.com added the comment: The number of problems produced by this bug can be greatly reduced by adding a relatively small check to the parser. Currently, script and style tags call set_cdata_mode(), which sets self.interesting to HTMLParser.interesting_cdata. This is
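
The difference between the two approaches can be seen by comparing the patterns directly. This sketch assumes the Python 2.x interesting_cdata pattern was r'<(/|\Z)' and models the element-specific pattern on the one set_cdata_mode(elem) builds after the fix; element_end_pattern is a hypothetical helper, and the exact pattern in current CPython differs between versions.

```python
import re

# Assumed pre-fix pattern (HTMLParser.interesting_cdata in Python 2.x):
# stop at the first "</", whatever element it belongs to.
OLD_INTERESTING_CDATA = re.compile(r"<(/|\Z)")


def element_end_pattern(elem):
    """Hypothetical helper: stop only at the matching end tag, e.g. </script>."""
    return re.compile(r"</\s*%s\s*>" % re.escape(elem), re.IGNORECASE)


script_body = 'document.write("<p>hi</p>");</script>'

old_stop = OLD_INTERESTING_CDATA.search(script_body).start()
new_stop = element_end_pattern("script").search(script_body).start()

print(repr(script_body[:old_stop]))  # data is cut short at the first "</"
print(repr(script_body[:new_stop]))  # data runs up to the real </script>
```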

2011-03-12 Thread Ezio Melotti
Ezio Melotti ezio.melo...@gmail.com added the comment: Thanks for the patch, however it would be better if you could get a clone of the CPython repo and make a patch against it. The patch should also include tests. You can check http://docs.python.org/devguide/ for more information.

2011-03-08 Thread Alexander
Alexander b3n...@yandex.ru added the comment: This is a small patch for the related bug issue9577, which actually is not related to this bug. -- nosy: +friday; Added file: http://bugs.python.org/file21045/cdata_patch.diff

2011-03-08 Thread Alexander
Alexander b3n...@yandex.ru added the comment: And this patch fixes both bugs in a more elegant way. -- Added file: http://bugs.python.org/file21046/cdata_patch.diff

2011-02-14 Thread R. David Murray
Changes by R. David Murray rdmur...@bitdance.com: -- nosy: +r.david.murray

2011-01-02 Thread Yotam Medini
Changes by Yotam Medini yo...@users.sourceforge.net: Added file: http://bugs.python.org/file20231/endtag-space.html

2011-01-02 Thread Yotam Medini
Changes by Yotam Medini yo...@users.sourceforge.net: Added file: http://bugs.python.org/file20232/dollar-extra.html

2011-01-02 Thread Yotam Medini
Yotam Medini yo...@users.sourceforge.net added the comment: Suggested fix for the attached cases: lt-in-script-example.tgz, endtag-space.html, dollar-extra.html -- Added file: http://bugs.python.org/file20233/ltscr-endtag-dollarext.diff

Re: [issue670664] HTMLParser.py - more robust SCRIPT tag parsing

2011-01-02 Thread Senthil Kumaran
If you provide some tests augmenting the currently existing tests in test_htmlparser.py and also ensure that no existing test breaks, it would help to review the patch better. I do see some changes made to the regex and parsing, so tests would definitely help.
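
A sketch of the kind of regression test requested here, written with unittest. The class and test names are hypothetical and not taken from CPython's test_htmlparser.py; the tests assume Python 3's html.parser and check that markup-like text inside <style> and <script> comes back unchanged as data.

```python
import unittest
from html.parser import HTMLParser


class DataCollector(HTMLParser):
    """Collect everything passed to handle_data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    @property
    def text(self):
        return "".join(self.chunks)


class CdataContentTest(unittest.TestCase):
    """Hypothetical regression tests for <script>/<style> content."""

    def collect(self, markup):
        parser = DataCollector()
        parser.feed(markup)
        parser.close()
        return parser.text

    def test_end_tag_inside_style(self):
        css = "a > b { color: red } /* </p> */"
        self.assertEqual(self.collect("<style>" + css + "</style>"), css)

    def test_less_than_inside_script(self):
        js = "if (a < b && c<d) { f(); }"
        self.assertEqual(self.collect("<script>" + js + "</script>"), js)
```

Saved as a module, it can be run with python -m unittest followed by the module name.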

2010-11-02 Thread Éric Araujo
Éric Araujo mer...@netwok.org added the comment: Would it be reasonable to add knowledge to html.parser to make it recognize script elements as CDATA and handle it correctly (that is, let “<” pass)? -- nosy: +eric.araujo

2010-09-30 Thread Yotam Medini
Yotam Medini yo...@users.sourceforge.net added the comment: HTMLParser.py fails inside <script> ... </script>: it can be fooled by JavaScript conditional expressions that use less-than '<'. In the attached example:
$ tar tvzf lt-in-script-example.tgz | cut -c24-
796 2010-09-30 16:52 h2t.py
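
With the fix in place, '<' comparisons like the ones described here no longer confuse the parser. A minimal sketch assuming Python 3's html.parser (ScriptExtractor is a hypothetical name) that collects the raw text of each <script> element:

```python
from html.parser import HTMLParser


class ScriptExtractor(HTMLParser):
    """Collect the raw source of every <script> element in a document."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Only text between <script> and </script> is kept.
        if self._in_script:
            self.scripts[-1] += data


page = """<html><body>
<script>for (var i = 0; i<n; i++) { if (x < y) sum += i; }</script>
<p>text</p>
</body></html>"""

extractor = ScriptExtractor()
extractor.feed(page)
extractor.close()
print(extractor.scripts)
```

Neither "i<n" nor "x < y" is reported as a start tag; both stay in the script text.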

2010-09-30 Thread Yotam Medini
Yotam Medini yo...@users.sourceforge.net added the comment: The attached suggested patch fixes the problems shown in msg117762. -- Added file: http://bugs.python.org/file19073/HTMLParser.diff

2010-08-17 Thread Mark Lawrence
Changes by Mark Lawrence breamore...@yahoo.co.uk: -- versions: +Python 3.1

2010-08-13 Thread R. David Murray
Changes by R. David Murray rdmur...@bitdance.com: -- nosy: +Hunanyan

2009-06-06 Thread Ezio Melotti
Changes by Ezio Melotti ezio.melo...@gmail.com: -- nosy: +ezio.melotti; versions: +Python 2.7, Python 3.2, -Python 2.5

2009-06-04 Thread Paweł Widera
Changes by Paweł Widera mo...@man.poznan.pl: -- nosy: +momat

2009-06-04 Thread Paweł Widera
Paweł Widera mo...@man.poznan.pl added the comment: A simple workaround for BeautifulSoup is the following wrapper. It sanitizes the JavaScript code before passing it to the parser by joining the disjoint strings, so that </scr+ipt> becomes </script>.
def bs(input):
    pattern =
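
Paweł's wrapper is cut off in this archive, so its actual pattern is unknown. The following is a separate, naive sketch of the same idea (join_split_literals is a hypothetical name): it deletes a closing quote, a '+', and an opening quote of the same kind, and it does not handle escaped quotes, comments, or concatenations involving variables.

```python
import re

# Closing quote, optional whitespace, "+", optional whitespace, and an
# opening quote of the same kind (backreference \1).
_SPLIT_STRING = re.compile(r"""(['"])\s*\+\s*\1""")


def join_split_literals(js):
    """Return `js` with 'a' + 'b' style concatenations merged into 'ab'."""
    return _SPLIT_STRING.sub("", js)


print(join_split_literals("document.write('</scr' + 'ipt>');"))
```

The backreference keeps mixed-quote pairs such as 'a' + "b" untouched.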

2009-03-06 Thread Gabriel Sean Farrell
Gabriel Sean Farrell g...@breaksalot.org added the comment: Now that BeautifulSoup uses HTMLParser, more people are seeing these errors. See http://groups.google.com/group/beautifulsoup/msg/d5a7540620538d14 and http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=516824 -- nosy: +gsf

2008-12-01 Thread Chris Palmer
Chris Palmer [EMAIL PROTECTED] added the comment: Here is an additional test case. I have a super simple HTML minifier that burps when given this test file:
$ cat test.html
'foo sc'+'ript'
The explosion is:
$ ./minify.py test.html
Warning: malformed start tag 'foo

2008-12-01 Thread Chris Palmer
Changes by Chris Palmer [EMAIL PROTECTED]: -- type: -> behavior