Ezio Melotti ezio.melo...@gmail.com added the comment:
What I described in my previous message is what Firefox does. If you think
this should be changed, I suggest you open another issue, possibly attaching
a test case with the desired behavior and a patch to change it.
--
Paweł Widera mo...@man.poznan.pl added the comment:
No. As the value of the href attribute is not supposed to contain spaces, I'd
rather expect the parser to assume that there is an ending missing before the
space.
--
Ezio Melotti ezio.melo...@gmail.com added the comment:
The first case has been fixed already in 1cbfeffea19f, the second case is not
even handled by browsers, so I'm closing this.
--
resolution:  -> fixed
stage:  -> committed/rejected
status: open -> closed
Paweł Widera mo...@man.poznan.pl added the comment:
Great! With one "but"... the second case *is* handled by browsers. Browsers do
not throw an exception on it as HTMLParser does. So improvement is definitely
possible here. Whether it is worth the effort is not for me to judge.
--
Ezio Melotti ezio.melo...@gmail.com added the comment:
So you are suggesting that
<a href="http://xxx.org/xxx.php?a=1 target="_blank">click me</a>
should result in an 'a' element with an href attribute equal to
http://xxx.org/xxx.php?a=1 target= and then discard _blank as extra data?
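
One way to check what the parser actually reports for this input is a small recorder subclass. This is only a sketch: the class name StartTagRecorder is made up for illustration, and the markup assumes the href value opens with a double quote that is never closed (the archive stripped quotes and angle brackets, so the exact original test case is a guess). The attribute list printed can differ between Python versions, so no particular output is claimed.

```python
from html.parser import HTMLParser


class StartTagRecorder(HTMLParser):
    """Hypothetical helper: record each start tag as the parser reports it."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare names.
        self.start_tags.append((tag, attrs))


# Assumed markup: the quote that opens the href value is never closed, so the
# next quote in the input (the one opening the target value) may end it instead.
BROKEN_LINK = '<a href="http://xxx.org/xxx.php?a=1 target="_blank">click me</a>'

recorder = StartTagRecorder()
recorder.feed(BROKEN_LINK)
recorder.close()

# Show how this Python version split the tag into attributes.
for tag, attrs in recorder.start_tags:
    print(tag, attrs)
```

Running this on a given interpreter shows whether the href value absorbs the text up to the next quote and what happens to the leftover _blank.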
--
Changes by Ezio Melotti ezio.melo...@gmail.com:
--
versions: +Python 3.2, Python 3.3 -Python 2.6
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue6191
___
Ezio Melotti ezio.melo...@gmail.com added the comment:
BeautifulSoup uses SGMLParser for all the versions < 3.1. BeautifulSoup
3.1 is supposed to be compatible with Python 3 and since SGMLParser is
gone it's now using HTMLParser, but it's not able to handle some things
anymore.
For more
New submission from Paweł Widera mo...@man.poznan.pl:
Of course neither is correct HTML, but both are easy to guess, so I believe
the parser should not give up too quickly here.
1) extra comma between attributes
<form action="/xxx.php?a=1&amp;b=2&amp", method="post">
2) missing closing quotation mark for
Georg Brandl ge...@python.org added the comment:
I do not think HTMLParser should guess. Guessing always opens the door
to misinterpretation.
--
nosy: +georg.brandl
resolution:  -> wont fix
status: open -> closed
Paweł Widera mo...@man.poznan.pl added the comment:
It depends on whether you want HTMLParser to be a useful tool that can
deal with real-world HTML or just a toy without practical meaning.
Crashing on every little deviation from the standard, where a more relaxed
approach is possible, doesn't
Georg Brandl ge...@python.org added the comment:
> Throwing an exception and giving up is just not good enough.
Yes it is, in some cases. There are forgiving HTML parsers out there,
HTMLParser does not strive to be one.
There are *so many* cases where HTML is a bit malformed that it takes
more
R. David Murray rdmur...@bitdance.com added the comment:
In doing web scraping I started using BeautifulSoup precisely because it
was very lenient in what HTML it accepted (I haven't written such an app
for a while, so I'm not sure what BeautifulSoup currently does...I
thought I heard it was now
Georg Brandl ge...@python.org added the comment:
So BeautifulSoup is using HTMLParser? That is interesting, because they
claim to support broken HTML.
In any case, if a quirky mode is added, it should have to be turned on
explicitly by a flag.
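
As a rough illustration of what "turned on explicitly by a flag" could look like in user code, here is a sketch of a subclass with an opt-in cleanup step. Everything in it is hypothetical: the class name QuirkTolerantParser, the lenient parameter, and the cleanup rule are invented for this example and are not part of HTMLParser. The sample markup is the extra-comma case, with the quotes assumed. Note that the cleanup is itself a guess: it would also strip a trailing comma or quote that the author really meant to be part of a value.

```python
from html.parser import HTMLParser

# Characters that tend to end up glued to names or values when attributes are
# separated by a stray comma or a quote is misplaced (assumption of this sketch).
_JUNK = ",\"'"


class QuirkTolerantParser(HTMLParser):
    """Hypothetical wrapper: tidy attributes only when lenient=True is passed."""

    def __init__(self, lenient=False):
        super().__init__()
        self.lenient = lenient  # off by default, so guessing is opt-in
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if self.lenient:
            attrs = self._tidy(attrs)
        self.tags.append((tag, attrs))

    @staticmethod
    def _tidy(attrs):
        tidy = []
        for name, value in attrs:
            name = name.strip(_JUNK)
            if not name:
                # The "attribute" was nothing but a stray comma or quote.
                continue
            if value is not None:
                value = value.strip(_JUNK)
            tidy.append((name, value))
        return tidy


# Assumed markup for the extra-comma case.
MARKUP = '<form action="/xxx.php?a=1&amp;b=2&amp", method="post">'

default_parser = QuirkTolerantParser()
default_parser.feed(MARKUP)
default_parser.close()

lenient_parser = QuirkTolerantParser(lenient=True)
lenient_parser.feed(MARKUP)
lenient_parser.close()

print("default:", default_parser.tags)
print("lenient:", lenient_parser.tags)
```

With the flag left off, the subclass passes through whatever the parser reports, so existing callers see no change in behaviour.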
--
resolution: wont fix ->