I have used html5lib in my project. It works well, apart from one
possible minor bug.
I got the following traceback:
  File "myfolder\parser.py", line 30, in parser
    minidom_document = parser.parse(fp)
  File "build\bdist.win32\egg\html5lib\html5parser.py", line 144, in parse
  File "build\bdist.win32\egg\html5lib\html5parser.py", line 116, in _parse
  File "build\bdist.win32\egg\html5lib\tokenizer.py", line 98, in __iter__
  File "build\bdist.win32\egg\html5lib\tokenizer.py", line 333, in dataState
  File "build\bdist.win32\egg\html5lib\inputstream.py", line 282, in charsUntil
  File "build\bdist.win32\egg\html5lib\inputstream.py", line 259, in readChunk
IndexError: string index out of range
I think it is caused by the following code:
if (self._lastChunkEndsWithCR and data[0] == "\n"):
    data = data[1:]
self._lastChunkEndsWithCR = data[-1] == "\r"
If data contains only a single "\n" and self._lastChunkEndsWithCR
happens to be True, then data is "" after the first two lines, so
data[-1] raises the IndexError.
I added the following check after the second line, and the bug
vanished:
if not data:
    return
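
For anyone who wants to reproduce this outside the library, here is a minimal standalone sketch of the same logic. The function name strip_leading_lf and its guard parameter are hypothetical and not part of html5lib. It assumes data is non-empty on entry. It also resets the flag when the chunk turns out to be empty, which is a choice of this sketch; the check above simply returns.

```python
def strip_leading_lf(data, last_chunk_ends_with_cr, guard=True):
    """Hypothetical stand-in for the quoted lines; not html5lib's own code.

    Returns (data, ends_with_cr), where ends_with_cr is the flag to pass
    in for the next chunk. Assumes data is non-empty on entry.
    """
    if last_chunk_ends_with_cr and data[0] == "\n":
        # Drop the "\n" that completes a "\r\n" pair split across chunks.
        data = data[1:]
    if guard and not data:
        # The chunk was a lone "\n": nothing is left, so do not index it.
        return "", False
    return data, data[-1] == "\r"


# A "\r\n" pair split across two chunks triggers the failure.
_, ends_with_cr = strip_leading_lf("ab\r", False)
try:
    strip_leading_lf("\n", ends_with_cr, guard=False)
except IndexError as exc:
    print("without the check:", exc)
print("with the check:", strip_leading_lf("\n", ends_with_cr))
```

The failure only shows up when a chunk boundary falls exactly between "\r" and a final "\n", which may be why it is rare in practice.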