Re: [Pharo-users] Problem with input to XML Parser - 'Invalid UTF8 encoding'

Peter Kenny Mon, 09 Oct 2017 06:04:55 -0700

Correction - I misrepresented Sven. What he said was that Zinc would not
look inside the HTML <head> node to find out about the encoding. It would, of
course, use the encoding information in the HTTP headers, if any.



Peter Kenny wrote
> Henry
> 
> Thanks for the explanations. It's a bit clearer now. I'm still not sure
> about how ZnUrl>>retrieveContents manages to decode correctly in this case;
> I'm sure I recall Sven saying it didn't (and in his view shouldn't) look at
> the HTTP declarations in the header. There is also the mystery of how the
> string reader in the XML-Parser package (XMLURI>>get) does the same trick,
> when it is presumably what XMLHTMLParser>>parseURL: uses and fails.
> 
> However, all these are second order problems. It all begins because the
> Corriere web site does strange things with encoding, including using a UTF8
> character in a page coded with 8859-1, as Paul pointed out. In any case,
> reading the page as a string and then parsing it solves my problem, so I
> shall stick to that as a standard procedure. Most importantly, I don't think
> there is any indication of a problem in the XML package for Monty to worry
> about.
> 
> Thanks again
> 
> Peter
> 
> 
> 
> --
> Sent from: http://forum.world.st/Pharo-Smalltalk-Users-f1310670.html
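
For anyone following along, the workaround described in the quoted message
(read the page as a string first, then parse that string) might look roughly
like this in a Pharo playground. It is only a sketch under assumptions: the
URL is a placeholder rather than the actual Corriere page, and the variable
names are made up for illustration. It assumes Zinc and the XML-Parser
package (with XMLHTMLParser) are loaded in the image.

```smalltalk
"Sketch only: the URL below is a hypothetical placeholder."
| url html document |
url := 'http://www.example.com/page.html' asZnUrl.

"Step 1: let Zinc fetch the page and hand back a decoded String."
html := url retrieveContents.

"Step 2: parse the already-decoded String instead of using parseURL:."
document := XMLHTMLParser parse: html.
```

The point of splitting the two steps is that the decoding is done once, by
Zinc, before the parser ever sees the content, so the parser does not have
to guess the encoding of the raw bytes itself.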

