Re: [Pharo-project] Fwd: Re: Decoding bug with XMLParser ?

Stéphane Ducasse Wed, 19 May 2010 00:42:56 -0700

Excellent! Let us know. It is good to get more support on the XML/DTD part.


Stef

On May 19, 2010, at 2:17 AM, jaayer wrote:

> 
> 
> ============ Forwarded message ============
> From : jaayer<jaa...@zoho.com>
> To :  <alexandre.ber...@inria.fr>
> Date : Tue, 18 May 2010 16:30:06 -0700
> Subject : Re: Decoding bug with XMLParser ?
> ============ Forwarded message ============
> 
> ---- On Tue, 18 May 2010 02:29:18 -0700 Alexandre Bergel 
> <alexandre.ber...@inria.fr> wrote ---- 
> 
>> To give a bit of context, the problem is: 
>> 
>> -=-=-=-=-=-=-=-=-=-=-=-= 
>> exampleEncodedXML 
>>     ^'<?xml version="1.0" encoding="UTF-8"?> 
>> <test-data>&#8230;</test-data> 
>> ' 
>> 
>> testDecodingCharacters 
>>     | xmlDocument element | 
>>     "XMLTokenizer testDecodingCharacters" 
>> 
>>     xmlDocument := XMLDOMParser parseDocumentFrom: self exampleEncodedXML readStream. 
>>     element := xmlDocument firstTagNamed: #'test-data'. 
>>      
>>     self assert: element contentString first codePoint = 8230 
>> -=-=-=-=-=-=-=-=-=-=-=-= 
>> 
>> #testDecodingCharacters goes yellow 
>> 
>>> Thinking of it, it's not really an encoding problem, rather a bug in 
>>> the entity->character conversion. I guess there should be a similar 
>>> test where there is an actual ellipsis character in the xml, instead 
>>> of the entity. 
>> 
>> Any idea how your test can go green? 
>> 
>>> And now I realize our server will not be able to connect outside its 
>>> DMZ, so I won't be able to use the fix :D 
>> 
>> DMZ ? 
>> 
>> Cheers, 
>> Alexandre 
>> 
> 
> Character references like the one above are handled by #nextCharReference, 
> which reads the number after the "&#" or "&#x" prefix and sends #value: to 
> the class Unicode with that number as the argument. If you evaluate 
> "(Unicode value: 8230) codePoint" in a workspace with cmd-p, you will see 
> that the resulting code point is not what you would expect. For me it was 
> "1069555750". The same happens when creating a Unicode character with 
> #charFromUnicode:. Unless Unicode>>value: and Unicode>>charFromUnicode: are 
> being used incorrectly, I am not sure that this is a bug, or at least a bug 
> in XML-Support.
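> 
> One possible reading of that number, offered as an assumption rather than a 
> confirmed diagnosis: if the image packs a language tag (the "leading char") 
> into the bits above the low 22 bits of a character's value, then 1069555750 
> is 8230 with a tag of 255 on top. The integer arithmetic below checks only 
> that decomposition; it does not show what Unicode>>value: actually does. 
> 
> ```smalltalk
> "Low 22 bits: the plain Unicode code point."
> 1069555750 bitAnd: 16r3FFFFF.    "8230"
> "Bits above that: the assumed language tag."
> 1069555750 bitShift: -22.    "255"
> "Recombining the two parts gives the observed value back."
> (255 bitShift: 22) + 8230.    "1069555750"
> ```
> 
> If that reading holds, the assertion in the test would have to compare 
> against the low 22 bits rather than the full #codePoint value. 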
> 
> (I am working on adding full DTD support with validation and refactoring and 
> re-engineering the parser at the moment, which is why minor releases have 
> slowed to a trickle. I will take a closer look at how character encoding is 
> handled in the process.)
> 
> 
> _______________________________________________
> Pharo-project mailing list
> Pharo-project@lists.gforge.inria.fr
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project


