On Aug 5, 2011, at 4:41 PM, Hilaire Fernandes wrote:

> I had a look at the latest XMLParser, but the API is different and a
> lot of my code is now broken. Is XMLWriter class>>on: obsolete? It
> complains about that, yet the class and method are still there. Is it a
> Monticello trick I have forgotten about?
> I don't even know how to port to the new API. Is there a porting guide?
> I guess this is for the better, but it is still frustrating and a
> distraction from the main task...

Indeed. We should really invest in some main packages. For example, I worked on SOUP to add comments and new tests. Right now we (the core team) do not have the energy to work on both the core and external packages. I hope that will change once the core gets fixed.

> On 05/08/2011 16:23, Henrik Johansen wrote:
>>
>> On Aug 5, 2011, at 3:41:54 PM, Hilaire Fernandes wrote:
>>
>>> On 05/08/2011 13:28, Henrik Johansen wrote:
>>>>
>>>> On Aug 5, 2011, at 1:14:35 PM, Hilaire Fernandes wrote:
>>>>
>>>>> It seems that accented characters are not in UTF-8 by default
>>>>> when they are input.
>>>>> Is that the case with Pharo 1.3?
>>>>>
>>>>> Hilaire
>>>>>
>>>>> --
>>>>> Education 0.2 -- http://blog.ofset.org/hilaire
>>>>
>>>> I'm not sure what you mean.
>>>> Inside the image, all the way from InputEvents to the String
>>>> representation, you only deal with Unicode code points.
>>>
>>> It seems to be 8-bit characters: when exported through XMLParser, it is
>>> an 8-bit string. I need to investigate further.
>>>
>>> Hilaire
>>
>> It is an 8-bit character, since the code point fits in one byte (see a).
>> Accented characters like é could be either:
>> a) One Unicode code point: U+00E9 (decimal 233), small e with acute.
>> b) Two Unicode code points: U+0065 (decimal 101), small e, followed by
>>    U+0301 (decimal 769), combining acute accent.
>>
>> Internally, you'd see strings with character values corresponding to those
>> listed as decimal, i.e. the Unicode code points.
>> b) would be a WideString, as 769 does not fit in a byte.
>>
>> However, if correctly converted to UTF-8, their representations should be:
>> a) 2 bytes: 16r C3 A9
>> b) 3 bytes: 16r 65 CC 81
>>
>> I.e. it seems XMLParser does not encode it properly to UTF-8 when exporting.
>> Note: this is perfectly legal if the document contains an encoding attribute
>> specifying a one-byte encoding like iso-8859-1 or windows-1252
>> (it starts with <?xml version="1.0" encoding="windows-1252" ?> or some such).
>> Absent such an attribute, or a BOM indicating another Unicode encoding,
>> though, it is a bug.
>>
>> Cheers,
>> Henry
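
The byte values for the two forms of é discussed in the quoted message can be checked with a few lines of code. The sketch below is in Python rather than Pharo, purely to illustrate the arithmetic. It says nothing about how XMLParser or XMLWriter encode their output, and `describe` is a hypothetical helper name. It assumes the decomposed form is written base letter first (U+0065), followed by the combining accent (U+0301).

```python
import unicodedata

precomposed = "\u00e9"   # a) one code point: U+00E9 (decimal 233)
decomposed = "e\u0301"   # b) two code points: U+0065 followed by U+0301

def describe(text):
    """Return (code points, UTF-8 bytes as hex, whether every code point fits in one byte)."""
    code_points = [ord(ch) for ch in text]
    utf8_hex = " ".join(f"{b:02X}" for b in text.encode("utf-8"))
    fits_in_one_byte = all(cp < 256 for cp in code_points)
    return code_points, utf8_hex, fits_in_one_byte

print(describe(precomposed))  # ([233], 'C3 A9', True)
print(describe(decomposed))   # ([101, 769], '65 CC 81', False)

# With a one-byte encoding declared (e.g. iso-8859-1), form a) is legally a single byte.
print(precomposed.encode("iso-8859-1"))  # b'\xe9'

# NFC normalization composes form b) into form a).
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

A single byte E9 in an exported file is therefore consistent with a Latin-1 style encoding, but not with UTF-8, where the same character would appear as the two bytes C3 A9.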
