On Tue, Jul 10, 2012 at 10:09:57AM -0700, dhruvbird wrote:
> 
> On Tuesday, July 10, 2012 12:42:35 AM UTC-7, Henri Gourvest wrote:
> >
> >
> > If you want to decode a xml document by chunks, you can cut a character, 
> > if the partial chunk is decoded from UTF8 to Unicode to be parsed by 
> > node-xml there will be a character lost, ouch! 
> >
> 
> Sorry, but I didn't understand the paragraph above. Please could you 
> elaborate what you mean by it?
> 
> Are you trying to say that if the Buffer() contains half a utf8 character, 
> then the output stream will print just half the character? IIRC, utf8 is a 
> self synchronizing stream, so as long as you are checking each byte to be 
> something that has the high-bit reset, you should be generally okay. Does 
> anyone see any problem with this?

I believe you need to look at the last byte of the buffer to check that
its high bit is not set. If it is set, you need to move back until you
reach either a byte with the high bit clear, in which case that byte is
the last character in the string, or a byte with the top two bits set,
in which case (unless the character it starts is already complete) the
previous byte is the last byte in the string. Then you carry over any
bytes you've not used to the next buffer.
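
Roughly, the check could look like the sketch below. This is only an
illustration under my own assumptions: completeLength and decodeChunks
are hypothetical names, not part of node or node-xml, and the code
assumes the input is otherwise valid UTF-8.

```javascript
// Hypothetical helper: returns how many leading bytes of `buf` can be
// decoded without cutting a UTF-8 character in half.
function completeLength(buf) {
  // A UTF-8 character is at most 4 bytes, so only the tail matters.
  for (let i = buf.length - 1; i >= 0 && i >= buf.length - 4; i--) {
    const byte = buf[i];
    if ((byte & 0xC0) === 0x80) continue;       // 10xxxxxx: continuation byte, move back
    if ((byte & 0x80) === 0) return buf.length; // 0xxxxxxx: ASCII, nothing is cut
    // Top two bits set: this byte starts a multi-byte character.
    const needed = (byte & 0xE0) === 0xC0 ? 2 : (byte & 0xF0) === 0xE0 ? 3 : 4;
    // If the character is incomplete, the previous byte is the last usable one.
    return buf.length - i < needed ? i : buf.length;
  }
  return buf.length; // no lead byte found in the tail: leave it to the decoder
}

// Hypothetical driver: decodes a list of Buffers, carrying unused bytes
// over to the next chunk.
function decodeChunks(chunks) {
  let carry = Buffer.alloc(0);
  let text = '';
  for (const chunk of chunks) {
    const buf = Buffer.concat([carry, chunk]);
    const end = completeLength(buf);
    text += buf.toString('utf8', 0, end);
    carry = buf.subarray(end);
  }
  return text + carry.toString('utf8');
}

// "é" is the two bytes 0xC3 0xA9, split here across two chunks.
console.log(decodeChunks([Buffer.from([0x61, 0xC3]), Buffer.from([0xA9, 0x62])]));
```

In practice the StringDecoder class in node's string_decoder module
does this kind of carry-over for you (decoder.write(buffer) for each
chunk, decoder.end() at the end), so a hand-rolled version like this is
mainly useful for seeing how the boundary check works.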

--
Alan Gutierrez - http://twitter.com/bigeasy - http://github.com/bigeasy

-- 
Job Board: http://jobs.nodejs.org/
Posting guidelines: 
https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
You received this message because you are subscribed to the Google
Groups "nodejs" group.
To post to this group, send email to nodejs@googlegroups.com
To unsubscribe from this group, send email to
nodejs+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/nodejs?hl=en?hl=en
