Alexander Heidrich wrote:
> Hi all,
> 
> thanks to the people who replied to my question! I finally solved the
> issue by writing my own handlers and using xmlEventParse - which leads
> to the following problem, which is so odd that it's probably a bug.
> 
> I use several special characters in my XML file, e.g. umlauts or ° or
> µ - but no matter how I encode my XML (UTF or ISO) or whether I escape
> these characters, xmlEventParse always stops parsing after the first
> umlaut and pretends to have more than one node even if there is really
> just one!
> 
> Example:
> 
> <locations>abc        aböcd   abdec</locations>
> 
> causes two events for locations and produces output in the form of:
> 
>       [,1]    [,2]    [,3]
> [1,]  abc
> [2,]  aböcd   abdec
> 

Well, your output is particular to your text event handlers, so
what you show us does not tell us what the inputs were.
If you got two events, "abc     " and "aböcd      abdec"
(or the trailing spaces from the first appeared at the start of
the second rather than the end of the first), that would not
surprise me.

The underlying XML parser is extracting content from a stream
of bytes. It makes no guarantee that contiguous text
content is delivered in a single event to the handlers.
Instead, it consumes as much of the stream as it wants
and delivers that and then continues from where it left off
in the stream. If it encounters a text node with a large amount
of text, it will deliver that in smaller chunks. 
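
To see this chunking in a SAX parser you can run directly, here is a
small sketch using Python's standard xml.sax module. This is an
assumption for illustration only: Python's expat-based parser stands in
for the libxml2 parser behind xmlEventParse, and the exact split points
will differ between parsers. The handler records every characters()
callback separately instead of joining them; here the entity reference
is what splits one text node into several callbacks.

```python
import xml.sax

class ChunkLogger(xml.sax.ContentHandler):
    """Record each characters() callback as a separate chunk (no joining)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # One call per chunk the parser chooses to deliver; a single
        # contiguous text node may arrive as several calls.
        self.chunks.append(content)

handler = ChunkLogger()
# "&amp;" is an entity reference; expat reports the text before it, the
# expanded "&", and the text after it as separate callbacks.
xml.sax.parseString(b"<locations>abc &amp; abdec</locations>", handler)
print(handler.chunks)
```

Joining all the chunks gives back the full text; only the boundaries
between callbacks vary.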

This undoubtedly makes processing the stream slightly harder for
the handler, as it has to remember where it "was" (e.g. accumulate
the text pieces until the element ends), but this is true of all
such handlers, so it is not a significant burden.
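
A minimal sketch of that "remember where you were" pattern: buffer text
pieces and act on them only when the element closes. It is again written
with Python's xml.sax rather than R's list of handler functions, and the
class and attribute names (TextCollector, texts) are hypothetical. It
assumes the elements of interest contain only text, not child elements.

```python
import xml.sax

class TextCollector(xml.sax.ContentHandler):
    """Join all text chunks of an element before using them.

    Assumption: target elements hold text only (no nested children).
    """

    def __init__(self):
        super().__init__()
        self._buffer = []   # chunks seen since the last start tag
        self.texts = {}     # element name -> list of complete texts

    def startElement(self, name, attrs):
        self._buffer = []   # new element: start a fresh buffer

    def characters(self, content):
        self._buffer.append(content)   # may be called many times per node

    def endElement(self, name):
        # Only now is the element's text known to be complete.
        self.texts.setdefault(name, []).append("".join(self._buffer))
        self._buffer = []

handler = TextCollector()
doc = "<locations>abc aböcd &amp; abdec</locations>".encode("utf-8")
xml.sax.parseString(doc, handler)
print(handler.texts["locations"])
```

For mixed content (text interleaved with child elements) the single
buffer would need to become a stack with one entry per open element.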

The branches parameter of the xmlEventParse() function does provide
a way to mix SAX/event parsing with the easier DOM/node style parsing.
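
As a loosely comparable idea (this is Python's
xml.etree.ElementTree.iterparse, not the R branches API), events can be
streamed while each "end" event hands over a fully built element whose
text is already joined. The element names below are only for
illustration.

```python
import io
import xml.etree.ElementTree as ET

doc = "<root><locations>abc aböcd &amp; abdec</locations><other>x</other></root>"
results = []

# Stream "end" events; each one delivers a complete element subtree.
for event, elem in ET.iterparse(io.BytesIO(doc.encode("utf-8")), events=("end",)):
    if elem.tag == "locations":
        results.append(elem.text)   # text already joined into one string
        elem.clear()                # free the subtree once it is handled

print(results)
```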

 D.

> 
> Should it be like that? If I remove the umlauts, then everything is
> fine!
> 
> If I do the following:
> 
> <locations>öabc       aböcd   abdec</locations>
> 
> the output is
> 
>       [,1]    [,2]    [,3]
> [1,]  öabc    aböcd   abdec
> 
> Any suggestions?
> 
> Thanks in advance and many greetings!
> 
> Alex
> 
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Duncan Temple Lang                [EMAIL PROTECTED]
Department of Statistics          work:  (530) 752-4782
4210 Mathematical Sciences Bldg.  fax:   (530) 752-7099
One Shields Ave.
University of California at Davis
Davis, CA 95616, USA
