Alexander Heidrich wrote:
> Hi all,
>
> thanks to the people who replied to my question! I finally solved the
> issue by writing my own handlers and using xmlEventParse - which leads
> to the following problem, which is so odd that it's probably a bug.
>
> I use several special characters in my XML file, e.g. umlauts or ° or
> µ - but no matter how I encode my XML (UTF or ISO) or how I escape these
> characters, xmlEventParse always stops parsing after the first umlaut
> and pretends to have more than one node even if there is really just
> one!
>
> Example:
>
> <locations>abc aböcd abdec</locations>
>
> causes two events for locations and produces output in the form of:
>
>      [,1]        [,2] [,3]
> [1,] abc
> [2,] aböcd abdec
>
Well, your output is specific to your text event handlers, so what you show us does not tell us what the inputs were. If you got two events with "abc " and "aböcd abdec" (or the trailing space from the first chunk appeared on the second instead of the first), that would not surprise me.

The underlying XML parser extracts content from a stream of bytes. It makes no guarantee that contiguous text content is delivered to the handlers in a single event. Instead, it consumes as much of the stream as it wants, delivers that, and then continues from where it left off in the stream. If it encounters a text node with a large amount of text, it will deliver that in smaller chunks. This undoubtedly makes processing the stream slightly harder for the handler, as it has to remember where it "was", but this is true of all handlers, so it is not a significant burden.

The branches parameter of the xmlEventParse() function does provide a way to mix SAX/event parsing with the easier DOM/node style parsing.

 D.

>
> Should it be like that? If I remove the umlauts, then everything is
> fine!
>
> If I do the following:
>
> <locations>öabc aböcd abdec</locations>
>
> the output is
>
>      [,1]             [,2] [,3]
> [1,] öabc aböcd abdec
>
> Any suggestions?
>
> Thanks in advance and many greetings!
>
> Alex
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Duncan Temple Lang                    [EMAIL PROTECTED]
Department of Statistics              work: (530) 752-4782
4210 Mathematical Sciences Bldg.      fax:  (530) 752-7099
One Shields Ave.
University of California at Davis
Davis, CA 95616, USA
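
To illustrate the point in the reply about text arriving in more than one event, here is a minimal sketch of handlers that buffer the chunks and only join them when the element closes. The helper name makeCollector, the wrapping <root> element, and the variable names are made up for this example and are not from the original poster's code. The second call shows the branches route mentioned in the reply, where the parser hands over the whole <locations> node at once. This is only one way to do it, written against the documented xmlEventParse() arguments (handlers, branches, asText, trim).

```r
library(XML)

# Hypothetical helper: collects the complete text of each <tag> element,
# no matter how many text events the parser splits it into.
makeCollector <- function(tag = "locations") {
  buffer <- character()   # chunks received since the current element opened
  values <- character()   # one completed string per closed element
  inside <- FALSE         # are we currently inside <tag>?

  handlers <- list(
    startElement = function(name, attrs, ...) {
      if (name == tag) {
        inside <<- TRUE
        buffer <<- character()
      }
    },
    text = function(content, ...) {
      # May be called several times for one text node: just append.
      if (inside) buffer <<- c(buffer, content)
    },
    endElement = function(name, ...) {
      if (name == tag) {
        values <<- c(values, paste(buffer, collapse = ""))
        inside <<- FALSE
      }
    }
  )
  list(handlers = handlers, values = function() values)
}

# Example input (assumed); \u00f6 is the o-umlaut from the original post.
doc <- "<root><locations>abc ab\u00f6cd abdec</locations></root>"

# 1. Pure SAX: buffer chunks and join them at the end of the element.
#    trim = FALSE keeps the whitespace at chunk boundaries intact.
collector <- makeCollector()
invisible(xmlEventParse(doc, handlers = collector$handlers,
                        asText = TRUE, trim = FALSE))
collector$values()

# 2. branches: the parser builds the whole <locations> node and passes
#    it to the function, so xmlValue() sees the text in one piece.
found <- character()
invisible(xmlEventParse(doc,
                        branches = list(locations = function(node, ...) {
                          found <<- c(found, xmlValue(node))
                        }),
                        asText = TRUE))
found
```

The closure keeps the "where was I" state out of the global environment, which is the bookkeeping the reply refers to. The branches version is probably the simpler choice when each element is small enough to hold in memory as a node.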