David, 

Thanks.  It turns out that, once you've set up your locales properly, it's
almost impossible to create an example for the problem.


I'm working with scientific text that contains a fair number of symbols:
degrees, plus-or-minus, etc.  When I read the text in, I specified UTF-8, and my
locale is UTF-8.  Everything would have been all right had wordStem, the stemmer
in Rstem, processed UTF-8 properly.  In fact, it did not.  It apparently
split the bytes of the multibyte encodings of these symbols, so I usually ended
up with a stray \xc2 or \xc3.
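
For anyone who wants to see where those bytes come from, here is a minimal
sketch in base R (it needs validUTF8, so R 3.3.0 or later).  The three symbols
are my own illustrative picks, not taken from my data, and the code only
inspects encodings; it does not call wordStem.

```r
# Illustrative symbols only, written as \u escapes so the script is ASCII-safe.
symbols <- c(degree = "\u00b0", plus_minus = "\u00b1", times = "\u00d7")

# In UTF-8 each of these symbols is two bytes; take the first (lead) byte.
lead_bytes <- vapply(
  symbols,
  function(s) as.character(charToRaw(enc2utf8(s))[1]),
  character(1)
)
print(lead_bytes)  # lead bytes are c2, c2 and c3

# A lead byte without its continuation byte is not valid UTF-8.
# This is what a string looks like after a multibyte character is cut in half.
lone <- rawToChar(as.raw(0xc2))
print(validUTF8(lone))  # FALSE
```

A row name such as "en.\xc2" is therefore a string that ends in half of a
two-byte character.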


So the problem is clear, and I will circumvent it by removing symbols before
stemming.
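
A rough sketch of that workaround, assuming it is acceptable to replace every
non-ASCII character with a space.  strip_symbols is a hypothetical helper name
of my own, and the sample strings are made up for illustration.

```r
# Hypothetical helper: replace each run of non-ASCII characters with a space,
# so that only single-byte characters reach the stemmer.
strip_symbols <- function(x) {
  x <- enc2utf8(x)
  gsub("[^\\x01-\\x7F]+", " ", x, perl = TRUE)
}

# Made-up sample input containing a degree sign and a plus-or-minus sign.
words <- c("25\u00b0C", "5\u00b12", "plain")
cleaned <- strip_symbols(words)
print(cleaned)

# The cleaned vector is what I would then pass to the stemmer.
```

Replacing with a space rather than an empty string keeps "5" and "2" apart; the
result may need to be split on whitespace again before stemming.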


Regards,
Richard

On August 2, 2010 at 7:13 PM David Winsemius <dwinsem...@comcast.net> wrote:

>
> On Aug 2, 2010, at 12:56 PM, Richard R. Liu wrote:
>
> > I have an array with names which contain multibyte characters.  When I try to
> > write the array to a file using write.table and row.names = T I receive an error
> > message when the first such name is encountered, saying that I have not
> > specified the option to generate NA instead.  I really would be satisfied if the
> > row name in the file were exactly what is displayed when I print the array on
> > the console, e.g., "en.\xc2".  The only way I have found to avoid this is create
> > a new array containing in one column a deparse of the original row name and in
> > the other the value.  This "solution" is ugly; "en.\xc2" becomes
> > "\"en.\\xc2\"".
> >
> > Is there a more straight forward way of dealing with multibyte characters?
>
> Do you want to provide a worked example that produces the error? I am 
> not getting such an error:
>
>  > mtx <-  matrix(1, nrow=1)
>  > rownames(mtx) <- "en.\xc2"
>  > mtx
>          [,1]
> en.\xc2    1
>  > write.table(mtx, file="test.txt")
>
> What I see in that file is
>
> "V1"
> "en.¬" 1
>
> (The character following the period is a logical negation symbol (or 
> an IBM keyboard carriage return) on my display.)
> --
> David Winsemius, MD
> West Hartford, CT
>


Richard R. Liu
richard....@pueo-owl.ch

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
