On Tuesday 11 December 2012 at 23:39 +0100, Richard Zijdeman wrote:
> Dear Milan, please see my results inline
>
> On 11 Dec 2012, at 16:58, Milan Bouchet-Valat <nalimi...@club.fr> wrote:
>
> > On Tuesday 11 December 2012 at 16:41 +0100, Richard Zijdeman wrote:
> >> Dear Milan,
> >>
> >> thank you for your kind suggestion. Converting the characters using:
> >>> iconv(department, "ISO-8859-15", "UTF-8")
> >> indeed improves the situation in that now all values (names of
> >> departments) are displayed in the plot, although the specific special
> >> characters are unfortunately appearing as empty boxes.
> > I wouldn't call that an improvement... :-/
> >
> > What's the result of the other one, i.e.
> > iconv(department, "UTF-16", "UTF-8")
>
> That does not change the outcome, i.e. the names of departments with
> special characters are not plotted at all.
>
> >
> >> I have tried to see whether I could 'save' my Stata file using UTF-8
> >> format, and although this proves to be a popular request it does not
> >> seem to have been incorporated in Stata.
> > You should not need this. iconv() should be able to convert the strings
> > to something usable. The problem is to determine what's the original
> > encoding. Could you call
> > lapply(department, charToRaw)
> >
> > and post the output?
>
> Thanks for providing another suggestion. I have selected 3 cases from
> the dataset I am working with that are problematic and have made new
> vars based on the iconv conversion. The department variable is called
> 'liac' and I now have next to the original three different versions
> based on the UTF16, ISO-8859-1 and ISO-8859-15 conversion. I hope
> I executed it properly, but there seems to be an error when executing
> your code on the original variable.

I guess that's because it's a factor, so you should call as.character() on it first.
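A minimal sketch of that step, assuming the column really is a factor as the error message suggests. The vector `liac` below is a made-up stand-in for `tra.s$liac` (the real data are not reproduced here); only the three strings quoted in the results are used:

```r
# Hypothetical stand-in for the imported column: a factor whose labels
# carry the raw bytes 0x8f and 0x8e shown in the quoted results.
liac <- factor(c("Ard\x8fche", "Corr\x8fze", "Vend\x8ee"))

# charToRaw() only accepts a single character string, so the factor is
# converted to a character vector before applying it element by element.
raw_bytes <- lapply(as.character(liac), charToRaw)

# Bytes of the first label; the accented letter is stored as the single byte 8f.
raw_bytes[[1]]
# [1] 41 72 64 8f 63 68 65
```

On the real data the equivalent call would be `lapply(as.character(tra.s$liac), charToRaw)`.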
But Duncan's solution is the most practical one (though you'll probably have to do the same for "é").


Regards

> ## start results
> > head(tra.s)
>          liac        liac2        liac3  liac1
> 18 Ard\x8fche Ard\u008fche Ard\u008fche   <NA>
> 29 Corr\x8fze Corr\u008fze Corr\u008fze   <NA>
> 31  Vend\x8ee  Vend\u008ee  Vend\u008ee 噥湤蹥
> > lapply(tra.s$liac,charToRaw) # original (stata import)
> Error in FUN(X[[1L]], ...) :
>   argument must be a character vector of length 1
> > lapply(tra.s$liac1, charToRaw) # UTF16 -> UTF-8
> [[1]]
> [1] 4e 41
>
> [[2]]
> [1] 4e 41
>
> [[3]]
> [1] e5 99 a5 e6 b9 a4 e8 b9 a5
>
> > lapply(tra.s$liac2, charToRaw) # ISO-8859-1 -> UTF-8
> [[1]]
> [1] 41 72 64 c2 8f 63 68 65
>
> [[2]]
> [1] 43 6f 72 72 c2 8f 7a 65
>
> [[3]]
> [1] 56 65 6e 64 c2 8e 65
>
> > lapply(tra.s$liac3, charToRaw) # ISO-8859-15 -> UTF-8
> [[1]]
> [1] 41 72 64 c2 8f 63 68 65
>
> [[2]]
> [1] 43 6f 72 72 c2 8f 7a 65
>
> [[3]]
> [1] 56 65 6e 64 c2 8e 65
> ## end results
>
> Best wishes and thanks,
>
> Richard
>
> >
> >
> > Regards
> >
> >> Best and thank you for your help,
> >>
> >> Richard
> >>
> >>
> >> On 11 Dec 2012, at 12:11, Milan Bouchet-Valat <nalimi...@club.fr> wrote:
> >>
> >>> On Tuesday 11 December 2012 at 01:10 +0100, Richard Zijdeman wrote:
> >>>> Dear all,
> >>>>
> >>>> I have imported a dataset from Stata using the foreign package. The
> >>>> original data contain French characters such as è and é.
> >>>> After importing, string variables containing names of French
> >>>> departments have changed. E.g. Ardèche became Ard\x8fche. I would like
> >>>> to ask how I could plot these changed strings, since now the strings
> >>>> with special characters fail to be printed in the plot (either using
> >>>> plot() or ggplot2()).
> >>>>
> >>>> I have googled for solutions, but actually find it hard to determine
> >>>> whether I should change my R setup or should read in the data in a
> >>>> different way.
> >>>> Since I work on a mac I changed my locale according to
> >>>> the R for Mac OS X FAQ, chapter 9. Below is some info on my setup and
> >>>> code and output on what works for me and what does not. Thank you in
> >>>> advance for your comments.
> >>> Accented characters should work fine on a machine using a UTF-8
> >>> locale as yours. I think the problem is that the imported data uses
> >>> ISO8859-15 or UTF-16, not UTF-8.
> >>>
> >>> I have no idea whether .dta files specify an encoding or not, but I
> >>> think you can convert them in R by calling
> >>> iconv(department, "ISO-8859-15", "UTF-8")
> >>> or
> >>> iconv(department, "UTF-16", "UTF-8")
> >>>
> >>>> Best,
> >>>>
> >>>> Richard
> >>>>
> >>>> #--------------
> >>>> rm(list=ls())
> >>>> sessionInfo()
> >>>> # R version 2.15.2 (2012-10-26)
> >>>> # Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
> >>>> #
> >>>> # locale:
> >>>> # [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> >>>>
> >>>> # creating variables
> >>>> department <- c("Nord","Paris","Ard\x8fche")
> >>> \x8f does not correspond to "è" AFAIK. In ISO8859-1 and -15 and UTF-16,
> >>> it's \xE8 ("\uE8" in R).
> >>>
> >>> In UTF-8, it's C3 A8, "\303\250" in R.
> >>>
> >>>> department2 <- c("Nord", "Paris", "Ardèche")
> >>>> n <- c(2,4,1)
> >>>>
> >>>> # creating dataframes
> >>>> df <- data.frame(department,n)
> >>>> df2 <- data.frame(department2,n)
> >>>>
> >>>> department
> >>>> # [1] "Nord" "Paris" "Ard\x8fche"
> >>>> department2
> >>>> # [1] "Nord" "Paris" "Ardèche"
> >>>>
> >>>> plot(df) # fails to show the text "Ardèche"
> >>>> plot(df2) # shows text "Ardèche"
> >>>>
> >>>> # EOF
> >>>> [[alternative HTML version deleted]]
> >>>>
> >>>> ______________________________________________
> >>>> R-help@r-project.org mailing list
> >>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>> PLEASE do read the posting guide
> >>>> http://www.R-project.org/posting-guide.html
> >>>> and provide commented, minimal, self-contained, reproducible code.
> >>>
> >