On Monday 15 December 2014 at 13:49 -0500, Simon Urbanek wrote:
> On Dec 15, 2014, at 1:37 PM, Spencer Graves <spencer.gra...@prodsyse.com> wrote:
> >
> >> On Dec 15, 2014, at 10:13 AM, Simon Urbanek <simon.urba...@r-project.org> wrote:
> >>
> >>> On Dec 15, 2014, at 12:21 PM, Kurt Hornik <kurt.hor...@wu.ac.at> wrote:
> >>>
> >>>>>>>> Spencer Graves writes:
> >>>
> >>>> Hello, All:
> >>>> What would it take to make “iconv” portable?
> >>>
> >>>> I ask, because I want to convert accented characters to
> >>>> vanilla ASCII, thereby converting, e.g., ‘Raúl’ to “Raul”, and
> >>>> Milan Bouchet-Valat suggested on R-help that I use ‘iconv(x,
> >>>> "", "ASCII//TRANSLIT")’. This worked under Windows but failed
> >>>> on Linux and Mac. It’s part of the “subNonStandardCharacters”
> >>>> function in the Ecfun package. The development version on
> >>>> R-Forge uses this and returns “Raul” under Windows and NA
> >>>> under Mac OS X (and presumably also Linux).
> >>>
> >>> Hmm.
> >>>
> >>> R> iconv("Raúl", "", "ASCII//TRANSLIT")
> >>> [1] "Raul"
> >>>
> >>> seems to work for me on Linux ...
> >>>
> >>
> >> also on OS X:
> >>
> >>> iconv("Raúl", "", "ASCII//TRANSLIT")
> >> [1] "Ra'ul"
> >
> > Thanks for the replies. I should have checked my examples more
> > carefully. Consider the following example and a slight modification from
> > help("iconv"):
> >
> > > x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
> > > Encoding(x) <- "latin1"
> > > x
> > [1] "Ekstrøm" "Jöreskog" "bißchen Zürcher"
> > > iconv(x, "latin1", "ASCII//TRANSLIT") # platform-dependent
> > [1] "Ekstrom" "J\"oreskog" "bisschen Z\"urcher"
> >
> > > x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
> > > x
> > [1] "Ekstr\xf8m" "J\xf6reskog" "bi\xdfchen Z\xfcrcher"
> > > iconv(x, "", "ASCII//TRANSLIT") # platform-dependent
> > [1] NA NA NA
> >
> > This suggests a two-step fix to my problem: (1) Check Encoding(x)
> > and set to “latin1” if it’s “unknown”.
>
> Well, that depends heavily on your source. In the above it is hand-crafted
> latin1, so if you don't declare it, the native encoding will be assumed,
> which can be anything and has nothing to do with your actual input in this
> particular case where you hand-constructed latin1.
>
> > (2) Delete any new \" added by iconv.
> >
>
> The whole point of translit is to create combinations of ASCII
> characters that represent the Unicode characters, so " is just one
> of many characters that can be used.

But it's quite unexpected that ö is transliterated to "o and ú to 'u.
Looks like iconv on OS X has a different idea of what ASCII
transliteration means than on Linux and Windows...
Anyway, it's easy to remove " and ' if needed.


Regards

> Cheers,
> S
>
> > Thanks again,
> > Spencer
> >
> >>> -k
> >>>
> >>>> The “iconv” R code merely calls compiled code, which I’ve used very
> >>>> little in 30 years.
> >>>
> >>>> Thanks,
> >>>> Spencer
> >>>
> >>>>> On Nov 30, 2014, at 2:32 AM, Spencer Graves
> >>>>> <spencer.gra...@structuremonitoring.com> wrote:
> >>>>>
> >>>>> Wonderful. Thanks very much. Spencer
> >>>>>
> >>>>> On 11/30/2014 2:25 AM, Milan Bouchet-Valat wrote:

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
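
For reference, the two-step fix discussed in this thread (declare the encoding when it is "unknown", then strip the " and ' that some iconv implementations add) might be sketched roughly as below. This is only a sketch under stated assumptions: the helper name `to_ascii` and its `assume` argument are hypothetical and not part of base R or the Ecfun package, and treating undeclared input as latin1 is a guess that, as Simon points out, depends entirely on where the data came from.

```r
# Hypothetical helper; `to_ascii` and `assume` are illustrative names only.
to_ascii <- function(x, assume = "latin1") {
  # Step 1: elements with no declared encoding are ASSUMED to be in `assume`
  # and converted to UTF-8. Pure ASCII strings are also reported as "unknown";
  # with an ASCII-compatible `assume` such as latin1 they pass through as-is.
  unknown <- Encoding(x) == "unknown"
  x[unknown] <- iconv(x[unknown], from = assume, to = "UTF-8")

  # Elements already marked "latin1" or "UTF-8" are converted via their marks.
  x <- enc2utf8(x)

  # Step 2: transliterate to ASCII. The result is platform-dependent and may
  # be NA (or raise an error) where //TRANSLIT is not supported.
  out <- iconv(x, from = "UTF-8", to = "ASCII//TRANSLIT")

  # Step 3: drop the " and ' characters that some iconv implementations emit.
  # Caveat: this also removes quotes and apostrophes present in the input.
  gsub("[\"']", "", out)
}

x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
to_ascii(x)
```

Converting everything to UTF-8 first means the transliteration step sees one known source encoding instead of whatever the native encoding happens to be. Because step 3 also strips genuine apostrophes (for example in "O'Brien"), it would need refining if such input is expected.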