On Dec 15, 2014, at 1:37 PM, Spencer Graves <spencer.gra...@prodsyse.com> wrote:

>> On Dec 15, 2014, at 10:13 AM, Simon Urbanek <simon.urba...@r-project.org> wrote:
>>
>>> On Dec 15, 2014, at 12:21 PM, Kurt Hornik <kurt.hor...@wu.ac.at> wrote:
>>>
>>>>>>>> Spencer Graves writes:
>>>
>>>> Hello, All:
>>>> What would it take to make "iconv" portable?
>>>
>>>> I ask, because I want to convert accented characters to
>>>> vanilla ASCII, thereby converting, e.g., "Raúl" to "Raul", and
>>>> Milan Bouchet-Valat suggested on R-help that I use
>>>> iconv(x, "", "ASCII//TRANSLIT"). This worked under Windows but failed
>>>> on Linux and Mac. It's part of the "subNonStandardCharacters"
>>>> function in the Ecfun package. The development version on
>>>> R-Forge uses this and returns "Raul" under Windows and NA
>>>> under Mac OS X (and presumably also Linux).
>>>
>>> Hmm.
>>>
>>> R> iconv("Raúl", "", "ASCII//TRANSLIT")
>>> [1] "Raul"
>>>
>>> seems to work for me on Linux ...
>>
>> also on OS X:
>>
>> > iconv("Raúl", "", "ASCII//TRANSLIT")
>> [1] "Ra'ul"
>
> Thanks for the replies. I should have checked my examples more
> carefully. Consider the following example and a slight modification from
> help("iconv"):
>
> > x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
> > Encoding(x) <- "latin1"
> > x
> [1] "Ekstrøm"         "Jöreskog"        "bißchen Zürcher"
> > iconv(x, "latin1", "ASCII//TRANSLIT") # platform-dependent
> [1] "Ekstrom"            "J\"oreskog"         "bisschen Z\"urcher"
>
> > x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
> > x
> [1] "Ekstr\xf8m"            "J\xf6reskog"           "bi\xdfchen Z\xfcrcher"
> > iconv(x, "", "ASCII//TRANSLIT") # platform-dependent
> [1] NA NA NA
>
> This suggests a two-step fix to my problem: (1) Check Encoding(x)
> and set to "latin1" if it's "unknown".

Well, that depends heavily on your source. In the example above the input
is hand-crafted latin1, so if you don't declare it, the native encoding
will be assumed. The native encoding can be anything and has nothing to do
with your actual input in this particular case, where you hand-constructed
latin1.

> (2) Delete any new \" added by iconv.

The whole point of translit is to create combinations of ASCII characters
that represent the Unicode characters, so " is just one of many characters
that can be used.

Cheers,
S

> Thanks again,
> Spencer
>
>>> -k
>>>
>>>> The "iconv" R code merely calls compiled code, which I've used very
>>>> little in 30 years.
>>>
>>>> Thanks,
>>>> Spencer
>>>
>>>>> On Nov 30, 2014, at 2:32 AM, Spencer Graves
>>>>> <spencer.gra...@structuremonitoring.com> wrote:
>>>>>
>>>>> Wonderful. Thanks very much. Spencer
>>>>>
>>>>> On 11/30/2014 2:25 AM, Milan Bouchet-Valat wrote:

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
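
For readers following the thread, the two points above can be sketched in R
roughly as follows. This is only an illustration under a stated assumption:
the undeclared bytes really are latin1, as in the hand-built example.
declare_latin1() is a hypothetical helper, not part of base R or Ecfun. The
transliterated result is deliberately not shown, because it depends on the
platform's iconv implementation.

```r
# Hand-built latin1 bytes, as in the example taken from help("iconv").
x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")

# Hypothetical helper: mark strings with no declared encoding as latin1.
# Assumption: the caller knows the bytes really are latin1. If they are
# not, this produces wrong characters without any warning.
declare_latin1 <- function(x) {
  unknown <- Encoding(x) == "unknown"
  Encoding(x)[unknown] <- "latin1"  # pure-ASCII strings remain "unknown"
  x
}

y <- declare_latin1(x)

# Convert via UTF-8 so the declared encoding is what gets used, rather
# than whatever the native encoding of the session happens to be.
ascii <- iconv(enc2utf8(y), "UTF-8", "ASCII//TRANSLIT")

# The replacement characters differ between iconv implementations, so
# inspect the result instead of assuming a fixed form. Elements that
# could not be converted come back as NA.
print(ascii)
print(is.na(ascii))
```

Stripping only the added double quotes afterwards would cover just one of the
forms a transliteration can take, which is why the sketch inspects the result
instead of post-processing it.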