>>>>> Uwe Ligges <lig...@statistik.tu-dortmund.de>
>>>>>     on Wed, 28 Jun 2017 18:45:59 +0200 writes:

> On 27.06.2017 17:36, Martin Maechler wrote:
>> This is a continuation of the R-devel thread with subject
>> "suggestion to fix packageDescription() for Windows users" :
>>
>> As I said there, a patch should address the underlying
>> problem in packageDescription() rather than be a kludgy
>> workaround patch for citation().
>> (For that same reason, Ben Marwick proposed to fix
>> packageDescription() rather than the symptom seen in citation().)
>>
>> It's not hard to see that the problem is that iconv() on
>> Windows does not always succeed in translating from "UTF-8" to
>> the "current locale", in the case mentioned there.
>>
>> I'm giving some easier reproducible examples: no need to install
>> half of tidyverse just to get citation("readr") :
>>
>> > x1 <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
>> > Encoding(x1) <- "latin1"
>> > xU <- iconv(x1, "latin1", "UTF-8")
>>
>> > Sys.setlocale("LC_CTYPE", "Chinese")
>> [1] "Chinese (Simplified)_People's Republic of China.936"
>> >
>> > iconv(x1, "latin1", "") # NA NA NA
>> [1] NA NA NA
>> > iconv(xU, "UTF-8", "") # NA NA NA
>> [1] NA NA NA
>> > iconv(xU, "UTF-8", "//TRANSLIT")
>> [1] "Ekstrøm" "Jöreskog" "bißchen Zürcher"

> Interesting, I get Chinese characters here.

For which of the above cases? Can you show them? (They may
survive the e-mail servers; we had other Chinese R strings on
R-help / R-devel recently, right?)

In any case, I think that is even worse, isn't it?
Even in a Chinese locale you'd want explicitly latin1-encoded text
to be displayed as something that looks like latin-1 (I know from a
master's student that Windows+Chinese can well show latin-1-like
letters interspersed in the Chinese text), no?

> Beside the comments from Duncan Murdoch:
>     iconv(x1, "latin1", "", sub="?")
> etc. would be an alternative in case some characters really cannot be
> converted into the target encoding and should perhaps be considered for
> the time after Duncan commits the fix for the underlying problem.

Yes.
I'd had the same idea, which is why I used it in the code I sent along.

So,

1) we definitely won't commit the workaround patch for citation().

2) I have a "workaround patch" for packageDescription() which is
   more useful in the sense that it tries alternatives only if
   iconv() produces NAs, notably "//TRANSLIT" and
   "ASCII//TRANSLIT" (the latter Ben also mentioned, but my patch
   would only use it in the NA case), and also the same 'sub="?"'
   that you mention above, Uwe.

   That patch is not Windows-specific and will automatically also
   help in other cases / platforms where the iconv() re-encoding
   leads to partial NAs.

@Duncan M: would you _not_ want me to commit that either?

Martin
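
For illustration, the fallback order described in 2) could be sketched roughly as follows. This is only a sketch under assumptions, not the actual patch: the helper name iconv_fallback() and its arguments are hypothetical, and the order of the attempts is simply the one listed above. Only elements for which iconv() returned NA are retried.

```r
## Hypothetical helper illustrating the fallback order; NOT the actual patch.
iconv_fallback <- function(x, from, to = "", sub = "?") {
    ## iconv() signals an error for unsupported conversions;
    ## treat that the same as "could not convert" (NA).
    try_iconv <- function(z, target, ...)
        tryCatch(iconv(z, from, target, ...),
                 error = function(e) rep(NA_character_, length(z)))
    res <- try_iconv(x, to)
    ## Retry only the elements that became NA: first transliteration
    ## into the target encoding, then transliteration to ASCII.
    for (target in c(paste0(to, "//TRANSLIT"), "ASCII//TRANSLIT")) {
        bad <- is.na(res) & !is.na(x)
        if (!any(bad)) return(res)
        res[bad] <- try_iconv(x[bad], target)
    }
    ## Last resort: substitute each non-convertible byte by 'sub'.
    bad <- is.na(res) & !is.na(x)
    if (any(bad)) res[bad] <- try_iconv(x[bad], to, sub = sub)
    res
}

x1 <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
Encoding(x1) <- "latin1"
## The exact replacement characters depend on the platform's iconv.
iconv_fallback(x1, "latin1", "ASCII")
```

Elements that convert cleanly at the first attempt are left untouched, so the transliterated or substituted forms appear only where the plain conversion failed.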