This is a continuation of the R-devel thread with subject
"suggestion to fix packageDescription() for Windows users" :
As I said there, a patch should address the underlying
problem in packageDescription() rather than add a kludgy
workaround to citation().
(For that same reason, Ben Marwick proposed to fix
packageDescription() rather than the symptom seen in citation().)
It's not hard to see that the problem is that iconv() on
Windows does not always succeed in translating from "UTF-8" to
the "current locale", in the case mentioned there.
Here are some more easily reproducible examples: no need to
install half of tidyverse just to get citation("readr") :
x1 <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
Encoding(x1) <- "latin1"
xU <- iconv(x1, "latin1", "UTF-8")
Sys.setlocale("LC_CTYPE", "Chinese")
[1] "Chinese (Simplified)_People's Republic of China.936"
iconv(x1, "latin1", "") # NA NA NA
[1] NA NA NA
iconv(xU, "UTF-8", "") # NA NA NA
[1] NA NA NA
iconv(xU, "UTF-8", "//TRANSLIT")
[1] "Ekstrøm" "Jöreskog" "bißchen Zürcher"
iconv(xU, "UTF-8", "", sub = "byte")
[1] "Ekstr<c3><b8>m" "J<c3><b6>reskog" "bi<c3><9f>chen Z¨¹rcher"
Sys.setlocale("LC_CTYPE", "Arabic")
[1] "Arabic_Saudi Arabia.1256"
iconv(x1, "latin1", "") # NA NA NA
[1] NA NA NA
iconv(xU, "UTF-8", "") # NA NA NA
[1] NA NA NA
iconv(xU, "UTF-8", "//TRANSLIT")
[1] "Ekstr\370m" "J\366reskog" "bißchen Zürcher"
iconv(xU, "UTF-8", "", sub="byte")
[1] "Ekstr<c3><b8>m" "J<c3><b6>reskog" "bi<c3><9f>chen Zürcher"
iconv(xU, "UTF-8", "", sub="?")
[1] "Ekstr??m" "J??reskog" "bi??chen Zürcher"
Etc. As the above is typically garbled between e-mail
transfer agents, I append both the iconv-Windows.R R script and
the corresponding iconv-Windows.Rout R transcript to this
e-mail (using MIME type text/plain, which is easy using emacs
for mail); they contain a bit more than the above.
Note that the above shows that using 'sub = *' or
"//TRANSLIT" after a previous NA result helps quite a bit,
in the sense that it is much more informative to see
"J?reskog" instead of NA.
I'm considering updating packageDescription() to try these
whenever its iconv() call first returns NA. This would make the
citation() hack unnecessary.
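
A minimal sketch of such a fallback, only to illustrate the idea:
iconv_fallback() is a hypothetical helper name, not an existing R
function, and this is not necessarily how packageDescription()
would end up being patched. It assumes that retrying with
"//TRANSLIT" first and with 'sub = "byte"' last is the desired
order, and it guards the "//TRANSLIT" call because not every iconv
implementation may accept that suffix.

```r
## Hypothetical helper (not part of R): convert 'x' from encoding
## 'from' to 'to', retrying only those elements that became NA.
iconv_fallback <- function(x, from = "UTF-8", to = "", sub = "byte")
{
    res <- iconv(x, from, to)
    bad <- is.na(res) & !is.na(x)   # conversion failures only
    if (any(bad)) {
        ## 1st retry: ask iconv() to transliterate; treat an
        ## unsupported "//TRANSLIT" suffix like a failed conversion
        res[bad] <- tryCatch(
            iconv(x[bad], from, paste0(to, "//TRANSLIT")),
            error = function(e) rep(NA_character_, sum(bad)))
        bad <- is.na(res) & !is.na(x)
    }
    if (any(bad))
        ## last resort: substitute the non-convertible bytes
        res[bad] <- iconv(x[bad], from, to, sub = sub)
    res
}

## Usage: to = "ASCII" stands in for a locale that cannot
## represent the accented characters.
xU <- c("Ekstr\u00f8m", "J\u00f6reskog", "plain ASCII")
iconv_fallback(xU, "UTF-8", "ASCII")
```

The exact replacement characters depend on the platform's iconv,
so the result is best regarded as "something readable instead of
NA" rather than as a faithful conversion.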