On .05.2015, at 09:01, Richard Cotton <richiero...@gmail.com> wrote:
On 25 May 2015 at 19:43, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:
http://stackoverflow.com/questions/17715956/why-do-some-unicode-characters-display-in-matrices-but-not-data-frames-in-r
Yes, but it is a bug, just a hard one to fix. It needs someone to dedicate
a serious amount of time to deal with it.
Since most of the people who tend to do that generally use systems in UTF-8
locales where this isn't a problem, or don't use Windows, it is languishing.
Thanks for the link and the explanation of why the bug exists.
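For readers who want to check whether their own session is in the affected configuration (a non-UTF-8 locale on Windows, per the explanation above), here is a minimal sketch using only base R. Which combinations actually trigger the misprinting is an assumption based on this thread, not something the snippet verifies.

```r
# Report the locale properties that matter for this bug.
info <- l10n_info()                 # list with MBCS, UTF-8, Latin-1 entries
is_utf8    <- isTRUE(info[["UTF-8"]])
is_windows <- identical(.Platform$OS.type, "windows")

cat("UTF-8 locale:", is_utf8, "\n")
cat("LC_CTYPE:    ", Sys.getlocale("LC_CTYPE"), "\n")
cat("Windows:     ", is_windows, "\n")

# Assumption from the thread: the problem shows up on Windows in
# non-UTF-8 locales.
if (is_windows && !is_utf8) {
  message("This session may show <U+XXXX> escapes when printing data frames.")
}
```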
On May 25, 2015 9:39 AM, "Richard Cotton" <richiero...@gmail.com> wrote:
> Here's a data frame with some Unicode symbols (set intersection and
> union).
>
> d <- data.frame(x = "A \u222a B \u2229 C")
>
> Printing this data frame under R 3.2.0 patched (r68378) and Windows 7, I
> see
>
> d
> ## x
> ## 1 A <U+222A> B n C
For future readers searching for a solution to this, you can get
correct printing by setting the CTYPE part of the locale to
Chinese/Japanese/Korean.
Sys.setlocale("LC_CTYPE", "Chinese")
## [1] "Chinese (Simplified)_People's Republic of China.936"
d
## x
## 1 A ∪ B ∩ C
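Changing LC_CTYPE affects the whole session, so it may be preferable to switch it only while printing and restore it afterwards. The following is a sketch of that idea; print_with_ctype is a hypothetical helper name, not a base R function, and the locale name "Chinese" is only meaningful on Windows.

```r
# Hypothetical helper: print an object under a temporary LC_CTYPE,
# then restore the previous setting even if printing fails.
print_with_ctype <- function(x, ctype = "Chinese") {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  # Sys.setlocale() returns "" (with a warning) if the request
  # cannot be honoured, e.g. on a system without that locale.
  new <- suppressWarnings(Sys.setlocale("LC_CTYPE", ctype))
  if (!nzchar(new)) {
    message("Locale '", ctype, "' not available; printing with '", old, "'.")
  }
  print(x)
  invisible(x)
}

d <- data.frame(x = "A \u222a B \u2229 C")
print_with_ctype(d)
```

The on.exit() call is what keeps the rest of the session unaffected; without it, later code would keep running under the changed locale.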
There is another workaround.
The character conversion seen when printing data frames stems from
format.data.frame(), which print.data.frame() calls before printing.
Defining your own class with a print method that skips that formatting
step allows for correct printing in all locales.
Like this:
d <- data.frame(x = "A \u222a B \u2229 C")
d
## x
## 1 A <U+222A> B n C
class(d) <- c("unicode_df","data.frame")
# this is print.data.frame from base R with only two lines modified, see #old#
print.unicode_df <- function(x, ..., digits = NULL, quote = FALSE,
                             right = TRUE, row.names = TRUE)
{
    n <- length(row.names(x))
    if (length(x) == 0L) {
        cat(sprintf(ngettext(n, "data frame with 0 columns and %d row",
                             "data frame with 0 columns and %d rows",
                             domain = "R-base"),
                    n), "\n", sep = "")
    }
    else if (n == 0L) {
        print.default(names(x), quote = FALSE)
        cat(gettext("<0 rows> (or 0-length row.names)\n"))
    }
    else {
        #old# m <- as.matrix(format.data.frame(x, digits = digits,
        #old#                                  na.encode = FALSE))
        m <- as.matrix(x)
        if (!isTRUE(row.names))
            dimnames(m)[[1L]] <- if (identical(row.names, FALSE))
                rep.int("", n)
            else row.names
        print(m, ..., quote = quote, right = right)
    }
    invisible(x)
}
d
## x
## [1,] A ∪ B ∩ C
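The difference between the two code paths can also be inspected without defining a class, by comparing the matrix that the modified method prints with the one the original method would have printed. This is only a sketch; the expectation that the two differ on affected systems is taken from the thread, and on unaffected systems they should simply be equal. Note that skipping format.data.frame() also means the digits argument is no longer applied.

```r
d <- data.frame(x = "A \u222a B \u2229 C", stringsAsFactors = FALSE)

# Path used by the modified method: no format.data.frame() call.
m_plain <- as.matrix(d)

# Path used by the stock print method for data frames.
m_formatted <- as.matrix(format.data.frame(d, na.encode = FALSE))

# The unformatted cell still holds the original string.
cell_ok <- unname(m_plain[1, 1]) == "A \u222a B \u2229 C"
print(cell_ok)

# Where the bug is present, the formatted cell is expected to differ
# (for example by containing "<U+222A>").
same <- unname(m_plain[1, 1]) == unname(m_formatted[1, 1])
print(same)
```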
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel