On .05.2015, at 09:01, Richard Cotton <richiero...@gmail.com> wrote:

On 25 May 2015 at 19:43, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:
http://stackoverflow.com/questions/17715956/why-do-some-unicode-characters-display-in-matrices-but-not-data-frames-in-r

Yes, but it is a bug, just a hard one to fix. It needs someone to dedicate
a serious amount of time to deal with it.

Most of the people who tend to take on that kind of work use systems with UTF-8 locales, where this isn't a problem, or don't use Windows at all, so the bug is languishing.

Thanks for the link and the explanation of why the bug exists.

On May 25, 2015 9:39 AM, "Richard Cotton" <richiero...@gmail.com> wrote:

> Here's a data frame with some Unicode symbols (set intersection and
> union).
>
> d <- data.frame(x = "A \u222a B \u2229 C")
>
> Printing this data frame under R 3.2.0 patched (r68378) and Windows 7, I
> see
>
> d
> ##                  x
> ## 1 A <U+222A> B n C

For future readers searching for a solution to this, you can get
correct printing by setting the CTYPE part of the locale to
Chinese/Japanese/Korean.

Sys.setlocale("LC_CTYPE", "Chinese")
## [1] "Chinese (Simplified)_People's Republic of China.936"

d
##            x
## 1 A ∪ B ∩ C
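Changing LC_CTYPE affects the whole session, so it can be worth switching it only while printing. The following is a minimal sketch of that idea, not a tested fix: print_with_ctype is a hypothetical helper name, and the "Chinese" locale name is taken from the call above and is only expected to be accepted on Windows. On other systems the request fails and the helper prints in the current locale.

```r
# Hypothetical helper (not part of base R): print `x` with LC_CTYPE switched
# temporarily, then put the previous setting back.
print_with_ctype <- function(x, ctype = "Chinese") {
  old <- Sys.getlocale("LC_CTYPE")
  # Restore the original LC_CTYPE when the function exits, even on error.
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)

  # Sys.setlocale() returns "" (with a warning) if the request cannot be honoured.
  new <- suppressWarnings(Sys.setlocale("LC_CTYPE", ctype))
  if (!nzchar(new)) {
    warning("could not set LC_CTYPE to '", ctype,
            "'; printing in the current locale")
  }

  print(x)
  invisible(x)
}

d <- data.frame(x = "A \u222a B \u2229 C")
print_with_ctype(d)
```

Because the restore is registered with on.exit(), the session's locale is the same after the call as before it, whether or not the switch succeeded.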



There is another workaround.

The character transformation when printing data frames stems from format.data.frame(), which print.data.frame() calls before handing the result to print(). Defining your own class with a print method that skips that formatting step allows for correct printing in all locales.

Like this:


d <- data.frame(x = "A \u222a B \u2229 C")
d
##                  x
## 1 A <U+222A> B n C


class(d) <- c("unicode_df","data.frame")

# this is print.data.frame from base R with only the format.data.frame()
# call replaced, see #old#; as a side effect, digits is no longer used
print.unicode_df <- function (x, ..., digits = NULL, quote = FALSE, right = TRUE,
    row.names = TRUE)
{
    n <- length(row.names(x))
    if (length(x) == 0L) {
        cat(sprintf(ngettext(n, "data frame with 0 columns and %d row",
            "data frame with 0 columns and %d rows", domain = "R-base"),
            n), "\n", sep = "")
    }
    else if (n == 0L) {
        print.default(names(x), quote = FALSE)
        cat(gettext("<0 rows> (or 0-length row.names)\n"))
    }
    else {
        #old# m <- as.matrix(format.data.frame(x, digits = digits,
        #old#     na.encode = FALSE))
        m <- as.matrix(x)
        if (!isTRUE(row.names))
            dimnames(m)[[1L]] <- if (identical(row.names, FALSE))
                rep.int("", n)
            else row.names
        print(m, ..., quote = quote, right = right)
    }
    invisible(x)
}


d
##              x
## [1,] A ∪ B ∩ C
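To check on a given system whether format.data.frame() is the step that alters the string, the two code paths can be compared directly. This is only a diagnostic sketch: the variable names m and f are illustrative, and stringsAsFactors = FALSE is set explicitly (an assumption, to keep the column a character vector regardless of R version). No expected output is shown because the result depends on the locale.

```r
d <- data.frame(x = "A \u222a B \u2229 C", stringsAsFactors = FALSE)

# Path taken by print.unicode_df above: no formatting step.
m <- as.matrix(d)
# Path taken by the unmodified print.data.frame.
f <- as.matrix(format.data.frame(d, na.encode = FALSE))

# TRUE means format.data.frame() left the string intact in this locale.
identical(enc2utf8(m[1, 1]), enc2utf8(f[1, 1]))

# Code points of each version, for a closer look when the two differ.
utf8ToInt(enc2utf8(m[1, 1]))
utf8ToInt(enc2utf8(f[1, 1]))
```

If the comparison is FALSE, the second vector of code points shows what the formatted string was turned into.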





______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel