Re: [Rd] Unicode display problem with data frames under Windows
On 25 May 2015 at 19:43, Duncan Murdoch murdoch.dun...@gmail.com wrote:

>> http://stackoverflow.com/questions/17715956/why-do-some-unicode-characters-display-in-matrices-but-not-data-frames-in-r
>
> Yes, but it is a bug, just a hard one to fix. It needs someone to dedicate a serious amount of time to deal with it. Since most of the people who tend to do that generally use systems in UTF-8 locales where this isn't a problem, or don't use Windows, it is languishing.

Thanks for the link and the explanation of why the bug exists.

On May 25, 2015 9:39 AM, Richard Cotton richiero...@gmail.com wrote:

> Here's a data frame with some Unicode symbols (set intersection and union).
>
>     d <- data.frame(x = "A \u222a B \u2229 C")
>
> Printing this data frame under R 3.2.0 patched (r68378) and Windows 7, I see
>
>     d
>     ##                  x
>     ## 1 A <U+222A> B n C

For future readers searching for a solution to this, you can get correct printing by setting the CTYPE part of the locale to Chinese/Japanese/Korean.

    Sys.setlocale("LC_CTYPE", "Chinese")
    ## [1] "Chinese (Simplified)_People's Republic of China.936"
    d
    ##           x
    ## 1 A ∪ B ∩ C

--
Regards,
Richie

Learning R
4dpiecharts.com

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
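Editorial sketch, not part of the original message: because Sys.setlocale() changes the locale for the rest of the session, it may be preferable to switch LC_CTYPE only while printing and restore it afterwards. The helper name with_ctype() below is hypothetical, and "Chinese" is a Windows locale name that is unlikely to be available on other systems, in which case the helper only warns and prints under the current locale.

```r
# Hypothetical helper (not part of base R): evaluate `expr` with LC_CTYPE
# temporarily set to `ctype`, then restore the previous setting on exit.
with_ctype <- function(ctype, expr) {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  # Sys.setlocale() returns "" (with a warning) if the request cannot be honoured.
  new <- suppressWarnings(Sys.setlocale("LC_CTYPE", ctype))
  if (!nzchar(new)) {
    warning("locale '", ctype, "' is not available; LC_CTYPE left unchanged")
  }
  # `expr` is a promise, so it is only evaluated here, after the locale switch.
  invisible(expr)
}

d <- data.frame(x = "A \u222a B \u2229 C")
with_ctype("Chinese", print(d))
```

The locale is restored by on.exit(), so an error inside the printed expression should not leave the session in the temporary locale.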
Re: [Rd] Unicode display problem with data frames under Windows
On .05.2015 at 09:01, Richard Cotton richiero...@gmail.com wrote:

> For future readers searching for a solution to this, you can get correct printing by setting the CTYPE part of the locale to Chinese/Japanese/Korean.
>
>     Sys.setlocale("LC_CTYPE", "Chinese")
>     ## [1] "Chinese (Simplified)_People's Republic of China.936"
>     d
>     ##           x
>     ## 1 A ∪ B ∩ C

There is another workaround. The problem with the character transformation on printing data frames stems from format.data.frame(), which is called within print.data.frame(). Defining your own class with a print method that does not use format() allows for correct printing in all locales.

Like this:

    d <- data.frame(x = "A \u222a B \u2229 C")
    d
    ##                  x
    ## 1 A <U+222A> B n C

    class(d) <- c("unicode_df", "data.frame")

    # this is print.data.frame from base R with only two lines modified, see #old#
    print.unicode_df <- function(x, ..., digits = NULL, quote = FALSE,
                                 right = TRUE, row.names = TRUE) {
      n <- length(row.names(x))
      if (length(x) == 0L) {
        cat(sprintf(ngettext(n, "data frame with 0 columns and %d row",
                             "data frame with 0 columns and %d rows",
                             domain = "R-base"), n), "\n", sep = "")
      } else if (n == 0L) {
        print.default(names(x), quote = FALSE)
        cat(gettext("<0 rows> (or 0-length row.names)\n"))
      } else {
        #old# m <- as.matrix(format.data.frame(x, digits = digits,
        #old#                                  na.encode = FALSE))
        m <- as.matrix(x)
        if (!isTRUE(row.names))
          dimnames(m)[[1L]] <- if (identical(row.names, FALSE))
            rep.int("", n) else row.names
        print(m, ..., quote = quote, right = right)
      }
      invisible(x)
    }

    d
    ##              x
    ## [1,] A ∪ B ∩ C
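Editorial sketch, not part of the original message: one way to check on a given machine whether format.data.frame() is where the string gets altered is to compare the cell produced by each path before anything is printed. The variable names are illustrative, and the result is expected to depend on platform, locale and R version, so no output is shown.

```r
d <- data.frame(x = "A \u222a B \u2229 C", stringsAsFactors = FALSE)

# Path used by the workaround above: no call to format().
raw_cell <- as.matrix(d)[1, 1]
# Path used by print.data.frame(): the data frame is formatted first.
fmt_cell <- as.matrix(format.data.frame(d, na.encode = FALSE))[1, 1]

# Compare code points rather than printed text, so console rendering
# cannot influence the comparison.
same <- identical(utf8ToInt(enc2utf8(raw_cell)),
                  utf8ToInt(enc2utf8(fmt_cell)))
cat("format.data.frame() preserved the string:", same, "\n")
```

If the comparison comes out FALSE, the characters were already replaced before print() saw them, which would be consistent with the explanation given in this message.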
[Rd] Unicode display problem with data frames under Windows
Here's a data frame with some Unicode symbols (set intersection and union).

    d <- data.frame(x = "A \u222a B \u2229 C")

Printing this data frame under R 3.2.0 patched (r68378) and Windows 7, I see

    d
    ##                  x
    ## 1 A <U+222A> B n C

Printing the column itself works fine.

    d$x
    ## [1] A ∪ B ∩ C
    ## Levels: A ∪ B ∩ C

The encoding is correctly UTF-8.

    Encoding(as.character(d$x))
    ## [1] "UTF-8"

Under Linux both forms of printing are fine for me.

I'm not quite sure whether I've missed a setting or if this is a bug, so: am I doing something silly? Can anyone else reproduce this?

--
Regards,
Richie

Learning R
4dpiecharts.com
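Editorial sketch, not part of the original message: when trying to reproduce the report, it may help to record what the string actually contains and which locale the session runs in. The snippet uses only base R; the variable names are illustrative.

```r
s <- "A \u222a B \u2229 C"

Encoding(s)                       # how R has marked this string
sprintf("U+%04X", utf8ToInt(s))   # code points actually stored in the string

# l10n_info() reports whether the session runs in a UTF-8 locale, the
# situation in which, according to this thread, the problem does not arise.
info <- l10n_info()
cat("LC_CTYPE:", Sys.getlocale("LC_CTYPE"), "\n")
cat("UTF-8 locale:", info[["UTF-8"]], "\n")
```

Including this output alongside sessionInfo() in a bug report should make it easier for others to tell a locale issue apart from a data issue.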
Re: [Rd] Unicode display problem with data frames under Windows
On 25/05/2015 11:37 AM, Ista Zahn wrote:

> AFAIK this is the way it works on Windows. It has been discussed in several places, e.g. http://stackoverflow.com/questions/17715956/why-do-some-unicode-characters-display-in-matrices-but-not-data-frames-in-r , http://stackoverflow.com/questions/17715956/why-do-some-unicode-characters-display-in-matrices-but-not-data-frames-in-r (both of these came up when I googled the subject line of your email).

Yes, but it is a bug, just a hard one to fix. It needs someone to dedicate a serious amount of time to deal with it. Since most of the people who tend to do that generally use systems in UTF-8 locales where this isn't a problem, or don't use Windows, it is languishing.

Duncan Murdoch
Re: [Rd] Unicode display problem with data frames under Windows
On 25/05/2015 12:43 PM, Duncan Murdoch wrote:

> Yes, but it is a bug, just a hard one to fix. It needs someone to dedicate a serious amount of time to deal with it. Since most of the people who tend to do that generally use systems in UTF-8 locales where this isn't a problem, or don't use Windows, it is languishing.

Oops, I meant to write "or don't use non-ASCII characters"; "UTF-8 locales" already implies non-Windows.

Duncan Murdoch
Re: [Rd] Unicode display problem with data frames under Windows
AFAIK this is the way it works on Windows. It has been discussed in several places, e.g. http://stackoverflow.com/questions/17715956/why-do-some-unicode-characters-display-in-matrices-but-not-data-frames-in-r , http://stackoverflow.com/questions/17715956/why-do-some-unicode-characters-display-in-matrices-but-not-data-frames-in-r (both of these came up when I googled the subject line of your email).

Best,
Ista
Re: [Rd] Unicode display problem with data frames under Windows
On .05.2015 at 18:43, Duncan Murdoch murdoch.dun...@gmail.com wrote:

> Yes, but it is a bug, just a hard one to fix. It needs someone to dedicate a serious amount of time to deal with it. Since most of the people who tend to do that generally use systems in UTF-8 locales where this isn't a problem, or don't use Windows, it is languishing.
>
> Duncan Murdoch

I understand that these problems are not easy to fix, but I think that "most of the people who tend to do that generally use systems in UTF-8 locales" is a biased perception. Developers might tend to use Mac or Linux most often. For others, Windows still is and probably will remain the OS most often used, and for most of them switching to something else is a major hurdle.

What I often witness is that those "non-existent" Windows users try to muddle through with numerous calls to Encoding(), iconv() and the like, while never being sure whether the strange behavior is due to their lack of understanding, to Windows specifics, or to R. In the end they either succeed with their muddling or give up, but they do not change the system.

So whoever might attempt the Herculean task will be praised by thousands ;-)

Best,
Peter
Re: [Rd] Unicode display problem with data frames under Windows
On 25/05/2015 3:12 PM, Peter Meissner wrote:

> I understand that these problems are not easy to fix, but I think that "most of the people who tend to do that generally use systems in UTF-8 locales" is a biased perception. Developers might tend to use Mac or Linux most often. For others, Windows still is and probably will remain the OS most often used, and for most of them switching to something else is a major hurdle.
>
> What I often witness is that those "non-existent" Windows users try to muddle through with numerous calls to Encoding(), iconv() and the like, while never being sure whether the strange behavior is due to their lack of understanding, to Windows specifics, or to R. In the end they either succeed with their muddling or give up, but they do not change the system.
>
> So whoever might attempt the Herculean task will be praised by thousands ;-)

I'm not sure we disagree. R is a volunteer project, and the things that get done are the things that someone volunteers to do. But in this particular case, the volunteer needs a lot of knowledge about R internals to make progress, and there just aren't that many people like that. They are all developers.

If you aren't one of those people, you need to motivate one of them to volunteer to take this on. I don't think a financial contribution would work, but people do return favours: so do something that makes one of the developers' lives a lot easier, and then point out how this particular bug is causing trouble for you, and maybe they'll choose to return the favour.

Duncan Murdoch