Re: [Rd] Unicode display problem with data frames under Windows

2015-05-26 Thread Richard Cotton
On 25 May 2015 at 19:43, Duncan Murdoch murdoch.dun...@gmail.com wrote:
 http://stackoverflow.com/questions/17715956/why-do-some-unicode-characters-display-in-matrices-but-not-data-frames-in-r

 Yes, but it is a bug, just a hard one to fix.  It needs someone to dedicate
 a serious amount of time to deal with it.

 Since most of the people who tend to do that generally use systems in UTF-8
 locales where this isn't a problem, or don't use Windows, it is languishing.

Thanks for the link and the explanation of why the bug exists.

 On May 25, 2015 9:39 AM, Richard Cotton richiero...@gmail.com wrote:

  Here's a data frame with some Unicode symbols (set intersection and
  union).
 
  d <- data.frame(x = "A \u222a B \u2229 C")
 
  Printing this data frame under R 3.2.0 patched (r68378) and Windows 7, I
  see
 
  d
  ##  x
  ## 1 A <U+222A> B n C

For future readers searching for a solution to this, you can get
correct printing by setting the CTYPE part of the locale to
Chinese/Japanese/Korean.

Sys.setlocale("LC_CTYPE", "Chinese")
## [1] "Chinese (Simplified)_People's Republic of China.936"

d
##           x
## 1 A ∪ B ∩ C
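
A minimal sketch of applying this workaround only for the duration of one
print call, so that the previous LC_CTYPE is restored afterwards. The helper
name print_with_ctype is hypothetical, and "Chinese" is only expected to be a
valid locale name on Windows; elsewhere Sys.setlocale() returns an empty
string and the object is printed under the unchanged locale. Changing
LC_CTYPE inside a running session may have side effects, so this is an
illustration rather than a recommendation.

```r
# Hypothetical helper (not from the thread): print `x` under a temporary
# LC_CTYPE setting and restore the previous one on exit.
print_with_ctype <- function(x, ctype = "Chinese") {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  # Sys.setlocale() returns "" (with a warning) if the request is not honoured
  new <- suppressWarnings(Sys.setlocale("LC_CTYPE", ctype))
  if (!nzchar(new)) {
    warning("locale '", ctype, "' is not available; printing with '", old, "'")
  }
  print(x)
  invisible(x)
}

d <- data.frame(x = "A \u222a B \u2229 C")
# Only attempt the locale switch where the locale name is expected to exist
if (.Platform$OS.type == "windows") print_with_ctype(d)
```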

-- 
Regards,
Richie

Learning R
4dpiecharts.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Unicode display problem with data frames under Windows

2015-05-26 Thread Peter Meissner

On .05.2015 at 09:01, Richard Cotton richiero...@gmail.com wrote:


For future readers searching for a solution to this, you can get
correct printing by setting the CTYPE part of the locale to
Chinese/Japanese/Korean.

Sys.setlocale("LC_CTYPE", "Chinese")
## [1] "Chinese (Simplified)_People's Republic of China.936"

d
##           x
## 1 A ∪ B ∩ C




There is another workaround.

The character mangling when printing data frames stems from the format()
call (format.data.frame()) inside print.data.frame(). Defining your own
class with a print method that skips format() gives correct printing in
all locales.


Like this:


d <- data.frame(x = "A \u222a B \u2229 C")
d
##  x
## 1 A <U+222A> B n C


class(d) <- c("unicode_df", "data.frame")

# this is print.data.frame from base R with only two lines modified, see #old#
print.unicode_df <- function(x, ..., digits = NULL, quote = FALSE,
                             right = TRUE, row.names = TRUE)
{
    n <- length(row.names(x))
    if (length(x) == 0L) {
        # zero-column data frame: report the number of rows only
        cat(sprintf(ngettext(n, "data frame with 0 columns and %d row",
                             "data frame with 0 columns and %d rows",
                             domain = "R-base"),
                    n), "\n", sep = "")
    }
    else if (n == 0L) {
        # zero-row data frame: print the column names only
        print.default(names(x), quote = FALSE)
        cat(gettext("<0 rows> (or 0-length row.names)\n"))
    }
    else {
        #old# m <- as.matrix(format.data.frame(x, digits = digits,
        #old#                                  na.encode = FALSE))
        m <- as.matrix(x)
        if (!isTRUE(row.names))
            dimnames(m)[[1L]] <- if (identical(row.names, FALSE))
                rep.int("", n)
            else row.names
        print(m, ..., quote = quote, right = right)
    }
    invisible(x)
}


d
##              x
## [1,] A ∪ B ∩ C
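
The same idea can be sketched without defining a class: convert the data
frame to a character matrix and print that, which bypasses
format.data.frame(). The helper name print_as_matrix is hypothetical.
Passing rownames.force = TRUE to as.matrix() keeps the row labels "1",
"2", ... instead of "[1,]", "[2,]". One limitation to assume: numeric
columns are converted to text by as.matrix(), so print arguments such as
digits no longer control them.

```r
# Hypothetical helper (not from the thread): print a data frame via its
# character matrix, skipping format.data.frame().
print_as_matrix <- function(df, ..., quote = FALSE, right = TRUE) {
  stopifnot(is.data.frame(df))
  # rownames.force = TRUE keeps the data frame's row names on the matrix
  m <- as.matrix(df, rownames.force = TRUE)
  print(m, ..., quote = quote, right = right)
  invisible(df)
}

d <- data.frame(x = "A \u222a B \u2229 C", n = 1.5)
print_as_matrix(d)
```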






[Rd] Unicode display problem with data frames under Windows

2015-05-25 Thread Richard Cotton
Here's a data frame with some Unicode symbols (set intersection and union).

d <- data.frame(x = "A \u222a B \u2229 C")

Printing this data frame under R 3.2.0 patched (r68378) and Windows 7, I see

d
##  x
## 1 A <U+222A> B n C

Printing the column itself works fine.

d$x
## [1] A ∪ B ∩ C
## Levels: A ∪ B ∩ C

The encoding is correctly UTF-8.

Encoding(as.character(d$x))
## [1] "UTF-8"

Under Linux both forms of printing are fine for me.
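
A minimal sketch of how the relevant session facts could be collected when
reproducing this: the LC_CTYPE setting, whether the session runs in a UTF-8
locale, the declared encoding of the strings, and the code points they
actually contain. The helper name describe_session is hypothetical; the
functions it calls are all base R.

```r
# Hypothetical helper (not from the thread): gather the locale and encoding
# details that matter for this report.
describe_session <- function(x) {
  x <- as.character(x)  # works for factor and character columns alike
  list(
    ctype       = Sys.getlocale("LC_CTYPE"),       # current character-type locale
    utf8_locale = isTRUE(l10n_info()[["UTF-8"]]),  # is the session locale UTF-8?
    encoding    = Encoding(x),                     # declared encoding per string
    code_points = lapply(enc2utf8(x), utf8ToInt)   # Unicode code points per string
  )
}

d <- data.frame(x = "A \u222a B \u2229 C")
info <- describe_session(d$x)
info$utf8_locale
info$encoding
info$code_points[[1]]
```

If the code points include 8746 (U+222A) and 8745 (U+2229), the data itself
is intact and only the display is affected.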

I'm not quite sure whether I've missed a setting or whether this is a bug, so:

Am I doing something silly?
Can anyone else reproduce this?

-- 
Regards,
Richie

Learning R
4dpiecharts.com



Re: [Rd] Unicode display problem with data frames under Windows

2015-05-25 Thread Duncan Murdoch

On 25/05/2015 11:37 AM, Ista Zahn wrote:

AFAIK this is the way it works on Windows. It has been discussed in several
places, e.g.
http://stackoverflow.com/questions/17715956/why-do-some-unicode-characters-display-in-matrices-but-not-data-frames-in-r
,
http://stackoverflow.com/questions/17715956/why-do-some-unicode-characters-display-in-matrices-but-not-data-frames-in-r
(both of these came up when I googled the subject line of your email).


Yes, but it is a bug, just a hard one to fix.  It needs someone to 
dedicate a serious amount of time to deal with it.


Since most of the people who tend to do that generally use systems in 
UTF-8 locales where this isn't a problem, or don't use Windows, it is 
languishing.


Duncan Murdoch




Re: [Rd] Unicode display problem with data frames under Windows

2015-05-25 Thread Duncan Murdoch

On 25/05/2015 12:43 PM, Duncan Murdoch wrote:

On 25/05/2015 11:37 AM, Ista Zahn wrote:
 AFAIK this is the way it works on Windows. It has been discussed in several
 places, e.g.
 
http://stackoverflow.com/questions/17715956/why-do-some-unicode-characters-display-in-matrices-but-not-data-frames-in-r
 ,
 
http://stackoverflow.com/questions/17715956/why-do-some-unicode-characters-display-in-matrices-but-not-data-frames-in-r
 (both of these came up when I googled the subject line of your email).

Yes, but it is a bug, just a hard one to fix.  It needs someone to
dedicate a serious amount of time to deal with it.

Since most of the people who tend to do that generally use systems in
UTF-8 locales where this isn't a problem, or don't use Windows, it is
languishing.


Oops, I meant to write "or don't use non-ASCII characters"; a UTF-8
locale already implies non-Windows.


Duncan Murdoch



Re: [Rd] Unicode display problem with data frames under Windows

2015-05-25 Thread Ista Zahn
AFAIK this is the way it works on Windows. It has been discussed in several
places, e.g.
http://stackoverflow.com/questions/17715956/why-do-some-unicode-characters-display-in-matrices-but-not-data-frames-in-r
,
http://stackoverflow.com/questions/17715956/why-do-some-unicode-characters-display-in-matrices-but-not-data-frames-in-r
(both of these came up when I googled the subject line of your email).

Best,
Ista


Re: [Rd] Unicode display problem with data frames under Windows

2015-05-25 Thread Peter Meissner

On .05.2015 at 18:43, Duncan Murdoch murdoch.dun...@gmail.com wrote:


On 25/05/2015 11:37 AM, Ista Zahn wrote:
AFAIK this is the way it works on Windows. It has been discussed in several
places, e.g.
http://stackoverflow.com/questions/17715956/why-do-some-unicode-characters-display-in-matrices-but-not-data-frames-in-r
,
http://stackoverflow.com/questions/17715956/why-do-some-unicode-characters-display-in-matrices-but-not-data-frames-in-r
(both of these came up when I googled the subject line of your email).


Yes, but it is a bug, just a hard one to fix.  It needs someone to  
dedicate a serious amount of time to deal with it.


Since most of the people who tend to do that generally use systems in  
UTF-8 locales where this isn't a problem, or don't use Windows, it is  
languishing.


Duncan Murdoch



I understand that these problems are not easy to fix, but ...

I think that "most of the people who tend to do that generally use systems
in UTF-8 locales" is a biased perception. Developers might tend to use Mac
or Linux most often. For others, Windows still is, and probably will remain,
the OS most often used. For most of them, switching to something else is a
major hurdle.

What I often witness is that those "non-existent" Windows users try to
muddle through with numerous calls to Encoding(), iconv() and the like,
while never being sure whether the strange behavior is due to their own
lack of understanding, to Windows specifics, or to R. In the end they
either succeed with their muddling or give up, but they do not change the
system.

So whoever attempts this Herculean task will be praised by thousands ;-)

Best, Peter






Re: [Rd] Unicode display problem with data frames under Windows

2015-05-25 Thread Duncan Murdoch

On 25/05/2015 3:12 PM, Peter Meissner wrote:

I understand that these problems are not easy to fix, but ...

I think that "most of the people who tend to do that generally use systems
in UTF-8 locales" is a biased perception. Developers might tend to use Mac
or Linux most often. For others, Windows still is, and probably will remain,
the OS most often used. For most of them, switching to something else is a
major hurdle.

What I often witness is that those "non-existent" Windows users try to
muddle through with numerous calls to Encoding(), iconv() and the like,
while never being sure whether the strange behavior is due to their own
lack of understanding, to Windows specifics, or to R. In the end they
either succeed with their muddling or give up, but they do not change the
system.

So whoever attempts this Herculean task will be praised by thousands ;-)

I'm not sure we disagree.  R is a volunteer project, and the things that
get done are the things that someone volunteers to do.  But in this
particular case, the volunteer needs a lot of knowledge about R
internals to make progress, and there just aren't that many people like
that.  They are all developers.


If you aren't one of those people, you need to motivate one of them to 
volunteer to take this on.  I don't think a financial contribution would 
work, but people do return favours:  so do something that makes one of 
the developers' lives a lot easier, and then point out how this 
particular bug is causing trouble for you, and maybe they'll choose to 
return the favour.


Duncan Murdoch
