[R] UTF-8 or Unicode on Windows PC

2008-04-21 Thread Hans-Joerg Bibiko
Dear all,

is it possible to set up RGUI or JGR on Windows PC to UTF-8 encoding?

I looked for it in the mailing lists and in the documentation, but I
couldn't figure it out.

My problem is e.g. to split a given string containing German and  
Russian words into characters.
example:

 > a <- "asdШas"
 > strsplit(a,NULL)
[[1]]
[1] "a" "s" "d" "Ш" "a" "s"

This works on any Mac or Linux computer, but I haven't found a way to
make it work on Windows.
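
For reference, a minimal sketch of the same split written so that it does not depend on how the console or the script file is encoded. It assumes R 2.7.0 or later and spells the Cyrillic letter with a \u escape, so that R marks the string as UTF-8 itself. The variable names are illustrative only.

```r
# Build the string with a \u escape (U+0428 is CYRILLIC CAPITAL LETTER SHA),
# so the literal does not depend on the encoding of the console or script.
a <- "asd\u0428as"

# Show how R has marked the encoding of the string.
Encoding(a)

# Split into individual characters; "" and NULL both mean "split everywhere".
chars <- strsplit(a, "")[[1]]
length(chars)

# Inspect the fourth character by code point rather than by how it prints.
utf8ToInt(chars[4])
```

Checking the code point rather than the printed glyph sidesteps the question of whether the console font or locale can display the character.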

I tried to set options(encoding) to UTF-8, I tried to use the Perl  
mode in strsplit, but I had no success. At least by using JGR I was  
able to type Russian and see my text correctly but strsplit failed.

I set RGUI to a Unicode font, no success.

I tried to save a script file in UTF-8 or UTF-16 and I tried to run  
source(FILE, encoding="***"), no success.

Is there really no way to use a Windows PC and R to work with Unicode  
texts?

Many thanks in advance for each hint,

--Hans
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] UTF-8 or Unicode on Windows PC

2008-04-21 Thread Prof Brian Ripley
You didn't tell us your R version (or your locale).  Windows has no UTF-8 
locales, so a lot of work has had to be done to allow Unicode chars to be 
handled on Windows.
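
As a hedged illustration of the information asked for here, the following sketch collects the R version and locale from within a session. It uses only base R functions; what they report depends entirely on the machine they are run on.

```r
# R version string, e.g. to include when reporting a problem.
R.version.string

# Current locale settings (LC_CTYPE determines the character set in use).
Sys.getlocale()

# Whether the session's native encoding is multi-byte and/or UTF-8.
info <- l10n_info()
info[["MBCS"]]
info[["UTF-8"]]

# sessionInfo() gathers version, platform, and locale in one report.
sessionInfo()
```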


Please look into 2.7.0 RC, and in particular its CHANGES file at

https://svn.r-project.org/R/branches/R-2-7-branch/src/gnuwin32/CHANGES





--
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Re: [R] UTF-8 or Unicode on Windows PC

2008-04-21 Thread Hans-Joerg Bibiko
On 21 Apr 2008, at 11:33, Prof Brian Ripley wrote:
> You didn't tell us your R version (or your locale).  Windows has no  
> UTF-8 locales, so a lot of work has had to be done to allow Unicode  
> chars to be handled on Windows.
It was more or less a general question on R running on Windows PCs.
Normally I'm using R on a Mac or Linux. But some of my students asked  
for the Unicode support for Windows' RGUI.

> Please look into 2.7.0 RC, and in particular its CHANGES file at
>
> https://svn.r-project.org/R/branches/R-2-7-branch/src/gnuwin32/CHANGES
This is really good news!
I would like to express my gratitude toward anyone who was/is involved  
in that development!


Is it possible to download a compiled snapshot of 2.7.0 for Windows XP?

Thanks a lot,

--Hans



Re: [R] UTF-8 or Unicode on Windows PC

2008-04-21 Thread Prof Brian Ripley
On Mon, 21 Apr 2008, Hans-Joerg Bibiko wrote:

> On 21 Apr 2008, at 11:33, Prof Brian Ripley wrote:
>> You didn't tell us your R version (or your locale).  Windows has no UTF-8 
>> locales, so a lot of work has had to be done to allow Unicode chars to be 
>> handled on Windows.

> It was more or less a general question on R running on Windows PCs.
> Normally I'm using R on a Mac or Linux. But some of my students asked for the 
> Unicode support for Windows' RGUI.
>
>> Please look into 2.7.0 RC, and in particular its CHANGES file at
>> 
>> https://svn.r-project.org/R/branches/R-2-7-branch/src/gnuwin32/CHANGES
> This is really good news!
> I would like to express my gratitude toward anyone who was/is involved in 
> that development!

Thanks for the thanks.

> Is it possible to download a compiled snapshot of 2.7.0 for Windows XP?

Yes, http://cran.r-project.org/bin/windows/base/rtest.html
And it is due for release tomorrow.



Re: [R] UTF-8 or Unicode on Windows PC

2008-04-21 Thread Hans-Joerg Bibiko

On 21 Apr 2008, at 12:33, Prof Brian Ripley wrote:

>> Is it possible to download a compiled snapshot of 2.7.0 for Windows  
>> XP?
>
> Yes, http://cran.r-project.org/bin/windows/base/rtest.html
> And it is due for release tomorrow.

Many thanks! I can see the progress :)

But please forgive my incompetence. I'm not so familiar with Windows.
If I start e.g. RGUI by using: Rgui.exe LC_CTYPE=ja I can type  
Japanese, Russian, and German. strsplit works perfectly! ;)
But if I type for instance a German umlaut 'ü' it comes out as 'u'.  
OK, it is due to the fact I didn't set up Rgui in UTF-8 mode.
But how can I do this? My data are written in many different  
languages, and I want to do some statistics.

R version 2.7.0 RC (2008-04-19 r45391)
i386-pc-mingw32

locales:
all to German_Germany.1252
LC_CTYPE=Japanese_Japan.932

###

There are some minor issues.
I set Rgui's font to "Arial Unicode". This works but I have some  
troubles to place my cursor, caused by the issue that Arial Unicode is  
not a monospaced font.

If I start up Rgui in German, I can see the localized menu items, but  
for each non-ASCII character I see cryptic things. It seems to me that  
the localized strings are written in UTF-8, and Rgui expects ANSI  
characters.

###
Nevertheless, thanks a lot!

--Hans



Re: [R] UTF-8 or Unicode on Windows PC

2008-04-21 Thread Prof Brian Ripley

On Mon, 21 Apr 2008, Hans-Joerg Bibiko wrote:

> On 21 Apr 2008, at 12:33, Prof Brian Ripley wrote:
>
>>> Is it possible to download a compiled snapshot of 2.7.0 for Windows XP?
>>
>> Yes, http://cran.r-project.org/bin/windows/base/rtest.html
>> And it is due for release tomorrow.
>
> Many thanks! I can see the progress :)
>
> But please forgive my incompetence. I'm not so familiar with Windows.
> If I start e.g. RGUI by using: Rgui.exe LC_CTYPE=ja I can type Japanese,
> Russian, and German. strsplit works perfectly! ;)
> But if I type for instance a German umlaut 'ü' it comes out as 'u'. OK, it is
> due to the fact I didn't set up Rgui in UTF-8 mode.


Entering at the keyboard in more than one language is close to impossible 
(not quite, as 'Japanese' covers a few but you need a Japanese keyboard to 
do it).  You can't change the language of Windows just by setting locales.


> But how can I do this? My data are written in many different languages, and I
> want to do some statistics.


You can read in files in known encodings, though.
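
A minimal sketch of reading data whose encoding is known, assuming a reasonably recent R. The temporary file and its contents are invented for illustration; readLines(), iconv(), and the other calls are base R. The first part declares a file as UTF-8 when reading it, and the second converts a byte string known to be Latin-1.

```r
# Write two UTF-8 lines to a temporary file to stand in for real data:
# a German word and a Russian word, spelled with \u escapes.
path <- tempfile(fileext = ".txt")
writeLines(c("Stra\u00dfe", "\u0428\u043a\u043e\u043b\u0430"), path,
           useBytes = TRUE)

# Declare the file's encoding when reading, so the strings are marked UTF-8.
x <- readLines(path, encoding = "UTF-8")
Encoding(x)
nchar(x, type = "chars")

# A byte string known to be Latin-1 (0xFC is u-umlaut), converted to UTF-8.
latin1_bytes <- rawToChar(as.raw(0xfc))
u_umlaut <- iconv(latin1_bytes, from = "latin1", to = "UTF-8")
utf8ToInt(u_umlaut)

# Remove the temporary file again.
unlink(path)
```

Declaring the encoding at the point of input keeps the data independent of the session locale; whether every character can then be displayed still depends on the console and font.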


> R version 2.7.0 RC (2008-04-19 r45391)
> i386-pc-mingw32
>
> locales:
> all to German_Germany.1252
> LC_CTYPE=Japanese_Japan.932
>
> ###
>
> There are some minor issues.
> I set Rgui's font to "Arial Unicode". This works but I have some troubles to
> place my cursor, caused by the issue that Arial Unicode is not a monospaced
> font.


Right, and you are warned not to do that.  You must use a fixed-width 
font, and for CJK characters, one in the standard single/double spacing.


(See for example the comments in Rconsole and rw-FAQ 3.5.  The GUI 
preferences dialog only offers fixed-width fonts, so you have to work 
quite hard to do anything else.)


> If I start up Rgui in German, I can see the localized menu items, but for
> each non-ASCII character I see cryptic things. It seems to me that the
> localized strings are written in UTF-8, and Rgui expects ANSI characters.


Argh, yes, that was an error by the translator in marking the file -- 
thanks, I just have time to fix it.  (RGui does not expect ANSI, but all 
of R does expect translations to be in the encoding they are declared to 
be -- this was declared as ISO-8859-1.)



> ###
> Nevertheless, thanks a lot!
>
> --Hans






Re: [R] UTF-8 or Unicode on Windows PC

2008-04-22 Thread Hans-Joerg Bibiko

On 21 Apr 2008, at 12:33, Prof Brian Ripley wrote:

>> Is it possible to download a compiled snapshot of 2.7.0 for Windows  
>> XP?
> Yes, http://cran.r-project.org/bin/windows/base/rtest.html
> And it is due for release tomorrow.

I played with 2.7.0 on Windows XP. I can do things which couldn't be  
done with 2.6.x. Many many thanks for the effort!!!

But I always reached a point where I couldn't find a solution, because
Windows has no UTF-8 locale(s).
Does Windows Vista have UTF-8 locales?
If I'm dealing with known languages I can get around a lot of the
problems.

But my/our problem is that we have to deal with different languages at  
the same time [in a data.frame]. Furthermore I/we have to deal with  
IPA symbols, which haven't a locale; and grep, strsplit, etc. are set  
up on top of the chosen locale. Thus I'm not able to use strsplit on a  
string which contains German, Russian, IPA-symbols, because all glyphs  
which are not part of the chosen locale are displayed [e.g. as output  
of strsplit()] literally as .
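
A hedged sketch of the mixed-language case described above, assuming a recent R in which strings marked as UTF-8 are split and searched by character regardless of the session locale. The data frame, its column names, and the example words are invented for illustration. On a console that cannot represent a character, R typically prints it as an escape such as <U+0428>, even though the stored string is intact.

```r
# One German word, one Russian word, and one IPA transcription, all spelled
# with \u escapes so that R marks them as UTF-8.
words <- c("M\u00fcller",
           "\u0428\u043a\u043e\u043b\u0430",
           "\u02c8\u0283u\u02d0l\u0259")
df <- data.frame(lang = c("de", "ru", "ipa"), word = words,
                 stringsAsFactors = FALSE)

# Character counts, independent of how many bytes each character needs.
df$n_chars <- nchar(df$word, type = "chars")

# Split every entry into its characters and count the pieces.
pieces <- strsplit(df$word, "")
n_pieces <- sapply(pieces, length)

# Fixed-string search for the IPA length mark U+02D0.
has_length_mark <- grepl("\u02d0", df$word, fixed = TRUE)
```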

That's why the only solution is to use a UTF-8 environment (OS) or,
for hard-liners, to transform each glyph into numbers and to do
research on those numbers (which is really annoying ;).
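
The "numbers" route mentioned above can be sketched with base R's utf8ToInt() and intToUtf8(), which convert between a UTF-8 string and its Unicode code points. The example string and variable names are illustrative; the range test for the Cyrillic block (U+0400 to U+04FF) is just one example of working on the code points directly.

```r
# A short string mixing ASCII, Cyrillic (U+0428), and IPA schwa (U+0259).
s <- "d\u0428\u0259"

# String -> integer code points.
cp <- utf8ToInt(s)

# Code points formatted in the usual U+XXXX notation.
labels <- sprintf("U+%04X", cp)

# Work on the numbers, e.g. flag characters in the Cyrillic block.
is_cyrillic <- cp >= 0x0400 & cp <= 0x04FF

# Integer code points -> one string, or one string per character.
back <- intToUtf8(cp)
chars <- intToUtf8(cp, multiple = TRUE)
```

Because the round trip through code points is lossless, the results can be converted back to text once the analysis is done.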

Unfortunately, at this point I have to give up. Maybe there is someone
who can give me further advice for Windows.
The only other thing I have in mind is to use Perl, Python, etc.
beforehand to manipulate the data before they are analyzed in R.


--Hans
