[R] UTF-8 or Unicode on Windows PC
Dear all,

is it possible to set up RGUI or JGR on a Windows PC to use UTF-8 encoding? I looked in the mailing lists and in the documentation, but I couldn't figure it out.

My problem is, e.g., to split a given string containing German and Russian words into characters. Example:

    > a <- "asdШas"
    > strsplit(a, NULL)
    [[1]]
    [1] "a" "s" "d" "Ш" "a" "s"

This works on every Mac or Linux computer, but I didn't find a way for Windows. I tried setting options(encoding) to UTF-8 and I tried the Perl mode in strsplit, but I had no success. With JGR I was at least able to type Russian and see my text correctly, but strsplit failed. I set RGUI to a Unicode font, no success. I tried saving a script file in UTF-8 or UTF-16 and running source(FILE, encoding="***"), no success.

Is there really no way to use a Windows PC and R to work with Unicode texts?

Many thanks in advance for any hint,

--Hans

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
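For reference, the split asked about above can be sketched as follows. This is a minimal illustration, assuming a reasonably recent R; it says nothing about how R 2.6.x behaves on Windows. Writing the Cyrillic letter as a \u escape is a choice made here so that the result does not depend on the encoding of the script file.

```r
# Build the string with a \u escape (U+0428 is the Cyrillic letter Sha),
# so the example does not depend on how the script file is encoded.
a <- "asd\u0428as"

# Split into single characters.
chars <- strsplit(a, "")[[1]]

# Characters versus bytes: Sha takes two bytes in UTF-8.
n_chars <- nchar(a, type = "chars")   # 6
n_bytes <- nchar(a, type = "bytes")   # 7

print(chars)
```

Encoding(a) can be used to check how R has marked the string.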
Re: [R] UTF-8 or Unicode on Windows PC
You didn't tell us your R version (or your locale).

Windows has no UTF-8 locales, so a lot of work has had to be done to allow Unicode chars to be handled on Windows. Please look into 2.7.0 RC, and in particular its CHANGES file at

https://svn.r-project.org/R/branches/R-2-7-branch/src/gnuwin32/CHANGES

On Mon, 21 Apr 2008, Hans-Joerg Bibiko wrote:

> Dear all, is it possible to set up RGUI or JGR on Windows PC to UTF-8
> encoding?
> [...]

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
Re: [R] UTF-8 or Unicode on Windows PC
On 21 Apr 2008, at 11:33, Prof Brian Ripley wrote:

> You didn't tell us your R version (or your locale). Windows has no
> UTF-8 locales, so a lot of work has had to be done to allow Unicode
> chars to be handled on Windows.

It was more or less a general question about R running on Windows PCs. Normally I use R on a Mac or on Linux, but some of my students asked about Unicode support in the Windows RGUI.

> Please look into 2.7.0 RC, and in particular its CHANGES file at
>
> https://svn.r-project.org/R/branches/R-2-7-branch/src/gnuwin32/CHANGES

This is really good news! I would like to express my gratitude to everyone who was or is involved in that development!

Is it possible to download a compiled snapshot of 2.7.0 for Windows XP?

Thanks a lot,

--Hans
Re: [R] UTF-8 or Unicode on Windows PC
On Mon, 21 Apr 2008, Hans-Joerg Bibiko wrote:

> On 21 Apr 2008, at 11:33, Prof Brian Ripley wrote:
>
>> You didn't tell us your R version (or your locale). Windows has no UTF-8
>> locales, so a lot of work has had to be done to allow Unicode chars to be
>> handled on Windows.
>
> It was more or less a general question about R running on Windows PCs.
> Normally I use R on a Mac or on Linux, but some of my students asked about
> Unicode support in the Windows RGUI.
>
>> Please look into 2.7.0 RC, and in particular its CHANGES file at
>>
>> https://svn.r-project.org/R/branches/R-2-7-branch/src/gnuwin32/CHANGES
>
> This is really good news!
> I would like to express my gratitude to everyone who was or is involved in
> that development!

Thanks for the thanks.

> Is it possible to download a compiled snapshot of 2.7.0 for Windows XP?

Yes, http://cran.r-project.org/bin/windows/base/rtest.html

And it is due for release tomorrow.

-- 
Brian D. Ripley
Re: [R] UTF-8 or Unicode on Windows PC
On 21 Apr 2008, at 12:33, Prof Brian Ripley wrote:

>> Is it possible to download a compiled snapshot of 2.7.0 for Windows
>> XP?
>
> Yes, http://cran.r-project.org/bin/windows/base/rtest.html
> And it is due for release tomorrow.

Many thanks! I can see the progress :)

But please forgive my incompetence. I'm not so familiar with Windows.

If I start e.g. RGUI by using:

    Rgui.exe LC_CTYPE=ja

I can type Japanese, Russian, and German. strsplit works perfectly! ;)

But if I type, for instance, a German umlaut 'ü', it comes out as 'u'. OK, this is because I didn't set up Rgui in UTF-8 mode. But how can I do this? My data are written in many different languages, and I want to do some statistics.

    R version 2.7.0 RC (2008-04-19 r45391)
    i386-pc-mingw32
    locales: all to German_Germany.1252
    LC_CTYPE=Japanese_Japan.932

###
There are some minor issues.

I set Rgui's font to "Arial Unicode". This works, but I have some trouble placing my cursor, because Arial Unicode is not a monospaced font.

If I start up Rgui in German, I can see the localized menu items, but for each non-ASCII character I see cryptic things. It seems to me that the localized strings are written in UTF-8, and Rgui expects ANSI characters.
###

Nevertheless, thanks a lot!

--Hans
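The locale information quoted in the message above can be inspected from within R. A small sketch, assuming a current R on any platform; the values returned depend entirely on the system, so none are shown here.

```r
# The character-type locale that R is running in, e.g. the
# LC_CTYPE value reported in the session information above.
ctype <- Sys.getlocale("LC_CTYPE")

# l10n_info() reports, among other things, whether the current
# locale is multi-byte and whether it is a UTF-8 locale.
info <- l10n_info()

print(ctype)
print(info[["UTF-8"]])
print(info[["MBCS"]])
```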
Re: [R] UTF-8 or Unicode on Windows PC
On Mon, 21 Apr 2008, Hans-Joerg Bibiko wrote:

> On 21 Apr 2008, at 12:33, Prof Brian Ripley wrote:
>
>>> Is it possible to download a compiled snapshot of 2.7.0 for Windows XP?
>>
>> Yes, http://cran.r-project.org/bin/windows/base/rtest.html
>> And it is due for release tomorrow.
>
> Many thanks! I can see the progress :)
>
> But please forgive my incompetence. I'm not so familiar with Windows.
>
> If I start e.g. RGUI by using:
> Rgui.exe LC_CTYPE=ja
> I can type Japanese, Russian, and German. strsplit works perfectly! ;)
>
> But if I type, for instance, a German umlaut 'ü', it comes out as 'u'. OK,
> this is because I didn't set up Rgui in UTF-8 mode.

Entering at the keyboard in more than one language is close to impossible (not quite, as 'Japanese' covers a few, but you need a Japanese keyboard to do it). You can't change the language of Windows just by setting locales.

> But how can I do this? My data are written in many different languages,
> and I want to do some statistics.

You can read in files in known encodings, though.

> R version 2.7.0 RC (2008-04-19 r45391)
> i386-pc-mingw32
> locales: all to German_Germany.1252
> LC_CTYPE=Japanese_Japan.932
>
> ###
> There are some minor issues.
>
> I set Rgui's font to "Arial Unicode". This works, but I have some trouble
> placing my cursor, because Arial Unicode is not a monospaced font.

Right, and you are warned not to do that. You must use a fixed-width font, and for CJK characters, one in the standard single/double spacing. (See for example the comments in Rconsole and rw-FAQ 3.5. The GUI preferences dialog only offers fixed-width fonts, so you have to work quite hard to do anything else.)

> If I start up Rgui in German, I can see the localized menu items, but for
> each non-ASCII character I see cryptic things. It seems to me that the
> localized strings are written in UTF-8, and Rgui expects ANSI characters.

Argh, yes, that was an error by the translator in marking the file -- thanks, I just have time to fix it.

(RGui does not expect ANSI, but all of R does expect translations to be in the encoding they are declared to be -- this was declared as ISO-8859-1.)

> ###
>
> Nevertheless, thanks a lot!
>
> --Hans

-- 
Brian D. Ripley
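The advice above about reading files in known encodings might look like the sketch below. It is only an illustration under stated assumptions: the file is a temporary one created here, its contents are made up, and the encoding argument of readLines() is used to mark the strings as UTF-8 rather than to re-encode them into the current locale. Whether that matches what was meant in the reply is an assumption.

```r
# Hypothetical input: two lines (German and Russian) written as UTF-8 bytes.
txt <- enc2utf8(c("Gr\u00fc\u00dfe", "\u041f\u0440\u0438\u0432\u0435\u0442"))
path <- tempfile(fileext = ".txt")
writeLines(txt, path, useBytes = TRUE)

# Read the file back, declaring that its contents are UTF-8.
# The strings are marked as UTF-8, not converted to the native encoding.
x <- readLines(path, encoding = "UTF-8")

# Character counts per line, independent of the byte length.
n <- nchar(x, type = "chars")
print(n)

unlink(path)
```

For a file in another known encoding, iconv() can convert the strings to UTF-8 after reading.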
Re: [R] UTF-8 or Unicode on Windows PC
On 21 Apr 2008, at 12:33, Prof Brian Ripley wrote:

>> Is it possible to download a compiled snapshot of 2.7.0 for Windows
>> XP?
>
> Yes, http://cran.r-project.org/bin/windows/base/rtest.html
> And it is due for release tomorrow.

I played with 2.7.0 on Windows XP. I can do things which couldn't be done with 2.6.x. Many, many thanks for the effort!!!

But I always came to a point where I didn't find a solution, because Windows has no UTF-8 locale(s). Does Windows Vista have UTF-8 locales?

If I'm dealing with known languages, I can get around a lot of things. But my/our problem is that we have to deal with different languages at the same time [in a data.frame]. Furthermore, I/we have to deal with IPA symbols, which have no locale, and grep, strsplit, etc. are built on top of the chosen locale. Thus I'm not able to use strsplit on a string which contains German, Russian, and IPA symbols, because all glyphs which are not part of the chosen locale are displayed [e.g. as output of strsplit()] literally as <U+xxxx>.

That's why the only solution is to use a UTF-8 environment (OS) or, for hard-liners, to transform each glyph into numbers and to do research on those numbers (which is really annoying ;).

Unfortunately, at this point I have to give up. Maybe there is someone who can give me further advice for Windows.

The only other thing I have in mind is to use Perl, Python, etc. beforehand to manipulate the data before they are analyzed using R.

--Hans
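The "transform each glyph into numbers" workaround mentioned above can be sketched with utf8ToInt() and intToUtf8(). This is an illustration only, assuming a current R; the sample string and the Cyrillic range test are made up for the example.

```r
# Hypothetical mixed string: Latin a, Cyrillic Sha (U+0428), IPA schwa (U+0259).
s <- "a\u0428\u0259"

# One integer code point per character.
cp <- utf8ToInt(s)
print(cp)                         # 97 1064 601

# Code points can be labelled and tested without relying on the locale.
labels <- sprintf("U+%04X", cp)   # "U+0061" "U+0428" "U+0259"
is_cyrillic <- cp >= 0x0400 & cp <= 0x04FF

# Back from numbers to single characters.
chars <- intToUtf8(cp, multiple = TRUE)
print(labels)
print(is_cyrillic)
```

Working on the integer vector sidesteps the locale for comparisons and counting, at the cost of having to convert back for display.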