Thanks! I ran my C++ program (based on R externals) with the same R script, and the results shown are the desired glyphs. Hence this is a problem specific to R on Windows.
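The distinction discussed in the quoted thread below (the escape text `<U+BD80><U+C2E4>` versus the UTF-8 bytes of 부실) can be made concrete on the C++ side. What follows is only a sketch of a possible stopgap, not anything from R's API: it assumes the string handed back to the C++ program contains escapes of exactly the form `<U+XXXX>` with 4 to 8 hex digits, and rewrites each one as UTF-8 bytes. The helper names `hex_value`, `append_utf8` and `decode_u_escapes` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if `c` is not a hex digit.
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: append the UTF-8 encoding of one code point to `out`.
void append_utf8(std::string& out, std::uint32_t cp) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Replace each well-formed "<U+XXXX>" escape (assumed: 4 to 8 hex digits)
// with the UTF-8 bytes of that code point. Anything else is copied unchanged.
std::string decode_u_escapes(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (in.compare(i, 3, "<U+") == 0) {
            std::size_t j = i + 3;
            std::uint32_t cp = 0;
            std::size_t digits = 0;
            while (j < in.size() && digits < 8 && hex_value(in[j]) >= 0) {
                cp = cp * 16 + static_cast<std::uint32_t>(hex_value(in[j]));
                ++j;
                ++digits;
            }
            const bool is_surrogate = (cp >= 0xD800 && cp <= 0xDFFF);
            if (digits >= 4 && j < in.size() && in[j] == '>' &&
                cp <= 0x10FFFF && !is_surrogate) {
                append_utf8(out, cp);
                i = j + 1;  // skip past the closing '>'
                continue;
            }
        }
        out += in[i++];  // ordinary byte, or a malformed escape left as-is
    }
    return out;
}
```

Calling `decode_u_escapes("<U+BD80><U+C2E4>")` on the string reported in this thread yields the six bytes EB B6 80 EC 8B A4, which is the UTF-8 encoding of 부실. Note that this cannot distinguish an escape produced by a failed conversion from a string that genuinely contains the literal text `<U+BD80>`, so it is a workaround rather than a fix for the underlying conversion.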
On Wed, Jun 2, 2021 at 9:08 PM brodie gaslam <brodie.gas...@yahoo.com> wrote:

>
> On Wednesday, June 2, 2021, 7:58:54 PM EDT, xiaoyan yu <xiaoyan...@gmail.com> wrote:
>
> > I am using gmail. Not sure of the configuration of plain text.
> > The memory pointed by the char * as the output of Rf_translateChar() is
> > actually the string "<U+BD80><U+C2E4>".
>
> Hi Xiaoyan,
>
> Unfortunately I'm not super familiar with R on Windows, but I think
> I can provide a simpler reproducible example. In Rgui, if I type "\UBD80"
> at the prompt and hit enter, I see the desired glyph. In Rterm I see the
> unicode escape.
>
> IIRC the capabilities of Rterm and Rgui are different, and UTF8 support
> in windows is limited. Tomas Kalibera discusses this in some detail:
>
> https://developer.r-project.org/Blog/public/2020/05/02/utf-8-support-on-windows/index.html
>
> In terms of `Rf_translateChar()`, presumably the `Riconv` call is failing
> on Rterm, but not on Rgui:
>
> https://github.com/r-devel/r-svn/blob/master/src/main/sysutils.c#L924
>
> I'm guessing, but that would explain why the C level string is in that
> format. I don't know why the string would translate in Rgui though. My
> guess is that it did not as even in Rgui the following:
>
>     enc2native("\uBD80")
>
> Produces the escaped version of the string.
>
> As others have suggested you could try the experimental UCRT Windows
> release:
>
> https://developer.r-project.org/Blog/public/2021/03/12/windows/utf-8-toolchain-and-cran-package-checks/index.html
>
> Install instructions (focus on Binary installer):
>
> https://svn.r-project.org/R-dev-web/trunk/WindowsBuilds/winutf8/ucrt3/howto.html
>
> If I try UCRT on my system this no longer produces the escape:
>
>     enc2native("\uBD80")
>
> Although all I see is a question mark. My guess is that my code page or
> something similar is not set right. Examining with `charToRaw` reveals
> the string remains in UTF-8 encoding.
>
> Aside: it's not clear to me that you need to translate the string if your
> intent is for it to remain UTF-8. You just don't seem to be set-up to
> interpret UTF-8 strings currently.
>
> Best,
>
> B
>
> > On Wed, Jun 2, 2021 at 6:09 PM David Winsemius <dwinsem...@comcast.net> wrote:
> >
> >> First; you should configure your mail client to send plain text.
> >>
> >> Can you explain what is meant by:
> >>
> >> the characters are unicodes (<U+BD80><U+C2E4>) instead of
> >> utf8 encoding of the korean characters 부실.
> >>
> >> As far as I can tell those two unicodes _are_ the utf8 encodings of 부실.
> >>
> >> You may need to consult a couple of R help pages. I suggest:
> >>
> >> ?Quotes
> >> ?points  # has examples of changing fonts used for display on console.
> >>
> >> Sorry if I've misunderstood. I'm not on a Windows device, so posting the
> >> C++ program won't be helpful, but maybe it would for other prospective
> >> respondents.
> >>
> >> --
> >> David.
> >>
> >> On 6/2/21 1:33 PM, xiaoyan yu wrote:
> >> > I have a R Script Predict.R:
> >> >     set.seed(42)
> >> >     C <- seq(1:1000)
> >> >     A <- rep(seq(1:200),5)
> >> >     E <- (seq(1:1000) * (0.8 + (0.4*runif(50, 0, 1))))
> >> >     L <- ifelse(runif(1000)>.5,1,0)
> >> >     df <- data.frame(cbind(C, A, E, L))
> >> >     load("C:/Temp/tree.RData")    # load the model for scoring
> >> >
> >> >     P <- as.character(predict(tree_model_1,df,type='class'))
> >> >
> >> > Then in a C++ program
> >> > I call eval to evaluate the script and then findVar the P variable.
> >> > After get each class label from P using string_elt and then
> >> > Rf_translateChar, the characters are unicodes (<U+BD80><U+C2E4>) instead of
> >> > utf8 encoding of the korean characters 부실.
> >> > Can I know how to get UTF8 by using R externals?
> >> >
> >> > I also found the same script giving utf8 characters in RGui but unicode in
> >> > Rterm.
> >> > I tried to attach a screenshot but got message "The message's content type
> >> > was not explicitly allowed"
> >> > In RGui, I saw the output 부실, while in Rterm, <U+BD80><U+C2E4>.
> >> >
> >> > Please help.
> >> >
> >> >         [[alternative HTML version deleted]]
> >> >
> >> > ______________________________________________
> >> > R-devel@r-project.org mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/r-devel