On Monday 28 March 2016 at 20:12 +0530, Sunny Singha wrote:
> Milan,
> OK, let me take the case of Facebook. I used the Rfacebook package
> to get posts (getPost()), which returns a list() of data frames (post,
> comments, likes).
>
> Let me demonstrate two cases of reading and writing, just as you suggested.
>
> Case 1:
> Let's say one of the Facebook comments has the string value below, in
> Japanese:
> "世界餐福事工 - 餐廳員工沒精打采 老是打盤子"
>
> In the R console I now assign the above string to a variable:
> x <- "世界餐福事工 - 餐廳員工沒精打采 老是打盤子"
> and write it as below:
> write.csv(x, file='x.csv', row.names=F, fileEncoding='UTF-8')
> I get this string in the file:
> "" -
> "

But how do you read back the contents of the file? You need to specify
the encoding when reading it too.
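A minimal base-R sketch of a write/read round trip on a session whose native encoding cannot represent this text (such as English_United States.1252). It is a swapped-in workaround rather than the fileEncoding route: the strings are written as raw UTF-8 bytes with writeLines(useBytes = TRUE) and read back with read.csv(encoding = "UTF-8"), which marks them as UTF-8 instead of re-encoding them. The `comments` vector and the hand-built CSV lines are illustrative assumptions. On a session whose native encoding is UTF-8, write.csv()/read.csv() with fileEncoding = "UTF-8" on both sides should round-trip as well.

```r
# Sketch (base R only): keep UTF-8 text intact through a CSV file without
# changing the locale. Assumption: `comments` is a hypothetical stand-in for a
# text column; the \u escapes are the first six characters of the example
# string, written as escapes so the script itself stays ASCII-only.
comments <- c("\u4e16\u754c\u9910\u798f\u4e8b\u5de5", "plain ASCII text")

path <- tempfile(fileext = ".csv")

# Build the CSV by hand: a header row, then each value wrapped in double
# quotes, with any embedded double quotes doubled as CSV requires.
fields <- gsub('"', '""', enc2utf8(comments), fixed = TRUE)
csv_lines <- c('"comment"', paste0('"', fields, '"'))

# useBytes = TRUE writes the UTF-8 bytes as they are, instead of translating
# them to the native code page (e.g. Windows-1252) first.
writeLines(csv_lines, path, useBytes = TRUE)

# encoding = "UTF-8" only marks the strings read back as UTF-8; unlike
# fileEncoding, it does not re-encode them through the native locale.
back <- read.csv(path, encoding = "UTF-8", stringsAsFactors = FALSE)
print(back$comment[1] == comments[1])
```

Quoting fields by hand only scales to simple text columns; for whole data frames, a CSV writer that emits UTF-8 directly is likely to be less fiddly.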
> Case 2:
> I create a Notepad file 'x.txt', save the Japanese string
> "世界餐福事工 - 餐廳員工沒精打采 老是打盤子"
> and read it as below:
> read.table('x.txt', fileEncoding='UTF-8')
> I get the output below:
>
>   V1
> 1  ?
> Warning messages:
> 1: In read.table("x.txt", fileEncoding = "UTF-8") :
>   invalid input found on input connection 'x.txt'
> 2: In read.table("x.txt", fileEncoding = "UTF-8") :
>   incomplete final line found by readTableHeader on 'x.txt'

Are you sure Notepad saved the text as UTF-8?

> The above was just for demonstration; I'm in fact reading extracted
> social media data, which is ultimately retrieved somewhere using the
> httr package and returned as data frames.
> I'm not sure how I should handle this on Windows, as I don't observe
> this behavior on my Mac, where the system locale is set to 'en_US.UTF-8'.
>
> Regards,
> Sunny
>
> On Mon, Mar 28, 2016 at 7:39 PM, Milan Bouchet-Valat wrote:
> >
> > On Monday 28 March 2016 at 19:16 +0530, Sunny Singha wrote:
> > >
> > > Hi,
> > > I think I'm experiencing an issue related to the system locale. I have
> > > exported data frames in '.csv' format gathered from various social
> > > media platforms like Facebook/Twitter/G+, etc.
> > >
> > > I observe that many variables/columns consist of strings formatted
> > > similar to below:
> > > "
> > > "
> > >
> > > As expected, and as I confirmed, in social media data these are
> > > strings in different languages.
> > > Platform details are provided at the end of this mail. The OS locale
> > > is set to English (United States), hence the R locale is
> > > 'English_United States.1252'.
> > >
> > > I have attempted to change it to UTF-8 but receive the warning
> > > message below:
> > >
> > > Warning message:
> > > In Sys.setlocale("LC_ALL", "UTF-8") :
> > >   OS reports request to set locale to "UTF-8" cannot be honored
> > You don't need to set the locale. Just pass an appropriate value (e.g.
> > "UTF-8") to read.csv() or write.csv()'s fileEncoding argument.
> >
> > You also didn't tell us what program you used to read these files.
> > Some programs might guess the encoding incorrectly, or require you to
> > choose it manually.
> >
> > Regards
> >
> > > I have gone through the forums below, but found no resolution so far:
> > > --- http://stackoverflow.com/questions/20571147/how-to-set-unicode-locale-in-r
> > > --- https://stat.ethz.ch/pipermail/r-devel/2013-November/067940.html
> > > --- http://stackoverflow.com/questions/19877676/write-utf-8-files-from-r
> > > --- https://tomizonor.wordpress.com/2013/04/17/file-utf8-windows/
> > > --- http://withr.me/configure-character-encoding-for-r-under-linux-and-windows/
> > >
> > > I'm not sure whether the issue occurs while reading/extracting the data
> > > from the social media platforms or while writing/exporting to a Windows
> > > directory, but I don't experience a similar issue on my personal Mac.
> > > I need some clarification here.
> > >
> > > How can I export the data just as I see it on the web? Please guide.
> > >
> > > Regards,
> > > Sunny
> > >
> > > Platform I'm using:
> > > Operating System : Windows 7 Professional SP1
> > > R version details:
> > > platform       x86_64-w64-mingw32
> > > arch           x86_64
> > > os             mingw32
> > > system         x86_64, mingw32
> > > status
> > > major          3
> > > minor          2.3
> > > year           2015
> > > month          12
> > > day            10
> > > svn rev        69752
> > > language       R
> > > version.string R version 3.2.3 (2015-12-10)
> > > nickname       Wooden Christmas-Tree
> > >
> > > ______________________________________________
> > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
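Milan's question about whether Notepad really saved x.txt as UTF-8 can be checked by looking at the file's first bytes. A minimal sketch, assuming the file starts with a byte order mark (BOM); `sniff_bom()` is a hypothetical helper, not a base R function. Notepad on Windows 7 typically writes a BOM for both its "UTF-8" and "Unicode" (UTF-16LE) save options, so a UTF-16LE result would explain why fileEncoding = 'UTF-8' fails, while a file without a BOM is inconclusive.

```r
# Hypothetical helper (not part of base R): report which byte order mark, if
# any, a text file starts with. Without a BOM the encoding cannot be told from
# the first bytes alone, so "no BOM" is not proof of any particular encoding.
sniff_bom <- function(path) {
  b <- readBin(path, what = "raw", n = 3L)
  # TRUE when the file is at least as long as `sig` and begins with it.
  starts_with <- function(sig) {
    length(b) >= length(sig) && identical(b[seq_along(sig)], as.raw(sig))
  }
  if (starts_with(c(0xEF, 0xBB, 0xBF))) return("UTF-8 with BOM")
  if (starts_with(c(0xFF, 0xFE))) return("UTF-16LE")
  if (starts_with(c(0xFE, 0xFF))) return("UTF-16BE")
  "no BOM"
}

# Example: a file holding a UTF-8 BOM followed by the UTF-8 bytes of "\u4e16".
path <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF, 0xE4, 0xB8, 0x96)), path)
print(sniff_bom(path))
```

If the file does turn out to be UTF-8 with a BOM, fileEncoding = "UTF-8-BOM" drops the mark on reading, though on a Windows-1252 session the connection still converts the text to the session's native encoding.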