Success! Thanks to everyone who helped. I needed to pass the right fileEncoding argument to read.table():
test08 = read.table("test.csv", sep = ",", header = TRUE, stringsAsFactors = FALSE, fileEncoding = "UCS-2")

Upon further research:

http://technet.microsoft.com/en-us/library/bb330962%28v=sql.90%29.aspx
*International Features in Microsoft SQL Server 2005*

Generally, SQL Server stores Unicode in the UCS-2 encoding scheme.

On 10/11/2013 2:43 AM, Milan Bouchet-Valat wrote:
> On Thursday, 10 October 2013 at 21:45 -0700, Ira Sharenow wrote:
>> Thanks for the suggestion. From R version 3.0.2, I tried
>>
>> > testDF7 = iconv(x = test07 , from = "UCS-2", to = "")
>> > Encoding(testDF7)
>> [1] "unknown"
>>
>> > testDF7[1:6]
>> [1] NA NA NA NA NA NA
>>
>> So using "UCS-2" produced the same results as before.
>>
>> I do not think there are any NA values. I cleaned up the csv file from
>> within Excel. Then read it into R
>>
>> > sum(is.na(workingDF))
>> [1] 0
>>
>> Also the Excel COUNTBLANK function gave me zero.
> In a previous message, Brian told you to use the 'fileEncoding' argument
> to read.table(). Please do that.
>
> Regards
>
>> On 10/9/2013 11:33 PM, Prof Brian Ripley wrote:
>>
>>> On 09/10/2013 10:37, Milan Bouchet-Valat wrote:
>>>> On Tuesday, 8 October 2013 at 16:02 -0700, Ira Sharenow wrote:
>>>>> A colleague is sending me quite a few files that have been saved with MS
>>>>> SQL Server 2005. I am using R 2.15.1 on Windows 7.
>>>>>
>>>>> I am trying to read in the files using standard techniques. Although the
>>>>> file has a csv extension when I go to Excel or WordPad and do SAVE AS I
>>>>> see that it is Unicode Text. Notepad indicates that the encoding is
>>>>> Unicode. Right now I have to do a few things from within Excel (such as
>>>>> Text to Columns) and eventually save as a true csv file before I can
>>>>> read it into R and then use it.
>>>>>
>>>>> Is there an easy way to solve this from within R? I am also open to easy
>>>>> SQL Server 2005 solutions.
>>>>>
>>>>> I tried the following from within R.
>>>>>
>>>>> testDF = read.table("Info06.csv", header = TRUE, sep = ",")
>>>>>
>>>>> > testDF2 = iconv(x = testDF, from = "Unicode", to = "")
>>>>> Error in iconv(x = testDF, from = "Unicode", to = "") :
>>>>>   unsupported conversion from 'Unicode' to '' in codepage 1252
>>>>>
>>>>> # The next line did not produce an error message
>>>>>
>>>>> > testDF3 = iconv(x = testDF, from = "UTF-8" , to = "")
>>>>> > testDF3[1:6, 1:3]
>>>>> Error in testDF3[1:6, 1:3] : incorrect number of dimensions
>>>>>
>>>>> # The next line did not produce an error message
>>>>>
>>>>> > testDF4 = iconv(x = testDF, from = "macroman" , to = "")
>>>>> > testDF4[1:6, 1:3]
>>>>> Error in testDF4[1:6, 1:3] : incorrect number of dimensions
>>>>>
>>>>> > Encoding(testDF3)
>>>>> [1] "unknown"
>>>>>
>>>>> > Encoding(testDF4)
>>>>> [1] "unknown"
>>>>>
>>>>> This is the first few lines from WordPad
>>>>>
>>>>> Date,StockID,Price,MktCap,ADV,SectorID,Days,A1,std1,std2
>>>>>
>>>>> 2006-01-03 00:00:00.000,@Stock1,2.53,467108197.38,567381.144444444,4,133.14486997089,-0.0162107939626307,0.0346283580367959,0.0126471695454834
>>>>>
>>>>> 2006-01-03 00:00:00.000,@Stock2,1.3275,829803070.531114,6134778.93292,5,124.632223896458,0.071513138376339,0.0410694546850102,0.0172091268025929
>>>> What's the actual problem? You did not state any. Do you get accentuated
>>>> characters that are not printed correctly after importing the file? In
>>>> the two lines above it does not look like there would be any non-ASCII
>>>> characters in this file, so encoding would not matter.
>>> It is most likely UCS-2. That has embedded NULs, so the encoding
>>> does matter. All 8-bit encodings extend ASCII: others do not, in
>>> general.
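
P.S. For anyone finding this thread later, here is a minimal, self-contained sketch of why the fileEncoding argument matters. It is not the actual SQL Server export. It assumes a little-endian file with a byte-order mark (what Windows tools usually label "Unicode"), builds a tiny stand-in from the first three columns of the sample quoted above, and uses "UTF-16LE", which is byte-for-byte the same as little-endian UCS-2 for these characters. The names sample_lines, txt, body, tmp, first_bytes and test_df are made up for the illustration.

```r
# Hypothetical stand-in for the exported file: a header plus two data rows,
# using only the first three columns of the sample quoted in this thread.
sample_lines <- c("Date,StockID,Price",
                  "2006-01-03 00:00:00.000,@Stock1,2.53",
                  "2006-01-03 00:00:00.000,@Stock2,1.3275")
txt <- paste0(sample_lines, "\r\n", collapse = "")

# Encode the text as UTF-16LE and prepend the byte-order mark FF FE
# (assumption: this matches how the "Unicode" file was saved).
body <- iconv(txt, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
tmp <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xFF, 0xFE)), body), tmp)

# Each ASCII character is followed by a NUL byte, which is why treating
# the file as an 8-bit encoding goes wrong.
first_bytes <- readBin(tmp, what = "raw", n = 8)
print(first_bytes)

# Declaring the encoding lets read.table() decode the bytes while reading.
test_df <- read.table(tmp, sep = ",", header = TRUE,
                      stringsAsFactors = FALSE, fileEncoding = "UTF-16LE")
str(test_df)
```

With the real export, the same call with fileEncoding = "UCS-2" is what worked for me on Windows; which encoding names are accepted can differ between platforms, so iconvlist() is worth checking.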