Success!

Thanks to everyone who helped. I needed to pass the right fileEncoding
argument to read.table().

test08 = read.table("test.csv", sep = ",", header = TRUE, 
stringsAsFactors = FALSE, fileEncoding = "UCS-2")

Upon further research:

http://technet.microsoft.com/en-us/library/bb330962%28v=sql.90%29.aspx

*International Features in Microsoft SQL Server 2005*

Generally, SQL Server stores Unicode in the UCS-2 encoding scheme.
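
For anyone finding this thread later, here is a minimal, self-contained sketch of why the plain read failed and why fileEncoding fixes it. It does not use the original file: the sample rows, the temporary file and the variable names are made up for illustration. It also assumes the export is little-endian with a byte order mark, which is what Windows tools usually call "Unicode". I use "UCS-2LE" rather than "UCS-2" here because, as I read the ?file help page, R strips the byte order mark itself for that encoding name.

```r
# Hypothetical three-column sample, loosely modelled on the data in this thread.
csv_text <- paste0(
  "Date,StockID,Price\n",
  "2006-01-03,@Stock1,2.53\n",
  "2006-01-03,@Stock2,1.3275\n"
)

# Encode as little-endian UCS-2 and prepend the byte order mark (FF FE),
# mimicking a Windows "Unicode" text export (an assumption about the real file).
ucs2_bytes <- iconv(csv_text, from = "UTF-8", to = "UCS-2LE", toRaw = TRUE)[[1]]
tmp <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xff, 0xfe)), ucs2_bytes), tmp)

# Each ASCII character is followed by a 00 byte, so the file is not
# ASCII-compatible and cannot be parsed as an 8-bit encoding.
first_bytes <- readBin(tmp, what = "raw", n = 8)
print(first_bytes)

# Declaring the encoding lets read.table() decode the file while reading it.
test_df <- read.table(tmp, sep = ",", header = TRUE,
                      stringsAsFactors = FALSE, fileEncoding = "UCS-2LE")
str(test_df)
unlink(tmp)
```

Looking at the first few bytes with readBin() is also a quick way to check a real export before guessing at an encoding name: FF FE at the start followed by alternating 00 bytes points to little-endian UCS-2/UTF-16.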

On 10/11/2013 2:43 AM, Milan Bouchet-Valat wrote:
> On Thursday, 10 October 2013 at 21:45 -0700, Ira Sharenow wrote:
>> Thanks for the suggestion. From R version 3.0.2, I tried
>>
>>> testDF7 =  iconv(x = test07 , from = "UCS-2", to = "")
>>> Encoding(testDF7)
>> [1] "unknown"
>>
>>> testDF7[1:6]
>> [1] NA NA NA NA NA NA
>>
>> So using "UCS-2" produced the same results as before.
>>
>> I do not think there are any NA values. I cleaned up the csv file from
>> within Excel. Then read it into R
>>
>>> sum(is.na(workingDF))
>> [1] 0
>>
>> Also the Excel COUNTBLANK function gave me zero.
> In a previous message, Brian told you to use the 'fileEncoding' argument
> to read.table(). Please do that.
>
>
> Regards
>
>> On 10/9/2013 11:33 PM, Prof Brian Ripley wrote:
>>
>>> On 09/10/2013 10:37, Milan Bouchet-Valat wrote:
>>>> On Tuesday, 8 October 2013 at 16:02 -0700, Ira Sharenow wrote:
>>>>> A colleague is sending me quite a few files that have been saved with
>>>>> MS SQL Server 2005. I am using R 2.15.1 on Windows 7.
>>>>>
>>>>> I am trying to read in the files using standard techniques. Although
>>>>> the file has a csv extension when I go to Excel or WordPad and do
>>>>> SAVE AS I see that it is Unicode Text. Notepad indicates that the
>>>>> encoding is Unicode. Right now I have to do a few things from within
>>>>> Excel (such as Text to Columns) and eventually save as a true csv
>>>>> file before I can read it into R and then use it.
>>>>>
>>>>> Is there an easy way to solve this from within R? I am also open to
>>>>> easy SQL Server 2005 solutions.
>>>>>
>>>>> I tried the following from within R.
>>>>>
>>>>> testDF = read.table("Info06.csv", header = TRUE, sep = ",")
>>>>>
>>>>>> testDF2 =  iconv(x = testDF, from = "Unicode", to = "")
>>>>> Error in iconv(x = testDF, from = "Unicode", to = "") :
>>>>>   unsupported conversion from 'Unicode' to '' in codepage 1252
>>>>>
>>>>> # The next line did not produce an error message
>>>>>
>>>>>> testDF3 =  iconv(x = testDF, from = "UTF-8" , to = "")
>>>>>> testDF3[1:6,  1:3]
>>>>> Error in testDF3[1:6, 1:3] : incorrect number of dimensions
>>>>>
>>>>> # The next line did not produce an error message
>>>>>
>>>>>> testDF4 =  iconv(x = testDF, from = "macroman" , to = "")
>>>>>> testDF4[1:6,  1:3]
>>>>> Error in testDF4[1:6, 1:3] : incorrect number of dimensions
>>>>>
>>>>>>    Encoding(testDF3)
>>>>> [1] "unknown"
>>>>>
>>>>>>    Encoding(testDF4)
>>>>> [1] "unknown"
>>>>>
>>>>> These are the first few lines from WordPad:
>>>>>
>>>>> Date,StockID,Price,MktCap,ADV,SectorID,Days,A1,std1,std2
>>>>> 2006-01-03 00:00:00.000,@Stock1,2.53,467108197.38,567381.144444444,4,133.14486997089,-0.0162107939626307,0.0346283580367959,0.0126471695454834
>>>>> 2006-01-03 00:00:00.000,@Stock2,1.3275,829803070.531114,6134778.93292,5,124.632223896458,0.071513138376339,0.0410694546850102,0.0172091268025929
>>>> What's the actual problem? You did not state any. Do you get accented
>>>> characters that are not printed correctly after importing the file? In
>>>> the two lines above it does not look like there would be any non-ASCII
>>>> characters in this file, so encoding would not matter.
>>> It is most likely UCS-2.  That has embedded NULs, so the encoding
>>> does matter.  All 8-bit encodings extend ASCII: others do not, in
>>> general.
>>>
>>>
>



______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
