David, Thanks. When I am back at work I will try to find out some specifics regarding the original data base and how the reports are generated. The differencs are not apparent via manual inspection.
I will look at the csv file in an editor as well and look into xlsReadWrite. I agree EXCEL formatting of dates can be a problem. regards Bob > I am a bit confused by this. You are doing a transfer from Excel (.xls > or .xlsx) to .csv, then a subset in R and ending up with a couple of > entries which are " Open" rather than "Open". So where are they coming > from? You say they are not in the original Excel, so that suggests the > transfer to .csv is the problem. I would be very surprised if the subset > was a problem, but as others have indicated transfer to .csv can be > downright ugly. > > You can check the .csv file by opening it in an editor (I use Emacs). > Just go to the line and have a look if the extra space is there nestling > between two commas. > > The other advice is, don't go through .csv. Go directly from Excel to R. > My favourite tools are RODBC and xlsReadWrite for that step. Both work > extremely well. > > As others have indicated, the big bugbear in the .csv route is dates, or > what Excel decides are dates. My experience was the conversion of New > Zealand health ID numbers to dates. They are three letters then 4 > digits, so AUG2699 became a date. > > David Scott > > On 14/01/2011 10:58 p.m., bgr...@dyson.brisnet.org.au wrote: >> Hello David, >> >> Thanks for your e-mail. The data was a report derived from a statewide >> database, saved in EXCEL format, so the usual issue of the vagaries of >> human data entry variation wasn't the issue as the data was an automated >> report, which is run every three months. I would not have even noticed >> this problem if I hadn't been double checking the numbers of people by >> district. Visual inspection didn't reveal this problem - no white space >> was obvious and the spelling was identical. Tabulation via R wouldn't >> have >> detected this - I was obtaining the EXCEL totals via filter which I then >> compared with R output. I'm hoping I can skip this step, in future, with >> Jim's suggestion. >> >> regards >> >> Bob >> >> >>> As a further note, this is a reminder that whenever you get data via a >>> spreadsheet the first thing to do is examine it and clean up any >>> problems. A basic requirement is to tabulate any categorical variable. >>> Spreadsheets allow any sort of data to be entered, with no controls. My >>> experience is that those who enter data into spreadsheets enter all >>> sorts of variations of what a human would wish to treat as the same >>> ("Open", "Open ", "open", etc.), even when told not to. >>> >>> David Scott >>> >>> On 14/01/2011 4:03 p.m., Jim Holtman wrote: >>>> try strip.white=TRUE to strip out white space >>>> >>>> Sent from my iPad >>>> >>>> On Jan 13, 2011, at 21:44, bgr...@dyson.brisnet.org.au wrote: >>>> >>>>> >>>>> I have a frustrating issue which I am hoping someone may have a >>>>> suggestion >>>>> about. >>>>> >>>>> I am running XP and R 2.12.0 and saved an EXCEL file that I was sent >>>>> as >>>>> a >>>>> csv file. >>>>> >>>>> The initial code I ran follows. >>>>> >>>>> dec<- read.csv("g://FMH/FO30122010.csv",header=T) >>>>> dec.open<- subset (dec, Status == "Open") >>>>> table(dec.open$AMHS) >>>>> >>>>> I was checking the output and noticed a difference between my manual >>>>> count >>>>> and R output. Two subject's rows were not being detected by the >>>>> subset >>>>> command: >>>>> >>>>> For the AMHS where there was a discrepancy I then ran: >>>>> wm<- subset (dec, AMHS == "WM") >>>>> >>>>> The problem appears to be that there is a space before the 'Open" >>>>> value >>>>> for two indivduals, as per the example below. >>>>> >>>>> 10/02/2010 Open >>>>> 22/08/2007 Open >>>>> >>>>> Checking in EXCEL there does not appear to be a space and the format >>>>> is >>>>> the same (e.g 'general'). I resolved the problem by copying over the >>>>> values for the two individuals where I identified a problem. >>>>> >>>>> Given this problem was not detected by visual scanning I would >>>>> appreciate >>>>> advice on how this problem can be detected in future without my >>>>> having >>>>> to >>>>> manually check raw data against R output. >>>>> >>>>> Any assistance is appreciated, >>>>> >>>>> Bob >>>>> >>>>> ______________________________________________ >>>>> R-help@r-project.org mailing list >>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>> PLEASE do read the posting guide >>>>> http://www.R-project.org/posting-guide.html >>>>> and provide commented, minimal, self-contained, reproducible code. >>>> >>>> ______________________________________________ >>>> R-help@r-project.org mailing list >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide >>>> http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >>> >>> >>> -- >>> _________________________________________________________________ >>> David Scott Department of Statistics >>> The University of Auckland, PB 92019 >>> Auckland 1142, NEW ZEALAND >>> Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055 >>> Email: d.sc...@auckland.ac.nz, Fax: +64 9 373 7018 >>> >>> Director of Consulting, Department of Statistics >>> >>> >> > > > -- > _________________________________________________________________ > David Scott Department of Statistics > The University of Auckland, PB 92019 > Auckland 1142, NEW ZEALAND > Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055 > Email: d.sc...@auckland.ac.nz, Fax: +64 9 373 7018 > > Director of Consulting, Department of Statistics > > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.