On Fri, 14 Jan 2011, David Scott wrote:

As a further note, this is a reminder that whenever you get data via a spreadsheet the first thing to do is examine it and clean up any problems. A basic requirement is to tabulate any categorical variable. Spreadsheets allow any sort of data to be entered, with no controls. My experience is that those who enter data into spreadsheets enter all sorts of variations of what a human would wish to treat as the same ("Open", "Open ", "open", etc.), even when told not to.
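A minimal sketch of the "tabulate any categorical variable" check, using a small hypothetical Status vector rather than the poster's data. The clean-up step (trim surrounding whitespace, then fold case) is only one possible normalization. `trimws()` is an assumption about your R version: it is not in R 2.12.0 and was added in a later release.

```r
# Hypothetical example: a Status column typed by hand into a spreadsheet
status <- c("Open", "Open ", "open", "Closed", " Open", "Open")

# Tabulating the raw values shows every distinct spelling,
# including ones that differ only in case or surrounding spaces
raw_counts <- table(status, useNA = "ifany")
print(raw_counts)

# One possible clean-up: trim surrounding whitespace and fold case
# (trimws() is not available in R 2.12.0; it was added later)
cleaned <- tolower(trimws(status))
clean_counts <- table(cleaned, useNA = "ifany")
print(clean_counts)
```

The raw table has five entries for what a human would read as two categories. After clean-up it has two: "closed" (1) and "open" (5).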

Another common problem is that they enter characters such as non-breaking space or zero-width characters: we added support for known encodings of NBSP to strip.white about five years ago.
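A rough sketch of how such invisible characters can be found and removed by hand, assuming UTF-8 strings. The values are hypothetical. Flagging every code point above 127 is a deliberately blunt test: it will also flag legitimate accented letters, so treat the result as a list of values to inspect, not to delete.

```r
# Hypothetical values: one contains a non-breaking space (U+00A0),
# another a zero-width space (U+200B); both print like ordinary "Open"
x <- c("Open", "\u00a0Open", "Open\u200b", "Closed")

# Only the first value compares equal to "Open"
sum(x == "Open")

# Flag values containing any character outside 7-bit ASCII
has_odd <- vapply(x, function(s) any(utf8ToInt(s) > 127L),
                  logical(1), USE.NAMES = FALSE)
print(has_odd)

# Drop zero-width spaces, turn NBSP into an ordinary space, then trim
# (trimws() on its own does not remove NBSP by default)
cleaned <- gsub("\u200b", "", x, fixed = TRUE)
cleaned <- gsub("\u00a0", " ", cleaned, fixed = TRUE)
cleaned <- trimws(cleaned)
print(cleaned)
```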


David Scott

On 14/01/2011 4:03 p.m., Jim Holtman wrote:
Try strip.white=TRUE in read.csv() to strip out white space.
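A small sketch of the effect of strip.white, using hypothetical CSV lines passed through the `text` argument of read.csv() (an argument added in an R release later than the 2.12.0 mentioned below; with older R, write the lines to a file first). Each element of `csv` is one line of the file, so the padding is explicit.

```r
# Hypothetical CSV lines with stray spaces around the Status value
csv <- c("Date,Status",
         "10/02/2010, Open",
         "22/08/2007,Open",
         "01/03/2009,Open ")

raw <- read.csv(text = csv, stringsAsFactors = FALSE)
stripped <- read.csv(text = csv, stringsAsFactors = FALSE,
                     strip.white = TRUE)

sum(raw$Status == "Open")       # only the unpadded value matches
sum(stripped$Status == "Open")  # all three values match
```

strip.white only affects unquoted character fields, and it strips ordinary spaces and tabs; other invisible characters may still need separate handling.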


On Jan 13, 2011, at 21:44, bgr...@dyson.brisnet.org.au wrote:


I have a frustrating issue which I am hoping someone may have a suggestion
about.

I am running Windows XP and R 2.12.0, and saved an Excel file that I was sent as a
CSV file.

The initial code I ran follows.

dec <- read.csv("g://FMH/FO30122010.csv", header = TRUE)
dec.open <- subset(dec, Status == "Open")
table(dec.open$AMHS)

I was checking the output and noticed a difference between my manual count
and the R output. Two subjects' rows were not being selected by the subset
command.

For the AMHS where there was a discrepancy I then ran:
wm <- subset(dec, AMHS == "WM")

The problem appears to be that there is a space before the "Open" value
for two individuals, as in the example below.

10/02/2010  Open
22/08/2007   Open

Checking in Excel, there does not appear to be a space, and the format is
the same (e.g. 'General'). I resolved the problem by copying over the
values for the two individuals where I identified a problem.

Given that this problem was not detected by visual scanning, I would appreciate
advice on how it can be detected in future without my having to
manually check the raw data against the R output.
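A sketch of an automatic check along the lines suggested in the replies. The helper `find_suspect_values()` is hypothetical (not part of base R). For each character or factor column it lists the distinct values that have leading/trailing whitespace or contain any non-ASCII character, so only the flagged values need a manual look.

```r
# Hypothetical helper, not a base R function: report suspect values
# in every character or factor column of a data frame
find_suspect_values <- function(df) {
  out <- list()
  for (col in names(df)) {
    x <- df[[col]]
    if (!is.character(x) && !is.factor(x)) next
    x <- unique(as.character(x[!is.na(x)]))
    # Leading or trailing whitespace
    padded <- grepl("^[[:space:]]|[[:space:]]$", x)
    # Any character outside 7-bit ASCII (NBSP, zero-width space, ...)
    non_ascii <- vapply(x, function(s) any(utf8ToInt(enc2utf8(s)) > 127L),
                        logical(1), USE.NAMES = FALSE)
    bad <- x[padded | non_ascii]
    if (length(bad) > 0) out[[col]] <- bad
  }
  out
}

# Hypothetical data resembling the problem described above
dec <- data.frame(ID = 1:4,
                  Status = c("Open", " Open", "Closed", "Open\u00a0"),
                  stringsAsFactors = FALSE)
res <- find_suspect_values(dec)
print(res)
```

Running this right after read.csv(), alongside table() on each categorical column, replaces the manual comparison of raw data against R output with a short list of values to inspect.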

Any assistance is appreciated,

Bob

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
_________________________________________________________________
David Scott     Department of Statistics
                The University of Auckland, PB 92019
                Auckland 1142,    NEW ZEALAND
Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055
Email:  d.sc...@auckland.ac.nz,  Fax: +64 9 373 7018

Director of Consulting, Department of Statistics



--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
