On Dec 7, 2009, at 12:37 PM, Marshall Feldman wrote:

I totally agree with Barry, although it's sometimes convenient to
include data with analysis code for debugging and/or documentation purposes.

However, the example actually applies equally to separate data files. In
fact, the example is from the U.S. Bureau of Labor Statistics at
ftp://ftp.bls.gov/pub/time.series/sm/, which contains nothing but data
and documentation files. At issue is not where the data come from, but
rather how to parse relatively complex data organized inconsistently.
SAS has built-in the ability to parse five different organizations of
data: list (delimited), modified list, column, formatted, and mixed (see
http://www.masil.org/sas/input.html). It seems R can parse such data,
but only with considerable work by the user.

It is hard to know what you mean, because you have not specified what you mean by "with considerable work by the user."

It would be great to have a
function/package that implements something as easy (hah!) and
flexible as SAS.

In particular it is not clear whether you were anticipating using read.fwf() and why you think that requires "considerable more work" than a SAS INPUT statement. The output of read.fwf gets passed to read.table, so that help page would document your options regarding definition of classes of input variables.
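As a rough illustration of the read.fwf() route, here is a minimal sketch that splits just the series ID from the sample line into fixed-width fields. The widths are an assumption derived from the column ranges in the SAS INPUT statement (1-2, 3, 4-5, 6-10, 11-12, and @13 $8.); the variable names `ids`, `widths`, `fields`, and `parsed` are made up for the example. The colClasses argument is not an argument of read.fwf itself; it is passed through to read.table.

```r
# Hypothetical input: the series ID portion of the sample data line.
ids <- "SMS11000000000000001"

# Assumed widths, taken from the SAS column ranges for the first six variables.
widths <- c(2, 1, 2, 5, 2, 8)
fields <- c("survey", "seasonal", "state", "area", "supersector", "industry")

# read.fwf accepts a connection; colClasses is handed on to read.table,
# and "character" keeps leading zeros such as "00000" intact.
con <- textConnection(ids)
parsed <- read.fwf(con,
                   widths = widths,
                   col.names = fields,
                   colClasses = "character")
close(con)

print(parsed)
```

Replacing "character" with "factor" in colClasses should give factors directly, if that is what is wanted.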

--
David



   Marsh

Barry Rowlingson wrote:
On Mon, Dec 7, 2009 at 3:53 PM, Marshall Feldman <ma...@uri.edu> wrote:

Regarding the various methods people have suggested, what if a typical
tab-delimited data line looks like:

   SMS11000000000000001 1990 M01 688.0

and the SAS INPUT statement is

INPUT survey $ 1-2 seasonal $ 3 state $ 4-5 area $ 6-10 supersector $ 11-12 @13 industry $8. datatype $ 21-22 year period $ value footnote $ ;

Note that most data lines have no footnote item, as in the sample.

Here (I think) we'd want all the character variables to be read as factors,
possibly "year" as a date, and "value" as numeric.
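One possible way to get those types in R, sketched under some assumptions: the line really is tab-delimited, the first field is split by position as in the INPUT statement, and "datatype" (columns 21-22) is left out because the sample ID shown appears to be only 20 characters long. The names `lines`, `raw`, `starts`, and `stops` are hypothetical, and "year" is kept as an integer rather than converted to a date.

```r
# The sample line from the message, assumed to be tab-delimited.
lines <- "SMS11000000000000001\t1990\tM01\t688.0"

# Step 1: list-style input. fill = TRUE pads lines that have no footnote.
raw <- read.table(text = lines, sep = "\t", fill = TRUE,
                  col.names = c("series_id", "year", "period",
                                "value", "footnote"),
                  colClasses = c("character", "integer", "character",
                                 "numeric", "character"))

# Step 2: column-style input. Cut the series ID at the assumed positions
# and store each piece as a factor.
starts <- c(survey = 1, seasonal = 3, state = 4,
            area = 6, supersector = 11, industry = 13)
stops  <- c(2, 3, 5, 10, 12, 20)
for (i in seq_along(starts)) {
  raw[[names(starts)[i]]] <- factor(substring(raw$series_id,
                                              starts[i], stops[i]))
}
raw$period <- factor(raw$period)

str(raw)
```

This is two short steps rather than one declarative statement, which is roughly the gap between R and the SAS INPUT statement being discussed.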


Actually I'm surprised that nobody has yet said what a clearly
bonkers thing it is to mix up your data and your analysis code in a
single file. Now suppose you have another set of data you want to
analyse with the same code? Are you going to create a new file and
paste the new data in? You've now got two copies of your analysis code
- good luck keeping corrections to that code synchronised.

This just seems like horrendously bad practice, which is one reason
it's kludgy in R. If it was good practice, someone would surely have
written a way to do it neatly.

Keep your data in data files, and your functions in .R function
files. You'll thank me later.

Barry




______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
