On 6/13/07, Robert Wilkins <[EMAIL PROTECTED]> wrote:
>
> The point is: there are lots of data preparation scenarios where
> large numbers of merges need to be done. This is an example where
> Vilno and SAS are easier to use than the competition. I'm sure an Awk
> programmer can come up with something, but the result would be
> awkward.

Agreed. In the awk+R scenario, it is clear that merges are often better done with R. My strategy is to use awk only to clean and reformat data into a tabular format, and to do most of the "consolidation" (computations, filtering, merges) in R. I suggested using awk only for manipulations that would be more complex to do within R (especially multiline records or records with optional fields). I try to keep the scripts as simple as possible on both sides.

> Certain aspects of Vilno and SAS are a bit more user-friendly:
>
> Each column has a variable name, such as "PatientID".
>
> Awk uses $1, $2, $3 , as variable names for columns. Not user-friendly.
>

In the first lines of awk scripts, I usually assign column numbers to variables (e.g. "Code=1; Time=3") and then access the fields with "$Code", "$Time"... Yet it is true that it is cumbersome, in awk, to use the labels on the first line of a file as variable names (my major complaint about awk).

I looked at a few examples of SAS Data step scripts on the Net, and found that the awk scripts would be very similar (except for merges), but there may be manipulations that I missed.

> For scanning inconsistently structured ASCII data files, where
> different rows have different column specifications, Awk is a better
> tool.
>
> For data problems that lend themselves to UNIX-style regular
> expressions, Awk, again, is a great tool.

The messy data formats that were described earlier on the list are good examples of where regular expressions will help a lot. In the very first stage of data inspection, to detect coding "mistakes", awk (sometimes with the help of other GNU tools such as 'uniq' and 'sort') can be very efficient.

> The upshot:
> Awk is a hammer.
> Vilno is a screwdriver.

Nice analogy. Using the right tool for the right task is very important. So awk and Vilno seem complementary. Yet, when R enters the equation, do you still "need" the three tools?

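As an aside on the column-naming point above, here is a minimal sketch of the two approaches: naming column numbers once in a BEGIN block, and building a name-to-column map from the header line. The sample data, the column layout (Code, PatientID, Time) and the function names by_position and by_header are all made up for illustration.

```shell
# Hypothetical sample table with a header row (whitespace-separated).
data=$(mktemp)
cat > "$data" <<'EOF'
Code PatientID Time
A p01 12
B p02 7
A p03 30
EOF

# Approach 1: assign column numbers to variables once, then use $Name.
# The positions are hard-coded, so the script must change if the layout does.
by_position() {
  awk 'BEGIN { Code = 1; PatientID = 2; Time = 3 }
       NR > 1 && $Code == "A" { print $PatientID, $Time }' "$1"
}

# Approach 2: read the labels from the first line into an array that maps
# each label to its column number, then look fields up by label.
by_header() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
       $(col["Code"]) == "A" { print $(col["PatientID"]), $(col["Time"]) }' "$1"
}

by_position "$data"
by_header "$data"
```

On this sample both calls print the lines "p01 12" and "p03 30". The second form survives a reordering of the columns, at the price of the extra col[...] lookups, which is roughly the cumbersomeness I was complaining about.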
What we should really compare is the four situations:

  R alone
  R + awk
  R + vilno
  R + awk + vilno

and maybe "R + SAS Data step", and see which scripts are more elegant (read 'short and understandable').

Best,

Christophe

--
Christophe Pallier (http://www.pallier.org)

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.