That can be handled elegantly in R with its object-oriented programming facilities, by defining a class for the fancy input. See this post for a simple example of that style: https://stat.ethz.ch/pipermail/r-help/2007-April/130912.html
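As a minimal sketch of that idea (not taken from the linked post), the lab results described in the quoted message below, which mix numeric values such as "1.54", quasi-numeric values such as "<1", and text such as "Trace", could be wrapped in an S3 class. The class name `labvalue`, the generic `flag_range`, and the rule used for "<" and ">" values are all assumptions made for illustration only.

```r
# Hypothetical S3 class for messy lab results; every name here is illustrative.
labvalue <- function(x) {
  x <- trimws(as.character(x))
  # A leading "<" or ">" marks a quasi-numeric (censored) result.
  qual <- ifelse(grepl("^[<>]", x), substr(x, 1, 1), "")
  # Strip the qualifier and convert; text such as "Trace" becomes NA.
  num <- suppressWarnings(as.numeric(sub("^[<>][[:space:]]*", "", x)))
  structure(list(raw = x, qualifier = qual, value = num), class = "labvalue")
}

# Show the values as they were reported.
format.labvalue <- function(x, ...) x$raw
print.labvalue <- function(x, ...) {
  print(format(x), quote = FALSE)
  invisible(x)
}

# Generic plus method: "H", "L", "" (in range) or NA (cannot be decided).
flag_range <- function(x, low, high) UseMethod("flag_range")
flag_range.labvalue <- function(x, low, high) {
  v <- x$value
  q <- x$qualifier
  low <- rep_len(low, length(v))
  high <- rep_len(high, length(v))
  ok <- !is.na(v)
  flag <- rep(NA_character_, length(v))
  flag[ok & q == ""] <- ""
  flag[ok & q == "" & v < low] <- "L"
  flag[ok & q == "" & v > high] <- "H"
  # Assumed rule: "<v" is low only if v <= low; ">v" is high only if v >= high.
  flag[ok & q == "<" & v <= low] <- "L"
  flag[ok & q == ">" & v >= high] <- "H"
  flag
}

x <- labvalue(c("1.54", "<1", "Trace", "7.2"))
print(x)
flag_range(x, low = 1, high = 5)
```

With the parsing kept inside the constructor, the messy cases are handled in one place, and a new clinical trial with different conventions would only need a different constructor or method rather than changes throughout the program.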

On 6/9/07, Robert Wilkins <[EMAIL PROTECTED]> wrote:
> Here are some examples of the type of data crunching you might have to do.
>
> In response to the requests by Christophe Pallier and Martin Stevens.
>
> Before I started developing Vilno, some six years ago, I had been working in
> the pharmaceuticals for eight years (it's not easy to show you actual data
> though, because it's all confidential of course).
>
> Lab data can be especially messy, especially if one clinical trial allows
> the physicians to use different labs. So let's consider lab data.
>
> Merge in normal ranges, into the lab data. This has to be done by lab-site
> and lab testcode (PLT for platelets, etc.), obviously. I've seen cases where
> you also need to match by sex and age. The sex column in the normal ranges
> could be: blank, F, M, or B (B meaning for Both sexes). The age column in
> the normal ranges could be: blank, or something like "40 <55". Even worse,
> you could have an ageunits column in the normal ranges dataset: usually "Y",
> but if there are children in the clinical trial, you will have "D" or "M",
> for Days and Months. If the clinical trial is for adults, all rows with "D"
> or "M" should be tossed out at the start. Clearly the statistical programmer
> has to spend time looking at the data, before writing the program. Remember,
> all of these details can change any time you move to a new clinical trial.
>
> So for the lab data, you have to merge in the patient's date of birth,
> calculate age, and somehow relate that to the age-group column in the normal
> ranges dataset.
>
> (By the way, in clinical trial data preparation, the SAS datastep is much
> more useful and convenient, in my opinion, than the SQL SELECT syntax, at
> least 97% of the time. But in the middle of this program, when you merge the
> normal ranges into the lab data, you get a better solution with PROC SQL
> (just the SQL SELECT statement implemented inside SAS). This is because of the
> trickiness of the age match-up, and the SAS datastep does not do well with
> many-to-many joins.)
>
> Merge in various study drug administration dates into the lab data. Now, for
> each lab record, calculate treatment period (or cycle number), depending
> on the statistician's specifications and the way the clinical trial is
> structured.
>
> Different clinical sites chose to use different lab providers. So, for
> example, for Monocytes, you have 10 different units (essentially 6 units,
> but spelling inconsistencies as well). The statistician has requested that
> you use standardized units in some of the listings (% units, and only one
> type of non-% unit, for example). At the same time, lab values need to be
> converted (*1.61, divide by 1000, etc.). This can be very time consuming
> no matter what software you use, and, in my experience, when the SAS
> programmer asks for more clinical information or lab guidebooks, the
> response is incomplete, so he does a lot of guesswork. SAS programmers do
> not have expertise in lab science, hence the guesswork.
>
> Your program has to accommodate numeric values, "1.54", quasi-numeric values
> "<1", and non-numeric values "Trace".
>
> Your data listing is tight for space, so print "PROLONGED CELL CONT" as
> "PRCC".
>
> Once normal ranges are merged in, figure out which values are out-of-range
> and high, which are low, and which are within normal range. In the data
> listing, you may have "H" or "L" appended to the result value being printed.
>
> For each treatment period, you may need a unique lab record selected, in
> case there are two or three for the same treatment period. The statistician
> will tell the SAS programmer how. Maybe the averages of the results for that
> treatment period, maybe that lab record closest to the mid-point of the
> treatment period. This isn't for the data listing, but for a summary table.
>
> For the differentials (monocytes, lymphocytes, etc.), merge in the WBC
> (total white blood cell count) values, to convert values between % units
> and absolute count units.
>
> When printing the values in the data listing, you need "H" or "L" to the
> right of the value. But you also need the values to be well lined up (the
> decimal place). This can be stupidly time consuming.
>
>
>
> AND ON AND ON AND ON .....
>
> I think you see why clinical trials statisticians and SAS programmers enjoy
> lots of job security.