Here is how to whittle it down for the first two parts of your
question. I am not exactly sure what you are after in the third part. Is
it that you want specific DATEs or do you want the ratio of the
DATE[max]/DATE[min]?
x <- read.table(textConnection("CODE NAME DATE DATA1
4813 'ADVANCED TELECOM' 1987 0.013
3845 'ADVANCED THERAPEUTIC SYS LTD' 1987 10.1
3845 'ADVANCED THERAPEUTIC SYS LTD' 1989 2.463
3845 'ADVANCED THERAPEUTIC SYS LTD' 1988 1.563
2836 'ADVANCED TISSUE SCI -CL A' 1987 0.847
2836 'ADVANCED TISSUE SCI -CL A' 1989 0.872
2836 'ADVANCED TISSUE SCI -CL A' 1988 0.529"), header=TRUE)
# matches on things to delete
delete_indx <- grep("-CL A$|-OLD$|-ADS$", x$NAME)
# delete them
x <- x[-delete_indx,]
x
  CODE                         NAME DATE  DATA1
1 4813             ADVANCED TELECOM 1987  0.013
2 3845 ADVANCED THERAPEUTIC SYS LTD 1987 10.100
3 3845 ADVANCED THERAPEUTIC SYS LTD 1989  2.463
4 3845 ADVANCED THERAPEUTIC SYS LTD 1988  1.563
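One caveat with negative indexing: if grep() finds no matches it returns integer(0), and x[-integer(0), ] then selects zero rows instead of all of them. A minimal sketch of a logical-index variant using grepl() follows. The data frame `demo` and the names `pattern`, `kept` and `kept_all` are made up for illustration and are not from the original post.

```r
# Hypothetical data frame for illustration only
demo <- data.frame(
  CODE = c(4813, 3845, 2836),
  NAME = c("ADVANCED TELECOM", "ADVANCED THERAPEUTIC SYS LTD",
           "ADVANCED TISSUE SCI -CL A"),
  stringsAsFactors = FALSE
)

# Suffixes to drop; extend the alternation as needed
pattern <- "-CL A$|-OLD$|-ADS$"

# grepl() returns one TRUE/FALSE per row, so negating it is safe
# even when nothing matches
kept <- demo[!grepl(pattern, demo$NAME), ]

# With a pattern that matches nothing, every row is kept
kept_all <- demo[!grepl("-NOMATCH$", demo$NAME), ]
```

The grep() version above is fine as long as at least one row matches; the grepl() form just removes that edge case.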
# I assume you want to use NAME to check for ranges of data
date_range <- tapply(x$DATE, x$NAME, function(dates) diff(range(dates)))
date_range
            ADVANCED TELECOM ADVANCED THERAPEUTIC SYS LTD
                           0                            2
   ADVANCED TISSUE SCI -CL A
                          NA
# delete ones with less than 3 years
names_to_delete <- names(date_range[date_range < 2])
# delete those entries
x <- x[!(x$NAME %in% names_to_delete),]
x
  CODE                         NAME DATE  DATA1
2 3845 ADVANCED THERAPEUTIC SYS LTD 1987 10.100
3 3845 ADVANCED THERAPEUTIC SYS LTD 1989  2.463
4 3845 ADVANCED THERAPEUTIC SYS LTD 1988  1.563
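If the third part means DATA1 in the last year divided by DATA1 in the first year for each NAME, here is a rough sketch under that assumption. The data frame `demo` and the helper `ratio_one` are names made up for illustration, and the first/last-year reading is a guess at the intent, not something confirmed in the question.

```r
# Hypothetical data mirroring the rows that remain after filtering
demo <- data.frame(
  CODE  = c(3845, 3845, 3845),
  NAME  = "ADVANCED THERAPEUTIC SYS LTD",
  DATE  = c(1987, 1989, 1988),
  DATA1 = c(10.1, 2.463, 1.563),
  stringsAsFactors = FALSE
)

# For one NAME: DATA1 in the latest year over DATA1 in the earliest year
ratio_one <- function(d) {
  data.frame(
    CODE  = d$CODE[1],
    NAME  = d$NAME[1],
    RATIO = d$DATA1[which.max(d$DATE)] / d$DATA1[which.min(d$DATE)]
  )
}

# split() gives one data frame per NAME; rbind stacks the per-name results
ratios <- do.call(rbind, lapply(split(demo, demo$NAME), ratio_one))
```

For specific years instead (say 1989 over 1987), the two which.max()/which.min() lookups could be swapped for d$DATE == 1989 and d$DATE == 1987.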
On Nov 13, 2007 2:34 PM, Jonas Malmros [EMAIL PROTECTED] wrote:
Dear R users,
I have a huge database and I need to adjust it somewhat.
Here is a very small excerpt from the database:
CODE  NAME                          DATE  DATA1
4813  ADVANCED TELECOM              1987   0.013
3845  ADVANCED THERAPEUTIC SYS LTD  1987  10.1
3845  ADVANCED THERAPEUTIC SYS LTD  1989   2.463
3845  ADVANCED THERAPEUTIC SYS LTD  1988   1.563
2836  ADVANCED TISSUE SCI -CL A     1987   0.847
2836  ADVANCED TISSUE SCI -CL A     1989   0.872
2836  ADVANCED TISSUE SCI -CL A     1988   0.529
What I need is:
1) Delete all cases containing -CL A (and also -OLD, -ADS, etc) at the end
2) Delete all cases that have fewer than 3 years of data
3) For each remaining case compute ratio DATA1(1989) / DATA1(1987)
[and then ratios involving other data variables] and output this into
new database consisting of CODE, NAME, RATIOs.
Maybe someone can suggest an effective way to do these things? I
imagine the first one would involve grep(), and 2 and 3 would involve
apply family of functions, but I cannot get my mind around the actual
code to perform these adjustments. I am new to R; I do write code, but
usually it consists of for-functions and plotting. I would much
appreciate your help.
Thank you in advance!
--
Jonas Malmros
Stockholm University
Stockholm, Sweden
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem you are trying to solve?