Re: [R] data manipulation help

2007-08-28 Thread Charles C. Berry
On Tue, 28 Aug 2007, Zheng Lu wrote:

> Dear All:
>
> I have a dataset like
> A=c(0,12,34,5,6,0,4,5,6,0,12,3,4,8,7,0,4,3,5,0,...). I want to add a
> column to this dataset, which must be
> B=c(1,1,1,1,1,2,2,2,2,3,3,3,3,3,3,4,4,4,4,5,..). How can I create B
> based on the sequence of A? Any help is appreciated.

Do you want

B  <- cumsum( A == 0 )

??

Please use spaces and newlines to make your code more readable!
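A quick check of the cumsum() suggestion, using only the 20 values of A actually posted (the trailing "..." is unknown and left out). Each 0 starts a new group, so the running count of zeros numbers the groups. Note that if A did not begin with 0, the leading elements would get group 0 rather than 1.

```r
# The 20 values of A as posted (the "..." tail is omitted)
A <- c(0, 12, 34, 5, 6, 0, 4, 5, 6, 0, 12, 3, 4, 8, 7, 0, 4, 3, 5, 0)

# A == 0 is TRUE at the start of each group; cumsum() counts the TRUEs so far
B <- cumsum(A == 0)

# Add B as a column next to A
dat <- data.frame(A = A, B = B)
print(B)
# [1] 1 1 1 1 1 2 2 2 2 3 3 3 3 3 3 4 4 4 4 5
```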


>
>
> Zheng
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Charles C. Berry                        (858) 534-2098
 Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]  UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



Re: [R] data manipulation help

2007-08-28 Thread Leeds, Mark \(IED\)
The code below works on your example, but someone will probably have something more elegant.

zeroindices <- which(A == 0)
rep(1:length(zeroindices),
    c(diff(zeroindices), length(A) - zeroindices[length(zeroindices)] + 1))
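A sketch comparing this rep() approach with cumsum(A == 0) on the 20 posted values of A (the "..." tail omitted). The rep() version assumes that A starts with 0; otherwise its result is shorter than A, as the second part shows with a small made-up vector A2.

```r
A <- c(0, 12, 34, 5, 6, 0, 4, 5, 6, 0, 12, 3, 4, 8, 7, 0, 4, 3, 5, 0)

# Positions of the zeros that start each group
zeroindices <- which(A == 0)

# Group lengths: gaps between successive zeros, plus the tail after the last zero
group_lengths <- c(diff(zeroindices), length(A) - zeroindices[length(zeroindices)] + 1)

B_rep    <- rep(seq_along(zeroindices), group_lengths)
B_cumsum <- cumsum(A == 0)
identical(B_rep, B_cumsum)  # TRUE for these values

# Hypothetical A2 that does not start with 0: the rep() result is too short
A2 <- c(7, 0, 1, 0)
z2 <- which(A2 == 0)
len_short <- length(rep(seq_along(z2), c(diff(z2), length(A2) - z2[length(z2)] + 1)))
len_short  # 3, one less than length(A2)
```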








[R] data manipulation help

2007-08-28 Thread Zheng Lu
Dear All:

I have a dataset like
A=c(0,12,34,5,6,0,4,5,6,0,12,3,4,8,7,0,4,3,5,0,...). I want to add a
column to this dataset, which must be
B=c(1,1,1,1,1,2,2,2,2,3,3,3,3,3,3,4,4,4,4,5,..). How can I create B
based on the sequence of A? Any help is appreciated.


Zheng



Re: [R] Data Manipulation using R

2007-04-18 Thread Stephen Tucker
...is this what you're looking for?

donedat <- subset(data, ID < 6000 | ID >= 7000)
findat <- donedat[-unique(rapply(donedat, function(x) which(x < 0))), ,
                  drop = FALSE]

The second line looks through each column and finds the indices of negative
values: rapply() returns all of them as a single vector, unique() removes
duplicated elements, and negative indexing then removes those rows from
donedat.
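One caveat with negative indexing, shown on a small hypothetical data frame: when no value is negative, the index vector is empty, and donedat[-integer(0), ] selects zero rows rather than all of them. A guarded sketch (the data and the name neg_rows are made up for illustration):

```r
# Hypothetical data with no negative values at all
donedat <- data.frame(ID = c(1000, 7000, 8000), v1 = c(1, 2, 3), v2 = c(4, 5, 6))

neg_rows <- unique(rapply(donedat, function(x) which(x < 0)))
length(neg_rows)                          # 0: nothing to drop
nrow(donedat[-neg_rows, , drop = FALSE])  # 0: yet every row is gone

# Only use negative indexing when there is something to remove
findat <- if (length(neg_rows) > 0) donedat[-neg_rows, , drop = FALSE] else donedat
nrow(findat)                              # 3
```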

--- Anup Nandialath <[EMAIL PROTECTED]> wrote:

> Dear Friends,
> 
> I have a data set with around 220,000 rows and 17 columns. One of the columns
> is an ID variable whose values range from 1000 through 9000. I need to
> perform the following operations.
> 
> 1) Remove all the observations with IDs between 6000 and 6999.
> 
> I tried this method:
> 
> remdat1 <- subset(data, ID < 6000)
> remdat2 <- subset(data, ID >= 7000)
> donedat <- rbind(remdat1, remdat2)
> 
> I checked the first and last entries and found that they did not have ID
> values of 6000. Therefore I think this might be correct, but is it the most
> efficient way of doing this?
> 
> 2) I need to remove observations where any of columns 3, 4, 6 or 8 is
> negative. For instance, if the number in column 3 is -4, then I need to
> delete the entire observation. Can somebody help me with this too?
> 
> Thanks and regards
> 
> Anup



Re: [R] Data Manipulation using R

2007-04-17 Thread Charilaos Skiadas
On Apr 17, 2007, at 8:03 PM, Anup Nandialath wrote:

> Dear Friends,
>
> I have a data set with around 220,000 rows and 17 columns. One of the
> columns is an ID variable whose values range from 1000 through 9000.
> I need to perform the following operations.
>
> 1) Remove all the observations with IDs between 6000 and 6999.
>
> I tried this method:
>
> remdat1 <- subset(data, ID < 6000)
> remdat2 <- subset(data, ID >= 7000)
> donedat <- rbind(remdat1, remdat2)
>
> I checked the first and last entries and found that they did not have ID
> values of 6000. Therefore I think this might be correct, but is it the
> most efficient way of doing this?
>
The rbind is probably unnecessary.

I think all you are missing for both questions is the "or" operator,   
"|".  ( ?"|" )

Simply:

donedat <- subset(data, ID < 6000 | ID >= 7000)

would do for this. Not sure about efficiency, but if the code is fast  
as it stands I wouldn't worry too much about it.

> 2) I need to remove observations where any of columns 3, 4, 6 or 8
> is negative. For instance, if the number in column 3 is -4,
> then I need to delete the entire observation. Can somebody help me
> with this too?

The following should do it (untested; I am not sure how it handles NAs):

toremove <- data[,3] < 0 | data[,4] < 0 | data[,6] < 0 | data[,8] < 0
data[!toremove,]


If you want more columns than those 4, then we could perhaps look for  
a better line than the first line above.
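A sketch of both steps on a small made-up data frame (the column names V2 to V8 and all values are hypothetical). rowSums() over a logical matrix generalizes the first line above to any set of column positions; with na.rm = TRUE, an NA in those columns does not by itself remove the row.

```r
# Hypothetical stand-in for the 17-column data: ID plus V2..V8
dat <- data.frame(ID = c(1000, 6500, 7000, 8200, 5999),
                  V2 = c(1, 1, 1, 1, 1),
                  V3 = c(1, 2, -3, 4, 5),
                  V4 = c(1, 1, 1, -1, 1),
                  V5 = c(-9, -9, -9, -9, -9),  # negative, but not a checked column
                  V6 = c(2, 2, 2, 2, 2),
                  V7 = c(0, 0, 0, 0, 0),
                  V8 = c(0, -1, 3, 3, 3))

# Question 1: drop IDs 6000-6999 with a single subset() call
kept <- subset(dat, ID < 6000 | ID >= 7000)

# Question 2: count negatives per row in columns 3, 4, 6 and 8 only
cols <- c(3, 4, 6, 8)
has_negative <- rowSums(kept[, cols] < 0, na.rm = TRUE) > 0
result <- kept[!has_negative, , drop = FALSE]

print(result)  # keeps the rows with ID 1000 and 5999
```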

> Thanks and regards
>
> Anup

Haris Skiadas
Department of Mathematics and Computer Science
Hanover College



[R] Data Manipulation using R

2007-04-17 Thread Anup Nandialath
Dear Friends,

I have a data set with around 220,000 rows and 17 columns. One of the columns is
an ID variable whose values range from 1000 through 9000. I need to perform the
following operations.

1) Remove all the observations with IDs between 6000 and 6999.

I tried this method:

remdat1 <- subset(data, ID < 6000)
remdat2 <- subset(data, ID >= 7000)
donedat <- rbind(remdat1, remdat2)

I checked the first and last entries and found that they did not have ID values
of 6000. Therefore I think this might be correct, but is it the most efficient
way of doing this?

2) I need to remove observations where any of columns 3, 4, 6 or 8 is
negative. For instance, if the number in column 3 is -4, then I need to delete
the entire observation. Can somebody help me with this too?

Thanks and regards

Anup

   


Re: [R] Data manipulation in columns (with apply?)

2006-10-10 Thread jim holtman
Does this start to do what you want?

> x <- "NUM sim N
+ 1 1  466
+ 1 2  450
+ 1 3  473
+ 1 4  531
+ 1 5  515
+ 1 6  502
+ 1 7  471
+ 1 8  460
+ 1 9  458
+ 1 10 434
+ 2 1  289
+ 2 2  356
+ 2 3 387
+ 2 4 440
+ 2 5 457
+ 2 6 466
+ 2 7 467
+ 2 8 449
+ 2 9 387
+ 2 10 394
+ 3 1 367
+ 3 2 400
+ 3 3 476
+ 3 4 508
+ 3 5 478
+ 3 6 501
+ 3 7 513
+ 3 8 505
+ 3 9 492
+ 3 10 465"
> a <- read.table(textConnection(x), header=T)
> lambda <- by(a, a$NUM, function(x) x$N[-1] / x$N[-length(x$N)])
> lambda
a$NUM: 1
[1] 0.9656652 1.0511111 1.1226216 0.9698682 0.9747573 0.9382470 0.9766454 0.9956522 0.9475983
------------------------------------------------------------------------
a$NUM: 2
[1] 1.2318339 1.0870787 1.1369509 1.0386364 1.0196937 1.0021459 0.9614561 0.8619154 1.0180879
------------------------------------------------------------------------
a$NUM: 3
[1] 1.0899183 1.1900000 1.0672269 0.9409449 1.0481172 1.0239521 0.9844055 0.9742574 0.9451220
> # sum of lambdas
> sapply(lambda, sum)
       1        2        3
8.942166 9.357799 9.263944
> # mean
> sapply(lambda, mean)
       1        2        3
0.993574 1.039755 1.029327
> # sd
> sapply(lambda, sd)
         1          2          3
0.05822850 0.10525335 0.08004527
>
>
>
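The transcript stops at per-run summaries. Below is a sketch of all three steps Bret lists, using his example data. It assumes rows should be ordered by sim within each run (the code sorts them to be safe), uses split()/lapply() in place of by(), and takes "se" to mean sd/sqrt(n); that last point is an assumption about what was meant.

```r
# Bret Collier's example output: NUM = run, sim = year, N = yearly count
a <- data.frame(
  NUM = rep(1:3, each = 10),
  sim = rep(1:10, times = 3),
  N   = c(466, 450, 473, 531, 515, 502, 471, 460, 458, 434,
          289, 356, 387, 440, 457, 466, 467, 449, 387, 394,
          367, 400, 476, 508, 478, 501, 513, 505, 492, 465)
)

# Step 1: lambda(t) = N(t+1) / N(t) within each run
a <- a[order(a$NUM, a$sim), ]
lambda <- lapply(split(a$N, a$NUM), function(n) n[-1] / n[-length(n)])

# Step 2: per-run mean and standard error of lambda
run_mean <- sapply(lambda, mean)
run_se   <- sapply(lambda, function(l) sd(l) / sqrt(length(l)))

# Step 3: summarise the per-run means over all runs
overall <- c(mean = mean(run_mean), sd = sd(run_mean),
             min = min(run_mean), max = max(run_mean))

print(round(run_mean, 6))
print(round(overall, 6))
```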


On 10/10/06, Bret Collier <[EMAIL PROTECTED]> wrote:
> R Users,
> I have written a small simulation model in R which outputs a datafile 
> consisting of ending population sizes for each simulation run (year).  The 
> data (see short data example below) is labeled by NUM (simulation run), sim 
> (year) and N (yearly count).   After searching the help files and coming up 
> empty (probably because I used the wrong terms) I am appealing for some help 
> for working with the output dataset.
>
> What I want to do is for each of the i simulation runs (NUM) I want to
>
> 1) take N(t+1)/N(t)=lambda(t) for each year (where in the below example 
> t=1,...,10--total years of the simulation)
> 2) Sum lambda(t) and divide by t (i.e., output both the mean/se of lambda for
> each simulation run)
> 3) Take the mean of the mean(lambda's) (and associated stddev, min, max) over 
> all NUM
>
> I think I have to write a function for use within an apply statement, but I 
> am not quite there yet on the learning curve so most of my recent attempts in 
> R have been useful learning experiences of what not to do...
>
> Any suggestions/direction is greatly appreciated.
>
> Bret Collier
> TX A&M
>
> NUM sim N
> 1 1  466
> 1 2  450
> 1 3  473
> 1 4  531
> 1 5  515
> 1 6  502
> 1 7  471
> 1 8  460
> 1 9  458
> 1 10 434
> 2 1  289
> 2 2  356
> 2 3 387
> 2 4 440
> 2 5 457
> 2 6 466
> 2 7 467
> 2 8 449
> 2 9 387
> 2 10 394
> 3 1 367
> 3 2 400
> 3 3 476
> 3 4 508
> 3 5 478
> 3 6 501
> 3 7 513
> 3 8 505
> 3 9 492
> 3 10 465
>
> platform       i386-pc-mingw32
> arch           i386
> os             mingw32
> system         i386, mingw32
> status
> major          2
> minor          3.0
> year           2006
> month          04
> day            24
> svn rev        37909
> language       R
> version.string Version 2.3.0 (2006-04-24) (yeah, I need to update)


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?



[R] Data manipulation in columns (with apply?)

2006-10-10 Thread Bret Collier
R Users,
I have written a small simulation model in R which outputs a datafile 
consisting of ending population sizes for each simulation run (year).  The data 
(see short data example below) is labeled by NUM (simulation run), sim (year) 
and N (yearly count).   After searching the help files and coming up empty 
(probably because I used the wrong terms) I am appealing for some help for 
working with the output dataset. 

What I want to do is for each of the i simulation runs (NUM) I want to 

1) take N(t+1)/N(t)=lambda(t) for each year (where in the below example 
t=1,...,10--total years of the simulation)
2) Sum lambda(t) and divide by t (i.e., output both the mean/se of lambda for 
each simulation run)
3) Take the mean of the mean(lambda's) (and associated stddev, min, max) over 
all NUM

I think I have to write a function for use within an apply statement, but I am 
not quite there yet on the learning curve so most of my recent attempts in R 
have been useful learning experiences of what not to do...

Any suggestions/direction is greatly appreciated.

Bret Collier
TX A&M

NUM sim N
1 1  466
1 2  450
1 3  473
1 4  531
1 5  515
1 6  502
1 7  471
1 8  460
1 9  458
1 10 434
2 1  289
2 2  356
2 3 387
2 4 440
2 5 457
2 6 466
2 7 467
2 8 449
2 9 387
2 10 394
3 1 367
3 2 400
3 3 476
3 4 508
3 5 478
3 6 501
3 7 513
3 8 505
3 9 492
3 10 465

platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status
major          2
minor          3.0
year           2006
month          04
day            24
svn rev        37909
language       R
version.string Version 2.3.0 (2006-04-24) (yeah, I need to update)



Re: [R] data manipulation docs

2006-05-05 Thread Frank E Harrell Jr
Federico Calboli wrote:
> Hi All,
> 
> Is there some document/manual about data manipulation within R that I
> could use as a reference (obviously, aside from the R manuals)?
> 
> The reason I am asking is that I have a number of data frames/matrices
> containing genetic data. The data is in a character form, as in:
> 
>    V1 V2 V3 V4 V5
> 1 AA AG AA GG AG
> 2 AC AA AA GG AG
> 3 AA AG AA GG AG
> 4 AA AA AA GG AG
> 5 AA AA AA GG AA
> 
> I need to chop, subset, and variously manipulate this kind of data,
> sometimes keeping the data in its character format, sometimes converting
> it to numeric form (i.e. substituting each data point with the equivalent
> factor value). Since the data is often quite big, I have to keep things
> memory efficient.
> 
> This whole game is getting exceedingly time consuming and frustrating,
> because I end up with random pieces of code that I save, each patching a
> particular problem but difficult to 'abstract' for a new task, so
> I get back close to square one annoyingly often.
> 
> Cheers,
> 
> Federico Calboli
> 
> 

There is a large data manipulation section in the Alzola and Harrell
document, available on CRAN under contributed documentation; a slightly more
up-to-date version is at biostat.mc.vanderbilt.edu.

-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University



Re: [R] data manipulation docs

2006-05-04 Thread Larry Howe
On Thursday May 4 2006 10:20, Federico Calboli wrote:
> The reason I am asking is that I have a number of data frames/matrices
> containing genetic data. The data is in a character form, as in:

Take a look at the Bioconductor project: "Bioconductor is an open source and 
open development software project for the analysis and comprehension of 
genomic data." http://www.bioconductor.org/

> This whole game is getting exceedingly time consuming and frustrating,
> because I end up with random pieces of code that I save, each patching a
> particular problem but difficult to 'abstract' for a new task, so
> I get back close to square one annoyingly often.

This sounds like a software engineering problem, not an R problem. Does 
Imperial have a computer science dept.? Maybe they could advise on software 
engineering techniques.

Larry Howe



[R] data manipulation docs

2006-05-04 Thread Federico Calboli
Hi All,

Is there some document/manual about data manipulation within R that I
could use as a reference (obviously, aside from the R manuals)?

The reason I am asking is that I have a number of data frames/matrices
containing genetic data. The data is in a character form, as in:

   V1 V2 V3 V4 V5
1 AA AG AA GG AG
2 AC AA AA GG AG
3 AA AG AA GG AG
4 AA AA AA GG AG
5 AA AA AA GG AA

I need to chop, subset, and variously manipulate this kind of data,
sometimes keeping the data in its character format, sometimes converting
it to numeric form (i.e. substituting each data point with the equivalent
factor value). Since the data is often quite big, I have to keep things
memory efficient.
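As an illustration of the character-to-numeric conversion described here, a sketch on the five-row example above. It assumes one common set of genotype labels for all columns, so that a given code means the same genotype in every column; the object names geno, lv, codes and back are hypothetical.

```r
# The five-row example, stored as character columns
geno <- data.frame(
  V1 = c("AA", "AC", "AA", "AA", "AA"),
  V2 = c("AG", "AA", "AG", "AA", "AA"),
  V3 = c("AA", "AA", "AA", "AA", "AA"),
  V4 = c("GG", "GG", "GG", "GG", "GG"),
  V5 = c("AG", "AG", "AG", "AG", "AA"),
  stringsAsFactors = FALSE
)

# One shared, sorted set of labels: here "AA" "AC" "AG" "GG"
lv <- sort(unique(unlist(geno)))

# Integer matrix of codes (1 = lv[1], 2 = lv[2], ...), one column per marker
codes <- sapply(geno, function(col) as.integer(factor(col, levels = lv)))

# Back to the character form when needed
back <- matrix(lv[codes], nrow = nrow(codes), dimnames = dimnames(codes))
```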

This whole game is getting exceedingly time consuming and frustrating,
because I end up with random pieces of code that I save, each patching a
particular problem but difficult to 'abstract' for a new task, so
I get back close to square one annoyingly often.

Cheers,

Federico Calboli


-- 
Federico C. F. Calboli
Department of Epidemiology and Public Health
Imperial College, St Mary's Campus
Norfolk Place, London W2 1PG

Tel  +44 (0)20 7594 1602 Fax (+44) 020 7594 3193

f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com



Re: [R] data manipulation

2005-09-10 Thread Dieter Menne
Marc Bernard  yahoo.fr> writes:

> I would be grateful if you can help me. My problem is the following:
> I have a data set like:
> 
> ID  time  X1    X2
> 1   1     x111  x211
> 1   2     x112  x212

> where X1 and X2 are 2 covariates and "time" is the time of observation and ID
> indicates the cluster.
> 
> I want to merge the above data by creating a new variable "X" and "type" as
> follows:
> 
> ID  time  X     type
> 1   1     x111  X1


Try reshape(). And take courage: it is one of the more complex interfaces in R,
very powerful but intimidating.

Dieter
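A minimal reshape() sketch for Marc's layout, using his placeholder values as character strings. Without an idvar argument, reshape() adds its own "id" column (one per original row), which is dropped here; the final sort reproduces the order he asked for.

```r
# Marc Bernard's example, placeholder values kept as character strings
df <- data.frame(ID   = c(1, 1, 2, 2, 2),
                 time = c(1, 2, 1, 2, 3),
                 X1   = c("x111", "x112", "x121", "x122", "x123"),
                 X2   = c("x211", "x212", "x221", "x222", "x223"),
                 stringsAsFactors = FALSE)

# Wide -> long: X1 and X2 are stacked into one column X, and the new
# column "type" records which original column each value came from
long <- reshape(df, direction = "long",
                varying = c("X1", "X2"), v.names = "X",
                timevar = "type", times = c("X1", "X2"))

# Keep the wanted columns, sort by ID, type, time, and make type a factor
long <- long[order(long$ID, long$type, long$time), c("ID", "time", "X", "type")]
long$type <- factor(long$type)
rownames(long) <- NULL
print(long)
```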



Re: [R] data manipulation

2005-09-08 Thread Jean Eid
I am sure all of these work, but if you want the output exactly the way
you described, do this:

temp <- read.table("yourfile", as.is = TRUE, header = TRUE)
temp1 <- temp[, 1:3]
temp2 <- temp[, c(1, 2, 4)]
colnames(temp1)[3] <- "X"
colnames(temp2)[3] <- "X"
temp3 <- merge(temp1, temp2, all = TRUE)
temp3$type <- toupper(substr(temp3$X, 1, 2))


After that you can generate factors and so on.
Note that as.is = TRUE in read.table() keeps the variables X1 and X2 as
character strings rather than factors; this is done for substr().


P.S. I am sure you can use reshape instead of the second to the fifth
commands above

?reshape

Jean

On Thu, 8 Sep 2005, Sebastian Luque wrote:

> Say your original data is in dataframe df, then this might do what you
> want:
>
> R> newdf <- rbind(df[, 1:3], df[, c(1, 2, 4)])
> R> names(newdf)[3] <- "X"
> R> newdf$type <- substr(c(df[[3]], df[[4]]), 1, 2)



Re: [R] data manipulation

2005-09-08 Thread Jim Porzak
Also see Hadley Wickham's reshape package for more bells & whistles.
-- 
HTH!
Jim Porzak
Loyalty Matrix Inc.



On 9/8/05, Thomas Lumley <[EMAIL PROTECTED]> wrote:
> 
> This is what reshape() does.



Re: [R] data manipulation

2005-09-08 Thread Thomas Lumley

This is what reshape() does.

-thomas

On Thu, 8 Sep 2005, Marc Bernard wrote:

> Dear All,
>
> I would be grateful if you can help me. My problem is the following:
> I have a data set like:
>
> ID  time  X1    X2
> 1   1     x111  x211
> 1   2     x112  x212
> 2   1     x121  x221
> 2   2     x122  x222
> 2   3     x123  x223
>
> where X1 and X2 are 2 covariates and "time" is the time of observation and ID 
> indicates the cluster.
>
> I want to merge the above data by creating a new variable  "X" and "type" as 
> follows:
>
> ID  time  X     type
> 1 1  x111 X1
> 1 2  x112 X1
> 1 1  x211 X2
> 1 2  x212 X2
> 2 1  x121 X1
> 2 2  x122 X1
> 2 3  x123 X1
> 2 1  x221 X2
> 2 2  x222 X2
> 2 3  x223 X2
>
>
> Where "type" is a factor variable indicating if the observation is related to 
> X1 or X2...
>
> Many thanks in advance,
>
> Bernard

Thomas Lumley   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]   University of Washington, Seattle



Re: [R] data manipulation

2005-09-08 Thread Sebastian Luque
Marc Bernard <[EMAIL PROTECTED]> wrote:
> Dear All,

> I would be grateful if you can help me. My problem is the following:
> I have a data set like:

> ID  time  X1    X2
> 1   1     x111  x211
> 1   2     x112  x212
> 2   1     x121  x221
> 2   2     x122  x222
> 2   3     x123  x223

> where X1 and X2 are 2 covariates and "time" is the time of observation and ID
>   indicates the cluster.

> I want to merge the above data by creating a new variable "X" and "type" as
>   follows:

> ID  time  X     type
> 1 1  x111 X1
> 1 2  x112 X1
> 1 1  x211 X2
> 1 2  x212 X2
> 2 1  x121 X1
> 2 2  x122 X1
> 2 3  x123 X1
> 2 1  x221 X2
> 2 2  x222 X2
> 2 3  x223 X2


> Where "type" is a factor variable indicating if the observation is related to
>   X1 or X2...


Say your original data is in dataframe df, then this might do what you
want:

R> newdf <- rbind(df[, 1:3], df[, c(1, 2, 4)])
R> names(newdf)[3] <- "X"
R> newdf$type <- substr(c(df[[3]], df[[4]]), 1, 2)

Cheers,

-- 
Sebastian P. Luque



Re: [R] data manipulation

2005-09-08 Thread Martin Lam
Hi,

This may not be the best solution, but at least it's
easy to see what I'm doing. Assume that your data set
is called "data":

# remove the 4th column (keeps ID, time and X1)
data1 = data[,-4]

# remove the 3rd column (keeps ID, time and X2)
data2 = data[,-3]

# use cbind to add an extra column with only X1
# elements
data1 = cbind(data1, rep("X1", nrow(data1)))

# use cbind to add an extra column with only X2
# elements
data2 = cbind(data2, rep("X2", nrow(data2)))

# rename the columns first: rbind matches data
# frame columns by name
colnames(data1) <- c("ID", "time", "X", "type")
colnames(data2) <- c("ID", "time", "X", "type")

# use rbind to add them together as rows
data3 = rbind(data1, data2)

# show output
data3

The only thing I couldn't figure out is how to sort
the rows of the data set; perhaps someone else could
help us out with this?

Martin
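On the sorting question: order() accepts several keys, so rows can be sorted by ID, then type, then time. A sketch that stacks the two covariates directly and then sorts, using Marc's placeholder values (the object names df and long are illustrative):

```r
df <- data.frame(ID   = c(1, 1, 2, 2, 2),
                 time = c(1, 2, 1, 2, 3),
                 X1   = c("x111", "x112", "x121", "x122", "x123"),
                 X2   = c("x211", "x212", "x221", "x222", "x223"),
                 stringsAsFactors = FALSE)

# Stack X1 on top of X2, tagging each half with the column it came from
long <- data.frame(ID   = rep(df$ID, 2),
                   time = rep(df$time, 2),
                   X    = c(df$X1, df$X2),
                   type = factor(rep(c("X1", "X2"), each = nrow(df))),
                   stringsAsFactors = FALSE)

# Sort by ID, then type, then time; reset row names afterwards
long <- long[order(long$ID, long$type, long$time), ]
rownames(long) <- NULL
print(long)
```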

--- Marc Bernard <[EMAIL PROTECTED]> wrote:

> Dear All,
>  
> I would be grateful if you can help me. My problem
> is the following:
> I have a data set like:
>  
> ID  time  X1    X2
> 1   1     x111  x211
> 1   2     x112  x212
> 2   1     x121  x221
> 2   2     x122  x222
> 2   3     x123  x223
> 
> where X1 and X2 are 2 covariates and "time" is the
> time of observation and ID indicates the cluster.
> 
> I want to merge the above data by creating a new
> variable "X" and "type" as follows:
> 
> ID  time  X     type
> 1 1  x111 X1
> 1 2  x112 X1
> 1 1  x211 X2
> 1 2  x212 X2
> 2 1  x121 X1
> 2 2  x122 X1
> 2 3  x123 X1
> 2 1  x221 X2
> 2 2  x222 X2
> 2 3  x223 X2
> 
>  
> Where "type" is a factor variable indicating if the
> observation is related to X1 or X2...
>  
> Many thanks in advance,
>  
> Bernard







[R] data manipulation

2005-09-08 Thread Marc Bernard
Dear All,
 
I would be grateful if you can help me. My problem is the following:
I have a data set like:
 
ID  time  X1    X2
1   1     x111  x211
1   2     x112  x212
2   1     x121  x221
2   2     x122  x222
2   3     x123  x223

where X1 and X2 are 2 covariates and "time" is the time of observation and ID
indicates the cluster.

I want to merge the above data by creating a new variable "X" and "type" as
follows:

ID  time  X     type
1 1  x111 X1
1 2  x112 X1
1 1  x211 X2
1 2  x212 X2
2 1  x121 X1
2 2  x122 X1
2 3  x123 X1
2 1  x221 X2
2 2  x222 X2
2 3  x223 X2

 
Where "type" is a factor variable indicating if the observation is related to 
X1 or X2...
 
Many thanks in advance,
 
Bernard




[R] data manipulation help

2005-08-16 Thread munguiar

Thanks to Patrick Burns, Dieter Menne and Peter Alspach for your help.

Peter Alspach showed me how to get the first and the last capture
of every individual with the following code:

capture <- matrix(rbinom(40, 1, 0.3), 4, 10)

capture
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    0    0    0    0    0    1    1    0    1     1
[2,]    1    0    1    0    0    0    1    1    1     0
[3,]    0    0    0    0    0    0    1    0    1     0
[4,]    0    1    0    1    1    0    0    0    0     0

firstcap <- apply(capture, 1, function(x) min((1:length(x))[x == 1]))
[1] 6 1 7 2
lastcap <- apply(capture, 1, function(x) max((1:length(x))[x == 1]))
[1] 10  9  9  5

Roberto

Hello everybody,

I have a data frame with 468 individuals (rows) that I captured at least
once during 28 visits (columns), so I can know how many times every
individual was captured (0 = not captured, 1 = captured).

persistence<-apply(mortacap2,1,sum)

I also want to know when the first and the last capture were for every
individual. If I use:

which(mortacap2[1,]==1)

X18.10.2004 X26.10.2004 X28.10.2004 X30.10.2004
          1           5           6           7

I can estimate them manually row by row, but I don't see how to estimate the
first and the last capture for all individuals in the database at the
same time.

I tried

d<-numeric(368)
for (i in 1:368) {d[i]<-which(mortacap2[1:368,]==1}

but it didn't work. Any help would be appreciated.


Thanks in advance!!


Roberto Munguia Steyer
Departamento Biologia Evolutiva
Instituto de Ecologia, A.C.
Xalapa, Veracruz.
MEXICO

 Windows XP
R 2.10



Re: [R] data manipulation help

2005-08-16 Thread Dieter Menne
roberto munguia  posgrado.ecologia.edu.mx> writes:

> 
> I have a dataframe with 468 individuals (rows) that I captured at least once
> during 28 visits (columns), it looks like:
> 
> mortality[1:10,]
> 
> 
> 1   1   0   0   0   1   1   1   0   0   0
..
> so I can know how many times every individual was captured, 0= not capture,
> 1=capture. 
 
> I also want to know when  was the first and the last capture for every
> individual,

This should give you a starter

# create play data
cap = data.frame(matrix(rbinom(120,1,0.3),nrow=10))

firstthat<-function(x) which(x)[1] # stolen from Thomas Lumley

# Make your data logical; not really needed, but easier to understand
cap.log = cap==1
apply(cap.log,1,firstthat) # gives first captures

Dieter
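Extending the starter to last captures and capture counts, a sketch using the 4 x 10 capture matrix from Peter Alspach's example in Roberto's summary. The lastthat() helper is hypothetical, written by analogy with firstthat(); both return NA for an individual that was never caught.

```r
# The 4 x 10 capture matrix from Peter Alspach's example
capture <- matrix(c(0, 0, 0, 0, 0, 1, 1, 0, 1, 1,
                    1, 0, 1, 0, 0, 0, 1, 1, 1, 0,
                    0, 0, 0, 0, 0, 0, 1, 0, 1, 0,
                    0, 1, 0, 1, 1, 0, 0, 0, 0, 0),
                  nrow = 4, byrow = TRUE)

# Number of captures per individual
persistence <- rowSums(capture)

# First and last visit with a capture (NA if never captured)
firstthat <- function(x) which(x)[1]
lastthat  <- function(x) {
  w <- which(x)
  if (length(w) > 0) w[length(w)] else NA_integer_
}

cap.log  <- capture == 1
firstcap <- apply(cap.log, 1, firstthat)  # 6 1 7 2
lastcap  <- apply(cap.log, 1, lastthat)   # 10 9 9 5
```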



[R] data manipulation help

2005-08-15 Thread roberto munguia
Hello everybody,

I have a data frame with 468 individuals (rows) that I captured at least once
during 28 visits (columns); it looks like:

mortality[1:10,]

   X18.10.2004 X20.10.2004 X22.10.2004 X24.10.2004 X26.10.2004 X28.10.2004
1            1           0           0           0           1           1
2            1           0           0           0           0           0
3            1           1           1           0           0           0
4            1           0           0           0           0           0
5            1           1           1           1           0           0
6            1           1           1           1           0           0
7            1           0           1           0           1           0
8            1           1           1           0           1           0
9            1           0           0           1           1           0
10           1           0           1           0           1           0
   X30.10.2004 X01.11.2004 X03.11.2004 X07.11.2004
1            1           0           0           0
2            0           0           0           0
3            1           0           0           1
4            0           0           0           0
5            1           1           0           0
6            0           1           0           0
7            1           1           0           0
8            1           1           1           1
9            0           0           1           0
10           0           0           0           0

so I can know how many times every individual was captured (0 = not captured,
1 = captured):

persistence<-apply(mortacap2,1,sum)

I also want to know when the first and the last capture were for every
individual. If I use:

which(mortacap2[1,]==1)

X18.10.2004 X26.10.2004 X28.10.2004 X30.10.2004
          1           5           6           7

I can estimate them manually row by row, but I don't see how to estimate the
first and the last capture for all individuals in the database at the same
time.

I tried

d<-numeric(368)
for (i in 1:368) {d[i]<-which(mortacap2[1:368,]==1}

but it didn't work. Any help would be appreciated.
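The loop fails because which() is applied to the whole matrix on every pass (and a closing parenthesis is missing). A corrected loop sketch that works on one row at a time; the data are the first four rows of the visits shown above, and the result vectors first and last are hypothetical names. If mortacap2 is a data frame rather than a matrix, mortacap2[i, ] == 1 still works with which().

```r
# Stand-in for mortacap2: first four rows of the ten visits printed above
mortacap2 <- matrix(c(1, 0, 0, 0, 1, 1, 1, 0, 0, 0,
                      1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                      1, 1, 1, 0, 0, 0, 1, 0, 0, 1,
                      1, 0, 0, 0, 0, 0, 0, 0, 0, 0),
                    nrow = 4, byrow = TRUE)

n <- nrow(mortacap2)
first <- rep(NA_integer_, n)
last  <- rep(NA_integer_, n)

# which() on a single row gives the visit numbers of all captures,
# so keep only its first and last element
for (i in seq_len(n)) {
  caps <- which(mortacap2[i, ] == 1)
  if (length(caps) > 0) {
    first[i] <- caps[1]
    last[i]  <- caps[length(caps)]
  }
}
first  # 1 1 1 1
last   # 7 1 10 1
```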

 

Thanks in advance!!

Roberto Munguia Steyer
Departamento Biologia Evolutiva
Instituto de Ecologia, A.C.
Xalapa, Veracruz.
MEXICO

Windows XP
R 2.10




RE: [R] data manipulation

2005-04-23 Thread Liaw, Andy
You just need to try harder in reading the documentation.  Try:

data <- matrix(scan("file-name"), ncol=29, byrow=TRUE)

Andy
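The difference byrow makes can be checked on the two records posted in the question. A minimal sketch: a temporary file from tempfile() stands in for "file-name", and the helper rec() is hypothetical; it only rebuilds the posted lines.

```r
# Rebuild the two posted records with their original line breaks
# (5 + 9 + 9 + 6 = 29 values per record); record k has k as its 4th value
rec <- function(k) c(
  paste("4 1 17", k, "1"),
  "-5.1536 -0.1668 -2.3412 -0.5062 0.9621 0.3640 0.3678 -0.5081 -0.2227",
  "0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232 0.8673 -0.1033 -0.0796",
  "-0.0341 -0.1716 -0.1801 -0.7014 0.6578 0.5611")
f <- tempfile()
writeLines(c(rec(1), rec(2)), f)

# scan() ignores the line breaks and returns one numeric vector of 58 values
x <- scan(f, quiet = TRUE)
unlink(f)

by.col <- matrix(x, ncol = 29)                # default: filled column by column
by.row <- matrix(x, ncol = 29, byrow = TRUE)  # filled row by row: one record per row

by.col[, 1:3]   # rows start 4 17 1 and 1 1 -5.1536: the records are interleaved
by.row[, 1:3]   # both rows start 4 1 17, one record per row
```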



Re: [R] data manipulation

2005-04-23 Thread Yoko Nakajima
Hello,

may I ask a further question?

I have realized that "data <- matrix(scan("file-name"), ncol=29)" reads the data differently than I thought: with this code, (4,1) is the first column, (17,1) is the second column, (1,1) is the third, and so on (please see the data below). Therefore, the data set would not be in order if I used this code.

It needs to be read as: (4,4) the first column, (1,1) the second column, (17,17) the third, and so on (i.e., 4 to 0.5611 makes the first row, the next 4 to 0.5611 makes the second row, and so on). So

V1 V2 V3 ... V29
4  1  17 ... 0.5611
4  1  17 ... 0.5611

was needed.

(Now I have:
V1 V2 V3 ... V29
4  17 1  ... 0.6578
1  1  -5.1536 ... 0.5611)


[The data set I have may have around 1000 sets of these 29 variables. I only paste two sets of them here.]
4 1 17 1 1
-5.1536 -0.1668 -2.3412 -0.5062  0.9621  0.3640  0.3678 -0.5081 -0.2227
0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232  0.8673 -0.1033 -0.0796
-0.0341 -0.1716 -0.1801 -0.7014  0.6578  0.5611

4 1 17 2 1
-5.1536 -0.1668 -2.3412 -0.5062  0.9621  0.3640  0.3678 -0.5081 -0.2227
0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232  0.8673 -0.1033 -0.0796
-0.0341 -0.1716 -0.1801 -0.7014  0.6578  0.5611



I need 29 columns; that part is right. But "ncol=29" read the data differently. Is there any way I can handle this problem in R?

I would very much appreciate it if you could let me know. My guess is that I should probably rearrange the data set in Excel or similar. I used "data.entry(data)" and found this. I cannot analyze this data set.

Thank you very much, in advance.
Sincerely,
Yoko.



Re: [R] data manipulation

2005-04-13 Thread Marc Schwartz
On Wed, 2005-04-13 at 20:56 -0400, Yoko Nakajima wrote:
> Hello,
> my question is about the data handling.
> 
> I have a data set that is lined as:
> 
> 4 1 17 1 1
>  -5.1536 -0.1668 -2.3412 -0.5062  0.9621  0.3640  0.3678 -0.5081 -0.2227
>   0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232  0.8673 -0.1033 -0.0796
>  -0.0341 -0.1716 -0.1801 -0.7014  0.6578  0.5611
> 4 1 17 2 1
>  -5.1536 -0.1668 -2.3412 -0.5062  0.9621  0.3640  0.3678 -0.5081 -0.2227
>   0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232  0.8673 -0.1033 -0.0796
>  -0.0341 -0.1716 -0.1801 -0.7014  0.6578  0.5611
> 
> This means that 29 variables are together as a set. You saw two sets
> of them in example. I have about 1000 sets (of 29 variables) in my
> data. When I "scan" this data set, the result comes with 7 columns and
> it is not possible, so far, to read the table by column wise, and thus
> it is not possible to analyze the data. I would like to know whether
> there is a way to solve this problem, say, by arranging columns or
> increasing the number of columns of data matrix by R.
> 
> Also, I would like to know how you could name each column of the data
> so that you could use the individual column separately.

You have probably changed some default setting in scan(). By default it treats
'white space' as field delimiters.

Using your data above, which I saved in a file called 'test.dat':

> mat <- matrix(scan("test.dat"), ncol = 29)
Read 58 items

> dim(mat)
[1]  2 29

> colnames(mat) <- paste("Col", 1:29, sep = "")

> mat
     Col1 Col2    Col3    Col4    Col5   Col6    Col7    Col8    Col9
[1,]    4   17  1.0000 -0.1668 -0.5062 0.3640 -0.5081  0.8142 -0.0445
[2,]    1    1 -5.1536 -2.3412  0.9621 0.3678 -0.2227 -0.0389 -0.0578
       Col10   Col11   Col12   Col13   Col14  Col15 Col16 Col17   Col18
[1,] -0.1175  0.8673 -0.0796 -0.1716 -0.7014 0.5611     1     2 -5.1536
[2,] -0.1232 -0.1033 -0.0341 -0.1801  0.6578 4.0000    17     1 -0.1668
       Col19  Col20   Col21   Col22   Col23   Col24   Col25   Col26
[1,] -2.3412 0.9621  0.3678 -0.2227 -0.0389 -0.0578 -0.1232 -0.1033
[2,] -0.5062 0.3640 -0.5081  0.8142 -0.0445 -0.1175  0.8673 -0.0796
       Col27   Col28  Col29
[1,] -0.0341 -0.1801 0.6578
[2,] -0.1716 -0.7014 0.5611

In this case, 'mat' is a matrix with 2 rows and 29 columns.

You can restructure this differently as per your requirements.

HTH,

Marc Schwartz



RE: [R] data manipulation

2005-04-13 Thread John Fox
Dear Yoko,

If you're sure that the data are complete, then data <-
matrix(scan("file-name"), ncol=29) should do the trick. Then to name the
columns of the data matrix, colnames(data) <- c("one", "two", etc.). [Of
course, you'd substitute meaningful names.]
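A sketch of naming the columns and then using them one at a time. The 2 x 5 matrix is a made-up stand-in (the first five values of each posted record), the names "a", "b", "c", "set" and "d" are hypothetical placeholders, and byrow = TRUE is used because each record occupies one row, as Andy Liaw points out in this thread.

```r
# Made-up stand-in for the scanned data: first 5 values of each record
m <- matrix(c(4, 1, 17, 1, 1,
              4, 1, 17, 2, 1), ncol = 5, byrow = TRUE)

# Hypothetical column names; use meaningful ones in practice
colnames(m) <- c("a", "b", "c", "set", "d")

m[, "set"]            # one column of the matrix, selected by name

dat <- as.data.frame(m)
dat$set               # the same column via $ once converted to a data frame
summary(dat$c)
```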

I hope this helps,
 John


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
 



[R] data manipulation

2005-04-13 Thread Yoko Nakajima
Hello,
my question is about data handling.

I have a data set laid out like this:

4 1 17 1 1
 -5.1536 -0.1668 -2.3412 -0.5062  0.9621  0.3640  0.3678 -0.5081 -0.2227
  0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232  0.8673 -0.1033 -0.0796
 -0.0341 -0.1716 -0.1801 -0.7014  0.6578  0.5611
4 1 17 2 1
 -5.1536 -0.1668 -2.3412 -0.5062  0.9621  0.3640  0.3678 -0.5081 -0.2227
  0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232  0.8673 -0.1033 -0.0796
 -0.0341 -0.1716 -0.1801 -0.7014  0.6578  0.5611

Each set consists of 29 variables; the example above shows two sets, and my data contain about 1000 sets. When I "scan" this data set, the result has 7 columns, so far I have not been able to read the table column-wise, and therefore I cannot analyze the data. Is there a way to solve this problem in R, say, by rearranging the columns or increasing the number of columns of the data matrix?

Also, I would like to know how to name each column of the data so that I can use the individual columns separately.

Sincerely.


Re: [R] Data manipulation

2005-02-08 Thread Helmut Kudrnovsky
Thanks a lot for the information; reshape did the job:

> datars <-reshape(data, timevar="TERRCODE", idvar="BID", direction="wide")

greetings
helli

BID TERRCODE ANMCODE
200310413120660 22  0
200310413120660 273 0
200310413120660 280 0
200310413120660 467 0
200310413120660 468 0
200310413127001 5   0
200310413127001 50  0
200310413127001 53  13
200310413127001 54  11
200310413127001 72  0
200310413127001 89  0
200310413127001 671 0
200310413225032 1   0
200310413225032 3   0
200310413225032 6   0
200310413225032 51  0
200310413225032 52  21
200310413225032 53  21
200310413225032 54  21
200310413225032 55  13
200310413225032 57  11
200310413225032 72  0


result:

BID ANMCODE.1   ANMCODE.2   ANMCODE.3   ANMCODE.4   
ANMCODE.5   ANMCODE.6   ANMCODE.7   
200310413120660 NA  NA  NA  NA  NA  NA  NA  NA  
NA  NA  NA  NA  
200310413127001 NA  NA  NA  NA  0   NA  NA  NA  
NA  NA  NA  NA  
200310413225032 0   NA  0   NA  NA  0   NA  NA  
NA  NA  NA  NA  
200310413225033 0   NA  0   NA  NA  0   NA  NA  
NA  NA  NA  NA  
200310413225072 0   NA  0   NA  NA  NA  NA  NA  
0   NA  NA  0   
200310413225073 0   NA  0   NA  NA  0   NA  NA  
NA  NA  0   NA  
200310413225074 0   NA  0   NA  NA  0   NA  NA  
NA  NA  NA  0
...
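A self-contained sketch of the same reshape() call on a few rows copied from the original semicolon-separated data. Keeping BID as character is an assumption made here because the codes are identifiers; the new columns appear in the order in which each TERRCODE first occurs.

```r
# A few rows from the posted data (BID kept as text to preserve all 15 digits)
long <- data.frame(
  BID      = c("200310413290002", "200310413290002", "200310413290003",
               "200310413290003", "200310413290005", "200310413290005"),
  TERRCODE = c(4, 80, 141, 472, 21, 273),
  ANMCODE  = c(0, 0, 21, 0, 21, 22),
  stringsAsFactors = FALSE
)

# One row per BID and one ANMCODE.<TERRCODE> column per distinct TERRCODE;
# BID/TERRCODE combinations that do not occur in the data become NA
wide <- reshape(long, timevar = "TERRCODE", idvar = "BID", direction = "wide")
wide
```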



Eric Lecoutre <[EMAIL PROTECTED]> wrote on 08.02.05 08:55:46:


Hi,

Have a look at:
? aggregate
? reshape

Eric



Eric Lecoutre
UCL /  Institut de Statistique
Voie du Roman Pays, 20
1348 Louvain-la-Neuve
Belgium

tel: (+32)(0)10473050
[EMAIL PROTECTED]
http://www.stat.ucl.ac.be/ISpersonnel/lecoutre

If the statistics are boring, then you've got the wrong numbers. -Edward Tufte


Re: [R] Data manipulation

2005-02-07 Thread Uwe Ligges
Helmut Kudrnovsky wrote:

Hi R-friends,
I have a large dataset with the following structure:
BID;TERRCODE;ANMCODE
200310413290002;4;0
200310413290002;80;0
200310413290002;2;0
200310413290002;5;0
200310413290003;3;0
200310413290003;1;0
200310413290003;11;0
200310413290003;26;0
200310413290003;141;21
200310413290003;472;0
200310413290004;3;0
200310413290004;1;0
200310413290004;7;0
200310413290004;18;0
200310413290004;51;0
200310413290004;56;0
200310413290004;57;0
200310413290004;76;0
200310413290004;89;0
200310413290004;97;0
200310413290004;98;0
200310413290004;72;0
200310413290004;456;0
200310413290004;141;0
200310413290004;640;0
200310413290004;201;0
200310413290004;764;20
200310413290005;273;22
200310413290005;456;0
200310413290005;22;0
200310413290005;23;0
200310413290005;21;21
200310413290005;141;0
200310413290005;640;0
200310413290005;201;0
200310413290005;43;0
200310413290005;650;0
200310413290005;472;0
200310413290006;456;0
200310413290006;22;25
200310413290006;23;25
200310413290006;21;25
200310413290006;640;0
200310413290006;201;0
200310413290006;43;0
200310413290006;651;1
.
.
.
BID is the code of my sample area.
TERRCODE is the code for a landscape characteristic, for example: 640 ... sun 
exposed, ...
ANMCODE is the value of the TERRCODE: for example, 0 means „occurring“, 1 means „often 
occurring“, ...
Now my question: is it possible to get a table with the following structure: 

BID (TERRCODE)4  (TERRCODE)21 ..
200310413290002 (ANMCODE)0  (ANMCODE)0 ...
200310413290003 0  0 ..
200310413290004 0  0 ..
200310413290005 0  21 ..
200310413290006 0 . 25 ..

Perhaps, if you explain to us how you derive those lines from the 
data above. I, at least, don't understand it.

Uwe Ligges

.
In this example, (TERRCODE) and (ANMCODE) are only for explanation and are not 
needed for further analysis.
greetings from the snowy Tyrol
helli
platform i386-pc-mingw32
arch i386 
os mingw32 
system i386, mingw32 
status 
major 2 
minor 0.0 
year 2004 
month 10 
day 04 
language R



[R] Data manipulation

2005-02-07 Thread Helmut Kudrnovsky

Hi R-friends,

I have a large dataset with the following structure:

BID;TERRCODE;ANMCODE
200310413290002;4;0
200310413290002;80;0
200310413290002;2;0
200310413290002;5;0
200310413290003;3;0
200310413290003;1;0
200310413290003;11;0
200310413290003;26;0
200310413290003;141;21
200310413290003;472;0
200310413290004;3;0
200310413290004;1;0
200310413290004;7;0
200310413290004;18;0
200310413290004;51;0
200310413290004;56;0
200310413290004;57;0
200310413290004;76;0
200310413290004;89;0
200310413290004;97;0
200310413290004;98;0
200310413290004;72;0
200310413290004;456;0
200310413290004;141;0
200310413290004;640;0
200310413290004;201;0
200310413290004;764;20
200310413290005;273;22
200310413290005;456;0
200310413290005;22;0
200310413290005;23;0
200310413290005;21;21
200310413290005;141;0
200310413290005;640;0
200310413290005;201;0
200310413290005;43;0
200310413290005;650;0
200310413290005;472;0
200310413290006;456;0
200310413290006;22;25
200310413290006;23;25
200310413290006;21;25
200310413290006;640;0
200310413290006;201;0
200310413290006;43;0
200310413290006;651;1
.
.
.

BID is the code of my sample area.
TERRCODE is the code for a landscape characteristic, for example: 640 ... sun 
exposed, ...
ANMCODE is the value of the TERRCODE: for example, 0 means „occurring“, 1 means 
„often occurring“, ...

Now my question: is it possible to get a table with the following structure: 


BID (TERRCODE)4  (TERRCODE)21 ..
200310413290002 (ANMCODE)0  (ANMCODE)0 ...
200310413290003 0  0 ..
200310413290004 0  0 ..
200310413290005 0  21 ..
200310413290006 0 . 25 ..
.
.

In this example, (TERRCODE) and (ANMCODE) are only for explanation and are not 
needed for further analysis.


greetings from the snowy Tyrol

helli

platform i386-pc-mingw32
arch i386 
os mingw32 
system i386, mingw32 
status 
major 2 
minor 0.0 
year 2004 
month 10 
day 04 
language R



[R] Data manipulation query

2004-08-03 Thread Vito Ricci
Hi,

see ?quantile to obtain the deciles of variable X4

see ?cut to divide the range of 'x' into intervals
and code the values in 'x' according to which
interval they fall in.

see ?table to use the cross-classifying factors to
build a contingency table of the counts at each
combination of factor levels.

Best
Vito
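Putting the three hints together, a minimal sketch on simulated data; the data frame dat and its values are made up, so substitute your own X1 and X4. split(), which is not mentioned above, collects the X1 values for each decile into a list.

```r
set.seed(1)
dat <- data.frame(X1 = 1:100, X2 = rnorm(100), X3 = rnorm(100), X4 = runif(100))

# Decile boundaries of X4: 11 break points from the minimum to the maximum
brks <- quantile(dat$X4, probs = seq(0, 1, by = 0.1))

# Code each row by the decile of X4 it falls in; include.lowest = TRUE
# puts the minimum of X4 into decile 1 instead of leaving it NA
decile <- cut(dat$X4, breaks = brks, include.lowest = TRUE,
              labels = paste("Decile", 1:10))

# The elements of X1 in each decile of X4: a list with one component per decile
x1.by.decile <- split(dat$X1, decile)
x1.by.decile[["Decile 1"]]

# Counts per decile
table(decile)
```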




[R] Data manipulation query

2004-08-02 Thread Manoj - Hachibushu Capital
Hi,
I'm not sure if I am making a simple problem complex, but here
goes:

I have a data frame with four columns, say X1, X2, X3 and X4. I
want to break X4 into deciles and, for each decile obtained, see
the corresponding elements of X1.

Ideally, the output should be in a tabular form, as shown
below:

Deciles 1 Deciles 2 Deciles 10

X1-1   X1-2  X1-99
X1-5   X1-3 
X1-10

where X1-1...X1-100 are the elements of column X1, categorized
by decile.

Any pointers to help get the right structure would be greatly
appreciated!!

TIA.

Manoj



Re: [R] data manipulation

2003-09-08 Thread Gabor Grothendieck
And here is a simplification I just noticed:

date.grouping <- function(d) {
  # for each date in d, calculate the date beginning the 6 month period which contains it
  POSIXct.dates <- as.POSIXct(paste(as.character(d),"01",sep="-"))
  breaks <- c(seq(from=min(POSIXct.dates), to=max(POSIXct.dates), by="6 mo"), Inf)
  format( as.POSIXct( cut( POSIXct.dates, breaks, include.lowest=TRUE )), "%Y-%m" )
}

patients <- read.table("clipboard",header=TRUE)
patients2 <- with( patients, tapply( cost, list(ID,date.grouping(date)), sum ) )
patients2 <- as.data.frame( patients2 )

summary(patients2)

boxplot(patients2)





Re: [R] data manipulation

2003-09-08 Thread Gabor Grothendieck
Sorry but there was an error in the seq statement.  Here it is again.


date.grouping <- function(d) {
  # for each date in d, calculate the date beginning the 6 month period which contains it
  mat <- matrix(as.numeric(unlist(strsplit(as.character(d),"-"))),nrow=2)
  f <- function(x) do.call( "ISOdate", as.list(x) )
  POSIXct.dates <- apply(rbind(mat,1),2,f) + ISOdate(1970,1,1)
  breaks <- c(seq(from=min(POSIXct.dates), to=max(POSIXct.dates), by="6 mo"), Inf)
  format( as.POSIXct( cut( POSIXct.dates, breaks, include.lowest=TRUE )), "%Y-%m" )
}

patients2 <- with( patients, tapply( cost, list(ID,date.grouping(date)), sum ) )
patients2 <- as.data.frame( patients2 )

summary(patients2)

boxplot(patients2)





Re: [R] data manipulation

2003-09-08 Thread Gabor Grothendieck

Try this. The function takes a vector of dates of the form yyyy-mm and produces a new 
character vector of dates of the same form, except that each 
output date is the beginning of the 6-month period in which the input date lies. The 
6-month intervals are measured from the minimum date.

date.grouping <- function(d) {
  # for each date in d, calculate the date beginning the 6 month period which contains it
  mat <- matrix(as.numeric(unlist(strsplit(as.character(d),"-"))),nrow=2)
  f <- function(x) do.call( "ISOdate", as.list(x) )
  POSIXct.dates <- apply(rbind(mat,1),2,f) + ISOdate(1970,1,1)
  breaks <- c(seq(from=min(POSIXct.dates), along=POSIXct.dates, by="6 mo"), Inf)
  format( as.POSIXct( cut( POSIXct.dates, breaks, include.lowest=TRUE )), "%Y-%m" )
}

patients2 <- with( patients, tapply( cost, list(ID,date.grouping(date)), sum ) )
patients2 <- as.data.frame( patients2 )

summary(patients2)

boxplot(patients2)



--- Ricardo Pietrobon <[EMAIL PROTECTED]> wrote:
>Hi,
>
>
>I am new to R, coming from a few years using Stata. I've been twisting my
>brain and checking several R and S references over the last few days to
>try to solve this data management problem: I have a data set with a unique
>patient identifier that is repeated along multiple rows, a variable with
>month of patient encounter, and a continuous variable for cost of
>individual encounters. The data looks like this:
>
>ID date        cost
>1  "2001-01"   200.00
>1  "2001-01"   123.94
>1  "2001-03"   100.23
>1  "2001-04"   150.34
>2  "2001-03"   296.34
>2  "2002-05"   156.36
>
>
>I would like to obtain the median costs and boxplots for the sum of
>encounters happening in the first six months after the index encounter
>(first patient encounter) for each patient, then the mean and median costs
>for the costs happening from 6 to 12 months after the index encounter, and
>so on. Notice that the first ID has two encounters during the index date,
>making it more difficult to define a single row with the index encounter.
>
>Any help would be appreciated,
>
>
>Ricardo
>
>
>Ricardo Pietrobon, MD
>Assistant Professor of Surgery
>Duke University Medical Center
>Durham, NC 27710 US
>


Re: [R] data manipulation

2003-09-07 Thread Peter Dalgaard BSA
Ricardo Pietrobon <[EMAIL PROTECTED]> writes:

> ID date       cost
> 1 "2001-01"   200.00
> 1 "2001-01"   123.94
> 1 "2001-03"   100.23
> 1 "2001-04"   150.34
> 2 "2001-03"   296.34
> 2 "2002-05"   156.36
> 
> 
> I would like to obtain the median costs and boxplots for the sum of
> encounters happening in the first six months after the index encounter
> (first patient encounter) for each patient, then the mean and median costs
> for the costs happening from 6 to 12 months after the index encounter, and
> so on. Notice that the first ID has two encounters during the index date,
> making it more difficult to define a single row with the index encounter.
> 
> Any help would be appreciated,

Let's see... You're going to need a bit of slight ugliness to convert
the date to a numeric month number. Something like (NB: That's a code
that means "I didn't actually try this"...)

attach(yourdata)
# as.character() in case date was read in as a factor
monthnum <- sapply(strsplit(as.character(date), "-"), function(x) sum(as.numeric(x) * c(12, 1)))

Then we need a table of the index dates for each person

tbl <- tapply(monthnum, ID, min)

Now subtract the index date from monthnum

months.post.index <- monthnum - tbl[as.character(ID)]  # look up by ID name, not position

then you probably want to look at the subset of your original data
frame and do the sums

total.cost.6mo <- with(subset(yourdata,months.post.index < 6), 
   tapply(cost,ID,sum))

and finally

boxplot(total.cost.6mo)
median(total.cost.6mo)
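The steps above were posted untested, so here is a minimal self-contained version of them, run on the sample rows from the question. It makes a few assumptions and changes:

- `yourdata` has columns `ID`, `date` ("YYYY-MM" strings) and `cost`.
- It avoids attach().
- It looks up each patient's index month by name, so it does not depend on the IDs being 1, 2, 3, ...

```r
yourdata <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2),
  date = c("2001-01", "2001-01", "2001-03", "2001-04", "2001-03", "2002-05"),
  cost = c(200.00, 123.94, 100.23, 150.34, 296.34, 156.36),
  stringsAsFactors = FALSE
)

# "YYYY-MM" -> running month number (year * 12 + month)
monthnum <- sapply(strsplit(as.character(yourdata$date), "-"),
                   function(x) sum(as.numeric(x) * c(12, 1)))

# index (first) month for each patient, named by ID
tbl <- tapply(monthnum, yourdata$ID, min)

# months elapsed since each patient's index encounter;
# indexing by name keeps non-consecutive IDs correct
months.post.index <- monthnum - tbl[as.character(yourdata$ID)]

# total cost per patient in months 0-5 after the index encounter
total.cost.6mo <- with(subset(yourdata, months.post.index < 6),
                       tapply(cost, ID, sum))

median(total.cost.6mo)
# boxplot(total.cost.6mo)  # one box over the per-patient totals
```

With those rows the 6-month totals are 574.51 for ID 1 and 296.34 for ID 2, so the median is 435.425. The 2002-05 encounter falls 14 months after the index and is excluded.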

(You could elaborate by converting months.post.index with cut() and
use lapply(names(period),.) to give you a list of tables, which
boxplot() might actually know how to plot directly.)
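The cut() elaboration might look roughly like the sketch below, under the same column assumptions. It works in four steps:

1. ave() gives each row its patient's index month.
2. cut() labels [0,6), [6,12), ... months after that index.
3. tapply() sums cost per patient and period.
4. The columns become a list that boxplot() accepts.

The 24-month upper limit and the period labels are arbitrary choices for illustration.

```r
yourdata <- data.frame(
  ID   = c(1, 1, 1, 1, 2, 2),
  date = c("2001-01", "2001-01", "2001-03", "2001-04", "2001-03", "2002-05"),
  cost = c(200.00, 123.94, 100.23, 150.34, 296.34, 156.36),
  stringsAsFactors = FALSE
)

# "YYYY-MM" -> running month number (year * 12 + month)
monthnum <- sapply(strsplit(as.character(yourdata$date), "-"),
                   function(x) sum(as.numeric(x) * c(12, 1)))

# months since each patient's index (earliest) encounter;
# ave() repeats the per-ID minimum on every row of that ID
months.post.index <- monthnum - ave(monthnum, yourdata$ID, FUN = min)

# label periods [0,6), [6,12), [12,18), [18,24) months after the index
period <- cut(months.post.index, breaks = seq(0, 24, by = 6), right = FALSE,
              labels = c("0-6", "6-12", "12-18", "18-24"))

# patients x periods matrix of summed cost (NA where a patient has no encounters)
tot <- tapply(yourdata$cost, list(yourdata$ID, period), sum)

# one vector of per-patient totals per period, with the NA cells dropped
per.period <- lapply(as.data.frame(tot), function(v) v[!is.na(v)])

sapply(per.period, median)   # NA for periods with no data
# boxplot(per.period)        # one box per period
```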
-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] data manipulation

2003-09-07 Thread Ricardo Pietrobon
Hi,


I am new to R, coming from a few years using Stata. I've been twisting my
brain and checking several R and S references over the last few days to
try to solve this data management problem: I have a data set with a unique
patient identifier that is repeated along multiple rows, a variable with
month of patient encounter, and a continuous variable for cost of
individual encounters. The data looks like this:

ID  date        cost
1   "2001-01"   200.00
1   "2001-01"   123.94
1   "2001-03"   100.23
1   "2001-04"   150.34
2   "2001-03"   296.34
2   "2002-05"   156.36


I would like to obtain the median costs and boxplots for the sum of
encounters happening in the first six months after the index encounter
(first patient encounter) for each patient, then the mean and median costs
for the costs happening from 6 to 12 months after the index encounter, and
so on. Notice that the first ID has two encounters during the index date,
making it more difficult to define a single row with the index encounter.

Any help would be appreciated,


Ricardo


Ricardo Pietrobon, MD
Assistant Professor of Surgery
Duke University Medical Center
Durham, NC 27710 US

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] data manipulation: getting mean value every 5 rows

2003-07-28 Thread Federico Calboli
Dear All, 

thanks for exceptional and speedy help. In particular, thanks to J. R.
Lockwood, Sue Paul, Spencer Graves, Dennis J. Murphy and Tony Plate. 

regards,

Federico Calboli

=

Federico C.F. Calboli

Department of Biology
University College London
Room 327
Darwin Building
Gower Street
London
WC1E 6BT

Tel: (+44) 020 7679 4395 
Fax (+44) 020 7679 7096
[EMAIL PROTECTED]

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] data manipulation: getting mean value every 5 rows

2003-07-28 Thread Tony Plate
> x <- read.table(file("clipboard"), header=T)
> # add an extra field to define groups of 5 sequential rows
> x[,"code"] <- rep(seq(len=nrow(x)/5), each=5)
> x
   temp line cage   number code
1    18   18    1 6678.630    1
2    18   18    1 7774.458    1
3    18   18    1 7845.902    1
4    18   18    1 9483.578    1
5    18   18    1 8983.555    1
6    18   18    1 9181.052    2
7    18   18    1 9458.696    2
8    18   18    1 8138.616    2
9    18   18    1 7981.994    2
10   18   18    1 7556.491    2
11   18   18    1 7672.137    3
12   18   18    1 6607.776    3
13   18   18    1 8383.650    3
14   18   18    1 7129.852    3
15   18   18    1 8536.667    3
16   18   18    2 8287.800    4
17   18   18    2 7924.470    4
18   18   18    2 7928.474    4
19   18   18    2 7363.157    4
20   18   18    2 7952.593    4
> aggregate(x[,"number",drop=F], x[,c("temp", "line", "cage", "code")], mean)
  temp line cage code   number
1   18   18    1    1 8153.225
2   18   18    1    2 8463.370
3   18   18    1    3 7666.016
4   18   18    2    4 7891.299
> # result has an additional column named "code" -- easily eliminated
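file("clipboard") is not available on every platform. A reproducible sketch of the same approach can instead type the data in directly, using the values from the question below.

Here `code` is computed as (row - 1) %/% 5 + 1. Unlike rep(seq(len = nrow(x)/5), each = 5), this still gives exactly one label per row when nrow(x) is not a multiple of 5; the last group is simply smaller.

```r
x <- data.frame(
  temp = 18, line = 18,
  cage = rep(c(1, 2), c(15, 5)),
  number = c(6678.63, 7774.458, 7845.902, 9483.578, 8983.555,
             9181.052, 9458.696, 8138.616, 7981.994, 7556.491,
             7672.137, 6607.776, 8383.65, 7129.852, 8536.667,
             8287.8, 7924.47, 7928.474, 7363.157, 7952.593)
)

# group id for each block of 5 consecutive rows
x$code <- (seq_len(nrow(x)) - 1) %/% 5 + 1

# mean of "number" within each block, keeping temp/line/cage
res <- aggregate(x["number"], x[c("temp", "line", "cage", "code")], mean)
res <- res[order(res$code), ]
res$code <- NULL   # drop the helper column
res
```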
At Monday 10:47 PM 7/28/2003 +0100, you wrote:
Dear All,

I would like to ask you how to accomplish a little tricky data
manipulation. I have a large dataset, looking something like:
temp line cage number
18  18  1   6678.63
18  18  1   7774.458
18  18  1   7845.902
18  18  1   9483.578
18  18  1   8983.555
18  18  1   9181.052
18  18  1   9458.696
18  18  1   8138.616
18  18  1   7981.994
18  18  1   7556.491
18  18  1   7672.137
18  18  1   6607.776
18  18  1   8383.65
18  18  1   7129.852
18  18  1   8536.667
18  18  2   8287.8
18  18  2   7924.47
18  18  2   7928.474
18  18  2   7363.157
18  18  2   7952.593
.
I would like to create a dataframe where I get the mean values, 5 rows at a
time, of column "number", while keeping the value in the other columns
fixed to the values found in the first of the 5 rows (or whatever, it's the
same for the 5 rows) so that the above would be "shrunk" to:
temp line cage number
18  18  1   8153.2246
18  18  1   8463.3698
18  18  1   7666.0164
18  18  2   7891.2988
Any hints?

Regards,

Federico Calboli

=

Federico C.F. Calboli

Department of Biology
University College London
Room 327
Darwin Building
Gower Street
London
WC1E 6BT
Tel: (+44) 020 7679 4395
Fax (+44) 020 7679 7096
[EMAIL PROTECTED]
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Tony Plate   [EMAIL PROTECTED]

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] data manipulation: getting mean value every 5 rows

2003-07-28 Thread Spencer Graves
Have you considered "aggregate" [documented in help(aggregate) or 
"www.r-project.org" -> search -> "R site search" or Venables and Ripley, 
Modern Applied Statistics with S]?

hope this helps.  spencer graves

Federico Calboli wrote:
Dear All,

I would like to ask you how to accomplish a little tricky data
manipulation. I have a large dataset, looking something like:
temp line cage number
18  18  1   6678.63
18  18  1   7774.458
18  18  1   7845.902
18  18  1   9483.578
18  18  1   8983.555
18  18  1   9181.052
18  18  1   9458.696
18  18  1   8138.616
18  18  1   7981.994
18  18  1   7556.491
18  18  1   7672.137
18  18  1   6607.776
18  18  1   8383.65
18  18  1   7129.852
18  18  1   8536.667
18  18  2   8287.8
18  18  2   7924.47
18  18  2   7928.474
18  18  2   7363.157
18  18  2   7952.593
.
I would like to create a dataframe where I get the mean values, 5 rows at a
time, of column "number", while keeping the value in the other columns
fixed to the values found in the first of the 5 rows (or whatever, it's the
same for the 5 rows) so that the above would be "shrunk" to:
temp	line	cage	number	
18	18	1	8153.2246
18	18	1	8463.3698
18	18	1	7666.0164
18	18	2	7891.2988
 
Any hints?

Regards,

Federico Calboli

=

Federico C.F. Calboli

Department of Biology
University College London
Room 327
Darwin Building
Gower Street
London
WC1E 6BT
Tel: (+44) 020 7679 4395 
Fax (+44) 020 7679 7096
[EMAIL PROTECTED]

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


[R] data manipulation: getting mean value every 5 rows

2003-07-28 Thread Federico Calboli
Dear All,

I would like to ask you how to accomplish a little tricky data
manipulation. I have a large dataset, looking something like:

temp line cage number
18  18  1   6678.63
18  18  1   7774.458
18  18  1   7845.902
18  18  1   9483.578
18  18  1   8983.555
18  18  1   9181.052
18  18  1   9458.696
18  18  1   8138.616
18  18  1   7981.994
18  18  1   7556.491
18  18  1   7672.137
18  18  1   6607.776
18  18  1   8383.65
18  18  1   7129.852
18  18  1   8536.667
18  18  2   8287.8
18  18  2   7924.47
18  18  2   7928.474
18  18  2   7363.157
18  18  2   7952.593
.

I would like to create a dataframe where I get the mean values, 5 rows at a
time, of column "number", while keeping the value in the other columns
fixed to the values found in the first of the 5 rows (or whatever, it's the
same for the 5 rows) so that the above would be "shrunk" to:

temp line cage number
18  18  1   8153.2246
18  18  1   8463.3698
18  18  1   7666.0164
18  18  2   7891.2988
 
Any hints?
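If every block really is exactly 5 rows, another option is to fold the column into a 5-row matrix, take colMeans(), and copy the other columns from the first row of each block. This is not from the thread, just a sketch. It assumes nrow(x) is a multiple of 5 and that temp, line and cage are constant within each block.

```r
x <- data.frame(
  temp = 18, line = 18,
  cage = rep(c(1, 2), c(15, 5)),
  number = c(6678.63, 7774.458, 7845.902, 9483.578, 8983.555,
             9181.052, 9458.696, 8138.616, 7981.994, 7556.491,
             7672.137, 6607.776, 8383.65, 7129.852, 8536.667,
             8287.8, 7924.47, 7928.474, 7363.157, 7952.593)
)

stopifnot(nrow(x) %% 5 == 0)          # the matrix trick needs whole blocks
first <- seq(1, nrow(x), by = 5)      # first row of each block of 5

shrunk <- x[first, c("temp", "line", "cage")]
# matrix() fills column by column, so column j holds rows 5*(j-1)+1 .. 5*j
shrunk$number <- colMeans(matrix(x$number, nrow = 5))
rownames(shrunk) <- NULL
shrunk
```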

Regards,

Federico Calboli

=

Federico C.F. Calboli

Department of Biology
University College London
Room 327
Darwin Building
Gower Street
London
WC1E 6BT

Tel: (+44) 020 7679 4395 
Fax (+44) 020 7679 7096
[EMAIL PROTECTED]

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] data manipulation function descriptions

2003-02-14 Thread Luke Tierney
On Fri, 14 Feb 2003 [EMAIL PROTECTED] wrote:

> On Thu, 13 Feb 2003, kjetil brinchmann halvorsen wrote:
> 
> > On 13 Feb 2003 at 17:09, Jason Bond wrote:
> 
> > > case  switch
> > [R-core : switch should be better announced. It is for instance not
> > mentioned in "An introduction to R"]
> 
> Well, that is an *introduction*, not a programmer's guide.  You will find
> switch() is rarely used in R: it is a bit peculiar in its semantics, and 
> something definitely not to be considered introductory.
> 
> On the original question, I think it would be a mistake to translate what
> you know.  R is a vector language, not a pairlist language, and I see
> quite a bit of evidence of convoluted solutions in its internals dating
> from when R was the second.  Chapter 2 of Venables & Ripley (2002) (as in
> the R FAQ) is devoted to using S/R for data manipulation.

As someone reasonably familiar with both languages I have to disagree
with several points here.  First and foremost, despite differences in
surface syntax, as languages xlispstat and R are much more alike than
they are different.  xlispstat is much closer to R than S-plus is, because
both xlispstat and R use lexical scope, a feature of R that is still
not used as much as it could be.  The main language differences are
the limited form of lazy evaluation used in R, which you can usually
ignore, and the fact that R does not provide mutable data structures,
which is also rarely an issue.  There are other differences, but these
are the main ones that affect coding practices I think.

The basic xlispstat data handling functions mentioned in the original
post are quite similar to corresponding basic functions in R.  This is
not by accident: the choice of functions included in xlispstat was
heavily influenced by what was then called the "New S" language.  As a
result, if you want to create an R version of an xlispstat function
you can often do far worse than start with a fairly direct
transliteration.  In my view at least, good coding practices in
xlispstat are good coding practices for any high level mostly
functional language and carry over quite well to R.

I am sorry if the following seems a bit harsh, but I, and many others
who have worked with lisp, find it extremely frustrating to read
statements about lisp like the one above that suggest that lisp is a
pairlist language only, especially when these statements come from
people I thought knew better.  Lisp dates back to the 1950's.  The
only other language of any consequence still in use from that era is
FORTRAN.  No one would now claim that a major flaw in FORTRAN is the
lack of an if-then-else construct.  That was true in the early days
but has not been for several decades.  But for some reason many people
seem very happy to very authoritatively make statements about lisp
that, if they were ever true at all, have not been so for a very long
time indeed.  Pairlists are a very useful data structure for
expressing many algorithms in a functional style.  That is why they
were one of the first data structures in Lisp, and that is why they
are available in virtually all other high level functional languages
(ML, Haskell, Miranda, Clean, ...).  Pairlists are NOT the only data
structure in Lisp.  For many years Lisp has also supported vectors and
arrays, both generic and typed (and other data structures).  Vectors
and pairlists are collectively referred to as sequences, and, if I
remember correctly, all the functions listed in the original post
except mapcar are designed to work on all kinds of sequences (the
sequence version of mapcar is map).  Code written in xlispstat in
terms of sequence functions can often be translated quite easily to R,
and the resulting code will be quite consistent with good R coding
practices.

R does not provide a pairlist data structure. This creates a dilemma
when translating some list-based xlispstat code, or, more importantly,
when implementing an algorithm for which pairlists are the natural
data structure to use.  There are two choices: use a vector based
algorithm that may be a bit less natural but fits better with the
basic R data structures, or build your own pairlist abstraction for
this particular problem and write the algorithm the more natural way.
I have used both approaches on different occasions.  I usually prefer
to write an algorithm in the most natural way for the algorithm, since
that usually maximizes the probability that my code is actually
correct.  If this approach requires some additional abstract data
types, be they pairlists or anything else, then I develop and test
them separately and write the main code in terms of these
abstractions.  Occasionally, but not all that often, this results in
code th

Re: [R] data manipulation function descriptions

2003-02-13 Thread ripley
On Thu, 13 Feb 2003, kjetil brinchmann halvorsen wrote:

> On 13 Feb 2003 at 17:09, Jason Bond wrote:

> > case  switch
> [R-core : switch should be better announced. It is for instance not
> mentioned in "An introduction to R"]

Well, that is an *introduction*, not a programmer's guide.  You will find
switch() is rarely used in R: it is a bit peculiar in its semantics, and 
something definitely not to be considered introductory.

On the original question, I think it would be a mistake to translate what
you know.  R is a vector language, not a pairlist language, and I see
quite a bit of evidence of convoluted solutions in its internals dating
from when R was the second.  Chapter 2 of Venables & Ripley (2002) (as in
the R FAQ) is devoted to using S/R for data manipulation.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
[EMAIL PROTECTED] mailing list
http://www.stat.math.ethz.ch/mailman/listinfo/r-help



Re: [R] data manipulation function descriptions

2003-02-13 Thread kjetil brinchmann halvorsen
On 13 Feb 2003 at 17:09, Jason Bond wrote:

As a lisp-stat user, I tried to compile a short dictionary within your
answer below:

> Hello,
> 
>I'm a recovering xlispstat user, and am trying to become a good R 
> user.  I've looked around on the CRAN doc website and have found quite a 
> few sets of documentation with various levels of data manipulation function
> descriptions (of what I've seen, most relatively low levels), and many with
> examples of R's use in statistical analyses.  Although I don't expect to get
> my wish, ideally, it would be nice to have some sort of data manipulation 
> function guide for programmers.  I guess I'm somewhat of a different case, 
> as I know which functions that I want to use...I just don't know their 
> names...for example, all those great xlispstat functions like:
> 
> remove-duplicates         more or less unique()
> sort-data                 more or less sort()
> combine                   c()
> remove
x <- c(1, 2, 3, 5, 7, 9, 12, 15, 18, 22)
x[x != 15]   # x[-which(x == 15)] would return an empty vector if no element were 15
> reverse                   rev
> butlast
n <- length(x)
x[-n]
> first                     x[1] or for a list x[[1]]
> case                      switch
[R-core : switch should be better announced. It is for instance not
mentioned in "An introduction to R"]
> which                     which
> mapcar                    apply, lapply, sapply
> map-elements              nothing better than the ones above
> all the string functions  paste, strwidth, strwrap, substr, toString
> and many many more,
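The R side of this dictionary can be tried directly, as in the small sketch below. The vectors are made up for illustration. "remove" uses x[x != 15] because x[-which(x == 15)] silently returns an empty vector when no element matches.

```r
x <- c(1, 2, 3, 5, 7, 9, 12, 15, 18, 22)

dups    <- unique(c(3, 1, 3, 2, 1))       # remove-duplicates
sorted  <- sort(c(3, 1, 2))               # sort-data
both    <- c(1:3, 7:8)                    # combine
removed <- x[x != 15]                     # remove (all elements equal to 15)
pitfall <- x[-which(x == 99)]             # nothing matches -> numeric(0), not x!
revd    <- rev(1:4)                       # reverse
allbut  <- head(x, -1)                    # butlast, same as x[-length(x)]
firstel <- x[1]                           # first (x[[1]] for a list)

# case: switch() picks a branch by name; the unnamed last argument is the default
kind <- function(s) switch(s, a = "apple", b = "banana", "other")

pos     <- which(c(1, 2, 4, 3, 6) == 4)   # which
squares <- sapply(1:4, function(i) i^2)   # mapcar
words   <- paste("data", "manipulation")  # one of the string functions
```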

Kjetil Halvorsen

> 
> descriptions of a few of which are spread out in various documents.  Part 
> of my problem is clinging to that which I know.  Anyway, any general advice 
> would be greatly appreciated.
> 
>Jason
> 
> At 03:57 PM 2/13/03 -0600, you wrote:
> 
> >?which
> >
> >On Thursday 13 February 2003 03:40 pm, Jason Bond wrote:
> > > Hello.  Sorry for the elementary post.  I've looked through the
> > > documentation, but can't seem to find a function which allows one to
> > > extract the position of an element within a list...for example the position
> > > of the element 4 in the vector c(1,2,4,3,6) is 3.  Thanks much for any
> > > help.
> > >
> > >Jason
> > >
> > > __
> > > [EMAIL PROTECTED] mailing list
> > > http://www.stat.math.ethz.ch/mailman/listinfo/r-help
> 
> __
> [EMAIL PROTECTED] mailing list
> http://www.stat.math.ethz.ch/mailman/listinfo/r-help

__
[EMAIL PROTECTED] mailing list
http://www.stat.math.ethz.ch/mailman/listinfo/r-help



[R] data manipulation function descriptions

2003-02-13 Thread Jason Bond
Hello,

  I'm a recovering xlispstat user, and am trying to become a good R 
user.  I've looked around on the CRAN doc website and have found quite a 
few sets of documentation with various levels of data manipulation function
descriptions (of what I've seen, most relatively low levels), and many with
examples of R's use in statistical analyses.  Although I don't expect to get
my wish, ideally, it would be nice to have some sort of data manipulation 
function guide for programmers.  I guess I'm somewhat of a different case, 
as I know which functions that I want to use...I just don't know their 
names...for example, all those great xlispstat functions like:

remove-duplicates
sort-data
combine
remove
reverse
butlast
first
case
which
mapcar
map-elements
all the string functions
and many many more,

descriptions of a few of which are spread out in various documents.  Part 
of my problem is clinging to that which I know.  Anyway, any general advice 
would be greatly appreciated.

  Jason

At 03:57 PM 2/13/03 -0600, you wrote:

?which

On Thursday 13 February 2003 03:40 pm, Jason Bond wrote:
> Hello.  Sorry for the elementary post.  I've looked through the
> documentation, but can't seem to find a function which allows one to
> extract the position of an element within a list...for example the position
> of the element 4 in the vector c(1,2,4,3,6) is 3.  Thanks much for any
> help.
>
>Jason
>
> __
> [EMAIL PROTECTED] mailing list
> http://www.stat.math.ethz.ch/mailman/listinfo/r-help
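For the position question, which() and match() behave differently:

- which() returns every matching index.
- match() returns only the first match, or NA if there is none.

A small sketch:

```r
v <- c(1, 2, 4, 3, 6)

which(v == 4)            # all positions where v equals 4
match(4, v)              # first position of 4 in v
match(99, v)             # NA: 99 does not occur in v
which(c(4, 1, 4) == 4)   # several matches -> several positions
```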


__
[EMAIL PROTECTED] mailing list
http://www.stat.math.ethz.ch/mailman/listinfo/r-help



Re: [R] Data manipulation

2003-02-07 Thread John Fox
Dear Lew,

You could use the subset argument to lm:

knap.fit1 <- lm(Kweed ~ TREAT, data=knap, 
subset=c(41:60,81:100,101:120,121:140))

(You could alternatively subscript both Kweed and TREAT, rather than just 
TREAT, but this is unnecessarily complicated; as well, you'd need to use 
c() within the subscript, as in Kweed[c(41:60,81:100,101:120,121:140)].)
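A quick check that the subset argument gives the same fit as subsetting the data frame first. It uses a made-up stand-in for `knap`, since the real data are not in the post. The 7-level TREAT factor and the random Kweed values are assumptions for illustration only.

```r
set.seed(1)
# hypothetical stand-in for knap: 140 rows, 20 per treatment level
knap <- data.frame(TREAT = factor(rep(1:7, each = 20)),
                   Kweed = round(runif(140, 0, 100), 2))

rows <- c(41:60, 81:100, 101:120, 121:140)

fit.a <- lm(Kweed ~ TREAT, data = knap, subset = rows)   # subset argument
fit.b <- lm(Kweed ~ TREAT, data = knap[rows, ])          # subset first

all.equal(coef(fit.a), coef(fit.b))
```

Note that 81:100, 101:120 and 121:140 together are simply 81:140, so c(41:60, 81:140) selects the same rows.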

John

At 03:36 PM 2/7/2003 -0700, Lew wrote:
I am interested in building a model with a subset of data from a column.

The first 6 lines of my data look like this:
  QUAD YEAR SITE TREAT HERB TILL PLANT SEED Kweed
1   A4 2002    s     1    N    N     N    N 55.00
2  A10 2002    s     1    N    N     N    N 60.00
3   B2 2002    s     1    N    N     N    N 35.00
4   C2 2002    s     1    N    N     N    N 23.00
5   C9 2002    s     1    N    N     N    N 70.00
6   11 2002    m     1    N    N     N    N 22.00

I tried this command to get the subset I want:

> knap.fit1<-(lm(Kweed~TREAT[41:60,81:100,101:120,121:140], data=knap))
No luck.

Can anyone tell me how to code for this subset.


-
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: [EMAIL PROTECTED]
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox
-

__
[EMAIL PROTECTED] mailing list
http://www.stat.math.ethz.ch/mailman/listinfo/r-help



Re: [R] Data manipulation

2003-02-07 Thread Roger Peng
You might want to try subsetting the data frame first, and then fit the
model.  Something like

knap.sub <- knap[c(41:60,81:100,101:120,121:140), ]
knap.fit1 <- lm(Kweed ~ TREAT, data = knap.sub)

might work for you.

-roger
___
UCLA Department of Statistics
[EMAIL PROTECTED]
http://www.stat.ucla.edu/~rpeng

On Fri, 7 Feb 2003, Lew wrote:

> I am interested in building a model with a subset of data from a column.
> 
> The first 6 lines of my data look like this:
>   QUAD YEAR SITE TREAT HERB TILL PLANT SEED Kweed
> 1   A4 2002    s     1    N    N     N    N 55.00
> 2  A10 2002    s     1    N    N     N    N 60.00
> 3   B2 2002    s     1    N    N     N    N 35.00
> 4   C2 2002    s     1    N    N     N    N 23.00
> 5   C9 2002    s     1    N    N     N    N 70.00
> 6   11 2002    m     1    N    N     N    N 22.00
> 
> I tried this command to get the subset I want:
> 
> > knap.fit1<-(lm(Kweed~TREAT[41:60,81:100,101:120,121:140], data=knap))
> No luck.
>  
> Can anyone tell me how to code for this subset.
> 
> Thanks
> 
> Lew Stringer
> M.S. Student- Land Rehabilitation
> Dept. of Land Resources and Environmental Sciences 
> Montana State University
> 822 Leon Johnson Hall
> Bozeman, MT 59717
> Lab:(406)994-6811
> Fax:(406)994-3933
> 
> __
> [EMAIL PROTECTED] mailing list
> http://www.stat.math.ethz.ch/mailman/listinfo/r-help
>

__
[EMAIL PROTECTED] mailing list
http://www.stat.math.ethz.ch/mailman/listinfo/r-help



[R] Data manipulation

2003-02-07 Thread Lew
I am interested in building a model with a subset of data from a column.

The first 6 lines of my data look like this:
  QUAD YEAR SITE TREAT HERB TILL PLANT SEED Kweed
1   A4 2002    s     1    N    N     N    N 55.00
2  A10 2002    s     1    N    N     N    N 60.00
3   B2 2002    s     1    N    N     N    N 35.00
4   C2 2002    s     1    N    N     N    N 23.00
5   C9 2002    s     1    N    N     N    N 70.00
6   11 2002    m     1    N    N     N    N 22.00

I tried this command to get the subset I want:

> knap.fit1<-(lm(Kweed~TREAT[41:60,81:100,101:120,121:140], data=knap))
No luck.
 
Can anyone tell me how to code for this subset.

Thanks

Lew Stringer
M.S. Student- Land Rehabilitation
Dept. of Land Resources and Environmental Sciences 
Montana State University
822 Leon Johnson Hall
Bozeman, MT 59717
Lab:(406)994-6811
Fax:(406)994-3933

__
[EMAIL PROTECTED] mailing list
http://www.stat.math.ethz.ch/mailman/listinfo/r-help