[R] Simple programming question

2007-05-18 Thread Lauri Nikkinen
Hi R-users,

I have a simple question for heavy R users. Suppose I have a data frame like this:


dfr <- data.frame(id=1:16, categ=rep(LETTERS[1:4], 4),
var3=c(8,7,6,6,5,4,5,4,3,4,3,2,3,2,1,1))
dfr <- dfr[order(dfr$categ),]

and I want to score the values of the variable var3 using this kind of
logic:

1. the highest value of var3 within a category (variable categ) -> "high"
2. the second highest value -> "mid"
3. lowest value -> "low"

This would be the output of this reasoning:

dfr$score <-
factor(c("high","mid","low","low","high","mid","mid","low","high","mid","low","low","high","mid","low","low"))
dfr

The question is how to do this programmatically in R (e.g. if I have 2000
rows in my dfr)?

I appreciate your help!

Cheers,
Lauri

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simple programming question

2007-05-18 Thread Adaikalavan Ramasamy
According to your post you are assuming that there are only 3 unique
values for var3 within each category. But categories C and D each have
4 unique values for var3.

split(dfr, dfr$categ)
...
$C
   id categ var3 score
3   3     C    6  high
7   7     C    5   mid
11 11     C    3   low
15 15     C    1   low
...
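
The number of distinct var3 values per category can also be checked directly. A minimal sketch, assuming the example data from the original post; the name n_unique is only illustrative:

```r
dfr <- data.frame(id = 1:16, categ = rep(LETTERS[1:4], 4),
                  var3 = c(8,7,6,6,5,4,5,4,3,4,3,2,3,2,1,1))

# Count the distinct var3 values within each category
n_unique <- tapply(dfr$var3, dfr$categ, function(x) length(unique(x)))
print(n_unique)
```

For this data the counts are 3 for A and B and 4 for C and D, which is why "lowest value" is ambiguous for the last two categories.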

If you meant something different, then just change myfun() below.


  gmax <- function(x, rnk=1){
   ## generalized maximum, with rnk=1 being the biggest value (i.e. max)
   return( sort( unique(x), decreasing=TRUE )[rnk] )
  }

  myfun <- function(x){ ifelse( x==gmax(x,1), "high",
                        ifelse( x==gmax(x,2), "med", "low" ) ) }

  out   <- lapply( split(dfr$var3, dfr$categ), myfun )

  data.frame( dfr, my.score = unsplit(out, dfr$categ) )

Regards, Adai



Re: [R] Simple programming question

2007-05-18 Thread Gabor Grothendieck
Try this.  f assigns 3 to the highest value within a category, 2 to the
second highest and 1 to the rest.  ave applies f to each category.  Finally
we convert the result to a factor.

f <- function(x) 4 - pmin(3, match(x, sort(x, decreasing = TRUE)))
factor(ave(dfr$var3, dfr$categ, FUN = f), labels = c("low", "mid", "high"))



Re: [R] Simple programming question

2007-05-18 Thread Dimitris Rizopoulos
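Try this:

dfr <- data.frame(id=1:16, categ=rep(LETTERS[1:4], 4),
var3=c(8,7,6,6,5,4,5,4,3,4,3,2,3,2,1,1))
dfr <- dfr[order(dfr$categ), ]

# relies on dfr being sorted by categ: tapply returns the groups in level order
dfr$score <- unlist(tapply(dfr$var3, dfr$categ, function (x) {
    sn <- sort(unique(x), decreasing = TRUE)
    # max() guards against categories with fewer than two distinct values
    labs <- c("high", "mid", rep("low", max(length(sn) - 2, 0)))
    labs[match(x, sn)]
}))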
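
The unlist(tapply(...)) assignment lines up with the rows only because dfr was sorted by categ first. If the rows have to stay in their original order, one possible variant is to split, label and unsplit, as sketched below. The helper name label_group is hypothetical and not part of the original post.

```r
# Example data from the original post, deliberately left unsorted
dfr <- data.frame(id = 1:16, categ = rep(LETTERS[1:4], 4),
                  var3 = c(8,7,6,6,5,4,5,4,3,4,3,2,3,2,1,1))

# Label the values of one category: highest, second highest, everything else
label_group <- function(x) {
  sn <- sort(unique(x), decreasing = TRUE)
  labs <- c("high", "mid", rep("low", max(length(sn) - 2, 0)))
  labs[match(x, sn)]
}

# unsplit() puts each group's labels back at the rows they came from
dfr$score <- unsplit(lapply(split(dfr$var3, dfr$categ), label_group),
                     dfr$categ)
print(dfr)
```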


I hope it helps.

Best,
Dimitris


Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
 http://www.student.kuleuven.be/~m0390867/dimitris.htm


Re: [R] Simple programming question

2007-05-18 Thread Gabor Grothendieck
There was a problem in the first line in the case that the highest number
is not unique within a category.  In this example it's not apparent since
that never occurs.  At any rate, it should be:

f <- function(x) 4 - pmin(3, match(x, sort(unique(x), decreasing = TRUE)))
factor(ave(dfr$var3, dfr$categ, FUN = f), labels = c("low", "mid", "high"))

Also note that the factor labels were arranged so that
low, mid and high correspond to levels 1, 2 and 3
respectively.
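
The difference between the two versions shows up on a small vector with a tied maximum. This is only an illustrative sketch: the names f_sorted and f_unique and the test vector x are made up for the comparison.

```r
# First version: positions are looked up in the sorted values, ties included
f_sorted <- function(x) 4 - pmin(3, match(x, sort(x, decreasing = TRUE)))

# Corrected version: positions are looked up in the sorted distinct values
f_unique <- function(x) 4 - pmin(3, match(x, sort(unique(x), decreasing = TRUE)))

x <- c(7, 7, 4, 2)   # hypothetical category with a tied highest value

print(f_sorted(x))   # 3 3 1 1: the tie pushes 4 down to "low"
print(f_unique(x))   # 3 3 2 1: 4 is correctly scored as "mid"
```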


Re: [R] Simple programming question

2007-05-18 Thread Gabor Grothendieck
The solution already calculates it as numeric and only after that
does it convert it to factor so just omit the conversion:

f <- function(x) 4 - pmin(3, match(x, sort(unique(x), decreasing = TRUE)))
score <- ave(dfr$var3, dfr$categ, FUN = f)

As mentioned, this assigns 1 to low (everything other than the highest
two numbers in a category), 2 to the second highest and 3 to the highest.

If you want some other assignment, e.g. 3 is low, 1 is mid and 0 is high
then try:

c(3, 1, 0)[score]
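
The same indexing idea works for any recoding, because score (1 = low, 2 = mid, 3 = high) simply picks a position in a lookup vector. As a sketch, the mapping from the follow-up question (high = 3, mid = 1, low = 0) would put the values in low, mid, high order. The names points and dfr$points are assumptions made for this example.

```r
dfr <- data.frame(id = 1:16, categ = rep(LETTERS[1:4], 4),
                  var3 = c(8,7,6,6,5,4,5,4,3,4,3,2,3,2,1,1))

f <- function(x) 4 - pmin(3, match(x, sort(unique(x), decreasing = TRUE)))
score <- ave(dfr$var3, dfr$categ, FUN = f)   # 1 = low, 2 = mid, 3 = high

# Lookup vector: position 1 holds the value for low, 2 for mid, 3 for high
points <- c(low = 0, mid = 1, high = 3)
dfr$points <- unname(points[score])
print(dfr)
```

Changing the recoding then only means changing the three numbers in the lookup vector.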


Re: [R] Simple programming question

2007-05-18 Thread Guazzetti Stefano
Try this as well:

# note: this ranks var3 over the whole data frame, not within each categ
dfr$score <- factor(dfr$var3 %in% sort(unique(dfr$var3), decreasing=TRUE)[1:2] * dfr$var3,
   labels=c("low", "mid", "high"))
Hope this helps, 

Stefano


Re: [R] Simple programming question

2007-05-18 Thread Lauri Nikkinen
Thank you all for your answers. Actually, Gabor's first post was right in
the sense that I wanted "low" for all cases which are lower than the
second highest. But what if I want to recode "high", "mid" and "low" to
numeric values to make some calculations, e.g. 3, 1 and 0 respectively?
How do I have to modify your solutions? I would also like to apply this
solution to many kinds of recoding situations.

-Lauri


Re: [R] Simple programming question

2007-05-18 Thread Bert Gunter
?cut

This would recode to a factor with numeric labels for its levels.
as.numeric(as.character(...)) would then convert the labels to numeric values
that you can manipulate. This presumes that the variable you are coding is
numeric and you want to recode by binning the values into ordered bins.
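
A minimal sketch of that approach follows. The break points and the codes 0, 1 and 3 are made up for illustration. Note that cut bins on the values themselves, not on ranks within a category.

```r
var3 <- c(8, 7, 6, 6, 5, 4, 5, 4)

# Hypothetical bins (0,4], (4,6], (6,8], labelled with the numeric codes as text
binned <- cut(var3, breaks = c(0, 4, 6, 8), labels = c(0, 1, 3))

# Go through as.character() so the labels, not the internal level codes, are used
codes <- as.numeric(as.character(binned))
print(codes)
```

Calling as.numeric(binned) directly would return the internal level numbers 1, 2 and 3 instead of the labels.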


Bert Gunter
Genentech Nonclinical Statistics

