Re: [R] function on factors - how best to proceed

2007-09-19 Thread Gustaf Rydevik
On 9/19/07, Karin Lagesen [EMAIL PROTECTED] wrote:
 Gustaf Rydevik [EMAIL PROTECTED] writes:


  The second warning message tells you that:
   2: the condition has length > 1 and only the first element will be
  used in: if (terminus > origin)
 
  You are comparing two vectors, which generates a vector of TRUE/FALSE
  values.
  The if statement needs a single TRUE/FALSE value.
  Either use a for loop:
  for (i in seq_along(terminus)) { if (terminus[i] > origin[i]) ... }
  or a nested ifelse statement (which is recommended on such a big data
  set).

 Thank you for your reply! I will certainly try the numeric thing.

 Now, for the vector comparison. I can easily see how you would do a
 for loop here, but I am unable to see how a nested ifelse statement
 would work. Could you possibly give me an example?

 Thank you again for your help!

 Karin
 --
 Karin Lagesen, PhD student
 [EMAIL PROTECTED]
 http://folk.uio.no/karinlag


Replace each if with ifelse(), put a comma after the logical test, and
put another comma where the else used to be. The end result would be
(if I haven't made a mistake):

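# Random toy data
replicator <- rep(c("LEFT", "RIGHT"), 50)
terminus <- rnorm(100)
origin <- rnorm(100)
begin <- rnorm(100)
length <- sample(1:100, 100, replace = TRUE)

# The outer ifelse() replaces the outer if/else,
# the inner ones replace the tests on replicator
d <- ifelse(terminus > origin,
            ifelse(replicator == "LEFT", -((origin - begin) %% length), (begin - origin)),
            ifelse(replicator == "LEFT", (origin - begin), -((begin - origin) %% length))
)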
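Wrapped into a complete function, the same nested ifelse() idea might look like the sketch below. It is only an illustration: the name percentdistance_vec and the toy numbers are made up, and it assumes the position columns have already been converted from factors to numbers. Because ifelse() works element by element, no loop is needed, and the d/length*2 scaling from the original function can be applied to the whole vector at once.

```r
# Hypothetical vectorised rewrite of percentdistance().
# Assumes origin, terminus, length and begin are numeric (not factors)
# and replicor holds "LEFT"/"RIGHT" as character or factor.
percentdistance_vec <- function(origin, terminus, length, begin, replicor) {
  replicor <- as.character(replicor)  # compare strings, not factor codes
  d <- ifelse(terminus > origin,
              ifelse(replicor == "LEFT",
                     -((origin - begin) %% length),
                     begin - origin),
              ifelse(replicor == "LEFT",
                     origin - begin,
                     -((begin - origin) %% length)))
  d / length * 2
}

# Made-up example: genome length 1000, both orders of origin/terminus
res <- percentdistance_vec(origin   = c(100, 100, 600, 600),
                           terminus = c(600, 600, 100, 100),
                           length   = c(1000, 1000, 1000, 1000),
                           begin    = c(300, 900, 50, 400),
                           replicor = c("RIGHT", "LEFT", "RIGHT", "LEFT"))
print(res)  # 0.4 -0.4 -0.9 0.4
```

Note that R's %% returns a non-negative result for a positive divisor, so (100 - 900) %% 1000 is 200; that is what lets the modulo handle positions that wrap around the circular chromosome.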

/Gustaf


-- 
Gustaf Rydevik, M.Sci.
tel: +46(0)703 051 451
address:Essingetorget 40,112 66 Stockholm, SE
skype:gustaf_rydevik

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] function on factors - how best to proceed

2007-09-19 Thread Gustaf Rydevik
On 9/19/07, Karin Lagesen [EMAIL PROTECTED] wrote:

 Sorry about this one being long, and I apologise beforehand if there
 is something obvious here that I have missed. I am new to creating my
 own functions in R, and I am not yet sure how they work.

 I have a data set that I have read into a data frame:

 > gctable[1:5,]
      refseq geometry X60_origin X60_terminus  length  kingdom
 1 NC_009484      cir        179       773000 3389227 Bacteria
 2 NC_009484      cir        179       773000 3389227 Bacteria
 3 NC_009484      cir        179       773000 3389227 Bacteria
 4 NC_009484      cir        179       773000 3389227 Bacteria
 5 NC_009484      cir        179       773000 3389227 Bacteria
                   grp feature gene begin dir gc_content replicor LEADLAG
 1 Alphaproteobacteria     CDS  CDS   261   +   0.654244    RIGHT    LEAD
 2 Alphaproteobacteria     CDS  CDS  1737   -   0.651408    RIGHT     LAG
 3 Alphaproteobacteria     CDS  CDS  2902   +   0.607843    RIGHT    LEAD
 4 Alphaproteobacteria     CDS  CDS  3693   +   0.617647    RIGHT    LEAD
 5 Alphaproteobacteria     CDS  CDS  4227   +   0.699208    RIGHT    LEAD
 

 Most of these columns are factors.

 Now, I have a function that I would like to employ on this data
 frame. Right now I cannot get it to work, and that seems to be due to
 the columns in the data frame being factors. I tested it with a data
 frame created from vectors, and it worked fine.

 The function:

 percentdistance <- function(origin, terminus, length, begin, replicor) {
   print(c(origin, terminus, length, begin, replicor))
   d <- 0
   if (terminus > origin) {
     if (replicor == "LEFT") {
       d <- -((origin - begin) %% length)
     }
     else {
       d <- (begin - origin)
     }
   }
   else {
     if (replicor == "LEFT") {
       d <- (origin - begin)
     }
     else {
       d <- -((begin - origin) %% length)
     }
   }
   d / length * 2
 }

 The error I get:
 > percentdistance(gctable$X60_origin, gctable$X60_terminus, gctable$length,
 +                 gctable$begin, gctable$replicor)
     [1]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
    [19]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
    [37]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
    [55]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
    [73]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
    [91]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
   [109]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
   [127]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
 ...
 [99919]   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
 [99937]   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
 [99955]   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
 [99973]   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
 [99991]   2   2   2   2   2   2   2   2   2
  [ reached getOption("max.print") -- omitted 8526091 entries ]]
 Error in if (terminus > origin) { : missing value where TRUE/FALSE needed
 In addition: Warning messages:
 1: '>' not meaningful for factors in: Ops.factor(terminus, origin)
 2: the condition has length > 1 and only the first element will be used in: if (terminus > origin) {
 

 This worked nicely when the inputs were columns from a data frame created
 from vectors.

 I have also tried the different apply functions, although I am
 not sure which one would be appropriate here.
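Of the apply family, mapply() is probably the closest fit, since it calls a function once per row while walking several vectors in parallel. apply() on a data frame first converts it to a matrix, and with a character column such as replicor every column would become character. A minimal sketch, assuming the position columns are already numeric; percentdistance1 and the toy data frame are made up for illustration. With well over a million rows this is likely to be noticeably slower than a vectorised ifelse() version.

```r
# Hypothetical scalar version of the original function: every argument is a
# single value, so ordinary if/else is fine here.
percentdistance1 <- function(origin, terminus, length, begin, replicor) {
  if (terminus > origin) {
    d <- if (replicor == "LEFT") -((origin - begin) %% length) else begin - origin
  } else {
    d <- if (replicor == "LEFT") origin - begin else -((begin - origin) %% length)
  }
  d / length * 2
}

# Made-up data; the positions are numeric, not factors
toy <- data.frame(origin   = c(100, 100, 600, 600),
                  terminus = c(600, 600, 100, 100),
                  length   = c(1000, 1000, 1000, 1000),
                  begin    = c(300, 900, 50, 400),
                  replicor = c("RIGHT", "LEFT", "RIGHT", "LEFT"),
                  stringsAsFactors = FALSE)

# mapply() passes the i-th element of each column to percentdistance1()
res <- mapply(percentdistance1, toy$origin, toy$terminus, toy$length,
              toy$begin, toy$replicor)
print(res)  # 0.4 -0.4 -0.9 0.4
```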


...

 Karin


Hi Karin!

A couple of things:
First, warning message 1 tells you that:
1: '>' not meaningful for factors in: Ops.factor(terminus, origin)

So terminus and origin are factors, and '>' is not defined for
(unordered) factors. You have to convert them to numeric variables
first (see R FAQ 7.10, "How do I convert factors to numeric?").
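A small illustration with made-up values: comparing two unordered factors with '>' only warns and returns NA, which is exactly what if() then rejects, and calling as.numeric() directly on a factor returns the internal level codes rather than the printed numbers. Going through the levels, as the R FAQ describes, recovers the values. The toy data frame and the num_cols vector are hypothetical; with the real data the columns would be X60_origin, X60_terminus, length and begin.

```r
# Made-up data frame whose position columns are factors with numeric-looking levels
toy <- data.frame(origin   = factor(c("773000", "179", "3389227")),
                  terminus = factor(c("179", "773000", "10")))

# '>' on unordered factors: warning "'>' not meaningful for factors", result NA
cmp <- suppressWarnings(toy$terminus > toy$origin)
print(cmp)                                     # NA NA NA

codes <- as.numeric(toy$origin)                # level codes 3 1 2, NOT the values
vals  <- as.numeric(as.character(toy$origin))  # 773000 179 3389227
vals2 <- as.numeric(levels(toy$origin))[toy$origin]  # same values, converts each level once

# Convert the position columns of the data frame in one step
num_cols <- c("origin", "terminus")
toy[num_cols] <- lapply(toy[num_cols], function(x) as.numeric(as.character(x)))
print(toy$terminus > toy$origin)               # FALSE TRUE FALSE
```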

The second warning message tells you that:
 2: the condition has length > 1 and only the first element will be
used in: if (terminus > origin)

You are comparing two vectors, which generates a vector of TRUE/FALSE values.
The if statement needs a single TRUE/FALSE value.
Either use a for loop:
for (i in seq_along(terminus)) { if (terminus[i] > origin[i]) ... }
or a nested ifelse statement (which is recommended on such a big data set).
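The for-loop route, spelled out as a sketch (percentdistance_loop and the numbers are made up, and the positions are assumed to be numeric already). The loop runs over seq_along() because terminus is a plain vector, not a matrix or data frame: nrow() returns NULL for a vector, so a 1:nrow(...) loop would fail. Preallocating d with numeric() avoids growing the result inside the loop, but on a data set of this size the ifelse() version is still likely to be the faster choice.

```r
# Hypothetical loop version: keeps the scalar if/else logic of the original
# function but applies it to one element (row) at a time.
percentdistance_loop <- function(origin, terminus, length, begin, replicor) {
  replicor <- as.character(replicor)   # compare strings, not factor codes
  d <- numeric(base::length(origin))   # preallocate; base:: since 'length' is also an argument
  for (i in seq_along(origin)) {
    if (terminus[i] > origin[i]) {
      if (replicor[i] == "LEFT") {
        d[i] <- -((origin[i] - begin[i]) %% length[i])
      } else {
        d[i] <- begin[i] - origin[i]
      }
    } else {
      if (replicor[i] == "LEFT") {
        d[i] <- origin[i] - begin[i]
      } else {
        d[i] <- -((begin[i] - origin[i]) %% length[i])
      }
    }
  }
  d / length * 2
}

res <- percentdistance_loop(origin   = c(100, 100, 600, 600),
                            terminus = c(600, 600, 100, 100),
                            length   = c(1000, 1000, 1000, 1000),
                            begin    = c(300, 900, 50, 400),
                            replicor = c("RIGHT", "LEFT", "RIGHT", "LEFT"))
print(res)  # 0.4 -0.4 -0.9 0.4
```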


best,

Gustaf


