Re: [R] function on factors - how best to proceed

2007-09-19 Thread Gustaf Rydevik
On 9/19/07, Karin Lagesen <[EMAIL PROTECTED]> wrote:
>
> Sorry about this one being long, and I apologise beforehand if there
> is something obvious here that I have missed. I am new to creating my
> own functions in R, and I am uncertain of how they work.
>
> I have a data set that I have read into a data frame:
>
> > gctable[1:5,]
>      refseq geometry X60_origin X60_terminus  length  kingdom
> 1 NC_009484      cir        179       773000 3389227 Bacteria
> 2 NC_009484      cir        179       773000 3389227 Bacteria
> 3 NC_009484      cir        179       773000 3389227 Bacteria
> 4 NC_009484      cir        179       773000 3389227 Bacteria
> 5 NC_009484      cir        179       773000 3389227 Bacteria
>                   grp feature gene begin dir gc_content replicor LEADLAG
> 1 Alphaproteobacteria     CDS  CDS   261   +   0.654244    RIGHT    LEAD
> 2 Alphaproteobacteria     CDS  CDS  1737   -   0.651408    RIGHT     LAG
> 3 Alphaproteobacteria     CDS  CDS  2902   +   0.607843    RIGHT    LEAD
> 4 Alphaproteobacteria     CDS  CDS  3693   +   0.617647    RIGHT    LEAD
> 5 Alphaproteobacteria     CDS  CDS  4227   +   0.699208    RIGHT    LEAD
> >
>
> Most of these columns are factors.
>
> Now, I have a function that I would like to employ on this data
> frame. Right now I cannot get it to work, and that seems to be due to
> the columns in the data frame being factors. I tested it with a data
> frame created from vectors, and it worked fine.
>
> The function:
>
> percentdistance <- function(origin, terminus, length, begin, replicor){
>   print(c(origin, terminus, length, begin, replicor))
>   d = 0
>   if (terminus>origin) {
>     if(replicor=="LEFT") {
>       d = -((origin-begin)%%length)
>     }
>     else {
>       d = (begin-origin)
>     }
>   }
>   else {
>     if (replicor=="LEFT") {
>       d=(origin-begin)
>     }
>     else{
>       d = -((begin-origin)%%length)
>     }
>   }
>   d/length*2
> }
>
> The error I get:
> > percentdistance(gctable$X60_origin, gctable$X60_terminus, gctable$length,
>       gctable$begin, gctable$replicor)
>     [1]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
>    [19]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
>    [37]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
>    [55]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
>    [73]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
>    [91]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
>   [109]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
>   [127]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
> ...
> [99919]   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
> [99937]   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
> [99955]   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
> [99973]   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
> [99991]   2   2   2   2   2   2   2   2   2
>  [ reached getOption("max.print") -- omitted 8526091 entries ]]
> Error in if (terminus > origin) { : missing value where TRUE/FALSE needed
> In addition: Warning messages:
> 1: > not meaningful for factors in: Ops.factor(terminus, origin)
> 2: the condition has length > 1 and only the first element will be used in:
>    if (terminus > origin) {
> >
>
> This worked nicely when the inputs were columns from a data frame created
> from vectors.
>
> I have also tried the different apply-functions, although I am
> uncertain of which one would be appropriate here.
>
>
...
>
> Karin
> --
> Karin Lagesen, PhD student
> [EMAIL PROTECTED]
> http://folk.uio.no/karinlag


Hej Karin!

A couple of things:
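First, the first warning message tells you:
1: > not meaningful for factors in: Ops.factor(terminus, origin)

So terminus and origin are factors, and ">" is not defined for
(unordered) factors: the comparison returns NA, which is why "if" then
stops with "missing value where TRUE/FALSE needed". You have to convert
them to numeric variables (see the R FAQ entry "How do I convert factors
to numeric?").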
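
A small sketch of that conversion, on a made-up factor (the values are
only borrowed from the table above for illustration). Calling
as.numeric() directly on a factor returns the internal level codes, which
is presumably where the runs of 87 and 2 in the printed output come from;
going through the labels gives the real numbers:

```r
# Hypothetical factor whose labels are numbers, as happens when a
# numeric column is read in as a factor
f <- factor(c("773000", "179", "773000", "3389227"))

# Direct conversion gives the internal level codes, not the values
codes <- as.numeric(f)

# Converting via the labels recovers the values
values <- as.numeric(as.character(f))

# Equivalent form that converts each level only once
values2 <- as.numeric(levels(f))[f]

print(codes)
print(values)
```

The same idea applies column by column, e.g. to X60_origin,
X60_terminus, length and begin in the data frame.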

The second warning message tells you:
 2: the condition has length > 1 and only the first element will be
used in: if (terminus > origin)

You are comparing two vectors, which generates a vector of TRUE/FALSE values,
but the "if" statement needs a single TRUE/FALSE value.
Either use a for loop:
for (i in seq_along(terminus)) { if (terminus[i] > origin[i]) ... }
or a nested ifelse statement (which I would recommend on such a big data set).
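
To make the difference concrete, here is a minimal sketch on three
made-up values (the vectors and the labels "after"/"not after" are
invented for illustration). The loop tests one element at a time, while
ifelse() takes the whole logical vector at once:

```r
# Toy inputs, not taken from the real data
terminus <- c(5, 1, 8)
origin   <- c(2, 4, 8)

# Loop version: each if() sees a single TRUE/FALSE
flag_loop <- logical(length(terminus))
for (i in seq_along(terminus)) {
  if (terminus[i] > origin[i]) {
    flag_loop[i] <- TRUE
  }
}

# Vectorised version: ifelse() evaluates the test element by element
# and picks the matching value from the second or third argument
label <- ifelse(terminus > origin, "after", "not after")

print(flag_loop)
print(label)
```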


best,

Gustaf


-- 
Gustaf Rydevik, M.Sci.
tel: +46(0)703 051 451
address:Essingetorget 40,112 66 Stockholm, SE
skype:gustaf_rydevik

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] function on factors - how best to proceed

2007-09-19 Thread Gustaf Rydevik
On 9/19/07, Gustaf Rydevik <[EMAIL PROTECTED]> wrote:
> On 9/19/07, Karin Lagesen <[EMAIL PROTECTED]> wrote:
> > "Gustaf Rydevik" <[EMAIL PROTECTED]> writes:
> >
> >
> > > The second warning message tells you that:
> > >  2: the condition has length > 1 and only the first element will be
> > > used in: if (terminus > origin)
> > >
> > > You are comparing two vectors, which generates a vector of TRUE/FALSE
> > > values.
> > > The "if" statement needs a single TRUE/FALSE value.
> > > Either use a for loop:
> > > for (i in seq_along(terminus)) { if (terminus[i] > origin[i]) ... }
> > > or a nested ifelse statement (which is recommendable on such a big data
> > > set).
> >
> > Thank you for your reply! I will certainly try the numeric thing.
> >
> > Now, for the vector comparison. I can easily see how you would do a
> > for loop here, but I am unable to see how a nested ifelse statement
> > would work. Could you possibly give me an example?
> >
> > Thank you again for your help!
> >
> > Karin
> > --
> > Karin Lagesen, PhD student
> > [EMAIL PROTECTED]
> > http://folk.uio.no/karinlag
> >
>
> Replace each "if" with ifelse(), putting a comma after the logical
> test and another comma where the "else" was.  The end result
> would be (if I've not made a mistake):
>
> _
> replicator<-rep(c("LEFT","RIGHT"),50)
> terminus<-rnorm(100)
> origin<-rnorm(100)
> begin<-rnorm(100)
> length<-sample(1:100,100,replace=T)
>
> d<-ifelse(terminus>origin,
> +ifelse(replicator=="LEFT",-((origin-begin))%%length),(begin-origin)),
> +ifelse(replicator=="LEFT",(origin-begin),-((begin-origin)%%length))
> +)
>
> /Gustaf
>
>
> --
> Gustaf Rydevik, M.Sci.
> tel: +46(0)703 051 451
> address:Essingetorget 40,112 66 Stockholm, SE
> skype:gustaf_rydevik
>

Sorry, forgot to remove the plusses, and had a parenthesis wrong...

__
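# Simulated inputs, one element per row
replicator <- rep(c("LEFT", "RIGHT"), 50)
terminus <- rnorm(100)
origin <- rnorm(100)
begin <- rnorm(100)
length <- sample(1:100, 100, replace = TRUE)

# Outer ifelse() replaces "if (terminus > origin)",
# the inner ones replace "if (replicor == "LEFT")"
d <- ifelse(terminus > origin,
  ifelse(replicator == "LEFT", -((origin - begin) %% length), (begin - origin)),
  ifelse(replicator == "LEFT", (origin - begin), -((begin - origin) %% length))
)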
___
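
Putting the two points together, the original function could be written
roughly as below. This is only a sketch under assumptions: the names
to_num and percentdistance_vec and the toy data frame are invented here,
and the factor columns are assumed to carry their numbers as labels.

```r
# Hypothetical helper: turn a factor into the numbers its labels show
to_num <- function(x) {
  if (is.factor(x)) as.numeric(as.character(x)) else as.numeric(x)
}

# Vectorised variant of percentdistance: same branches, via nested ifelse()
percentdistance_vec <- function(origin, terminus, length, begin, replicor) {
  origin   <- to_num(origin)
  terminus <- to_num(terminus)
  length   <- to_num(length)
  begin    <- to_num(begin)
  replicor <- as.character(replicor)

  d <- ifelse(terminus > origin,
    ifelse(replicor == "LEFT", -((origin - begin) %% length), begin - origin),
    ifelse(replicor == "LEFT", origin - begin, -((begin - origin) %% length))
  )
  d / length * 2
}

# Toy data frame with factor columns (made-up values)
toy <- data.frame(
  origin   = factor(c("179", "900")),
  terminus = factor(c("773000", "100")),
  length   = factor(c("1000", "1000")),
  begin    = factor(c("261", "950")),
  replicor = factor(c("RIGHT", "LEFT"))
)

res <- with(toy, percentdistance_vec(origin, terminus, length, begin, replicor))
print(res)
```

With the real data the call would take gctable$X60_origin,
gctable$X60_terminus, gctable$length, gctable$begin and gctable$replicor,
and returns one value per row.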


best,

Gustaf


Re: [R] function on factors - how best to proceed

2007-09-19 Thread Gustaf Rydevik
On 9/19/07, Karin Lagesen <[EMAIL PROTECTED]> wrote:
> "Gustaf Rydevik" <[EMAIL PROTECTED]> writes:
>
>
> > The second warning message tells you that:
> >  2: the condition has length > 1 and only the first element will be
> > used in: if (terminus > origin)
> >
> > You are comparing two vectors, which generates a vector of TRUE/FALSE
> > values.
> > The "if" statement needs a single TRUE/FALSE value.
> > Either use a for loop:
> > for (i in seq_along(terminus)) { if (terminus[i] > origin[i]) ... }
> > or a nested ifelse statement (which is recommendable on such a big data
> > set).
>
> Thank you for your reply! I will certainly try the numeric thing.
>
> Now, for the vector comparison. I can easily see how you would do a
> for loop here, but I am unable to see how a nested ifelse statement
> would work. Could you possibly give me an example?
>
> Thank you again for your help!
>
> Karin
> --
> Karin Lagesen, PhD student
> [EMAIL PROTECTED]
> http://folk.uio.no/karinlag
>

Replace each "if" with ifelse(), putting a comma after the logical
test and another comma where the "else" was.  The end result
would be (if I've not made a mistake):

_
replicator<-rep(c("LEFT","RIGHT"),50)
terminus<-rnorm(100)
origin<-rnorm(100)
begin<-rnorm(100)
length<-sample(1:100,100,replace=T)

d<-ifelse(terminus>origin,
+ifelse(replicator=="LEFT",-((origin-begin))%%length),(begin-origin)),
+ifelse(replicator=="LEFT",(origin-begin),-((begin-origin)%%length))
+)

/Gustaf

