Re: [R] function on factors - how best to proceed

2007-09-19 Thread Gustaf Rydevik
On 9/19/07, Karin Lagesen <[EMAIL PROTECTED]> wrote:
>
> Sorry about this one being long, and I apologise beforehand if there
> is something obvious here that I have missed. I am new to creating my
> own functions in R, and I am uncertain of how they work.
>
> I have a data set that I have read into a data frame:
>
> > gctable[1:5,]
>      refseq geometry X60_origin X60_terminus  length  kingdom
> 1 NC_009484      cir        179       773000 3389227 Bacteria
> 2 NC_009484      cir        179       773000 3389227 Bacteria
> 3 NC_009484      cir        179       773000 3389227 Bacteria
> 4 NC_009484      cir        179       773000 3389227 Bacteria
> 5 NC_009484      cir        179       773000 3389227 Bacteria
>                   grp feature gene begin dir gc_content replicor LEADLAG
> 1 Alphaproteobacteria     CDS  CDS   261   +   0.654244    RIGHT    LEAD
> 2 Alphaproteobacteria     CDS  CDS  1737   -   0.651408    RIGHT     LAG
> 3 Alphaproteobacteria     CDS  CDS  2902   +   0.607843    RIGHT    LEAD
> 4 Alphaproteobacteria     CDS  CDS  3693   +   0.617647    RIGHT    LEAD
> 5 Alphaproteobacteria     CDS  CDS  4227   +   0.699208    RIGHT    LEAD
> >
>
> Most of these columns are factors.
>
> Now, I have a function that I would like to employ on this data
> frame. Right now I cannot get it to work, and that seems to be due to
> the columns in the data frame being factors. I tested it with a data
> frame created from vectors, and it worked fine.
>
> The function:
>
> percentdistance <- function(origin, terminus, length, begin, replicor){
>   print(c(origin, terminus, length, begin, replicor))
>   d = 0
>   if (terminus>origin) {
>     if (replicor=="LEFT") {
>       d = -((origin-begin)%%length)
>     }
>     else {
>       d = (begin-origin)
>     }
>   }
>   else {
>     if (replicor=="LEFT") {
>       d = (origin-begin)
>     }
>     else {
>       d = -((begin-origin)%%length)
>     }
>   }
>   d/length*2
> }
>
> The error I get:
> > percentdistance(gctable$X60_origin, gctable$X60_terminus, gctable$length,
> +                 gctable$begin, gctable$replicor)
>     [1]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
>    [19]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
>    [37]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
>    [55]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
>    [73]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
>    [91]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
>   [109]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
>   [127]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
> ...
> [99919]   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
> [99937]   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
> [99955]   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
> [99973]   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
> [99991]   2   2   2   2   2   2   2   2   2
>  [ reached getOption("max.print") -- omitted 8526091 entries ]]
> Error in if (terminus > origin) { : missing value where TRUE/FALSE needed
> In addition: Warning messages:
> 1: > not meaningful for factors in: Ops.factor(terminus, origin)
> 2: the condition has length > 1 and only the first element will be used in: if (terminus > origin) {
> >
>
> This worked nice when the input were columns from a data frame created
> from vectors.
>
> I have also tried the different apply-functions, although I am
> uncertain of which one would be appropriate here.
>
>
...
>
> Karin
> --
> Karin Lagesen, PhD student
> [EMAIL PROTECTED]
> http://folk.uio.no/karinlag


Hej Karin!

A couple of things:
First, the first warning message tells you that:
1: > not meaningful for factors in: Ops.factor(terminus, origin).

Thus, terminus and origin are factors, and unordered factors cannot be
compared with ">". You have to convert them to numeric variables (see
the R FAQ entry "How do I convert factors to numeric?").
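
As a minimal sketch of that conversion (the factor f below is a made-up toy, not a column from gctable): calling as.numeric() directly on a factor returns the internal level codes rather than the values, which is probably why the print() call in the function shows small numbers such as 87 and 2. Going through as.character() first gives the actual numbers.

```r
# Toy factor: numbers that were read in as text and stored as a factor
f <- factor(c("773000", "179", "773000"))

# as.numeric() on a factor gives the internal level codes, not the values
codes <- as.numeric(f)

# Convert via character to recover the numbers themselves
values <- as.numeric(as.character(f))

# Equivalent form: convert the levels once, then index by the factor
values2 <- as.numeric(levels(f))[f]

print(codes)
print(values)
```

The same call would be applied to each numeric column before passing it to the function, for example as.numeric(as.character(gctable$X60_origin)).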

The second warning message tells you that:
 2: the condition has length > 1 and only the first element will be
used in: if (terminus > origin)

You are comparing two vectors, which generates a vector of TRUE/FALSE values.
The "if" statement needs a single TRUE/FALSE value.
Either use a for loop:
for (i in seq_along(terminus)) { if (terminus[i] > origin[i]) ... }
or a nested ifelse statement (which is preferable on such a big data set).
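
To illustrate the difference between the two approaches, here is a small sketch on made-up vectors (the values and the labels "after" / "not after" are only placeholders). Both versions give the same answer; ifelse() simply does the test for all elements at once. Note that in R 4.2.0 and later, an if() condition of length greater than one is an error rather than a warning.

```r
# Toy inputs, three elements each
terminus <- c(5, 1, 9)
origin   <- c(2, 4, 9)

# The comparison itself is element-wise: one TRUE/FALSE per element
cmp <- terminus > origin

# Loop version: if() sees a single TRUE/FALSE on each pass
res_loop <- character(length(terminus))
for (i in seq_along(terminus)) {
  if (terminus[i] > origin[i]) {
    res_loop[i] <- "after"
  } else {
    res_loop[i] <- "not after"
  }
}

# Vectorised version: ifelse() picks a value per element
res_vec <- ifelse(terminus > origin, "after", "not after")

print(identical(res_loop, res_vec))
```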


best,

Gustaf


-- 
Gustaf Rydevik, M.Sci.
tel: +46(0)703 051 451
address:Essingetorget 40,112 66 Stockholm, SE
skype:gustaf_rydevik

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] function on factors - how best to proceed

2007-09-19 Thread Gustaf Rydevik
On 9/19/07, Gustaf Rydevik <[EMAIL PROTECTED]> wrote:
> On 9/19/07, Karin Lagesen <[EMAIL PROTECTED]> wrote:
> > "Gustaf Rydevik" <[EMAIL PROTECTED]> writes:
> >
> >
> > > The second warning message tells you that:
> > >  2: the condition has length > 1 and only the first element will be
> > > used in: if (terminus > origin)
> > >
> > > You are comparing two vectors,  which generate a vector of TRUE/FALSE 
> > > values.
> > > The "if" statement need a single TRUE/FALSE value.
> > > Either use a for loop:
> > > for (i in 1:nrow(terminus)) {if terminus[i]> origin[i]...}
> > > or a nested ifelse statement (which is recommendable on such a big data 
> > > set).
> >
> > Thankyou for your reply! I will certainly try the numeric thing.
> >
> > Now, for the vector comparison. I can easily see how you would do a
> > for loop here, but I am unable to see how a nested ifelse statement
> > would work. Could you possibly give me an example?
> >
> > Thankyou again for your help!
> >
> > Karin
> > --
> > Karin Lagesen, PhD student
> > [EMAIL PROTECTED]
> > http://folk.uio.no/karinlag
> >
>
> You replace each instance of "if" with ifelse, inserting a comma after
> the logical test, and instead of the else statement.  The end result
> would become (if I've not made a mistake):
>
> _
> replicator<-rep(c("LEFT","RIGHT"),50)
> terminus<-rnorm(100)
> origin<-rnorm(100)
> begin<-rnorm(100)
> length<-sample(1:100,100,replace=T)
>
> d<-ifelse(terminus>origin,
> +ifelse(replicator=="LEFT",-((origin-begin))%%length),(begin-origin)),
> +ifelse(replicator=="LEFT",(origin-begin),-((begin-origin)%%length))
> +)
>
> /Gustaf
>
>
> --
> Gustaf Rydevik, M.Sci.
> tel: +46(0)703 051 451
> address:Essingetorget 40,112 66 Stockholm, SE
> skype:gustaf_rydevik
>

Sorry, forgot to remove the plusses, and had a parenthesis wrong...

__
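# Simulated inputs, 100 elements each
replicator <- rep(c("LEFT", "RIGHT"), 50)
terminus <- rnorm(100)
origin <- rnorm(100)
begin <- rnorm(100)
length <- sample(1:100, 100, replace = TRUE)

# Outer ifelse tests terminus > origin; each inner ifelse tests the replicor side
d <- ifelse(terminus > origin,
  ifelse(replicator == "LEFT", -((origin - begin) %% length), (begin - origin)),
  ifelse(replicator == "LEFT", (origin - begin), -((begin - origin) %% length))
)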
___
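
Putting the two pieces of advice together, the original function could be rewritten roughly as follows. This is only a sketch: to_num() is a hypothetical helper introduced here for the factor conversion, and the toy data frame uses a couple of values loosely based on the rows shown in the thread, not the real gctable.

```r
# Vectorised sketch of percentdistance(): same branches as the original
# function, but with ifelse() so it works on whole columns at once
percentdistance <- function(origin, terminus, length, begin, replicor) {
  d <- ifelse(terminus > origin,
    ifelse(replicor == "LEFT", -((origin - begin) %% length), begin - origin),
    ifelse(replicor == "LEFT", origin - begin, -((begin - origin) %% length))
  )
  d / length * 2
}

# Hypothetical helper: turn a factor of number-like labels into numbers
to_num <- function(x) as.numeric(as.character(x))

# Toy data frame with factor columns, standing in for gctable
toy <- data.frame(
  X60_origin   = factor(c("179", "179")),
  X60_terminus = factor(c("773000", "773000")),
  length       = factor(c("3389227", "3389227")),
  begin        = factor(c("261", "1737")),
  replicor     = factor(c("RIGHT", "RIGHT"))
)

# Convert the columns on the way in and store the result as a new column
toy$pd <- percentdistance(
  to_num(toy$X60_origin), to_num(toy$X60_terminus),
  to_num(toy$length), to_num(toy$begin),
  as.character(toy$replicor)
)
print(toy$pd)
```

The result has one value per row, so it can be combined with the other columns in a call to data.frame() to build the new table.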


best,

Gustaf
-- 
Gustaf Rydevik, M.Sci.
tel: +46(0)703 051 451
address:Essingetorget 40,112 66 Stockholm, SE
skype:gustaf_rydevik



[R] function on factors - how best to proceed

2007-09-19 Thread Karin Lagesen

Sorry about this one being long, and I apologise beforehand if there
is something obvious here that I have missed. I am new to creating my
own functions in R, and I am uncertain of how they work.

I have a data set that I have read into a data frame:

> gctable[1:5,]
     refseq geometry X60_origin X60_terminus  length  kingdom
1 NC_009484      cir        179       773000 3389227 Bacteria
2 NC_009484      cir        179       773000 3389227 Bacteria
3 NC_009484      cir        179       773000 3389227 Bacteria
4 NC_009484      cir        179       773000 3389227 Bacteria
5 NC_009484      cir        179       773000 3389227 Bacteria
                  grp feature gene begin dir gc_content replicor LEADLAG
1 Alphaproteobacteria     CDS  CDS   261   +   0.654244    RIGHT    LEAD
2 Alphaproteobacteria     CDS  CDS  1737   -   0.651408    RIGHT     LAG
3 Alphaproteobacteria     CDS  CDS  2902   +   0.607843    RIGHT    LEAD
4 Alphaproteobacteria     CDS  CDS  3693   +   0.617647    RIGHT    LEAD
5 Alphaproteobacteria     CDS  CDS  4227   +   0.699208    RIGHT    LEAD
> 

Most of these columns are factors.

Now, I have a function that I would like to employ on this data
frame. Right now I cannot get it to work, and that seems to be due to
the columns in the data frame being factors. I tested it with a data
frame created from vectors, and it worked fine.

The function:

percentdistance <- function(origin, terminus, length, begin, replicor){
  print(c(origin, terminus, length, begin, replicor))
  d = 0
  if (terminus>origin) {
    if (replicor=="LEFT") {
      d = -((origin-begin)%%length)
    }
    else {
      d = (begin-origin)
    }
  }
  else {
    if (replicor=="LEFT") {
      d = (origin-begin)
    }
    else {
      d = -((begin-origin)%%length)
    }
  }
  d/length*2
}

The error I get:
> percentdistance(gctable$X60_origin, gctable$X60_terminus, gctable$length,
+                 gctable$begin, gctable$replicor)
    [1]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
   [19]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
   [37]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
   [55]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
   [73]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
   [91]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
  [109]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
  [127]  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87  87
...
[99919]   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
[99937]   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
[99955]   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
[99973]   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2   2
[99991]   2   2   2   2   2   2   2   2   2
 [ reached getOption("max.print") -- omitted 8526091 entries ]]
Error in if (terminus > origin) { : missing value where TRUE/FALSE needed
In addition: Warning messages:
1: > not meaningful for factors in: Ops.factor(terminus, origin)
2: the condition has length > 1 and only the first element will be used in: if (terminus > origin) {
> 

This worked fine when the inputs were columns from a data frame created
from vectors.

I have also tried the different apply-functions, although I am
uncertain of which one would be appropriate here.


I would like to use this function to create a new data frame which
would look something like this:

new_frame = (gctable$feature, gctable$gene, gctable$kingdom, gctable$grp, 
gctable$gc_content, percentdistance(gctable))

I am uncertain of how to proceed. Should I deconstruct the data frame
within the function, or should I get just the numbers out of the
factors and input that into the function? Or is my solution way off
from how things are done in R?

Thank you very much for your help!

Karin
-- 
Karin Lagesen, PhD student
[EMAIL PROTECTED]
http://folk.uio.no/karinlag



Re: [R] function on factors - how best to proceed

2007-09-19 Thread Gustaf Rydevik
On 9/19/07, Karin Lagesen <[EMAIL PROTECTED]> wrote:
> "Gustaf Rydevik" <[EMAIL PROTECTED]> writes:
>
>
> > The second warning message tells you that:
> >  2: the condition has length > 1 and only the first element will be
> > used in: if (terminus > origin)
> >
> > You are comparing two vectors,  which generate a vector of TRUE/FALSE 
> > values.
> > The "if" statement need a single TRUE/FALSE value.
> > Either use a for loop:
> > for (i in 1:nrow(terminus)) {if terminus[i]> origin[i]...}
> > or a nested ifelse statement (which is recommendable on such a big data 
> > set).
>
> Thankyou for your reply! I will certainly try the numeric thing.
>
> Now, for the vector comparison. I can easily see how you would do a
> for loop here, but I am unable to see how a nested ifelse statement
> would work. Could you possibly give me an example?
>
> Thankyou again for your help!
>
> Karin
> --
> Karin Lagesen, PhD student
> [EMAIL PROTECTED]
> http://folk.uio.no/karinlag
>

Replace each "if" with ifelse(), putting a comma after the logical
test and a second comma where the "else" was.  The end result
would be (if I've not made a mistake):

_
replicator<-rep(c("LEFT","RIGHT"),50)
terminus<-rnorm(100)
origin<-rnorm(100)
begin<-rnorm(100)
length<-sample(1:100,100,replace=T)

d<-ifelse(terminus>origin,
+ifelse(replicator=="LEFT",-((origin-begin))%%length),(begin-origin)),
+ifelse(replicator=="LEFT",(origin-begin),-((begin-origin)%%length))
+)

/Gustaf


-- 
Gustaf Rydevik, M.Sci.
tel: +46(0)703 051 451
address:Essingetorget 40,112 66 Stockholm, SE
skype:gustaf_rydevik
