Re: [R] Median computation

2012-05-23 Thread Preeti

Hello Everybody,

The code:

dfmed <- lapply(unique(colnames(df)), function(x)
  rowMedians(as.matrix(df[, colnames(df) == x]), na.rm=TRUE))

takes a really long time to execute (hours). Is there a faster way to do
this?

Thanks!


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Median computation

2012-05-23 Thread Bert Gunter

Assuming your original matrix IS a matrix, call it yourmat, and not a
data frame (whose columns *must* have unique names if you haven't
messed with the check.names default) then maybe:

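### UNTESTED!!! ###
allnames <- dimnames(yourmat)[[2]]   # all column names, duplicates included
thenames <- unique(allnames)
ans <- lapply(thenames, function(nm) {
   # drop=FALSE keeps a one-column group as a matrix so apply() still works
   apply(yourmat[, allnames == nm, drop=FALSE], 1, median, na.rm=TRUE)
   })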

If I got it right, ans should be a list of vectors, one per unique
column name, each of which gives rowwise medians of the columns with
the same name. This can be combined into a new matrix e.g. by
do.call(cbind,ans)  if you like. You could get a matrix answer
directly if you use sapply or, maybe faster, vapply instead of lapply,
but I find lists simpler to begin with.
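
The vapply variant mentioned above could look roughly like the following. This is only a sketch on a small made-up matrix (the data and the names allnames and thenames are illustrative assumptions, not part of the original post), using base R only:

```r
# Toy matrix with duplicated column names (hypothetical data, far smaller
# than the 250,000 x 300 matrix in the question).
set.seed(1)
yourmat <- matrix(rnorm(5 * 6), nrow = 5,
                  dimnames = list(NULL, c("a", "b", "a", "c", "b", "a")))

allnames <- colnames(yourmat)   # all names, duplicates included
thenames <- unique(allnames)    # one entry per group

# vapply() returns a matrix directly: one row per row of yourmat,
# one column per unique column name.
ans <- vapply(thenames, function(nm) {
  # drop = FALSE keeps single-column groups (here "c") as a matrix
  apply(yourmat[, allnames == nm, drop = FALSE], 1, median, na.rm = TRUE)
}, numeric(nrow(yourmat)))

dim(ans)
colnames(ans)
```

Because thenames is a character vector, vapply uses it to name the result columns, so each column of ans can be looked up by its group name.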

I believe this should be reasonably fast. Converting to and from data
frames and operating on data frames slows things down a lot, because
these are very general structures that must keep track of a lot of
overhead when being worked on. Matrices do not.

-- Bert





-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


Re: [R] Median computation

2012-05-23 Thread Benno Pütz

I wonder how you do this (or maybe on what kind of machine you execute it).

I tried it out of curiosity and get

> df = as.data.frame(lapply(1:300,function(x)sample(200,25,T)))
> colnames(df) = sample(letters[1:20],300,T)
> system.time(dfmed <- lapply(unique(colnames(df)), function(x)
+ rowMedians(as.matrix(df[,colnames(df) == x]),na.rm=TRUE)))
   user  system elapsed
  5.680   0.952   7.171

and those times are in seconds! The time-consuming part was building the
data.frame, not the calculation.

The only thing I noticed is that my R process claims some 1.4 GB of memory.
That should not be a problem on any recent hardware, but my guess is that
memory might be your problem, especially if you have other memory-hogging
variables like this data frame lying around and you see severe swapping.

Benno


Benno Pütz
Statistical Genetics
MPI of Psychiatry
Kraepelinstr. 2-10
80804 Munich, Germany
T: ++49-(0)89-306 22 222
F: ++49-(0)89-306 22 601





Re: [R] Median computation

2012-05-23 Thread Preeti

Hmm.. that is interesting... I did this on our server machine, which has
about 200 cores, so memory is not an issue. Also, building the data frame
takes a few minutes at most for me. My code is similar to yours, except
that I create my data frame with read.delim(filename) and then drop the
first column because it contains characters. I don't know why it takes so
long on my machine.


Re: [R] Median computation

2012-05-23 Thread Henrik Bengtsson

Just adding a few cents to this:

rowMedians(x) is roughly 4-10 times faster than apply(x, MARGIN=1,
FUN=median) - at least on my local Windows 7 64bit tests.  You can do
these simple benchmark runs yourself via the
matrixStats/tests/rowMedians.R system test, cf. http://goo.gl/YCJed
[R-forge].
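
For a quick local check of this kind, a sketch along the following lines may be enough. The matrix size is an arbitrary assumption, timings will differ by machine, and the rowMedians part only runs if matrixStats is installed:

```r
# Small random matrix; the dimensions are an arbitrary choice for a quick run.
set.seed(42)
x <- matrix(rnorm(2000 * 30), nrow = 2000)

# Baseline: row-wise medians with apply().
t_apply <- system.time(m1 <- apply(x, MARGIN = 1, FUN = median))
cat("apply elapsed:", t_apply[["elapsed"]], "s\n")

# Same computation with matrixStats, if the package is available.
if (requireNamespace("matrixStats", quietly = TRUE)) {
  t_rowmed <- system.time(m2 <- matrixStats::rowMedians(x))
  cat("rowMedians elapsed:", t_rowmed[["elapsed"]], "s\n")
  # Both approaches should agree on the values.
  stopifnot(isTRUE(all.equal(m1, m2)))
}
```

On a matrix this small both calls finish almost instantly, so a larger matrix or repeated runs would be needed to see a stable ratio.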

/Henrik


Re: [R] Median computation

2012-05-23 Thread peter dalgaard


On May 23, 2012, at 19:30 , Preeti wrote:

> Hmm.. that is interesting... I did this on our server machine which has
> about 200 cores. So memory is not an issue. Also, building the dataframe
> takes about a few minutes maximum for me. My code is similar to yours but
> for the fact that I create my dataframe from read.delim(filename) and
> then I drop the first column because it has characters. I don't know why it
> takes long on my machine.

Are you sure that you actually have any columns with the same name then? You 
need read.delim(.., check.names=FALSE), otherwise you just get an expensive 
identity operation. 
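
The effect of check.names can be seen on a tiny example. This is a sketch only: the tab-delimited text below is made up and stands in for the real file, read through the text= argument of read.table that read.delim passes along:

```r
# Tab-delimited text standing in for the real file (hypothetical contents):
# an id column followed by columns a, b, a.
txt <- "id\ta\tb\ta\nr1\t1\t2\t3\nr2\t4\t5\t6\n"

df_default <- read.delim(text = txt)                       # check.names = TRUE
df_raw     <- read.delim(text = txt, check.names = FALSE)  # names kept as-is

colnames(df_default)  # "id" "a" "b" "a.1"  (duplicates made unique)
colnames(df_raw)      # "id" "a" "b" "a"    (duplicates preserved)

# With the default, no two data columns share a name, so grouping by
# column name yields one column per group and the row median of each
# group is just that column again.
any(duplicated(colnames(df_default)[-1]))
any(duplicated(colnames(df_raw)[-1]))
```

Checking any(duplicated(colnames(df))) on the real data frame is a cheap way to confirm whether the grouping can do anything at all.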

Also, you should probably try running Benno's exact code, just for comparison.
Some of those multicore machines are really rather slow if you only use one core
for your process.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com


Re: [R] Median computation

2012-05-23 Thread Preeti

On Wed, May 23, 2012 at 11:54 AM, peter dalgaard pda...@gmail.com wrote:

> On May 23, 2012, at 19:30 , Preeti wrote:
>
> > Hmm.. that is interesting... I did this on our server machine which has
> > about 200 cores. So memory is not an issue. Also, building the dataframe
> > takes about a few minutes maximum for me. My code is similar to yours but
> > for the fact that I create my dataframe from read.delim(filename) and
> > then I drop the first column because it has characters. I don't know why
> > it takes long on my machine.
>
> Are you sure that you actually have any columns with the same name then?

Yes, I am sure of that, and yes, that's how I read it.

> You need read.delim(.., check.names=FALSE), otherwise you just get an
> expensive identity operation.
>
> Also, you should probably try running Benno's exact code, just for
> comparison. Some of those multicore machines are really rather slow if you
> only use one core for your process.


Re: [R] Median computation

2012-05-23 Thread Bert Gunter

Yes, thanks Henrik. I neglected to mention that rowMedians could just
be plugged in instead of apply(..,1,...).

However, my main point is that that's probably not what matters, as
Benno points out. Maybe it's the data frames instead of the matrices,
but the process should execute in a few seconds even when done
inefficiently (as in my code). So there's something fishy here.

--Bert




-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


Re: [R] Median computation

2012-05-22 Thread Petr Savicky

On Tue, May 22, 2012 at 01:34:45PM -0600, Preeti wrote:
> Hi,
>
> I have a 250,000 by 300 matrix. I am trying to calculate the median of
> those columns (by row) with column names that are identical. I would like
> this to be efficient since apply(x,1,median) where x is created by choosing
> only those columns with same column name and looping on this is taking a
> really long time. Is there an efficient way to do this?

Hi.

Can you send a simple example of what you want to compute?

The 300 medians of the 300 columns, each of length 250'000,
may be computed using apply(x,2,median) and this does not
take much time. What do you mean by choosing only those
columns with same column name?

Petr Savicky.


Re: [R] Median computation

2012-05-22 Thread Henrik Bengtsson

See rowMedians() of the matrixStats package for replacing apply(x,
MARGIN=1, FUN=median). /Henrik


Re: [R] Median computation

2012-05-22 Thread Preeti

Thanks Henrik! Here is the one-liner that I wrote:

dfmed <- lapply(unique(colnames(df)), function(x)
  rowMedians(as.matrix(df[, colnames(df) == x]), na.rm=TRUE))

Thanks again!

