Re: [R] efficiently replacing values in a matrix

2008-04-16 Thread Rolf Turner

On 17/04/2008, at 7:52 AM, Matthew Keller wrote:

> Hello all,
>
> I should probably know this by now... Anyway:
>
> I have a large matrix (dim(data) is 3000 18000). Each element is
> one of the following character strings: "0/0", "1/1", "1/2", "2/2". I
> wanted to replace "0/0" with NA and the other three with 0,1,2
> respectively. To accomplish just the first of these four steps I did
> this:
>
> data[data=="0/0"] <- NA
>
> Which is still running after 13 hours. I have 18 GB RAM and am running
> 64-bit R. What is a more efficient way to accomplish this (I've already
> done it using sed in UNIX - but want to know how to do so in R)?
> Thanks in advance.

Well I just did

gorp <- c("0/0","1/1","1/2","2/2")
mung <- matrix(sample(gorp,54e6,TRUE),3000,18000)
mung[mung=="0/0"] <- NA

and the whole schmear ran in under half a minute of real time.
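
A minimal sketch of the full recoding asked about in the original post ("0/0" to NA, the other three strings to 0, 1, 2), assuming the data really is a character matrix. The small dimensions and the names `codes` and `recoded` are illustrative only and do not come from the thread.

```r
# Small stand-in for the 3000 x 18000 matrix discussed in the thread.
set.seed(1)
gorp <- c("0/0", "1/1", "1/2", "2/2")
mung <- matrix(sample(gorp, 12, TRUE), 3, 4)

# Numeric code for each string, in the same order as 'gorp'.
codes <- c(NA, 0, 1, 2)

# match() gives the position of each element in 'gorp'; indexing 'codes'
# by that position recodes all four values in one vectorised step.
recoded <- codes[match(mung, gorp)]
dim(recoded) <- dim(mung)   # restore the matrix shape dropped by match()

print(recoded)
```

This does all four replacements in one pass instead of four separate logical-index assignments.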

 > sessionInfo()
R version 2.6.2 (2008-02-08)
i386-apple-darwin8.10.1

locale:
C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] misc_0.0-2

loaded via a namespace (and not attached):
[1] rcompgen_0.1-17

I would say that something is seriously snarled up in your system.

cheers,

Rolf Turner

##
Attention:\ This e-mail message is privileged and confid...{{dropped:9}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] efficiently replacing values in a matrix

2008-04-16 Thread Charles C. Berry
On Thu, 17 Apr 2008, Rolf Turner wrote:

>
> On 17/04/2008, at 7:52 AM, Matthew Keller wrote:
>
>> Hello all,
>>
>> I should probably know this by now... Anyway:
>>
>> I have a large matrix (dim(data) is 3000 18000). Each element is
>> one of the following character strings: "0/0", "1/1", "1/2", "2/2". I
>> wanted to replace "0/0" with NA and the other three with 0,1,2
>> respectively. To accomplish just the first of these four steps I did
>> this:
>>
>> data[data=="0/0"] <- NA
>>
>> Which is still running after 13 hours. I have 18 GB RAM and am running
>> 64-bit R. What is a more efficient way to accomplish this (I've already
>> done it using sed in UNIX - but want to know how to do so in R)?
>> Thanks in advance.
>
> Well I just did
>
>   gorp <- c("0/0","1/1","1/2","2/2")
>   mung <- matrix(sample(gorp,54e6,TRUE),3000,18000)
>   mung[mung=="0/0"] <- NA
>
> and the whole schmear ran in under half a minute of real time.

Likewise.

I'll lay odds that Matthew's 'matrix' is actually a data.frame, and I'll 
not be surprised if the columns are factors. In which case

## assumes 'mung' is a data.frame whose columns are factors
mung2 <- as.data.frame(lapply(mung,
    function(x) {
        levels(x)[levels(x) == '0/0'] <- NA
        x
    }))

will be faster, but still not as fast as what you show with a matrix.
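
A rough sketch of how one might check which kind of object is at hand, and of the two routes mentioned above, on a toy data frame. The name `df` and the tiny dimensions are assumptions for illustration; `stringsAsFactors = TRUE` is set explicitly because current R no longer creates factors by default.

```r
# Toy data frame with factor columns, standing in for the real data.
set.seed(1)
gorp <- c("0/0", "1/1", "1/2", "2/2")
df <- as.data.frame(matrix(sample(gorp, 20, TRUE), 4, 5),
                    stringsAsFactors = TRUE)

# Check what the object actually is before timing anything.
print(is.matrix(df))
print(is.data.frame(df))
print(sapply(df, class))

# Route 1: relabel the "0/0" level as NA, column by column.
df2 <- as.data.frame(lapply(df, function(x) {
  levels(x)[levels(x) == "0/0"] <- NA
  x
}))

# Route 2: convert to a character matrix first, then replace.
m <- as.matrix(df)
m[m == "0/0"] <- NA
```

Both routes should mark the same cells as missing; the second leaves a character matrix rather than a data frame.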

HTH,

Chuck


Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]                  UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



Re: [R] efficiently replacing values in a matrix

2008-04-16 Thread Rolf Turner

On 17/04/2008, at 9:33 AM, Charles C. Berry wrote:



> I'll lay odds that Matthew's 'matrix' is actually a data.frame, and  
> I'll not be surprised if the columns are factors.



I suspect that you're right.

***Why*** can't people distinguish between data frames and matrices?
If they were the same thing, there wouldn't be two
different terms for them, would there?

cheers,

Rolf Turner



Re: [R] efficiently replacing values in a matrix

2008-04-16 Thread Matthew Keller
Yes Chuck, you're right.

Thanks for the help. It was a data.frame, not a matrix (I had called
as.matrix() in my script much earlier, but that line of code didn't run
because I misnamed the object!). My bad. And I'm
VERY relieved R isn't that inefficient...

Matt





-- 
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com



Re: [R] efficiently replacing values in a matrix

2008-04-17 Thread Jim Lemon
Rolf Turner wrote:
> On 17/04/2008, at 9:33 AM, Charles C. Berry wrote:
> 
>   
> 
>>I'll lay odds that Matthew's 'matrix' is actually a data.frame, and  
>>I'll not be surprised if the columns are factors.
> 
> 
>   
> 
> I suspect that you're right.
> 
> ***Why*** can't people distinguish between data frames and matrices?
> If they were the same  thing, there wouldn't be two
> different terms for them, would there?
> 
Why Rolf, haven't you ever heard of Whorf's hypothesis? Because R users 
are surrounded by rectangular wads of data, it's surprising that we 
don't have hundreds of words for the same thing.

Jim



Re: [R] efficiently replacing values in a matrix

2008-04-17 Thread Joerg van den Hoff
On Wed, Apr 16, 2008 at 03:56:26PM -0600, Matthew Keller wrote:
> Yes Chuck, you're right.
> 

just a comment:

> Thanks for the help. It was a data.frame not a matrix (I had called
> as.matrix() in my script much earlier but that line of code didn't run
> because I misnamed the object!). My bad. Thanks for the help. And I'm
> VERY relieved R isn't that inefficient...

well, it _is_, at least when using data frames. and while it
is obvious that operations on lists (data frames are lists
in disguise, actually, right?) are slower than on
arrays/matrices, I'm not happy with a performance drop by a
factor of seemingly > 1500 (30 sec vs. > 13 h) -- and
I have seen similar things even with rather small data sets,
where the difference between using a data frame and a matrix
might mean, e.g., overall run times of 10 sec. vs. 0.1 sec.

where is all this time burned? there _are_ functional
languages which operate efficiently on lists.
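
One way to see the gap for oneself is to time the same replacement on a matrix and on a data frame holding the same values. This is only a sketch at a deliberately small, made-up size; the timings depend on the R version and machine, so no expected numbers are given.

```r
# Same values stored two ways: character matrix and data frame of factors.
set.seed(42)
gorp <- c("0/0", "1/1", "1/2", "2/2")
m  <- matrix(sample(gorp, 200 * 300, TRUE), 200, 300)
df <- as.data.frame(m, stringsAsFactors = TRUE)

# Time the identical replacement idiom on each structure.
t_matrix <- system.time({ m2 <- m;   m2[m2 == "0/0"] <- NA })
t_frame  <- system.time({ df2 <- df; df2[df2 == "0/0"] <- NA })

# Elapsed seconds for each; results vary by R version and machine.
print(c(matrix = t_matrix[["elapsed"]], data.frame = t_frame[["elapsed"]]))
```

Scaling the dimensions up gradually (rather than jumping straight to 3000 by 18000) shows how each version grows.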

I think this extreme performance drop when using an
apparently innocent data structure is really bad. and it's
bad that it's not repeatedly stated in BIG LETTERS in the
manuals: use matrices, at least for big arrays, wherever
possible. this message is not at all conveyed by the
"description" in the data.frame manpage, e.g.:

"This function creates data frames, tightly coupled
collections of variables which share many of the properties
of matrices and of lists, used as the fundamental data
structure by most of R's modeling software."...

probably 90% (+ x) of all R users are simply that: users and
not experts. when I started using R I exclusively used data
frames for purely numerical data instead of matrices, simply
because I could get column n with x[n] instead of x[,n] and
mean(x) worked columnwise (whereas apply(x, 2, 'mean') is
tiresome), thus saving some typing. this is no strong reason
in retrospect, but probably quite common. and many will then
stick with data.frames and endure long runtimes for no good
reason at all.
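
For what it is worth, the typing saved by data frames is small. A brief sketch, with made-up objects `x_df` and `x_m`, of column access and column means on both structures; as far as I know, mean() on a whole data frame no longer works columnwise in current R, so colMeans() is used for both.

```r
# The same purely numeric data as a data frame and as a matrix.
x_df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
x_m  <- as.matrix(x_df)

# Column access: x_df[2] is a one-column data frame,
# while x_df[[2]] and x_m[, 2] are plain numeric vectors.
col_df  <- x_df[2]
col_vec <- x_df[[2]]
col_m   <- x_m[, 2]

# Column means: colMeans() accepts both and avoids apply(x, 2, 'mean').
means_df <- colMeans(x_df)
means_m  <- colMeans(x_m)
print(means_m)
```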

another question would be whether homogeneous data frames
could not internally be handled as matrices...

joerg



Re: [R] efficiently replacing values in a matrix

2008-04-18 Thread Nnamdi

Still it is pretty slow when entering values into a large matrix. Case in
point:

> a <- matrix(nrow=1,ncol=1)
> system.time(a[1,1] <- 1903908.80385)
   user  system elapsed
 30.840   6.226  41.416
> is.matrix(a)
[1] TRUE

Is there a better way to enter values into large matrices? If I have to
spend 41 secs each time I enter into a cell and I have 1x1 cells to
enter that is impractical! 

--Nnamdi



Re: [R] efficiently replacing values in a matrix

2008-04-18 Thread Matthew Keller
Nnamdi,

I think this is simply because a lot of time is taken transforming the
matrix from logical (the default when you create it) to numeric (when you
assign the number to [1,1]). If you do the same thing again to [1,2], it
is done instantaneously:
> a <- matrix(nrow=1,ncol=1)
> system.time(a[1,1] <- 1903908.80385)
   user  system elapsed
 10.000   0.781  10.755
>
> system.time(a[1,2] <- 1903908.80385)
   user  system elapsed
  0   0   0
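
A small sketch of the type change described above. The size `n` is a made-up value (the dimensions in the posted code did not survive archiving), and creating the matrix as numeric from the start is a suggestion of mine rather than something proposed in the thread.

```r
n <- 1000  # hypothetical size, chosen only for illustration

# A matrix created without data is filled with logical NA.
a <- matrix(nrow = n, ncol = n)
type_before <- typeof(a)

# Assigning a double forces the whole matrix to be converted to double.
a[1, 1] <- 1903908.80385
type_after <- typeof(a)

# Creating the matrix as double up front avoids that conversion.
b <- matrix(NA_real_, nrow = n, ncol = n)
type_b <- typeof(b)
b[1, 1] <- 1903908.80385

print(c(before = type_before, after = type_after, preallocated = type_b))
```

With the numeric matrix created up front, the first assignment should cost about the same as later ones.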

By the way Rolf - I didn't see your full response last time through. I
do know the difference between a matrix and a data.frame, thank you
very much.





-- 
Matthew C Keller
Asst. Professor of Psychology
University of Colorado at Boulder
www.matthewckeller.com
