Re: [R] [FORGED] Re: identical() versus sapply()

2016-04-11 Thread Paulson, Ariel
Hi Duncan,

That explains it, thanks!

I rarely use as(), but had thought in this case, replacing identical(x, y) with 
identical(x, as(y,class(x))) could be an sapply-friendly way to iron out class 
differences -- then noticed the inexplicable result.  But now I know about 
all.equal().

Thanks,
Ariel 


From: Duncan Murdoch 
Sent: Monday, April 11, 2016 8:09 PM
To: Paulson, Ariel; Jeff Newmiller; Bert Gunter
Cc: r-help@r-project.org
Subject: Re: [R] [FORGED] Re: identical() versus sapply()

On 11/04/2016 8:25 PM, Paulson, Ariel wrote:
> Hi Jeff,
>
>
> We are splitting hairs because R is splitting hairs, and causing us problems. 
>  Integer and numeric are different R classes with different properties, 
> mathematical relationships notwithstanding.  For instance, the 
> counterintuitive result:

The issue here is that R has grown.  The as() function is newer than the
as.numeric() function, it's part of the methods package.  It is a much
more complicated thing, and there are cases where they differ.

In this case, the problem is that is(1L, "numeric") evaluates to TRUE,
and nobody has written a coerce method that specifically converts
"integer" to "numeric".  So the as() function defaults to doing nothing.
It takes a while to do nothing, approximately 360 times longer than
as.numeric() takes to actually do the conversion:

 > microbenchmark(as.numeric(1L), as(1L, "numeric"))
Unit: nanoseconds
   expr   minlq  mean  median   uq max neval
 as.numeric(1L)   133   210516.92   273.5409.59444   100
  as(1L, "numeric") 51464 64501 119294.31 99768.5 138321.0 1313669   100

R performance is not always simple and easy to predict, but I think
anyone who had experience with R would never use as(x, "numeric").  So
this just isn't a problem worth fixing.

Now, you might object that the documentation claims they are equivalent,
but it certainly doesn't.  The documentation aims to be accurate, not
necessarily clear.

Duncan Murdoch

>
>> identical(as.integer(1), as.numeric(1))
> [1] FALSE
>
>
> Unfortunately the reply-to chain doesn't extend far enough -- here is the 
> original problem:
>
>
>> sapply(1, identical, 1)
> [1] TRUE
>
>> sapply(1:2, identical, 1)
> [1] FALSE FALSE
>
>> sapply(1:2, function(i) identical(as.numeric(i),1) )
> [1]  TRUE FALSE
>
>> sapply(1:2, function(i) identical(as(i,"numeric"),1) )
> [1] FALSE FALSE
>
> These are the results of R's hair-splitting!


>
> Ariel
>
> 
> From: Jeff Newmiller 
> Sent: Monday, April 11, 2016 6:49 PM
> To: Bert Gunter; Paulson, Ariel
> Cc: Rolf Turner; r-help@r-project.org
> Subject: Re: [R] [FORGED] Re: identical() versus sapply()
>
> Hypothesis regarding the thought process: integer is a perfect subset of 
> numeric, so why split hairs?
> --
> Sent from my phone. Please excuse my brevity.
>
> On April 11, 2016 12:36:56 PM PDT, Bert Gunter  wrote:
>
> Indeed!
>
> Slightly simplified to emphasize your point:
>
>   class(as(1:2,"numeric"))
> [1] "integer"
>
>   class(as.numeric(1:2))
> [1] "numeric"
>
> whereas in ?as it says:
>
> "Methods are pre-defined for coercing any object to one of the basic
> datatypes. For example, as(x, "numeric") uses the existing as.numeric
> function. "
>
> I suspect this is related to my ignorance of S4 classes (i.e. as() )
> and how they relate to S3 classes, but I certainly don't get it
> either.
>
> Cheers,
> Bert
>
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things
> into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Mon, Apr 11, 2016 at 9:30 AM, Paulson, Ariel  wrote:
>   Ok, I see the difference between 1 and 1:2, I'll just leave it as one of 
> those "only in R" things.
>
>   But it seems then, that as.numeric() should guarantee a FALSE outcome, yet 
> it does not.
>
>   To build on what Rolf pointed out, I would really love for someone to 
> explain this one:
>
>   str(1)
>num 1
>
>   str(1:2)
>    int [1:2] 1 2
>
>   str(as.numeric(1:2))
>num [1:2] 1 2
>
>   str(as(1:2,"numeric"))
>int [1:2] 1 2
>
>   Which doubly makes no sense.  1) Either the class is "numeric" or it isn't; 
> I did not call as.integer() here.  2) method of recasting should not affect 
> final class.
>
>   Thanks,
>   Ariel
>
>
>   -Original Message-
>   From: Rolf Turner 

Re: [R] [FORGED] Re: identical() versus sapply()

2016-04-11 Thread Paulson, Ariel
​Perfect!


Thanks,

Ariel


From: William Dunlap 
Sent: Monday, April 11, 2016 7:37 PM
To: Paulson, Ariel
Cc: Jeff Newmiller; Bert Gunter; r-help@r-project.org
Subject: Re: [R] [FORGED] Re: identical() versus sapply()

Use all.equal instead of identical if you want to gloss over integer/numeric 
class differences and minor floating point differences (and a host of others).

Bill Dunlap
TIBCO Software
wdunlap tibco.com<http://tibco.com>

On Mon, Apr 11, 2016 at 5:25 PM, Paulson, Ariel 
mailto:a...@stowers.org>> wrote:
Hi Jeff,


We are splitting hairs because R is splitting hairs, and causing us problems.  
Integer and numeric are different R classes with different properties, 
mathematical relationships notwithstanding.  For instance, the counterintuitive 
result:


> identical(as.integer(1), as.numeric(1))
[1] FALSE


Unfortunately the reply-to chain doesn't extend far enough -- here is the 
original problem:


> sapply(1, identical, 1)
[1] TRUE

> sapply(1:2, identical, 1)
[1] FALSE FALSE

> sapply(1:2, function(i) identical(as.numeric(i),1) )
[1]  TRUE FALSE

> sapply(1:2, function(i) identical(as(i,"numeric"),1) )
[1] FALSE FALSE

These are the results of R's hair-splitting!

Ariel


From: Jeff Newmiller mailto:jdnew...@dcn.davis.ca.us>>
Sent: Monday, April 11, 2016 6:49 PM
To: Bert Gunter; Paulson, Ariel
Cc: Rolf Turner; r-help@r-project.org<mailto:r-help@r-project.org>
Subject: Re: [R] [FORGED] Re: identical() versus sapply()

Hypothesis regarding the thought process: integer is a perfect subset of 
numeric, so why split hairs?
--
Sent from my phone. Please excuse my brevity.

On April 11, 2016 12:36:56 PM PDT, Bert Gunter 
mailto:bgunter.4...@gmail.com>> wrote:

Indeed!

Slightly simplified to emphasize your point:

 class(as(1:2,"numeric"))
[1] "integer"

 class(as.numeric(1:2))
[1] "numeric"

whereas in ?as it says:

"Methods are pre-defined for coercing any object to one of the basic
datatypes. For example, as(x, "numeric") uses the existing as.numeric
function. "

I suspect this is related to my ignorance of S4 classes (i.e. as() )
and how they relate to S3 classes, but I certainly don't get it
either.

Cheers,
Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things
into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Apr 11, 2016 at 9:30 AM, Paulson, Ariel 
mailto:a...@stowers.org>> wrote:
 Ok, I see the difference between 1 and 1:2, I'll just leave it as one of those 
"only in R" things.

 But it seems then, that as.numeric() should guarantee a FALSE outcome, yet it 
does not.

 To build on what Rolf pointed out, I would really love for someone to explain 
this one:

 str(1)
  num 1

 str(1:2)
  int [1:2] 1 2

 str(as.numeric(1:2))
  num [1:2] 1 2

 str(as(1:2,"numeric"))
  int [1:2] 1 2

 Which doubly makes no sense.  1) Either the class is "numeric" or it isn't; I 
did not call as.integer() here.  2) method of recasting should not affect final 
class.

 Thanks,
 Ariel


 -Original Message-
 From: Rolf Turner 
[mailto:r.tur...@auckland.ac.nz<mailto:r.tur...@auckland.ac.nz>]
 Sent: Saturday, April 09, 2016 5:27 AM
 To: Jeff Newmiller
 Cc: Paulson, Ariel; 'r-help@r-project.org<mailto:r-help@r-project.org>'
 Subject: Re: [FORGED] Re: [R] identical() versus sapply()

 On 09/04/16 16:24, Jeff Newmiller wrote:
 I highly
recommend making friends with the str function. Try

 str( 1 )
 str( 1:2 )

 Interesting.  But to me counter-intuitive.  Since R makes no distinction 
between scalars and vectors of length 1 (or more accurately I think, since in R 
there is *no such thing as a scalar*, only a vector of length
 1) I don't see why "1" should be treated in a manner that is categorically 
different from the way in which "1:2" is treated.

 Can you, or someone else with deep insight into R and its rationale, explain 
the basis for this difference in treatment?

 for the clue you need, and then

 sapply( 1:2, identical, 1L )

 cheers,

 Rolf

 --
 Technical Editor ANZJS
 Department of Statistics
 University of Auckland
 Phone: +64-9-373-7599 ext. 88276



 R-help@r-project.org<mailto:R-help@r-project.org> mailing list -- To 
UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org<mailto:R-help@r-project.org> mailing list -- To 
UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PL

Re: [R] [FORGED] Re: identical() versus sapply()

2016-04-11 Thread Paulson, Ariel
Hi Jeff,


We are splitting hairs because R is splitting hairs, and causing us problems.  
Integer and numeric are different R classes with different properties, 
mathematical relationships notwithstanding.  For instance, the counterintuitive 
result:


> identical(as.integer(1), as.numeric(1))
[1] FALSE


Unfortunately the reply-to chain doesn't extend far enough -- here is the 
original problem:


> sapply(1, identical, 1)
[1] TRUE

> sapply(1:2, identical, 1)
[1] FALSE FALSE

> sapply(1:2, function(i) identical(as.numeric(i),1) )
[1]  TRUE FALSE

> sapply(1:2, function(i) identical(as(i,"numeric"),1) )
[1] FALSE FALSE

These are the results of R's hair-splitting!

Ariel


From: Jeff Newmiller 
Sent: Monday, April 11, 2016 6:49 PM
To: Bert Gunter; Paulson, Ariel
Cc: Rolf Turner; r-help@r-project.org
Subject: Re: [R] [FORGED] Re: identical() versus sapply()

Hypothesis regarding the thought process: integer is a perfect subset of 
numeric, so why split hairs?
--
Sent from my phone. Please excuse my brevity.

On April 11, 2016 12:36:56 PM PDT, Bert Gunter  wrote:

Indeed!

Slightly simplified to emphasize your point:

 class(as(1:2,"numeric"))
[1] "integer"

 class(as.numeric(1:2))
[1] "numeric"

whereas in ?as it says:

"Methods are pre-defined for coercing any object to one of the basic
datatypes. For example, as(x, "numeric") uses the existing as.numeric
function. "

I suspect this is related to my ignorance of S4 classes (i.e. as() )
and how they relate to S3 classes, but I certainly don't get it
either.

Cheers,
Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things
into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Apr 11, 2016 at 9:30 AM, Paulson, Ariel  wrote:
 Ok, I see the difference between 1 and 1:2, I'll just leave it as one of those 
"only in R" things.

 But it seems then, that as.numeric() should guarantee a FALSE outcome, yet it 
does not.

 To build on what Rolf pointed out, I would really love for someone to explain 
this one:

 str(1)
  num 1

 str(1:2)
  int [1:2] 1 2

 str(as.numeric(1:2))
  num [1:2] 1 2

 str(as(1:2,"numeric"))
  int [1:2] 1 2

 Which doubly makes no sense.  1) Either the class is "numeric" or it isn't; I 
did not call as.integer() here.  2) method of recasting should not affect final 
class.

 Thanks,
 Ariel


 -----Original Message-
 From: Rolf Turner [mailto:r.tur...@auckland.ac.nz]
 Sent: Saturday, April 09, 2016 5:27 AM
 To: Jeff Newmiller
 Cc: Paulson, Ariel; 'r-help@r-project.org'
 Subject: Re: [FORGED] Re: [R] identical() versus sapply()

 On 09/04/16 16:24, Jeff Newmiller wrote:
 I highly
recommend making friends with the str function. Try

 str( 1 )
 str( 1:2 )

 Interesting.  But to me counter-intuitive.  Since R makes no distinction 
between scalars and vectors of length 1 (or more accurately I think, since in R 
there is *no such thing as a scalar*, only a vector of length
 1) I don't see why "1" should be treated in a manner that is categorically 
different from the way in which "1:2" is treated.

 Can you, or someone else with deep insight into R and its rationale, explain 
the basis for this difference in treatment?

 for the clue you need, and then

 sapply( 1:2, identical, 1L )

 cheers,

 Rolf

 --
 Technical Editor ANZJS
 Department of Statistics
 University of Auckland
 Phone: +64-9-373-7599 ext. 88276



 R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] [FORGED] Re: identical() versus sapply()

2016-04-11 Thread Paulson, Ariel
Ok, I see the difference between 1 and 1:2, I'll just leave it as one of those 
"only in R" things.

But it seems then, that as.numeric() should guarantee a FALSE outcome, yet it 
does not. 

To build on what Rolf pointed out, I would really love for someone to explain 
this one:

> str(1)
 num 1

> str(1:2)
 int [1:2] 1 2

> str(as.numeric(1:2))
 num [1:2] 1 2

> str(as(1:2,"numeric"))
 int [1:2] 1 2

Which doubly makes no sense.  1) Either the class is "numeric" or it isn't; I 
did not call as.integer() here.  2) method of recasting should not affect final 
class.

Thanks,
Ariel


-Original Message-
From: Rolf Turner [mailto:r.tur...@auckland.ac.nz] 
Sent: Saturday, April 09, 2016 5:27 AM
To: Jeff Newmiller
Cc: Paulson, Ariel; 'r-help@r-project.org'
Subject: Re: [FORGED] Re: [R] identical() versus sapply()

On 09/04/16 16:24, Jeff Newmiller wrote:
> I highly recommend making friends with the str function. Try
>
> str( 1 )
> str( 1:2 )

Interesting.  But to me counter-intuitive.  Since R makes no distinction 
between scalars and vectors of length 1 (or more accurately I think, since in R 
there is *no such thing as a scalar*, only a vector of length
1) I don't see why "1" should be treated in a manner that is categorically 
different from the way in which "1:2" is treated.

Can you, or someone else with deep insight into R and its rationale, explain 
the basis for this difference in treatment?

> for the clue you need, and then
>
> sapply( 1:2, identical, 1L )

cheers,

Rolf

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] identical() versus sapply()

2016-04-08 Thread Paulson, Ariel
Sorry if this has been answered elsewhere, but I can't find any discussion of 
it.

Wondering why the following situation occurs (duplicated on 3.2.2 CentOS6 and 
3.0.1 Win2k, so I don't think it is a bug):

> sapply(1, identical, 1)
[1] TRUE

> sapply(1:2, identical, 1)
[1] FALSE FALSE

> sapply(1:2, function(i) identical(as.numeric(i),1) )
[1]  TRUE FALSE

> sapply(1:2, function(i) identical(as(i,"numeric"),1) )
[1] FALSE FALSE

I have been unable to find anything different about the versions of "1" that 
identical() is not finding identical.

Thanks,
Ariel



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] efficient sine interpolation

2014-05-12 Thread Ortiz-Bobea, Ariel
Hello,

I'm trying to fit a sine curve over successive temperature readings (i.e. 
minimum and maximum temperature) over several days and for many locations. The 
code below shows a hypothetical example of 5000 locations with  7 days of 
temperature data. Not very efficient when you have many more locations and days.

The linear interpolation takes 0.7 seconds, and the sine interpolations take 2 
to 4 seconds depending on the approach.

Any ideas on how to speed this up? Thanks in advance.

Ariel

### R Code ##

# 1- Prepare data fake data
  days<- 7
  n <- 5000*days
  tmin <- matrix(rnorm(n, mean=0) , ncol=days, nrow=5000)
  tmax <- matrix(rnorm(n, mean=10), ncol=days, nrow=5000)
  m <- matrix(NA, ncol=days*2, nrow=5000)
  m[,seq(1,ncol(m),2)]  <- tmin
  m[,seq(2,ncol(m)+1,2)]<- tmax
  # check first row
  plot(1:ncol(m), m[1,], type="l")

# 2 -linear interpolation: 0.66 seconds
  xout <- seq(0,ncol(m),0.25/24*2)[-1] # time step = 0.25 hours or 15 minutes
  system.time( m1 <- t(apply(m,1, function(y) approx(x=1:ncol(m), y=y, 
xout=xout, method="linear")$y)) )
  # Check first row
  plot(1:ncol(m), m[1,], type="l")
  points(xout, m1[1,], col="red", cex=1)


# 3- sine interpolation
  sine.approx1 <- function(index, tmin, tmax) {
b <- (2*pi)/24  # period = 24 hours
c <- pi/2 # horizontal shift
xout <- seq(0,24,0.25)[-1]
yhat <- apply(cbind(tmin[index,],tmax[index,]), 1, function(z) diff(z)/2 * 
sin(b*xout-c) + mean(z))
#yhat <- yhat[-nrow(yhat),]
yhat <- c(yhat)
#plot(yhat, type="l")
  }
  sine.approx2 <- function(index, tmin, tmax) {
b <- (2*pi)/24  # period = 24 hours
c <- pi/2 # horizontal shift
xout1 <- seq(0 ,12,0.25)
xout2 <- seq(12,24,0.25)[-1]
xout2 <- xout2[-length(xout2)]
yhat1 <- apply(cbind(tmin[index,]   ,tmax[index,]), 
1, function(z) diff(z)/2 * sin(b*xout1-c) + mean(z))
yhat2 <- apply(cbind(tmax[index,][-length(tmax[index,])],tmin[index,][-1]), 
1, function(z) diff(z)/2 * sin(b*xout2+c) + mean(z))
yhat2 <- cbind(yhat2,NA)
yhat3 <- rbind(yhat1,yhat2)
#yhat3 <- yhat3[-nrow(yhat3),]
yhat3 <- c(yhat3)
yhat <- yhat3
#plot(c(yhat1))
#plot(c(yhat2))
#plot(yhat, type="l")
  }

  # Single sine: 2.23 seconds
  system.time( m2 <- t(sapply(1:nrow(m), function(i) sine.approx1(i, tmin=tmin, 
tmax=tmax))) )

  # Double sine: 4.03 seconds
  system.time( m3 <- t(sapply(1:nrow(m), function(i) sine.approx2(i, tmin=tmin, 
tmax=tmax))) )

  # take a look at approach 1
  plot(seq(-1,ncol(m)-1,1)[-1], m[1,], type="l")
  points(xout, m2[1,], col="red", cex=1)

  # take a look at approach 2
  plot(seq(-1,ncol(m)-1,1)[-1], m[1,], type="l")
  points(xout, m3[1,], col="blue", cex=1)


---
Ariel Ortiz-Bobea
Fellow
Resources for the Future
1616 P Street, N.W.
Washington, DC 20036
202-328-5173


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] speeding up applying hist() over rows of a matrix

2014-05-02 Thread Ortiz-Bobea, Ariel
This works great, thanks a lot!
-AOB

-Original Message-
From: William Dunlap [mailto:wdun...@tibco.com] 
Sent: Friday, May 02, 2014 12:31 PM
To: Ortiz-Bobea, Ariel
Cc: r-help@r-project.org
Subject: Re: [R] speeding up applying hist() over rows of a matrix

And since as.integer(cut(x,bins)) is essentially findInterval(x,bins) (since we 
throw away the labels made by cut()), I tried using findInterval instead of 
cut() and it cut the time by more than half, so your 5.0 s. is now c. 0.1 s.
f3 <- function (m, bins)
{
nbins <- length(bins) - 1L
m <- array(findInterval(m, bins), dim = dim(m))
t(apply(m, 1, tabulate, nbins = nbins)) }
> system.time(r3 <- f3(m1,bins))
   user  system elapsed
   0.090.000.09
> identical(r0,r3)
[1] TRUE

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Fri, May 2, 2014 at 9:23 AM, William Dunlap  wrote:
> Your original code, as a function of 'm' and 'bins' is
> f0 <- function (m, bins) {
> t(apply(m, 1, function(x) hist(x, breaks = bins, plot = 
> FALSE)$counts)) } and the time it takes to run on your m1 is about 5 
> s. on my machine
>> system.time(r0 <- f0(m1,bins))
>user  system elapsed
>4.950.005.02
>
>
> hist(x,breaks=bins) is essentially tabulate(cut(x,bins),nbins=length(bins)-1).
> See how much it speeds things up by replacing hist() with tabulate(cut()):
> f1 <- function (m, bins)
> {
> nbins <- length(bins) - 1L
> t(apply(m, 1, function(x) tabulate(cut(x, bins), nbins = nbins))) 
> } That doesn't help with the time, but it does give the same output
>> system.time(r1 <- f1(m1,bins))
>user  system elapsed
>4.850.105.35
>> identical(r0, r1)
> [1] TRUE
>
> Now try speeding it up by calling cut() on the whole matrix first and 
> then applying tabulate to each row, as in
> f2 <- function (m, bins)  {
> nbins <- length(bins) - 1L
> m <- array(as.integer(cut(m, bins)), dim = dim(m))
> t(apply(m, 1, tabulate, nbins = nbins)) } That saves quite a bit 
> of time and gives the same output
>> system.time(r2 <- f2(m1,bins))
>user  system elapsed
>0.250.000.25
>> identical(r0, r2)
> [1] TRUE
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
>
> On Thu, May 1, 2014 at 12:48 PM, Ortiz-Bobea, Ariel  
> wrote:
>> Hello everyone,
>>
>>
>>
>> I'm trying to construct bins for each row in a matrix. I'm using apply() in 
>> combination with hist() to do this. Performing this binning for a 10K-by-50 
>> matrix takes about 5 seconds, but only 0.5 seconds for a 1K-by-500 matrix. 
>> This suggests the bottleneck is accessing rows in apply() rather than the 
>> calculations going on inside hist().
>>
>>
>>
>> My initial idea is to process as many columns (as make sense for the 
>> intended use) at once. However, I still have many many rows to process and I 
>> would appreciate any feedback on how to speed this up.
>>
>>
>>
>> Any thoughts?
>>
>>
>>
>> Thanks,
>>
>>
>>
>> Ariel
>>
>>
>>
>> Here is the illustration:
>>
>>
>>
>> # create data
>>
>> m1 <- matrix(10*rnorm(50*10^4), ncol=50)
>>
>> m2 <- matrix(10*rnorm(50*10^4), ncol=500)
>>
>>
>>
>> # compute bins
>>
>> bins <- seq(-100,100,1)
>>
>> system.time({ out1 <- t(apply(m1,1, function(x) hist(x,breaks=bins, 
>> plot=FALSE)$counts)) })
>>
>> system.time({ out2 <- t(apply(m2,1, function(x) hist(x,breaks=bins, 
>> plot=FALSE)$counts)) })
>>
>>
>> ---
>> Ariel Ortiz-Bobea
>> Fellow
>> Resources for the Future
>> 1616 P Street, N.W.
>> Washington, DC 20036
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] speeding up applying hist() over rows of a matrix

2014-05-01 Thread Ortiz-Bobea, Ariel
Hello everyone,



I'm trying to construct bins for each row in a matrix. I'm using apply() in 
combination with hist() to do this. Performing this binning for a 10K-by-50 
matrix takes about 5 seconds, but only 0.5 seconds for a 1K-by-500 matrix. This 
suggests the bottleneck is accessing rows in apply() rather than the 
calculations going on inside hist().



My initial idea is to process as many columns (as make sense for the intended 
use) at once. However, I still have many many rows to process and I would 
appreciate any feedback on how to speed this up.



Any thoughts?



Thanks,



Ariel



Here is the illustration:



# create data

m1 <- matrix(10*rnorm(50*10^4), ncol=50)

m2 <- matrix(10*rnorm(50*10^4), ncol=500)



# compute bins

bins <- seq(-100,100,1)

system.time({ out1 <- t(apply(m1,1, function(x) hist(x,breaks=bins, 
plot=FALSE)$counts)) })

system.time({ out2 <- t(apply(m2,1, function(x) hist(x,breaks=bins, 
plot=FALSE)$counts)) })


---
Ariel Ortiz-Bobea
Fellow
Resources for the Future
1616 P Street, N.W.
Washington, DC 20036

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Weighted SUR/NSUR

2013-08-16 Thread Ariel
Arne Henningsen-3 wrote
>> Is it possible
>> to run SUR with weights using systemfit? I mean weighted seemingly
> unrelated
>> regression (weighted SUR)
> 
> Currently, systemfit cannot estimate (SUR) models with
> observation-specific
> weights :-(
> 
>> or weighted nonlinear unrelated regression (weighted NSUR).
> 
> We are still not yet finished with implementing nonlinear models in
> systemfit (see http://www.systemfit.org/) :-(

I recently had a student come to me with a very similar (okay, identical)
problem as the OP.  I had to learn PROC MODEL, anyway, so I thought I’d poke
around in R while I was at it.  I have nothing to add about any problems
with or the lack of maturity of the estimation procedure for nlsystemfit(),
but I do have some ideas about observation-level weights.

It took me awhile to make the leap from the fairly straightforward linear
weighted least squares (for example, see  Weisberg's Applied Linear
Regression textbook equation 5.8) to understanding how weighting worked in
nonlinear least squares.  The R help forum certainly came in handy:
https://stat.ethz.ch/pipermail/r-help/2004-November/060424.html.  I can add
weights into a nonlinear regression by simply multiplying both the response
and the nonlinear function by the square root of the desired weights. 
Here’s a toy example, where I compare a model fit using the “weights”
argument in nls() with a model where I put the weights in “by hand” :

DNase1 = subset(DNase, Run == 1)
fit2 = nls(density ~ Asym/(1 + exp((xmid - log(conc))/scal)),
 data = DNase1,
 start = list(Asym = 3, xmid = 0, scal = 1), weights =
rep(1:8, each = 2))
summary(fit2)

# Take the square root of the weights for fitting “by hand”
sw = sqrt(rep(1:8, each = 2) )
fit3 = nls(sw*density ~ sw*(Asym/(1 + exp((xmid - log(conc))/scal))),
DNase1,
 start = list(Asym = 3, xmid = 0, scal = 1) )
summary(fit3)

# The predicted values for fit3 need to be divided by the weights 
# but the residuals are weighted residuals
predict(fit2)
predict(fit3)/sw

It seems like this weighted approach could be easily extended to the model
formulas for a system of nonlinear equations (it would be similar for linear
equations) to be fit with systemfit.  

>  Parresol (2001) in his paper
> titled Additivity of nonlinear biomass
> equations has run weighted NSUR using PROC MODEL (SAS institute Inc.1993).
> I was wondering if r can do that.

It turned out I had to use this weighting approach in PROC MODEL, as well,
when each equation in the system had a different set of weights.  The
estimates I get when fitting the Parresol example mentioned by the OP using
nlsystemfit and PROC MODEL are within spitting distance of each other, so I
feel like I am at least making the same mistakes in both software packages.  

I'm wondering if my logic is sound or if I'm missing some complication that
occurs when working with systems of equations.  I’ve seen several folks
looking to fit weighted systems of equations in R with systemfit, and this
approach might get them what they need.

Ariel



--
View this message in context: 
http://r.789695.n4.nabble.com/Weighted-SUR-NSUR-tp4670602p4673973.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] multiple comparisons with generalised least squares

2012-07-10 Thread Ariel

racmar wrote
> 
> I have also been searching various forums and books to see if there are
> any methods I could use and have only found people, such as yourself,
> asking the same question. 
> 

I was looking into this recently, as well, and found that the problem has to
do with building the model.matrix/terms/model.frame for gls objects when
using the glht function.  I ended up creating three gls-specific functions
and was able to get estimates/confidence intervals for the toy example
below.   You may find these functions useful (under the caveat that I only
checked that I got estimates and not that the estimates were correct ;) ).

Ariel

Example:
library(nlme)

Orthodont$fage <- factor(Orthodont$age)

#toy example with Orthodont data using age as a factor
fitgls <- gls( distance ~ fage, data=Orthodont, 
weights = varIdent(form =~1|fage))



library(multcomp)

#notice the error about the model.matrix for gls objects when using glht
confint( glht (fitgls, mcp(fage="Tukey") ))

#create model.frame, terms, and model.matrix functions specifically for gls
objects
model.matrix.gls <- function(object, ...) 
model.matrix(terms(object), data = getData(object), ...)


model.frame.gls <- function(object, ...) 
model.frame(formula(object), data = getData(object), ...)


terms.gls <- function(object, ...) 
terms(model.frame(object),...)


#now run glht again
confint( glht(  fitgls, mcp(fage="Tukey") ))




--
View this message in context: 
http://r.789695.n4.nabble.com/multiple-comparisons-with-generalised-least-squares-tp3441513p4636009.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] defining a function using strings

2008-10-16 Thread Chernomoretz Ariel
Hi All,
I need to evaluate a series expansion using Legendre polynomials.
Using the 'orthopolinom' package I can get a list of the first n
Legendre polynomials as character strings.

> library(orthopolynom)
> l<-legendre.polynomials(4)
> l
[[1]]
1

[[2]]
x

[[3]]
-0.5 + 1.5*x^2

[[4]]
-1.5*x + 2.5*x^3

[[5]]
0.375 - 3.75*x^2 + 4.375*x^4

But I can't figure out how to implement functions that could be
evaluated for arbitrary 'x', from this list,
Thanks for your help.

Ariel./


-- 
 Dr. Ariel Chernomoretz
   Departamento de Fisica, FCEyN,Universidad de Buenos Aires,
   (1428) Ciudad Universitaria,Ciudad de Buenos Aires, Argentina.
   TE +54 11 4576 3390 ext 817
   Fax +54 11 4576 3357
   email: [EMAIL PROTECTED]    Webpage: http://www.df.uba.ar/users/ariel

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.