Re: [R] Sampling

2015-03-29 Thread Daniel Nordlund

On 3/29/2015 11:10 PM, Partha Sinha wrote:

I have 1000 data points.  i want to take 30 samples and find mean. I
also want to repeat this process 100 times. How to go about it?
Regards
Parth

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



see ?replicate and ?sample.  Simple example where yourdata is a simple 
vector of values, and assuming you want to sample without replacement. 
Generalizing it to other data structures is left as an exercise for the 
reader.


replicate(100,mean(sample(yourdata,30, replace=FALSE)))

hope this is helpful,

Dan

--
Daniel Nordlund
Bothell, WA USA

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] generating phi using function()

2015-03-29 Thread Daniel Nordlund


The argument 'K' is missing since you are only passing four arguments to 
the phi() function, but you defined it with five formal parameters. It 
looks like the argument 'j' is not necessary in the function.  It is an 
unnecessary carry-over from the summation notation and it is never used 
in the function.


Dan


On 3/29/2015 4:08 PM, Jim Lemon wrote:

Hi T.,
Your translation of the formula looks okay, and the error message is about
a missing argument. Perhaps you have not included the necessary arguments
to "phi" in the call to "mls".

Jim


On Sun, Mar 29, 2015 at 11:59 PM, T.Riedle  wrote:


Hi everybody,
I am trying to generate the formula shown in the attachment. My formula so
far looks as follows:

phi <- function(w1, w2, j, k, K){
zaehler <- (k/K)^(w1-1)*(1-k/K)^(w2-1)
nenner <- sum( ((1:K)/K)^(w1-1)*(1-(1:K)/K)^(w2-1))
return( zaehler/nenner )
}

Unfortunately something must be wrong here as I get the following message
when running a midas regression

m22.phi<- midas_r(rv~mls(rvh,1:max.lag+h1,1,phi), start = list(rvh=c(1,1)))
Error in phi(c(1, 1), 44L, 1) : argument "K" is missing, with no default
Called from: .rs.breakOnError(TRUE)
Browse[1]> K<-125
Browse[1]> 125

Could anybody look into my phi formula and tell me what is wrong with it?

Thanks in advance.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Daniel Nordlund
Bothell, WA USA

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Sampling

2015-03-29 Thread Partha Sinha
I have 1000 data points.  i want to take 30 samples and find mean. I
also want to repeat this process 100 times. How to go about it?
Regards
Parth

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] A problem someone should know about

2015-03-29 Thread Richard M. Heiberger
This looks like a specific Macintosh error that appears at random intervals.
I get it at random, and unreproducible times.  I reported it (or
perhaps a close relative)
to the r-sig-mac list in September 2014.

Rich


On Sun, Mar 29, 2015 at 9:59 PM, Rolf Turner  wrote:
> On 30/03/15 11:52, Ian Lester wrote:
>>
>> I’m a novice and this message looks like it shouldn’t be ignored. Someone
>> who knows what they’re doing should probably take a look.
>> Thanks
>> Ian Lester
>>
>>> logfat.lm<-(lm(body.fat~log(BMI)))
>>> plot(logfat)
>>
>> Error in plot(logfat) : object 'logfat' not found
>>>
>>> plot(logfat.lm)
>>
>> Hit  to see next plot:
>> Hit  to see next plot:
>> Hit  to see next plot:
>> Mar 29 18:10:18 iansimac.gateway rsession[69550] : Error: this
>> application, or a library it uses, has passed an invalid numeric
>> value (NaN, or not-a-number) to CoreGraphics API. This is a serious
>> error and contributes to an overall degradation of system stability
>> and reliability. This notice is a courtesy: please fix this problem.
>> It will become a fatal error in an upcoming update.
>
>
> Please make your examples *reproducible* as the posting guide requests.
>
> I *presume* that your data are the "fat" data from the "UsingR" package,
> which you did not mention.
>
> After installing and loading "UsingR" I did
>
>> logfat.lm <- lm(body.fat~log(BMI),data=fat)
>> plot(logfat.lm)
>
> and got a sequence of plots, with no error thrown. It would appear that
> whatever is causing the error that you saw is peculiar to your system.
>
> cheers,
>
> Rolf Turner
>
> --
> Rolf Turner
> Technical Editor ANZJS
> Department of Statistics
> University of Auckland
> Phone: +64-9-373-7599 ext. 88276
> Home phone: +64-9-480-4619
>
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] A problem someone should know about

2015-03-29 Thread Rolf Turner

On 30/03/15 11:52, Ian Lester wrote:

I’m a novice and this message looks like it shouldn’t be ignored. Someone who 
knows what they’re doing should probably take a look.
Thanks
Ian Lester


logfat.lm<-(lm(body.fat~log(BMI)))
plot(logfat)

Error in plot(logfat) : object 'logfat' not found

plot(logfat.lm)

Hit  to see next plot:
Hit  to see next plot:
Hit  to see next plot:
Mar 29 18:10:18 iansimac.gateway rsession[69550] : Error: this
application, or a library it uses, has passed an invalid numeric
value (NaN, or not-a-number) to CoreGraphics API. This is a serious
error and contributes to an overall degradation of system stability
and reliability. This notice is a courtesy: please fix this problem.
It will become a fatal error in an upcoming update.


Please make your examples *reproducible* as the posting guide requests.

I *presume* that your data are the "fat" data from the "UsingR" package,
which you did not mention.

After installing and loading "UsingR" I did

> logfat.lm <- lm(body.fat~log(BMI),data=fat)
> plot(logfat.lm)

and got a sequence of plots, with no error thrown. It would appear that 
whatever is causing the error that you saw is peculiar to your system.


cheers,

Rolf Turner

--
Rolf Turner
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
Home phone: +64-9-480-4619

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Error in lm() with very small (close to zero) regressor

2015-03-29 Thread John Maindonald
There are two issues:
1) Damage was done to accuracy in the calculation of t(X) %*% X
Where the mean to SD ratio is large, maybe even of the order of
100 or so, centre that column.

2) pseudo inverse() goes awry with columns of X that are of the
order of large negative (or, I expect, +ve) powers of ten.
were supplied to it. I�d guess that this has to do with the way that
a check for singularity is implemented.

Ben�s example:

set.seed(101)
n_obs <- 1000
y  <- rnorm(n_obs, 10,2.89)
x1 <- rnorm(n_obs, mean=1.235657e-14,sd=4.5e-17)
x2 <- rnorm(n_obs, 10,3.21)
X  <- cbind(rep(1,n_obs),x1,x2)

coef(lm(y~x1+x2))
##   (Intercept)x1x2
##  1.155959e+01 -1.658420e+14  3.797342e-02

library(corpcor)
t(cbind(coef(lm(y~x1+x2)),
   as.vector(pseudoinverse(t(X) %*% X) %*% t(X) %*% y)))
##  (Intercept)x1 x2
## [1,]   11.559587 -1.658420e+14 0.03797342
## [2,] 9.511348  1.151908e-13  0.03788714

## Examine mean to SD ratios
round(c(mean(x1)/sd(x1), mean(x2)/sd(x2)), 2)
## [1] 263.65   3.28
## Notice that the coefficients for x2, where the mean/sd ratio
## is smaller, roughly agree.

## Damage was done to accuracy in the calculation of t(X) %*% X
## (but there is more to it than that, as we will see).

## Try
xc1 <- scale(x1,center=TRUE, scale=FALSE)
xc2 <- scale(x2,center=TRUE, scale=FALSE)
Xc <- cbind(rep(1,n_obs),x1,x2)
as.vector(pseudoinverse(t(Xc) %*% Xc) %*% t(Xc) %*% y)
## [1] 9.511348e+00 1.151908e-13 3.788714e-02

## Note now, however, that one should be able to dispense with
## the column of 1s, with no change to the coefficients of x1 & x2
Xc0 <- cbind(xc1,xc2)
as.vector(pseudoinverse(t(Xc0) %*% Xc0) %*% t(Xc0) %*% y)
## [1] 1.971167e-20 3.788714e-02

##
## Now try a more sensible scaling for xc1
Xcs <- cbind(rep(1,n_obs), xc1*1e14, xc2)
Xcs0 <- cbind(xc1*1e14, xc2)

t(cbind(coef(lm(y~I(1e14*xc1)+xc2)),
as.vector(pseudoinverse(t(Xcs) %*% Xcs) %*% t(Xcs) %*% y)))
##  (Intercept) I(1e+14 * xc1)xc2
## [1,]9.899249   -1.65842 0.03797342
## [2,]9.899249   -1.65842 0.03797342

## Eureka!

## cf also
as.vector(pseudoinverse(t(Xcs0) %*% Xcs0) %*% t(Xcs0) %*% y)
## [1] -1.65842037  0.03797342


John Maindonald email: 
john.maindon...@anu.edu.au

Centre for Mathematics & Its Applications,

Australian National University, Canberra ACT 0200.


On 29/03/2015, at 23:00, 
mailto:r-help-requ...@r-project.org>> 
mailto:r-help-requ...@r-project.org>> wrote:

From: Ben Bolker mailto:bbol...@gmail.com>>
Subject: Re: [R] Error in lm() with very small (close to zero) regressor
Date: 29 March 2015 6:28:22 NZDT
To: mailto:r-h...@stat.math.ethz.ch>>


peter dalgaard  gmail.com> writes:



On 28 Mar 2015, at 00:32 , RiGui  
business.uzh.ch> wrote:

Hello everybody,

I have encountered the following problem with lm():

When running lm() with a regressor close to zero -
of the order e-10, the
value of the estimate is of huge absolute value , of order millions.

However, if I write the formula of the OLS estimator,
in matrix notation:
pseudoinverse(t(X)*X) * t(X) * y , the results are correct, meaning the
estimate has value 0.

 How do you know this answer is "correct"?

here is the code:

y  <- rnorm(n_obs, 10,2.89)
x1 <- rnorm(n_obs, 0.01235657,0.45)
x2 <- rnorm(n_obs, 10,3.21)
X  <- cbind(x1,x2)

Eh?  You want
X  <- cbind(rep(1,n_obs),x1,x2)

bFE <- lm(y ~ x1 + x2)
bFE

bOLS <- pseudoinverse(t(X) %*% X) %*% t(X) %*% y
bOLS


Note: I am applying a deviation from the
mean projection to the data, that
is why I have some regressors with such small values.

Thank you for any help!

Raluca

 Is there a reason you can't scale your regressors?


Example not reproducible:


 I agree that the OP's question was not reproducible, but it's
not too hard to make it reproducible. I bothered to use
library("sos"); findFn("pseudoinverse") to find pseudoinverse()
in corpcor:

It is true that we get estimates with very large magnitudes,
but their

set.seed(101)
n_obs <- 1000
y  <- rnorm(n_obs, 10,2.89)
x1 <- rnorm(n_obs, mean=1.235657e-14,sd=4.5e-17)
x2 <- rnorm(n_obs, 10,3.21)
X  <- cbind(x1,x2)

bFE <- lm(y ~ x1 + x2)
bFE
coef(summary(bFE))

Estimate   Std. Error t value  Pr(>|t|)
(Intercept)  1.155959e+01 2.312956e+01  0.49977541 0.6173435
x1  -1.658420e+14 1.872598e+15 -0.08856254 0.9294474
x2   3.797342e-02 2.813000e-02  1.34992593 0.1773461

library("corpcor")
bOLS <- pseudoinverse(t(X) %*% X) %*% t(X) %*% y
bOLS

[,1]
[1,] 9.807664e-16
[2,] 8.880273e-01

And if we scale the predictor:

bFE2 <- lm(y ~ I(1e14*x1) + x2)
coef(summary(bFE2))

Estimate Std. Error t value  Pr(>|t|)
(Intercept)   11.55958731   23.12956  0.49977541 0.6173435
I(1e+14 * x1) -1.65842037   18.72598 -0.08856254 0.9294474
x2 0.037973420.02813  1.34992593 0.1773461

bOLS stays constant

Re: [R] Vennerable Plots for Publications

2015-03-29 Thread Dario Strbenac
That is an adequate solution. It's always better if R package authors don't 
hard-code graphics parameters, though.
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] A problem someone should know about

2015-03-29 Thread Ian Lester
I’m a novice and this message looks like it shouldn’t be ignored. Someone who 
knows what they’re doing should probably take a look.
Thanks
Ian Lester

>logfat.lm<-(lm(body.fat~log(BMI)))
> plot(logfat)
Error in plot(logfat) : object 'logfat' not found
> plot(logfat.lm)
Hit  to see next plot: 
Hit  to see next plot: 
Hit  to see next plot: 
Mar 29 18:10:18 iansimac.gateway rsession[69550] : Error: this 
application, or a library it uses, has passed an invalid numeric value (NaN, or 
not-a-number) to CoreGraphics API. This is a serious error and contributes to 
an overall degradation of system stability and reliability. This notice is a 
courtesy: please fix this problem. It will become a fatal error in an upcoming 
update.
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Use Rstudio and RAnalyticFlow together

2015-03-29 Thread Céline b
Hi,

I would like to know if someone has any idea to use RAnalyticFlow and Rstudio 
together, i.e., use the nice flowchart system of RAnalyticFlow within the great 
interface of Rstudio ? I searched on the internet but without success ...
This would simplify/clarify so much my codes ! :)
Thank for any help.
  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] generating phi using function()

2015-03-29 Thread Jim Lemon
Hi T.,
Your translation of the formula looks okay, and the error message is about
a missing argument. Perhaps you have not included the necessary arguments
to "phi" in the call to "mls".

Jim


On Sun, Mar 29, 2015 at 11:59 PM, T.Riedle  wrote:

> Hi everybody,
> I am trying to generate the formula shown in the attachment. My formula so
> far looks as follows:
>
> phi <- function(w1, w2, j, k, K){
> zaehler <- (k/K)^(w1-1)*(1-k/K)^(w2-1)
> nenner <- sum( ((1:K)/K)^(w1-1)*(1-(1:K)/K)^(w2-1))
> return( zaehler/nenner )
> }
>
> Unfortunately something must be wrong here as I get the following message
> when running a midas regression
>
> m22.phi<- midas_r(rv~mls(rvh,1:max.lag+h1,1,phi), start = list(rvh=c(1,1)))
> Error in phi(c(1, 1), 44L, 1) : argument "K" is missing, with no default
> Called from: .rs.breakOnError(TRUE)
> Browse[1]> K<-125
> Browse[1]> 125
>
> Could anybody look into my phi formula and tell me what is wrong with it?
>
> Thanks in advance.
>
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help w/ variable names in loop with lmer

2015-03-29 Thread Ben Bolker
David Crow  cide.edu> writes:

> 
> Hi, R users-
> 
> I'm estimating random effects models with cross-level interactions; I want
> to interact each of a vector of level-1 variables with each of a vector of
> level-2 variables.  Here's the code:
> 
> 
> #create data frame with level-1 variables
> k <- as.data.frame(cbind(robo, asalto, secuestro, asesinato))
> 
> #create data frame with level-2 variables
> l <- as.data.frame(cbind(IDH_IDH, IDH_ingpc, eco_pb, IM_indice, tasa_robo,
> hom_tasa, totdelitos1, totdelitos2, total, pri, pan, prd))
> 
> #get cross-level interactions
> 
> for (i in 1:length(k)) {
> for (j in 1:length(l)) {
> print(summary(lmer(hrprotcrim ~ k[,i]*l[,j] + (k[,i] | Municipio
> }
> }
> ==
> 
> The code works and produces 48 (4 level-1 x 12 level-2) sets of output.
> The problem is, the output is illegible because instead of the variable
> names, I get the indices:
> 
> [output]
> ==
> Linear mixed model fit by REML ['lmerMod']
> Formula: hrprotcrim ~ k[, i] * l[, j] + (k[, i] | Municipio)
> 

[snip]

> Two questions:
> 
> 1)  How can I get variable names instead of indices in the above output
> 2)  How can I estimate this with "mapply" instead of the double loop?
> 
> Here's the code for "mapply"
> 
> M4 <- mapply(function(k,l){summary(lmer(hrprotcrim ~ k*l + (k |
> Municipio)))})
> 
> And here's what I get:
> 
> list()
> 

I would suggest appropriate use of ?reformulate

Assume dd is a data frame containing all variables

lev1vars <- c("robo", "asalto", "secuestro", "asesinato")
lev2vars <- c("IDH_IDH", "IDH_ingpc", ...)
ffun <- function(L1var,L2var) {
ff <- reformulate(paste(L1var,L2var,sep="*"),
  paste(L1var,"Municipio",sep="|"),
  response="hrprotcrim")
environment(ff) <- parent.frame()  ## trickiness
return(summary(lmer(ff,data=dd)))
}
mapply(ffun,lev1vars,lev2vars)

?

If you had given a reproducible example I could have tested this ...

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Help w/ variable names in loop with lmer

2015-03-29 Thread David Crow
Hi, R users-

I'm estimating random effects models with cross-level interactions; I want
to interact each of a vector of level-1 variables with each of a vector of
level-2 variables.  Here's the code:


#create data frame with level-1 variables
k <- as.data.frame(cbind(robo, asalto, secuestro, asesinato))

#create data frame with level-2 variables
l <- as.data.frame(cbind(IDH_IDH, IDH_ingpc, eco_pb, IM_indice, tasa_robo,
hom_tasa, totdelitos1, totdelitos2, total, pri, pan, prd))

#get cross-level interactions

for (i in 1:length(k)) {
for (j in 1:length(l)) {
print(summary(lmer(hrprotcrim ~ k[,i]*l[,j] + (k[,i] | Municipio
}
}
==

The code works and produces 48 (4 level-1 x 12 level-2) sets of output.
The problem is, the output is illegible because instead of the variable
names, I get the indices:

[output]
==
Linear mixed model fit by REML ['lmerMod']
Formula: hrprotcrim ~ k[, i] * l[, j] + (k[, i] | Municipio)

REML criterion at convergence: 8801.4

Scaled residuals:
Min  1Q  Median  3Q Max
-2.4447 -0.7017 -0.2639  0.6766  3.0835

Random effects:
 GroupsNameVariance Std.Dev. Corr
 Municipio (Intercept) 1.067868 1.0334
   k[, i]  0.005387 0.0734   1.00
 Residual  2.976150 1.7252
Number of obs: 2163, groups:  Municipio, 180

Fixed effects:
   Estimate Std. Error t value
(Intercept)2.710847   0.101715  26.651
k[, i]-0.056720   0.355802  -0.159
l[, j] 0.002701   0.002289   1.180
k[, i]:l[, j]  0.006510   0.006340   1.027

Correlation of Fixed Effects:
(Intr) k[, i] l[, j]
k[, i]  -0.048
l[, j]  -0.514  0.028
k[,i]:l[,j]  0.034 -0.566 -0.072
==

Two questions:

1)  How can I get variable names instead of indices in the above output
2)  How can I estimate this with "mapply" instead of the double loop?

Here's the code for "mapply"

M4 <- mapply(function(k,l){summary(lmer(hrprotcrim ~ k*l + (k |
Municipio)))})

And here's what I get:

list()

I'd be grateful for any pointers.

Best,
David



-- 
Personal Web site:
http://investigadores.cide.edu/crow/

Web site for M�xico, las Am�ricas y el Mundo:
http://mexicoyelmundo.cide.edu/


David Crow, Ph.D.
Profesor-Investigador/Assistant Professor
Director General, *Las Am�ricas y el Mundo*
Divisi�n de Estudios Internacionales
Carretera M�xico-Toluca 3655
Col. Lomas de Santa Fe 01210  M�xico, D.F.
Tel.:  5727-9800, ext. 2152
Fax:  5727-9872


Conmutador: 5727-98-00 Lada sin costo: 01 800 021 2433 (CIDE) |�

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Error in lm() with very small (close to zero) regressor

2015-03-29 Thread RiGui

RiGui  business.uzh.ch> writes:

>

[snip]
 
> I am terribly sorry for the code not being reproducible, is the
> first time I am posting here, I run the code several times before I
> posted, but...I forgot about the library used.

  Thanks for updating.
 
> To answer to your questions:
> 
>> How do you know this answer is "correct"? 
 
> What I am doing is actually a "fixed effect" estimation. I apply a
> projection matrix to the data, both dependent and independent
> variables, projection which renders the regressors that do not vary,
> equal to basically zero - the x1 from the post.
 
> Once I apply the projection, I need to run OLS to get the estimates,
> so x1 should be zero.

  Yes, but not *exactly* zero.
 
> Therefore, the results with the scaled regressor is not correct. 
 
> Besides, I do not see why the bOLS is wrong, since is the formula of
> the OLS estimator from any Econometrics book.

" Because it's numerically unstable.
 Unfortunately, you can't always translate formulas directly from
books into code and expect them to be reliable.

  Based on Peter's comments, I believe that as expected lm()
is actually getting closer to the 'correct' answer.
"


Thank you to both of you for your comments! I will re-do the analysis
following your advice.

Best,

Raluca





--
View this message in context: 
http://r.789695.n4.nabble.com/Error-in-lm-with-very-small-close-to-zero-regressor-tp4705185p4705231.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error in lm() with very small (close to zero) regressor

2015-03-29 Thread Ben Bolker
RiGui  business.uzh.ch> writes:

>

[snip]
 
> I am terribly sorry for the code not being reproducible, is the
> first time I am posting here, I run the code several times before I
> posted, but...I forgot about the library used.

  Thanks for updating.
 
> To answer to your questions:
> 
>> How do you know this answer is "correct"? 
 
> What I am doing is actually a "fixed effect" estimation. I apply a
> projection matrix to the data, both dependent and independent
> variables, projection which renders the regressors that do not vary,
> equal to basically zero - the x1 from the post.
 
> Once I apply the projection, I need to run OLS to get the estimates,
> so x1 should be zero.

  Yes, but not *exactly* zero.
 
> Therefore, the results with the scaled regressor is not correct. 
 
> Besides, I do not see why the bOLS is wrong, since is the formula of
> the OLS estimator from any Econometrics book.

 Because it's numerically unstable.
 Unfortunately, you can't always translate formulas directly from
books into code and expect them to be reliable.

  Based on Peter's comments, I believe that as expected lm()
is actually getting closer to the 'correct' answer.

> Here again the corrected code: 

> library(corpcor)
> 
> n_obs <- 1000
> y  <- rnorm(n_obs, 10,2.89)
> x1 <- rnorm(n_obs, 0.01235657,0.45)
> x2 <- rnorm(n_obs, 10,3.21)
> X  <- cbind(x1,x2)

As Peter points out, one example uses an intercept and the
other doesn't: you should either use X <- cbind(1,x1,x2) or
lm(y~x1+x2-1) for compatibility.

>  bFE <- lm(y ~ x1 + x2)
>  bFE
> 
>  bOLS <- pseudoinverse(t(X) %*% X) %*% t(X) %*% y
>  bOLS

[snip: Gmane doesn't like it if I quote "too much"]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error in lm() with very small (close to zero) regressor

2015-03-29 Thread peter dalgaard

> On 28 Mar 2015, at 18:52 , RiGui  wrote:
> 
> Thank you for your replies! 
> 
> I am terribly sorry for the code not being reproducible, is the first time I
> am posting here, I run the code several times before I posted, but...I
> forgot about the library used.
> 
> To answer to your questions:
> 
> How do you know this answer is "correct"? 
> 
> What I am doing is actually a "fixed effect" estimation. I apply a
> projection matrix to the data, both dependent and independent variables,
> projection which renders the regressors that do not vary, equal to basically
> zero - the x1 from the post. 
> 
> Once I apply the projection, I need to run OLS to get the estimates, so x1
> should be zero. 

Please rethink: If a regressor is very small, the regression coefficient will 
be very large; if it is small and random, OLS estimators will be highly 
variable. 

R has no way of knowing that a regressor with small values isn't what the user 
intended (e.g. it could be picoMolar concentrations stated in Molar units); if 
you want a mechanism that eliminates near-zero regressors you need to do it 
explicitly. 

> Therefore, the results with the scaled regressor is not correct. 
> 
> Besides, I do not see why the bOLS is wrong, since is the formula of the OLS
> estimator from any Econometrics book.

Textbooks often gloss over details like numerical stability (and in general, 
textbooks often use slightly oversimplified methods in order not to confuse 
students unnecessarily). 
Better books will give the (X'X)^-1 X'Y formula with a warning not to use it as 
is, but e.g. use the X=QR decomposition [which gives (R'Q'QR)^-1 R'Q'Y = 
(R'R)^-1 R'Q'Y = R^-1 Q'Y].


> Here again the corrected code: 
> 
> install.packages("corpcor")
> library(corpcor)
> 
> n_obs <- 1000
> y  <- rnorm(n_obs, 10,2.89)
> x1 <- rnorm(n_obs, 0.01235657,0.45)
> x2 <- rnorm(n_obs, 10,3.21)
> X  <- cbind(x1,x2)
> 
> bFE <- lm(y ~ x1 + x2)
> bFE
> 
> bOLS <- pseudoinverse(t(X) %*% X) %*% t(X) %*% y
> bOLS
> 

Notice again, that these are not comparable in that bFE has an intercept term 
and bOLS hasn't. You need to compare with

y ~ x1 + x2 - 1

and 

y ~ x2 - 1


> 
> Best,
> 
> Raluca Gui 
> 
> 
> 
> 
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Error-in-lm-with-very-small-close-to-zero-regressor-tp4705185p4705212.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] generating phi using function()

2015-03-29 Thread T.Riedle
Hi everybody,
I am trying to generate the formula shown in the attachment. My formula so far 
looks as follows:

phi <- function(w1, w2, j, k, K){
zaehler <- (k/K)^(w1-1)*(1-k/K)^(w2-1)
nenner <- sum( ((1:K)/K)^(w1-1)*(1-(1:K)/K)^(w2-1))
return( zaehler/nenner )
}

Unfortunately something must be wrong here as I get the following message when 
running a midas regression

m22.phi<- midas_r(rv~mls(rvh,1:max.lag+h1,1,phi), start = list(rvh=c(1,1)))
Error in phi(c(1, 1), 44L, 1) : argument "K" is missing, with no default
Called from: .rs.breakOnError(TRUE)
Browse[1]> K<-125
Browse[1]> 125

Could anybody look into my phi formula and tell me what is wrong with it?

Thanks in advance.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Error in lm() with very small (close to zero) regressor

2015-03-29 Thread RiGui
Thank you for your replies! 

I am terribly sorry for the code not being reproducible, is the first time I
am posting here, I run the code several times before I posted, but...I
forgot about the library used.

To answer to your questions:

How do you know this answer is "correct"? 

What I am doing is actually a "fixed effect" estimation. I apply a
projection matrix to the data, both dependent and independent variables,
projection which renders the regressors that do not vary, equal to basically
zero - the x1 from the post. 

Once I apply the projection, I need to run OLS to get the estimates, so x1
should be zero. 

Therefore, the results with the scaled regressor is not correct. 

Besides, I do not see why the bOLS is wrong, since is the formula of the OLS
estimator from any Econometrics book.

Here again the corrected code: 

install.packages("corpcor")
library(corpcor)

n_obs <- 1000
y  <- rnorm(n_obs, 10,2.89)
x1 <- rnorm(n_obs, 0.01235657,0.45)
x2 <- rnorm(n_obs, 10,3.21)
X  <- cbind(x1,x2)

 bFE <- lm(y ~ x1 + x2)
 bFE

 bOLS <- pseudoinverse(t(X) %*% X) %*% t(X) %*% y
 bOLS


Best,

Raluca Gui 




--
View this message in context: 
http://r.789695.n4.nabble.com/Error-in-lm-with-very-small-close-to-zero-regressor-tp4705185p4705212.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] multiple break in univariate series

2015-03-29 Thread Temel İspanyolca
DR. UWE LIGGES

I have sent turkish real Gdp data (1998-2013) in annex.
Turkey has lived two crises in this period, in 2001 and 2008. You can see
in data.
My problem is to indicate these dates any statistic test as a structural
change or point change.

Sincerely
Engin

2015-03-28 23:04 GMT+01:00 Uwe Ligges :

>  Forwarded Message 
>>> Subject: [R] multiple break in univariate series
>>> Date: Sat, 28 Mar 2015 00:41:35 +0100
>>> From: Temel İspanyolca 
>>> To: R-help@r-project.org
>>>
>>> Hello
>>> Any one knows multiple break test for univariate series ?
>>>
>>
>
> Which kind of breaks? shift?
>
> You may want to look for CUSUM or MOSUM tests or even permutation tests.
>
> Packages: strucchange, changepoint
>
> http://cran.r-project.org/web/packages/changepoint/index.html
>
> http://cran.r-project.org/web/packages/strucchange/index.html
>
> Best,
> Uwe Ligges
>
>
>
>
>  --
>>> *Thanks*
>>> Engin YILMAZ
>>>
>>>  [[alternative HTML version deleted]]
>>>
>>> __
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/
>>> posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>>


-- 
*Saygılarımla*
Engin YILMAZ
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Categorizing by month

2015-03-29 Thread lychang
Thanks for that Ben, I appreciate it a lot. 

I have another question if you don't mind taking the time to answer it. I
want to create a subset from dataset called "datafile5" with only the
variables "Date," "PM2.5 mean concentration," "SITE_LONGITUDE," and
"SITE_LATITUDE," dating only in September 2009. 

Do you know how I can do this? 



--
View this message in context: 
http://r.789695.n4.nabble.com/Categorizing-by-month-tp4705173p4705208.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] multiple break in univariate series

2015-03-29 Thread Achim Zeileis

On Sat, 28 Mar 2015, Bert Gunter wrote:


Once you have looked at the data and chosen change points to test
based on the data, the tests for change points are invalid (unless you
make appropriate adjustments for post hoc tests).

And no, I am not making this up. Consult any competent statistician.


Yes, you must not select a breakpoint by eyeballing a time series and then 
conduct a structural break test for this given breakpoint as if it were 
exogenously given. This will be far too liberal.


But most practitioners would think it is still ok to conduct a structural 
break test with _unknown_ breakpoint. Ideally, the hypothesis to be tested 
and the significance level were formulated prior to the collection of the 
data, though.


To add to Uwe's comments: maxstat_test() in package "coin" would be a 
non-parametric permutation test. sctest() in package "strucchange" 
provides a wide range of tests for general parametric models, especially 
linear regressions.


Best,
Z


Cheers,
Bert





Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll




On Sat, Mar 28, 2015 at 5:52 PM, Uwe Ligges
 wrote:



On 29.03.2015 00:09, Temel ?spanyolca wrote:



DR. UWE LIGGES


I have sent turkish real Gdp data (1998-2013) in annex.
Turkey has lived two crises in this period, in 2001 and 2008. You can
see in data.
My problem is to indicate these dates any statistic test as a structural
change or point change.



Have you tried CUSUM? Or some permutations test? You do not have muh data


Best,
Uwe Ligges





Sincerely
Engin

2015-03-28 23:04 GMT+01:00 Uwe Ligges mailto:lig...@statistik.tu-dortmund.de>>:

 Forwarded Message 
Subject: [R] multiple break in univariate series
Date: Sat, 28 Mar 2015 00:41:35 +0100
From: Temel ?spanyolca mailto:ispanyol...@gmail.com>>
To: R-help@r-project.org 

Hello
Any one knows multiple break test for univariate series ?



Which kind of breaks? shift?

You may want to look for CUSUM or MOSUM tests or even permutation
tests.

Packages: strucchange, changepoint

http://cran.r-project.org/web/__packages/changepoint/index.__html


http://cran.r-project.org/web/__packages/strucchange/index.__html


Best,
Uwe Ligges




--
*Thanks*
Engin YILMAZ

  [[alternative HTML version deleted]]


R-help@r-project.org  mailing
list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/__listinfo/r-help

PLEASE do read the posting guide
http://www.R-project.org/__posting-guide.html

and provide commented, minimal, self-contained, reproducible
code.





--
*Sayg?lar?mla*
Engin YILMAZ



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error in lm() with very small (close to zero) regressor

2015-03-29 Thread peter dalgaard

> On 28 Mar 2015, at 18:28 , Ben Bolker  wrote:
> 
> peter dalgaard  gmail.com> writes:
> 
>> 
>> 
>>> On 28 Mar 2015, at 00:32 , RiGui  business.uzh.ch> wrote:
>>> 
>>> Hello everybody,
>>> 
>>> I have encountered the following problem with lm():
>>> 
>>> When running lm() with a regressor close to zero - 
>> of the order e-10, the
>>> value of the estimate is of huge absolute value , of order millions. 
>>> 
>>> However, if I write the formula of the OLS estimator, 
>> in matrix notation:
>>> pseudoinverse(t(X)*X) * t(X) * y , the results are correct, meaning the
>>> estimate has value 0.
> 
>  How do you know this answer is "correct"?
> 
>>> here is the code:
>>> 
>>> y  <- rnorm(n_obs, 10,2.89)
>>> x1 <- rnorm(n_obs, 0.01235657,0.45)
>>> x2 <- rnorm(n_obs, 10,3.21)
>>> X  <- cbind(x1,x2)
>>> 
>>> bFE <- lm(y ~ x1 + x2)
>>> bFE
>>> 
>>> bOLS <- pseudoinverse(t(X) %*% X) %*% t(X) %*% y
>>> bOLS
>>> 
>>> 
>>> Note: I am applying a deviation from the 
>> mean projection to the data, that
>>> is why I have some regressors with such small values.
>>> 
>>> Thank you for any help!
>>> 
>>> Raluca
> 
>  Is there a reason you can't scale your regressors?
> 
>>> 
>> Example not reproducible:
>> 
> 
>  I agree that the OP's question was not reproducible, but it's
> not too hard to make it reproducible. I bothered to use
> library("sos"); findFn("pseudoinverse") to find pseudoinverse()
> in corpcor:

Well, it shouldn't be my work, nor yours... And I thought it particularly 
egregious to treat a function from an unspecified package as gospel.

> 
> It is true that we get estimates with very large magnitudes,
> but their 
> 
> set.seed(101)
> n_obs <- 1000
> y  <- rnorm(n_obs, 10,2.89)
> x1 <- rnorm(n_obs, mean=1.235657e-14,sd=4.5e-17)
> x2 <- rnorm(n_obs, 10,3.21)
> X  <- cbind(x1,x2)
> 
> bFE <- lm(y ~ x1 + x2)
> bFE
> coef(summary(bFE))
> 
> Estimate   Std. Error t value  Pr(>|t|)
> (Intercept)  1.155959e+01 2.312956e+01  0.49977541 0.6173435
> x1  -1.658420e+14 1.872598e+15 -0.08856254 0.9294474
> x2   3.797342e-02 2.813000e-02  1.34992593 0.1773461
> 
> library("corpcor")
> bOLS <- pseudoinverse(t(X) %*% X) %*% t(X) %*% y
> bOLS
> 
> [,1]
> [1,] 9.807664e-16
> [2,] 8.880273e-01
> 
> And if we scale the predictor:
> 
> bFE2 <- lm(y ~ I(1e14*x1) + x2)
> coef(summary(bFE2))
> 
> Estimate Std. Error t value  Pr(>|t|)
> (Intercept)   11.55958731   23.12956  0.49977541 0.6173435
> I(1e+14 * x1) -1.65842037   18.72598 -0.08856254 0.9294474
> x2 0.037973420.02813  1.34992593 0.1773461
> 
> bOLS stays constant.
> 
> To be honest, I haven't thought about this enough to see
> which answer is actually correct, although I suspect the
> problem is in bOLS, since the numerical methods (unlike
> the brute-force pseudoinverse method given here) behind
> lm have been carefully considered for numerical stability.

In particular, the pseudoinverse() function has a tol= argument which allows it 
to zap small singular values. 

> pseudoinverse(crossprod(X))%*%crossprod(X,y)
 [,1]
[1,] 9.807664e-16
[2,] 8.880273e-01
> pseudoinverse(crossprod(X),tol=1e-40)%*%crossprod(X,y)
  [,1]
[1,]  1.286421e+15
[2,] -5.327384e-01

Also, notice that there is no intercept in the above, so it would be more 
reasonable to compare to 

> bFE <- lm(y ~ x1 + x2-1)
> summary(bFE)

Call:
lm(formula = y ~ x1 + x2 - 1)

Residuals:
Min  1Q  Median  3Q Max 
-9.1712 -1.9439 -0.0321  1.7637  9.4540 

Coefficients:
Estimate Std. Error t value Pr(>|t|)
x1 7.700e+14  2.435e+13   31.62   <2e-16 ***
x2 3.766e-02  2.811e-021.340.181
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.771 on 998 degrees of freedom
Multiple R-squared:  0.9275,Adjusted R-squared:  0.9273 
F-statistic:  6382 on 2 and 998 DF,  p-value: < 2.2e-16

Or, maybe better to use cbind(1,X) in the pseudoinverse() method. It doesn't 
help, though.

What really surprises me is this

> X  <- cbind(x1,x2)
> crossprod(X)
 x1   x2
x1 1.526698e-25 1.265085e-10
x2 1.265085e-10 1.145462e+05
> solve(crossprod(X))
Error in solve.default(crossprod(X)) : 
  system is computationally singular: reciprocal condition number = 1.13052e-31
> solve(crossprod(X), tol=1e-40)
  x1x2
x1  7.722232e+25 -8.528686e+10
x2 -8.528686e+10  1.029237e-04
> pseudoinverse(crossprod(X), tol=1e-40)
  [,1]  [,2]
[1,]  6.222152e+25 -6.217178e+10
[2,] -6.871948e+10  7.739466e-05

How does pseudoinverse() go so badly wrong? Notice that apart from the scaling, 
this really isn't very ill-conditioned, but a lot of accuracy seems to be lost 
in its SVD step. Notice that

> X  <- cbind(1e14*x1,x2)
> solve(crossprod(X))
x2
0.0077222325 -0.0008528686
x2 -0.0008528686  0.0001029237
> pseudoinverse(crossprod(X))
  [,1]