Re: [R] plm issues: error for "within" or "random", but not for "pooling"

Liviu Andronic Wed, 24 Feb 2010 03:15:12 -0800

Dear Giovanni
Thank you for the quick reply, and sorry for not responding as quickly:
since our last e-mail we decided to change the way we measure the
variables, and this took some time. I think I have tracked the original
issue down to an improperly specified subset vector in the
"data=df[ , ]" argument. I guess this counts as a user error.


While working with plm I encountered some other potential issues:
- [, "var"] subsetting: on my data the following works fine:
> summary(ibes.kld.df.p[ , ]$ibes1.delta1y.diff)
total sum of squares : 2472.4
      id     time
0.289638 0.032026

but the following takes 100% CPU for about a minute and then fails:
> summary(ibes.kld.df.p[ , "ibes1.delta1y.diff"])
Error in substring(blanks, 1, pad) : invalid substring argument(s)

I am not sure which characteristics of my data cause this (perhaps
many NAs?), and I cannot reproduce it in a dummy example based on EmplUK:
> data("EmplUK", package = "plm")
> E <- pdata.frame(EmplUK, index = c("firm", "year"), drop.index = TRUE,
+                  row.names = TRUE)
> summary(E$emp)
total sum of squares : 261540
       id      time
0.9807654 0.0091085
> summary(E[, "emp"])  ##in the dummy, both ways of subsetting work fine
total sum of squares : 261540
       id      time
0.9807654 0.0091085
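To check the NA guess, a small diagnostic could count the missing values of the variable and the number of firms for which it is missing in every period. This is only a sketch: na_profile() is a hypothetical helper, and it assumes it is run on the plain data.frame from which ibes.kld.df.p was built, with the name of the firm identifier column passed as id.

```r
# Hypothetical diagnostic: count missing values of one variable and the
# number of panel units that have no non-missing observation of it at all.
na_profile <- function(df, id, var) {
  miss <- is.na(df[[var]])
  # factor() drops unused levels, so units absent from df are not counted
  all_missing <- tapply(miss, factor(df[[id]]), all)
  c(n_obs = length(miss),
    n_na = sum(miss),
    units = length(all_missing),
    units_all_na = sum(all_missing))
}

# Toy example: firm "a" has only missing values
toy <- data.frame(firm = rep(c("a", "b", "c"), each = 2),
                  x = c(NA, NA, 1, NA, 2, 3))
na_profile(toy, "firm", "x")
# n_obs = 6, n_na = 3, units = 3, units_all_na = 1
```

A non-zero units_all_na would also fit Giovanni's guess (quoted below) about completely empty groups.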


- p-value of the coefficient t test == p-value of the regression F test
(for pooling and within, but not for random):
> x.pool <- try(plm(get(x.ibes.diff1) ~ get(x.kld.diff1), ibes.kld.df.p,
+ model="pooling"))
> summary(x.pool); x.ibes.diff1; x.kld.diff1
Oneway (individual) effect Pooling Model

Call:
plm(formula = get(x.ibes.diff1) ~ get(x.kld.diff1), data = ibes.kld.df.p,
    model = "pooling")

Unbalanced Panel: n=2336, T=1-15, N=9330

Residuals :
   Min. 1st Qu.  Median 3rd Qu.    Max.
-5.4500 -0.1500  0.0799  0.2100  4.0500

Coefficients :
                 Estimate Std. Error t-value Pr(>|t|)
(Intercept)       -0.1199     0.0056   -21.4   <2e-16 ***
get(x.kld.diff1)   0.0297     0.0165     1.8    0.071 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    2720
Residual Sum of Squares: 2720
F-statistic: 3.25802 on 1 and 9328 DF, p-value: 0.0711
[1] "ibes2.delta12y.diff"
[1] "kld.delta1y_prod.diff"
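As an aside, the get() calls make the printed formula and coefficient labels uninformative, which is why x.ibes.diff1 and x.kld.diff1 are printed after each summary. A sketch of an alternative, assuming base R's reformulate(): build the formula from the two name strings so that the real variable names appear in the output. The toy data only mimic the column names; passing the same formula to plm() is an untested assumption.

```r
# Build the formula from the variable names instead of using get() inside it
x.ibes.diff1 <- "ibes2.delta12y.diff"
x.kld.diff1  <- "kld.delta1y_prod.diff"
f <- reformulate(x.kld.diff1, response = x.ibes.diff1)
print(f)  # ibes2.delta12y.diff ~ kld.delta1y_prod.diff

# Toy data with the same column names, only to show the labels in the fit
set.seed(1)
toy <- data.frame(kld.delta1y_prod.diff = rnorm(20))
toy$ibes2.delta12y.diff <- 0.03 * toy$kld.delta1y_prod.diff + rnorm(20)
names(coef(lm(f, data = toy)))  # "(Intercept)" "kld.delta1y_prod.diff"

# With the real data this would presumably become (not run):
# plm(f, data = ibes.kld.df.p, model = "pooling")
```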
> x.fe <- try(plm(get(x.ibes.diff1) ~ get(x.kld.diff1), ibes.kld.df.p,
+ model="within"))
> summary(x.fe); x.ibes.diff1; x.kld.diff1
Oneway (individual) effect Within Model

Call:
plm(formula = get(x.ibes.diff1) ~ get(x.kld.diff1), data = ibes.kld.df.p,
    model = "within")

Unbalanced Panel: n=2336, T=1-15, N=9330

Residuals :
   Min. 1st Qu.  Median 3rd Qu.    Max.
-4.1000 -0.1200  0.0121  0.1600  4.1300

Coefficients :
                 Estimate Std. Error t-value Pr(>|t|)
get(x.kld.diff1)   0.0324     0.0166    1.95    0.051 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    1790
Residual Sum of Squares: 1780
F-statistic: 3.80843 on 1 and 6993 DF, p-value: 0.051
[1] "ibes2.delta12y.diff"
[1] "kld.delta1y_prod.diff"


I suppose this is OK, since for the pooling case I can confirm it with a
simple lm(), but I am not sure I understand why it happens:
> x.simp <- try(lm(get(x.ibes.diff1) ~ get(x.kld.diff1), ibes.kld.df.p))
> summary(x.simp); x.ibes.diff1; x.kld.diff1

Call:
lm(formula = get(x.ibes.diff1) ~ get(x.kld.diff1), data = ibes.kld.df.p)

Residuals:
    Min      1Q  Median      3Q     Max
-5.4501 -0.1501  0.0799  0.2099  4.0499

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)       -0.1199     0.0056   -21.4   <2e-16 ***
get(x.kld.diff1)   0.0297     0.0165     1.8    0.071 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.54 on 9328 degrees of freedom
  (3966 observations deleted due to missingness)
Multiple R-squared: 0.000349,   Adjusted R-squared: 0.000242
F-statistic: 3.26 on 1 and 9328 DF,  p-value: 0.0711

[1] "ibes2.delta12y.diff"
[1] "kld.delta1y_prod.diff"
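If I understand correctly, this is expected with a single regressor: the regression F statistic is then the square of the slope's t statistic (1.8^2 = 3.24 vs. F = 3.258, and 1.95^2 = 3.80 vs. F = 3.808, equal up to the rounding of t), and when both tests use the same residual degrees of freedom, t^2 follows the F(1, df) distribution, so the two p-values coincide. A minimal check with lm() on simulated data (not the original IBES/KLD variables):

```r
# Simulated data (not the original IBES/KLD variables): one regressor
set.seed(42)
x <- rnorm(500)
y <- -0.1 + 0.03 * x + rnorm(500, sd = 0.5)
s <- summary(lm(y ~ x))

t_val <- s$coefficients["x", "t value"]
p_t   <- s$coefficients["x", "Pr(>|t|)"]
f_val <- s$fstatistic[["value"]]
p_f   <- pf(f_val, s$fstatistic[["numdf"]], s$fstatistic[["dendf"]],
            lower.tail = FALSE)

all.equal(t_val^2, f_val)  # TRUE: F is the squared t statistic
all.equal(p_t, p_f)        # TRUE: hence the two p-values coincide
```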


For the random-effects model the two differ, and the F statistic is even negative:
> x.re <- try(plm(get(x.ibes.diff1) ~ get(x.kld.diff1), ibes.kld.df.p,
+ model="random"))
> summary(x.re); x.ibes.diff1; x.kld.diff1
Oneway (individual) effect Random Effect Model
   (Swamy-Arora's transformation)

Call:
plm(formula = get(x.ibes.diff1) ~ get(x.kld.diff1), data = ibes.kld.df.p,
    model = "random")

Unbalanced Panel: n=2336, T=1-15, N=9330

Effects:
                var std.dev share
idiosyncratic 0.255   0.505  0.88
individual    0.036   0.190  0.12
theta  :
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.0639  0.1620  0.2340  0.2640  0.4060  0.4340

Residuals :
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-5.24000 -0.14300  0.06630 -0.00171  0.19700  3.79000

Coefficients :
                 Estimate Std. Error t-value Pr(>|t|)
(Intercept)      -0.11510    0.00708  -16.26   <2e-16 ***
get(x.kld.diff1)  0.02935    0.01592    1.84    0.065 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    2420
Residual Sum of Squares: 2420
F-statistic: -0.417224 on 1 and 9328 DF, p-value: 1
[1] "ibes2.delta12y.diff"
[1] "kld.delta1y_prod.diff"
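Since F = ((TSS - RSS)/df1) / (RSS/df2), a negative F can only arise if the residual sum of squares exceeds the total sum of squares; with both printed as 2420, the rounding hides the difference. Working backwards from the reported F gives the size of that gap (approximate, because the printed sums are rounded), and the p-value of 1 follows because the F distribution puts no mass below zero. Why the random-effects summary ends up with RSS > TSS I cannot tell from here; f_from_ss() and implied_gap() are hypothetical helpers for this sketch.

```r
# Regression F statistic from total and residual sums of squares
f_from_ss <- function(tss, rss, df1, df2) ((tss - rss) / df1) / (rss / df2)

# TSS - RSS implied by a reported F statistic, given RSS and the degrees of freedom
implied_gap <- function(f, rss, df1, df2) f * df1 * rss / df2

# Rounded figures from the random-effects summary above
gap <- implied_gap(-0.417224, rss = 2420, df1 = 1, df2 = 9328)
gap  # about -0.108: RSS would exceed TSS by roughly 0.1

f_neg <- f_from_ss(tss = 2420, rss = 2420 - gap, df1 = 1, df2 = 9328)
f_neg < 0                                 # TRUE
pf(f_neg, 1, 9328, lower.tail = FALSE)    # 1: pf() is 0 for negative quantiles
```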


- no R-squared in summary() output: I was a bit surprised that summary()
reports no R-squared. Although it is absent from my plm() regressions and
from the vignette, it does appear in the output shown in the AER book.
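In the meantime, R-squared can be recovered from the two sums of squares that summary() does print, R^2 = 1 - RSS/TSS. Below is a minimal sketch with hypothetical helpers r2_from_ss() and r2_adj(), checked against lm() on simulated data. For the within model it assumes the printed total sum of squares refers to the demeaned data (so the result would be a within R-squared); for the pooling model the printed figures (2720 and 2720) are too rounded to be useful, while lm() reports 0.000349.

```r
# R-squared and adjusted R-squared from the sums of squares that summary() prints
r2_from_ss <- function(tss, rss) 1 - rss / tss
r2_adj <- function(r2, n, k) 1 - (1 - r2) * (n - 1) / (n - k - 1)  # k regressors + intercept

# Check against lm() on simulated data
set.seed(1)
x <- rnorm(200)
y <- 1 + 0.5 * x + rnorm(200)
fit <- lm(y ~ x)
tss <- sum((y - mean(y))^2)
rss <- sum(residuals(fit)^2)
r2 <- r2_from_ss(tss, rss)
all.equal(r2, summary(fit)$r.squared)                              # TRUE
all.equal(r2_adj(r2, n = 200, k = 1), summary(fit)$adj.r.squared)  # TRUE

# Within model above, using its rounded sums of squares
r2_from_ss(1790, 1780)  # about 0.0056
```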


- pFtest() returns an NA p-value. Any idea what could cause this?
> pFtest(x.pool, x.fe)

        F test for individual effects

data:  get(x.ibes.diff1) ~ get(x.kld.diff1)
F = 1.3694, df1 = -2335, df2 = 9328, p-value = NA
alternative hypothesis: significant effects

Warning message:
In pf(q, df1, df2, lower.tail, log.p) : NaNs produced
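For the one-way within model I would expect df1 = n - 1 = 2335 and df2 = N - n - K = 9330 - 2336 - 1 = 6993 (the within model's residual df). The reported df1 = -2335 (that is, 6993 - 9328) and df2 = 9328 look as if the pooled and within models had swapped roles. A negative df makes pf() return NaN, which matches the warning. The sketch below recomputes the statistic by hand from the rounded sums of squares. With the roles swapped it gives F of about 1.38, close to the reported 1.3694. If that reading is right, calling pFtest(x.fe, x.pool), with the within model first, might be what is needed. I have not verified this; f_individual() is a hypothetical helper.

```r
# F test for individual effects from the within and pooled fits:
# F = ((RSS_pool - RSS_within) / df1) / (RSS_within / df2),
# df1 = df_pool - df_within (= n - 1 here), df2 = df_within
f_individual <- function(rss_within, rss_pool, df_within, df_pool) {
  df1 <- df_pool - df_within
  df2 <- df_within
  f <- ((rss_pool - rss_within) / df1) / (rss_within / df2)
  c(F = f, df1 = df1, df2 = df2,
    p.value = pf(f, df1, df2, lower.tail = FALSE))
}

# Rounded sums of squares and residual df from the two summaries above
ok <- f_individual(rss_within = 1780, rss_pool = 2720,
                   df_within = 6993, df_pool = 9328)
ok  # df1 = 2335, df2 = 6993, finite p-value

# The same computation with the two models' roles swapped
swapped <- suppressWarnings(
  f_individual(rss_within = 2720, rss_pool = 1780,
               df_within = 9328, df_pool = 6993))
swapped  # F about 1.38, df1 = -2335, df2 = 9328, p-value NaN
```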


Thank you
Liviu


On 2/4/10, Millo Giovanni <giovanni_mi...@generali.com> wrote:
> Dear Liviu,
>
>  it's difficult to tell without seeing the data. I might guess that you have 
> some completely empty groups about which Tapply complains when doing the 
> time-demeaning, but it would be just a guess.
>
>  I realize you can't share the data in the present form, but may I suggest 
> you try and subset your data in some random way, find a "problematic" subset 
> (one which gives the error) then change labels and everything so that the 
> data become unrecognizable, and send us that example? You can also randomly 
> transform them, as this is likely to be a missing values issue.
>
>  Giovanni
>
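Giovanni's suggestion (draw a random subset, relabel everything, transform the values) could be sketched roughly as below. anonymize_panel(), n_units and seed are hypothetical names. Multiplying each numeric column by one random positive factor is just one simple choice of transformation. It keeps the panel structure and leaves every NA in place, which matters if the problem is a missing-values issue.

```r
# Hypothetical helper: keep a random subset of panel units, relabel them,
# rename the variables and rescale the numeric columns.
anonymize_panel <- function(df, id, time, n_units = 50, seed = 1) {
  set.seed(seed)  # reproducible subset and scaling factors
  units <- unique(df[[id]])
  keep <- units[sample.int(length(units), min(n_units, length(units)))]
  sub <- df[df[[id]] %in% keep, , drop = FALSE]
  sub[[id]] <- as.integer(factor(sub[[id]]))  # anonymous unit codes
  other <- setdiff(names(sub), c(id, time))
  new_names <- sprintf("v%d", seq_along(other))
  names(sub)[match(other, names(sub))] <- new_names
  for (v in new_names) {
    # multiply by a random positive factor; NAs stay where they were
    if (is.numeric(sub[[v]])) sub[[v]] <- sub[[v]] * runif(1, 0.5, 2)
  }
  rownames(sub) <- NULL
  sub
}

toy <- data.frame(firm = rep(c("a", "b", "c"), each = 2),
                  year = rep(1:2, times = 3),
                  sales = c(1, NA, 3, 4, 5, 6))
anonymize_panel(toy, id = "firm", time = "year", n_units = 2)
```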

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
