Dear Giovanni,

Thank you for the quick reply, and sorry for not being able to respond in kind: since our last e-mail we decided to change the way we measure the variables, and this took some time.

I think I managed to track the original issue down to an improperly specified subset vector in the "data=df[ , ]" argument. I guess this counts as a user error.

Working with plm I encountered some other potential issues:

- [, "var"] subsetting: on my data the following works fine

> summary(ibes.kld.df.p[ , ]$ibes1.delta1y.diff)
total sum of squares : 2472.4
       id     time
 0.289638 0.032026

but the call below takes 100% CPU for about a minute, and then fails.

> summary(ibes.kld.df.p[ , "ibes1.delta1y.diff"])
Error in substring(blanks, 1, pad) : invalid substring argument(s)

I am not sure which characteristic of my data causes this (perhaps the many NAs?), and I cannot reproduce it with a dummy example based on EmplUK:

> data("EmplUK", package = "plm")
> E <- pdata.frame(EmplUK, index = c("firm", "year"), drop.index = TRUE, row.names = TRUE)
> summary(E$emp)
total sum of squares : 261540
        id      time
 0.9807654 0.0091085
> summary(E[, "emp"]) ##in the dummy, both ways of subsetting work fine
total sum of squares : 261540
        id      time
 0.9807654 0.0091085

- p.value of coef t test == p.value of regression F test (for pooling and within, but not for random):

> x.pool <- try(plm(get(x.ibes.diff1) ~ get(x.kld.diff1), ibes.kld.df.p, model="pooling"))
> summary(x.pool); x.ibes.diff1; x.kld.diff1
Oneway (individual) effect Pooling Model

Call:
plm(formula = get(x.ibes.diff1) ~ get(x.kld.diff1), data = ibes.kld.df.p,
    model = "pooling")

Unbalanced Panel: n=2336, T=1-15, N=9330

Residuals :
   Min. 1st Qu.  Median 3rd Qu.    Max.
-5.4500 -0.1500  0.0799  0.2100  4.0500

Coefficients :
                 Estimate Std. Error t-value Pr(>|t|)
(Intercept)       -0.1199     0.0056   -21.4   <2e-16 ***
get(x.kld.diff1)   0.0297     0.0165     1.8    0.071 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    2720
Residual Sum of Squares: 2720
F-statistic: 3.25802 on 1 and 9328 DF, p-value: 0.0711
[1] "ibes2.delta12y.diff"
[1] "kld.delta1y_prod.diff"

> x.fe <- try(plm(get(x.ibes.diff1) ~ get(x.kld.diff1), ibes.kld.df.p, model="within"))
> summary(x.fe); x.ibes.diff1; x.kld.diff1
Oneway (individual) effect Within Model

Call:
plm(formula = get(x.ibes.diff1) ~ get(x.kld.diff1), data = ibes.kld.df.p,
    model = "within")

Unbalanced Panel: n=2336, T=1-15, N=9330

Residuals :
   Min. 1st Qu.  Median 3rd Qu.    Max.
-4.1000 -0.1200  0.0121  0.1600  4.1300

Coefficients :
                 Estimate Std. Error t-value Pr(>|t|)
get(x.kld.diff1)   0.0324     0.0166    1.95    0.051 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    1790
Residual Sum of Squares: 1780
F-statistic: 3.80843 on 1 and 6993 DF, p-value: 0.051
[1] "ibes2.delta12y.diff"
[1] "kld.delta1y_prod.diff"

I suppose that this is OK, since for the pooling case I can confirm it with a simple lm(), but I am not sure I understand why this happens.

> x.simp <- try(lm(get(x.ibes.diff1) ~ get(x.kld.diff1), ibes.kld.df.p))
> summary(x.simp); x.ibes.diff1; x.kld.diff1

Call:
lm(formula = get(x.ibes.diff1) ~ get(x.kld.diff1), data = ibes.kld.df.p)

Residuals:
    Min      1Q  Median      3Q     Max
-5.4501 -0.1501  0.0799  0.2099  4.0499

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)       -0.1199     0.0056   -21.4   <2e-16 ***
get(x.kld.diff1)   0.0297     0.0165     1.8    0.071 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.54 on 9328 degrees of freedom
  (3966 observations deleted due to missingness)
Multiple R-squared: 0.000349, Adjusted R-squared: 0.000242
F-statistic: 3.26 on 1 and 9328 DF, p-value: 0.0711
[1] "ibes2.delta12y.diff"
[1] "kld.delta1y_prod.diff"

For random, the two are different:

> x.re <- try(plm(get(x.ibes.diff1) ~ get(x.kld.diff1), ibes.kld.df.p, model="random"))
> summary(x.re); x.ibes.diff1; x.kld.diff1
Oneway (individual) effect Random Effect Model
   (Swamy-Arora's transformation)

Call:
plm(formula = get(x.ibes.diff1) ~ get(x.kld.diff1), data = ibes.kld.df.p,
    model = "random")

Unbalanced Panel: n=2336, T=1-15, N=9330

Effects:
                var std.dev share
idiosyncratic 0.255   0.505  0.88
individual    0.036   0.190  0.12
theta :
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.0639  0.1620  0.2340  0.2640  0.4060  0.4340

Residuals :
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-5.24000 -0.14300  0.06630 -0.00171  0.19700  3.79000

Coefficients :
                 Estimate Std. Error t-value Pr(>|t|)
(Intercept)      -0.11510    0.00708  -16.26   <2e-16 ***
get(x.kld.diff1)  0.02935    0.01592    1.84    0.065 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    2420
Residual Sum of Squares: 2420
F-statistic: -0.417224 on 1 and 9328 DF, p-value: 1
[1] "ibes2.delta12y.diff"
[1] "kld.delta1y_prod.diff"

- no R-squared in summary() output: I was a bit surprised to see no R-squared reported by summary(). It is present neither in my plm() regressions nor in the vignette, but it does appear in the output included in the AER book.

- pFtest() generates an NA p-value. Any ideas on what would cause this?
> pFtest(x.pool, x.fe)

        F test for individual effects

data:  get(x.ibes.diff1) ~ get(x.kld.diff1)
F = 1.3694, df1 = -2335, df2 = 9328, p-value = NA
alternative hypothesis: significant effects

Warning message:
In pf(q, df1, df2, lower.tail, log.p) : NaNs produced

Thank you
Liviu

On 2/4/10, Millo Giovanni <giovanni_mi...@generali.com> wrote:
> Dear Liviu,
>
> it's difficult to tell without seeing the data. I might guess that you have
> some completely empty groups about which Tapply complains when doing the
> time-demeaning, but it would be just a guess.
>
> I realize you can't share the data in the present form, but may I suggest
> you try and subset your data in some random way, find a "problematic" subset
> (one which gives the error) then change labels and everything so that the
> data become unrecognizable, and send us that example? You can also randomly
> transform them, as this is likely to be a missing values issue.
>
> Giovanni
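
Regarding the suggestion quoted above (subset at random, relabel, and randomly transform the data), a minimal base-R sketch of how this could be done follows. The helper name anonymize_panel(), its arguments, and the toy data frame are all hypothetical and only for illustration; nothing here comes from plm. The sketch assumes the problem depends on the pattern of missing values rather than on the actual magnitudes, so it keeps NAs in place and only rescales the numeric columns.

```r
# Hypothetical helper (not part of plm): make a shareable version of a panel.
# `id_col` and `time_col` are the names of the index columns (assumed).
anonymize_panel <- function(df, id_col, time_col, n_ids = 50, seed = 1) {
  set.seed(seed)
  ids  <- unique(df[[id_col]])
  # Keep a random subset of groups.
  keep <- ids[sample.int(length(ids), min(n_ids, length(ids)))]
  out  <- df[df[[id_col]] %in% keep, , drop = FALSE]
  # Replace the group labels with arbitrary integers.
  out[[id_col]] <- match(out[[id_col]], keep[sample.int(length(keep))])
  # Rescale and shift numeric non-index columns; NAs stay where they were.
  num <- setdiff(names(out)[vapply(out, is.numeric, logical(1))],
                 c(id_col, time_col))
  for (v in num) out[[v]] <- out[[v]] * runif(1, 0.5, 2) + rnorm(1)
  # Rename the variables so they are unrecognizable.
  names(out)[match(num, names(out))] <- paste0("v", seq_along(num))
  rownames(out) <- NULL
  out
}

# Toy data, invented for this example only.
toy <- data.frame(firm = rep(c("a", "b", "c"), each = 3),
                  year = rep(2001:2003, times = 3),
                  y    = c(1.2, NA, 0.7, 0.3, 0.9, NA, NA, 1.1, 0.4))
anon <- anonymize_panel(toy, "firm", "year", n_ids = 2)
```

The result can then be turned into a pdata.frame and passed to the failing call to check whether the error still appears; if it does not, trying a different seed or a larger n_ids would be the next step.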