Re: [R] Panel Data Help

2016-02-01 Thread Jeff Newmiller
I am going to go out on a limb and say that the answer to your question is 
"Yes".

However, I cannot decipher specifics from your description. If you want a 
more useful answer, you need to follow the advice in the Posting Guide mentioned 
in the footer (including posting in plain text rather than HTML, and providing 
some sample data). You will also benefit from reading [1], with particular 
attention to using the dput function.

[1] 
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
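
For illustration only, here is a minimal sketch of the kind of merge the
question seems to describe. The data frames and column names (banks, macro,
bank, year, loans, gdp_growth) are made up, and the plm step is guarded
because it assumes the plm package is installed.

```r
# Hypothetical bank-level panel: one row per bank and year
banks <- data.frame(
  bank  = rep(c("A", "B", "C"), each = 3),
  year  = rep(2010:2012, times = 3),
  loans = c(10, 12, 15, 20, 19, 23, 5, 6, 8)
)

# Hypothetical macro series: one row per year only
macro <- data.frame(
  year       = 2010:2012,
  gdp_growth = c(2.5, 1.6, 2.2)
)

# Merging on the shared time key repeats each macro value for every bank
panel <- merge(banks, macro, by = "year")
panel <- panel[order(panel$bank, panel$year), ]

# Optional: declare the panel structure for plm, if the package is installed
if (requireNamespace("plm", quietly = TRUE)) {
  pdat <- plm::pdata.frame(panel, index = c("bank", "year"))
}

# dput() gives a copy-and-paste representation suitable for a list posting
dput(head(panel, 3))
```

Because the macro variables vary only over time, they cannot be separated
from a full set of year effects, so the two should not be combined in one
model.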
-- 
Sent from my phone. Please excuse my brevity.

On February 1, 2016 12:36:36 PM PST, Daniel Dorchuck  
wrote:
>Hi,
>
>I'm currently working on an econometrics project on banking and looking to
>merge a dataframe of bank-specific data with dataframes of macro variables.
>I am then going to transform the data set into a plm dataframe using the plm
>package. The bank-specific observations are indexed by bank and time while
>the macro ones are indexed only across time. Is there a way to merge the two
>so I can use both in my panel regression?
>
>Best,
>Dan
>
>   [[alternative HTML version deleted]]
>
>__
>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.



Re: [R] Panel Data--filling in missing dates in a span only

2015-03-11 Thread Adams, Jean
Steve,

Here is one approach that works.  I am calling your first data frame df.

# list all years from min to max observed in each ID
years <- tapply(df$Date, df$ID, function(x) min(x):max(x))

# create a data frame based on the observed range of years
fulldf <- data.frame(ID=rep(names(years), sapply(years, length)),
  Date=unlist(years))

# merge the data frame of observations with the data frame with all years
merge(fulldf, df, all=TRUE)
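
The same idea can be checked on a small made-up data frame. This is only a
sketch: it uses split() and lapply() where the code above uses tapply(), and
the column named value is hypothetical.

```r
# Made-up data: ID "a" lacks 2003, ID "b" lacks 2004
df <- data.frame(
  ID    = c("a", "a", "a", "b", "b"),
  Date  = c(2001L, 2002L, 2004L, 2003L, 2005L),
  value = c(1.1, 1.2, 1.4, 2.3, 2.5)
)

# Observed span of years for each ID
years <- lapply(split(df$Date, df$ID), function(x) seq(min(x), max(x)))

# One row per ID and year within that ID's own span
fulldf <- data.frame(
  ID   = rep(names(years), lengths(years)),
  Date = unlist(years, use.names = FALSE)
)

# Merge on the shared columns ID and Date; unobserved years get NA
filled <- merge(fulldf, df, all = TRUE)
filled
```

ID "a" is filled only up to 2004 and ID "b" only from 2003, so no years are
added outside each panel's observed range.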

Jean

On Tue, Mar 10, 2015 at 5:53 PM, Steven Archambault archste...@gmail.com
wrote:

 Hi folks,

 I have this panel data (below), with observations missing in each of the
 panels. I want to fill in years for the missing data, but only those years
 within the span of the existing data. For instance, BC-0002 needs one year,
 1995. I do not want any years after the last observation.

 structure(list(ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label =
 c("BC-0002", "BC-0003", "BC-0004"), class = "factor"), Date = c(1989L, 1990L,
 1991L, 1992L, 1993L, 1994L, 1996L, 1989L, 1990L, 1991L, 1992L,
 1993L, 1994L, 1996L, 1995L, 1996L, 1997L, 1998L, 2000L, 1994L,
 1993L, 1999L, 1998L), DepthtoWater_bgs = c(317.85, 317.25, 321.25,
 312.31, 313.01, 330.41, 321.01, 166.58, 167.55, 168.65, 168.95,
 169.25, 168.85, 169.75, 260.6, 261.65, 262.15, 265.45, 266.15,
 265.25, 265.05, 266.95, 267.75)), .Names = c("ID", "Date",
 "DepthtoWater_bgs"), class = "data.frame", row.names = c(NA, -23L))


 I have been using this code to expand the entire panels, but it is not
 exactly what I want.

 fexp <- expand.grid(ID=unique(wells$ID), Date=unique(wells$Date))
 merge(fexp, wells, all=TRUE)

 Any help would be much appreciated!

 Thanks,
 Steve



Re: [R] Panel data - replicating Stata's xtpcse in R

2011-04-07 Thread Achim Zeileis

On Thu, 7 Apr 2011, Florian Markowetz wrote:


Dear list,

I am trying to replicate an econometrics study that was originally done in 
Stata. (Blanton and Blanton. 2009. A Sectoral Analysis of Human Rights and FDI: 
Does Industry Type Matter?  International Studies Quarterly 53 (2):469 - 493.) 
The model I am trying to replicate is given in Stata as

xtpcse total_FDI lag_total ciri human_cap worker_rts polity_4 market 
income econ_growth log_trade fix_dollar fixed_xr xr_fluct lab_growth 
english, pairwise corr(ar1)


According to the paper, this is an OLS regression with panel corrected 
standard errors including a lagged dependent variable (lag_total is 
total_FDI t-1) and controlling first order correlations within each 
panel (corr(ar1)).


I'm not sure about the Stata command (because I haven't got Stata 
installed myself) and how it translates to R. Other people might know 
better.


From the verbal description "OLS plus panel-corrected standard errors" I 
would have expected that the coefficients could be estimated by lm() but 
that does not seem to be the case. Not sure why...it doesn't seem to be 
_O_LS then. Did you check that the Stata command produces the same output 
as indicated in the paper? (Maybe some data preprocessing is 
necessary...?)


In any case, I've had success with replicating such results with the plm 
package (see also http://www.jstatsoft.org/v27/i02/), typically using 
model = "pooling" (i.e., OLS) and then computing the standard errors via 
vcovBK(). The latter stands for Beck & Katz, which is what the pcse 
package also implements.
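
As a rough sketch of that workflow on simulated data (the variables id, year,
x1, x2 and y are made up, and the plm part runs only if plm is installed):
pooled OLS reproduces the lm() coefficients, and vcovBK() supplies a
different covariance estimate. The sketch uses vcovBK() with its default
settings and does not attempt the AR(1) correction requested by corr(ar1).

```r
set.seed(42)
# Simulated balanced panel: 10 units observed over 8 periods (made-up data)
d <- expand.grid(year = 2001:2008, id = factor(1:10))
d$x1 <- rnorm(nrow(d))
d$x2 <- rnorm(nrow(d))
d$y  <- 1 + 0.5 * d$x1 - 0.3 * d$x2 + rnorm(nrow(d))

# Pooled OLS with lm(): the point estimates to compare against
ols <- lm(y ~ x1 + x2, data = d)

if (requireNamespace("plm", quietly = TRUE)) {
  pooled <- plm::plm(y ~ x1 + x2, data = d, index = c("id", "year"),
                     model = "pooling")
  # Same coefficients as lm(); only the covariance estimate differs
  print(all.equal(unname(coef(pooled)), unname(coef(ols))))
  # Beck and Katz style standard errors from the default vcovBK() settings
  pcse_se <- sqrt(diag(plm::vcovBK(pooled)))
  print(cbind(estimate = coef(pooled), pcse = pcse_se))
}
```

The cluster argument of vcovBK() selects the dimension along which
observations are grouped, so it is worth checking which choice matches the
Stata output being replicated.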


In a few other cases, I replicated the so-called panel-corrected standard 
errors via geeglm() from geepack (http://www.jstatsoft.org/v15/i02/), 
using the default corstr = "independence" (i.e., again corresponding to OLS).
Other corstr values could be employed.

Just as additional information: Many econometricians don't know much about 
the type of models that nlme estimates. Usually, least squares technology 
is preferred in econometrics rather than likelihood-based ideas. Also, 
other multi-level models are rarely used. If specified in the same way, 
both approaches often yield similar results. There is a paragraph in the 
above-mentioned JSS paper on plm that discusses (dis)similarities with 
nlme.


Finally, a JSS paper on the pcse package is also waiting for publication 
in a special volume...hopefully online next month.


Good luck with the replication!
Best,
Z


The BIG QUESTION is how to replicate this line in R.

Econometrics is a new field to me, but a bit of searching showed that  packages 
like plm, nlme, pcse should be able to handle this kind of problem. In 
particular, function gls() uses auto-correlation structure and pcse() corrects 
the standard errors of the fitted model. Below is some code to show what I have 
done, and some problems I ran into.

## setup and load data from web
library(foreign)
library(nlme)
library(pcse)
D <- read.dta("http://umdrive.memphis.edu/rblanton/public/ISQ_data/blanton_isq08_data.dta")
D[544, "year"] <- 2005 ## fixing an unexpected NA in the year column

## Model formula (each line ends with "+" so the formula continues)
form <- total_FDI ~ lag_total + ciri + human_cap + worker_rts + polity_4 +
  market_size + income + econ_growth + log_trade + fixed_xr + fix_dollar +
  xr_fluct + english + lab_growth

## Model 1: no auto-correlation
res1 <- gls(model = form, data = D, correlation = NULL, na.action = na.omit)
coefficients(res1)

## Model 2: with auto-correlation
corr <- corAR1(0.1, ~ 1 | c_name)
res2 <- gls(model = form, data = D, correlation = corr, na.action = na.omit)
coefficients(res2)

Now, I know from the paper what the Stata coefficients looked like.  For 
example, for lag_total it should be .852 and for market_size .21 (these 
were the two significant ones). The result of Model 1 is closer to this 
than the result of Model 2, but there is still quite a gap.


The goal is to do OLS on panel data with AR(1) and PCSE - am I on the 
right track here? More specifically:


Question 1: Auto-correlation - how to specify the parameter 'value' in 
corAR1 (the .1 above is completely arbitrary) - Any other ideas how to 
translate Stata's corr(AR1) into R? (I'm not even completely sure what 
Stata does there and didn't find any details in the online manuals)


Question 2: PCSE - the pcse function seems to work on objects of class 
'lm' only. Any way to use it for gls-objects?
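
On Question 1, a hedged note: in nlme the value argument of corAR1() is the
starting value for the optimizer, and the coefficient is estimated unless
fixed = TRUE is given, so an arbitrary start is usually harmless. The sketch
below illustrates this on simulated data; the names d, id, t, x and y are
made up, and it says nothing about how Stata's corr(ar1) estimates its
coefficient.

```r
library(nlme)

set.seed(1)
n_id <- 20
n_t  <- 15
# Rows are ordered by unit, then by period (made-up data)
d <- expand.grid(t = seq_len(n_t), id = factor(seq_len(n_id)))
d$x <- rnorm(nrow(d))
# AR(1) errors within each unit, true coefficient 0.6
d$e <- unlist(lapply(seq_len(n_id), function(i) {
  as.numeric(arima.sim(list(ar = 0.6), n = n_t))
}))
d$y <- 1 + 2 * d$x + d$e

# Two different starting values for the AR(1) coefficient
fit_a <- gls(y ~ x, data = d, correlation = corAR1(0.1, form = ~ t | id))
fit_b <- gls(y ~ x, data = d, correlation = corAR1(0.5, form = ~ t | id))

# Estimated AR(1) coefficient from each starting value
rho_a <- coef(fit_a$modelStruct$corStruct, unconstrained = FALSE)
rho_b <- coef(fit_b$modelStruct$corStruct, unconstrained = FALSE)
c(start_0.1 = unname(rho_a), start_0.5 = unname(rho_b))

# With fixed = TRUE the coefficient is held at 0.1 instead of being estimated
fit_fixed <- gls(y ~ x, data = d,
                 correlation = corAR1(0.1, form = ~ t | id, fixed = TRUE))
```

Writing the form as ~ t | id makes the time ordering explicit; with ~ 1 | id
the row order within each unit is taken as the time order.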


Any help is greatly appreciated!
Florian

--
Florian Markowetz

Cancer Research UK
Cambridge Research Institute
Li Ka Shing Centre
Robinson Way, Cambridge, CB2 0RE, UK

phone: +44 (0) 1223 40 4315
email: florian.markow...@cancer.org.uk
web  : http://www.markowetzlab.org
skype: florian.markowetz


Re: [R] Panel Data Analysis in R

2010-12-30 Thread Jude Ryan
You wrote:


> Dear All,
>
> Can anyone provide me with reference notes (or steps) towards analysis of
> (un)balanced panel data in R.
>
> Thank you!

The plm package does panel data analysis in R. See the vignette at: 
cran.r-project.org/web/packages/plm/vignettes/plm.pdf. There are other similar 
articles by the same authors, Yves Croissant and
Giovanni Millo, and one of these is the best to get you started. If the plm 
package does not do all that you are looking for, you will have to explore the 
more general packages for mixed models (or hierarchical models or models for 
longitudinal data) in statistics, like nlme and lme4. See section 7 of the 
vignette for more details on these packages. The plm package deals with panel 
data econometrics. Good books on the topic of panel data econometrics are: 
Wooldridge J (2002), Econometric Analysis of Cross-Section and Panel Data, MIT 
Press, and Baltagi B (2001), Econometric Analysis of Panel Data, 3rd edition, 
John Wiley and Sons Ltd, but these books do not have any R code (or any other 
code for that matter).

The classic reference for mixed models in R is the book by Pinheiro and Bates: 
Pinheiro J, Bates D (2000). Mixed-Effects Models in S and S-PLUS. 
Springer-Verlag, and, Pinheiro J, Bates D, DebRoy S, Sarkar D, the R Core team 
(2007). nlme: Linear and Nonlinear Mixed Effects Models. R package version 
3.1-86, URL http://CRAN.R-project.org.

Other books that deal with longitudinal data are: Applied Longitudinal Data 
Analysis: Modeling Change and Event Occurrence by Judith D. Singer and John B. 
Willett (social science perspective) and Longitudinal and Panel Data: Analysis 
and Applications in the Social Sciences by Edward W Frees (a statistician). 
Both these books have URLs where you can get the R programs for the books. The 
URL for the Singer and Willett book is: 
http://www.ats.ucla.edu/stat/examples/alda.htm.


Jude Ryan
Director, Analytical Insights | MarketShare
1270 Avenue of the Americas, Suite # 2702, New York, NY 10020
P: 646.745.9916 x222 | M: 973.943.2029
www.marketshare.com
twitter.com/marketsharep




Re: [R] Panel data with binary dependent variable

2010-05-11 Thread yves croissant
To my knowledge, fixed and random effect models may be estimated for the
logit model and only the random effect model for the probit model
(because of the incidental parameter problem).

I think clogit in the survival package fits the model that is called the
fixed effect logit model in the econometrics literature.
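
A minimal sketch of that use of clogit(), on simulated data with made-up
names (d, id, x, y): each unit is declared as a stratum, so the unit effects
are conditioned out rather than estimated.

```r
library(survival)

set.seed(123)
n_id <- 200
n_t  <- 5
d <- data.frame(id = rep(seq_len(n_id), each = n_t))
alpha <- rnorm(n_id)                        # unit effects
d$x <- rnorm(nrow(d)) + 0.5 * alpha[d$id]   # regressor correlated with them
d$y <- rbinom(nrow(d), 1, plogis(alpha[d$id] + 1 * d$x))

# strata(id) conditions on each unit; units with no variation in y
# contribute nothing to the conditional likelihood
fe_logit <- clogit(y ~ x + strata(id), data = d)
coef(fe_logit)
```

The true slope in the simulation is 1, so the estimate should land in that
neighbourhood.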

To my knowledge, there is currently no implementation of the random
effect model for probit and logit.

Yves



Re: [R] Panel data with binary dependent variable

2010-05-11 Thread Daniel Malter

RE models are available in the lme4 and MASS packages in the glmer and
glmmPQL functions, respectively.
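
A minimal sketch of both calls on simulated data (the names d, id, x and y
are made up). glmmPQL() comes with MASS; the glmer() part is guarded because
it assumes lme4 is installed.

```r
library(MASS)   # glmmPQL(); it relies on nlme internally

set.seed(321)
n_id <- 100
n_t  <- 6
d <- data.frame(id = factor(rep(seq_len(n_id), each = n_t)))
u <- rnorm(n_id, sd = 0.8)                 # random intercepts
d$x <- rnorm(nrow(d))
d$y <- rbinom(nrow(d), 1, plogis(-0.2 + 1 * d$x + u[as.integer(d$id)]))

# Random-intercept logit via penalized quasi-likelihood
re_pql <- glmmPQL(y ~ x, random = ~ 1 | id, family = binomial,
                  data = d, verbose = FALSE)
nlme::fixef(re_pql)

# The same model by maximum likelihood, if lme4 is installed
if (requireNamespace("lme4", quietly = TRUE)) {
  re_ml <- lme4::glmer(y ~ x + (1 | id), family = binomial, data = d)
  print(lme4::fixef(re_ml))
}
```

For a random effect probit, family = binomial(link = "probit") can be used in
either call.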


Re: [R] panel data

2010-04-04 Thread Tal Galili
Thanks David, I never thought of using merge for this.

I usually used the cast command from the reshape package for this type
of task.


Cheers,
Tal







On Fri, Apr 2, 2010 at 11:31 PM, David Winsemius dwinsem...@comcast.net wrote:


 On Apr 2, 2010, at 3:39 PM, Geoffrey Smith wrote:

  Hello, I have an unbalanced panel data set that looks like:

 ID,YEAR,HEIGHT
 Tom,2007,65
 Tom,2008,66
 Mary,2007,45
 Mary,2008,50
 Harry,2007,62
 Harry,2008,62
 James,2007,68
 Jack,2007,70
 Jordan,2008,72

 That is, James, Jack, and Jordan are missing a YEAR.

 Is there any command that will fill in the missing YEAR such that the end
 result will be balanced and look like:

 ID,YEAR,HEIGHT
 Tom,2007,65
 Tom,2008,66
 Mary,2007,45
 Mary,2008,50
 Harry,2007,62
 Harry,2008,62
 James,2007,68
 James,2008,NA
 Jack,2007,70
 Jack,2008,NA
 Jordan,2007,NA
 Jordan,2008,72


 It's not one command but it's an approach ...  assumes you have data in a
 dataframe named ftbl:

  fexp <- expand.grid(ID=unique(ftbl$ID), YEAR=unique(ftbl$YEAR))
  merge(fexp, ftbl, all=TRUE)

        ID YEAR HEIGHT
 1   Harry 2007     62
 2   Harry 2008     62
 3    Jack 2007     70
 4    Jack 2008     NA
 5   James 2007     68
 6   James 2008     NA
 7  Jordan 2007     NA
 8  Jordan 2008     72
 9    Mary 2007     45
 10   Mary 2008     50
 11    Tom 2007     65
 12    Tom 2008     66



  --

 David Winsemius, MD
 West Hartford, CT




Re: [R] panel data

2010-04-02 Thread David Winsemius


On Apr 2, 2010, at 3:39 PM, Geoffrey Smith wrote:


Hello, I have an unbalanced panel data set that looks like:

ID,YEAR,HEIGHT
Tom,2007,65
Tom,2008,66
Mary,2007,45
Mary,2008,50
Harry,2007,62
Harry,2008,62
James,2007,68
Jack,2007,70
Jordan,2008,72

That is, James, Jack, and Jordan are missing a YEAR.

Is there any command that will fill in the missing YEAR such that the end
result will be balanced and look like:

ID,YEAR,HEIGHT
Tom,2007,65
Tom,2008,66
Mary,2007,45
Mary,2008,50
Harry,2007,62
Harry,2008,62
James,2007,68
James,2008,NA
Jack,2007,70
Jack,2008,NA
Jordan,2007,NA
Jordan,2008,72


It's not one command but it's an approach ...  assumes you have data  
in a dataframe named ftbl:


 fexp <- expand.grid(ID=unique(ftbl$ID), YEAR=unique(ftbl$YEAR))
 merge(fexp, ftbl, all=TRUE)

       ID YEAR HEIGHT
1   Harry 2007     62
2   Harry 2008     62
3    Jack 2007     70
4    Jack 2008     NA
5   James 2007     68
6   James 2008     NA
7  Jordan 2007     NA
8  Jordan 2008     72
9    Mary 2007     45
10   Mary 2008     50
11    Tom 2007     65
12    Tom 2008     66



 --

David Winsemius, MD
West Hartford, CT



Re: [R] panel data

2010-04-02 Thread Gabor Grothendieck
Try this:

 as.data.frame.table(tapply(DF[,3], DF[2:1], c), responseName = names(DF)[3])
   YEAR     ID HEIGHT
1  2007  Harry     62
2  2008  Harry     62
3  2007   Jack     70
4  2008   Jack     NA
5  2007  James     68
6  2008  James     NA
7  2007 Jordan     NA
8  2008 Jordan     72
9  2007   Mary     45
10 2008   Mary     50
11 2007    Tom     65
12 2008    Tom     66
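
For anyone wanting to run this end to end, here is a self-contained sketch
that rebuilds the posted data as DF, shows the intermediate table the
one-liner relies on, and compares it with the expand.grid() and merge()
approach from the same thread. The object names wide, balanced and merged
are only for illustration.

```r
DF <- read.csv(text = "ID,YEAR,HEIGHT
Tom,2007,65
Tom,2008,66
Mary,2007,45
Mary,2008,50
Harry,2007,62
Harry,2008,62
James,2007,68
Jack,2007,70
Jordan,2008,72")

# tapply() over (YEAR, ID) builds a YEAR-by-ID matrix; cells with no
# observation are NA
wide <- tapply(DF[, 3], DF[2:1], c)
wide

# as.data.frame.table() turns every cell, including the NAs, back into a row
balanced <- as.data.frame.table(wide, responseName = names(DF)[3])

# The same set of rows via expand.grid() and merge()
fexp <- expand.grid(ID = unique(DF$ID), YEAR = unique(DF$YEAR))
merged <- merge(fexp, DF, all = TRUE)
```

One difference to keep in mind: in balanced, YEAR and ID come back as
factors, so YEAR needs as.integer(as.character(balanced$YEAR)) if it is to be
used as a number.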



On Fri, Apr 2, 2010 at 3:39 PM, Geoffrey Smith g...@asu.edu wrote:
 Hello, I have an unbalanced panel data set that looks like:

 ID,YEAR,HEIGHT
 Tom,2007,65
 Tom,2008,66
 Mary,2007,45
 Mary,2008,50
 Harry,2007,62
 Harry,2008,62
 James,2007,68
 Jack,2007,70
 Jordan,2008,72

 That is, James, Jack, and Jordan are missing a YEAR.

 Is there any command that will fill in the missing YEAR such that the end
 result will be balanced and look like:

 ID,YEAR,HEIGHT
 Tom,2007,65
 Tom,2008,66
 Mary,2007,45
 Mary,2008,50
 Harry,2007,62
 Harry,2008,62
 James,2007,68
 James,2008,NA
 Jack,2007,70
 Jack,2008,NA
 Jordan,2007,NA
 Jordan,2008,72

 Thank you.  Geoff

 --
 Geoffrey Smith
 Visiting Assistant Professor
 Department of Finance
 W. P. Carey School of Business
 Arizona State University
