Re: [R] Panel Data Help
I am going to go out on a limb and say that the answer to your question is "Yes". However, I cannot decipher specifics from your description. If you want a more useful answer, you need to follow the advice in the Posting Guide mentioned in the footer (including posting in plain text rather than HTML, and providing some sample data). You will also benefit from reading [1], with particular attention to using the dput function.

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

--
Sent from my phone. Please excuse my brevity.

On February 1, 2016 12:36:36 PM PST, Daniel Dorchuck wrote:
>Hi,
>
>I'm currently working on an econometrics project on banking and looking to
>merge a dataframe of bank specific data with dataframes of macro variables.
>I am then going to transform the data set into a plm dataframe using the plm
>package. The bank specific observations are indexed across time while the
>macro ones are indexed only across time. Is there a way to merge the two so
>I can use both in my panel regression?
>
>Best,
>Dan

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
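For readers with the same question, here is a minimal sketch of one way the merge could look. It assumes the bank data has one row per bank and year, and the macro data has one row per year. All data frame and column names (banks, macro, bank, year, roa, gdp_growth) are hypothetical and do not come from the original post.

```r
# Hypothetical bank-level panel: one row per bank and year
banks <- data.frame(
  bank = rep(c("A", "B", "C"), each = 3),
  year = rep(2010:2012, times = 3),
  roa  = c(1.1, 1.3, 0.9, 0.7, 0.8, 1.0, 1.5, 1.4, 1.6)
)

# Hypothetical macro series: one row per year, no bank dimension
macro <- data.frame(
  year       = 2010:2012,
  gdp_growth = c(2.5, 1.6, 2.2)
)

# Merge on the shared time index; each macro value is repeated for every bank
panel <- merge(banks, macro, by = "year", all.x = TRUE)
panel <- panel[order(panel$bank, panel$year), ]
print(panel)

# If the plm package is installed, the merged data could then be declared
# as a panel (not run here):
# pdat <- plm::pdata.frame(panel, index = c("bank", "year"))
```

Because the macro variables do not vary across banks within a year, they behave like time effects. Whether they can be estimated alongside time dummies depends on the model specification.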
Re: [R] Panel Data--filling in missing dates in a span only
Steve,

Here is one approach that works. I am calling your first data frame df.

    # list all years from min to max observed in each ID
    years <- tapply(df$Date, df$ID, function(x) min(x):max(x))
    # create a data frame based on the observed range of years
    fulldf <- data.frame(ID=rep(names(years), sapply(years, length)),
                         Date=unlist(years))
    # merge the data frame of observations with the data frame with all years
    merge(fulldf, df, all=TRUE)

Jean

On Tue, Mar 10, 2015 at 5:53 PM, Steven Archambault archste...@gmail.com wrote:
> Hi folks,
>
> I have this panel data (below), with observations missing in each of the
> panels. I want to fill in years for the missing data, but only those years
> within the span of the existing data. For instance, BC-0002 needs one year,
> 1995. I do not want any years after the last observation.
>
> structure(list(ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
> 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L),
> .Label = c("BC-0002", "BC-0003", "BC-0004"), class = "factor"),
> Date = c(1989L, 1990L, 1991L, 1992L, 1993L, 1994L, 1996L, 1989L,
> 1990L, 1991L, 1992L, 1993L, 1994L, 1996L, 1995L, 1996L, 1997L,
> 1998L, 2000L, 1994L, 1993L, 1999L, 1998L),
> DepthtoWater_bgs = c(317.85, 317.25, 321.25, 312.31, 313.01, 330.41,
> 321.01, 166.58, 167.55, 168.65, 168.95, 169.25, 168.85, 169.75,
> 260.6, 261.65, 262.15, 265.45, 266.15, 265.25, 265.05, 266.95, 267.75)),
> .Names = c("ID", "Date", "DepthtoWater_bgs"), class = "data.frame",
> row.names = c(NA, -23L))
>
> I have been using this code to expand the entire panels, but it is not
> exactly what I want.
>
> fexp <- expand.grid(ID=unique(wells$ID), Date=unique(wells$Date))
> merge(fexp, wells, all=TRUE)
>
> Any help would be much appreciated!
>
> Thanks,
> Steve
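The same idea can be checked on a small toy panel. The sketch below is a variant of the approach above, not Jean's exact code: it builds the per-ID year spans with split() and does a left join. The data and the names obs, spans, skeleton, and filled are hypothetical.

```r
# Hypothetical toy panel: unit A lacks 2003, unit B lacks 2004
obs <- data.frame(
  ID    = c("A", "A", "A", "B", "B"),
  Date  = c(2001L, 2002L, 2004L, 2003L, 2005L),
  value = c(1.0, 1.5, 2.5, 7.0, 9.0),
  stringsAsFactors = FALSE
)

# For each unit, the full run of years between its first and last observation
spans <- lapply(split(obs$Date, obs$ID), function(d) seq(min(d), max(d)))

# One row per unit and year within that unit's own span
skeleton <- data.frame(
  ID   = rep(names(spans), lengths(spans)),
  Date = unlist(spans, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Left-join the observations onto the skeleton; gaps inside a span become NA
filled <- merge(skeleton, obs, by = c("ID", "Date"), all.x = TRUE)
filled <- filled[order(filled$ID, filled$Date), ]
print(filled)
```

Unit A is filled for 2001 to 2004 only and unit B for 2003 to 2005 only, so no years are added before the first or after the last observation of a unit.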
Re: [R] Panel data - replicating Stata's xtpcse in R
On Thu, 7 Apr 2011, Florian Markowetz wrote:

> Dear list,
>
> I am trying to replicate an econometrics study that was originally done in
> Stata. (Blanton and Blanton. 2009. A Sectoral Analysis of Human Rights and
> FDI: Does Industry Type Matter? International Studies Quarterly 53
> (2):469 - 493.)
>
> The model I try to replicate is in Stata given as
>
> xtpcse total_FDI lag_total ciri human_cap worker_rts polity_4 market income
> econ_growth log_trade fix_dollar fixed_xr xr_fluct lab_growth english,
> pairwise corr(ar1)
>
> According to the paper, this is an OLS regression with panel corrected
> standard errors including a lagged dependent variable (lag_total is
> total_FDI t-1) and controlling first order correlations within each panel
> (corr(ar1)).

I'm not sure about the Stata command (because I haven't got Stata installed myself) and how it translates to R. Other people might know better.

From the verbal description "OLS plus panel-corrected standard errors" I would have expected that the coefficients could be estimated by lm(), but that does not seem to be the case. Not sure why...it doesn't seem to be _O_LS then. Did you check that the Stata command produces the same output as indicated in the paper? (Maybe some data preprocessing is necessary...?)

In any case, I've had success with replicating such results with the plm package (see also http://www.jstatsoft.org/v27/i02/), typically using model = "pooling" (i.e., OLS) and then computing the standard errors via vcovBK(). The latter stands for Beck & Katz, which is what the pcse package also implements.

In a few other cases, I replicated the so-called panel-corrected standard errors via geeglm() from geepack (http://www.jstatsoft.org/v15/i02/), using the default corstr = "independence" (which again corresponds to OLS). Other corstr values could be employed.

Just as additional information: Many econometricians don't know much about the type of models that nlme estimates.
Usually, least squares technology is preferred in econometrics rather than likelihood-based ideas. Also, other multi-level models are rarely used. If specified in the same way, both approaches often yield similar results. There is a paragraph in the above-mentioned JSS paper on plm that discusses (dis)similarities with nlme.

Finally, a JSS paper on the pcse package is also waiting for publication in a special volume...hopefully online next month.

Good luck with the replication!
Best,
Z

> The BIG QUESTION is how to replicate this line in R.
>
> Econometrics is a new field to me, but a bit of searching showed that
> packages like plm, nlme, pcse should be able to handle this kind of problem.
> In particular, function gls() uses auto-correlation structure and pcse()
> corrects the standard errors of the fitted model.
>
> Below is some code to show what I have done, and some problems I ran into.
>
> ## setup and load data from web
> library(foreign)
> library(nlme)
> library(pcse)
> D <- read.dta("http://umdrive.memphis.edu/rblanton/public/ISQ_data/blanton_isq08_data.dta")
> D[544, "year"] <- 2005 ## fixing an unexpected NA in the year column
>
> ## Model formula
> form <- total_FDI ~ lag_total + ciri + human_cap + worker_rts + polity_4 +
>   market_size + income + econ_growth + log_trade + fixed_xr + fix_dollar +
>   xr_fluct + english + lab_growth
>
> ## Model 1: no auto-correlation
> res1 <- gls(model=form, data=D, correlation=NULL, na.action=na.omit)
> coefficients(res1)
>
> ## Model 2: with auto-correlation
> corr <- corAR1(.1, ~1|c_name)
> res2 <- gls(model=form, data=D, correlation=corr, na.action=na.omit)
> coefficients(res2)
>
> Now, I know from the paper what the Stata coefficients looked like. For
> example, for log_total it should be .852 and for market_size .21 (these were
> the two significant ones). The result of Model 1 is closer to this than the
> result of Model 2, but there is still quite a gap.
>
> The goal is to do OLS on panel data with AR(1) and PCSE - am I on the right
> track here?
> More specifically:
>
> Question 1: Auto-correlation
> - how to specify the parameter 'value' in corAR1 (the .1 above is completely
>   arbitrary)
> - Any other ideas how to translate Stata's corr(AR1) into R? (I'm not even
>   completely sure what Stata does there and didn't find any details in the
>   online manuals)
>
> Question 2: PCSE
> - the pcse function seems to work on objects of class 'lm' only. Any way to
>   use it for gls-objects?
>
> Any help is greatly appreciated!
> Florian
>
> --
> Florian Markowetz
> Cancer Research UK
> Cambridge Research Institute
> Li Ka Shing Centre
> Robinson Way, Cambridge, CB2 0RE, UK
> phone: +44 (0) 1223 40 4315
> email: florian.markow...@cancer.org.uk
> web : http://www.markowetzlab.org
> skype: florian.markowetz
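As an illustration of the pooling-plus-vcovBK() route suggested in the reply, here is a minimal sketch. It uses the Grunfeld data shipped with plm because the Blanton and Blanton data are not reproduced here, so it shows only the mechanics, not a replication. It does not apply any AR(1) correction, and the choice cluster = "time" is an assumption that should be checked against ?vcovBK.

```r
library(plm)

# Example panel shipped with plm (10 firms, 20 years); stands in for the FDI data
data("Grunfeld", package = "plm")

# Pooled OLS on the panel
fit <- plm(inv ~ value + capital, data = Grunfeld,
           index = c("firm", "year"), model = "pooling")

# The same coefficients from plain lm(), for comparison
ols <- lm(inv ~ value + capital, data = Grunfeld)

# Beck & Katz style covariance matrix; cluster = "time" is assumed here to be
# the variant aimed at cross-sectional correlation (see ?vcovBK)
bk <- vcovBK(fit, cluster = "time")
se_bk <- sqrt(diag(bk))

print(cbind(estimate = coef(fit), se_BK = se_bk))
```

The point estimates are identical to those from lm(); only the standard errors change. If Stata's corr(ar1) option alters the coefficients, as the question suggests, this sketch will not reproduce them.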
Re: [R] Panel Data Analysis in R
You wrote:

> Dear All,
> Can anyone provide me with reference notes (or steps) towards analysis of
> (un)balanced panel data in R.
> Thank you!

The plm package does panel data analysis in R. See the vignette at: cran.r-project.org/web/packages/plm/vignettes/plm.pdf. There are other similar articles by the same authors, Yves Croissant and Giovanni Millo, and one of these is the best to get you started.

If the plm package does not do all that you are looking for, you will have to explore the more general packages for mixed models (or hierarchical models or models for longitudinal data) in statistics, like nlme and lme4. See section 7 of the vignette for more details on these packages.

The plm package deals with panel data econometrics. Good books on the topic of panel data econometrics are:

  Wooldridge J (2002). Econometric Analysis of Cross Section and Panel Data. MIT Press.
  Baltagi B (2001). Econometric Analysis of Panel Data. 3rd edition. John Wiley and Sons Ltd.

but these books do not have any R code (or any other code for that matter).

The classic reference for mixed models in R is the book by Pinheiro and Bates:

  Pinheiro J, Bates D (2000). Mixed-Effects Models in S and S-PLUS. Springer-Verlag.
  Pinheiro J, Bates D, DebRoy S, Sarkar D, the R Core team (2007). nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-86, URL http://CRAN.R-project.org.

Other books that deal with longitudinal data are:

  Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence, by Judith D. Singer and John B. Willett (social science perspective)
  Longitudinal and Panel Data: Analysis and Applications in the Social Sciences, by Edward W. Frees (a statistician)

Both these books have URLs where you can get the R programs for the books. The URL for the Singer and Willett book is: http://www.ats.ucla.edu/stat/examples/alda.htm.
Jude Ryan
Director, Analytical Insights | MarketShare
1270 Avenue of the Americas, Suite # 2702, New York, NY 10020
P: 646.745.9916 x222 | M: 973.943.2029
www.marketshare.com
twitter.com/marketsharep
Re: [R] Panel data with binary dependent variable
To my knowledge, both fixed and random effect models may be estimated for the logit model, but only the random effect model for the probit model (because of the incidental parameter problem). I think clogit in the survival package fits the model that is called the fixed effect logit model in the econometrics literature. To my knowledge, there is currently no implementation of the random effect model for probit and logit.

Yves
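For illustration, a minimal sketch of the clogit() route on simulated data follows. The data-generating process and all variable names (id, x, alpha, y, d) are hypothetical. The individual effect is conditioned out by declaring each individual as a stratum.

```r
library(survival)

# Simulated binary panel: 200 individuals, 5 periods each (hypothetical data)
set.seed(1)
n_id  <- 200
n_t   <- 5
id    <- rep(seq_len(n_id), each = n_t)
x     <- rnorm(n_id * n_t)
alpha <- rep(rnorm(n_id), each = n_t)      # unobserved individual effect
y     <- rbinom(n_id * n_t, 1, plogis(alpha + 1 * x))
d     <- data.frame(id = id, x = x, y = y)

# Conditional (fixed effect) logit: strata(id) conditions out alpha
fit <- clogit(y ~ x + strata(id), data = d)
print(coef(fit))
```

Individuals whose outcome never changes over time contribute nothing to the conditional likelihood, and regressors that are constant within an individual cannot be estimated this way.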
Re: [R] Panel data with binary dependent variable
Random effects (RE) models are available in the lme4 and MASS packages, via the glmer and glmmPQL functions, respectively.
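A minimal sketch of a random-intercept logit with glmmPQL() is given below, on simulated data with hypothetical names (id, x, alpha, y, d). glmmPQL() is used here only because MASS and nlme ship with standard R installations. With lme4 installed, the analogous call would be glmer(y ~ x + (1 | id), family = binomial, data = d).

```r
library(MASS)

# Simulated binary panel: 200 individuals, 5 periods each (hypothetical data)
set.seed(1)
n_id  <- 200
n_t   <- 5
id    <- rep(seq_len(n_id), each = n_t)
x     <- rnorm(n_id * n_t)
alpha <- rep(rnorm(n_id), each = n_t)      # random individual intercept
y     <- rbinom(n_id * n_t, 1, plogis(alpha + 1 * x))
d     <- data.frame(id = factor(id), x = x, y = y)

# Random-intercept logit fitted by penalized quasi-likelihood
fit <- glmmPQL(y ~ x, random = ~ 1 | id, family = binomial,
               data = d, verbose = FALSE)
print(nlme::fixef(fit))
```

Penalized quasi-likelihood is an approximation and is known to be biased for binary outcomes with few observations per group, so the estimates should be read with that in mind.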
Re: [R] panel data
Thanks David,

I never thought of using merge for this. I usually use the cast command from the reshape package for this type of task.

Cheers,
Tal

Contact me: tal.gal...@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)

On Fri, Apr 2, 2010 at 11:31 PM, David Winsemius dwinsem...@comcast.net wrote:
> It's not one command but it's an approach ... assumes you have data in a
> dataframe named ftbl:
>
> fexp <- expand.grid(ID=unique(ftbl$ID), YEAR=unique(ftbl$YEAR))
> merge(fexp, ftbl, all=TRUE)
Re: [R] panel data
On Apr 2, 2010, at 3:39 PM, Geoffrey Smith wrote:

> Hello, I have an unbalanced panel data set that looks like:
>
> ID,YEAR,HEIGHT
> Tom,2007,65
> Tom,2008,66
> Mary,2007,45
> Mary,2008,50
> Harry,2007,62
> Harry,2008,62
> James,2007,68
> Jack,2007,70
> Jordan,2008,72
>
> That is, James, Jack, and Jordan are missing a YEAR. Is there any command
> that will fill in the missing YEAR such that the end result will be
> balanced and look like:
>
> ID,YEAR,HEIGHT
> Tom,2007,65
> Tom,2008,66
> Mary,2007,45
> Mary,2008,50
> Harry,2007,62
> Harry,2008,62
> James,2007,68
> James,2008,NA
> Jack,2007,70
> Jack,2008,NA
> Jordan,2007,NA
> Jordan,2008,72

It's not one command but it's an approach ... assumes you have data in a dataframe named ftbl:

    fexp <- expand.grid(ID=unique(ftbl$ID), YEAR=unique(ftbl$YEAR))
    merge(fexp, ftbl, all=TRUE)

           ID YEAR HEIGHT
    1   Harry 2007     62
    2   Harry 2008     62
    3    Jack 2007     70
    4    Jack 2008     NA
    5   James 2007     68
    6   James 2008     NA
    7  Jordan 2007     NA
    8  Jordan 2008     72
    9    Mary 2007     45
    10   Mary 2008     50
    11    Tom 2007     65
    12    Tom 2008     66

--
David Winsemius, MD
West Hartford, CT
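A self-contained version of this approach, reading the posted data from inline text so that it can be run as-is, might look like the sketch below. The object name balanced and the use of read.csv(text = ...) are additions for illustration.

```r
# The unbalanced panel from the question, read from inline text
ftbl <- read.csv(text = "ID,YEAR,HEIGHT
Tom,2007,65
Tom,2008,66
Mary,2007,45
Mary,2008,50
Harry,2007,62
Harry,2008,62
James,2007,68
Jack,2007,70
Jordan,2008,72", stringsAsFactors = FALSE)

# All ID x YEAR combinations, then a full outer merge with the observed rows
fexp <- expand.grid(ID = unique(ftbl$ID), YEAR = unique(ftbl$YEAR),
                    stringsAsFactors = FALSE)
balanced <- merge(fexp, ftbl, all = TRUE)
print(balanced)
```

Every ID now appears once per year, and the three ID-year combinations absent from the input carry NA in HEIGHT.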
Re: [R] panel data
Try this:

    as.data.frame.table(tapply(DF[,3], DF[2:1], c), responseName = names(DF)[3])

       YEAR     ID HEIGHT
    1  2007  Harry     62
    2  2008  Harry     62
    3  2007   Jack     70
    4  2008   Jack     NA
    5  2007  James     68
    6  2008  James     NA
    7  2007 Jordan     NA
    8  2008 Jordan     72
    9  2007   Mary     45
    10 2008   Mary     50
    11 2007    Tom     65
    12 2008    Tom     66

On Fri, Apr 2, 2010 at 3:39 PM, Geoffrey Smith g...@asu.edu wrote:
> Hello, I have an unbalanced panel data set that looks like:
>
> ID,YEAR,HEIGHT
> Tom,2007,65
> Tom,2008,66
> Mary,2007,45
> Mary,2008,50
> Harry,2007,62
> Harry,2008,62
> James,2007,68
> Jack,2007,70
> Jordan,2008,72
>
> That is, James, Jack, and Jordan are missing a YEAR. Is there any command
> that will fill in the missing YEAR such that the end result will be
> balanced and look like:
>
> ID,YEAR,HEIGHT
> Tom,2007,65
> Tom,2008,66
> Mary,2007,45
> Mary,2008,50
> Harry,2007,62
> Harry,2008,62
> James,2007,68
> James,2008,NA
> Jack,2007,70
> Jack,2008,NA
> Jordan,2007,NA
> Jordan,2008,72
>
> Thank you. Geoff
>
> --
> Geoffrey Smith
> Visiting Assistant Professor
> Department of Finance
> W. P. Carey School of Business
> Arizona State University