Re: [R] create a dummy variables for companies with complete history.
You may want to consider another way of getting your answer that takes advantage of some of R's features:

# Make some example data
cods <- LETTERS[1:10]   # Ten companies
yrs <- 2010:2014        # 5 years
set.seed(42)            # Set random seed so we all get the same values
# Chances of revenue for a given year are 95%
rev <- round(rbinom(50, 1, .95)*runif(50, 25, 50), 2)
z <- data.frame(expand.grid(year=yrs, cod=cods)[, 2:1], rev)
# Remove years with missing (0) revenue
z <- z[z$rev > 1, ]
str(z)
'data.frame':   45 obs. of  3 variables:
 $ cod : Factor w/ 10 levels "A","B","C","D",..: 1 1 1 1 1 2 2 2 2 2 ...
 $ year: int  2010 2011 2012 2013 2014 2010 2011 2012 2013 2014 ...
 $ rev : num  33.3 33.7 35 44.6 26 ...

# Construct the dummy variable
tbl <- xtabs(~cod+year, z)
tbl
   year
cod 2010 2011 2012 2013 2014
  A    1    1    1    1    1
  B    1    1    1    1    1
  C    1    1    1    1    1
  D    1    0    1    1    1
  E    1    1    0    1    1
  F    1    1    1    1    1
  G    1    1    1    1    1
  H    1    1    1    1    1
  I    1    1    1    0    1
  J    0    1    1    0    1
dummy <- as.integer(apply(tbl, 1, all))
dummy
 [1] 1 1 1 0 0 1 1 1 0 0

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Michael Dewey
Sent: Wednesday, June 24, 2015 2:12 PM
To: giacomo begnis; r-help@r-project.org
Subject: Re: [R] create a dummy variables for companies with complete history.

Comments below

On 24/06/2015 19:26, giacomo begnis wrote:
> Hi, I have a dataset (728 obs) containing three variables: code of a
> company, year and revenue. Some companies have a complete history of 5
> years, others do not have a complete history (for instance, observations
> for three or four years). I would like to determine the companies with a
> complete history using a dummy variable. I have written the following
> program but there is something wrong, because the dummy variable that I
> have created is always equal to zero. Can somebody help me?
> Thanks, gm
>
> z<-read.table(file="c:/Rp/cddat.txt", sep="", header=T)
> attach(z)
> n<-length(z$cod)   // number of obs dataset

Could also use nrow(z)

> d1<-numeric(n)   // dummy variable
> for (i in 5:n) {
>   if (z$cod[i]==z$cod[i-4])   // cod is the code of a company
>   { d1[i]=1} else { d1[i]=0}   // d1=1 for a company with complete history, d1=0 if the history is not complete
> }
> d1

Did you really type <= which means less than or equals to? If so, try replacing it with <- and see what happens.

> When I run the program d1 is always equal to zero. Why?
> Once I have created the dummy variable, with subset I obtain the codes of
> the companies with a complete history, and finally with a merge I
> determine a panel of companies with a complete history. But how to
> determine d1 correctly?
> My best regards, gm
>
> [[alternative HTML version deleted]]

--
Michael
http://www.dewey.myzen.co.uk/home.html

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
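The xtabs() reply above yields one flag per company, whereas the original question goes on to subset the data to a panel of complete companies. A minimal sketch of that last step follows, assuming the same example data as in that reply. The column name complete and the object name panel are made up for illustration, and tbl > 0 is used here so that all() receives logical values.

```r
# Rebuild the example data from the reply above
cods <- LETTERS[1:10]
yrs <- 2010:2014
set.seed(42)
rev <- round(rbinom(50, 1, .95) * runif(50, 25, 50), 2)
z <- data.frame(expand.grid(year = yrs, cod = cods)[, 2:1], rev)
z <- z[z$rev > 1, ]

# Company-by-year counts; a company is complete when every year is present
tbl <- xtabs(~ cod + year, z)
dummy <- as.integer(apply(tbl > 0, 1, all))
names(dummy) <- rownames(tbl)   # label each flag with its company code

# Look up each row's company flag by name (hypothetical column: complete)
z$complete <- unname(dummy[as.character(z$cod)])

# Keep only the rows of companies with a full five-year history
panel <- z[z$complete == 1, ]
```

Indexing the named vector by company code avoids a separate merge() call, although merge() on a small data frame of codes and flags should give the same panel.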
Re: [R] create a dummy variables for companies with complete history.
Giacomo,

Please include some representative data. It is not clear why your offset of 4 (z$cod[i - 4]) is going to be an accurate surrogate for complete data.

Since I do not have your data set or its true structure I am having to guess.

# make 5 copies of 200 companies
companies <- paste0(rep(LETTERS[1:4], 5, each = 50), rep(1:50, 5))
companies <- companies[order(companies)]
years <- rep(1:5, 200)
z <- data.frame(cod = companies, year = years, revenue = round(rnorm(1000, mean = 10, sd = 1)))
# trim this down to the 728 rows you have by pulling out records at random
set.seed(1) # so that you can repeat these results
z <- z[sample.int(1000, 728), ]
z <- z[order(z$cod, z$year), ]
# No matter how you order these data, your offset approach will not tell you which companies have full records.
head(z, 10)
   cod year revenue
1   A1    1  112192
2   A1    2  105840
4   A1    4  112357
5   A1    5   91772
7  A10    2  102601
8  A10    3  105183
11 A11    1  101269
12 A11    2  100719
14 A11    4   86138
15 A11    5  105044

# You can do something like the following.
counts <- table(z$cod)
complete <- names(counts[as.integer(counts) == 5])
# It is probably better to keep the dummy variable inside the dataframe.
z$complete <- ifelse(z$cod %in% complete, TRUE, FALSE)
head(z, 20)
   cod year revenue complete
1   A1    1  112192    FALSE
2   A1    2  105840    FALSE
4   A1    4  112357    FALSE
5   A1    5   91772    FALSE
7  A10    2  102601    FALSE
8  A10    3  105183    FALSE
11 A11    1  101269    FALSE
12 A11    2  100719    FALSE
14 A11    4   86138    FALSE
15 A11    5  105044    FALSE
20 A12    5   95872    FALSE
21 A13    1   78513     TRUE
22 A13    2   90502     TRUE
23 A13    3  108683     TRUE
24 A13    4  110711     TRUE
25 A13    5   87842     TRUE
28 A14    3   99939    FALSE
30 A14    5  111289    FALSE
31 A15    1  100930    FALSE
32 A15    2   93765    FALSE

Do not use HTML. Use plain text.
The character string // is not a comment indicator in R.
Do not use attach(). It does not do anything in your example, but it is poor practice.
Always write out TRUE and FALSE.

R. Mark Sharp, Ph.D.
msh...@txbiomed.org

On Jun 24, 2015, at 1:26 PM, giacomo begnis <gmbeg...@yahoo.it> wrote:

> Hi, I have a dataset (728 obs) containing three variables: code of a
> company, year and revenue. Some companies have a complete history of 5
> years, others do not have a complete history (for instance, observations
> for three or four years). I would like to determine the companies with a
> complete history using a dummy variable. I have written the following
> program but there is something wrong, because the dummy variable that I
> have created is always equal to zero. Can somebody help me?
> Thanks, gm
>
> z<-read.table(file="c:/Rp/cddat.txt", sep="", header=T)
> attach(z)
> n<-length(z$cod)   // number of obs dataset
> d1<-numeric(n)   // dummy variable
> for (i in 5:n) {
>   if (z$cod[i]==z$cod[i-4])   // cod is the code of a company
>   { d1[i]=1} else { d1[i]=0}   // d1=1 for a company with complete history, d1=0 if the history is not complete
> }
> d1
>
> When I run the program d1 is always equal to zero. Why?
> Once I have created the dummy variable, with subset I obtain the codes of
> the companies with a complete history, and finally with a merge I
> determine a panel of companies with a complete history. But how to
> determine d1 correctly?
> My best regards, gm
>
> [[alternative HTML version deleted]]
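Counting rows per company with table(), as in the reply above, equates "five rows" with "five years", which holds only if no company-year is recorded twice. A hedged sketch that counts distinct years instead is shown below. The toy data, the n_years value and the years_seen name are assumptions made for illustration and are not taken from the poster's file.

```r
# Hypothetical toy data: company B lacks 2012, company C repeats 2010
z <- data.frame(
  cod = c(rep("A", 5), rep("B", 4), rep("C", 5)),
  year = c(2010:2014, 2010, 2011, 2013, 2014, 2010, 2010, 2011, 2012, 2013),
  revenue = 1:14
)
n_years <- 5   # length of a complete history (assumed)

# For every row, the number of distinct years observed for that row's company
years_seen <- ave(z$year, z$cod, FUN = function(y) length(unique(y)))

# Row-level flag kept inside the data frame
z$complete <- years_seen == n_years

# Only company A qualifies: C has five rows but just four distinct years
unique(z$cod[z$complete])
```

If the data are known to hold at most one row per company and year, the plain row count from table() gives the same answer with less code.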
Re: [R] create a dummy variables for companies with complete history.
Please repost your question in plain text rather than HTML - you can see below that your code got rather mangled.

Please also include some sample data using dput() - made-up data of similar form is fine, but it's very hard to answer a question based on guessing what the data look like.

Sarah

On Wed, Jun 24, 2015 at 2:26 PM, giacomo begnis <gmbeg...@yahoo.it> wrote:
> Hi, I have a dataset (728 obs) containing three variables: code of a
> company, year and revenue. Some companies have a complete history of 5
> years, others do not have a complete history (for instance, observations
> for three or four years). I would like to determine the companies with a
> complete history using a dummy variable. I have written the following
> program but there is something wrong, because the dummy variable that I
> have created is always equal to zero. Can somebody help me?
> Thanks, gm
>
> z<-read.table(file="c:/Rp/cddat.txt", sep="", header=T)
> attach(z)
> n<-length(z$cod)   // number of obs dataset
> d1<-numeric(n)   // dummy variable
> for (i in 5:n) {
>   if (z$cod[i]==z$cod[i-4])   // cod is the code of a company
>   { d1[i]=1} else { d1[i]=0}   // d1=1 for a company with complete history, d1=0 if the history is not complete
> }
> d1
>
> When I run the program d1 is always equal to zero. Why?
> Once I have created the dummy variable, with subset I obtain the codes of
> the companies with a complete history, and finally with a merge I
> determine a panel of companies with a complete history. But how to
> determine d1 correctly?
> My best regards, gm
>
> [[alternative HTML version deleted]]

--
Sarah Goslee
http://www.functionaldiversity.org
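For readers unfamiliar with the dput() request above, here is a small sketch of how sample data can be shared as plain text. The data frame is a made-up stand-in with the three variables the poster describes, and z_copy is a hypothetical name.

```r
# Hypothetical stand-in for the poster's data: company code, year, revenue
z <- data.frame(
  cod = c("A", "A", "B"),
  year = c(2010L, 2011L, 2010L),
  revenue = c(33.3, 35.0, 26.1),
  stringsAsFactors = FALSE
)

# Prints R code that recreates z; that text can be pasted into a plain-text email
dput(z)

# A reader rebuilds the object by evaluating the pasted text;
# deparse() produces the same text that dput() prints
z_copy <- eval(parse(text = deparse(z)))
all.equal(z, z_copy)
```

For a large data set, dput(head(z, 20)) or a small made-up frame of the same shape is usually enough for others to reproduce the problem.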
Re: [R] create a dummy variables for companies with complete history.
Comments below

On 24/06/2015 19:26, giacomo begnis wrote:
> Hi, I have a dataset (728 obs) containing three variables: code of a
> company, year and revenue. Some companies have a complete history of 5
> years, others do not have a complete history (for instance, observations
> for three or four years). I would like to determine the companies with a
> complete history using a dummy variable. I have written the following
> program but there is something wrong, because the dummy variable that I
> have created is always equal to zero. Can somebody help me?
> Thanks, gm
>
> z<-read.table(file="c:/Rp/cddat.txt", sep="", header=T)
> attach(z)
> n<-length(z$cod)   // number of obs dataset

Could also use nrow(z)

> d1<-numeric(n)   // dummy variable
> for (i in 5:n) {
>   if (z$cod[i]==z$cod[i-4])   // cod is the code of a company
>   { d1[i]=1} else { d1[i]=0}   // d1=1 for a company with complete history, d1=0 if the history is not complete
> }
> d1

Did you really type <= which means less than or equals to? If so, try replacing it with <- and see what happens.

> When I run the program d1 is always equal to zero. Why?
> Once I have created the dummy variable, with subset I obtain the codes of
> the companies with a complete history, and finally with a merge I
> determine a panel of companies with a complete history. But how to
> determine d1 correctly?
> My best regards, gm
>
> [[alternative HTML version deleted]]

--
Michael
http://www.dewey.myzen.co.uk/home.html
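The two comments in the message above (nrow(z) as an alternative to length(z$cod), and <- versus <=) can be illustrated with a short sketch. The three-row data frame is invented for the example and is not the poster's data.

```r
# Hypothetical three-row data frame standing in for the poster's file
z <- data.frame(cod = c("A", "A", "B"), year = c(2010, 2011, 2010))

n <- length(z$cod)   # number of observations, counted through one column
n == nrow(z)         # nrow() gives the same count directly

d1 <- numeric(n)     # "<-" assigns: d1 is now a vector of n zeros
d1 <= 1              # "<=" only compares: one logical per element, d1 is unchanged
```

A line such as d1 <= numeric(n) would therefore fail with an error if d1 does not exist yet, or silently return a comparison without creating the dummy variable.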