Re: [R] create a dummy variables for companies with complete history.

2015-06-24 Thread David L Carlson
You may want to consider another way of getting your answer that takes 
advantage of some of R's features:

> # Make some example data
> cods <- LETTERS[1:10] # Ten companies
> yrs <- 2010:2014 # 5 years
> set.seed(42) # Set random seed so we all get the same values
> # Chances of revenue for a given year are 95%
> rev <- round(rbinom(50, 1, .95)*runif(50, 25, 50), 2)
> z <- data.frame(expand.grid(year=yrs, cod=cods)[, 2:1], rev)
> # Remove years with missing (0) revenue
> z <- z[z$rev > 1, ]
> str(z)
'data.frame':   45 obs. of  3 variables:
 $ cod : Factor w/ 10 levels "A","B","C","D",..: 1 1 1 1 1 2 2 2 2 2 ...
 $ year: int  2010 2011 2012 2013 2014 2010 2011 2012 2013 2014 ...
 $ rev : num  33.3 33.7 35 44.6 26 ...
 
> # Construct the dummy variable
> tbl <- xtabs(~cod+year, z)
> tbl
   year
cod 2010 2011 2012 2013 2014
  A    1    1    1    1    1
  B    1    1    1    1    1
  C    1    1    1    1    1
  D    1    0    1    1    1
  E    1    1    0    1    1
  F    1    1    1    1    1
  G    1    1    1    1    1
  H    1    1    1    1    1
  I    1    1    1    0    1
  J    0    1    1    0    1
> dummy <- as.integer(apply(tbl, 1, all))
> dummy
 [1] 1 1 1 0 0 1 1 1 0 0
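
A possible follow-up, sketched here for illustration and not part of the original reply: the dummy above has one value per company, so it may be useful to carry it back to the rows of the data frame. The sketch below assumes a small made-up data frame (toy and its values are hypothetical) and rebuilds the table so that it runs on its own.

```r
# Hypothetical toy data: company B has no record for 2012
toy <- data.frame(
  cod  = factor(c("A", "A", "A", "B", "B")),
  year = c(2010, 2011, 2012, 2010, 2011),
  rev  = c(30.1, 41.5, 28.0, 35.2, 44.9)
)

# Cross-tabulate company by year: 1 = observed, 0 = missing
tbl <- xtabs(~ cod + year, toy)

# Company-level dummy, named by company code
dummy <- as.integer(apply(tbl > 0, 1, all))
names(dummy) <- rownames(tbl)

# Look up each row's company to get a row-level dummy
toy$dummy <- unname(dummy[as.character(toy$cod)])
print(toy)
```

Indexing the named vector by as.character(toy$cod) avoids relying on the order of the factor levels.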

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Michael Dewey
Sent: Wednesday, June 24, 2015 2:12 PM
To: giacomo begnis; r-help@r-project.org
Subject: Re: [R] create a dummy variables for companies with complete history.

Comments below

On 24/06/2015 19:26, giacomo begnis wrote:
> Hi, I have a dataset (728 obs) containing three variables: code of a company,
> year and revenue. Some companies have a complete history of 5 years, others
> do not have a complete history (for instance, observations for three or four
> years). I would like to identify the companies with a complete history using
> a dummy variable. I have written the following program, but there is something
> wrong because the dummy variable that I have created is always equal to
> zero. Can somebody help me? Thanks, gm
>
> z<-read.table(file="c:/Rp/cddat.txt", sep="", header=T)
> attach(z)
> n<-length(z$cod)  // number of obs dataset

Could also use nrow(z)

> d1<-numeric(n)   // dummy variable
>
> for (i in 5:n)  {
> if (z$cod[i]==z$cod[i-4]) // cod is the code of a company
> { d1[i]<=1} else { d1[i]<=0}  // d1=1 for a company with complete history,
> d1=0 if the history is not complete  }d1

Did you really type <= which means less than or equal to? If so, try 
replacing it with <- and see what happens.
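
To illustrate the difference, here is a small sketch that was not in the original message (the vector length of 3 and the names cmp and before are arbitrary choices for the example):

```r
d1 <- numeric(3)      # starts as 0 0 0

# '<=' is a comparison: it returns a logical value and leaves d1 unchanged
cmp    <- d1[2] <= 1  # is 0 less than or equal to 1?
before <- d1          # copy of d1 taken after the comparison

# '<-' is assignment: it stores 1 in the second element
d1[2] <- 1

print(cmp)
print(before)
print(d1)
```

A loop that only compares never modifies the vector, which would be consistent with a dummy that stays at zero.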

> When I run the program d1 is always equal to zero. Why?
> Once I have created the dummy variable, I use subset to obtain the codes of
> the companies with a complete history, and finally with a merge I build a
> panel of companies with a complete history. But how do I determine d1
> correctly? My best regards, gm
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


-- 
Michael
http://www.dewey.myzen.co.uk/home.html


Re: [R] create a dummy variables for companies with complete history.

2015-06-24 Thread Mark Sharp
Giacomo,

Please include some representative data. It is not clear why your offset of 4 
(z$cod[i - 4]) is going to be an accurate surrogate for complete data.

Since I do not have your data set or its true structure I am having to guess.
# make 5 copies of 200 companies
companies <- paste0(rep(LETTERS[1:4], 5, each = 50), rep(1:50, 5))
companies <- companies[order(companies)]
years <- rep(1:5, 200)
z <- data.frame(cod = companies, year = years,
                revenue = round(rnorm(1000, mean = 10, sd = 1)))
# trim this down to the 728 rows you have by pulling out records at random
set.seed(1) # so that you can repeat these results
z <- z[sample.int(1000, 728), ]
z <- z[order(z$cod, z$year), ]

# No matter how you order these data, your offset approach will not tell you
# which companies have full records.
> head(z, 10)
   cod year revenue
1   A1    1  112192
2   A1    2  105840
4   A1    4  112357
5   A1    5   91772
7  A10    2  102601
8  A10    3  105183
11 A11    1  101269
12 A11    2  100719
14 A11    4   86138
15 A11    5  105044

# You can do something like the following.

counts <- table(z$cod)
complete <- names(counts[as.integer(counts) == 5])
# It is probably better to keep the dummy variable inside the dataframe.
z$complete <- ifelse(z$cod %in% complete, TRUE, FALSE)

> head(z, 20)
   cod year revenue complete
1   A1    1  112192    FALSE
2   A1    2  105840    FALSE
4   A1    4  112357    FALSE
5   A1    5   91772    FALSE
7  A10    2  102601    FALSE
8  A10    3  105183    FALSE
11 A11    1  101269    FALSE
12 A11    2  100719    FALSE
14 A11    4   86138    FALSE
15 A11    5  105044    FALSE
20 A12    5   95872    FALSE
21 A13    1   78513     TRUE
22 A13    2   90502     TRUE
23 A13    3  108683     TRUE
24 A13    4  110711     TRUE
25 A13    5   87842     TRUE
28 A14    3   99939    FALSE
30 A14    5  111289    FALSE
31 A15    1  100930    FALSE
32 A15    2   93765    FALSE
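
The original poster also wanted a panel containing only the complete companies. A minimal sketch of that last step follows; it is an illustration, not part of the original reply. It uses a small made-up data frame (the values are hypothetical) and assumes there is at most one row per company and year, so that five rows means five years.

```r
# Hypothetical toy data: company "A1" has 5 years, "A2" only 3
z <- data.frame(
  cod     = rep(c("A1", "A2"), times = c(5, 3)),
  year    = c(1:5, 1, 2, 4),
  revenue = c(11, 10, 9, 12, 10, 8, 10, 11)
)

counts   <- table(z$cod)
complete <- names(counts[counts == 5])

# %in% already returns a logical vector, so it can be stored directly
z$complete <- z$cod %in% complete

# Keep only the rows of companies with a full five-year history
panel <- z[z$complete, ]
print(panel)
```

Because the flag lives in the data frame, a plain logical subset is enough here and no merge is needed.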
 
Do not use HTML. Use plain text. The character string // is not a comment 
indicator in R; use # instead. Do not use attach(). It does not do anything 
in your example, but it is poor practice. Always write out TRUE and FALSE 
rather than T and F.

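These points can be sketched in a few lines. This is an illustration only, not part of the original reply, and the data frame and variable names are hypothetical:

```r
# Comments in R start with '#'; '//' is not valid R syntax
z <- data.frame(cod = c("A", "A", "B"), rev = c(10, 12, 9))  # hypothetical data

# Instead of attach(z), refer to columns explicitly or use with()
n_rows   <- nrow(z)
mean_rev <- with(z, mean(rev))

# T and F are ordinary variables that can be reassigned; TRUE and FALSE cannot
T <- 0
t_is_true <- isTRUE(T)  # T no longer stands for TRUE here
rm(T)                   # remove the masking variable again

print(n_rows)
print(mean_rev)
print(t_is_true)
```
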
R. Mark Sharp, Ph.D.
msh...@txbiomed.org






Re: [R] create a dummy variables for companies with complete history.

2015-06-24 Thread Sarah Goslee
Please repost your question in plain text rather than HTML - you can
see below that your code got rather mangled. Please also include some
sample data using dput() - made-up data of similar form is fine, but
it's very hard to answer a question based on guessing what the data
look like.

Sarah
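
As an illustration of the dput() suggestion (a sketch that was not in the original message, using a made-up stand-in for the poster's data):

```r
# Hypothetical stand-in for the poster's data
z <- data.frame(
  cod     = c("A1", "A1", "A2"),
  year    = c(2010L, 2011L, 2010L),
  revenue = c(33.3, 33.7, 35.0)
)

# dput() prints R code that recreates the object;
# paste its output into the plain-text email
dput(head(z, 10))

# A reader can rebuild the data from the pasted text, for example:
pasted <- paste(capture.output(dput(z)), collapse = "\n")
z2     <- eval(parse(text = pasted))
print(all.equal(z, z2))
```

The round trip through capture.output() only simulates copying the text out of an email; in practice the reader would paste the dput() output after `z <-`.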

On Wed, Jun 24, 2015 at 2:26 PM, giacomo begnis <gmbeg...@yahoo.it> wrote:
> Hi, I have a dataset (728 obs) containing three variables: code of a company,
> year and revenue. Some companies have a complete history of 5 years, others
> do not have a complete history (for instance, observations for three or four
> years). I would like to identify the companies with a complete history using
> a dummy variable. I have written the following program, but there is something
> wrong because the dummy variable that I have created is always equal to
> zero. Can somebody help me? Thanks, gm
>
> z<-read.table(file="c:/Rp/cddat.txt", sep="", header=T)
> attach(z)
> n<-length(z$cod)  // number of obs dataset
>
> d1<-numeric(n)   // dummy variable
>
> for (i in 5:n)  {
> if (z$cod[i]==z$cod[i-4]) // cod is the code of a company
> { d1[i]<=1} else { d1[i]<=0}  // d1=1 for a company with complete history,
> d1=0 if the history is not complete  }d1
> When I run the program d1 is always equal to zero. Why?
> Once I have created the dummy variable, I use subset to obtain the codes of
> the companies with a complete history, and finally with a merge I build a
> panel of companies with a complete history. But how do I determine d1
> correctly? My best regards, gm
>
> [[alternative HTML version deleted]]



-- 
Sarah Goslee
http://www.functionaldiversity.org


