Re: [R] Creating dummy variables in r
On Jan 30, 2013, at 04:58, Bert Gunter wrote:

> You almost never need dummy variables in R. R creates them
> automatically from factors, given a model and possibly a contrasts
> specification.
>
> ?contrasts ## for some technical details.
>
> If you have not read "An Introduction to R", do so now. Pay particular
> attention to the chapter on modeling and categorical variables. You
> can also google around to find appropriate tutorials. Here is one:
>
> http://www.ats.ucla.edu/stat/r/modules/dummy_vars.htm
>
> I repeat: Do not create dummy variables by hand in R unless you have
> understood the above and have good reason to do so.

In this case it's a cutpoint-type situation, and the user might be excused for not wanting to deal with the mysteries of cut() (yet).

More importantly, the main issue here seems to be a lack of understanding of where new variables are located. I.e., if the data set is called dd, you need

  dd$prev1 <- (etc)

and if you use attach(), do it _after_ modifying the data (or detach() and reattach). Otherwise, new variables end up in the global environment. (This is logical enough once you realize that the result of a computation does not necessarily "fit" into the dataset.)

By the way: You don't need ifelse():

  as.numeric(ret1 >= .5)

or even just (ret1 >= .5) works.

> -- Bert
>
> On Tue, Jan 29, 2013 at 7:21 PM, Joseph Norman Thomson wrote:
>> Hello,
>>
>> Semi-new r user here and still learning the ropes. I am creating dummy
>> variables for a dataset on stock prices in r. One dummy variable is
>> called prev1 and is:
>>
>> prev1 <- ifelse(ret1 >= .5, 1, 0)
>>
>> where ret1 is the previous day's return.
>>
>> The variable "prev1" is created fine and works in my regression model
>> and for running conditional statistics. However, when I call the
>> names() function on the dataset the freshly created variable (prev1)
>> doesn't show up; also, when I export the dataset the prev1 variable
>> doesn't show up in the exported file. Is there a way to make the
>> variable show up on both the call function but more importantly on the
>> exported file? Or am I forced to create dummy variables elsewhere (much
>> tougher)?
>>
>> Thanks,
>>
>> Joe

--
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
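
The point above about where new variables end up can be sketched as follows. This is only an illustration: the data frame dd and its ret1 values are made up, and the column name prev1_f and the labels "below"/"atleast" are hypothetical; only prev1, ret1 and the .5 cutpoint come from the thread.

```r
# Hypothetical data standing in for the poster's data set
dd <- data.frame(ret1 = c(0.2, 0.7, 0.5, -0.1))

# Assigning to a bare name creates a variable in the global environment,
# not a column of dd
prev1 <- as.numeric(dd$ret1 >= .5)
"prev1" %in% names(dd)   # FALSE: dd still has only ret1

# Assigning with dd$ stores the result as a column of dd, so it appears
# in names(dd) and in anything exported from dd
dd$prev1 <- as.numeric(dd$ret1 >= .5)
names(dd)

# The cut() route: a single break at the cutpoint; right = FALSE makes
# the intervals closed on the left, so a value of exactly .5 falls in
# the upper interval, matching >=
dd$prev1_f <- cut(dd$ret1, breaks = c(-Inf, .5, Inf), right = FALSE,
                  labels = c("below", "atleast"))
```

Keeping the cut() result as a factor means a later model formula can use it directly, without a hand-made 0/1 column.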
Re: [R] Creating dummy variables in r
You almost never need dummy variables in R. R creates them automatically from factors, given a model and possibly a contrasts specification.

?contrasts ## for some technical details.

If you have not read "An Introduction to R", do so now. Pay particular attention to the chapter on modeling and categorical variables. You can also google around to find appropriate tutorials. Here is one:

http://www.ats.ucla.edu/stat/r/modules/dummy_vars.htm

I repeat: Do not create dummy variables by hand in R unless you have understood the above and have good reason to do so.

-- Bert

On Tue, Jan 29, 2013 at 7:21 PM, Joseph Norman Thomson wrote:
> Hello,
>
> Semi-new r user here and still learning the ropes. I am creating dummy
> variables for a dataset on stock prices in r. One dummy variable is
> called prev1 and is:
>
> prev1 <- ifelse(ret1 >= .5, 1, 0)
>
> where ret1 is the previous day's return.
>
> The variable "prev1" is created fine and works in my regression model
> and for running conditional statistics. However, when I call the
> names() function on the dataset the freshly created variable (prev1)
> doesn't show up; also, when I export the dataset the prev1 variable
> doesn't show up in the exported file. Is there a way to make the
> variable show up on both the call function but more importantly on the
> exported file? Or am I forced to create dummy variables elsewhere (much
> tougher)?
>
> Thanks,
>
> Joe

--

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
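
To make "R creates them automatically from factors" concrete, here is a small sketch using model.matrix(), which returns the design matrix a model formula would produce. The data frame d and its columns y and g are invented for illustration and do not come from the thread.

```r
# Hypothetical data: a numeric response and a three-level factor
d <- data.frame(y = c(1.0, 2.5, 3.1, 4.2, 5.0, 6.3),
                g = factor(c("a", "b", "c", "a", "b", "c")))

# The design matrix that lm(y ~ g, data = d) would use: an intercept
# plus k - 1 = 2 indicator columns, with "a" as the reference level
X <- model.matrix(y ~ g, data = d)
colnames(X)

# The contrast matrix that generated those indicator columns
contrasts(d$g)
```

The indicator columns exist only inside the design matrix; the data frame itself keeps the single factor column g.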
Re: [R] Creating Dummy Variables in R
Is your variable Clarity a categorical with 4 levels? Thus, the need for k-1 (3) dummies? Your error may be the result of creating k instead of k-1 dummies, but I can't be sure from the example.

In R, you don't have to (unless you really want to) explicitly create separate variables. You can use the internal contrast functions. See

?contr.treatment

which is dummy coding and is the default for unordered factors. You can specify which group is the reference group.

Alternatively, if you prefer effects coding, see

?contr.sum

There are others as well.

Tom Fletcher

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of whitaker m. (mw1006)
Sent: Wednesday, December 16, 2009 8:59 AM
To: r-help@r-project.org
Subject: [R] Creating Dummy Variables in R

Hi,
I am trying to create a set of dummy variables to use within a multiple linear regression and am unable to find the codes within the manuals.

For example I have:

Price   Weight   Clarity
                 IF   VVS1   VVS2
 500    8        1    0      0
1000    5.2      0    0      1
 864    3        0    1      0
 340    2.6      0    0      1
  90    0.5      1    0      0
 450    2.3      0    1      0

Where price is dependent upon weight (single value in each observation) and clarity (split into three levels, IF, VVS1, VVS2).
I am having trouble telling the program that clarity is a set of 3 dummy variables and keep getting error messages. What is the correct way?

Any help is greatly appreciated.

Matthew
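
As a rough sketch of the options mentioned above (choosing the reference group, or switching to effects coding), assuming the clarity grades are held in a factor. The object names clarity, clarity_ref and clarity_sum are hypothetical; the grade values are the ones from the poster's table.

```r
# Clarity grades from the thread, stored as a factor
clarity <- factor(c("IF", "VVS2", "VVS1", "VVS2", "IF", "VVS1"))

# Default treatment (dummy) coding: the first level, IF, is the reference
contrasts(clarity)

# Make VVS2 the reference group instead
clarity_ref <- relevel(clarity, ref = "VVS2")
levels(clarity_ref)

# Effects (sum-to-zero) coding, set on a copy of the factor
clarity_sum <- clarity
contrasts(clarity_sum) <- contr.sum(nlevels(clarity_sum))
contrasts(clarity_sum)
```

Setting the contrasts on the factor itself means any later model fit picks up the chosen coding without extra arguments.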
Re: [R] Creating Dummy Variables in R
On Wed, 16 Dec 2009, whitaker m. (mw1006) wrote:

> Hi,
> I am trying to create a set of dummy variables to use within a multiple
> linear regression and am unable to find the codes within the manuals.
>
> For example I have:
>
> Price   Weight   Clarity
>                  IF   VVS1   VVS2
>  500    8        1    0      0
> 1000    5.2      0    0      1
>  864    3        0    1      0
>  340    2.6      0    0      1
>   90    0.5      1    0      0
>  450    2.3      0    1      0
>
> Where price is dependent upon weight (single value in each observation)
> and clarity (split into three levels, IF, VVS1, VVS2).
> I am having trouble telling the program that clarity is a set of 3 dummy
> variables and keep getting error messages. What is the correct way?

You should code the categorical variable "Clarity" as a "factor" so that R knows that this is a categorical variable and can deal with it appropriately in subsequent computations such as summary() or lm().

Thus, I would recommend storing your data as

  dat <- data.frame(
    Price = c(500, 1000, 864, 340, 90, 450),
    Weight = c(8, 5.2, 3, 2.6, 0.5, 2.3),
    Clarity = c("IF", "VVS1", "VVS2")[c(1, 3, 2, 3, 1, 2)])

which yields, e.g.,

  R> summary(dat)
       Price            Weight      Clarity
   Min.   :  90.0   Min.   :0.500   IF  :2
   1st Qu.: 367.5   1st Qu.:2.375   VVS1:2
   Median : 475.0   Median :2.800   VVS2:2
   Mean   : 540.7   Mean   :3.600
   3rd Qu.: 773.0   3rd Qu.:4.650
   Max.   :1000.0   Max.   :8.000

and then you can also do

  R> lm(Price ~ Weight + Clarity, data = dat)

  Call:
  lm(formula = Price ~ Weight + Clarity, data = dat)

  Coefficients:
  (Intercept)       Weight  ClarityVVS1  ClarityVVS2
       -45.05        80.01       490.02       403.00

or if you wish to choose a different coding

  R> lm(Price ~ 0 + Weight + Clarity, data = dat)

  Call:
  lm(formula = Price ~ 0 + Weight + Clarity, data = dat)

  Coefficients:
       Weight    ClarityIF  ClarityVVS1  ClarityVVS2
        80.01       -45.05       444.97       357.95

Some further reading of introductory material on linear regression in R would be useful. Also look at ?lm, ?factor, ?model.matrix, ?contrasts etc.

hth,
Z

> Any help is greatly appreciated.
>
> Matthew
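
A note for readers on current R versions, with a short sketch. Since R 4.0.0, data.frame() no longer converts character vectors to factors by default, so the summary() output shown above requires an explicit factor() call (lm() itself still handles a character column by treating it as categorical). The wrapping in factor() and the object names fit and fit0 are additions for illustration, not part of the original reply.

```r
# Same data as in the reply, with Clarity made a factor explicitly
dat <- data.frame(
  Price   = c(500, 1000, 864, 340, 90, 450),
  Weight  = c(8, 5.2, 3, 2.6, 0.5, 2.3),
  Clarity = factor(c("IF", "VVS1", "VVS2")[c(1, 3, 2, 3, 1, 2)])
)
is.factor(dat$Clarity)

# With intercept: IF is the reference level, and the other two
# coefficients are differences from IF
fit <- lm(Price ~ Weight + Clarity, data = dat)

# Without intercept: one coefficient per clarity level
fit0 <- lm(Price ~ 0 + Weight + Clarity, data = dat)

coef(fit)
coef(fit0)
```

The two fits describe the same model in different parametrisations: each level coefficient in fit0 equals the intercept of fit plus the corresponding difference, which is why the Weight coefficient is identical in both printed outputs above.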
Re: [R] Creating Dummy Variables in R
On 12/16/2009 03:58 PM, whitaker m. (mw1006) wrote:
> Hi,
> I am trying to create a set of dummy variables to use within a multiple
> linear regression and am unable to find the codes within the manuals.
>
> For example I have:
>
> Price   Weight   Clarity
>                  IF   VVS1   VVS2
>  500    8        1    0      0
> 1000    5.2      0    0      1
>  864    3        0    1      0
>  340    2.6      0    0      1
>   90    0.5      1    0      0
>  450    2.3      0    1      0
>
> Where price is dependent upon weight (single value in each observation) and
> clarity (split into three levels, IF, VVS1, VVS2).
> I am having trouble telling the program that clarity is a set of 3 dummy
> variables and keep getting error messages. What is the correct way?

Without an example of your code, it's a bit difficult. But it might be easier to use one variable "Clarity" with three possible values (IF, VVS1, VVS2), defined as a factor.

  lm(Price ~ Weight + Clarity)

should then do the trick (unless you explicitly want to use a different dummy coding than the default).

Stephan
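
If the data already arrive as three 0/1 columns, as in the poster's table, one way to get to the single factor suggested above is sketched here. The data frame name raw and the helper vector grades are hypothetical, and the sketch assumes each row has exactly one 1 among the three columns.

```r
# The poster's layout: one 0/1 column per clarity grade
raw <- data.frame(
  Price  = c(500, 1000, 864, 340, 90, 450),
  Weight = c(8, 5.2, 3, 2.6, 0.5, 2.3),
  IF     = c(1, 0, 0, 0, 1, 0),
  VVS1   = c(0, 0, 1, 0, 0, 1),
  VVS2   = c(0, 1, 0, 1, 0, 0)
)

grades <- c("IF", "VVS1", "VVS2")

# max.col() gives, for each row, the index of the column holding the
# largest value, which here is the column containing the 1
idx <- max.col(as.matrix(raw[, grades]), ties.method = "first")
raw$Clarity <- factor(grades[idx], levels = grades)

# One factor in the formula; R builds the k - 1 dummies itself
fit <- lm(Price ~ Weight + Clarity, data = raw)
coef(fit)
```

Putting all three 0/1 columns into the formula alongside an intercept would make the design matrix rank deficient, which is the k versus k-1 issue raised earlier in the thread.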