Re: [R] Creating dummy variables in r

2013-01-30 Thread peter dalgaard

On Jan 30, 2013, at 04:58 , Bert Gunter wrote:

 You almost never need dummy variables in R. R creates them
 automatically from factors given model and possibly contrasts
 specification.
 
 ?contrasts  ## for some technical details.
 
 If you have not read An Introduction to R do so now. Pay particular
 attention to the chapter on modeling and categorical variables. You
 can also google around to find appropriate tutorials. Here is one:
 
 http://www.ats.ucla.edu/stat/r/modules/dummy_vars.htm
 
 I repeat: Do not create dummy variables by hand in R unless you have
 understood the above and have good reason to do so.

In this case it's a cutpoint-type situation, and the user might be excused for 
not wanting to deal with the mysteries of cut() (yet). 
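
For readers who do want to try cut(), here is a minimal sketch. The data are made up, and it assumes the intended split puts values at or above 0.5 in the upper group; the labels are hypothetical too.

```r
# Hypothetical returns; 'ret1' stands in for the poster's variable.
ret1 <- c(-0.2, 0.5, 0.7, 0.1, 1.3)

# right = FALSE closes the intervals on the left: [-Inf, 0.5) and [0.5, Inf)
grp <- cut(ret1, breaks = c(-Inf, 0.5, Inf), right = FALSE,
           labels = c("below", "at_or_above"))

table(grp)  # counts per group; grp is a factor and can go straight into a model
```

With more than one cutpoint, only the breaks and labels vectors need to grow.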

More importantly, the main issue here seems to be a lack of understanding of 
where new variables are located. I.e., if the data set is called dd, you need

dd$prev1 <- (etc)

and if you use attach(), do it _after_ modifying the data (or detach() and 
reattach).

Otherwise, new variables end up in the global environment. (This is logical 
enough once you realize that the result of a computation does not necessarily 
fit into the dataset.)
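
A small sketch of the difference, using a made-up data frame dd and assuming the comparison is "at or above 0.5". It uses with() rather than attach() only to keep the example free of side effects; the point about where the result lands is the same.

```r
# Hypothetical data frame standing in for the poster's dataset
dd <- data.frame(ret1 = c(-0.2, 0.5, 0.7, 0.1, 1.3))

# Computed from dd's columns, but assigned to a free-standing variable:
prev1 <- with(dd, ifelse(ret1 >= .5, 1, 0))
"prev1" %in% names(dd)   # the data frame itself is unchanged

# Assigning into the data frame makes it a column, so it will also be
# written out by write.csv(dd, ...) and friends:
dd$prev1 <- ifelse(dd$ret1 >= .5, 1, 0)
names(dd)
```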

By the way: You don't need ifelse(): as.numeric(ret1 >= .5) or even just
(ret1 >= .5) works.
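
A quick check of that equivalence, on made-up values and assuming the comparison is "at or above 0.5":

```r
ret1 <- c(-0.2, 0.5, 0.7, 0.1, 1.3)   # hypothetical returns

a <- ifelse(ret1 >= .5, 1, 0)   # the poster's version
b <- as.numeric(ret1 >= .5)     # TRUE/FALSE converted to 1/0
identical(a, b)

# The bare logical vector also behaves as 0/1 in arithmetic:
sum(ret1 >= .5)                 # number of days at or above the cutpoint
```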

 
 -- Bert
 
 On Tue, Jan 29, 2013 at 7:21 PM, Joseph Norman Thomson
 thoms...@email.arizona.edu wrote:
 Hello,
 
 Semi-new r user here and still learning the ropes. I am creating dummy
 variables for a dataset on stock prices in r. One dummy variable is
 called prev1 and is:
 
 prev1 <- ifelse(ret1 >= .5, 1, 0)
 
 where ret1 is the previous day's return.
 
 The variable prev1 is created fine and works in my regression model
 and for running conditional statistics. However, when I call the
 names() function on the dataset the freshly created variable (prev1)
 doesn't show up; also, when I export the dataset the prev1 variable
 doesn't show up in the exported file. Is there a way to make the
 variable show up on both the call function but more importantly on the
 exported file? Or am I forced to create dummy variables elsewhere(much
 tougher)?
 
 
 Thanks,
 
 Joe
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 
 
 -- 
 
 Bert Gunter
 Genentech Nonclinical Biostatistics
 
 Internal Contact Info:
 Phone: 467-7374
 Website:
 http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
 

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] Creating dummy variables in r

2013-01-29 Thread Bert Gunter
You almost never need dummy variables in R. R creates them
automatically from factors given model and possibly contrasts
specification.

?contrasts  ## for some technical details.

If you have not read An Introduction to R do so now. Pay particular
attention to the chapter on modeling and categorical variables. You
can also google around to find appropriate tutorials. Here is one:

http://www.ats.ucla.edu/stat/r/modules/dummy_vars.htm

I repeat: Do not create dummy variables by hand in R unless you have
understood the above and have good reason to do so.
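
To see the automatic dummy creation at work, here is a minimal sketch on made-up data (the names f and y are hypothetical): model.matrix() shows the columns R builds from a factor, and contrasts() shows the coding behind them.

```r
f <- factor(c("a", "b", "c", "a", "b"))   # hypothetical three-level factor
y <- c(1.0, 2.1, 2.9, 1.2, 1.9)           # hypothetical response

contrasts(f)                # default treatment coding, "a" is the reference
mm <- model.matrix(~ f)     # the 0/1 columns R generates for the model
mm

coef(lm(y ~ f))             # one coefficient per generated column
```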

-- Bert

On Tue, Jan 29, 2013 at 7:21 PM, Joseph Norman Thomson
thoms...@email.arizona.edu wrote:
 Hello,

 Semi-new r user here and still learning the ropes. I am creating dummy
 variables for a dataset on stock prices in r. One dummy variable is
 called prev1 and is:

 prev1 <- ifelse(ret1 >= .5, 1, 0)

 where ret1 is the previous day's return.

 The variable prev1 is created fine and works in my regression model
 and for running conditional statistics. However, when I call the
 names() function on the dataset the freshly created variable (prev1)
 doesn't show up; also, when I export the dataset the prev1 variable
 doesn't show up in the exported file. Is there a way to make the
 variable show up on both the call function but more importantly on the
 exported file? Or am I forced to create dummy variables elsewhere(much
 tougher)?


 Thanks,

 Joe




-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



Re: [R] Creating Dummy Variables in R

2009-12-16 Thread S Devriese
On 12/16/2009 03:58 PM, whitaker m. (mw1006) wrote:
 Hi,
 I am trying to create a set of dummy variables to use within a multiple 
 linear regression and am unable to find the codes within the manuals.
 
 For example I have:
 Price  Weight  Clarity
                IF  VVS1  VVS2
  500   8       1   0     0
 1000   5.2     0   0     1
  864   3       0   1     0
  340   2.6     0   0     1
   90   0.5     1   0     0
  450   2.3     0   1     0
 
 Where price is dependent upon weight (single value in each observation) and 
 clarity (split into three levels, IF, VVS1, VVS2).
 I am having trouble telling the program that clarity is a set of 3 dummy 
 variables and keep getting error messages, what is the correct way?
 

Without an example of your code, it's a bit difficult. But it might be
easier to use one variable clarity with three possible values (IF,
VVS1, VVS2), defined as a factor.
lm(Price ~ Weight + Clarity) should then do the trick (unless you
explicitly want to use a different dummy coding than the default)
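
If the data already arrive as three 0/1 columns, one way to collapse them into a single factor is sketched below. It assumes the layout in the question (columns IF, VVS1, VVS2, exactly one of them equal to 1 in each row); the names raw and lev are made up for the example.

```r
raw <- data.frame(
  Price  = c(500, 1000, 864, 340, 90, 450),
  Weight = c(8, 5.2, 3, 2.6, 0.5, 2.3),
  IF     = c(1, 0, 0, 0, 1, 0),
  VVS1   = c(0, 0, 1, 0, 0, 1),
  VVS2   = c(0, 1, 0, 1, 0, 0)
)

lev <- c("IF", "VVS1", "VVS2")
# max.col() gives, for each row, the index of the column holding the 1
raw$Clarity <- factor(lev[max.col(as.matrix(raw[, lev]))], levels = lev)

fit <- lm(Price ~ Weight + Clarity, data = raw)
coef(fit)
```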

Stephan



Re: [R] Creating Dummy Variables in R

2009-12-16 Thread Achim Zeileis

On Wed, 16 Dec 2009, whitaker m. (mw1006) wrote:


 Hi,
 I am trying to create a set of dummy variables to use within a multiple linear
 regression and am unable to find the codes within the manuals.

 For example I have:
 Price  Weight  Clarity
                IF  VVS1  VVS2
  500   8       1   0     0
 1000   5.2     0   0     1
  864   3       0   1     0
  340   2.6     0   0     1
   90   0.5     1   0     0
  450   2.3     0   1     0

 Where price is dependent upon weight (single value in each observation) and
 clarity (split into three levels, IF, VVS1, VVS2).
 I am having trouble telling the program that clarity is a set of 3 dummy
 variables and keep getting error messages, what is the correct way?


You should code the categorical variable Clarity as a factor so that R 
knows that this is a categorical variable and can deal with it 
appropriately in subsequent computations such as summary() or lm().


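Thus, I would recommend storing your data as

dat <- data.frame(
  Price = c(500, 1000, 864, 340, 90, 450),
  Weight = c(8, 5.2, 3, 2.6, 0.5, 2.3),
  ## factor() is needed in R >= 4.0.0, where data.frame() no longer
  ## converts character vectors to factors by default
  Clarity = factor(c("IF", "VVS1", "VVS2")[c(1, 3, 2, 3, 1, 2)]))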

which yields, e.g.,

R> summary(dat)
     Price            Weight      Clarity
 Min.   :  90.0   Min.   :0.500   IF  :2
 1st Qu.: 367.5   1st Qu.:2.375   VVS1:2
 Median : 475.0   Median :2.800   VVS2:2
 Mean   : 540.7   Mean   :3.600
 3rd Qu.: 773.0   3rd Qu.:4.650
 Max.   :1000.0   Max.   :8.000

and then you can also do

R> lm(Price ~ Weight + Clarity, data = dat)

Call:
lm(formula = Price ~ Weight + Clarity, data = dat)

Coefficients:
(Intercept)       Weight  ClarityVVS1  ClarityVVS2
     -45.05        80.01       490.02       403.00

or if you wish to choose a different coding

R> lm(Price ~ 0 + Weight + Clarity, data = dat)

Call:
lm(formula = Price ~ 0 + Weight + Clarity, data = dat)

Coefficients:
     Weight    ClarityIF  ClarityVVS1  ClarityVVS2
      80.01       -45.05       444.97       357.95


Some further reading of introductory material on linear regression in R 
would be useful. Also look at ?lm, ?factor, ?model.matrix, ?contrasts etc.
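
As a rough illustration of what ?model.matrix is about, the sketch below rebuilds the six-row data set from this message and compares the two codings. The object names X1, X0, m1 and m0 are made up for the example.

```r
dat <- data.frame(
  Price   = c(500, 1000, 864, 340, 90, 450),
  Weight  = c(8, 5.2, 3, 2.6, 0.5, 2.3),
  Clarity = factor(c("IF", "VVS1", "VVS2")[c(1, 3, 2, 3, 1, 2)]))

# Design matrices: intercept + 2 dummies versus no intercept + 3 dummies
X1 <- model.matrix(~ Weight + Clarity, data = dat)
X0 <- model.matrix(~ 0 + Weight + Clarity, data = dat)
colnames(X1)
colnames(X0)

# Both parameterizations describe the same fit
m1 <- lm(Price ~ Weight + Clarity, data = dat)
m0 <- lm(Price ~ 0 + Weight + Clarity, data = dat)
all.equal(fitted(m1), fitted(m0))
```

Only the meaning of the coefficients differs: with an intercept, the Clarity coefficients are differences from the reference level IF; without it, each level gets its own intercept.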


hth,
Z


 Any help is greatly appreciated.
 Matthew







Re: [R] Creating Dummy Variables in R

2009-12-16 Thread Tom Fletcher
Is your variable Clarity a categorical with 4 levels? Thus, the need for
k-1 (3) dummies? Your error may be the result of creating k instead of
k-1 dummies, but can't be sure from the example.
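
A sketch of what k dummies plus an intercept does in lm(), using the six rows from the question entered as 0/1 columns (the data frame name raw is made up). Whether this is what caused the poster's error cannot be told from the post. lm() itself does not stop here; it reports the redundant column as NA.

```r
raw <- data.frame(
  Price  = c(500, 1000, 864, 340, 90, 450),
  Weight = c(8, 5.2, 3, 2.6, 0.5, 2.3),
  IF     = c(1, 0, 0, 0, 1, 0),
  VVS1   = c(0, 0, 1, 0, 0, 1),
  VVS2   = c(0, 1, 0, 1, 0, 0)
)

# k = 3 dummies plus an intercept: the dummies sum to the intercept column,
# so one coefficient cannot be estimated
fit_k <- lm(Price ~ Weight + IF + VVS1 + VVS2, data = raw)
coef(fit_k)

# k - 1 = 2 dummies: full rank, IF acts as the reference group
fit_k1 <- lm(Price ~ Weight + VVS1 + VVS2, data = raw)
coef(fit_k1)
```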

In R, you don't have to (unless you really want to) explicitly create
separate variables. You can use the internal contrast functions. 

See

?contr.treatment

Which is dummy coding by default. You can specify which group is the
reference group. 

Alternatively, if you prefer effects coding, you can see
?contr.sum 

There are others as well. 
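
A brief sketch of these options, assuming a three-level factor named Clarity as in the question (the object names ct_vvs2 and Clarity_ref are made up):

```r
Clarity <- factor(c("IF", "VVS2", "VVS1", "VVS2", "IF", "VVS1"))

contr.treatment(levels(Clarity))                       # dummy coding, IF as reference
ct_vvs2 <- contr.treatment(levels(Clarity), base = 3)  # VVS2 as reference
ct_vvs2
contr.sum(levels(Clarity))                             # effects (sum-to-zero) coding

# Two ways to make a choice stick before calling lm():
Clarity_ref <- relevel(Clarity, ref = "VVS2")  # change the reference level
contrasts(Clarity) <- contr.sum(3)             # attach a coding to the factor
contrasts(Clarity)
```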

Tom Fletcher



-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of whitaker m. (mw1006)
Sent: Wednesday, December 16, 2009 8:59 AM
To: r-help@r-project.org
Subject: [R] Creating Dummy Variables in R

Hi,
I am trying to create a set of dummy variables to use within a multiple
linear regression and am unable to find the codes within the manuals.

For example I have:
Price  Weight  Clarity
               IF  VVS1  VVS2
 500   8       1   0     0
1000   5.2     0   0     1
 864   3       0   1     0
 340   2.6     0   0     1
  90   0.5     1   0     0
 450   2.3     0   1     0

Where price is dependent upon weight (single value in each observation)
and clarity (split into three levels, IF, VVS1, VVS2).
I am having trouble telling the program that clarity is a set of 3 dummy
variables and keep getting error messages, what is the correct way?

Any help is greatly appreciated.
Matthew

