Re: [R] Consistency of Logistic Regression

2010-11-13 Thread Uwe Ligges



On 12.11.2010 20:11, Marc Schwartz wrote:

You are not creating your data set properly.

Your 'mat' is:


> mat
   column1 column2
1        1       0
2        1       0
3        0       1
4        0       0
5        1       1
6        1       0
7        1       0
8        0       1
9        0       0
10       1       1


What you really want is:

DF <- data.frame(y = c(1,0,1,0,0,1,0,0,1,1), x = c(5,4,1,6,3,6,5,3,7,9))



Actually it is in general safer to have a factor y rather than numeric y 
for classification tasks.
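
As a minimal sketch of the factor-response suggestion (the labels "no"/"yes" and the object names below are made up for illustration): for a binomial glm() with a factor response, R treats the first level as failure and every other level as success. With the levels ordered 0 then 1, the fit should therefore agree with the numeric version.

```r
# Same data as in the thread, plus a factor version of the response.
DF <- data.frame(y = c(1,0,1,0,0,1,0,0,1,1), x = c(5,4,1,6,3,6,5,3,7,9))

# Hypothetical labels; the first level ("no") is treated as failure.
DF$yf <- factor(DF$y, levels = c(0, 1), labels = c("no", "yes"))

mod_num <- glm(y  ~ x, data = DF, family = binomial)
mod_fac <- glm(yf ~ x, data = DF, family = binomial)

# Both fits model the probability of the second level (y == 1),
# so the coefficients are expected to agree.
same_coef <- isTRUE(all.equal(coef(mod_num), coef(mod_fac)))
print(same_coef)
```

The factor makes the level order explicit, which is the part that tends to go wrong silently when the response is coded as arbitrary numbers or strings.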


Best,
Uwe





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Consistency of Logistic Regression

2010-11-12 Thread Benjamin Godlove
I think it is likely I am missing something.  Here is a very simple example:

R code:

mat <- matrix(nrow = 10, ncol = 2, c(1,0,1,0,0,1,0,0,1,1),
c(5,4,1,6,3,6,5,3,7,9), dimnames = list(c(1,2,3,4,5,6,7,8,9,10),
c("column1","column2")))

g <- glm(mat[1:10] ~ mat[11:20], family = binomial (link = logit))

g$converged


SAS code:

data mat;
input col1 col2;
datalines;
1 5
0 4
1 1
0 6
0 3
1 6
0 5
0 3
1 7
1 9
;

proc logistic data=mat descending;
model col1 = col2 / link=logit;
run;

SAS output (in case you don't have access to SAS):
Convergence criterion satisfied

            Estimate      SE
Intercept    -1.6118  1.7833
col2          0.3293  0.3383


Of course, with an example this small, it is not so surprising that the two
methods differ; and they hardly differ by a single SE.  But as the datasets
get larger, the difference is more pronounced.  Let me know if you would
like me to send you a large dataset.  I get the feeling I am doing something
wrong in R, so please let me know what you think.

Thank you!

Ben Godlove



Re: [R] Consistency of Logistic Regression

2010-11-12 Thread Marc Schwartz
You are not creating your data set properly.

Your 'mat' is:

> mat
   column1 column2
1        1       0
2        1       0
3        0       1
4        0       0
5        1       1
6        1       0
7        1       0
8        0       1
9        0       0
10       1       1


What you really want is:

DF <- data.frame(y = c(1,0,1,0,0,1,0,0,1,1), x = c(5,4,1,6,3,6,5,3,7,9))

> DF
   y x
1  1 5
2  0 4
3  1 1
4  0 6
5  0 3
6  1 6
7  0 5
8  0 3
9  1 7
10 1 9
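
A sketch of why the original call produced that layout, under the assumption that the second vector was matched positionally to matrix()'s 'byrow' argument (matrix() takes a single data vector). To avoid relying on that coercion, byrow = TRUE is written out explicitly here; the names 'bad' and 'good' are made up for illustration.

```r
y <- c(1,0,1,0,0,1,0,0,1,1)
x <- c(5,4,1,6,3,6,5,3,7,9)

# Only y is used as data; it is recycled to fill all 20 cells row by row,
# so x never enters the matrix at all.
bad <- matrix(y, nrow = 10, ncol = 2, byrow = TRUE,
              dimnames = list(1:10, c("column1", "column2")))
print(bad)

# Linear indexing then picks the two (wrong) columns:
print(bad[1:10])    # column1: not the original y
print(bad[11:20])   # column2: not x

# One way to build the intended two-column matrix:
good <- cbind(column1 = y, column2 = x)
print(good)
```

A data frame, as shown above, is usually the more convenient container for glm(), since the formula can then refer to the columns by name.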



MOD <- glm(y ~ x, data = DF, family = binomial)


> summary(MOD)

Call:
glm(formula = y ~ x, family = binomial, data = DF)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.3353  -1.0229  -0.1239   0.9956   1.7477

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.6118     1.7833  -0.904    0.366
x             0.3293     0.3383   0.973    0.330

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 13.863  on 9  degrees of freedom
Residual deviance: 12.767  on 8  degrees of freedom
AIC: 16.767

Number of Fisher Scoring iterations: 4
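
On the IRLS versus Fisher scoring question from the original post: for a GLM, iteratively reweighted least squares is one way of carrying out Fisher scoring, so the two labels describe the same iteration (note that R's own output above reports "Fisher Scoring iterations"). The following is only a hand-rolled sketch for this small data set, not the code glm() actually runs; the starting values, tolerance and variable names are arbitrary choices.

```r
DF <- data.frame(y = c(1,0,1,0,0,1,0,0,1,1), x = c(5,4,1,6,3,6,5,3,7,9))
X <- cbind(1, DF$x)          # design matrix: intercept column plus x
y <- DF$y

beta <- c(0, 0)              # starting values (arbitrary choice)
for (iter in 1:25) {
  eta <- drop(X %*% beta)    # linear predictor
  mu  <- plogis(eta)         # fitted probabilities under the logit link
  w   <- mu * (1 - mu)       # working weights (Fisher information weights)
  z   <- eta + (y - mu) / w  # working response
  # Weighted least squares step: solve (X'WX) beta = X'Wz
  beta_new <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
  done <- max(abs(beta_new - beta)) < 1e-10
  beta <- beta_new
  if (done) break
}

# Standard errors from the inverse information matrix at the final estimate
mu <- plogis(drop(X %*% beta))
w  <- mu * (1 - mu)
se <- sqrt(diag(solve(crossprod(X, w * X))))

MOD <- glm(y ~ x, data = DF, family = binomial)
print(cbind(irls = beta, glm = unname(coef(MOD)), se = se))
```

If the hand-rolled estimates line up with glm() and with the SAS figures quoted in the thread, the fitting algorithm can be ruled out as the source of a discrepancy on well-behaved data.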


HTH,

Marc Schwartz




Re: [R] Consistency of Logistic Regression

2010-11-11 Thread Erik Iverson

Is the algorithm converging? Is there separation (i.e., a
perfect predictor) in the model?
Are you getting a warning about fitted probabilities of
0 or 1, etc.?

We would need much more information (preferably a reproducible
example) before we can help.
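
A sketch of how those checks might look in R, using a made-up, perfectly separated data set (the data and the names 'sep', 'msgs' and 'fit' are purely illustrative and do not come from the original poster's data):

```r
# Hypothetical data: y is 0 whenever x <= 5 and 1 otherwise,
# so x separates the two outcomes perfectly.
sep <- data.frame(x = 1:10, y = rep(c(0, 1), each = 5))

msgs <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, data = sep, family = binomial),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))   # record the warning text
    invokeRestart("muffleWarning")          # and let the fit continue
  }
)

print(fit$converged)        # convergence flag, as in g$converged
print(msgs)                 # warnings raised while fitting, if any
print(coef(summary(fit)))   # very large estimates and SEs hint at separation
```

With separated data the slope estimate drifts towards infinity, so different software can stop at very different (and equally meaningless) values, which is one way a difference of many standard errors can arise.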

Benjamin Godlove wrote:

Dear R developers,

I have noticed a discrepancy between the coefficients returned by R's glm()
for logistic regression and SAS's PROC LOGISTIC.  I am using dist = binomial
and link = logit for both R and SAS.  I believe R uses IRLS whereas SAS uses
Fisher's scoring, but the difference is something like 100 SE on the
intercept.  What accounts for such a huge difference?

Thank you for your time.

Ben Godlove



Re: [R] Consistency of Logistic Regression

2010-11-11 Thread Albyn Jones
do you have factors (categorical variables) in the model?  it could be
just a parameterization difference.

albyn

On Thu, Nov 11, 2010 at 12:41:03PM -0500, Benjamin Godlove wrote:
 Dear R developers,
 
 I have noticed a discrepancy between the coefficients returned by R's glm()
 for logistic regression and SAS's PROC LOGISTIC.  I am using dist = binomial
 and link = logit for both R and SAS.  I believe R uses IRLS whereas SAS uses
 Fisher's scoring, but the difference is something like 100 SE on the
 intercept.  What accounts for such a huge difference?
 
 Thank you for your time.
 
 Ben Godlove
 
 

-- 
Albyn Jones
Reed College
jo...@reed.edu



Re: [R] Consistency of Logistic Regression

2010-11-11 Thread Ben Bolker
Albyn Jones <jones at reed.edu> writes:

 
 do you have factors (categorical variables) in the model?  it could be
 just a parameterization difference.
 
 albyn
 
 On Thu, Nov 11, 2010 at 12:41:03PM -0500, Benjamin Godlove wrote:
  Dear R developers,
  
  I have noticed a discrepancy between the coefficients returned by R's glm()
  for logistic regression and SAS's PROC LOGISTIC.  I am using dist = binomial
  and link = logit for both R and SAS.  I believe R uses IRLS whereas SAS uses
  Fisher's scoring, but the difference is something like 100 SE on the
  intercept.  What accounts for such a huge difference?


  As previous posters said.  Specifically:

* a huge change in the intercept is very unlikely to be caused by
a change in underlying algorithm unless the data are pathological
(separation, convergence issues, etc.)

* With R's default treatment contrasts, the 'intercept' is the value
at the first factor level; SAS's is the value at the last factor level.
See ?contr.SAS ...
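
A small sketch of the parameterization point, on made-up data with one three-level factor (the data and object names are hypothetical). It only shows how the choice of baseline level moves the intercept within R; it does not verify what any particular SAS procedure does by default.

```r
# Hypothetical data: success proportions are 0.5, 0.75, 0.75 in groups a, b, c.
d <- data.frame(
  y = c(1,0,0,1,  1,0,1,1,  0,1,1,1),
  g = factor(rep(c("a", "b", "c"), each = 4))
)

# R default (contr.treatment): first level "a" is the baseline.
fit_first <- glm(y ~ g, data = d, family = binomial)

# contr.SAS: last level "c" is the baseline.
fit_last <- glm(y ~ g, data = d, family = binomial,
                contrasts = list(g = "contr.SAS"))

print(coef(fit_first))   # intercept = logit of the success rate in group a
print(coef(fit_last))    # intercept = logit of the success rate in group c

# Same model, different parameterization: the fitted probabilities agree.
same_fit <- isTRUE(all.equal(fitted(fit_first), fitted(fit_last),
                             tolerance = 1e-6))
print(same_fit)
```

Comparing fitted values or predictions rather than raw coefficients is a quick way to tell a parameterization difference from a genuinely different fit.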
