Re: [R] Logistic regression for large data

2022-11-14 Thread Bill Dunlap
summary(Base)

would show if one of columns of Base was read as character data instead of
the expected numeric.  That could cause an explosion in the number of dummy
variables, hence a huge design matrix.

-Bill


On Fri, Nov 11, 2022 at 11:30 PM George Brida 
wrote:

> Dear R users,
>
> I have a database  called Base.csv   (attached to this email) which
> contains 13 columns and 8257 rows and whose the first 8 columns are dummy
> variables which take 1 or 0. The problem is when I wrote the following
> instructions to do a logistic regression , R runs for hours and hours
> without giving an output:
>
> Base=read.csv("C:\\Users\\HP\\Desktop\\New\\Base.csv",header=FALSE,sep=";")
>
> fit_1=glm(Base[,2]~Base[,1]+Base[,10]+Base[,11]+Base[,12]+Base[,13],family=binomial(link="logit"))
>
> Apparently, there is not enough memory to have the requested output. Is
> there any other function for logistic regression that handle large data and
> return output in reasonable time.
>
> Many thanks
>
> Kind regards
>
> George
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic regression for large data

2022-11-12 Thread Ebert,Timothy Aaron
Hi George,
   I did not get an attachment.
   My first step would be to try simplifying things. Do all of these work?

fit_1=glm(Base[,2]~Base[,1],family=binomial(link="logit"))
fit_1=glm(Base[,2]~Base[,10],family=binomial(link="logit"))
fit_1=glm(Base[,2]~Base[,11],family=binomial(link="logit"))
fit_1=glm(Base[,2]~Base[,12],family=binomial(link="logit"))
fit_1=glm(Base[,2]~Base[,13],family=binomial(link="logit"))

This is not a large dataset. That said, if your computer is nearly out of 
memory, even a small dataset might be too much. It might have plenty of 
physical memory, but also lots of (open files, cookies, applications, other 
stuff) that eat memory.

Regards,
Tim



-Original Message-
From: R-help  On Behalf Of George Brida
Sent: Friday, November 11, 2022 4:17 PM
To: r-help@r-project.org
Subject: [R] Logistic regression for large data

[External Email]

Dear R users,

I have a database  called Base.csv   (attached to this email) which
contains 13 columns and 8257 rows and whose the first 8 columns are dummy 
variables which take 1 or 0. The problem is when I wrote the following 
instructions to do a logistic regression , R runs for hours and hours without 
giving an output:

Base=read.csv("C:\\Users\\HP\\Desktop\\New\\Base.csv",header=FALSE,sep=";")
fit_1=glm(Base[,2]~Base[,1]+Base[,10]+Base[,11]+Base[,12]+Base[,13],family=binomial(link="logit"))

Apparently, there is not enough memory to have the requested output. Is there 
any other function for logistic regression that handle large data and return 
output in reasonable time.

Many thanks

Kind regards

George
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-helpdata=05%7C01%7Ctebert%40ufl.edu%7Cb0af80b8620648fcc1ab08dac47fb19e%7C0d4da0f84a314d76ace60a62331e1b84%7C0%7C0%7C638038350110240752%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C2000%7C%7C%7Csdata=oxeFwGCpH%2B9Ha%2BDFaWRygEcvOJ2O6AngSKNhMwE%2FczI%3Dreserved=0
PLEASE do read the posting guide 
https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.r-project.org%2Fposting-guide.htmldata=05%7C01%7Ctebert%40ufl.edu%7Cb0af80b8620648fcc1ab08dac47fb19e%7C0d4da0f84a314d76ace60a62331e1b84%7C0%7C0%7C638038350110240752%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C2000%7C%7C%7Csdata=FtLW709kbnkMLzylkRRtR1Y%2Fw5oehodb0dmS8DqwGig%3Dreserved=0
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic regression for large data

2022-11-11 Thread David Winsemius
That’s not a large data set. Something else besides memory limits is going on. 
You should post output of summary(Base). 

— 
David
Sent from my iPhone

> On Nov 11, 2022, at 11:29 PM, George Brida  wrote:
> 
> Dear R users,
> 
> I have a database  called Base.csv   (attached to this email) which
> contains 13 columns and 8257 rows and whose the first 8 columns are dummy
> variables which take 1 or 0. The problem is when I wrote the following
> instructions to do a logistic regression , R runs for hours and hours
> without giving an output:
> 
> Base=read.csv("C:\\Users\\HP\\Desktop\\New\\Base.csv",header=FALSE,sep=";")
> fit_1=glm(Base[,2]~Base[,1]+Base[,10]+Base[,11]+Base[,12]+Base[,13],family=binomial(link="logit"))
> 
> Apparently, there is not enough memory to have the requested output. Is
> there any other function for logistic regression that handle large data and
> return output in reasonable time.
> 
> Many thanks
> 
> Kind regards
> 
> George
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Logistic regression for large data

2022-11-11 Thread George Brida
Dear R users,

I have a database  called Base.csv   (attached to this email) which
contains 13 columns and 8257 rows and whose the first 8 columns are dummy
variables which take 1 or 0. The problem is when I wrote the following
instructions to do a logistic regression , R runs for hours and hours
without giving an output:

Base=read.csv("C:\\Users\\HP\\Desktop\\New\\Base.csv",header=FALSE,sep=";")
fit_1=glm(Base[,2]~Base[,1]+Base[,10]+Base[,11]+Base[,12]+Base[,13],family=binomial(link="logit"))

Apparently, there is not enough memory to have the requested output. Is
there any other function for logistic regression that handle large data and
return output in reasonable time.

Many thanks

Kind regards

George
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Regression with Panel Data

2019-04-23 Thread Marc Schwartz via R-help


> On Apr 23, 2019, at 8:26 AM, Paul Bernal  wrote:
> 
> Dear friends, hope you are all doing great,
> 
> I would like to know if there is any R package that allows fitting of
> logistic regression to panel data.
> 
> I installed and loaded package plm, but from what I have read so far, plm
> only allows fitting of linear regression to panel data, not logistic.
> 
> Any help and/or guidance will be greatly appreciated,
> 
> Best regards,
> 
> Paul


Hi Paul,

You might look at the pglm package on CRAN.

Since there can be multiple methodologies that are apropos in this setting, 
such as GEE, mixed-effects, robust covariance matrix, etc., you might want to 
look at the CRAN task view that covers this domain:

  https://cran.r-project.org/web/views/Econometrics.html

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Logistic Regression with Panel Data

2019-04-23 Thread Paul Bernal
Dear friends, hope you are all doing great,

I would like to know if there is any R package that allows fitting of
logistic regression to panel data.

I installed and loaded package plm, but from what I have read so far, plm
only allows fitting of linear regression to panel data, not logistic.

Any help and/or guidance will be greatly appreciated,

Best regards,

Paul

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic regression and robust standard errors

2016-07-01 Thread Achim Zeileis

On Fri, 1 Jul 2016, Faradj Koliev wrote:


Dear Achim Zeileis, 
Many thanks for your quick and informative answer. 

I?m sure that the vcovCL should work, however, I experience some problems. 


> coeftest(model, vcov=vcovCL(model, cluster=mydata$ID))

First I got this error: 

Error in vcovCL(model, cluster = mydata$ID) : 
  length of 'cluster' does not match number of observations


This probably means that not all rows of "mydata" have been used for the 
estimation of the model, e.g., due to missing values or something like 
that. Hence, the mismatch seems to occur.



After checking the observations I got this error: 

Error in vcovCL(model, cluster = mydata$ID) : object 'tets' not found
Called from: vcovCL(model, cluster = mydata$ID)
Browse[1]> 


The variable "tets" is presumably one of the regressors in your "model" 
and apparently it cannot be found when extracting the model scores. Maybe 
you haven't stored the model frame and deleted the data.


Hard to say without a simple reproducible example.
Z


What can I do to fix it? What am I doing wrong now? 





  1 jul 2016 kl. 14:57 skrev Achim Zeileis
  :

On Fri, 1 Jul 2016, Faradj Koliev wrote:

  Dear all,

  I use ?polr? command (library: MASS) to estimate an
  ordered logistic regression.

  My model:   summary( model<- polr(y ~ x1+x2+x3+x4+x1*x2
  ,data=mydata, Hess = TRUE))

  But how do I get robust clustered standard errors?
  I??ve tried coeftest(resA, vcov=vcovHC(resA,
  cluster=lipton$ID))


The vcovHC() function currently does not (yet) have a "cluster"
argument. We are working on it but it's not finished yet.

As an alternative I include the vcovCL() function below that computes
the usual simple clustered sandwich estimator. This can be applied to
"polr" objects and plugged into coeftest(). So

coeftest(resA, vcov=vcovCL(resA, cluster=lipton$ID))

should work.

  and summary(a <- robcov(model,mydata$ID)).


The robcov() function does in principle what you want by I'm not sure
whether it works with polr(). But for sure it works with lrm() from
the "rms" package.

Hope that helps,
Z

vcovCL <- function(object, cluster = NULL, adjust = NULL)
{
 stopifnot(require("sandwich"))

 ## cluster specification
 if(is.null(cluster)) cluster <- attr(object, "cluster")
 if(is.null(cluster)) stop("no 'cluster' specification found")
 cluster <- factor(cluster)

 ## estimating functions and dimensions
 ef <- estfun(object)
 n <- NROW(ef)
 k <- NCOL(ef)
 if(n != length(cluster))
   stop("length of 'cluster' does not match number of observations")
 m <- length(levels(cluster))

 ## aggregate estimating functions by cluster and compute meat
 ef <- sapply(levels(cluster), function(i) colSums(ef[cluster == i, ,
   drop = FALSE]))
 ef <- if(NCOL(ef) > 1L) t(ef) else matrix(ef, ncol = 1L)
 mt <- crossprod(ef)/n

 ## bread
 br <- try(bread(object), silent = TRUE)
 if(inherits(br, "try-error")) br <- vcov(object) * n

 ## put together sandwich
 vc <- 1/n * (br %*% mt %*% br)

 ## adjustment
 if(is.null(adjust)) adjust <- class(object)[1L] == "lm"
 adj <- if(adjust) m/(m - 1L) * (n - 1L)/(n - k) else m/(m - 1L)

 ## return
 return(adj * vc)
}


  Neither works for me. So I wonder what am I doing wrong
  here?


  All suggestions are welcome ? thank you!
  [[alternative HTML version deleted]]

  __
  R-help@r-project.org mailing list -- To UNSUBSCRIBE and
  more, see
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained,
  reproducible code.





__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Logistic regression and robust standard errors

2016-07-01 Thread Faradj Koliev
Dear Achim Zeileis, 

Many thanks for your quick and informative answer. 

I’m sure that the vcovCL should work, however, I experience some problems. 


> coeftest(model, vcov=vcovCL(model, cluster=mydata$ID))

First I got this error: 

Error in vcovCL(model, cluster = mydata$ID) : 
  length of 'cluster' does not match number of observations

After checking the observations I got this error: 

Error in vcovCL(model, cluster = mydata$ID) : object 'tets' not found
Called from: vcovCL(model, cluster = mydata$ID)
Browse[1]> 

What can I do to fix it? What am I doing wrong now? 





> 1 jul 2016 kl. 14:57 skrev Achim Zeileis :
> 
> On Fri, 1 Jul 2016, Faradj Koliev wrote:
> 
>> Dear all, 
>> 
>> I use ?polr? command (library: MASS) to estimate an ordered logistic 
>> regression.
>> 
>> My model:   summary( model<- polr(y ~ x1+x2+x3+x4+x1*x2 ,data=mydata, Hess = 
>> TRUE))
>> 
>> But how do I get robust clustered standard errors? 
>> I??ve tried coeftest(resA, vcov=vcovHC(resA, cluster=lipton$ID))
> 
> The vcovHC() function currently does not (yet) have a "cluster" argument. We 
> are working on it but it's not finished yet.
> 
> As an alternative I include the vcovCL() function below that computes the 
> usual simple clustered sandwich estimator. This can be applied to "polr" 
> objects and plugged into coeftest(). So
> 
> coeftest(resA, vcov=vcovCL(resA, cluster=lipton$ID))
> 
> should work.
> 
>> and summary(a <- robcov(model,mydata$ID)).
> 
> The robcov() function does in principle what you want by I'm not sure whether 
> it works with polr(). But for sure it works with lrm() from the "rms" package.
> 
> Hope that helps,
> Z
> 
> vcovCL <- function(object, cluster = NULL, adjust = NULL)
> {
>  stopifnot(require("sandwich"))
> 
>  ## cluster specification
>  if(is.null(cluster)) cluster <- attr(object, "cluster")
>  if(is.null(cluster)) stop("no 'cluster' specification found")
>  cluster <- factor(cluster)
> 
>  ## estimating functions and dimensions
>  ef <- estfun(object)
>  n <- NROW(ef)
>  k <- NCOL(ef)
>  if(n != length(cluster))
>stop("length of 'cluster' does not match number of observations")
>  m <- length(levels(cluster))
> 
>  ## aggregate estimating functions by cluster and compute meat
>  ef <- sapply(levels(cluster), function(i) colSums(ef[cluster == i, ,
>drop = FALSE]))
>  ef <- if(NCOL(ef) > 1L) t(ef) else matrix(ef, ncol = 1L)
>  mt <- crossprod(ef)/n
> 
>  ## bread
>  br <- try(bread(object), silent = TRUE)
>  if(inherits(br, "try-error")) br <- vcov(object) * n
> 
>  ## put together sandwich
>  vc <- 1/n * (br %*% mt %*% br)
> 
>  ## adjustment
>  if(is.null(adjust)) adjust <- class(object)[1L] == "lm"
>  adj <- if(adjust) m/(m - 1L) * (n - 1L)/(n - k) else m/(m - 1L)
> 
>  ## return
>  return(adj * vc)
> }
> 
> 
>> Neither works for me. So I wonder what am I doing wrong here?
>> 
>> 
>> All suggestions are welcome ? thank you!
>>  [[alternative HTML version deleted]]
>> 
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Logistic regression and robust standard errors

2016-07-01 Thread Achim Zeileis

On Fri, 1 Jul 2016, Faradj Koliev wrote:

Dear all, 



I use ?polr? command (library: MASS) to estimate an ordered logistic regression.

My model:   summary( model<- polr(y ~ x1+x2+x3+x4+x1*x2 ,data=mydata, Hess = 
TRUE))

But how do I get robust clustered standard errors? 


I??ve tried coeftest(resA, vcov=vcovHC(resA, cluster=lipton$ID))


The vcovHC() function currently does not (yet) have a "cluster" argument. 
We are working on it but it's not finished yet.


As an alternative I include the vcovCL() function below that computes the 
usual simple clustered sandwich estimator. This can be applied to "polr" 
objects and plugged into coeftest(). So


coeftest(resA, vcov=vcovCL(resA, cluster=lipton$ID))

should work.


and summary(a <- robcov(model,mydata$ID)).


The robcov() function does in principle what you want by I'm not sure 
whether it works with polr(). But for sure it works with lrm() from the 
"rms" package.


Hope that helps,
Z

vcovCL <- function(object, cluster = NULL, adjust = NULL)
{
  stopifnot(require("sandwich"))

  ## cluster specification
  if(is.null(cluster)) cluster <- attr(object, "cluster")
  if(is.null(cluster)) stop("no 'cluster' specification found")
  cluster <- factor(cluster)

  ## estimating functions and dimensions
  ef <- estfun(object)
  n <- NROW(ef)
  k <- NCOL(ef)
  if(n != length(cluster))
stop("length of 'cluster' does not match number of observations")
  m <- length(levels(cluster))

  ## aggregate estimating functions by cluster and compute meat
  ef <- sapply(levels(cluster), function(i) colSums(ef[cluster == i, ,
drop = FALSE]))
  ef <- if(NCOL(ef) > 1L) t(ef) else matrix(ef, ncol = 1L)
  mt <- crossprod(ef)/n

  ## bread
  br <- try(bread(object), silent = TRUE)
  if(inherits(br, "try-error")) br <- vcov(object) * n

  ## put together sandwich
  vc <- 1/n * (br %*% mt %*% br)

  ## adjustment
  if(is.null(adjust)) adjust <- class(object)[1L] == "lm"
  adj <- if(adjust) m/(m - 1L) * (n - 1L)/(n - k) else m/(m - 1L)

  ## return
  return(adj * vc)
}



Neither works for me. So I wonder what am I doing wrong here?


All suggestions are welcome ? thank you!
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Logistic regression and robust standard errors

2016-07-01 Thread Faradj Koliev
Dear all, 


I use ”polr” command (library: MASS) to estimate an ordered logistic regression.

My model:   summary( model<- polr(y ~ x1+x2+x3+x4+x1*x2 ,data=mydata, Hess = 
TRUE))

But how do I get robust clustered standard errors? 

I’’ve tried   coeftest(resA, vcov=vcovHC(resA, cluster=lipton$ID)) and 
summary(a <- robcov(model,mydata$ID)). Neither works for me. So I wonder what 
am I doing wrong here? 


All suggestions are welcome – thank you! 
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] logistic regression with package 'mice'

2016-04-10 Thread Antonello Preti
Dear all, I request your help to solve a problem I've encountered in using
'mice' for multiple imputation.
I want to apply a logistic regression model.
I need to extract information on the fit of the model.
Is there any way to calculate a likelihood ratio or the McFadden-pseudoR2
from the results of the logistic model?
I mean, as it is possible to extract pooled averaging and odds ratio...

Thank you in advance,
Antonello

Here an example of logistic regression on imputed data:


library(mice)

imp <- mice(nhanes)

# logistic regression on the imputed data

fit <- glm.mids((hyp==2)~bmi+chl, data=imp, family = binomial)

summary(fit)

summary(pool(fit)) ### pool averaging across all imputed dataset

summary(pool(fit, method = "rubin1987"))### pool across all imputed
dataset

### odds ratio

su <- summary(pool(fit, method = "rubin1987"))[,c(1,6,7)]

stime <- data.frame(exp(su))

names(stime) <- c("OR", "95% low", "95% high")

options(scipen=999)
stime
options(scipen=1)

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Regression output baseline (reference) category

2016-03-26 Thread David Winsemius

> On Mar 25, 2016, at 10:19 PM, Michael Artz  wrote:
> 
> Hi,
>  I have now read an introductory text on regression and I think I do
> understand what the intercept is doing.  However, my original question is
> still unanswered.  I understand that the intercept term is the constant
> that each other term is measured against.  I think baseline is a good word
> for it.  However, it does not represent any one of the x variables by
> itself.

It represents all of the X variables at their reference levels. There are no 
individual intercepts on a variable-by-variable basis. You accepted the notion 
of "baseline" You cannot parcel out single variable intercepts. What would they 
actually mean anyhow?



>  Is there a way in R, to extrapolate the individual x variable
> intercepts from the equation somehow.

If you describe the process by which that could be calculated, we might have 
basis for discussion, but as it is I think you still need to be studying the 
theory more. I don't intend any further resspones on R-help where these 
questions are off-topic, so you should direct any further questions to 
stats.stackexchange.com

> 
> 
> On Tue, Mar 15, 2016 at 8:26 PM, David Winsemius 
> wrote:
> 
>> 
>>> On Mar 15, 2016, at 1:27 PM, Michael Artz 
>> wrote:
>>> 
>>> Hi,
>>>  I am trying to use the summary from the glm function as a data source.
>> I
>>> am using the call sink() then
>>> summary(logisticRegModel)$coefficients then sink().
>> 
>> Since it's a matrix you may need to locate a function that write matrices
>> to files. I seem to remember that the MASS package has one.
>> 
>>> The independent
>>> variables are categorical and thus there is always a baseline value for
>>> every category that is omitted from the glm output.
>> 
>> Well, it's not really omitted, so much as shared among all variables. For
>> further reading in the halp pages consult:
>> 
>> ?model.matrix
>> ?contrasts
>> ?contr.treatment
>> 
>> But you probably need to supplement that with an introductory text that
>> covers R regression defaults.
>> 
>>> I am interested in how
>>> to get the Z column for all of the categorical values.
>> 
>> The Z column? You meant the "z value" column. Again, since it's a matrix
>> you need to use column indexing with "["
>> 
>> summary(logisticRegModel)$coefficients[  , "z value"]
>> 
>> Read up on the summary function for glm objects at:
>> 
>> ?summary.glm
>> 
>> 
>>> I don't see any row
>>> for the reference category.
>> 
>> What do you imagine the (Intercept) row to be doing? If you are having
>> difficulty understanding this (which is not really an R-specific issue)
>> there are probably already several explanations to similar questions on:
>> 
>> http://stats.stackexchange.com/
>> 
>> 
>>> 
>>> How can I get this Z value in the output?
>> 
>> Asked and answered.
>> 

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Regression output baseline (reference) category

2016-03-25 Thread Michael Artz
Hi,
  I have now read an introductory text on regression and I think I do
understand what the intercept is doing.  However, my original question is
still unanswered.  I understand that the intercept term is the constant
that each other term is measured against.  I think baseline is a good word
for it.  However, it does not represent any one of the x variables by
itself.  Is there a way in R, to extrapolate the individual x variable
intercepts from the equation somehow.


On Tue, Mar 15, 2016 at 8:26 PM, David Winsemius 
wrote:

>
> > On Mar 15, 2016, at 1:27 PM, Michael Artz 
> wrote:
> >
> > Hi,
> >   I am trying to use the summary from the glm function as a data source.
> I
> > am using the call sink() then
> > summary(logisticRegModel)$coefficients then sink().
>
> Since it's a matrix you may need to locate a function that write matrices
> to files. I seem to remember that the MASS package has one.
>
> >  The independent
> > variables are categorical and thus there is always a baseline value for
> > every category that is omitted from the glm output.
>
> Well, it's not really omitted, so much as shared among all variables. For
> further reading in the halp pages consult:
>
> ?model.matrix
> ?contrasts
> ?contr.treatment
>
> But you probably need to supplement that with an introductory text that
> covers R regression defaults.
>
> >  I am interested in how
> > to get the Z column for all of the categorical values.
>
> The Z column? You meant the "z value" column. Again, since it's a matrix
> you need to use column indexing with "["
>
> summary(logisticRegModel)$coefficients[  , "z value"]
>
> Read up on the summary function for glm objects at:
>
> ?summary.glm
>
>
> >  I don't see any row
> > for the reference category.
>
> What do you imagine the (Intercept) row to be doing? If you are having
> difficulty understanding this (which is not really an R-specific issue)
> there are probably already several explanations to similar questions on:
>
> http://stats.stackexchange.com/
>
>
> >
> > How can I get this Z value in the output?
>
> Asked and answered.
>
> >
> >   [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Regression output baseline (reference) category

2016-03-15 Thread David Winsemius

> On Mar 15, 2016, at 1:27 PM, Michael Artz  wrote:
> 
> Hi,
>   I am trying to use the summary from the glm function as a data source. I
> am using the call sink() then
> summary(logisticRegModel)$coefficients then sink().

Since it's a matrix you may need to locate a function that write matrices to 
files. I seem to remember that the MASS package has one. 

>  The independent
> variables are categorical and thus there is always a baseline value for
> every category that is omitted from the glm output.

Well, it's not really omitted, so much as shared among all variables. For 
further reading in the halp pages consult:

?model.matrix
?contrasts
?contr.treatment

But you probably need to supplement that with an introductory text that covers 
R regression defaults.

>  I am interested in how
> to get the Z column for all of the categorical values.

The Z column? You meant the "z value" column. Again, since it's a matrix you 
need to use column indexing with "["

summary(logisticRegModel)$coefficients[  , "z value"]

Read up on the summary function for glm objects at:

?summary.glm


>  I don't see any row
> for the reference category.

What do you imagine the (Intercept) row to be doing? If you are having 
difficulty understanding this (which is not really an R-specific issue) there 
are probably already several explanations to similar questions on:

http://stats.stackexchange.com/

 
> 
> How can I get this Z value in the output?

Asked and answered.

> 
>   [[alternative HTML version deleted]]
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Regression output baseline (reference) category

2016-03-15 Thread Bert Gunter
The reference category is aliased with the constant term in the
default contr.treatment contrasts.

See ?contr.treatment , ?C, ?contrasts

If you don't know what this means, you should probably consult a local
statistical resource or ask about linear model contrasts at a
statistical help website like stats.stackexchange.com. This list is
for R programming questions.

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Mar 15, 2016 at 1:27 PM, Michael Artz  wrote:
> Hi,
>I am trying to use the summary from the glm function as a data source. I
> am using the call sink() then
> summary(logisticRegModel)$coefficients then sink().  The independent
> variables are categorical and thus there is always a baseline value for
> every category that is omitted from the glm output.  I am interested in how
> to get the Z column for all of the categorical values.  I don't see any row
> for the reference category.  How can I get this Z value in the output?
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Logistic Regression output baseline (reference) category

2016-03-15 Thread Michael Artz
Hi,
   I am trying to use the summary from the glm function as a data source. I
am using the call sink() then
summary(logisticRegModel)$coefficients then sink().  The independent
variables are categorical and thus there is always a baseline value for
every category that is omitted from the glm output.  I am interested in how
to get the Z column for all of the categorical values.  I don't see any row
for the reference category.  How can I get this Z value in the output?

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Regression

2016-01-25 Thread Greg Snow
Do you have the sample sizes that the sample proportions were computed
from (e.g. 0.5 could be 1 out of 2 or 100 out of 200)?

If you do then you can specify the model with the proportions as the y
variable and the corresponding sample sizes as the weights argument to
glm.

If you only have proportions without an integer sample size then you
may want to switch to using beta regression instead of logistic
regression.

On Sat, Jan 23, 2016 at 1:41 PM, pari hesabi  wrote:
> Hello everybody,
>
> I am trying to fit a logistic regression model by using glm() function in R. 
> My response variable is a sample proportion NOT binary numbers(0,1).
>
> Regarding glm() function, I receive this error:  non integer # successes in a 
> binomial glm!
>
> I would appreciate if anybody conducts me.
>
>
> Regards,
>
> Pari
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Regression

2016-01-25 Thread Wensui Liu
But beta can only be used to model the open interval between zero and one

On Monday, January 25, 2016, Greg Snow <538...@gmail.com> wrote:

> Do you have the sample sizes that the sample proportions were computed
> from (e.g. 0.5 could be 1 out of 2 or 100 out of 200)?
>
> If you do then you can specify the model with the proportions as the y
> variable and the corresponding sample sizes as the weights argument to
> glm.
>
> If you only have proportions without an integer sample size then you
> may want to switch to using beta regression instead of logistic
> regression.
>
> On Sat, Jan 23, 2016 at 1:41 PM, pari hesabi  > wrote:
> > Hello everybody,
> >
> > I am trying to fit a logistic regression model by using glm() function
> in R. My response variable is a sample proportion NOT binary numbers(0,1).
> >
> > Regarding glm() function, I receive this error:  non integer # successes
> in a binomial glm!
> >
> > I would appreciate if anybody conducts me.
> >
> >
> > Regards,
> >
> > Pari
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org  mailing list -- To UNSUBSCRIBE and
> more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Gregory (Greg) L. Snow Ph.D.
> 538...@gmail.com 
>
> __
> R-help@r-project.org  mailing list -- To UNSUBSCRIBE and
> more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
WenSui Liu
https://statcompute.wordpress.com/

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Regression

2016-01-23 Thread John C Frain
Alternatively you might use log(p/1-p) as your dependent variable and use
OLS with robust standard errors. Much of your inference would be analogous
to a logistic regression

John C Frain
3 Aranleigh Park
Rathfarnham
Dublin 14
Ireland
www.tcd.ie/Economics/staff/frainj/home.html
mailto:fra...@tcd.ie
mailto:fra...@gmail.com

On 23 January 2016 at 20:46, David Winsemius  wrote:

>
> > On Jan 23, 2016, at 12:41 PM, pari hesabi 
> wrote:
> >
> > Hello everybody,
> >
> > I am trying to fit a logistic regression model by using glm() function
> in R. My response variable is a sample proportion NOT binary numbers(0,1).
>
> So multiply the sample proportions (and 1-proportions) by the number of
> samples, round to integers, you will have an appropriate response variable
> and complements, and you can fit a binomial model.
>
> >
> > Regarding glm() function, I receive this error:  non integer # successes
> in a binomial glm!
> >
> > I would appreciate if anybody conducts me.
> >
> >
> > Regards,
> >
> > Pari
> >
> >   [[alternative HTML version deleted]]
> --
>
> David Winsemius
> Alameda, CA, USA
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Regression

2016-01-23 Thread Wensui Liu
with glm(), you might try the quasi binomial family

On Saturday, January 23, 2016, pari hesabi  wrote:

> Hello everybody,
>
> I am trying to fit a logistic regression model by using glm() function in
> R. My response variable is a sample proportion NOT binary numbers(0,1).
>
> Regarding glm() function, I receive this error:  non integer # successes
> in a binomial glm!
>
> I would appreciate if anybody conducts me.
>
>
> Regards,
>
> Pari
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org  mailing list -- To UNSUBSCRIBE and
> more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
WenSui Liu
https://statcompute.wordpress.com/

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Regression

2016-01-23 Thread David Winsemius

> On Jan 23, 2016, at 12:41 PM, pari hesabi  wrote:
> 
> Hello everybody,
> 
> I am trying to fit a logistic regression model by using glm() function in R. 
> My response variable is a sample proportion NOT binary numbers(0,1).

So multiply the sample proportions (and 1-proportions) by the number of 
samples, round to integers, you will have an appropriate response variable and 
complements, and you can fit a binomial model.

> 
> Regarding glm() function, I receive this error:  non integer # successes in a 
> binomial glm!
> 
> I would appreciate if anybody conducts me.
> 
> 
> Regards,
> 
> Pari
> 
>   [[alternative HTML version deleted]]
-- 

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic regression R and Stata grouping variable

2015-05-27 Thread Therneau, Terry M., Ph.D.
You were not completely clear, but it appears that you have data where each subject has 
results from 8 trials, as a pair of variables is changed.  If that is correct, then you 
want to have a variance that corrects for the repeated measures.  In R the glm command 
handles the simple case but not the repeated measures one.  Statisticially you can use a 
generalized estimating equations approach (package gee) or a random effect per subject 
approach (lme or lmer package).


Terry T.


On 05/27/2015 05:00 AM, r-help-requ...@r-project.org wrote:

I mostly use Stata 13 for my regression analysis. I want to conduct a logistic 
regression on a proportion/number of success. Because I receive errors in Stata 
I did not expect nor understand (if there are Stata experts who want to know 
more about the problems I face and can potentially help me solve them, I would 
be glad to give more details), I want to repeat the analysis in R. In Stata I 
would use the command: xtlogit DEP_PROP INDEP_A INDEP_B INDEP_C, i(ID). ID is 
the identifier for each subject. There are eight lines with data for each 
subject because there are three within factors (INDEP_A, B, C) with two levels 
each (0 and 1). I can repeat this analysis in R by using the command: 
glm(DEP_SUC ~ INDEP_A + INDEP_B + INDEP_C, family = ?binomial?). DEP_SUC is 
here a table with the successes and misses per row. Again, there are eight rows 
for each subject. But while I know how to group these lines in Stata by using 
the option i(ID ), I do not know what to do in R. I have se!

ar!

  ch for more information about the i() command, but did not find any usefull 
information.

So, to summarize: I want to find out how three variables (binary) influence a 
proportion and use logistic regression. In Stata I can group multiple lines per 
subject using the i( ) command in logistic regression. What is the equivalent 
in R?

Thank you in advance!


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] logistic regression R and Stata – grouping variable

2015-05-26 Thread Lambert Bruno
Hello,

I mostly use Stata 13 for my regression analysis. I want to conduct a logistic 
regression on a proportion/number of success. Because I receive errors in Stata 
I did not expect nor understand (if there are Stata experts who want to know 
more about the problems I face and can potentially help me solve them, I would 
be glad to give more details), I want to repeat the analysis in R. In Stata I 
would use the command: xtlogit DEP_PROP INDEP_A INDEP_B INDEP_C, i(ID). ID is 
the identifier for each subject. There are eight lines with data for each 
subject because there are three within factors (INDEP_A, B, C) with two levels 
each (0 and 1). I can repeat this analysis in R by using the command: 
glm(DEP_SUC ~ INDEP_A + INDEP_B + INDEP_C, family = �binomial�). DEP_SUC is 
here a table with the successes and misses per row. Again, there are eight rows 
for each subject. But while I know how to group these lines in Stata by using 
the option i(ID ), I do not know what to do in R. I have search for more 
information about the i() command, but did not find any usefull information.

So, to summarize: I want to find out how three variables (binary) influence a 
proportion and use logistic regression. In Stata I can group multiple lines per 
subject using the i( ) command in logistic regression. What is the equivalent 
in R?

Thank you in advance!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Logistic Regression Results per Condition

2014-10-03 Thread Eve Legrand
Hi everyone!
   I conducted a study for which I conducted logistic regressions (and it
works), but now I'd like to have the results per condition, and I failed to
discover how to have them. I explain myself:
   In conduted a study in which participants can realize one behavior
(coded 1 if realized) or another one (coded 0 if realized) (and errors
are coded 2). This DV is called Rep2. I also had one IV called
TypeInt. This IV has two modalities: participants can be in the
OnlyIntention condition or in the ImplementationIntention condition. We
registered participants behaviors 64 times (we had 64 repeated measures),
this variable is called NumEssai, and 40 participants participated (this
variable is called Sujet).
   So I realized logistic regression for panel data (so with a cluster
correction) on this variables thanks to the lrm and robcov functions, and I
obtained results which are (normally) the good ones. So I obtained the
interactions between all these variables. Here is my script:

blibrary(rms)
PunSon - read.csv2(C:/Users/Eve/Desktop/Stats/Stats.csv)

NumEssai - PunSon$NumEssai
Rep2 - PunSon$Rep2
TypeInt - PunSon$TypeInt
Sujet - PunSon$Sujet

f - lrm(Rep2 ~ NumEssai * TypeInt, data = PunSon[PunSon$Rep2 != 2, ],
x=TRUE, y=TRUE)
g - robcov(f, cluster=PunSon[PunSon$Rep2 != 2, ]$Sujet)
g/b

  But now, I'd like to have the results per condition (for example, have
the results in the onlyintention condition). I tried some things like
using the factor function (I join the script I writed below), but it
doesn't work... Does someone knows how to have results per condition?

blibrary(rms)
PunSon - read.csv2(C:/Users/Eve/Desktop/Stats/Stats.csv)

NumEssai - PunSon$NumEssai
Rep2 - PunSon$Rep2
TypeInt - factor(PunSon$TypeInt)
Sujet - PunSon$Sujet

f - lrm(Rep2 ~ NumEssai * TypeInt, data = PunSon[PunSon$Rep2 != 2, ],
x=TRUE, y=TRUE)
g - robcov(f, cluster=PunSon[PunSon$Rep2 != 2, ]$Sujet)
g/b

Thanks by advance for any help!
Eve

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] logistic regression for data with repeated measures

2014-07-01 Thread Suzon Sepp
Hi,

It seems that I'm quite lost in this wide and powerful R's universe, so I
permit myself to ask your help about issues with which I'm struggling.
Thank you,

I would like to know if the answer’s accuracy (correct = 1; incorrect = 0)
varies depending on 2 categorical variables which are the group (A and B)
and the condition (a, b and c) knowing that I’ve got n subjects and 12
trials by conditions for each subject (i.e. 12 repetitions).

To do that, I’m focusing on logistic regression analysis. I’ve got no
problem with this kind of analysis until now (logistic regression with
numeric predictor variables and/or categorical predictor with 2 levels
only) but, in this new context, I think I have to focus more specifically
on logistic regression including *nested (or random?) factors* in a*repeated
measures design* (because of the variables “Subject” and “Trial”) with a
categorical predictor variable with *more than 2 levels* (the variable
“Condition”) and I never did such a thing…yet.

mydata =
mydata$Subject: Factor w/38 levels: i01, i02, i03, i04
mydata$Group: Factor w/2 levels: A, B
mydata$Condition: Factor w/3 levels: a, b, c
mydata$Trial: Factor w/12 levels: t01, t02, ...t12
mydata$Accuracy: Factor w/2 levels: 0, 1

Subject  Group  Trial  Condition  Accuracy
   i01  A   t01 a   0
   i01  A   t02 a   1
...
   i01  A   t12 a   1
   i01  A   t01 b   1
   i01  A   t02 b   1
...
   i01  A   t12 b   0
   i01  A   t01 c   0
   i01  A   t02 c   1
...
   i01  A   t12 c   1
   i02  B   t01 a   1
...

First, I’m wondering if I have to calculate a % of accuracy for each
subject and each condition and thus “remove” the variable “Trial” but
“lose” data (power?) in the same time… or to take into account this
variable in the analysis and in this case, how to do that?

Second, I don’t know which function I’ve to choose (lmer, glm, glmer…)?

Third, I’m not sure I proceed correctly to specify in this analysis that
the responses all come from the same subject: within-subject design =
…+(1|Subject) as I can do for a repeated measures ANOVA to analyze the
effect of my different variables on a numeric one such as the response
time: 
test=aov(Int~Group*Condition+*Error(Subject/(Group*Condition)*),data=mydata)
and here again how can I add the variable Trial if I don't work on an
average reaction time for each subject in the different conditions?

Below, examples of models I can write with glmer(),

fit.1=glmer(Accuracy~Group* Condition
+(1|Subject),data=mydata,family=binomial)

fit.2=glmer(Accuracy~Group* Condition
+(1|Subject)-1,data=mydata,family=binomial)   (“without intercept”)

fit.3=glmer(Accuracy~Group* Condition +(1|Subject)+(1|Trial)...??


I believed the analysis I've to conduct will be in the range of my
qualifications then I realize it could be more complicated than that of
course (ex GLMMs), I can hear do it as we do usually (=repeated measures
ANOVA on a percentage of correct answers for each subject ??) as if there's
only one way to follow but I think there's a lot, which one's revelant for
my data, that's I want to find.

Hope you can put me on the track,

Best

Suzon

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression for data with repeated measures

2014-07-01 Thread Mitchell Maltenfort
http://stats.stackexchange.com/questions/62225/conditional-logistic-regression-vs-glmm-in-r
might be a good start

Ersatzistician and Chutzpahthologist
I can answer any question.  I don't know is an answer. I don't know
yet is a better answer.


On Tue, Jul 1, 2014 at 10:24 AM, Suzon Sepp suzon.s...@gmail.com wrote:
 Hi,

 It seems that I'm quite lost in this wide and powerful R's universe, so I
 permit myself to ask your help about issues with which I'm struggling.
 Thank you,

 I would like to know if the answer’s accuracy (correct = 1; incorrect = 0)
 varies depending on 2 categorical variables which are the group (A and B)
 and the condition (a, b and c) knowing that I’ve got n subjects and 12
 trials by conditions for each subject (i.e. 12 repetitions).

 To do that, I’m focusing on logistic regression analysis. I’ve got no
 problem with this kind of analysis until now (logistic regression with
 numeric predictor variables and/or categorical predictor with 2 levels
 only) but, in this new context, I think I have to focus more specifically
 on logistic regression including *nested (or random?) factors* in a*repeated
 measures design* (because of the variables “Subject” and “Trial”) with a
 categorical predictor variable with *more than 2 levels* (the variable
 “Condition”) and I never did such a thing…yet.

 mydata =
 mydata$Subject: Factor w/38 levels: i01, i02, i03, i04
 mydata$Group: Factor w/2 levels: A, B
 mydata$Condition: Factor w/3 levels: a, b, c
 mydata$Trial: Factor w/12 levels: t01, t02, ...t12
 mydata$Accuracy: Factor w/2 levels: 0, 1

 Subject  Group  Trial  Condition  Accuracy
i01  A   t01 a   0
i01  A   t02 a   1
 ...
i01  A   t12 a   1
i01  A   t01 b   1
i01  A   t02 b   1
 ...
i01  A   t12 b   0
i01  A   t01 c   0
i01  A   t02 c   1
 ...
i01  A   t12 c   1
i02  B   t01 a   1
 ...

 First, I’m wondering if I have to calculate a % of accuracy for each
 subject and each condition and thus “remove” the variable “Trial” but
 “lose” data (power?) in the same time… or to take into account this
 variable in the analysis and in this case, how to do that?

 Second, I don’t know which function I’ve to choose (lmer, glm, glmer…)?

 Third, I’m not sure I proceed correctly to specify in this analysis that
 the responses all come from the same subject: within-subject design =
 …+(1|Subject) as I can do for a repeated measures ANOVA to analyze the
 effect of my different variables on a numeric one such as the response
 time: 
 test=aov(Int~Group*Condition+*Error(Subject/(Group*Condition)*),data=mydata)
 and here again how can I add the variable Trial if I don't work on an
 average reaction time for each subject in the different conditions?

 Below, examples of models I can write with glmer(),

 fit.1=glmer(Accuracy~Group* Condition
 +(1|Subject),data=mydata,family=binomial)

 fit.2=glmer(Accuracy~Group* Condition
 +(1|Subject)-1,data=mydata,family=binomial)   (“without intercept”)

 fit.3=glmer(Accuracy~Group* Condition +(1|Subject)+(1|Trial)...??


 I believed the analysis I've to conduct will be in the range of my
 qualifications then I realize it could be more complicated than that of
 course (ex GLMMs), I can hear do it as we do usually (=repeated measures
 ANOVA on a percentage of correct answers for each subject ??) as if there's
 only one way to follow but I think there's a lot, which one's revelant for
 my data, that's I want to find.

 Hope you can put me on the track,

 Best

 Suzon

 [[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Logistic Regression

2014-06-14 Thread javad bayat
Dear all, I have to use Zelig package for doing logistic regression.
How can I use Zelig package for logistic regression?

I did this code by glm function:

glm1 = glm(kod~Curv+Elev+Out.c+Slope+Aspect,data=data,
   family=binomial)
summary(glm1)

But the results were not appropriate for my data.

Many thanks for your helps.







-- 
Best Regards
Javad Bayat
M.Sc. Environment Engineering
Shahid Beheshti (National) University (SBU)
Alternative Mail: bayat...@yahoo.com

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Regression

2014-06-14 Thread Suzen, Mehmet
You might want to read this vignette:
http://cran.r-project.org/web/packages/HSAUR/vignettes/Ch_logistic_regression_glm.pdf

On 14 June 2014 19:53, javad bayat j.bayat...@gmail.com wrote:
 Dear all, I have to use Zelig package for doing logistic regression.
 How can I use Zelig package for logistic regression?

 I did this code by glm function:

 glm1 = glm(kod~Curv+Elev+Out.c+Slope+Aspect,data=data,
family=binomial)
 summary(glm1)

 But the results were not appropriate for my data.

 Many thanks for your helps.







 --
 Best Regards
 Javad Bayat
 M.Sc. Environment Engineering
 Shahid Beheshti (National) University (SBU)
 Alternative Mail: bayat...@yahoo.com

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Logistic regression help!

2014-05-03 Thread Si Qi L.
Hi guys, I have a trouble to solve the specificity and senstitivity
for a logistic regression model. I really need your help, would you
please help me out? :) Thank you!!

This is the model I constructed:

model=glm(Status ~ Gender.compl+ X2.4.times.per.month+ 
Once.a.month+Others+Council.tenant+Living.with.friends.family+Living.with.parents+Owner.occupier+Private.tenant+X1.year.to.2.years+X2.to.4.years+X3.months.to.a.year+Less.than.3.months+Over.4.years+Emp.compl+Reqloanamount+EmpNetMonthlyPay+Customer_Age+RTI+acc.compl+iic.compl+irb.compl+jbc.compl+jic.compl+jq.compl+kic.compl+lbc.compl+mbc.compl+njc.compl+or.compl+pq.compl+pr.compl+qic.compl+teb.compl+tpb.compl+vbc.compl+yzb.compl+zr.compl,
 data=learning.set1, family=binomial)

so how to compute the sensititvity and specitivity? Many thanks!

Siqi

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Logistic regression and FIML for treating missing values

2014-02-24 Thread Jesse Gervais
Hello there,


 Is it possible to do a logistic regression with R while 1) using the
full-information maximum likelihood (FIML) for treating missing values, and
2) having a complex sample design (Cluster, Weight, Stratum).



I looked at Lavaan, but, if I'm correct, it doesn't do FIML when using
categorical variables. If I'm right, OpenMx could do FIML, but it does not
take complex sample design into account.



Any thought?



Thank you!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Logistic Regression with 200K features in R?

2013-12-12 Thread Romeo Kienzler

Dear List,

I'm quite new to R and want to do logistic regression with a 200K 
feature data set (around 150 training examples).


I'm aware that I should use Naive Bayes but I have a more general 
question about the capability of R handling very high dimensional data.


Please consider the following R code where mygenestrain.tab is a 150 
by 20 matrix:


traindata - read.table('mygenestrain.tab');
mylogit - glm(V1 ~ ., data = traindata, family = binomial);

When executing this code I get the following error:

Error in terms.formula(formula, data = data) :
  allocMatrix: too many elements specified
Calls: glm ... model.frame - model.frame.default - terms - terms.formula
Execution halted

Is this because R can't handle 200K features or am I doing something 
completely wrong here?


Thanks a lot for your help!

best Regards,

Romeo

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Regression with 200K features in R?

2013-12-12 Thread Eik Vettorazzi
it is simply because you can't do a regression with more predictors than
observations.

Cheers.

Am 12.12.2013 09:00, schrieb Romeo Kienzler:
 Dear List,
 
 I'm quite new to R and want to do logistic regression with a 200K
 feature data set (around 150 training examples).
 
 I'm aware that I should use Naive Bayes but I have a more general
 question about the capability of R handling very high dimensional data.
 
 Please consider the following R code where mygenestrain.tab is a 150
 by 20 matrix:
 
 traindata - read.table('mygenestrain.tab');
 mylogit - glm(V1 ~ ., data = traindata, family = binomial);
 
 When executing this code I get the following error:
 
 Error in terms.formula(formula, data = data) :
   allocMatrix: too many elements specified
 Calls: glm ... model.frame - model.frame.default - terms - terms.formula
 Execution halted
 
 Is this because R can't handle 200K features or am I doing something
 completely wrong here?
 
 Thanks a lot for your help!
 
 best Regards,
 
 Romeo
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
Eik Vettorazzi

Department of Medical Biometry and Epidemiology
University Medical Center Hamburg-Eppendorf

Martinistr. 52
20246 Hamburg

T ++49/40/7410-58243
F ++49/40/7410-57790
--

Besuchen Sie uns auf: www.uke.de
_

Universitätsklinikum Hamburg-Eppendorf; Körperschaft des öffentlichen Rechts; 
Gerichtsstand: Hamburg
Vorstandsmitglieder: Prof. Dr. Christian Gerloff (Vertreter des Vorsitzenden), 
Prof. Dr. Dr. Uwe Koch-Gromus, Joachim Prölß, Rainer Schoppik
_

SAVE PAPER - THINK BEFORE PRINTING

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Regression with 200K features in R?

2013-12-12 Thread Eik Vettorazzi
I thought so (with all the limitations due to collinearity and so on),
but actually there is a limit for the maximum size of an array which is
independent of your memory size and is due to the way arrays are
indexed. You can't create an object with more than 2^31-1 = 2147483647
elements.

https://stat.ethz.ch/pipermail/r-help/2007-June/133238.html

cheers

Am 12.12.2013 12:34, schrieb Romeo Kienzler:
 ok, so 200K predictors an 10M observations would work?
 
 
 On 12/12/2013 12:12 PM, Eik Vettorazzi wrote:
 it is simply because you can't do a regression with more predictors than
 observations.

 Cheers.

 Am 12.12.2013 09:00, schrieb Romeo Kienzler:
 Dear List,

 I'm quite new to R and want to do logistic regression with a 200K
 feature data set (around 150 training examples).

 I'm aware that I should use Naive Bayes but I have a more general
 question about the capability of R handling very high dimensional data.

 Please consider the following R code where mygenestrain.tab is a 150
 by 20 matrix:

 traindata - read.table('mygenestrain.tab');
 mylogit - glm(V1 ~ ., data = traindata, family = binomial);

 When executing this code I get the following error:

 Error in terms.formula(formula, data = data) :
allocMatrix: too many elements specified
 Calls: glm ... model.frame - model.frame.default - terms -
 terms.formula
 Execution halted

 Is this because R can't handle 200K features or am I doing something
 completely wrong here?

 Thanks a lot for your help!

 best Regards,

 Romeo

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 

-- 
Eik Vettorazzi

Department of Medical Biometry and Epidemiology
University Medical Center Hamburg-Eppendorf

Martinistr. 52
20246 Hamburg

T ++49/40/7410-58243
F ++49/40/7410-57790
--

Besuchen Sie uns auf: www.uke.de
_

Universitätsklinikum Hamburg-Eppendorf; Körperschaft des öffentlichen Rechts; 
Gerichtsstand: Hamburg
Vorstandsmitglieder: Prof. Dr. Christian Gerloff (Vertreter des Vorsitzenden), 
Prof. Dr. Dr. Uwe Koch-Gromus, Joachim Prölß, Rainer Schoppik
_

SAVE PAPER - THINK BEFORE PRINTING

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Regression with 200K features in R?

2013-12-12 Thread Duncan Murdoch

On 13-12-12 6:51 AM, Eik Vettorazzi wrote:

I thought so (with all the limitations due to collinearity and so on),
but actually there is a limit for the maximum size of an array which is
independent of your memory size and is due to the way arrays are
indexed. You can't create an object with more than 2^31-1 = 2147483647
elements.

https://stat.ethz.ch/pipermail/r-help/2007-June/133238.html


That post is from 2007.  The limits were raised considerably when R 
3.0.0 was released, and it is now 2^48 for disk-based operations, 2^52 
for working in memory.


Duncan Murdoch




cheers

Am 12.12.2013 12:34, schrieb Romeo Kienzler:

ok, so 200K predictors an 10M observations would work?


On 12/12/2013 12:12 PM, Eik Vettorazzi wrote:

it is simply because you can't do a regression with more predictors than
observations.

Cheers.

Am 12.12.2013 09:00, schrieb Romeo Kienzler:

Dear List,

I'm quite new to R and want to do logistic regression with a 200K
feature data set (around 150 training examples).

I'm aware that I should use Naive Bayes but I have a more general
question about the capability of R handling very high dimensional data.

Please consider the following R code where mygenestrain.tab is a 150
by 20 matrix:

traindata - read.table('mygenestrain.tab');
mylogit - glm(V1 ~ ., data = traindata, family = binomial);

When executing this code I get the following error:

Error in terms.formula(formula, data = data) :
allocMatrix: too many elements specified
Calls: glm ... model.frame - model.frame.default - terms -
terms.formula
Execution halted

Is this because R can't handle 200K features or am I doing something
completely wrong here?

Thanks a lot for your help!

best Regards,

Romeo

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.






__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Regression with 200K features in R?

2013-12-12 Thread Eik Vettorazzi
thanks Duncan for this clarification.
A double precision matrix with 2e11 elements (as the op wanted) would
need about 1.5 TB memory, that's more than a standard (windows 64bit)
computer can handle.

Cheers.

Am 12.12.2013 13:00, schrieb Duncan Murdoch:
 On 13-12-12 6:51 AM, Eik Vettorazzi wrote:
 I thought so (with all the limitations due to collinearity and so on),
 but actually there is a limit for the maximum size of an array which is
 independent of your memory size and is due to the way arrays are
 indexed. You can't create an object with more than 2^31-1 = 2147483647
 elements.

 https://stat.ethz.ch/pipermail/r-help/2007-June/133238.html
 
 That post is from 2007.  The limits were raised considerably when R
 3.0.0 was released, and it is now 2^48 for disk-based operations, 2^52
 for working in memory.
 
 Duncan Murdoch
 
 

 cheers

 Am 12.12.2013 12:34, schrieb Romeo Kienzler:
 ok, so 200K predictors an 10M observations would work?


 On 12/12/2013 12:12 PM, Eik Vettorazzi wrote:
 it is simply because you can't do a regression with more predictors
 than
 observations.

 Cheers.

 Am 12.12.2013 09:00, schrieb Romeo Kienzler:
 Dear List,

 I'm quite new to R and want to do logistic regression with a 200K
 feature data set (around 150 training examples).

 I'm aware that I should use Naive Bayes but I have a more general
 question about the capability of R handling very high dimensional
 data.

 Please consider the following R code where mygenestrain.tab is a 150
 by 20 matrix:

 traindata - read.table('mygenestrain.tab');
 mylogit - glm(V1 ~ ., data = traindata, family = binomial);

 When executing this code I get the following error:

 Error in terms.formula(formula, data = data) :
 allocMatrix: too many elements specified
 Calls: glm ... model.frame - model.frame.default - terms -
 terms.formula
 Execution halted

 Is this because R can't handle 200K features or am I doing something
 completely wrong here?

 Thanks a lot for your help!

 best Regards,

 Romeo

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


 

-- 
Eik Vettorazzi

Department of Medical Biometry and Epidemiology
University Medical Center Hamburg-Eppendorf

Martinistr. 52
20246 Hamburg

T ++49/40/7410-58243
F ++49/40/7410-57790
--

Besuchen Sie uns auf: www.uke.de
_

Universitätsklinikum Hamburg-Eppendorf; Körperschaft des öffentlichen Rechts; 
Gerichtsstand: Hamburg
Vorstandsmitglieder: Prof. Dr. Christian Gerloff (Vertreter des Vorsitzenden), 
Prof. Dr. Dr. Uwe Koch-Gromus, Joachim Prölß, Rainer Schoppik
_

SAVE PAPER - THINK BEFORE PRINTING

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Regression with 200K features in R?

2013-12-12 Thread Duncan Murdoch

On 12/12/2013 7:08 AM, Eik Vettorazzi wrote:

thanks Duncan for this clarification.
A double precision matrix with 2e11 elements (as the op wanted) would
need about 1.5 TB memory, that's more than a standard (windows 64bit)
computer can handle.


According to Microsoft's Memory Limits web page (currently at 
http://msdn.microsoft.com/en-us/library/windows/desktop/aa366778%28v=vs.85%29.aspx#memory_limits, 
but these things tend to move around), the limit is 8 TB for virtual 
memory.(The same page lists a variety of smaller physical memory 
limits, depending on the Windows version, but R doesn't need physical 
memory, virtual is good enough. )


R would be very slow if it was working with objects bigger than physical 
memory, but it could conceivably work.


Duncan Murdoch


Cheers.

Am 12.12.2013 13:00, schrieb Duncan Murdoch:
 On 13-12-12 6:51 AM, Eik Vettorazzi wrote:
 I thought so (with all the limitations due to collinearity and so on),
 but actually there is a limit for the maximum size of an array which is
 independent of your memory size and is due to the way arrays are
 indexed. You can't create an object with more than 2^31-1 = 2147483647
 elements.

 https://stat.ethz.ch/pipermail/r-help/2007-June/133238.html

 That post is from 2007.  The limits were raised considerably when R
 3.0.0 was released, and it is now 2^48 for disk-based operations, 2^52
 for working in memory.

 Duncan Murdoch



 cheers

 Am 12.12.2013 12:34, schrieb Romeo Kienzler:
 ok, so 200K predictors an 10M observations would work?


 On 12/12/2013 12:12 PM, Eik Vettorazzi wrote:
 it is simply because you can't do a regression with more predictors
 than
 observations.

 Cheers.

 Am 12.12.2013 09:00, schrieb Romeo Kienzler:
 Dear List,

 I'm quite new to R and want to do logistic regression with a 200K
 feature data set (around 150 training examples).

 I'm aware that I should use Naive Bayes but I have a more general
 question about the capability of R handling very high dimensional
 data.

 Please consider the following R code where mygenestrain.tab is a 150
 by 20 matrix:

 traindata - read.table('mygenestrain.tab');
 mylogit - glm(V1 ~ ., data = traindata, family = binomial);

 When executing this code I get the following error:

 Error in terms.formula(formula, data = data) :
 allocMatrix: too many elements specified
 Calls: glm ... model.frame - model.frame.default - terms -
 terms.formula
 Execution halted

 Is this because R can't handle 200K features or am I doing something
 completely wrong here?

 Thanks a lot for your help!

 best Regards,

 Romeo

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.






__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Regression with 200K features in R?

2013-12-12 Thread Romeo Kienzler

ok, so 200K predictors an 10M observations would work?


On 12/12/2013 12:12 PM, Eik Vettorazzi wrote:

it is simply because you can't do a regression with more predictors than
observations.

Cheers.

Am 12.12.2013 09:00, schrieb Romeo Kienzler:

Dear List,

I'm quite new to R and want to do logistic regression with a 200K
feature data set (around 150 training examples).

I'm aware that I should use Naive Bayes but I have a more general
question about the capability of R handling very high dimensional data.

Please consider the following R code where mygenestrain.tab is a 150
by 20 matrix:

traindata - read.table('mygenestrain.tab');
mylogit - glm(V1 ~ ., data = traindata, family = binomial);

When executing this code I get the following error:

Error in terms.formula(formula, data = data) :
   allocMatrix: too many elements specified
Calls: glm ... model.frame - model.frame.default - terms - terms.formula
Execution halted

Is this because R can't handle 200K features or am I doing something
completely wrong here?

Thanks a lot for your help!

best Regards,

Romeo

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Regression with 200K features in R?

2013-12-12 Thread Romeo Kienzler

Dear Eik,

thank you so much for your help!

best Regards,

Romeo

On 12/12/2013 12:51 PM, Eik Vettorazzi wrote:

I thought so (with all the limitations due to collinearity and so on),
but actually there is a limit for the maximum size of an array which is
independent of your memory size and is due to the way arrays are
indexed. You can't create an object with more than 2^31-1 = 2147483647
elements.

https://stat.ethz.ch/pipermail/r-help/2007-June/133238.html

cheers

Am 12.12.2013 12:34, schrieb Romeo Kienzler:

ok, so 200K predictors an 10M observations would work?


On 12/12/2013 12:12 PM, Eik Vettorazzi wrote:

it is simply because you can't do a regression with more predictors than
observations.

Cheers.

Am 12.12.2013 09:00, schrieb Romeo Kienzler:

Dear List,

I'm quite new to R and want to do logistic regression with a 200K
feature data set (around 150 training examples).

I'm aware that I should use Naive Bayes but I have a more general
question about the capability of R handling very high dimensional data.

Please consider the following R code where mygenestrain.tab is a 150
by 20 matrix:

traindata - read.table('mygenestrain.tab');
mylogit - glm(V1 ~ ., data = traindata, family = binomial);

When executing this code I get the following error:

Error in terms.formula(formula, data = data) :
allocMatrix: too many elements specified
Calls: glm ... model.frame - model.frame.default - terms -
terms.formula
Execution halted

Is this because R can't handle 200K features or am I doing something
completely wrong here?

Thanks a lot for your help!

best Regards,

Romeo

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Logistic regression over LOOCV

2013-10-18 Thread Nicolás Sánchez
Hello all.

I have this code:

myLOOCV - function(myformula, data) {
Y - all.vars(myformula)[1]
Scores- numeric(length(data[,1]))
for (i in 1:length(data[,1])) {
train - data[-i,]
test - data[i,]
myModel - lrm(myformula, train)
Scores[i] - predict(myModel, test,type=mean)
}
factor(Scores,labels=levels(data[,Y]))
}

When I call the method myLOOCV, myLOOCV(value, data), I have this error:

Error in factor(Scores, labels = levels(data[, Y])) :
  invalid 'labels'; length 4 should be 1 or 180

I don't understand it! Could anyone solve it?

Thanks in advance.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Logistic regression

2013-04-19 Thread Endy BlackEndy
Dear colleagues I have a couple of problems related with binary logistic
regression.

The first problem is how to compute Pearson and Likelihood chi-squeared
tests for grouped data.

For the same form of data set how to compute sensitivity, specificity and
related measures.

When I speak about grouped data I mean data of the following form


Alcohol.Consum Malformed NoMalformed

   0  4817066

 1  3814464

  1-2  5   788

  3-5  1   126

  =6 1 37

(This data set has been taken from the book “Categorical data analysis by
Agresti”)

The second question is the following:is it correct to upload a grouped data
set to an ungrouped one?

The upload is achieved with the aid of the following routine

Createdataframe - function(d){

  f1 = f2 = numeric(0)

  v = d[,1]

  for(j in 2:3){

 for(i in 1:dim(d)[1]){

f1 = c( f1, rep( levels(v)[v[i]], d[i,j] ) )

 }

 f2 = c( f2, rep( 3-j, sum(d[,j]) ) )

  }

  df=data.frame(f1,f2)

  return(df)

}

My finally question is why SPSS computes the Hosmer-Lemeshow test while the
routine listed below gives NaN. (One has to put g=8 in order to get a
numeric value)

The data set is (it is also has been taken from the same book mentioned
above)


FlightNo Temp ThermalDisast

  1   66   0

  2   70   1

  3   69   0

  4   68   0

  5   67   0

  6   72   0

  7   73   0

  8   70   0

   9  57   1

 10  63   1

 11  70   1

 12  78   0

 13  67   0

 14  53   1

 15  67   0

 16 750

 17 700

 18 810

 19 760

 20 790

 21 751

 22 760

 23 58 1

and the routine used is

hosmerlemeshow = function(obj, g=10) {

  # first, check to see if we fed in the right kind of object

   stopifnot(family(obj)$family==binomial  family(obj)$link==logit)

   y = obj$model[[1]]

   # the double bracket (above) gets the index of items within an object

   if (is.factor(y))

  y = as.numeric(y)==2

  yhat = obj$fitted.values

  cutyhat=cut(yhat,quantile(yhat,0:g/g),include.lowest=TRUE)

  obs = xtabs(cbind(1 - y, y) ~ cutyhat)

  expect = xtabs(cbind(1 - yhat, yhat) ~ cutyhat)

  if (any(expect  5))

# warning(Some expected counts are less than 5. Use smaller number
of groups)

chisq = sum((obs - expect)^2/expect)

P = 1 - pchisq(chisq, g - 2)

cat(H-L2 statistic,round(chisq,4), and its p value,P,\n)

# by returning an object of class htest, the function will
perform like the

   # built-in hypothesis tests

   return(structure(list(

  method = c(paste(Hosmer and Lemeshow goodness-of-fit test with, g,
bins, sep= )),

  data.name = deparse(substitute(obj)),

  statistic = c(X2=chisq),

   parameter = c(df=g-2),

   p.value = P

  ), class='htest'))

  return(list(chisq=chisq,p.value=P))

}

I found this routine in the internet.

Thank you for your cooperation in advance. Any suggestion and/or solution
will be greatly appreciated.

With regards
Endy

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Logistic Regression

2013-04-19 Thread Endy BlackEndy
Please read the attach file.
Thank you
Endy
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Regression

2013-04-19 Thread David Winsemius

On Apr 19, 2013, at 11:45 AM, Endy BlackEndy wrote:

 Please read the attach file.

Please re-read : http://www.r-project.org/mail.html#instructions

 Thank you
 Endy
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Logistic Regression

2013-04-19 Thread Endy BlackEndy
Dear colleagues I have a couple of problems related with binary logistic
regression.

  The first problem is how to compute Pearson and Likelihood chi-squeared
tests for grouped data.

 For the same form of data set how to compute sensitivity, specificity and
related measures.

 When I speak about grouped data I mean data of the following form



  Alcohol.Consum   Malformed NoMalformed

0
   4817066

  1
3814464

   1-2
5788

   3-5
1126

  =6
1  37



(This data set has been taken from the book “Categorical data analysis by
Agresti”)



   The second question is the following:is it correct to upload a grouped
data set to an ungrouped one?

The upload is achieved with the aid of the following routine



Createdataframe - function(d){

 f1 = f2 = numeric(0)

 v = d[,1]

 for(j in 2:3){

  for(i in 1:dim(d)[1]){

   f1 = c( f1, rep( levels(v)[v[i]], d[i,j] ) )

  }

  f2 = c( f2, rep( 3-j, sum(d[,j]) ) )

 }

 df=data.frame(f1,f2)

 return(df)

}



My finally question is why SPSS computes the Hosmer-Lemeshow test while
the routine listed below gives NaN. (One has to put g=8 in order to get a
numeric value)



The data set is (it is also has been taken from the same book mentioned
above)



FlightNo   TempThermalDisast

1  660

2  701

3  690

4  680

5  670

6  720

7  730

8  700

9  571

10631

11701

12780

13670

14531

15670

16750

17700

18810

19760

20790

21751

22760

23581



and the routine used is



hosmerlemeshow = function(obj, g=10) {

# first, check to see if we fed in the right kind of object

   stopifnot(family(obj)$family==binomial  family(obj)$link==logit)

   y = obj$model[[1]]

   # the double bracket (above) gets the index of items within an object

   if (is.factor(y))

  y = as.numeric(y)==2

   yhat = obj$fitted.values

   cutyhat=cut(yhat,quantile(yhat,0:g/g),include.lowest=TRUE)



   obs = xtabs(cbind(1 - y, y) ~ cutyhat)

   expect = xtabs(cbind(1 - yhat, yhat) ~ cutyhat)

   if (any(expect  5))

   #   warning(Some expected counts are less than 5. Use smaller number of
groups)

   chisq = sum((obs - expect)^2/expect)

   P = 1 - pchisq(chisq, g - 2)

   cat(H-L2 statistic,round(chisq,4), and its p value,P,\n)

   # by returning an object of class htest, the function will perform
like the

   # built-in hypothesis tests

   return(structure(list(

  method = c(paste(Hosmer and Lemeshow goodness-of-fit test with, g,
bins, sep= )),

  data.name = deparse(substitute(obj)),

  statistic = c(X2=chisq),

  parameter = c(df=g-2),

  p.value = P

   ), class='htest'))

   return(list(chisq=chisq,p.value=P))

}



Thank you for your cooperation in advance. Any suggestion and/or solution
will be greatly appreciated.



With regards

Endy

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Logistic regression

2013-04-14 Thread Endy BlackEndy
I have a data set to be analyzed using to binary logistic regression. The
data set is iin grouped form. My question is: how I can compute
Hosmer-Lemeshow test and measures like sensitivity and specificity? Any
suggestion will be greatly appreciated.

Thank you

Endy

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic regression

2013-04-14 Thread Jose Iparraguirre
Endy,

See the package ResourceSelection for the HL test and the package caret for the 
sensitivity and specificity measures.
Regards,

Jose Iparraguirre
Chief Economist
Age UK, London


From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of 
Endy BlackEndy [pert...@gmail.com]
Sent: 14 April 2013 19:05
To: R-Help
Subject: [R] Logistic regression

I have a data set to be analyzed using to binary logistic regression. The
data set is iin grouped form. My question is: how I can compute
Hosmer-Lemeshow test and measures like sensitivity and specificity? Any
suggestion will be greatly appreciated.

Thank you

Endy

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Please donate to the Syria Crisis Appeal by text or online:

To donate £5 by mobile, text SYRIA to 70800.  To donate online, please visit 

http://www.ageinternational.org.uk/syria

Over one million refugees are desperately in need of water, food, healthcare, 
warm clothing, 
blankets and shelter; Age International urgently needs your support to help 
affected older refugees.


Age International is a subsidiary charity of Age UK and a member of the 
Disasters Emergency Committee (DEC).  
The DEC launches and co-ordinates national fundraising appeals for public 
donations on behalf of its member agencies.

Texts cost £5 plus one standard rate message.  Age International will receive a 
minimum of £4.96.  
More info at ageinternational.org.uk/SyriaTerms



 

---
Age UK is a registered charity and company limited by guarantee, (registered 
charity number 1128267, registered company number 6825798). 
Registered office: Tavis House, 1-6 Tavistock Square, London WC1H 9NA.

For the purposes of promoting Age UK Insurance, Age UK is an Appointed 
Representative of Age UK Enterprises Limited, Age UK is an Introducer 
Appointed Representative of JLT Benefit Solutions Limited and Simplyhealth 
Access for the purposes of introducing potential annuity and health 
cash plans customers respectively.  Age UK Enterprises Limited, JLT Benefit 
Solutions Limited and Simplyhealth Access are all authorised and 
regulated by the Financial Services Authority. 
--

This email and any files transmitted with it are confidential and intended 
solely for the use of the individual or entity to whom they are 
addressed. If you receive a message in error, please advise the sender and 
delete immediately.

Except where this email is sent in the usual course of our business, any 
opinions expressed in this email are those of the author and do not 
necessarily reflect the opinions of Age UK or its subsidiaries and associated 
companies. Age UK monitors all e-mail transmissions passing 
through its network and may block or modify mails which are deemed to be 
unsuitable.

Age Concern England (charity number 261794) and Help the Aged (charity number 
272786) and their trading and other associated companies merged 
on 1st April 2009.  Together they have formed the Age UK Group, dedicated to 
improving the lives of people in later life.  The three national 
Age Concerns in Scotland, Northern Ireland and Wales have also merged with Help 
the Aged in these nations to form three registered charities: 
Age Scotland, Age NI, Age Cymru.




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Logistic Regression with Offset value

2012-11-05 Thread Lucas Chaparro
Dear R friends.
I´m trying to fit a Logistic Regression using glm( family='binomial').
Here is the model:
*model-glm(f_ocur~altitud+UTM_X+UTM_Y+j_sin+j_cos+temp_res+pp,
offset=(log(1/off)), data=mydata, family='binomial')*

mydata has 76820 observations.
The response variable f_ocur) is a 0-1.
This data is a SAMPLE of a bigger dataset, so the idea of setting the
offset is to account that the data used here represents a sample of the
real data to be analyzed.

For some reason the offset is not working. When I run this model I get a
result, but when I run the same model but without the offset I get the
exact result than the previous model I was expecting a different result but
no... there is no difference.
Am I doing something wrong? Should the offset be with the linear
predictors? like this:
*model-glm(f_ocur~altitud+UTM_X+UTM_Y+j_sin+j_cos+temp_res+pp+**
offset(log(1/off))**, data=mydata, family='binomial')*

O

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Logistic Regression with Offset value

2012-11-05 Thread Lucas Chaparro
Dear R friends.
I´m trying to fit a Logistic Regression using glm( family='binomial').
Here is the model:
*model-glm(f_ocur~altitud+UTM_X+UTM_Y+j_sin+j_cos+temp_res+pp,
offset=(log(1/off)), data=mydata, family='binomial')*

mydata has 76820 observations.
The response variable f_ocur) is a 0-1.
This data is a SAMPLE of a bigger dataset, so the idea of setting the
offset is to account that the data used here represents a sample of the
real data to be analyzed.

For some reason the offset is not working. When I run this model I get a
result, but when I run the same model but without the offset I get the
exact result than the previous model I was expecting a different result but
no... there is no difference.
Am I doing something wrong? Should the offset be with the linear
predictors? like this:
*model-glm(f_ocur~altitud+UTM_X+UTM_Y+j_sin+j_cos+temp_res+pp+**
offset(log(1/off))**, data=mydata, family='binomial')*

Once the model is ready, I´d like to use it in a new data. The new data
would be the data to validate this model, this data has de the same
columns, my idea is to use:
*validate-predict(model, newdata=data2, type='response')*
And here comes my question, does the predict function takes into
consideration the *offset *used to create the model? if not, what should I
do in order to get the correct probabilities in the new data?

I´d really appreciate if anyone could help me.
Thank you.

Lucas.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Logistic regression/Cut point? predict ??

2012-10-20 Thread Adel Powell
I am new to R and I am trying to do a monte carlo simulation where I
generate data and interject error then test various cut points; however, my
output was garbage (at x equal zero, I did not get .50)
I am basically testing the performance of classifiers.

Here is the code:
n - 1000; # Sample size

fitglm - function(sigma,tau){
x - rnorm(n,0,sigma)
intercept - 0
beta - 5
   * ystar - intercept+beta*x*
   * z - rbinom(n,1,plogis(ystar))**# I believe plogis accepts the a
+bx augments and return the  e^x/(1+e^x)  which is then used to generate 0
and 1 data*
xerr - x + rnorm(n,0,tau)# error is added here
model-glm(z ~ xerr, family=binomial(logit))
int-coef(model)[1]
slope-coef(model)[2]
pred-predict(model)  #this gives me the a+bx data for new error?  I
know I can add type= response to get the probab. but only e^x not *e^x/(1+e^x)
*

pi1hat-length(z[which(z==1)]/length(z)) My cut point is calculated  is
the proportion of 0s to 1.
pi0hat-length(z[which(z==0)]/length(z))

cutmid - log(pi0hat/pi1hat)
pred-ifelse(predcutmid,1,0) * I am not sure if I need to compare
these two. I think this is an error.
*
accuracy-length(which(pred==z))/length(z)
accuracy

rocpreds-prediction(pred,z)
auc-performance(rocpreds,auc)@y.values

output-c(int,slope,cutmid,accuracy,auc)
names(output)-c(Intercept,Slope,CutPoint,Accuracy,AUC)
return(output)

}

y-fitglm(.05,1)
y

nreps - 500;
output-data.frame(matrix(rep(NA),nreps,6,ncol=6))

mysigma-.5
mytau-.1

i-1

for(j in 1:nreps) {
output[j,1:5]-fitglm(mysigma,mytau)
output[j,6]-j
}

names(output)-c(Intercept,Slope,CutPoint,Accuracy,AUC,Iteration)

apply(output,2, mean)
apply(output,2, var)

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic regression/Cut point? predict ??

2012-10-20 Thread Simon Knapp
What do you mean by at x equal zero?

On Sun, Oct 21, 2012 at 8:37 AM, Adel Powell powella...@gmail.com wrote:
 I am new to R and I am trying to do a monte carlo simulation where I
 generate data and interject error then test various cut points; however, my
 output was garbage (at x equal zero, I did not get .50)
 I am basically testing the performance of classifiers.

 Here is the code:
 n - 1000; # Sample size

 fitglm - function(sigma,tau){
 x - rnorm(n,0,sigma)
 intercept - 0
 beta - 5
* ystar - intercept+beta*x*
* z - rbinom(n,1,plogis(ystar))**# I believe plogis accepts the a
 +bx augments and return the  e^x/(1+e^x)  which is then used to generate 0
 and 1 data*
 xerr - x + rnorm(n,0,tau)# error is added here
 model-glm(z ~ xerr, family=binomial(logit))
 int-coef(model)[1]
 slope-coef(model)[2]
 pred-predict(model)  #this gives me the a+bx data for new error?  I
 know I can add type= response to get the probab. but only e^x not *e^x/(1+e^x)
 *

 pi1hat-length(z[which(z==1)]/length(z)) My cut point is calculated  is
 the proportion of 0s to 1.
 pi0hat-length(z[which(z==0)]/length(z))

 cutmid - log(pi0hat/pi1hat)
 pred-ifelse(predcutmid,1,0) * I am not sure if I need to compare
 these two. I think this is an error.
 *
 accuracy-length(which(pred==z))/length(z)
 accuracy

 rocpreds-prediction(pred,z)
 auc-performance(rocpreds,auc)@y.values

 output-c(int,slope,cutmid,accuracy,auc)
 names(output)-c(Intercept,Slope,CutPoint,Accuracy,AUC)
 return(output)

 }

 y-fitglm(.05,1)
 y

 nreps - 500;
 output-data.frame(matrix(rep(NA),nreps,6,ncol=6))

 mysigma-.5
 mytau-.1

 i-1

 for(j in 1:nreps) {
 output[j,1:5]-fitglm(mysigma,mytau)
 output[j,6]-j
 }

 names(output)-c(Intercept,Slope,CutPoint,Accuracy,AUC,Iteration)

 apply(output,2, mean)
 apply(output,2, var)

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Logistic regression X^2 test with large sample size (fwd)

2012-07-31 Thread M Pomati


Does anyone know of any X^2 tests to compare the fit of logistic models 
which factor out the sample size? I'm dealing with a very large sample and 
I fear the significant X^2 test I get when adding a variable to the model 
is simply a result of the sample size (200,000 cases).

I'd rather use the whole dataset instead of taking (small) random samples 
as it is highly skewed. I've seen things like Phi and Cramer's V for 
crosstabs but I'm not sure whether they have been used before on logistic 
regression, if there are better ones and if there are any packages.


Many thanks

Marco


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic regression X^2 test with large sample size (fwd)

2012-07-31 Thread Marc Schwartz
On Jul 31, 2012, at 10:35 AM, M Pomati marco.pom...@bristol.ac.uk wrote:

 
 
 Does anyone know of any X^2 tests to compare the fit of logistic models 
 which factor out the sample size? I'm dealing with a very large sample and 
 I fear the significant X^2 test I get when adding a variable to the model 
 is simply a result of the sample size (200,000 cases).
 
 I'd rather use the whole dataset instead of taking (small) random samples 
 as it is highly skewed. I've seen things like Phi and Cramer's V for 
 crosstabs but I'm not sure whether they have been used before on logistic 
 regression, if there are better ones and if there are any packages.
 
 
 Many thanks
 
 Marco



Sounds like you are bordering on some type of stepwise approach to including or 
not including covariates in the model. You can search the list archives for a 
myriad of discussions as to why that is a poor approach.

You have the luxury of a large sample. You also have the challenge of 
interpreting covariates that appear to be statistically significant, but may 
have a rather small *effect size* in context. That is where subject matter 
experts need to provide input as to interpretation of the contextual 
significance of the variable, as opposed to the statistical significance of 
that same variable.

A general approach, is to simply pre-specify your model based upon rather 
simple considerations. Also, you need to determine if your goal for the model 
is prediction or explanation. 

What is the incidence of your 'event' in the sample? If it is say 10%, then you 
should have around 20,000 events. The rule of thumb for logistic regression is 
to have around 20 events per covariate degree of freedom (df) to minimize the 
risk of over-fitting the model to your dataset. A continuous covariate is 1 df, 
a k-level factor is k-1 df. So with 20,000 events, your model could feasibly 
have 1,000 covariate df's. I am guessing that you don't have that much 
independent data to begin with.

So, pre-specfy your model on the full dataset and stick with it. Interact with 
subject matter experts on the interpretation of the model.

BTW, this question is really about statistical modeling generally, not really R 
specific. Such queries are best posed to general statistical lists/forums such 
as Stack Exchange. I would also point you to Frank Harrell's book, Regression 
Modeling Strategies.

Regards,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic regression X^2 test with large sample size (fwd)

2012-07-31 Thread M Pomati
Marc, thank you very much for your help.
I've posted in on

http://math.stackexchange.com/questions/177252/x2-tests-to-compare-the-fit-of-large-samples-logistic-models

and added details.

Many thanks

Marco

--On 31 July 2012 11:50 -0500 Marc Schwartz marc_schwa...@me.com wrote:

 On Jul 31, 2012, at 10:35 AM, M Pomati marco.pom...@bristol.ac.uk wrote:

 
 
  Does anyone know of any X^2 tests to compare the fit of logistic models
  which factor out the sample size? I'm dealing with a very large sample 
and
  I fear the significant X^2 test I get when adding a variable to the 
model
  is simply a result of the sample size (200,000 cases).
 
  I'd rather use the whole dataset instead of taking (small) random 
samples
  as it is highly skewed. I've seen things like Phi and Cramer's V for
  crosstabs but I'm not sure whether they have been used before on 
logistic
  regression, if there are better ones and if there are any packages.
 
 
  Many thanks
 
  Marco



 Sounds like you are bordering on some type of stepwise approach to 
including or not including covariates in the model. You can search the list 
archives for a myriad of discussions as to why that is a poor approach.

 You have the luxury of a large sample. You also have the challenge of 
interpreting covariates that appear to be statistically significant, but 
may have a rather small *effect size* in context. That is where subject 
matter experts need to provide input as to interpretation of the contextual 
significance of the variable, as opposed to the statistical significance of 
that same variable.

 A general approach, is to simply pre-specify your model based upon rather 
simple considerations. Also, you need to determine if your goal for the 
model is prediction or explanation.

 What is the incidence of your 'event' in the sample? If it is say 10%, 
then you should have around 20,000 events. The rule of thumb for logistic 
regression is to have around 20 events per covariate degree of freedom (df) 
to minimize the risk of over-fitting the model to your dataset. A 
continuous covariate is 1 df, a k-level factor is k-1 df. So with 20,000 
events, your model could feasibly have 1,000 covariate df's. I am guessing 
that you don't have that much independent data to begin with.

 So, pre-specfy your model on the full dataset and stick with it. Interact 
with subject matter experts on the interpretation of the model.

 BTW, this question is really about statistical modeling generally, not 
really R specific. Such queries are best posed to general statistical 
lists/forums such as Stack Exchange. I would also point you to Frank 
Harrell's book, Regression Modeling Strategies.

 Regards,

 Marc Schwartz






--
M Pomati
University of Bristol
School for Policy Studies
8 Priory Road
Office:10B
Bristol BS8 1TZ, UK
http://www.bristol.ac.uk/sps/research/centres/poverty

 
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic regression X^2 test with large sample size (fwd)

2012-07-31 Thread David Winsemius


On Jul 31, 2012, at 10:25 AM, M Pomati wrote:


Marc, thank you very much for your help.
I've posted in on

http://math.stackexchange.com/questions/177252/x2-tests-to-compare-the-fit-of-large-samples-logistic-models 



and added details.


I think you might have gotten a more statistically knowledgeable  
audience at:


http://stats.stackexchange.com/

(And I suggested to the moderators at math-SE that it be migrated.)

--
David.



Many thanks

Marco

--On 31 July 2012 11:50 -0500 Marc Schwartz marc_schwa...@me.com  
wrote:


On Jul 31, 2012, at 10:35 AM, M Pomati marco.pom...@bristol.ac.uk  
wrote:


Does anyone know of any X^2 tests to compare the fit of logistic  
models
which factor out the sample size? I'm dealing with a very large  
sample and
I fear the significant X^2 test I get when adding a variable to  
the model

is simply a result of the sample size (200,000 cases).

I'd rather use the whole dataset instead of taking (small) random  
samples

as it is highly skewed. I've seen things like Phi and Cramer's V for
crosstabs but I'm not sure whether they have been used before on  
logistic

regression, if there are better ones and if there are any packages.


Many thanks

Marco



Sounds like you are bordering on some type of stepwise approach to
including or not including covariates in the model. You can search  
the list
archives for a myriad of discussions as to why that is a poor  
approach.


You have the luxury of a large sample. You also have the challenge of
interpreting covariates that appear to be statistically significant,  
but
may have a rather small *effect size* in context. That is where  
subject
matter experts need to provide input as to interpretation of the  
contextual
significance of the variable, as opposed to the statistical  
significance of

that same variable.


A general approach, is to simply pre-specify your model based upon  
rather
simple considerations. Also, you need to determine if your goal for  
the

model is prediction or explanation.


What is the incidence of your 'event' in the sample? If it is say  
10%,
then you should have around 20,000 events. The rule of thumb for  
logistic
regression is to have around 20 events per covariate degree of  
freedom (df)

to minimize the risk of over-fitting the model to your dataset. A
continuous covariate is 1 df, a k-level factor is k-1 df. So with  
20,000
events, your model could feasibly have 1,000 covariate df's. I am  
guessing

that you don't have that much independent data to begin with.


So, pre-specfy your model on the full dataset and stick with it.  
Interact

with subject matter experts on the interpretation of the model.


BTW, this question is really about statistical modeling generally,  
not

really R specific. Such queries are best posed to general statistical
lists/forums such as Stack Exchange. I would also point you to Frank
Harrell's book, Regression Modeling Strategies.


Regards,

Marc Schwartz


--
M Pomati
University of Bristol




David Winsemius, MD
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] logistic Regression with SSlogis but y == 0 ?!

2012-06-13 Thread pluto
Hello there!

I got some data with x and y values. there are some y == 0. This is a
problem for the selfstarting regression model SSlogis.
The regression works if I use a non selfstarting model. The formula is the
same.  But this needs very detailed information of the starting list. I dont
have this Information all the time.

An easy solution for selfstart model could be to add 1E-100 to the y==0
values. That works fine, but the Regressioncoefficiants differ to the
original ones a bit. Sometimes this is problematic.

So I wonder if it is possible to copy the code of SSlogis and rewrite the
failing passage. I found this passage, but I dont have any Idea, where and
how to create a new SSlogis2.
 
I guess this part of one SSlogis function causes the errormessage:
z - xy[[y]]
if (min(z) = 0) {
z - z - 1.05 * min(z)
}
z - z/(1.05 * max(z))
xy[[z]] - log(z/(1 - z))
aux - coef(lm(x ~ z, xy))

--
View this message in context: 
http://r.789695.n4.nabble.com/logistic-Regression-with-SSlogis-but-y-0-tp4633269.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression

2012-04-04 Thread David Winsemius


On Apr 3, 2012, at 9:25 PM, Melrose2012 wrote:

I am trying to plot the logistic regression of a dataset (# of  
living flies
vs days the flies are alive) and then fit a best-fit line to this  
data.


Here is my code:
plot(fflies$living~fflies$day,xlab=Number of Days,ylab=Number of  
Fruit

Flies,main=Number of Living Fruit Flies vs Day,pch=16)
alive - (fflies$living)
dead - (fflies$living[1]-alive)
glm.fit - glm(cbind(alive,dead)~fflies$day,family=binomial)
summary(glm.fit)
lines(sort(fflies$day),fitted(glm.fit) [order (fflies$day)],col =
red,lwd=2)
lines(glm.fit,col = red,lwd=2)

My problem is that, while I am pretty sure that I did the 'glm'  
command
correctly, when I try to plot this as a best-fit line on my data, it  
does

not fit (clearly on a different scale somehow).

Can anyone enlighten me about what I am doing wrong?  Should I be  
scaling
one of these somehow?  I've tried various types of scaling but  
nothing plots

the line on top of the plot (no matter which I scale).


The parameter estimate is on the log(odds) scale. Two problems: A)  
Your plot is not even on the odds scale, much less the log(odds)  
scale. B) Your model assumes that the log(odds) for an event will  
increase linearly with days. However your event == alive seems  
reversed from what I would have expected it to be. I would have  
expected dead within an interval to be the event that might increase  
in probability as time increased. I also speculated that the problem  
(which was not described at all) might better fit within the framework  
of survival analysis rather than logistic regression?


--
David.



Thanks so much for whatever help you can give me!

Cheers,
Melissa

--
View this message in context: 
http://r.789695.n4.nabble.com/logistic-regression-tp4530651p4530651.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] logistic regression

2012-04-03 Thread Melrose2012
I am trying to plot the logistic regression of a dataset (# of living flies
vs days the flies are alive) and then fit a best-fit line to this data.

Here is my code:
plot(fflies$living~fflies$day,xlab=Number of Days,ylab=Number of Fruit
Flies,main=Number of Living Fruit Flies vs Day,pch=16)
alive - (fflies$living)
dead - (fflies$living[1]-alive)
glm.fit - glm(cbind(alive,dead)~fflies$day,family=binomial)
summary(glm.fit)
lines(sort(fflies$day),fitted(glm.fit) [order (fflies$day)],col =
red,lwd=2)
lines(glm.fit,col = red,lwd=2)

My problem is that, while I am pretty sure that I did the 'glm' command
correctly, when I try to plot this as a best-fit line on my data, it does
not fit (clearly on a different scale somehow).

Can anyone enlighten me about what I am doing wrong?  Should I be scaling
one of these somehow?  I've tried various types of scaling but nothing plots
the line on top of the plot (no matter which I scale).

Thanks so much for whatever help you can give me!

Cheers,
Melissa  

--
View this message in context: 
http://r.789695.n4.nabble.com/logistic-regression-tp4530651p4530651.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression

2012-04-03 Thread David Winsemius


On Apr 3, 2012, at 9:25 PM, Melrose2012 wrote:

I am trying to plot the logistic regression of a dataset (# of  
living flies
vs days the flies are alive) and then fit a best-fit line to this  
data.


Here is my code:
plot(fflies$living~fflies$day,xlab=Number of Days,ylab=Number of  
Fruit

Flies,main=Number of Living Fruit Flies vs Day,pch=16)
alive - (fflies$living)
dead - (fflies$living[1]-alive)
glm.fit - glm(cbind(alive,dead)~fflies$day,family=binomial)
summary(glm.fit)
lines(sort(fflies$day),fitted(glm.fit) [order (fflies$day)],col =
red,lwd=2)
lines(glm.fit,col = red,lwd=2)


Is this homework?



PLEASE do read the posting guide http://www.R-project.org/posting-guide.html


--

David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression

2012-04-03 Thread Melrose2012
No, this is related to my own data.

Yes, 'fflies' the dataset - here I am working with two columns:  # of fruit
flies alive vs # of days these flies are alive.

There is no error, it's just that the best-fit line does not plot nicely on
top of my data (see figure attached).
http://r.789695.n4.nabble.com/file/n4530804/fflies1.jpeg 

I then tried to log the raw data as follows and plot:
log.living - log(fflies$living)
plot(log.living~fflies$day,xlab=Number of Days,ylab=Number of Fruit
Flies,main=Number of Living Fruit Flies vs Day,pch=16)
http://r.789695.n4.nabble.com/file/n4530804/fflies2.jpeg 

Any help you can give me would be greatly appreciated.  I am new to R, so
I'm sorry for my beginner's ignorance in learning this program!  :-)

Cheers,
Melissa




--
View this message in context: 
http://r.789695.n4.nabble.com/logistic-regression-tp4530651p4530804.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression

2012-04-03 Thread Bert Gunter
Get help. You do not understand glm's. What do you think the fitted
values are? -- Hint: they are *not* an estimate of the number of
living fruit flies.

-- Bert

On Tue, Apr 3, 2012 at 6:25 PM, Melrose2012
melissa.patric...@stonybrook.edu wrote:
 I am trying to plot the logistic regression of a dataset (# of living flies
 vs days the flies are alive) and then fit a best-fit line to this data.

 Here is my code:
 plot(fflies$living~fflies$day,xlab=Number of Days,ylab=Number of Fruit
 Flies,main=Number of Living Fruit Flies vs Day,pch=16)
 alive - (fflies$living)
 dead - (fflies$living[1]-alive)
 glm.fit - glm(cbind(alive,dead)~fflies$day,family=binomial)
 summary(glm.fit)
 lines(sort(fflies$day),fitted(glm.fit) [order (fflies$day)],col =
 red,lwd=2)
 lines(glm.fit,col = red,lwd=2)

 My problem is that, while I am pretty sure that I did the 'glm' command
 correctly, when I try to plot this as a best-fit line on my data, it does
 not fit (clearly on a different scale somehow).

 Can anyone enlighten me about what I am doing wrong?  Should I be scaling
 one of these somehow?  I've tried various types of scaling but nothing plots
 the line on top of the plot (no matter which I scale).

 Thanks so much for whatever help you can give me!

 Cheers,
 Melissa

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/logistic-regression-tp4530651p4530651.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression

2012-04-03 Thread Melrose2012
Thank you for your reply.  

I do understand that I am working in log space with the default link
function of the binomial being logit.  My problem is, I thought that they
way I had written the code, when I did the lines command, it should plot
the best-fit line (found by 'glm') on top of my graph.  As you can see,
however, this clearly did not work and I am off by several factors of
magnitude.  I am not sure if the problem lies in the way I have done the
'glm' or in the plotting.  

If I understand the function correctly, when I do 'glm' of family=binomial,
I am working in log space (hence the link function default being logit.) 
If this is correct, then doesn't either my data have to be converted to log
space also or the results of my 'glm.fit' be converted to linear space (i.e.
using 'inv.logit' from the boot library)?  This was my thought process
behind plotting the data as log (I tried several other obviously useless
iterations of various scalings.)  It appears that you can not do an
inv.logit on the result from the 'glm' command.

If I try to do this:   inv.logit(glm.fit)  
I get an error saying:  Error in plogis(q, location, scale, lower.tail,
log.p) : Non-numeric argument to mathematical function

I'm pretty sure this is a limitation in my understanding of exactly what R
is doing in this logistic regression and I am having a hard time finding
help online or in the R help files that helps me to decipher this problem.  

Can anyone tell me if I have set up the 'glm' wrong or if the error is in
how I am trying to plot the best-fit line on the data?

I am new to these sorts of statistics and data manipulation, so I
acknowledge my ignorance and appreciate whatever help anyone can give me. 

Thank you again!

Cheers,
Melissa

--
View this message in context: 
http://r.789695.n4.nabble.com/logistic-regression-tp4530651p4530959.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression

2012-04-03 Thread Jorge I Velez
Hi Melissa,

I would highly encourage you to read [1].  It would be extremely beneficial
for your understanding of the type of models you should use in a situation
like yours (count data).

Best regards,
Jorge.-

[1]  cran.r-project.org/web/packages/pscl/vignettes/countreg.pdf



On Wed, Apr 4, 2012 at 1:10 AM, Melrose2012  wrote:

 Thank you for your reply.

 I do understand that I am working in log space with the default link
 function of the binomial being logit.  My problem is, I thought that they
 way I had written the code, when I did the lines command, it should plot
 the best-fit line (found by 'glm') on top of my graph.  As you can see,
 however, this clearly did not work and I am off by several factors of
 magnitude.  I am not sure if the problem lies in the way I have done the
 'glm' or in the plotting.

 If I understand the function correctly, when I do 'glm' of family=binomial,
 I am working in log space (hence the link function default being logit.)
 If this is correct, then doesn't either my data have to be converted to log
 space also or the results of my 'glm.fit' be converted to linear space
 (i.e.
 using 'inv.logit' from the boot library)?  This was my thought process
 behind plotting the data as log (I tried several other obviously useless
 iterations of various scalings.)  It appears that you can not do an
 inv.logit on the result from the 'glm' command.

 If I try to do this:   inv.logit(glm.fit)
 I get an error saying:  Error in plogis(q, location, scale, lower.tail,
 log.p) : Non-numeric argument to mathematical function

 I'm pretty sure this is a limitation in my understanding of exactly what R
 is doing in this logistic regression and I am having a hard time finding
 help online or in the R help files that helps me to decipher this problem.

 Can anyone tell me if I have set up the 'glm' wrong or if the error is in
 how I am trying to plot the best-fit line on the data?

 I am new to these sorts of statistics and data manipulation, so I
 acknowledge my ignorance and appreciate whatever help anyone can give me.

 Thank you again!

 Cheers,
 Melissa

 --
 View this message in context:
 http://r.789695.n4.nabble.com/logistic-regression-tp4530651p4530959.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression

2012-04-03 Thread Jack Tanner
Melrose2012 melissa.patrician at stonybrook.edu writes:

 
 alive - (fflies$living)
 dead - (fflies$living[1]-alive)
 glm.fit - glm(cbind(alive,dead)~fflies$day,family=binomial)
 summary(glm.fit)

Your call to glm() is not doing what you think it's doing. What you want to do
is probably closer to 

glm.fit - glm(living ~ day, data=fflies, family=binomial)

Where ffiles$living has exactly two values, e.g., alive or dead or some
other pair.

After you do this, you want to examine glm.fit very carefully. Make sure you can
interpret all of the output of summary(glm.fit). Until you do, there's no point
in trying to plot.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Logistic regression

2012-03-28 Thread carlb1
Hi

does anyone know how to do a logistic regression in R? 

any help appreciated 

Carl

--
View this message in context: 
http://r.789695.n4.nabble.com/Logistic-regression-tp4512658p4512658.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic regression

2012-03-28 Thread Sarah Goslee
On Wed, Mar 28, 2012 at 11:49 AM, carlb1 carl19...@hotmail.com wrote:
 Hi

 does anyone know how to do a logistic regression in R?

I do. And doubtless many other folks on R-help do too.

 any help appreciated

You should probably start here:
http://www.rseek.org/?cx=010923144343702598753%3Aboaz1reyxd4q=logistic+regressionsa=Search+functions%2C+lists%2C+and+morecof=FORID%3A11siteurl=rseek.org%2Fref=

And then move on to here:
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Logistic regression using package sem

2012-03-17 Thread Kino Aguilar
Hi!

I constructed a model where most of the variables are linearly related to
each other. However, the dependent variable is dichotomous, which means I
should not try to fit a linear relationship between the other variables,
which are continuos, and this variable. I'm wondering how I would have to
go about addressing this issue in sem?

Additionally, this post is relevant to my question:
http://r.789695.n4.nabble.com/Link-functions-in-SEM-td859182.html . The way
I understood this post is that I should use hetcor to compute the
correlations between the variables and then feed these correlations into
sem. While the partial regression coefficients of the continuos variables
associated with the dichotomous variable did change, so did the partial
regression coefficients of all of my other relationships, which should not
be the case. In addition, I am not sure how I should interpret the partial
regression coefficients of the association between the continuos and
dichotomous variable, i.e., do they represent odds?

Thank you in advance.

Best,
~Kino

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Logistic Regression Coding Help

2012-03-14 Thread Craig P O'Connell


Hello. 

I am beginning to analyze my work and have realized that a simple chi-square 
analysis will not suffice for my research, with one notable reason is that data 
are not discrete.  Since my data fit the assumptions of a logistic regression, 
I am moving forward with this analysis.  With that said, I am a beginner with 
R and would grealty appreciate any help!  

Essentially, the point of my work is to determine if sharks are more sensitive 
to repellents based on certain environmental parameters.  

Therefore an incredibly simplified version of my bullsharkdata .txt file would 
look like this (Note:  (1) = low density/visibility and (2) = high 
density/visibility; and behavior = (1) or avoidance behavior) : 
Obs  Density   Visibility  Behavior   Data 
1      1            1         1           
0.9   
2      2            1         1           
0.1 
3      1            2         
1           0.3 
4      2            2         1           0.8 

Here was my attempt at coding: 
bullsharks - read.table(bullsharkdata.txt, header=T, as.is=T) 
#bullsharks is what I named it 
bullsharks$Obs 
bullsharks$Density 
bullsharks$Visibility 
bullsharks$Behavior 
bullsharks$Data 
#or to view all data 
bullsharks 

library(mlogit) 
bullsharks[1:2,] 
bullsharks$Density-as.factor(bullsharks$Density) 
mldata-mlogit.data(bullsharks, varying=NULL, choice=Density, shape=wide) 
mlogit.model - mlogit(Density~1|Visibility+Behavior, data = mldata, 
reflevel=1) 
summary(mlogit.model) 

However, I get an error at the mlogit.model stage.  Is there something wrong 
with my data or with my code?  

Thank you and any help would be incredibly appreciated! 

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Logistic regression with non-gaussian random effects

2012-02-13 Thread Line Skotte
Hi,
does anyone know of an implementation of generalized linear models with random 
effects, where the random effects are non-gaussian?

Actually, what I need is to do a logistic regression (or binomial regression) 
where the linear predictor in addition to fixed effects and gaussian random 
effects also has a term

b*z

where z is a random effect variable with p(z=1)=p(z=-1)=0.5 and b is a 
parameter of interest. 

I am very grateful to you for any help or comments.

Sincerely,
Line
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Regression

2012-02-07 Thread Terry Therneau
I'm surprised not to see the simple answer: glm models return the MLE
estimate.
fit - glm(y ~ x1 + x2 +  , family='binomial')

There is no need for special packages, this is a standard part of R.

Terry Therneau

 begin included message --
On 02/06/2012 03:08 PM, Ana wrote:
 I am looking for R packages that can make a Logistic Regression model
 with parameter estimation by Maximum Likelihood Estimation.


 Many thanks for helping out.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
Hi Ana,

I worked out some maximum likelihood estimates for logistic regression
in
R on my website: http://www.netstorm.be/home/lrm

Best regards

Thomas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Logistic Regression

2012-02-06 Thread Ana
I am looking for R packages that can make a Logistic Regression model
with parameter estimation by Maximum Likelihood Estimation.


Many thanks for helping out.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Regression

2012-02-06 Thread Sarah Goslee
Hi,

On Mon, Feb 6, 2012 at 10:08 AM, Ana rrast...@gmail.com wrote:
 I am looking for R packages that can make a Logistic Regression model
 with parameter estimation by Maximum Likelihood Estimation.

How are you looking? Did you perhaps try Google

https://www.google.com/#q=r-help+Logistic+Regression+Maximum+Likelihood+Estimation

or rseek.org

http://www.rseek.org/?cx=010923144343702598753%3Aboaz1reyxd4newwindow=1q=Logistic+Regression+Maximum+Likelihood+Estimationsa=Search+functions%2C+lists%2C+and+morecof=FORID%3A11siteurl=rseek.org%2F

or one of the other R-centric search functions out there, like the
ones built right into R?

How did what you found not meet your needs?

Sarah


 Many thanks for helping out.



-- 
Sarah Goslee
http://www.functionaldiversity.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Regression

2012-02-06 Thread tw
On 02/06/2012 03:08 PM, Ana wrote:
 I am looking for R packages that can make a Logistic Regression model
 with parameter estimation by Maximum Likelihood Estimation.


 Many thanks for helping out.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
Hi Ana,

I worked out some maximum likelihood estimates for logistic regression in
R on my website: http://www.netstorm.be/home/lrm

Best regards

Thomas

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Regression with genetic component

2011-12-04 Thread Ben Bolker
Danielle Duncan dlduncan2 at alaska.edu writes:

 Greetings, I have a question that I'd like to get input on. I have a
 classic toxicology study where I artificially fertilized and exposed
 embryos to a chemical and counted defects. In addition, I kept track of
 male-female pairs that I used to artificially fertilize and generate
 embryos with. I need to use logistic regression to model the response, but
 also check that the genetics of the pairings did or did not have an effect
 on the response. My data looks a bit like this:
 
 response matrix chemical concentration  Genetic Matrix
 Present AbsentMale Female
 2 152   0.13 a 1
 6 121  1 a 2
 21 92  2 b 3
 24 89  5 b 4
 0141 10 c 5
 5 95 15 c  6
 
 R code:
 
 DA-cbind(Present, Absent)
 glm-(DA ~ chemical concentration)
 
 If I do glm-(DA ~ chemical concentration + Male + Female, I get every
 possible combination, but I only want specific pairs. So, I am thinking
 about doing:
 
 MF-cbind(Male, Female)
 glm-(DA ~ chemical concentration + MF)


You're on the right track.  paste() is probably what
you want, although you can also use interaction() to
get the interactions and then droplevels() to get
rid of the unobserved crosses.

d - read.table(textConnection(
Present Absent   conc   Male Female
2 152   0.13 a 1
6 121  1 a 2
21 92  2 b 3
24 89  5 b 4
0141 10 c 5
5 95 15 c  6),
header=TRUE)

Either of these should give you what you want:
  
d - droplevels(transform(d,cross=interaction(Male,Female)))
levels(d$cross)

d - transform(d,cross=paste(Male,Female,sep=.))
levels(d$cross)

You should be a little careful -- if each cross is exposed
only to a single concentration, and if you treat cross as
a fixed effect, you will overparameterize your model.  If
you treat it as a random effect, e.g. using glmer in the
lme4 package:

glmer(cbind(Present,Absent)~conc+(1|cross),data=d)

you will effectively be fitting a model for overdispersion
(see the example in ?cbpp).

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Logistic Regression with genetic component

2011-12-03 Thread Danielle Duncan
Greetings, I have a question that I'd like to get input on. I have a
classic toxicology study where I artificially fertilized and exposed
embryos to a chemical and counted defects. In addition, I kept track of
male-female pairs that I used to artificially fertilize and generate
embryos with. I need to use logistic regression to model the response, but
also check that the genetics of the pairings did or did not have an effect
on the response. My data looks a bit like this:

response matrix chemical concentration  Genetic Matrix
Present AbsentMale Female
2 152   0.13 a 1
6 121  1 a 2
21 92  2 b 3
24 89  5 b 4
0141 10 c 5
5 95 15 c  6

R code:

DA-cbind(Present, Absent)
glm-(DA ~ chemical concentration)

If I do glm-(DA ~ chemical concentration + Male + Female, I get every
possible combination, but I only want specific pairs. So, I am thinking
about doing:

MF-cbind(Male, Female)
glm-(DA ~ chemical concentration + MF)

Is this correct? Any help on how to model this would be greatly
appreciated! Thanks in advance!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression - glm.fit: fitted probabilities numerically 0 or 1 occurred

2011-12-02 Thread Patrick Breheny

On 12/01/2011 08:00 PM, Ben quant wrote:

The data I am using is the last file called l_yx.RData at this link (the
second file contains the plots from earlier):
http://scientia.crescat.net/static/ben/


The logistic regression model you are fitting assumes a linear 
relationship between x and the log odds of y; that does not seem to be 
the case for your data.  To illustrate:


x - l_yx[,x]
y - l_yx[,y]
ind1 - x = .002
ind2 - (x  .002  x = .0065)
ind3 - (x  .0065  x = .13)
ind4 - (x  .0065  x = .13)

 summary(glm(y[ind1]~x[ind1],family=binomial))
...
Coefficients:
 Estimate Std. Error z value Pr(|z|)
(Intercept)  -2.791740.02633 -106.03   2e-16 ***
x[ind1] 354.98852   22.78190   15.58   2e-16 ***

 summary(glm(y[ind2]~x[ind2],family=binomial))
Coefficients:
 Estimate Std. Error z value Pr(|z|)
(Intercept)  -2.158050.02966 -72.766   2e-16 ***
x[ind2] -59.929346.51650  -9.197   2e-16 ***

 summary(glm(y[ind3]~x[ind3],family=binomial))
...
Coefficients:
 Estimate Std. Error z value Pr(|z|)
(Intercept) -2.367206   0.007781 -304.22   2e-16 ***
x[ind3] 18.104314   0.346562   52.24   2e-16 ***

 summary(glm(y[ind4]~x[ind4],family=binomial))
...
Coefficients:
Estimate Std. Error z value Pr(|z|)
(Intercept) -1.315110.08549 -15.383   2e-16 ***
x[ind4]  0.062610.08784   0.7130.476

To summarize, the relationship between x and the log odds of y appears 
to vary dramatically in both magnitude and direction depending on which 
interval of x's range we're looking at.  Trying to summarize this 
complicated pattern with a single line is leading to the fitted 
probabilities near 0 and 1 you are observing (note that only 0.1% of the 
data is in region 4 above, although region 4 accounts for 99.1% of the 
range of x).


--
Patrick Breheny
Assistant Professor
Department of Biostatistics
Department of Statistics
University of Kentucky

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] logistic regression - glm.fit: fitted probabilities numerically 0 or 1 occurred

2011-12-01 Thread Ben quant
Sorry if this is a duplicate: This is a re-post because the pdf's mentioned
below did not go through.

Hello,

I'm new'ish to R, and very new to glm. I've read a lot about my issue:
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred

...including:

http://tolstoy.newcastle.edu.au/R/help/05/07/7759.html
http://r.789695.n4.nabble.com/glm-fit-quot-fitted-probabilities-numerically-0-or-1-occurred-quot-td849242.html
(note that I never found: MASS4 pp.197-8  However, Ted's post was quite
helpful.)

This is a common question, sorry. Because it is a common issue I am posting
everything I know about the issue and how I think I am not falling into the
same trap at the others (but I must be due to some reason I am not yet
aware of).

From the two links above I gather that my warning glm.fit: fitted
probabilities numerically 0 or 1 occurred arises from a perfect fit
situation (i.e. the issue where all the high value x's (predictor
variables) are Y=1 (response=1) or the other way around). I don't feel my
data has this issue. Please point out how it does!

The list post instructions state that I can attach pdf's, so I attached
plots of my data right before I do the call to glm.

The attachments are plots of my data stored in variable l_yx (as can be
seen in the axis names):
My response (vertical axis) by row index (horizontal axis):
 plot(l_yx[,1],type='h')
My predictor variable (vertical axis) by row index index (horizontal axis):
 plot(l_yx[,2],type='h')

 So here is more info on my data frame/data (in case you can't see my pdf
attachments):
 unique(l_yx[,1])
[1] 0 1
 mean(l_yx[,2])
[1] 0.01123699
 max(l_yx[,2])
[1] 14.66518
 min(l_yx[,2])
[1] 0
 attributes(l_yx)
$dim
[1] 690303  2

$dimnames
$dimnames[[1]]
NULL

$dimnames[[2]]
[1] y x


With the above data I do:
 l_logit = glm(y~x, data=as.data.frame(l_yx),
family=binomial(link=logit))
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred

Why am I getting this warning when I have data points of varying values for
y=1 and y=0?  In other words, I don't think I have the linear separation
issue discussed in one of the links I provided.

PS - Then I do this and I get a odds ratio a crazy size:
 l_sm = summary(l_logit) # coef pval is $coefficients[8], log odds
$coefficients[2]
 l_exp_coef = exp(l_logit$coefficients)[2] # exponentiate the
coeffcients
 l_exp_coef
   x
3161.781

So for one unit increase in the predictor variable I get 3160.781%
(3161.781 - 1 = 3160.781) increase in odds? That can't be correct either.
How do I correct for this issue? (I tried multiplying the predictor
variables by a constant and the odds ratio goes down, but the warning above
still persists and shouldn't the odds ratio be predictor variable size
independent?)

Thank you for your help!

Ben

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression - glm.fit: fitted probabilities numerically 0 or 1 occurred

2011-12-01 Thread peter dalgaard

On Dec 1, 2011, at 18:54 , Ben quant wrote:

 Sorry if this is a duplicate: This is a re-post because the pdf's mentioned
 below did not go through.

Still not there. Sometimes it's because your mailer doesn't label them with the 
appropriate mime-type (e.g. as application/octet-stream, which is arbitrary 
binary). Anyways, see below

[snip]
 
 With the above data I do:
l_logit = glm(y~x, data=as.data.frame(l_yx),
 family=binomial(link=logit))
 Warning message:
 glm.fit: fitted probabilities numerically 0 or 1 occurred
 
 Why am I getting this warning when I have data points of varying values for
 y=1 and y=0?  In other words, I don't think I have the linear separation
 issue discussed in one of the links I provided.

I bet that you do... You can get the warning without that effect (one of my own 
examples is  the probability of menarche in a data set that includes infants 
and old age pensioners), but not with a huge odds ratio as well. Take a look at 

d - as.data.frame(l_yx) 
with(d, y[order(x)])

if it comes out as all zeros followed by all ones or vice versa, then you have 
the problem.


 
 PS - Then I do this and I get a odds ratio a crazy size:
l_sm = summary(l_logit) # coef pval is $coefficients[8], log odds
 $coefficients[2]
l_exp_coef = exp(l_logit$coefficients)[2] # exponentiate the
 coeffcients
l_exp_coef
   x
 3161.781
 
 So for one unit increase in the predictor variable I get 3160.781%
 (3161.781 - 1 = 3160.781) increase in odds? That can't be correct either.
 How do I correct for this issue? (I tried multiplying the predictor
 variables by a constant and the odds ratio goes down, but the warning above
 still persists and shouldn't the odds ratio be predictor variable size
 independent?)


-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression - glm.fit: fitted probabilities numerically 0 or 1 occurred

2011-12-01 Thread Ben quant
Thank you for the feedback, but my data looks fine to me. Please tell me if
I'm not understanding.

I followed your instructions and here is a sample of the first 500 values :
(info on 'd' is below that)

  d - as.data.frame(l_yx)
 x = with(d, y[order(x)])
 x[1:500] # I have 1's and 0's dispersed throughout
  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0
[101] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[201] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 1 0 0
[301] 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[401] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0

# I get the warning still
 l_df = as.data.frame(l_yx)
 l_logit = glm(y~x, data=l_df, family=binomial(link=logit))

Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred

# some info on 'd' above:

 d[1:500,1]
  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[101] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[201] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[301] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[401] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 d[1:500,2]
  [1] 3.023160e-03 7.932130e-02 0.00e+00 1.779657e-02 1.608374e-01
0.00e+00 5.577064e-02 7.753926e-03 4.018553e-03 4.760918e-02
2.080511e-02 1.642404e-01 3.703720e-03 8.901981e-02 1.260415e-03
 [16] 2.202523e-02 3.750940e-02 4.441975e-04 9.351171e-03 8.374567e-03
0.00e+00 8.440448e-02 5.081017e-01 2.538640e-05 1.806017e-02
2.954641e-04 1.434859e-03 6.964976e-04 0.00e+00 1.202162e-02
 [31] 3.420300e-03 4.276100e-02 1.457324e-02 4.140121e-03 1.349180e-04
1.525292e-03 4.817502e-02 9.515717e-03 2.232953e-02 1.227908e-01
3.293581e-02 1.454352e-02 1.176011e-03 6.274138e-02 2.879205e-02
 [46] 6.900926e-03 1.414648e-04 3.446349e-02 8.807174e-03 3.549332e-02
2.828509e-03 2.935003e-02 7.162872e-03 5.650050e-03 1.221191e-02
0.00e+00 2.126334e-02 2.052020e-02 7.542409e-02 2.586269e-04
 [61] 5.258664e-02 1.213126e-02 1.493275e-02 8.152657e-03 1.774966e-02
2.433198e-02 1.371257e-02 6.732339e-02 9.747677e-03 8.180594e-03
1.882473e-01 1.682596e-03 1.410170e-02 8.429676e-03 3.520589e-01
 [76] 1.865160e-02 2.389941e-02 3.301558e-02 4.896605e-03 6.826244e-03
9.151337e-02 1.722270e-02 1.764039e-01 3.554063e-02 7.338532e-03
9.577793e-03 3.632366e-03 1.422400e-02 3.020497e-02 1.797096e-02
 [91] 2.577899e-03 1.338365e-01 8.202223e-03 1.088335e-02 2.137259e-02
6.829797e-03 3.104860e-04 8.927160e-03 5.626881e-02 1.197682e-02
6.140553e-02 2.729364e-02 1.377913e-02 3.018277e-03 1.188304e-02
[106] 2.029268e-03 2.108815e-02 1.765503e-03 2.449253e-02 3.677046e-03
1.013463e-02 4.286642e-02 1.238931e-02 3.072090e-03 1.509613e-02
3.609885e-02 5.537268e-03 3.151614e-02 3.703148e-03 1.409401e-01
[121] 1.473365e-02 9.160461e-03 1.035099e-01 3.005865e-02 1.332199e-02
6.936535e-03 2.433539e-02 4.935373e-03 4.822740e-03 1.597643e-03
3.805619e-03 1.092683e-02 1.760635e-01 5.565614e-03 7.739213e-04
[136] 4.529198e-03 5.110359e-03 7.391258e-02 5.018207e-03 2.071417e-02
1.825787e-02 9.141405e-03 1.078386e-01 2.171470e-04 7.164583e-03
2.522077e-02 1.994428e-03 6.044653e-03 1.778078e-04 5.797706e-03
[151] 1.526778e-02 1.595897e-02 1.995627e-01 1.946735e-03 6.711582e-02
2.722350e-02 3.127499e-02 1.074904e-01 2.477414e-03 5.015375e-02
3.417810e-02 2.046643e-02 1.644832e-02 5.220166e-03 1.217752e-02
[166] 1.187352e-02 1.634016e-02 2.349568e-03 3.451811e-02 2.593540e-03
6.868595e-03 1.311236e-02 6.457757e-03 2.942391e-04 1.628352e-02
8.288831e-03 3.170856e-04 

Re: [R] logistic regression - glm.fit: fitted probabilities numerically 0 or 1 occurred

2011-12-01 Thread peter dalgaard

On Dec 1, 2011, at 21:32 , Ben quant wrote:

 Thank you for the feedback, but my data looks fine to me. Please tell me if 
 I'm not understanding.

Hum, then maybe it really is a case of a transition region being short relative 
to the range of your data. Notice that the warning is just that: a warning. I 
do notice that the distribution of your x values is rather extreme -- you 
stated a range of 0--14 and a mean of 0.01. And after all, an odds ratio of 
3000 per unit is only a tad over a doubling per 0.1 units.

Have a look at  range(x[y==0]) and range(x[y==1]). 


 
 I followed your instructions and here is a sample of the first 500 values : 
 (info on 'd' is below that)
 
   d - as.data.frame(l_yx) 
  x = with(d, y[order(x)])
  x[1:500] # I have 1's and 0's dispersed throughout 
   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0
 [101] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [201] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 
 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 
 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 1 0 0
 [301] 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [401] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
 
 # I get the warning still
  l_df = as.data.frame(l_yx)
  l_logit = glm(y~x, data=l_df, family=binomial(link=logit))
 
 Warning message:
 glm.fit: fitted probabilities numerically 0 or 1 occurred 
 
 # some info on 'd' above:
 
  d[1:500,1]
   [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [101] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [201] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [301] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [401] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  d[1:500,2]
   [1] 3.023160e-03 7.932130e-02 0.00e+00 1.779657e-02 1.608374e-01 
 0.00e+00 5.577064e-02 7.753926e-03 4.018553e-03 4.760918e-02 2.080511e-02 
 1.642404e-01 3.703720e-03  8.901981e-02 1.260415e-03
  [16] 2.202523e-02 3.750940e-02 4.441975e-04 9.351171e-03 8.374567e-03 
 0.00e+00 8.440448e-02 5.081017e-01 2.538640e-05 1.806017e-02 2.954641e-04 
 1.434859e-03 6.964976e-04 0.00e+00 1.202162e-02
  [31] 3.420300e-03 4.276100e-02 1.457324e-02 4.140121e-03 1.349180e-04 
 1.525292e-03 4.817502e-02 9.515717e-03 2.232953e-02 1.227908e-01 3.293581e-02 
 1.454352e-02 1.176011e-03 6.274138e-02 2.879205e-02
  [46] 6.900926e-03 1.414648e-04 3.446349e-02 8.807174e-03 3.549332e-02 
 2.828509e-03 2.935003e-02 7.162872e-03 5.650050e-03 1.221191e-02 0.00e+00 
 2.126334e-02 2.052020e-02 7.542409e-02 2.586269e-04
  [61] 5.258664e-02 1.213126e-02 1.493275e-02 8.152657e-03 1.774966e-02 
 2.433198e-02 1.371257e-02 6.732339e-02 9.747677e-03 8.180594e-03 1.882473e-01 
 1.682596e-03 1.410170e-02 8.429676e-03 3.520589e-01
  [76] 1.865160e-02 2.389941e-02 3.301558e-02 4.896605e-03 6.826244e-03 
 9.151337e-02 1.722270e-02 1.764039e-01 3.554063e-02 7.338532e-03 9.577793e-03 
 3.632366e-03 1.422400e-02 3.020497e-02 1.797096e-02
  [91] 2.577899e-03 1.338365e-01 8.202223e-03 1.088335e-02 2.137259e-02 
 6.829797e-03 3.104860e-04 8.927160e-03 5.626881e-02 1.197682e-02 6.140553e-02 
 2.729364e-02 1.377913e-02 3.018277e-03 1.188304e-02
 [106] 2.029268e-03 2.108815e-02 1.765503e-03 2.449253e-02 3.677046e-03 
 1.013463e-02 4.286642e-02 1.238931e-02 3.072090e-03 1.509613e-02 3.609885e-02 
 5.537268e-03 3.151614e-02 3.703148e-03 1.409401e-01
 [121] 1.473365e-02 9.160461e-03 1.035099e-01 3.005865e-02 1.332199e-02 
 6.936535e-03 2.433539e-02 4.935373e-03 4.822740e-03 1.597643e-03 3.805619e-03 
 1.092683e-02 1.760635e-01 

Re: [R] logistic regression - glm.fit: fitted probabilities numerically 0 or 1 occurred

2011-12-01 Thread Ben quant
Here you go:

 attach(as.data.frame(l_yx))
 range(x[y==1])
[1] -22500.746.
  range(x[y==0])
[1] -10076.5303653.0228

How do I know what is acceptable?

Also, here are the screen shots of my data that I tried to send earlier
(two screen shots, two pages):
http://scientia.crescat.net/static/ben/warn%20num%200%20or%201.pdf

Thank you,

Ben

On Thu, Dec 1, 2011 at 2:24 PM, peter dalgaard pda...@gmail.com wrote:


 On Dec 1, 2011, at 21:32 , Ben quant wrote:

  Thank you for the feedback, but my data looks fine to me. Please tell me
 if I'm not understanding.

 Hum, then maybe it really is a case of a transition region being short
 relative to the range of your data. Notice that the warning is just that: a
 warning. I do notice that the distribution of your x values is rather
 extreme -- you stated a range of 0--14 and a mean of 0.01. And after all,
 an odds ratio of 3000 per unit is only a tad over a doubling per 0.1 units.

 Have a look at  range(x[y==0]) and range(x[y==1]).


 
  I followed your instructions and here is a sample of the first 500
 values : (info on 'd' is below that)
 
d - as.data.frame(l_yx)
   x = with(d, y[order(x)])
   x[1:500] # I have 1's and 0's dispersed throughout
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0
  [101] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [201] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 1 0 0
  [301] 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 1 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [401] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0
 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
 
  # I get the warning still
   l_df = as.data.frame(l_yx)
   l_logit = glm(y~x, data=l_df, family=binomial(link=logit))
 
  Warning message:
  glm.fit: fitted probabilities numerically 0 or 1 occurred
 
  # some info on 'd' above:
 
   d[1:500,1]
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [101] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [201] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [301] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [401] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
   d[1:500,2]
[1] 3.023160e-03 7.932130e-02 0.00e+00 1.779657e-02 1.608374e-01
 0.00e+00 5.577064e-02 7.753926e-03 4.018553e-03 4.760918e-02
 2.080511e-02 1.642404e-01 3.703720e-03  8.901981e-02 1.260415e-03
   [16] 2.202523e-02 3.750940e-02 4.441975e-04 9.351171e-03 8.374567e-03
 0.00e+00 8.440448e-02 5.081017e-01 2.538640e-05 1.806017e-02
 2.954641e-04 1.434859e-03 6.964976e-04 0.00e+00 1.202162e-02
   [31] 3.420300e-03 4.276100e-02 1.457324e-02 4.140121e-03 1.349180e-04
 1.525292e-03 4.817502e-02 9.515717e-03 2.232953e-02 1.227908e-01
 3.293581e-02 1.454352e-02 1.176011e-03 6.274138e-02 2.879205e-02
   [46] 6.900926e-03 1.414648e-04 3.446349e-02 8.807174e-03 3.549332e-02
 2.828509e-03 2.935003e-02 7.162872e-03 5.650050e-03 1.221191e-02
 0.00e+00 2.126334e-02 2.052020e-02 7.542409e-02 2.586269e-04
   [61] 5.258664e-02 1.213126e-02 1.493275e-02 8.152657e-03 1.774966e-02
 2.433198e-02 1.371257e-02 6.732339e-02 9.747677e-03 8.180594e-03
 1.882473e-01 1.682596e-03 1.410170e-02 8.429676e-03 3.520589e-01
   [76] 1.865160e-02 2.389941e-02 3.301558e-02 4.896605e-03 6.826244e-03
 9.151337e-02 1.722270e-02 1.764039e-01 3.554063e-02 7.338532e-03
 9.577793e-03 3.632366e-03 1.422400e-02 3.020497e-02 1.797096e-02
   [91] 2.577899e-03 1.338365e-01 8.202223e-03 1.088335e-02 2.137259e-02
 6.829797e-03 3.104860e-04 8.927160e-03 5.626881e-02 1.197682e-02
 6.140553e-02 2.729364e-02 

Re: [R] logistic regression - glm.fit: fitted probabilities numerically 0 or 1 occurred

2011-12-01 Thread Ben quant
Oops! Please ignore my last post. I mistakenly gave you different data I
was testing with. This is the correct data:

Here you go:

 attach(as.data.frame(l_yx))
  range(x[y==0])
[1]  0.0 14.66518
 range(x[y==1])
[1]  0.0 13.49791

How do I know what is acceptable?

Also, here are the screen shots of my data that I tried to send earlier
(two screen shots, two pages):
http://scientia.crescat.net/static/ben/warn%20num%200%20or%201.pdf

Thank you,

Ben

On Thu, Dec 1, 2011 at 3:07 PM, Ben quant ccqu...@gmail.com wrote:

 Here you go:

  attach(as.data.frame(l_yx))
  range(x[y==1])
 [1] -22500.746.
   range(x[y==0])
 [1] -10076.5303653.0228

 How do I know what is acceptable?

 Also, here are the screen shots of my data that I tried to send earlier
 (two screen shots, two pages):
 http://scientia.crescat.net/static/ben/warn%20num%200%20or%201.pdf

 Thank you,

 Ben


 On Thu, Dec 1, 2011 at 2:24 PM, peter dalgaard pda...@gmail.com wrote:


 On Dec 1, 2011, at 21:32 , Ben quant wrote:

  Thank you for the feedback, but my data looks fine to me. Please tell
 me if I'm not understanding.

 Hum, then maybe it really is a case of a transition region being short
 relative to the range of your data. Notice that the warning is just that: a
 warning. I do notice that the distribution of your x values is rather
 extreme -- you stated a range of 0--14 and a mean of 0.01. And after all,
 an odds ratio of 3000 per unit is only a tad over a doubling per 0.1 units.

 Have a look at  range(x[y==0]) and range(x[y==1]).


 
  I followed your instructions and here is a sample of the first 500
 values : (info on 'd' is below that)
 
d - as.data.frame(l_yx)
   x = with(d, y[order(x)])
   x[1:500] # I have 1's and 0's dispersed throughout
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0
  [101] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [201] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 1 0 0
  [301] 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 1 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [401] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0
 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
 
  # I get the warning still
   l_df = as.data.frame(l_yx)
   l_logit = glm(y~x, data=l_df, family=binomial(link=logit))
 
  Warning message:
  glm.fit: fitted probabilities numerically 0 or 1 occurred
 
  # some info on 'd' above:
 
   d[1:500,1]
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [101] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [201] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [301] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [401] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
   d[1:500,2]
[1] 3.023160e-03 7.932130e-02 0.00e+00 1.779657e-02 1.608374e-01
 0.00e+00 5.577064e-02 7.753926e-03 4.018553e-03 4.760918e-02
 2.080511e-02 1.642404e-01 3.703720e-03  8.901981e-02 1.260415e-03
   [16] 2.202523e-02 3.750940e-02 4.441975e-04 9.351171e-03 8.374567e-03
 0.00e+00 8.440448e-02 5.081017e-01 2.538640e-05 1.806017e-02
 2.954641e-04 1.434859e-03 6.964976e-04 0.00e+00 1.202162e-02
   [31] 3.420300e-03 4.276100e-02 1.457324e-02 4.140121e-03 1.349180e-04
 1.525292e-03 4.817502e-02 9.515717e-03 2.232953e-02 1.227908e-01
 3.293581e-02 1.454352e-02 1.176011e-03 6.274138e-02 2.879205e-02
   [46] 6.900926e-03 1.414648e-04 3.446349e-02 8.807174e-03 3.549332e-02
 2.828509e-03 2.935003e-02 7.162872e-03 5.650050e-03 1.221191e-02
 0.00e+00 2.126334e-02 2.052020e-02 7.542409e-02 2.586269e-04
   [61] 5.258664e-02 1.213126e-02 

Re: [R] logistic regression - glm.fit: fitted probabilities numerically 0 or 1 occurred

2011-12-01 Thread Ben quant
I'm not proposing this as a permanent solution, just investigating the
warning. I zeroed out the three outliers and received no warning. Can
someone tell me why I am getting no warning now?

I did this 3 times to get rid of the 3 outliers:
mx_dims = arrayInd(which.max(l_yx), dim(l_yx))
l_yx[mx_dims] = 0

Now this does not produce an warning:
l_logit = glm(y~x, data=as.data.frame(l_yx), family=binomial(link=logit))

Can someone tell me why occurred?

Also, again, here are the screen shots of my data that I tried to send
earlier (two screen shots, two pages):
http://scientia.crescat.net/static/ben/warn%20num%200%20or%201.pdf

Thank you for your help,

Ben

On Thu, Dec 1, 2011 at 3:25 PM, Ben quant ccqu...@gmail.com wrote:

 Oops! Please ignore my last post. I mistakenly gave you different data I
 was testing with. This is the correct data:

 Here you go:

  attach(as.data.frame(l_yx))
   range(x[y==0])
 [1]  0.0 14.66518
  range(x[y==1])
 [1]  0.0 13.49791


 How do I know what is acceptable?

 Also, here are the screen shots of my data that I tried to send earlier
 (two screen shots, two pages):
 http://scientia.crescat.net/static/ben/warn%20num%200%20or%201.pdf

 Thank you,

 Ben

 On Thu, Dec 1, 2011 at 3:07 PM, Ben quant ccqu...@gmail.com wrote:

 Here you go:

  attach(as.data.frame(l_yx))
  range(x[y==1])
 [1] -22500.746.
   range(x[y==0])
 [1] -10076.5303653.0228

 How do I know what is acceptable?

 Also, here are the screen shots of my data that I tried to send earlier
 (two screen shots, two pages):
 http://scientia.crescat.net/static/ben/warn%20num%200%20or%201.pdf

 Thank you,

 Ben


 On Thu, Dec 1, 2011 at 2:24 PM, peter dalgaard pda...@gmail.com wrote:


 On Dec 1, 2011, at 21:32 , Ben quant wrote:

  Thank you for the feedback, but my data looks fine to me. Please tell
 me if I'm not understanding.

 Hum, then maybe it really is a case of a transition region being short
 relative to the range of your data. Notice that the warning is just that: a
 warning. I do notice that the distribution of your x values is rather
 extreme -- you stated a range of 0--14 and a mean of 0.01. And after all,
 an odds ratio of 3000 per unit is only a tad over a doubling per 0.1 units.

 Have a look at  range(x[y==0]) and range(x[y==1]).


 
  I followed your instructions and here is a sample of the first 500
 values : (info on 'd' is below that)
 
d - as.data.frame(l_yx)
   x = with(d, y[order(x)])
   x[1:500] # I have 1's and 0's dispersed throughout
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0
  [101] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [201] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 1 0 0
  [301] 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 1 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [401] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0
 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
 
  # I get the warning still
   l_df = as.data.frame(l_yx)
   l_logit = glm(y~x, data=l_df, family=binomial(link=logit))
 
  Warning message:
  glm.fit: fitted probabilities numerically 0 or 1 occurred
 
  # some info on 'd' above:
 
   d[1:500,1]
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [101] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [201] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [301] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [401] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
   d[1:500,2]
[1] 3.023160e-03 7.932130e-02 0.00e+00 1.779657e-02 1.608374e-01
 0.00e+00 5.577064e-02 7.753926e-03 

Re: [R] logistic regression - glm.fit: fitted probabilities numerically 0 or 1 occurred

2011-12-01 Thread peter dalgaard

On Dec 1, 2011, at 23:43 , Ben quant wrote:

 I'm not proposing this as a permanent solution, just investigating the 
 warning. I zeroed out the three outliers and received no warning. Can someone 
 tell me why I am getting no warning now?

It's easier to explain why you got the warning before. If the OR for a one unit 
change is 3000, the OR for a 14 unit change is on the order of 10^48 and that 
causes over/underflow in the conversion to probabilities.

I'm still baffled at how you can get that model fitted to your data, though. 
One thing is that you can have situations where there are fitted probabilities 
of one corresponding to data that are all one and/or fitted zeros where data 
are zero, but you seem to have cases where you have both zeros and ones at both 
ends of the range of x. Fitting a zero to a one or vice versa would make the 
likelihood zero, so you'd expect that the algorithm would find a better set of 
parameters rather quickly. Perhaps the extremely large number of observations 
that you have has something to do with it? 

You'll get the warning if the fitted zeros or ones occur at any point of the 
iterative procedure. Maybe it isn't actually true for the final model, but that 
wouldn't seem consistent with the OR that you cited.

Anyways, your real problem lies with the distribution of the x values. I'd want 
to try transforming it to something more sane. Taking logarithms is the obvious 
idea, but you'd need to find out what to do about the zeros -- perhaps log(x + 
1e-4) ? Or maybe just cut the outliers down to size with pmin(x,1).

 
 I did this 3 times to get rid of the 3 outliers:
 mx_dims = arrayInd(which.max(l_yx), dim(l_yx))
 l_yx[mx_dims] = 0
 
 Now this does not produce an warning:
 l_logit = glm(y~x, data=as.data.frame(l_yx), family=binomial(link=logit)) 
 
 Can someone tell me why occurred?
 
 Also, again, here are the screen shots of my data that I tried to send 
 earlier (two screen shots, two pages):
 http://scientia.crescat.net/static/ben/warn%20num%200%20or%201.pdf
 
 Thank you for your help,
 
 Ben
 
 On Thu, Dec 1, 2011 at 3:25 PM, Ben quant ccqu...@gmail.com wrote:
 Oops! Please ignore my last post. I mistakenly gave you different data I was 
 testing with. This is the correct data:
 
 Here you go:
 
  attach(as.data.frame(l_yx))
   range(x[y==0])
 [1]  0.0 14.66518
  range(x[y==1])
 [1]  0.0 13.49791
 
 
 How do I know what is acceptable? 
 
 Also, here are the screen shots of my data that I tried to send earlier (two 
 screen shots, two pages):
 http://scientia.crescat.net/static/ben/warn%20num%200%20or%201.pdf
 
 Thank you,
 
 Ben
 
 On Thu, Dec 1, 2011 at 3:07 PM, Ben quant ccqu...@gmail.com wrote:
 Here you go:
 
  attach(as.data.frame(l_yx))
  range(x[y==1])
 [1] -22500.746.
   range(x[y==0])
 [1] -10076.5303653.0228
 
 How do I know what is acceptable? 
 
 Also, here are the screen shots of my data that I tried to send earlier (two 
 screen shots, two pages):
 http://scientia.crescat.net/static/ben/warn%20num%200%20or%201.pdf
 
 Thank you,
 
 Ben
 
 
 On Thu, Dec 1, 2011 at 2:24 PM, peter dalgaard pda...@gmail.com wrote:
 
 On Dec 1, 2011, at 21:32 , Ben quant wrote:
 
  Thank you for the feedback, but my data looks fine to me. Please tell me if 
  I'm not understanding.
 
 Hum, then maybe it really is a case of a transition region being short 
 relative to the range of your data. Notice that the warning is just that: a 
 warning. I do notice that the distribution of your x values is rather extreme 
 -- you stated a range of 0--14 and a mean of 0.01. And after all, an odds 
 ratio of 3000 per unit is only a tad over a doubling per 0.1 units.
 
 Have a look at  range(x[y==0]) and range(x[y==1]).
 
 
 
  I followed your instructions and here is a sample of the first 500 values : 
  (info on 'd' is below that)
 
d - as.data.frame(l_yx)
   x = with(d, y[order(x)])
   x[1:500] # I have 1's and 0's dispersed throughout
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
  0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0
  [101] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
  0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [201] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 
  0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 
  1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 1 0 0
  [301] 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
  0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [401] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 
  1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
  0 0 0 0 0 0 0 0 0 0 

Re: [R] logistic regression - glm.fit: fitted probabilities numerically 0 or 1 occurred

2011-12-01 Thread Ben quant
Thank you so much for your help.

The data I am using is the last file called l_yx.RData at this link (the
second file contains the plots from earlier):
http://scientia.crescat.net/static/ben/

Seems like the warning went away with pmin(x,1) but now the OR is over
15k.  If I multiple my x's by 1000 I get a much more realistic OR. So I
guess this brings me to a much different question: aren't OR's comparable
between factors/data? In this case they don't seem to be. However, with
different data the OR's only change a very small amount (+8.0e-4) when I
multiply the x's by 1000. I don't understand.

Anyways, here is a run with the raw data and a run with your suggestion
(pmin(x,1)) that removed the error:

 l_logit = glm(y~x, data=as.data.frame(l_yx),
family=binomial(link=logit))

 l_logit

Call:  glm(formula = y ~ x, family = binomial(link = logit), data =
as.data.frame(l_yx))

Coefficients:
(Intercept)x
 -2.2938.059

Degrees of Freedom: 690302 Total (i.e. Null);  690301 Residual
Null Deviance:  448800
Residual Deviance: 447100   AIC: 447100

 l_exp_coef = exp(l_logit$coefficients)[2]

 l_exp_coef
   x
3161.781

 dim(l_yx)
[1] 690303  2

 l_yx = cbind(l_yx[,1],pmin(l_yx[,2],1))

 dim(l_yx)
[1] 690303  2

 colnames(l_yx) = c('y','x')

 mean(l_yx[,2])
[1] 0.01117248

 range(l_yx[,2])
[1] 0 1

 head(l_yx[,2])
[1] 0.00302316 0.07932130 0. 0.01779657 0.16083735 0.

 unique(l_yx[,1])
[1] 0 1

 l_logit = glm(y~x, data=as.data.frame(l_yx),
family=binomial(link=logit))

 l_logit

Call:  glm(formula = y ~ x, family = binomial(link = logit), data =
as.data.frame(l_yx))

Coefficients:
(Intercept)x
 -2.3129.662

Degrees of Freedom: 690302 Total (i.e. Null);  690301 Residual
Null Deviance:  448800
Residual Deviance: 446800   AIC: 446800

 l_exp_coef = exp(l_logit$coefficients)[2]

 l_exp_coef
   x
15709.52


Thanks,

Ben



On Thu, Dec 1, 2011 at 4:32 PM, peter dalgaard pda...@gmail.com wrote:


 On Dec 1, 2011, at 23:43 , Ben quant wrote:

  I'm not proposing this as a permanent solution, just investigating the
 warning. I zeroed out the three outliers and received no warning. Can
 someone tell me why I am getting no warning now?

 It's easier to explain why you got the warning before. If the OR for a one
 unit change is 3000, the OR for a 14 unit change is on the order of 10^48
 and that causes over/underflow in the conversion to probabilities.

 I'm still baffled at how you can get that model fitted to your data,
 though. One thing is that you can have situations where there are fitted
 probabilities of one corresponding to data that are all one and/or fitted
 zeros where data are zero, but you seem to have cases where you have both
 zeros and ones at both ends of the range of x. Fitting a zero to a one or
 vice versa would make the likelihood zero, so you'd expect that the
 algorithm would find a better set of parameters rather quickly. Perhaps the
 extremely large number of observations that you have has something to do
 with it?

 You'll get the warning if the fitted zeros or ones occur at any point of
 the iterative procedure. Maybe it isn't actually true for the final model,
 but that wouldn't seem consistent with the OR that you cited.

 Anyways, your real problem lies with the distribution of the x values. I'd
 want to try transforming it to something more sane. Taking logarithms is
 the obvious idea, but you'd need to find out what to do about the zeros --
 perhaps log(x + 1e-4) ? Or maybe just cut the outliers down to size with
 pmin(x,1).

 
  I did this 3 times to get rid of the 3 outliers:
  mx_dims = arrayInd(which.max(l_yx), dim(l_yx))
  l_yx[mx_dims] = 0
 
  Now this does not produce an warning:
  l_logit = glm(y~x, data=as.data.frame(l_yx),
 family=binomial(link=logit))
 
  Can someone tell me why occurred?
 
  Also, again, here are the screen shots of my data that I tried to send
 earlier (two screen shots, two pages):
  http://scientia.crescat.net/static/ben/warn%20num%200%20or%201.pdf
 
  Thank you for your help,
 
  Ben
 
  On Thu, Dec 1, 2011 at 3:25 PM, Ben quant ccqu...@gmail.com wrote:
  Oops! Please ignore my last post. I mistakenly gave you different data I
 was testing with. This is the correct data:
 
  Here you go:
 
   attach(as.data.frame(l_yx))
range(x[y==0])
  [1]  0.0 14.66518
   range(x[y==1])
  [1]  0.0 13.49791
 
 
  How do I know what is acceptable?
 
  Also, here are the screen shots of my data that I tried to send earlier
 (two screen shots, two pages):
  http://scientia.crescat.net/static/ben/warn%20num%200%20or%201.pdf
 
  Thank you,
 
  Ben
 
  On Thu, Dec 1, 2011 at 3:07 PM, Ben quant ccqu...@gmail.com wrote:
  Here you go:
 
   attach(as.data.frame(l_yx))
   range(x[y==1])
  [1] -22500.746.
range(x[y==0])
  [1] -10076.5303653.0228
 
  How do I know what is acceptable?
 
  Also, here are the screen shots of my data that I tried to send earlier
 (two 

[R] logistic regression by glm

2011-11-20 Thread tujchl
HI 

I use glm in R to do logistic regression. and treat both response and
predictor as factor
In my first try:

***
Call:
glm(formula = as.factor(diagnostic) ~ as.factor(7161521) + 
as.factor(2281517), family = binomial())

Deviance Residuals: 
Min 1Q Median 3Q Max 
-1.5370 -1.0431 -0.9416 1.3065 1.4331 

Coefficients:
Estimate Std. Error z value Pr(|z|) 
(Intercept) -0.58363 0.27948 -2.088 0.0368 *
as.factor(7161521)2 1.39811 0.66618 2.099 0.0358 *
as.factor(7161521)3 0.28192 0.83255 0.339 0.7349 
as.factor(2281517)2 -1.11284 0.63692 -1.747 0.0806 .
as.factor(2281517)3 -0.02286 0.80708 -0.028 0.9774 
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 678.55 on 498 degrees of freedom
Residual deviance: 671.20 on 494 degrees of freedom
AIC: 681.2

Number of Fisher Scoring iterations: 4
***

And I remodel it and *want no intercept*:
***
Call:
glm(formula = as.factor(diagnostic) ~ as.factor(2281517) + 
as.factor(7161521) - 1, family = binomial())

Deviance Residuals: 
Min 1Q Median 3Q Max 
-1.5370 -1.0431 -0.9416 1.3065 1.4331 

Coefficients:
Estimate Std. Error z value Pr(|z|) 
as.factor(2281517)1 -0.5836 0.2795 -2.088 0.0368 *
as.factor(2281517)2 -1.6965 0.6751 -2.513 0.0120 *
as.factor(2281517)3 -0.6065 0.8325 -0.728 0.4663 
as.factor(7161521)2 1.3981 0.6662 2.099 0.0358 *
as.factor(7161521)3 0.2819 0.8325 0.339 0.7349 
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 691.76 on 499 degrees of freedom
Residual deviance: 671.20 on 494 degrees of freedom
AIC: 681.2

Number of Fisher Scoring iterations: 4
***

*As show above in my second model it return no intercept but look this:
Model1:
(Intercept) -0.58363 0.27948 -2.088 0.0368 *
Model2:
as.factor(2281517)1 -0.5836 0.2795 -2.088 0.0368 **

They are exactly the same. Could you please tell me what happen?

Thank you in advance 


--
View this message in context: 
http://r.789695.n4.nabble.com/logistic-regression-by-glm-tp4088471p4088471.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression by glm

2011-11-20 Thread Uwe Ligges



On 20.11.2011 12:46, tujchl wrote:

HI

I use glm in R to do logistic regression. and treat both response and
predictor as factor
In my first try:

***
Call:
glm(formula = as.factor(diagnostic) ~ as.factor(7161521) +
as.factor(2281517), family = binomial())

Deviance Residuals:
Min 1Q Median 3Q Max
-1.5370 -1.0431 -0.9416 1.3065 1.4331

Coefficients:
Estimate Std. Error z value Pr(|z|)
(Intercept) -0.58363 0.27948 -2.088 0.0368 *
as.factor(7161521)2 1.39811 0.66618 2.099 0.0358 *
as.factor(7161521)3 0.28192 0.83255 0.339 0.7349
as.factor(2281517)2 -1.11284 0.63692 -1.747 0.0806 .
as.factor(2281517)3 -0.02286 0.80708 -0.028 0.9774
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 678.55 on 498 degrees of freedom
Residual deviance: 671.20 on 494 degrees of freedom
AIC: 681.2

Number of Fisher Scoring iterations: 4
***

And I remodel it and *want no intercept*:
***
Call:
glm(formula = as.factor(diagnostic) ~ as.factor(2281517) +
as.factor(7161521) - 1, family = binomial())

Deviance Residuals:
Min 1Q Median 3Q Max
-1.5370 -1.0431 -0.9416 1.3065 1.4331

Coefficients:
Estimate Std. Error z value Pr(|z|)
as.factor(2281517)1 -0.5836 0.2795 -2.088 0.0368 *
as.factor(2281517)2 -1.6965 0.6751 -2.513 0.0120 *
as.factor(2281517)3 -0.6065 0.8325 -0.728 0.4663
as.factor(7161521)2 1.3981 0.6662 2.099 0.0358 *
as.factor(7161521)3 0.2819 0.8325 0.339 0.7349
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 691.76 on 499 degrees of freedom
Residual deviance: 671.20 on 494 degrees of freedom
AIC: 681.2

Number of Fisher Scoring iterations: 4
***

*As show above in my second model it return no intercept but look this:
Model1:
(Intercept) -0.58363 0.27948 -2.088 0.0368 *
Model2:
as.factor(2281517)1 -0.5836 0.2795 -2.088 0.0368 **

They are exactly the same. Could you please tell me what happen?


Actually it does not make sense to estimate the model without an 
intercept unless you assume that it is exactly zero for the first levels 
of your factors. Think about the contrasts you are interested in. Looks 
like not the default?


Uwe Ligges




Thank you in advance


--
View this message in context: 
http://r.789695.n4.nabble.com/logistic-regression-by-glm-tp4088471p4088471.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression by glm

2011-11-20 Thread Uwe Ligges



On 20.11.2011 16:58, 屠鞠传礼 wrote:

Thank you Ligges :)
one more question:
my response value diagnostic have 4 levels (0, 1, 2 and 3), so I use it like 
this:
as.factor(diagnostic) ~ as.factor(7161521) +as.factor(2281517)
Is it all right?



Uhh. 4 levels? Than I doubt logistic regression is the right tool for 
you. Please revisit the theory first: It is intended for 2 levels...



Uwe Ligges










在 2011-11-20 23:45:23,Uwe Liggeslig...@statistik.tu-dortmund.de  写道:



On 20.11.2011 12:46, tujchl wrote:

HI

I use glm in R to do logistic regression. and treat both response and
predictor as factor
In my first try:

***
Call:
glm(formula = as.factor(diagnostic) ~ as.factor(7161521) +
as.factor(2281517), family = binomial())

Deviance Residuals:
Min 1Q Median 3Q Max
-1.5370 -1.0431 -0.9416 1.3065 1.4331

Coefficients:
Estimate Std. Error z value Pr(|z|)
(Intercept) -0.58363 0.27948 -2.088 0.0368 *
as.factor(7161521)2 1.39811 0.66618 2.099 0.0358 *
as.factor(7161521)3 0.28192 0.83255 0.339 0.7349
as.factor(2281517)2 -1.11284 0.63692 -1.747 0.0806 .
as.factor(2281517)3 -0.02286 0.80708 -0.028 0.9774
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 678.55 on 498 degrees of freedom
Residual deviance: 671.20 on 494 degrees of freedom
AIC: 681.2

Number of Fisher Scoring iterations: 4
***

And I remodel it and *want no intercept*:
***
Call:
glm(formula = as.factor(diagnostic) ~ as.factor(2281517) +
as.factor(7161521) - 1, family = binomial())

Deviance Residuals:
Min 1Q Median 3Q Max
-1.5370 -1.0431 -0.9416 1.3065 1.4331

Coefficients:
Estimate Std. Error z value Pr(|z|)
as.factor(2281517)1 -0.5836 0.2795 -2.088 0.0368 *
as.factor(2281517)2 -1.6965 0.6751 -2.513 0.0120 *
as.factor(2281517)3 -0.6065 0.8325 -0.728 0.4663
as.factor(7161521)2 1.3981 0.6662 2.099 0.0358 *
as.factor(7161521)3 0.2819 0.8325 0.339 0.7349
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 691.76 on 499 degrees of freedom
Residual deviance: 671.20 on 494 degrees of freedom
AIC: 681.2

Number of Fisher Scoring iterations: 4
***

*As show above in my second model it return no intercept but look this:
Model1:
(Intercept) -0.58363 0.27948 -2.088 0.0368 *
Model2:
as.factor(2281517)1 -0.5836 0.2795 -2.088 0.0368 **

They are exactly the same. Could you please tell me what happen?


Actually it does not make sense to estimate the model without an
intercept unless you assume that it is exactly zero for the first levels
of your factors. Think about the contrasts you are interested in. Looks
like not the default?

Uwe Ligges




Thank you in advance


--
View this message in context: 
http://r.789695.n4.nabble.com/logistic-regression-by-glm-tp4088471p4088471.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression by glm

2011-11-20 Thread Uwe Ligges



On 20.11.2011 17:27, 屠鞠传礼 wrote:

I worried it too, Do you have idear that what tools I can use?



Depends on your aims - what you want to do with the fitted model.
A multinomial model, some kind of discriminant analysis (lda, qda), tree 
based methods, svm and so son come to mind. You probably want to discuss 
this on some statistics mailing list/forum or among local experts rather 
than on the R list. Since this is actually not that R releated.


Uwe Ligges








在 2011-11-21 00:13:26,Uwe Liggeslig...@statistik.tu-dortmund.de  写道:



On 20.11.2011 16:58, 屠鞠传礼 wrote:

Thank you Ligges :)
one more question:
my response value diagnostic have 4 levels (0, 1, 2 and 3), so I use it like 
this:
as.factor(diagnostic) ~ as.factor(7161521) +as.factor(2281517)
Is it all right?



Uhh. 4 levels? Than I doubt logistic regression is the right tool for
you. Please revisit the theory first: It is intended for 2 levels...


Uwe Ligges










在 2011-11-20 23:45:23,Uwe Liggeslig...@statistik.tu-dortmund.de   写道:



On 20.11.2011 12:46, tujchl wrote:

HI

I use glm in R to do logistic regression. and treat both response and
predictor as factor
In my first try:

***
Call:
glm(formula = as.factor(diagnostic) ~ as.factor(7161521) +
as.factor(2281517), family = binomial())

Deviance Residuals:
Min 1Q Median 3Q Max
-1.5370 -1.0431 -0.9416 1.3065 1.4331

Coefficients:
Estimate Std. Error z value Pr(|z|)
(Intercept) -0.58363 0.27948 -2.088 0.0368 *
as.factor(7161521)2 1.39811 0.66618 2.099 0.0358 *
as.factor(7161521)3 0.28192 0.83255 0.339 0.7349
as.factor(2281517)2 -1.11284 0.63692 -1.747 0.0806 .
as.factor(2281517)3 -0.02286 0.80708 -0.028 0.9774
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 678.55 on 498 degrees of freedom
Residual deviance: 671.20 on 494 degrees of freedom
AIC: 681.2

Number of Fisher Scoring iterations: 4
***

And I remodel it and *want no intercept*:
***
Call:
glm(formula = as.factor(diagnostic) ~ as.factor(2281517) +
as.factor(7161521) - 1, family = binomial())

Deviance Residuals:
Min 1Q Median 3Q Max
-1.5370 -1.0431 -0.9416 1.3065 1.4331

Coefficients:
Estimate Std. Error z value Pr(|z|)
as.factor(2281517)1 -0.5836 0.2795 -2.088 0.0368 *
as.factor(2281517)2 -1.6965 0.6751 -2.513 0.0120 *
as.factor(2281517)3 -0.6065 0.8325 -0.728 0.4663
as.factor(7161521)2 1.3981 0.6662 2.099 0.0358 *
as.factor(7161521)3 0.2819 0.8325 0.339 0.7349
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 691.76 on 499 degrees of freedom
Residual deviance: 671.20 on 494 degrees of freedom
AIC: 681.2

Number of Fisher Scoring iterations: 4
***

*As show above in my second model it return no intercept but look this:
Model1:
(Intercept) -0.58363 0.27948 -2.088 0.0368 *
Model2:
as.factor(2281517)1 -0.5836 0.2795 -2.088 0.0368 **

They are exactly the same. Could you please tell me what happen?


Actually it does not make sense to estimate the model without an
intercept unless you assume that it is exactly zero for the first levels
of your factors. Think about the contrasts you are interested in. Looks
like not the default?

Uwe Ligges




Thank you in advance


--
View this message in context: 
http://r.789695.n4.nabble.com/logistic-regression-by-glm-tp4088471p4088471.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression by glm

2011-11-20 Thread 屠鞠传礼
I worried it too, Do you have idear that what tools I can use?





ÔÚ 2011-11-21 00:13:26£¬Uwe Ligges lig...@statistik.tu-dortmund.de дµÀ£º


On 20.11.2011 16:58, ÍÀ¾Ï´«Àñ wrote:
 Thank you Ligges :)
 one more question:
 my response value diagnostic have 4 levels (0, 1, 2 and 3), so I use it 
 like this:
 as.factor(diagnostic) ~ as.factor(7161521) +as.factor(2281517)
 Is it all right?


Uhh. 4 levels? Than I doubt logistic regression is the right tool for 
you. Please revisit the theory first: It is intended for 2 levels...


Uwe Ligges









 ÔÚ 2011-11-20 23:45:23£¬Uwe Liggeslig...@statistik.tu-dortmund.de  дµÀ£º


 On 20.11.2011 12:46, tujchl wrote:
 HI

 I use glm in R to do logistic regression. and treat both response and
 predictor as factor
 In my first try:

 ***
 Call:
 glm(formula = as.factor(diagnostic) ~ as.factor(7161521) +
 as.factor(2281517), family = binomial())

 Deviance Residuals:
 Min 1Q Median 3Q Max
 -1.5370 -1.0431 -0.9416 1.3065 1.4331

 Coefficients:
 Estimate Std. Error z value Pr(|z|)
 (Intercept) -0.58363 0.27948 -2.088 0.0368 *
 as.factor(7161521)2 1.39811 0.66618 2.099 0.0358 *
 as.factor(7161521)3 0.28192 0.83255 0.339 0.7349
 as.factor(2281517)2 -1.11284 0.63692 -1.747 0.0806 .
 as.factor(2281517)3 -0.02286 0.80708 -0.028 0.9774
 ---
 Signif. codes: 0 ¡®***¡¯ 0.001 ¡®**¡¯ 0.01 ¡®*¡¯ 0.05 ¡®.¡¯ 0.1 ¡® ¡¯ 1

 (Dispersion parameter for binomial family taken to be 1)

 Null deviance: 678.55 on 498 degrees of freedom
 Residual deviance: 671.20 on 494 degrees of freedom
 AIC: 681.2

 Number of Fisher Scoring iterations: 4
 ***

 And I remodel it and *want no intercept*:
 ***
 Call:
 glm(formula = as.factor(diagnostic) ~ as.factor(2281517) +
 as.factor(7161521) - 1, family = binomial())

 Deviance Residuals:
 Min 1Q Median 3Q Max
 -1.5370 -1.0431 -0.9416 1.3065 1.4331

 Coefficients:
 Estimate Std. Error z value Pr(|z|)
 as.factor(2281517)1 -0.5836 0.2795 -2.088 0.0368 *
 as.factor(2281517)2 -1.6965 0.6751 -2.513 0.0120 *
 as.factor(2281517)3 -0.6065 0.8325 -0.728 0.4663
 as.factor(7161521)2 1.3981 0.6662 2.099 0.0358 *
 as.factor(7161521)3 0.2819 0.8325 0.339 0.7349
 ---
 Signif. codes: 0 ¡®***¡¯ 0.001 ¡®**¡¯ 0.01 ¡®*¡¯ 0.05 ¡®.¡¯ 0.1 ¡® ¡¯ 1

 (Dispersion parameter for binomial family taken to be 1)

 Null deviance: 691.76 on 499 degrees of freedom
 Residual deviance: 671.20 on 494 degrees of freedom
 AIC: 681.2

 Number of Fisher Scoring iterations: 4
 ***

 *As show above in my second model it return no intercept but look this:
 Model1:
 (Intercept) -0.58363 0.27948 -2.088 0.0368 *
 Model2:
 as.factor(2281517)1 -0.5836 0.2795 -2.088 0.0368 **

 They are exactly the same. Could you please tell me what happen?

 Actually it does not make sense to estimate the model without an
 intercept unless you assume that it is exactly zero for the first levels
 of your factors. Think about the contrasts you are interested in. Looks
 like not the default?

 Uwe Ligges



 Thank you in advance


 --
 View this message in context: 
 http://r.789695.n4.nabble.com/logistic-regression-by-glm-tp4088471p4088471.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression by glm

2011-11-20 Thread 屠鞠传礼
Thank you Ligges :)
one more question:
my response value diagnostic have 4 levels (0, 1, 2 and 3), so I use it like 
this:
as.factor(diagnostic) ~ as.factor(7161521) +as.factor(2281517) 
Is it all right?
 



ÔÚ 2011-11-20 23:45:23£¬Uwe Ligges lig...@statistik.tu-dortmund.de дµÀ£º


On 20.11.2011 12:46, tujchl wrote:
 HI

 I use glm in R to do logistic regression. and treat both response and
 predictor as factor
 In my first try:

 ***
 Call:
 glm(formula = as.factor(diagnostic) ~ as.factor(7161521) +
 as.factor(2281517), family = binomial())

 Deviance Residuals:
 Min 1Q Median 3Q Max
 -1.5370 -1.0431 -0.9416 1.3065 1.4331

 Coefficients:
 Estimate Std. Error z value Pr(|z|)
 (Intercept) -0.58363 0.27948 -2.088 0.0368 *
 as.factor(7161521)2 1.39811 0.66618 2.099 0.0358 *
 as.factor(7161521)3 0.28192 0.83255 0.339 0.7349
 as.factor(2281517)2 -1.11284 0.63692 -1.747 0.0806 .
 as.factor(2281517)3 -0.02286 0.80708 -0.028 0.9774
 ---
 Signif. codes: 0 ¡®***¡¯ 0.001 ¡®**¡¯ 0.01 ¡®*¡¯ 0.05 ¡®.¡¯ 0.1 ¡® ¡¯ 1

 (Dispersion parameter for binomial family taken to be 1)

 Null deviance: 678.55 on 498 degrees of freedom
 Residual deviance: 671.20 on 494 degrees of freedom
 AIC: 681.2

 Number of Fisher Scoring iterations: 4
 ***

 And I remodel it and *want no intercept*:
 ***
 Call:
 glm(formula = as.factor(diagnostic) ~ as.factor(2281517) +
 as.factor(7161521) - 1, family = binomial())

 Deviance Residuals:
 Min 1Q Median 3Q Max
 -1.5370 -1.0431 -0.9416 1.3065 1.4331

 Coefficients:
 Estimate Std. Error z value Pr(|z|)
 as.factor(2281517)1 -0.5836 0.2795 -2.088 0.0368 *
 as.factor(2281517)2 -1.6965 0.6751 -2.513 0.0120 *
 as.factor(2281517)3 -0.6065 0.8325 -0.728 0.4663
 as.factor(7161521)2 1.3981 0.6662 2.099 0.0358 *
 as.factor(7161521)3 0.2819 0.8325 0.339 0.7349
 ---
 Signif. codes: 0 ¡®***¡¯ 0.001 ¡®**¡¯ 0.01 ¡®*¡¯ 0.05 ¡®.¡¯ 0.1 ¡® ¡¯ 1

 (Dispersion parameter for binomial family taken to be 1)

 Null deviance: 691.76 on 499 degrees of freedom
 Residual deviance: 671.20 on 494 degrees of freedom
 AIC: 681.2

 Number of Fisher Scoring iterations: 4
 ***

 *As show above in my second model it return no intercept but look this:
 Model1:
 (Intercept) -0.58363 0.27948 -2.088 0.0368 *
 Model2:
 as.factor(2281517)1 -0.5836 0.2795 -2.088 0.0368 **

 They are exactly the same. Could you please tell me what happen?

Actually it does not make sense to estimate the model without an 
intercept unless you assume that it is exactly zero for the first levels 
of your factors. Think about the contrasts you are interested in. Looks 
like not the default?

Uwe Ligges



 Thank you in advance


 --
 View this message in context: 
 http://r.789695.n4.nabble.com/logistic-regression-by-glm-tp4088471p4088471.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic regression by glm

2011-11-20 Thread 屠鞠传礼
Thank you very much :)
I search on net and find sometimes response value in logistic model can have 
more than 2 values, and the way of this kinds of regression is called Ordinal 
Logistic Regression. and even we can caculate it by the same way I mean glm in 
R.
here are some references:
1. http://en.wikipedia.org/wiki/Ordered_logit
2. http://www.stat.ubc.ca/~rollin/teach/643w04/lec/node62.html
above two tell us what is Ordinal Logistic Regression.
3. http://www.ats.ucla.edu/stat/r/dae/ologit.htm
this show that we can use glm to model it





ÔÚ 2011-11-21 00:56:33£¬Uwe Ligges lig...@statistik.tu-dortmund.de дµÀ£º


On 20.11.2011 17:27, ÍÀ¾Ï´«Àñ wrote:
 I worried it too, Do you have idear that what tools I can use?


Depends on your aims - what you want to do with the fitted model.
A multinomial model, some kind of discriminant analysis (lda, qda), tree 
based methods, svm and so son come to mind. You probably want to discuss 
this on some statistics mailing list/forum or among local experts rather 
than on the R list. Since this is actually not that R releated.

Uwe Ligges







 ÔÚ 2011-11-21 00:13:26£¬Uwe Liggeslig...@statistik.tu-dortmund.de  дµÀ£º


 On 20.11.2011 16:58, ÍÀ¾Ï´«Àñ wrote:
 Thank you Ligges :)
 one more question:
 my response value diagnostic have 4 levels (0, 1, 2 and 3), so I use it 
 like this:
 as.factor(diagnostic) ~ as.factor(7161521) +as.factor(2281517)
 Is it all right?


 Uhh. 4 levels? Than I doubt logistic regression is the right tool for
 you. Please revisit the theory first: It is intended for 2 levels...


 Uwe Ligges









 ÔÚ 2011-11-20 23:45:23£¬Uwe Liggeslig...@statistik.tu-dortmund.de   
 дµÀ£º


 On 20.11.2011 12:46, tujchl wrote:
 HI

 I use glm in R to do logistic regression. and treat both response and
 predictor as factor
 In my first try:

 ***
 Call:
 glm(formula = as.factor(diagnostic) ~ as.factor(7161521) +
 as.factor(2281517), family = binomial())

 Deviance Residuals:
 Min 1Q Median 3Q Max
 -1.5370 -1.0431 -0.9416 1.3065 1.4331

 Coefficients:
 Estimate Std. Error z value Pr(|z|)
 (Intercept) -0.58363 0.27948 -2.088 0.0368 *
 as.factor(7161521)2 1.39811 0.66618 2.099 0.0358 *
 as.factor(7161521)3 0.28192 0.83255 0.339 0.7349
 as.factor(2281517)2 -1.11284 0.63692 -1.747 0.0806 .
 as.factor(2281517)3 -0.02286 0.80708 -0.028 0.9774
 ---
 Signif. codes: 0 ¡®***¡¯ 0.001 ¡®**¡¯ 0.01 ¡®*¡¯ 0.05 ¡®.¡¯ 0.1 ¡® ¡¯ 1

 (Dispersion parameter for binomial family taken to be 1)

 Null deviance: 678.55 on 498 degrees of freedom
 Residual deviance: 671.20 on 494 degrees of freedom
 AIC: 681.2

 Number of Fisher Scoring iterations: 4
 ***

 And I remodel it and *want no intercept*:
 ***
 Call:
 glm(formula = as.factor(diagnostic) ~ as.factor(2281517) +
 as.factor(7161521) - 1, family = binomial())

 Deviance Residuals:
 Min 1Q Median 3Q Max
 -1.5370 -1.0431 -0.9416 1.3065 1.4331

 Coefficients:
 Estimate Std. Error z value Pr(|z|)
 as.factor(2281517)1 -0.5836 0.2795 -2.088 0.0368 *
 as.factor(2281517)2 -1.6965 0.6751 -2.513 0.0120 *
 as.factor(2281517)3 -0.6065 0.8325 -0.728 0.4663
 as.factor(7161521)2 1.3981 0.6662 2.099 0.0358 *
 as.factor(7161521)3 0.2819 0.8325 0.339 0.7349
 ---
 Signif. codes: 0 ¡®***¡¯ 0.001 ¡®**¡¯ 0.01 ¡®*¡¯ 0.05 ¡®.¡¯ 0.1 ¡® ¡¯ 1

 (Dispersion parameter for binomial family taken to be 1)

 Null deviance: 691.76 on 499 degrees of freedom
 Residual deviance: 671.20 on 494 degrees of freedom
 AIC: 681.2

 Number of Fisher Scoring iterations: 4
 ***

 *As show above in my second model it return no intercept but look this:
 Model1:
 (Intercept) -0.58363 0.27948 -2.088 0.0368 *
 Model2:
 as.factor(2281517)1 -0.5836 0.2795 -2.088 0.0368 **

 They are exactly the same. Could you please tell me what happen?

 Actually it does not make sense to estimate the model without an
 intercept unless you assume that it is exactly zero for the first levels
 of your factors. Think about the contrasts you are interested in. Looks
 like not the default?

 Uwe Ligges



 Thank you in advance


 --
 View this message in context: 
 http://r.789695.n4.nabble.com/logistic-regression-by-glm-tp4088471p4088471.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 

Re: [R] logistic regression by glm

2011-11-20 Thread David Winsemius


On Nov 20, 2011, at 7:26 PM, 屠鞠传礼 wrote:


Thank you very much :)
I search on net and find sometimes response value in logistic model  
can have more than 2 values, and the way of this kinds of regression  
is called Ordinal Logistic Regression. and even we can caculate it  
by the same way I mean glm in R.

here are some references:
1. http://en.wikipedia.org/wiki/Ordered_logit
2. http://www.stat.ubc.ca/~rollin/teach/643w04/lec/node62.html
above two tell us what is Ordinal Logistic Regression.
3. http://www.ats.ucla.edu/stat/r/dae/ologit.htm
this show that we can use glm to model it


When I looked through the UCLA code it appeared they were using the  
Design package (now superseded by the `rms` package) and that the  
function was `lrm` rather than `glm`. In addition to Harrell's  
excellent text which has a full chapter on this topic you might also  
want to look at Laura Thompson's Companion to Agresti's text:


https://home.comcast.net/~lthompson221/Splusdiscrete2.pdf

--
David.



ÔÚ 2011-11-21 00:56:33£¬Uwe Ligges lig...@statistik.tu- 
dortmund.de дµÀ£º



On 20.11.2011 17:27, ÍÀ¾Ï´«Àñ wrote:

I worried it too, Do you have idear that what tools I can use?



Depends on your aims - what you want to do with the fitted model.
A multinomial model, some kind of discriminant analysis (lda, qda),  
tree
based methods, svm and so son come to mind. You probably want to  
discuss
this on some statistics mailing list/forum or among local experts  
rather

than on the R list. Since this is actually not that R releated.

Uwe Ligges








ÔÚ 2011-11-21 00:13:26£¬Uwe Liggeslig...@statistik.tu- 
dortmund.de  дµÀ£º



On 20.11.2011 16:58, ÍÀ¾Ï´«Àñ wrote:

Thank you Ligges :)
one more question:
my response value diagnostic have 4 levels (0, 1, 2 and 3), so  
I use it like this:

as.factor(diagnostic) ~ as.factor(7161521) +as.factor(2281517)
Is it all right?



Uhh. 4 levels? Than I doubt logistic regression is the right tool  
for
you. Please revisit the theory first: It is intended for 2  
levels...



Uwe Ligges










ÔÚ 2011-11-20 23:45:23£¬Uwe Liggeslig...@statistik.tu-dortmun 
d.de   дµÀ£º



On 20.11.2011 12:46, tujchl wrote:

HI

I use glm in R to do logistic regression. and treat both  
response and

predictor as factor
In my first try:

***
Call:
glm(formula = as.factor(diagnostic) ~ as.factor(7161521) +
as.factor(2281517), family = binomial())

Deviance Residuals:
Min 1Q Median 3Q Max
-1.5370 -1.0431 -0.9416 1.3065 1.4331

Coefficients:
Estimate Std. Error z value Pr(|z|)
(Intercept) -0.58363 0.27948 -2.088 0.0368 *
as.factor(7161521)2 1.39811 0.66618 2.099 0.0358 *
as.factor(7161521)3 0.28192 0.83255 0.339 0.7349
as.factor(2281517)2 -1.11284 0.63692 -1.747 0.0806 .
as.factor(2281517)3 -0.02286 0.80708 -0.028 0.9774
---
Signif. codes: 0 ¡®***¡¯ 0.001 ¡®**¡¯ 0.01 ¡®*¡¯  
0.05 ¡®.¡¯ 0.1 ¡® ¡¯ 1


(Dispersion parameter for binomial family taken to be 1)

Null deviance: 678.55 on 498 degrees of freedom
Residual deviance: 671.20 on 494 degrees of freedom
AIC: 681.2

Number of Fisher Scoring iterations: 4
***

And I remodel it and *want no intercept*:
***
Call:
glm(formula = as.factor(diagnostic) ~ as.factor(2281517) +
as.factor(7161521) - 1, family = binomial())

Deviance Residuals:
Min 1Q Median 3Q Max
-1.5370 -1.0431 -0.9416 1.3065 1.4331

Coefficients:
Estimate Std. Error z value Pr(|z|)
as.factor(2281517)1 -0.5836 0.2795 -2.088 0.0368 *
as.factor(2281517)2 -1.6965 0.6751 -2.513 0.0120 *
as.factor(2281517)3 -0.6065 0.8325 -0.728 0.4663
as.factor(7161521)2 1.3981 0.6662 2.099 0.0358 *
as.factor(7161521)3 0.2819 0.8325 0.339 0.7349
---
Signif. codes: 0 ¡®***¡¯ 0.001 ¡®**¡¯ 0.01 ¡®*¡¯  
0.05 ¡®.¡¯ 0.1 ¡® ¡¯ 1


(Dispersion parameter for binomial family taken to be 1)

Null deviance: 691.76 on 499 degrees of freedom
Residual deviance: 671.20 on 494 degrees of freedom
AIC: 681.2

Number of Fisher Scoring iterations: 4
***

*As show above in my second model it return no intercept but  
look this:

Model1:
(Intercept) -0.58363 0.27948 -2.088 0.0368 *
Model2:
as.factor(2281517)1 -0.5836 0.2795 -2.088 0.0368 **

They are exactly the same. Could you please tell me what happen?


Actually it does not make sense to estimate the model without an
intercept unless you assume that it is exactly zero for the  
first levels
of your factors. Think about the contrasts you are interested  
in. Looks

like not the default?

Uwe Ligges




Thank you in advance


--
View this message in context: 
http://r.789695.n4.nabble.com/logistic-regression-by-glm-tp4088471p4088471.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list

[R] logistic Regression using lmer

2011-11-12 Thread arunkumar1111
Hi

can we perform logistic regression using lmer function. Please help me with
this

--
View this message in context: 
http://r.789695.n4.nabble.com/logistic-Regression-using-lmer-tp4034056p4034056.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] logistic Regression using lmer

2011-11-12 Thread Ben Bolker
arunkumar akpbond007 at gmail.com writes:

 
 Hi
 
 can we perform logistic regression using lmer function. Please help me with
 this
 

  Yes.  
  
  library(lme4)
  glmer([reponse]~[fixed effects (covariates)]+(1|[grouping variable]),
data=[data frame], family=binomial)

  Further questions on this topic should probably go to the
r-sig-mixed-models mailing list.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Regression - Variable Selection Methods With Prediction

2011-10-26 Thread RAJ
Can I atleast get help with what pacakge to use for logistic
regression with all possible models and do prediction. I know i can
use regsubsets but i am not sure if it has any prediction functions to
go with it.

Thanks

On Oct 25, 6:54 pm, RAJ dheerajathr...@gmail.com wrote:
 Hello,

 I am pretty new to R, I have always used SAS and SAS products. My
 target variable is binary ('Y' and 'N') and i have about 14 predictor
 variables. My goal is to compare different variable selection methods
 like Forward, Backward, All possible subsests. I am using
 misclassification rate to pick the winner method.

 This is what i have as of now,

 Reg - glm (Graduation ~., DFtrain,family=binomial(link=logit))
                 step - extractAIC(Reg, direction=forward)
                 pred - predict(Reg, DFtest,type=response)
                 mis - mean({pred  0.5} != {DFtest[,Graduation] == Y})
 This program actually works but I needed to check to make sure am
 doing this right. Also, I am getting the same misclassification rates
 for all different methods.

 I also tried to use

 Reg - leaps(Graduation ~., DFtrain)
                 pred - predict(Reg, DFtest,type=response)
                 mis - mean({pred  0.5} != {DFtest[,Graduation] == Y})
                 #print(summary(mis))
 which doesnt work

 and

 Reg - regsubsets(Graduation ~., DFtrain)
                 pred - predict(Reg, DFtest,type=response)
                 mis - mean({pred  0.5} != {DFtest[,Graduation] == Y})
                 #print(summary(mis))

 The Regsubsets will work but the 'predict' function does not work with
 it. Is there any other way to do predictions when using regsubsets

 Any help is appreciated.

 Thanks,

 __
 r-h...@r-project.org mailing listhttps://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Regression - Variable Selection Methods With Prediction

2011-10-26 Thread Steve_Friedman
Try the glm package

Steve Friedman Ph. D.
Ecologist  / Spatial Statistical Analyst
Everglades and Dry Tortugas National Park
950 N Krome Ave (3rd Floor)
Homestead, Florida 33034

steve_fried...@nps.gov
Office (305) 224 - 4282
Fax (305) 224 - 4147

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Regression - Variable Selection Methods With Prediction

2011-10-26 Thread Steve Lianoglou
Hi,

On Wed, Oct 26, 2011 at 12:35 PM, RAJ dheerajathr...@gmail.com wrote:
 Can I atleast get help with what pacakge to use for logistic
 regression with all possible models and do prediction. I know i can
 use regsubsets but i am not sure if it has any prediction functions to
 go with it.

Maybe you could try glmnet instead.

It doesn't give you all possible models, but rather the best one at
a given value for the penalty (lambda) parameter.

HTH,

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Regression - Variable Selection Methods With Prediction

2011-10-26 Thread Weidong Gu
Check glmulti package for all subset selection.

Weidong Gu

On Wed, Oct 26, 2011 at 12:35 PM, RAJ dheerajathr...@gmail.com wrote:
 Can I atleast get help with what pacakge to use for logistic
 regression with all possible models and do prediction. I know i can
 use regsubsets but i am not sure if it has any prediction functions to
 go with it.

 Thanks

 On Oct 25, 6:54 pm, RAJ dheerajathr...@gmail.com wrote:
 Hello,

 I am pretty new to R, I have always used SAS and SAS products. My
 target variable is binary ('Y' and 'N') and i have about 14 predictor
 variables. My goal is to compare different variable selection methods
 like Forward, Backward, All possible subsests. I am using
 misclassification rate to pick the winner method.

 This is what i have as of now,

 Reg - glm (Graduation ~., DFtrain,family=binomial(link=logit))
                 step - extractAIC(Reg, direction=forward)
                 pred - predict(Reg, DFtest,type=response)
                 mis - mean({pred  0.5} != {DFtest[,Graduation] == Y})
 This program actually works but I needed to check to make sure am
 doing this right. Also, I am getting the same misclassification rates
 for all different methods.

 I also tried to use

 Reg - leaps(Graduation ~., DFtrain)
                 pred - predict(Reg, DFtest,type=response)
                 mis - mean({pred  0.5} != {DFtest[,Graduation] == Y})
                 #print(summary(mis))
 which doesnt work

 and

 Reg - regsubsets(Graduation ~., DFtrain)
                 pred - predict(Reg, DFtest,type=response)
                 mis - mean({pred  0.5} != {DFtest[,Graduation] == Y})
                 #print(summary(mis))

 The Regsubsets will work but the 'predict' function does not work with
 it. Is there any other way to do predictions when using regsubsets

 Any help is appreciated.

 Thanks,

 __
 r-h...@r-project.org mailing listhttps://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Regression - Variable Selection Methods With Prediction

2011-10-26 Thread Bert Gunter
You mean the glm()  _function_ in the stats package.

?glm

(just to avoid confusion)

-- Bert

On Wed, Oct 26, 2011 at 10:31 AM, steve_fried...@nps.gov wrote:

 Try the glm package

 Steve Friedman Ph. D.
 Ecologist  / Spatial Statistical Analyst
 Everglades and Dry Tortugas National Park
 950 N Krome Ave (3rd Floor)
 Homestead, Florida 33034

 steve_fried...@nps.gov
 Office (305) 224 - 4282
 Fax (305) 224 - 4147

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Regression - Variable Selection Methods With Prediction

2011-10-26 Thread Marc Schwartz
The reason that you are not likely getting replies is that what you propose to 
do is considered a poor way of building models. 

You need to get out of the SAS Mindset.

I would suggest you obtain a copy of Frank Harrell's book:

  http://www.amazon.com/exec/obidos/ASIN/0387952322/

and then consider using his 'rms' package on CRAN to engage in modeling 
building strategies and validation.

Regards,

Marc Schwartz

On Oct 26, 2011, at 11:35 AM, RAJ wrote:

 Can I atleast get help with what pacakge to use for logistic
 regression with all possible models and do prediction. I know i can
 use regsubsets but i am not sure if it has any prediction functions to
 go with it.
 
 Thanks
 
 On Oct 25, 6:54 pm, RAJ dheerajathr...@gmail.com wrote:
 Hello,
 
 I am pretty new to R, I have always used SAS and SAS products. My
 target variable is binary ('Y' and 'N') and i have about 14 predictor
 variables. My goal is to compare different variable selection methods
 like Forward, Backward, All possible subsests. I am using
 misclassification rate to pick the winner method.
 
 This is what i have as of now,
 
 Reg - glm (Graduation ~., DFtrain,family=binomial(link=logit))
 step - extractAIC(Reg, direction=forward)
 pred - predict(Reg, DFtest,type=response)
 mis - mean({pred  0.5} != {DFtest[,Graduation] == Y})
 This program actually works but I needed to check to make sure am
 doing this right. Also, I am getting the same misclassification rates
 for all different methods.
 
 I also tried to use
 
 Reg - leaps(Graduation ~., DFtrain)
 pred - predict(Reg, DFtest,type=response)
 mis - mean({pred  0.5} != {DFtest[,Graduation] == Y})
 #print(summary(mis))
 which doesnt work
 
 and
 
 Reg - regsubsets(Graduation ~., DFtrain)
 pred - predict(Reg, DFtest,type=response)
 mis - mean({pred  0.5} != {DFtest[,Graduation] == Y})
 #print(summary(mis))
 
 The Regsubsets will work but the 'predict' function does not work with
 it. Is there any other way to do predictions when using regsubsets
 
 Any help is appreciated.
 
 Thanks,

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Logistic Regression - Variable Selection Methods With Prediction

2011-10-25 Thread RAJ
Hello,

I am pretty new to R, I have always used SAS and SAS products. My
target variable is binary ('Y' and 'N') and i have about 14 predictor
variables. My goal is to compare different variable selection methods
like Forward, Backward, All possible subsests. I am using
misclassification rate to pick the winner method.

This is what i have as of now,

Reg - glm (Graduation ~., DFtrain,family=binomial(link=logit))
step - extractAIC(Reg, direction=forward)
pred - predict(Reg, DFtest,type=response)
mis - mean({pred  0.5} != {DFtest[,Graduation] == Y})
This program actually works but I needed to check to make sure am
doing this right. Also, I am getting the same misclassification rates
for all different methods.

I also tried to use

Reg - leaps(Graduation ~., DFtrain)
pred - predict(Reg, DFtest,type=response)
mis - mean({pred  0.5} != {DFtest[,Graduation] == Y})
#print(summary(mis))
which doesnt work

and

Reg - regsubsets(Graduation ~., DFtrain)
pred - predict(Reg, DFtest,type=response)
mis - mean({pred  0.5} != {DFtest[,Graduation] == Y})
#print(summary(mis))

The Regsubsets will work but the 'predict' function does not work with
it. Is there any other way to do predictions when using regsubsets

Any help is appreciated.

Thanks,

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


  1   2   3   >