Re: [R] Logistic regression for large data
summary(Base) would show whether one of the columns of Base was read as character data instead of the expected numeric. That could cause an explosion in the number of dummy variables, and hence a huge design matrix.

-Bill

On Fri, Nov 11, 2022 at 11:30 PM George Brida wrote:
> Dear R users,
>
> I have a database called Base.csv (attached to this email) which
> contains 13 columns and 8257 rows and whose the first 8 columns are dummy
> variables which take 1 or 0. The problem is when I wrote the following
> instructions to do a logistic regression , R runs for hours and hours
> without giving an output:
>
> Base=read.csv("C:\\Users\\HP\\Desktop\\New\\Base.csv",header=FALSE,sep=";")
>
> fit_1=glm(Base[,2]~Base[,1]+Base[,10]+Base[,11]+Base[,12]+Base[,13],family=binomial(link="logit"))
>
> Apparently, there is not enough memory to have the requested output. Is
> there any other function for logistic regression that handle large data and
> return output in reasonable time.
>
> Many thanks
>
> Kind regards
>
> George

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
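A minimal sketch of the effect Bill describes, on simulated data rather than George's file (the data frame, its column names, and the sample size are all made up for illustration). It compares the width of the design matrix when a continuous column is numeric with its width when the same column has been read as character:

```r
# Simulated stand-in for Base; NOT the data from the thread.
set.seed(1)
n <- 200
Base <- data.frame(
  V1 = rbinom(n, 1, 0.5),
  V2 = rbinom(n, 1, 0.5),
  V3 = rnorm(n)
)

# Mimic a column that was read as text instead of numbers.
Base_bad <- Base
Base_bad$V3 <- as.character(Base_bad$V3)

# Column classes reveal the problem at a glance.
classes_bad <- sapply(Base_bad, class)
print(classes_bad)

# Width of the design matrix = number of coefficients glm() must estimate.
p_ok  <- ncol(model.matrix(V2 ~ V1 + V3, data = Base))
p_bad <- ncol(model.matrix(V2 ~ V1 + V3, data = Base_bad))

cat("coefficients with numeric V3:  ", p_ok, "\n")
cat("coefficients with character V3:", p_bad, "\n")
```

With a character column, the model frame treats every distinct string as a factor level, so the number of coefficients grows with the number of distinct values. On a few thousand rows with several such columns, that alone could plausibly explain a fit that never finishes.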
Re: [R] Logistic regression for large data
Hi George,

I did not get an attachment. My first step would be to try simplifying things. Do all of these work?

fit_1=glm(Base[,2]~Base[,1],family=binomial(link="logit"))
fit_1=glm(Base[,2]~Base[,10],family=binomial(link="logit"))
fit_1=glm(Base[,2]~Base[,11],family=binomial(link="logit"))
fit_1=glm(Base[,2]~Base[,12],family=binomial(link="logit"))
fit_1=glm(Base[,2]~Base[,13],family=binomial(link="logit"))

This is not a large dataset. That said, if your computer is nearly out of memory, even a small dataset might be too much. It might have plenty of physical memory, but also lots of open files, cookies, applications, and other things that eat memory.

Regards,
Tim
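The one-predictor-at-a-time check suggested above can be sketched as a loop that also records how long each fit takes. The data here are simulated with the same shape as described in the thread (8257 rows, 13 columns, the first 8 binary); the column names V1 to V13 are what read.csv would assign with header=FALSE, and everything else is an assumption for illustration:

```r
# Simulated data with the dimensions quoted in the thread; NOT the real file.
set.seed(42)
n <- 8257
Base <- data.frame(
  matrix(rbinom(n * 8, 1, 0.5), ncol = 8),  # 8 binary columns
  matrix(rnorm(n * 5), ncol = 5)            # 5 continuous columns
)
names(Base) <- paste0("V", 1:13)

# A data frame of this shape is small: under a megabyte in memory.
print(object.size(Base), units = "Kb")

# Fit the response on each predictor separately and time each fit.
predictors <- c("V1", "V10", "V11", "V12", "V13")
timings <- sapply(predictors, function(p) {
  f <- reformulate(p, response = "V2")  # builds V2 ~ <p>
  system.time(
    glm(f, data = Base, family = binomial(link = "logit"))
  )[["elapsed"]]
})
print(timings)
```

If one predictor takes far longer than the others on the real data, that column is the one to inspect first.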
Re: [R] Logistic regression for large data
That’s not a large data set. Something other than memory limits is going on. You should post the output of summary(Base).

David
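As an illustration of what summary(Base) can reveal, the sketch below reads a tiny made-up CSV twice. It assumes, purely as a hypothesis, that the file has a header row; the thread does not say whether Base.csv does. Under that assumption, header=FALSE turns the header into a data row and every column becomes non-numeric:

```r
# Tiny made-up file contents; the real Base.csv is not available.
csv_text <- "y;x1;x2\n1;0;2.5\n0;1;3.1\n1;1;4.7\n0;0;1.2"

# header = FALSE treats the first row as data, so numbers and names
# end up mixed in the same columns.
wrong <- read.csv(text = csv_text, header = FALSE, sep = ";")

# header = TRUE uses the first row as column names.
right <- read.csv(text = csv_text, header = TRUE, sep = ";")

# summary() reports quantiles for numeric columns, but only
# length/class/mode for character columns.
print(summary(wrong))
print(summary(right))

# str() shows the column types directly.
str(wrong)
str(right)
```

A summary that shows "Class :character" where quartiles were expected is the sign to look for.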
[R] Logistic regression for large data
Dear R users,

I have a database called Base.csv (attached to this email) which contains 13 columns and 8257 rows, and whose first 8 columns are dummy variables that take the value 1 or 0. The problem is that when I run the following instructions to do a logistic regression, R runs for hours and hours without giving any output:

Base=read.csv("C:\\Users\\HP\\Desktop\\New\\Base.csv",header=FALSE,sep=";")

fit_1=glm(Base[,2]~Base[,1]+Base[,10]+Base[,11]+Base[,12]+Base[,13],family=binomial(link="logit"))

Apparently, there is not enough memory to produce the requested output. Is there any other function for logistic regression that handles large data and returns output in a reasonable time?

Many thanks

Kind regards

George