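Hi Zoraida,

I am not an expert in R GLMs, but I think the glm call just does logistic regression. For the binary case, this is the same as sklearn.linear_model.LogisticRegression, with one caveat: LogisticRegression applies an L2 penalty by default (C=1.0), whereas R's glm fits an unpenalized model, so you need a very large C to get comparable coefficients.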
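Here is a minimal sketch of that comparison. It assumes synthetic data from make_classification in place of the real TF-IDF matrix. scikit-learn's LogisticRegression adds an L2 penalty by default (C=1.0), while R's glm(family=binomial) does not. Setting C very large makes the penalty negligible. Recent scikit-learn releases also accept penalty=None for the same purpose.

```python
# Minimal sketch: approximating R's unpenalized glm(family=binomial) in scikit-learn.
# Assumption: synthetic data stands in for the real TF-IDF feature matrix.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Non-separable toy data: flip_y randomizes ~10% of the labels,
# n_redundant=0 avoids perfectly collinear columns
X, y = make_classification(n_samples=200, n_features=5, n_informative=5,
                           n_redundant=0, flip_y=0.1, random_state=0)

# scikit-learn default: L2 penalty with C=1.0 (smaller C means stronger shrinkage)
default_clf = LogisticRegression().fit(X, y)

# A very large C makes the penalty negligible, approximating glm's maximum-likelihood fit
glm_like = LogisticRegression(C=1e12, max_iter=1000).fit(X, y)

print("default  coef:", np.round(default_clf.coef_.ravel(), 3), default_clf.intercept_)
print("glm-like coef:", np.round(glm_like.coef_.ravel(), 3), glm_like.intercept_)
```

With the default C=1.0 the coefficients are shrunk toward zero, so they will generally not match glm's estimates. With a large C they should be close to what glm reports on the same data.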
Just a wild guess: did you pass the output of clf.decision_function to roc_auc_score? If you pass the clf.predict results instead, your score can be much lower than it should be, because hard 0/1 labels throw away the ranking information that AUC measures. In newer versions of scikit-learn, this is handled automatically if you use GridSearchCV or cross_val_score to evaluate your model and set the "scoring" parameter (e.g. scoring="roc_auc").

I don't understand the last part of your question. What do you find hard to follow with scikit-learn? Indeed, the implementation of LogisticRegression is a bit tricky as it calls LibLinear, but I'm not sure you are asking about the code.

Cheers,
Andy

On 10/06/2014 03:10 PM, ZORAIDA HIDALGO SANCHEZ wrote:
> Hi all,
>
> I know the subject is ugly but I don't really know what to call it.
>
> I am a newbie with all these machine learning techniques, and what I do
> most of the time is follow a "trial and error" approach. I know this
> method has some drawbacks, but for now it is what I am able to do.
>
> I am working with text on a classification problem. My pipeline is:
> TfidfVectorizer, feature selection with f_classif/chi2, and the final
> classifier (I have tried lots of different classifiers). Unfortunately,
> the results that I am getting are very poor. The metric that I am using
> is the AUC. The best result has been an AUC of 0.62 (I have tried without
> doing feature selection too).
>
> Using the same dataset in R, I have obtained an AUC of 0.90. In the
> process, I am using frequencies obtained with scikit-learn (I compute the
> frequencies with TfidfVectorizer and then store the resulting dataset
> in a CSV file). No feature selection is used and the classifier is a
> logistic regression:
>
> out.glm.1 <- glm(equat, data=dataset[,c(input, target)],
>                  family=binomial(link="logit"))
>
> Could someone tell me how to "replicate" this with scikit-learn?
> And moreover, does anyone know of an "easy to follow" resource where I
> can understand the underlying implementation of both libraries?
> In general, I found that scikit-learn links to the sources of its
> implementations (I mean, the original papers). On the other hand, I
> found the R documentation very difficult to follow (the parameter
> explanations), and there aren't many details on the implementation.
>
> Thanks in advance.
>
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
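Here is a minimal sketch of the roc_auc_score point from the reply. It assumes a small, hypothetical synthetic corpus in place of the real documents. It uses the current module layout (sklearn.model_selection), which postdates this 2014 thread; older releases had cross_val_score in sklearn.cross_validation. The pipeline is TfidfVectorizer followed by LogisticRegression with no feature selection, loosely mirroring the R setup described in the question.

```python
# Minimal sketch: AUC from hard labels vs. continuous scores, plus cross-validated AUC.
# Assumption: the tiny synthetic corpus below is hypothetical and stands in for real documents.
import random

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline

random.seed(0)
pos_words = ["good", "great", "excellent", "happy", "love"]
neg_words = ["bad", "awful", "poor", "sad", "hate"]
neutral = ["the", "movie", "was", "plot", "actor", "film"]

docs, labels = [], []
for i in range(300):
    label = i % 2
    own, other = (pos_words, neg_words) if label else (neg_words, pos_words)
    words = random.choices(neutral, k=4) + random.choices(own, k=2) + random.choices(other, k=1)
    random.shuffle(words)
    docs.append(" ".join(words))
    # Flip ~15% of the labels so the problem is not trivially separable
    labels.append(1 - label if random.random() < 0.15 else label)

X_train, X_test, y_train, y_test = train_test_split(
    docs, labels, test_size=0.25, stratify=labels, random_state=0)

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(X_train, y_train)

# Hard 0/1 predictions collapse the ROC curve to a single operating point
auc_labels = roc_auc_score(y_test, clf.predict(X_test))
# Continuous scores keep the full ranking, which is what AUC measures
auc_scores = roc_auc_score(y_test, clf.decision_function(X_test))
print(f"AUC from predict():           {auc_labels:.3f}")
print(f"AUC from decision_function(): {auc_scores:.3f}")

# With scoring="roc_auc", cross_val_score uses continuous scores automatically
cv_scores = cross_val_score(clf, docs, labels, cv=5, scoring="roc_auc")
print("5-fold CV AUC:", cv_scores.round(3))
```

Exact numbers depend on the data. The point is that roc_auc_score should be given decision_function (or predict_proba(X)[:, 1]) output rather than predict output.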
