Hi Zoraida.

I am not an expert in R's glm, but I think that glm call just fits a 
logistic regression.
For the binary case, this is the same model as 
sklearn.linear_model.LogisticRegression.
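
A minimal sketch of how one might get close to the glm fit, assuming synthetic data from make_classification as a stand-in for the real TF-IDF matrix. glm fits an unpenalized maximum-likelihood model, while LogisticRegression applies an L2 penalty with C=1.0 by default, so a very large C (1e6 here is an arbitrary choice) makes the penalty negligible:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary problem (assumption: stands in for the real dataset).
X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=5, random_state=0)

# scikit-learn defaults: L2 penalty with C=1.0.
clf_default = LogisticRegression(max_iter=1000).fit(X, y)

# Very large C: the penalty term becomes negligible, which approximates
# an unpenalized fit such as the one glm produces.
clf_weak = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)

# Largest absolute difference between the two coefficient vectors.
coef_gap = np.abs(clf_default.coef_ - clf_weak.coef_).max()
train_acc = clf_weak.score(X, y)
print("max coefficient difference:", coef_gap)
print("training accuracy (large C):", train_acc)
```

With many sparse text features an unpenalized fit can overfit or fail to converge, so the large-C setting is only meant for comparing against R, not as a recommendation.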

Just a wild guess: did you use the clf.decision_function results as input to 
roc_auc_score?
If you use the clf.predict results, your score will be much lower than it 
should be.
In newer versions of scikit-learn, this is handled automatically if you score 
your model with GridSearchCV or cross_val_score
and set the "scoring" parameter.
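
A small sketch of the difference, assuming a recent scikit-learn (where cross_val_score lives in sklearn.model_selection) and synthetic data in place of the real text features:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic binary problem (assumption: stands in for the real dataset).
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Continuous scores: the ROC curve is traced over every threshold.
auc_scores = roc_auc_score(y_test, clf.decision_function(X_test))

# Hard 0/1 labels: the ROC curve collapses to a single operating point.
auc_labels = roc_auc_score(y_test, clf.predict(X_test))

# With scoring="roc_auc", continuous scores are used internally.
cv_auc = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=5, scoring="roc_auc")

print("AUC from decision_function:", auc_scores)
print("AUC from predict:", auc_labels)
print("cross-validated AUC:", cv_auc.mean())
```

For classifiers without decision_function, clf.predict_proba(X_test)[:, 1] serves the same purpose.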

I don't understand the last part of your question. What do you find hard 
to follow with scikit-learn?
Indeed, the implementation of LogisticRegression is a bit tricky as it 
calls LibLinear, but I'm not sure you are asking about the code.

Cheers,
Andy



On 10/06/2014 03:10 PM, ZORAIDA HIDALGO SANCHEZ wrote:
> Hi all,
>
> I know the subject is ugly but I don't really know what to call it.
>
> I am a newbie with all these machine learning techniques, and what I do most
> of the time is follow a "trial and error" approach. I know this method has
> some drawbacks, but for now
> it is what I am able to do.
>
> I am working with text on a classification problem. My pipeline is:
> TfidfVectorizer, feature selection with f_classif/Chi, and the final
> classifier (I have tried a lot of different classifiers). Unfortunately, the
> results that I am getting are very poor. The metric that I am using
> is the AUC. The best result has been an AUC of 0.62 (I have tried without
> doing feature selection too).
>
> Using the same dataset in R, I have obtained an AUC of 0.90. In that
> process, I am using frequencies obtained with Scikit (I compute the
> frequencies using TfidfVectorizer and then store the resulting dataset
> in a CSV file). No feature selection is used and the classifier is a logistic
> regression:
>
>     out.glm.1 <- glm(equat, data=dataset[,c(input, target)],
>                      family=binomial(link="logit"))
>
> Is there someone who could tell me how to "replicate" this with Scikit?
> Also, does anyone know of an "easy to follow" resource where I can
> understand the underlying implementation
> in both libraries? In general, I have found that Scikit has links to the sources
> of the implementation (I mean, the original papers). On the other hand, I
> find the R documentation very difficult to follow (parameter explanations), and
> there aren't many details on the implementation.
>
> Thanks in advance.
>
>
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

