Hi Zoraida,

Can you provide a code snippet (e.g. upload it to gist.github.com) that
illustrates the problem -- especially how you evaluate the goodness of the
predictions (both R and scikit-learn)?
It's pretty difficult to argue about the issue without seeing what you
actually do. The difference between AUC 0.9 and 0.6 is huge -- so I guess
either the default hyperparameters are a poor choice or there is a glitch
in your experimental setup.
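
For reference, here is a minimal sketch of how held-out AUC could be computed in scikit-learn. It assumes a recent scikit-learn (with sklearn.model_selection) and uses synthetic data from make_classification in place of your TF-IDF features. It contrasts scoring with predict_proba against scoring with the hard labels from predict. The latter is one common glitch that typically depresses AUC, though I can't tell whether it is what happens in your setup.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real (TF-IDF) feature matrix -- an assumption.
X, y = make_classification(n_samples=1000, n_features=50, n_informative=10,
                           random_state=0)

# Evaluate on held-out data, never on the data the model was trained on.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)

# AUC needs a continuous score per sample (here: P(class 1)) ...
auc_scores = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

# ... whereas hard 0/1 labels from predict() reduce the ROC curve to a
# single operating point, which typically yields a lower, misleading AUC.
auc_labels = roc_auc_score(y_test, clf.predict(X_test))

print(f"AUC from probabilities: {auc_scores:.3f}")
print(f"AUC from hard labels:   {auc_labels:.3f}")
```

For classifiers without predict_proba (e.g. LinearSVC), decision_function gives a suitable continuous score.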

Both scikit-learn and R (glmnet) should be thoroughly documented. ML tools
have come a long way and are very robust and usable these days, but they are
not completely fire-and-forget**.
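
To make the "not fire-and-forget" point concrete, here is a sketch of tuning a pipeline like the one you describe (TfidfVectorizer, chi2 feature selection, logistic regression) with GridSearchCV, using AUC as the selection metric. The toy corpus, the grid values and max_iter are assumptions for illustration only. The important part is that feature selection sits inside the pipeline, so it is refit on each training fold.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

# Toy corpus standing in for the real documents (an assumption):
# class-1 documents contain extra occurrences of a few "signal" words.
rng = np.random.RandomState(0)
vocab = [f"w{i}" for i in range(200)]
signal = vocab[:10]

def make_doc(label):
    words = list(rng.choice(vocab, size=30))
    if label == 1:
        words += list(rng.choice(signal, size=5))
    return " ".join(words)

y = rng.randint(0, 2, size=300)
docs = [make_doc(label) for label in y]

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("select", SelectKBest(chi2)),  # chi2 needs non-negative input, as TF-IDF is
    ("clf", LogisticRegression(max_iter=1000)),
])

# Illustrative grid only; C is the inverse L2 regularization strength.
param_grid = {
    "select__k": [20, 50, "all"],
    "clf__C": [0.01, 0.1, 1.0, 10.0, 100.0],
}

# Each candidate is scored by mean ROC AUC over stratified CV folds; the
# held-out fold never influences vectorizer vocabulary or selected features.
search = GridSearchCV(pipe, param_grid, scoring="roc_auc",
                      cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
search.fit(docs, y)

print(search.best_params_)
print(f"best mean CV AUC: {search.best_score_:.3f}")
```

The best cross-validated score is still optimistically biased; a separate test set (or nested CV) gives a fairer final estimate.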

best,
 Peter


** Sorry for the military term, but I lack a good alternative.

2014-10-06 15:10 GMT+02:00 ZORAIDA HIDALGO SANCHEZ <
[email protected]>:

> Hi all,
>
> I know the subject is ugly, but I don't really know what to call it.
>
> I am a newbie with all these machine learning techniques, and most of the
> time what I do is follow a "trial and error" approach. I know this method
> has some drawbacks, but for now it is what I am able to do.
>
> I am working with text on a classification problem. My pipeline is:
> TfidfVectorizer, feature selection with f_classif/chi2, and the final
> classifier (I have tried a lot of different classifiers). Unfortunately, the
> results that I am getting are very poor. The metric that I am using is the
> AUC. The best result has been an AUC of 0.62 (I have also tried without
> feature selection).
>
> Using the same dataset in R, I obtained an AUC of 0.90. In that process, I
> use the frequencies obtained with Scikit (I compute them with
> TfidfVectorizer and then store the resulting dataset in a CSV file). No
> feature selection is used, and the classifier is a logistic regression:
>
>    out.glm.1 <- glm(equat, data=dataset[,c(input, target)],
>                     family=binomial(link="logit"))
>
> Could someone tell me how to "replicate" this with Scikit? Also, does
> anyone know of an "easy to follow" resource for understanding the
> underlying implementation in both libraries? In general, I have found that
> the Scikit documentation links to the sources of the implementation (I mean,
> the original papers). On the other hand, I found the R documentation (the
> parameter explanations) very difficult to follow, and there aren't many
> details on the implementation.
>
> Thanks in advance.
>
>
> ------------------------------------------------------------------------------
> Slashdot TV.  Videos for Nerds.  Stuff that Matters.
>
> http://pubads.g.doubleclick.net/gampad/clk?id=160591471&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
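
On replicating the R model: as far as I know, glm with family=binomial fits an unpenalized maximum-likelihood logistic regression, whereas scikit-learn's LogisticRegression applies an L2 penalty with C=1.0 by default. The following is a rough sketch under that assumption. The CSV layout (feature columns followed by a 0/1 target column) and the file name are hypothetical, and the data are synthetic so that it runs standalone. A very large C only approximates the unpenalized fit; recent scikit-learn versions (1.2 and later, if I remember correctly) also accept penalty=None.

```python
import os
import tempfile

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical CSV layout: feature columns, then a 0/1 target column.
# The file is written from synthetic data so the sketch is self-contained.
X_syn, y_syn = make_classification(n_samples=500, n_features=20,
                                   n_informative=5, flip_y=0.1, random_state=0)
path = os.path.join(tempfile.mkdtemp(), "dataset.csv")
np.savetxt(path, np.column_stack([X_syn, y_syn]), delimiter=",")

data = np.loadtxt(path, delimiter=",")
X, y = data[:, :-1], data[:, -1].astype(int)

# Very large C => negligible L2 penalty, approximating R's unpenalized glm fit.
glm_like = LogisticRegression(C=1e6, max_iter=10000).fit(X, y)
# scikit-learn default: L2 penalty with C=1.0, which shrinks the coefficients.
default = LogisticRegression().fit(X, y)

print("weak-penalty coef norm:", np.linalg.norm(glm_like.coef_))
print("default coef norm:     ", np.linalg.norm(default.coef_))
```

With many sparse TF-IDF features the unpenalized fit can be unstable or fail to converge (e.g. when the classes are linearly separable), which is one more reason to tune C rather than simply drop the penalty.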



-- 
Peter Prettenhofer