Hi all,

I know the subject line is ugly, but I don't really know what to call it.

I am a newbie with all these machine learning techniques, and what I do most
of the time is follow a "trial and error" approach. I know this method has
some drawbacks, but for now it is what I am able to do.

I am working on a text classification problem. My pipeline is:
TfidfVectorizer, feature selection with f_classif/chi2, and the final
classifier (I have tried a lot of different classifiers). Unfortunately, the
results that I am getting are very poor. The metric that I am using
is the AUC. The best result has been an AUC of 0.62 (I have tried without
feature selection too).
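
For reference, a minimal sketch of a pipeline of this shape. The toy corpus,
k=5 and the use of LogisticRegression as the final step are illustrative
assumptions, not my actual data or settings. Note that the AUC here is
computed from predicted probabilities rather than from hard class labels.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Toy corpus standing in for the real documents (assumption).
docs = [
    "buy cheap pills now", "cheap pills online offer",
    "win money now offer", "buy now win prize",
    "cheap offer buy online", "win cheap prize money",
    "meeting at noon tomorrow", "project report due friday",
    "lunch meeting with team", "team project status report",
    "see you at lunch tomorrow", "friday status meeting notes",
]
labels = [1] * 6 + [0] * 6

# Stratify so that both classes appear in the held-out half.
X_train, X_test, y_train, y_test = train_test_split(
    docs, labels, test_size=0.5, stratify=labels, random_state=0
)

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    # chi2 needs non-negative features, which TF-IDF provides.
    # k=5 is a placeholder; f_classif could be swapped in here.
    ("select", SelectKBest(chi2, k=5)),
    ("clf", LogisticRegression()),
])
pipe.fit(X_train, y_train)

# AUC is a ranking metric: score with probabilities, not predict().
scores = pipe.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, scores)
print("held-out AUC:", auc)
```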

Using the same dataset in R, I have obtained an AUC of 0.90. In that
process, I am using the frequencies obtained with Scikit (I compute the
frequencies with TfidfVectorizer and then store the resulting dataset
in a CSV file). No feature selection is used, and the classifier is a
logistic regression:

   out.glm.1 <- glm(equat, data=dataset[,c(input, target)],
                    family=binomial(link="logit"))

Could someone tell me how to "replicate" this with Scikit?
Also, does anyone know of an "easy to follow" resource where I can
understand the underlying implementation in both libraries? In general,
I have found that Scikit links to the sources of its implementations
(I mean, the original papers). On the other hand, I find the R
documentation very difficult to follow (the parameter explanations), and
there is not much detail on the implementation.
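
One possible starting point for mirroring the glm call above in Scikit,
as far as I understand it: glm fits an unpenalized logistic regression,
while Scikit's LogisticRegression applies an L2 penalty by default
(C=1.0), so a very large C should make the penalty negligible. This is
only a sketch under assumptions: the random matrix below is a stand-in
for the TF-IDF CSV, and the AUC is computed in-sample, which may or may
not match how the R figure was obtained.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Hypothetical stand-in for the TF-IDF features stored in the CSV.
rng = np.random.RandomState(0)
X = rng.rand(200, 5)
y = (X[:, 0] + 0.5 * rng.randn(200) > 0.5).astype(int)

# A very large C makes the default L2 penalty negligible, which
# approximates the unpenalized fit of glm(..., family=binomial).
clf = LogisticRegression(C=1e9, max_iter=1000)
clf.fit(X, y)

# Intercept and coefficients, to compare against coef(out.glm.1) in R.
print(clf.intercept_, clf.coef_)

# In-sample AUC from predicted probabilities (assumption: same data
# used for fitting and scoring).
train_auc = roc_auc_score(y, clf.predict_proba(X)[:, 1])
print("in-sample AUC:", train_auc)
```

If the coefficients agree between the two libraries but the AUC does not,
the difference would more likely lie in how the AUC is evaluated than in
the model itself.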

Thanks in advance.


_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
