Here is a possibly useful comment by larsmans on Stack Overflow about exactly this procedure:
http://stackoverflow.com/questions/26604175/how-to-predict-a-continuous-dependent-variable-that-expresses-target-class-proba/26614131#comment41846816_26614131

On Mon, Oct 10, 2016 at 4:04 PM, Sean Violante <sean.viola...@gmail.com> wrote:

> sorry yes there was a misunderstanding:
>
> I meant for each feature configuration you should pass in two rows (one
> for the positive cases and one for the negative)
> and the sample weight being the corresponding count for that configuration
> and class
>
> and I am saying that the total count is important because you could have
> a situation where
> one feature combination occurs 10 times and another feature combination
> 1000 times
>
> On Mon, Oct 10, 2016 at 3:48 PM, Raphael C <drr...@gmail.com> wrote:
>
>> On 10 October 2016 at 12:22, Sean Violante <sean.viola...@gmail.com>
>> wrote:
>> > no ( but please check !)
>> >
>> > sample weights should be the counts for the respective label (0/1)
>> >
>> > [ I am actually puzzled about the glm help file - proportions loses how
>> > often an input data 'row' was present relative to the other - though you
>> > could do this by repeating the row 'n' times]
>>
>> I think we might be talking at cross purposes.
>>
>> I have a matrix X where each row is a feature vector. I also have an
>> array y where y[i] is a real number between 0 and 1. I would like to
>> build a regression model that predicts the y values given the X rows.
>>
>> Now each y[i] value in fact comes from simply counting the number of
>> positive labelled elements in a particular set (set i) and dividing by
>> the number of elements in that set. So I can easily fit this into the
>> model given by the R package glm by replacing each y[i] value by a
>> pair of "Number of positives" and "Number of negatives" (this is case
>> 2 in the docs I quoted) or using case 3 which asks for the y[i] plus
>> the total number of elements in set i.
>>
>> I don't see how a single integer for sample_weight[i] would cover this
>> information but I am sure I must have misunderstood. At best it seems
>> it could cover the number of positive values but this is missing half
>> the information.
>>
>> Raphael
>>
>> >
>> > On Mon, Oct 10, 2016 at 1:15 PM, Raphael C <drr...@gmail.com> wrote:
>> >>
>> >> How do I use sample_weight for my use case?
>> >>
>> >> In my case is "y" an array of 0s and 1s and sample_weight then an
>> >> array real numbers between 0 and 1 where I should make sure to set
>> >> sample_weight[i]= 0 when y[i] = 0?
>> >>
>> >> Raphael
>> >>
>> >> On 10 October 2016 at 12:08, Sean Violante <sean.viola...@gmail.com>
>> >> wrote:
>> >> > should be the sample weight function in fit
>> >> >
>> >> > http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
>> >> >
>> >> > On Mon, Oct 10, 2016 at 1:03 PM, Raphael C <drr...@gmail.com> wrote:
>> >> >>
>> >> >> I just noticed this about the glm package in R.
>> >> >> http://stats.stackexchange.com/a/26779/53128
>> >> >>
>> >> >> "
>> >> >> The glm function in R allows 3 ways to specify the formula for a
>> >> >> logistic regression model.
>> >> >>
>> >> >> The most common is that each row of the data frame represents a
>> >> >> single observation and the response variable is either 0 or 1 (or
>> >> >> a factor with 2 levels, or other variable with only 2 unique
>> >> >> values).
>> >> >>
>> >> >> Another option is to use a 2 column matrix as the response variable
>> >> >> with the first column being the counts of 'successes' and the second
>> >> >> column being the counts of 'failures'.
>> >> >>
>> >> >> You can also specify the response as a proportion between 0 and 1,
>> >> >> then specify another column as the 'weight' that gives the total
>> >> >> number that the proportion is from (so a response of 0.3 and a
>> >> >> weight of 10 is the same as 3 'successes' and 7 'failures')."
>> >> >>
>> >> >> Either of the last two options would do for me. Does scikit-learn
>> >> >> support either of these last two options?
>> >> >>
>> >> >> Raphael
>> >> >>
>> >> >> On 10 October 2016 at 11:55, Raphael C <drr...@gmail.com> wrote:
>> >> >> > I am trying to perform regression where my dependent variable is
>> >> >> > constrained to be between 0 and 1. This constraint comes from the
>> >> >> > fact that it represents a count proportion. That is counts in
>> >> >> > some category divided by a total count.
>> >> >> >
>> >> >> > In the literature it seems that one common way to tackle this is
>> >> >> > to use logistic regression. However, it appears that in scikit
>> >> >> > learn logistic regression is only available as a classifier
>> >> >> > (http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
>> >> >> > ) . Is that right?
>> >> >> >
>> >> >> > Is there another way to perform regression using scikit learn
>> >> >> > where the dependent variable is a count proportion?
>> >> >> >
>> >> >> > Thanks for any help.
>> >> >> >
>> >> >> > Raphael
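
For what it's worth, here is a minimal sketch of the two-rows-per-configuration approach Sean describes above. The data and all variable names (X, n_pos, n_neg, and so on) are made up for illustration; only LogisticRegression and the sample_weight argument of fit come from scikit-learn. Note that LogisticRegression applies an L2 penalty by default, so the result will not match an unpenalised R glm fit unless the regularisation is weakened (for example with a large C).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical aggregated data: one row per feature configuration,
# plus the number of positive and negative observations in each set.
X = np.array([[0.0, 1.0],
              [1.0, 0.0],
              [1.0, 1.0],
              [0.0, 0.0]])
n_pos = np.array([3, 600, 45, 1])
n_neg = np.array([7, 400, 5, 19])

# Pass in two rows per configuration: one labelled 1, one labelled 0.
X_expanded = np.vstack([X, X])
y_expanded = np.concatenate([np.ones(len(X), dtype=int),
                             np.zeros(len(X), dtype=int)])

# The weight of each row is the count for that configuration and class,
# so a configuration seen 1000 times counts for more than one seen 10 times.
w_expanded = np.concatenate([n_pos, n_neg]).astype(float)

clf = LogisticRegression()  # default L2 penalty; raise C to weaken it
clf.fit(X_expanded, y_expanded, sample_weight=w_expanded)

# Predicted proportion of positives for each original configuration.
predicted_proportion = clf.predict_proba(X)[:, 1]
print(predicted_proportion)
```

The predicted proportions come from predict_proba rather than predict, since predict would only return the 0/1 class label.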
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn