No (but please check!): the sample weights should be the counts for the respective label (0/1).
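A minimal sketch of what I mean, which you should verify on your own data. The toy arrays (`X`, `successes`, `failures`) are made up for illustration, and the large `C` is my assumption for making the default L2 penalty negligible so the fit is closer to an unpenalised glm. Each input row is entered twice, once with label 1 and once with label 0, weighted by the count for that label:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one feature, with success/failure counts per row.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
successes = np.array([1, 3, 6, 9])
failures = np.array([9, 7, 4, 1])

# Each input row appears twice: once labelled 1, once labelled 0.
X_long = np.vstack([X, X])
y_long = np.concatenate([np.ones(len(X), dtype=int),
                         np.zeros(len(X), dtype=int)])
# The weight of each copy is the count for that label.
w_long = np.concatenate([successes, failures]).astype(float)

# Assumption: a large C makes the default L2 penalty negligible.
weighted = LogisticRegression(C=1e6, max_iter=1000)
weighted.fit(X_long, y_long, sample_weight=w_long)

# Same model built by repeating each row as many times as its count.
counts = w_long.astype(int)
X_rep = np.repeat(X_long, counts, axis=0)
y_rep = np.repeat(y_long, counts)
repeated = LogisticRegression(C=1e6, max_iter=1000).fit(X_rep, y_rep)

# Predicted proportion of successes for each original row.
predicted_proportion = weighted.predict_proba(X)[:, 1]
```

The weighted fit and the repeated-rows fit should give essentially the same coefficients, and `predict_proba` then returns the fitted proportion rather than a hard class label.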
[I am actually puzzled by the glm help file: a proportion on its own loses how often one input data 'row' was present relative to the others, though you could recover this by repeating the row 'n' times.]

On Mon, Oct 10, 2016 at 1:15 PM, Raphael C <drr...@gmail.com> wrote:

> How do I use sample_weight for my use case?
>
> In my case is "y" an array of 0s and 1s and sample_weight then an
> array real numbers between 0 and 1 where I should make sure to set
> sample_weight[i]= 0 when y[i] = 0?
>
> Raphael
>
> On 10 October 2016 at 12:08, Sean Violante <sean.viola...@gmail.com>
> wrote:
> > should be the sample weight function in fit
> >
> > http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
> >
> > On Mon, Oct 10, 2016 at 1:03 PM, Raphael C <drr...@gmail.com> wrote:
> >>
> >> I just noticed this about the glm package in R.
> >> http://stats.stackexchange.com/a/26779/53128
> >>
> >> "
> >> The glm function in R allows 3 ways to specify the formula for a
> >> logistic regression model.
> >>
> >> The most common is that each row of the data frame represents a single
> >> observation and the response variable is either 0 or 1 (or a factor
> >> with 2 levels, or other variable with only 2 unique values).
> >>
> >> Another option is to use a 2 column matrix as the response variable
> >> with the first column being the counts of 'successes' and the second
> >> column being the counts of 'failures'.
> >>
> >> You can also specify the response as a proportion between 0 and 1,
> >> then specify another column as the 'weight' that gives the total
> >> number that the proportion is from (so a response of 0.3 and a weight
> >> of 10 is the same as 3 'successes' and 7 'failures')."
> >>
> >> Either of the last two options would do for me. Does scikit-learn
> >> support either of these last two options?
> >>
> >> Raphael
> >>
> >> On 10 October 2016 at 11:55, Raphael C <drr...@gmail.com> wrote:
> >> > I am trying to perform regression where my dependent variable is
> >> > constrained to be between 0 and 1. This constraint comes from the fact
> >> > that it represents a count proportion. That is counts in some category
> >> > divided by a total count.
> >> >
> >> > In the literature it seems that one common way to tackle this is to
> >> > use logistic regression. However, it appears that in scikit learn
> >> > logistic regression is only available as a classifier
> >> >
> >> > (http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
> >> > ) . Is that right?
> >> >
> >> > Is there another way to perform regression using scikit learn where
> >> > the dependent variable is a count proportion?
> >> >
> >> > Thanks for any help.
> >> >
> >> > Raphael
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn