Hi Mathieu,

Thanks for the tip. I now recall seeing that trick in the past.

Would it be useful for me to implement this behavior explicitly, instead of
falling back to the dense Cholesky solver, and open a PR? Or is this better
fixed at a lower level (liblinear?)?

Thanks,
Cory

On Tue, Apr 1, 2014 at 10:03 PM, Mathieu Blondel <[email protected]> wrote:

> Hi Cory,
>
> The lack of sample_weight support in sparse solvers is a known issue, see
> https://github.com/scikit-learn/scikit-learn/issues/1190
>
> In the meantime, I see two solutions. As described in the above issue, one
> solution is to multiply each x_i and y_i in your training set by the square
> root of its sample weight. This will be exactly equivalent to using sample
> weights and will allow you to use fast sparse solvers like "sparse_cg" or
> "lsqr". The second solution is to use SGDRegressor(loss="squared"), which
> should readily support sample_weight.
>
> HTH,
> Mathieu
>
> On Wed, Apr 2, 2014 at 9:18 AM, Cory Dolphin <[email protected]> wrote:
>
>> Hello,
>>
>> I am trying to perform ridge regression on a relatively large data set:
>> 70 million examples and 24 million very sparse features.
>>
>> E.g. I have created an X matrix with dimensions (73725855, 24652292), an
>> associated y vector with dimensions (73725855,), and a sample_weights
>> vector with identical dimensions (73725855,).
>>
>> In this case, the y vector is a rating, and the sample_weights describe
>> how many times a given rating occurred.
>>
>> I need to use one of the sparse solvers, as the data set does not fit in
>> memory as a dense matrix; however, it seems that none of the sparse
>> solvers accept a sample_weights vector.
>>
>> Does anyone have experience with weighted ridge regression on large
>> sparse matrices?
>>
>> I am new to the world of machine learning, so please forgive me for any
>> vocabulary mistakes!
>>
>> Thanks,
>> Cory
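
For reference, the square-root trick Mathieu describes could be sketched roughly as follows. This is a minimal illustration, not scikit-learn code: the helper names `scale_by_sqrt_weights` and `weighted_ridge_lsqr` are hypothetical, the toy data is random, and the model is assumed to have no intercept (with an intercept, the centering step would also need to be weighted). It uses `scipy.sparse.linalg.lsqr` directly, with `damp = sqrt(alpha)` supplying the ridge penalty.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr


def scale_by_sqrt_weights(X, y, sample_weight):
    """Multiply row i of X and entry i of y by sqrt(sample_weight[i]).

    Hypothetical helper: after this scaling, an unweighted least-squares
    fit on the result minimizes sum_i w_i * (y_i - x_i . beta)^2.
    """
    sqrt_w = np.sqrt(np.asarray(sample_weight, dtype=float))
    # A sparse diagonal matrix scales the rows without densifying X.
    X_scaled = sp.diags(sqrt_w) @ sp.csr_matrix(X)
    y_scaled = sqrt_w * np.asarray(y, dtype=float)
    return X_scaled, y_scaled


def weighted_ridge_lsqr(X, y, sample_weight, alpha=1.0):
    """Weighted ridge coefficients (no intercept) via a sparse solver."""
    X_s, y_s = scale_by_sqrt_weights(X, y, sample_weight)
    # lsqr minimizes ||X_s b - y_s||^2 + damp^2 ||b||^2, so damp = sqrt(alpha).
    result = lsqr(X_s, y_s, damp=np.sqrt(alpha), atol=1e-12, btol=1e-12)
    return result[0]


# Toy data standing in for the real (much larger) problem.
rng = np.random.default_rng(0)
X = sp.random(200, 20, density=0.1, format="csr", random_state=0)
y = rng.normal(size=200)
w = rng.integers(1, 5, size=200)  # e.g. how many times each rating occurred

coef = weighted_ridge_lsqr(X, y, w, alpha=1.0)
```

The scaled `X_s` and `y_s` could presumably also be passed to `Ridge(fit_intercept=False, solver="sparse_cg")` or `solver="lsqr"` without any `sample_weight` argument, which is the usage suggested in the quoted reply.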
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
