Hi Mathieu,

Thanks for the tip. I now recall seeing that trick in the past.
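
For the archive, here is a minimal sketch of the trick as I understand it:
scale each row of X and each entry of y by the square root of its sample
weight, then fit an ordinary ridge model with a sparse solver. It assumes
NumPy, SciPy and scikit-learn are installed. The helper name
rescale_by_sqrt_weight and the toy data are my own and not part of
scikit-learn. I also turn off the intercept, since as far as I can tell the
equivalence only holds as-is when no intercept is fitted.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.linear_model import Ridge


def rescale_by_sqrt_weight(X, y, sample_weight):
    """Return (X', y') with row i of X and y[i] multiplied by sqrt(sample_weight[i]).

    Hypothetical helper for illustration; not a scikit-learn function.
    """
    sqrt_w = np.sqrt(np.asarray(sample_weight, dtype=float))
    # Left-multiplying by a sparse diagonal matrix scales the rows and keeps X sparse.
    X_scaled = sp.diags(sqrt_w) @ sp.csr_matrix(X)
    y_scaled = sqrt_w * np.asarray(y, dtype=float)
    return X_scaled, y_scaled


# Toy data standing in for the real (73725855, 24652292) matrix.
rng = np.random.RandomState(0)
X = sp.random(200, 5, density=0.3, format="csr", random_state=rng)
y = rng.randn(200)
w = rng.randint(1, 10, size=200)  # e.g. how many times each rating occurred

alpha = 1.0
X_s, y_s = rescale_by_sqrt_weight(X, y, w)

# Assumption: no intercept, so the rescaled problem matches weighted ridge directly.
model = Ridge(alpha=alpha, fit_intercept=False, solver="sparse_cg", tol=1e-10)
model.fit(X_s, y_s)

# Reference: closed-form weighted ridge, feasible only because the toy problem is tiny.
Xd = X.toarray()
A = Xd.T @ (w[:, None] * Xd) + alpha * np.eye(Xd.shape[1])
coef_ref = np.linalg.solve(A, Xd.T @ (w * y))
max_abs_diff = np.max(np.abs(model.coef_ - coef_ref))
```

On this toy problem the coefficients from the rescaled fit should agree with
the closed-form weighted solution up to solver tolerance (max_abs_diff). The
scaling is a single sparse product, so it should not densify X.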

Would it be useful for me to implement this behavior explicitly, instead of
falling back to the dense Cholesky solver, and open a PR? Or would this be
better fixed at a lower level (liblinear?)?

Thanks,
Cory


On Tue, Apr 1, 2014 at 10:03 PM, Mathieu Blondel <[email protected]> wrote:

> Hi Cory,
>
> The lack of sample_weight support in sparse solvers is a known issue, see
> https://github.com/scikit-learn/scikit-learn/issues/1190
>
> In the meantime, I see two solutions. As described in the above issue, one
> solution is to multiply each x_i and y_i in your training set by the square
> root of its sample weight. This will be exactly equivalent to using sample
> weights and will allow you to use fast sparse solvers like "sparse_cg" or
> "lsqr". The second solution is to use SGDRegressor(loss="squared_loss"),
> which should readily support sample_weight.
>
> HTH,
> Mathieu
>
>
> On Wed, Apr 2, 2014 at 9:18 AM, Cory Dolphin <[email protected]> wrote:
>
>> Hello,
>>
>> I am trying to perform ridge regression on a relatively large data set:
>> about 70 million examples and 24 million very sparse features.
>>
>> Specifically, I have created an X matrix with dimensions
>> (73725855, 24652292), an associated y vector with dimensions (73725855,),
>> and a sample_weights vector with the same dimensions, (73725855,).
>>
>> In this case, the y vector is a rating, and the sample_weights describe
>> how many times a given rating occurred.
>>
>> I need to use one of the sparse solvers, as the data set does not fit in
>> memory as a dense matrix. However, it seems that none of the sparse
>> solvers accept a sample_weights vector.
>>
>> Does anyone have experience with weighted ridge regression on large
>> sparse matrices?
>>
>>
>> I am new to the world of machine learning, so please forgive me for any
>> vocabulary mistakes!
>>
>> Thanks,
>> Cory
>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>>
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>
>
>
