But class weighting doesn't need recalibration, does it?

Shadi
On Friday, November 1, 2019, Makoto Yui <m...@apache.org> wrote:
> True. And, I think oversampling is better than class weighting for
> accuracy.
>
> Makoto
>
> On Fri, Nov 1, 2019 at 19:37 Shadi Mari <shadim...@gmail.com> wrote:
>
>> Thanks for the explanation, Makoto.
>>
>> A class-weight option, much like what scikit-learn provides, is what I am
>> referring to; it seems it is not yet implemented, so "random"
>> downsampling/oversampling is the only option I am left with.
>>
>> Best
>>
>> On Friday, November 1, 2019, Makoto Yui <yuin...@gmail.com> wrote:
>>
>>> class_weight is usually computed as follows:
>>>   class_weight_y = #samples / (#classes * count_of(y))
>>>
>>> It can be computed in SQL as follows:
>>>
>>> -- For binary classification (#classes = 2)
>>> select
>>>   count(1) / (2 * sum(if(label=0, 1, 0))) as neg_weight,
>>>   count(1) / (2 * sum(if(label=1, 1, 0))) as pos_weight
>>> from
>>>   train
>>>
>>> We would need to introduce a "-class_weight=[0.1,0.2]" or "-pos_weight=0.2
>>> -neg_weight=0.1" option; it is not implemented yet.
>>> https://github.com/scikit-learn/scikit-learn/blob/0a7adef0058ef28c7a146734f38161f7c7c581af/sklearn/linear_model/_sgd_fast.pyx#L719
>>>
>>> select
>>>   train_classifier(features, label, "-pos_weight=0.1 -neg_weight=0.2")
>>> from
>>>   train
>>>
>>> I doubt that a class-weighting scheme that modifies gradient updates
>>> works as expected. Oversampling is a more reasonable approach from the
>>> optimizer's point of view.
>>>
>>> Here is an example of applying oversampling in Hivemall:
>>>
>>> oversampling
>>> https://github.com/treasure-data/treasure-boxes/blob/master/machine-learning-box/ctr-prediction/queries/fm_train.sql#L10
>>> calibration
>>> https://github.com/treasure-data/treasure-boxes/blob/master/machine-learning-box/ctr-prediction/queries/fm_predict.sql#L24
>>> sampling rate computation
>>> https://github.com/treasure-data/treasure-boxes/blob/master/machine-learning-box/ctr-prediction/queries/downsampling_rate.sql
>>>
>>> Thanks,
>>> Makoto
>>>
>>> On Thu, Oct 31, 2019 at 21:17 Shadi Mari <shadim...@gmail.com> wrote:
>>> >
>>> > Hi Makoto,
>>> >
>>> > libffm and xLearn do not support class weights. However, ML.NET does!
>>> >
>>> > Class weighting is another approach to handling imbalanced datasets,
>>> > besides downsampling/oversampling; it is a sort of cost-sensitive
>>> > learning.
>>> >
>>> > I am not sure how SQL will help, given that the weights have to be
>>> > incorporated into the loss function during training.
>>> >
>>> > Thanks
>>> >
>>> > On Thu, Oct 31, 2019 at 1:05 PM Makoto Yui <yuin...@gmail.com> wrote:
>>> >>
>>> >> Hi Mari,
>>> >>
>>> >> Do you have a specific example that does class weighting?
>>> >>
>>> >> libffm does not have such a feature.
>>> >> https://github.com/ycjuan/libffm
>>> >>
>>> >> Scikit-learn's SGD [1] adjusts y as follows:
>>> >> "n_samples / (n_classes * np.bincount(y))"
>>> >> [1] https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html
>>> >>
>>> >> I think this can easily be achieved using SQL.
>>> >>
>>> >> Thanks,
>>> >> Makoto
>>> >>
>>> >> On Thu, Oct 31, 2019 at 19:38 Shadi Mari <shadim...@gmail.com> wrote:
>>> >> >
>>> >> > Hello,
>>> >> > I have an extremely imbalanced dataset and am trying to find support
>>> >> > for class weights in the Hivemall FFM classifier specifically, but I
>>> >> > couldn't find any mention of it in the docs.
>>> >> >
>>> >> > Is this feature supported? Otherwise I would have to go with negative
>>> >> > downsampling.
>>> >> >
>>> >> > Please advise.
>>> >> >
>>> >> > Thank you
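[Editor's note] The two computations discussed in this thread, the "balanced" class-weight heuristic (n_samples / (n_classes * count(y)), as in scikit-learn) and the probability correction applied after negative downsampling (the calibration step linked in fm_predict.sql), can be sketched in plain Python. The function names `balanced_class_weights` and `calibrate` are illustrative, not Hivemall or scikit-learn APIs, and the calibration formula assumes negatives were kept with probability r:

```python
from collections import Counter

def balanced_class_weights(labels):
    # weight_c = n_samples / (n_classes * count_c), mirroring the
    # "balanced" heuristic quoted from scikit-learn's SGDClassifier docs.
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def calibrate(p, neg_sampling_rate):
    # Undo negative downsampling at rate r on a predicted probability p:
    #   p' = p / (p + (1 - p) / r)
    # With r = 1 (no downsampling), p is returned unchanged.
    return p / (p + (1.0 - p) / neg_sampling_rate)

# Example: 90 negatives, 10 positives.
y = [0] * 90 + [1] * 10
w = balanced_class_weights(y)  # minority class gets the larger weight
```

With this toy data, class 1 gets weight 100 / (2 * 10) = 5.0 and class 0 gets 100 / (2 * 90), matching the SQL pos_weight/neg_weight query above.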