But class weighting doesn't need recalibration, does it?

Shadi
On Friday, November 1, 2019, Makoto Yui <m...@apache.org> wrote:
> True. And, I think oversampling is better than class weighting for
> accuracy.
>
> Makoto
>
> On Fri, Nov 1, 2019 at 19:37 Shadi Mari <shadim...@gmail.com> wrote:
>
>> Thanks for the explanation, Makoto.
>>
>> A class-weight option, much like what scikit-learn provides, is what I am
>> referring to; it seems it is not yet implemented, so "random"
>> downsampling/oversampling is the only option I am left with.
>>
>> Best
>>
>> On Friday, November 1, 2019, Makoto Yui <yuin...@gmail.com> wrote:
>>
>>> class_weight is usually computed as follows:
>>>   class_weight_y = #samples / (#classes * count_of(y))
>>>
>>> It can be computed in SQL as follows:
>>>
>>> -- For binary classification (#classes = 2)
>>> select
>>>   count(1) / (2 * sum(if(label=0, 1, 0))) as neg_weight,
>>>   count(1) / (2 * sum(if(label=1, 1, 0))) as pos_weight
>>> from
>>>   train
>>>
>>> We would need to introduce a "-class_weight=[0.1,0.2]" or "-pos_weight=0.2
>>> -neg_weight=0.1" option; it is not implemented yet.
>>> https://github.com/scikit-learn/scikit-learn/blob/0a7adef0058ef28c7a146734f38161f7c7c581af/sklearn/linear_model/_sgd_fast.pyx#L719
>>>
>>> select
>>>   train_classifier(features, label, "-pos_weight=0.1 -neg_weight=0.2")
>>> from
>>>   train
>>>
>>> I doubt that a class-weighting scheme that modifies gradient updates
>>> works as expected. Oversampling is a more reasonable approach from the
>>> optimizer's point of view.
>>>
>>> Here is an example of applying oversampling in Hivemall:
>>>
>>> oversampling
>>> https://github.com/treasure-data/treasure-boxes/blob/master/machine-learning-box/ctr-prediction/queries/fm_train.sql#L10
>>> calibration
>>> https://github.com/treasure-data/treasure-boxes/blob/master/machine-learning-box/ctr-prediction/queries/fm_predict.sql#L24
>>> sampling rate computation
>>> https://github.com/treasure-data/treasure-boxes/blob/master/machine-learning-box/ctr-prediction/queries/downsampling_rate.sql
>>>
>>> Thanks,
>>> Makoto
>>>
>>> On Thu, Oct 31, 2019 at 21:17 Shadi Mari <shadim...@gmail.com> wrote:
>>> >
>>> > Hi Makoto,
>>> >
>>> > libffm and xLearn do not support class weights. However, ML.NET does!
>>> >
>>> > Class weighting is another approach to handling imbalanced datasets,
>>> > besides downsampling/oversampling; it is a sort of cost-sensitive
>>> > learning.
>>> >
>>> > I am not sure how SQL will help, given that the weights have to be
>>> > incorporated into the loss function during training.
>>> >
>>> > Thanks
>>> >
>>> > On Thu, Oct 31, 2019 at 1:05 PM Makoto Yui <yuin...@gmail.com> wrote:
>>> >>
>>> >> Hi Mari,
>>> >>
>>> >> Do you have a specific example that does class weighting?
>>> >>
>>> >> libffm does not have such a feature.
>>> >> https://github.com/ycjuan/libffm
>>> >>
>>> >> Scikit-learn's SGD [1] adjusts y as follows:
>>> >> "n_samples / (n_classes * np.bincount(y))"
>>> >> [1] https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html
>>> >>
>>> >> I think this can easily be achieved using SQL.
>>> >>
>>> >> Thanks,
>>> >> Makoto
>>> >>
>>> >> On Thu, Oct 31, 2019 at 19:38 Shadi Mari <shadim...@gmail.com> wrote:
>>> >> >
>>> >> > Hello,
>>> >> > I have an extremely imbalanced dataset and am trying to find support
>>> >> > for class weights in the Hivemall FFM classifier specifically, but I
>>> >> > couldn't find any mention of it in the docs.
>>> >> >
>>> >> > Is this feature supported? Otherwise I would have to go with negative
>>> >> > downsampling.
>>> >> >
>>> >> > Please advise.
>>> >> >
>>> >> > Thank you
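[Editor's note] The two computations discussed in this thread, the "balanced" class-weight heuristic (n_samples / (n_classes * count(y)), as in scikit-learn) and the probability correction applied after negative downsampling (the calibration step linked in fm_predict.sql), can be sketched in plain Python. The function names `balanced_class_weights` and `calibrate` are illustrative, not Hivemall or scikit-learn APIs, and the calibration formula assumes negatives were kept with probability r:

```python
from collections import Counter

def balanced_class_weights(labels):
    # weight_c = n_samples / (n_classes * count_c), mirroring the
    # "balanced" heuristic quoted from scikit-learn's SGDClassifier docs.
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def calibrate(p, neg_sampling_rate):
    # Undo negative downsampling at rate r on a predicted probability p:
    #   p' = p / (p + (1 - p) / r)
    # With r = 1 (no downsampling), p is returned unchanged.
    return p / (p + (1.0 - p) / neg_sampling_rate)

# Example: 90 negatives, 10 positives.
y = [0] * 90 + [1] * 10
w = balanced_class_weights(y)  # minority class gets the larger weight
```

With this toy data, class 1 gets weight 100 / (2 * 10) = 5.0 and class 0 gets 100 / (2 * 90), matching the SQL pos_weight/neg_weight query above.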