[ https://issues.apache.org/jira/browse/HIVEMALL-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Makoto Yui updated HIVEMALL-284:
--------------------------------
Description:
[https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression.fit]
[https://scikit-learn.org/dev/glossary.html#term-class-weight]
Introduce a "-class_weight=[0.1,0.2]" option, or "-pos_weight=0.2 -neg_weight=0.1" options.
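A minimal sketch, in Python for illustration only, of how the two proposed option forms could be parsed into a per-label weight map. The helper name `parse_class_weight_options` is hypothetical. It assumes that index i of `-class_weight=[...]` is the weight of label i (so `[0.1,0.2]` is equivalent to `-neg_weight=0.1 -pos_weight=0.2`) and that a missing weight defaults to 1.0. Hivemall's actual option handling may differ.

```python
import re
from typing import Dict

# Matches a plain or scientific-notation number such as 0.2, 2.0 or 1.0E-4.
_NUM = r"([0-9.eE+-]+)"


def parse_class_weight_options(options: str) -> Dict[int, float]:
    """Hypothetical parser for the proposed class-weight options.

    Accepts either "-class_weight=[w0,w1,...]" (weight of label i at index i)
    or "-pos_weight=P -neg_weight=N" (binary labels 1 and 0).
    Assumption: a label without an explicit weight gets 1.0.
    """
    m = re.search(r"-class_weight=\[([^\]]*)\]", options)
    if m:
        # List form: position in the list is the class label.
        return {label: float(w) for label, w in enumerate(m.group(1).split(","))}
    pos = re.search(r"-pos_weight=" + _NUM, options)
    neg = re.search(r"-neg_weight=" + _NUM, options)
    return {
        1: float(pos.group(1)) if pos else 1.0,
        0: float(neg.group(1)) if neg else 1.0,
    }
```

Both spellings from the proposal should yield the same map, `{0: 0.1, 1: 0.2}`.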
[https://github.com/scikit-learn/scikit-learn/blob/0a7adef0058ef28c7a146734f38161f7c7c581af/sklearn/linear_model/_sgd_fast.pyx#L719]
With class_weight="balanced", scikit-learn computes the weight of each class y as follows:
> class_weight_y = #samples / (#classes * count_of(y))
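The formula can be checked with a short Python sketch. The function name `balanced_class_weight` is ours, not scikit-learn's (scikit-learn exposes this heuristic through `compute_class_weight` with `class_weight="balanced"`), and the sketch assumes #classes is the number of distinct labels present in the data.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def balanced_class_weight(labels: Iterable[Hashable]) -> Dict[Hashable, float]:
    """class_weight_y = #samples / (#classes * count_of(y)) for every label y seen."""
    counts = Counter(labels)
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {y: n_samples / (n_classes * c) for y, c in counts.items()}


weights = balanced_class_weight([0, 0, 0, 1])
print(weights)  # {0: 0.6666666666666666, 1: 2.0}
```

Under this heuristic every class ends up with the same total weight, #samples / #classes (here 3 * 2/3 = 1 * 2.0 = 2), which is what rebalances a skewed label distribution.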
In SQL, the same weights can be computed as follows:
{code:sql}
-- For binary classification (#classes = 2)
WITH weights as (
  select
    count(1) / (2 * sum(if(label=0, 1, 0))) as neg_weight,
    count(1) / (2 * sum(if(label=1, 1, 0))) as pos_weight
  from
    train
)
select
  train_classifier(features, label, concat('-pos_weight=', pos_weight, ' -neg_weight=', neg_weight))
from
  train l
  cross join weights r
{code}
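To show what the computed weights would do inside a learner, here is a minimal Python sketch of plain SGD on the logistic loss in which each update is multiplied by the weight of the example's class; as far as we can tell, this is how the linked `_sgd_fast.pyx` applies `class_weight` (together with the sample weight). It is not GeneralLearnerBase code: labels are assumed to be 0/1, the model is a plain dict of feature weights, and `train_weighted_logreg`, `eta`, and `epochs` are illustrative names.

```python
import math
from typing import Dict, List, Tuple

# A sparse example: {feature_name: value}
Features = Dict[str, float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_weighted_logreg(
    data: List[Tuple[Features, int]],
    pos_weight: float = 1.0,
    neg_weight: float = 1.0,
    eta: float = 0.1,
    epochs: int = 100,
) -> Dict[str, float]:
    """Plain SGD on the logistic loss; each update is scaled by its label's class weight."""
    model: Dict[str, float] = {}
    for _ in range(epochs):
        for features, label in data:
            z = sum(model.get(f, 0.0) * v for f, v in features.items())
            # Derivative of the logistic loss w.r.t. z for labels in {0, 1}.
            gradient = sigmoid(z) - label
            # Hypothetical mapping of -pos_weight / -neg_weight onto labels 1 / 0.
            class_weight = pos_weight if label == 1 else neg_weight
            for f, v in features.items():
                model[f] = model.get(f, 0.0) - eta * class_weight * gradient * v
    return model
```

For example, with one feature vector labeled 0 three times and 1 once, the unweighted model's predicted probability should settle near 0.25, whereas `pos_weight=2.0, neg_weight=2/3` (the balanced weights from the formula) should move it toward 0.5, since both classes then carry the same total weight.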
> Support class weighting in GeneralLearnerBase
> ---------------------------------------------
>
> Key: HIVEMALL-284
> URL: https://issues.apache.org/jira/browse/HIVEMALL-284
> Project: Hivemall
> Issue Type: New Feature
> Reporter: Makoto Yui
> Priority: Minor
> Fix For: 0.7.0
--
This message was sent by Atlassian Jira
(v8.3.4#803005)