[ https://issues.apache.org/jira/browse/MADLIB-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537837#comment-16537837 ]

ASF GitHub Bot commented on MADLIB-1205:
----------------------------------------

GitHub user orhankislal opened a pull request:

    https://github.com/apache/madlib/pull/289

    RF: Add impurity variable importance

    JIRA: MADLIB-1205
    
    This commit makes the following changes:
    - Add impurity variable importance for random forests.
    - Rename current cat_var_importance and con_var_importance measurements to
    oob_cat_var_importance and oob_con_var_importance.
    
    New impurity measurement is provided as impurity_var_importance, and
    supports grouping. It combines the importance values for both categorical
    and continuous features into a single array.
    
    Co-authored-by: Rahul Iyer <[email protected]>
    Co-authored-by: Jingyi Mei <[email protected]>
    Co-authored-by: Arvind Sridhar <[email protected]>
    Co-authored-by: Nandish Jayaram <[email protected]>

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/madlib/madlib rf_gini_importance

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/madlib/pull/289.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #289
    
----
commit 622d46a85f4264fdc94bd41dc66a23f1aa2c3ed6
Author: Rahul Iyer <riyer@...>
Date:   2018-07-10T00:34:33Z

    RF: Add impurity variable importance
    
    JIRA: MADLIB-1205
    
    This commit makes the following changes:
    - Add impurity variable importance for random forests.
    - Rename current cat_var_importance and con_var_importance measurements to
    oob_cat_var_importance and oob_con_var_importance.
    
    New impurity measurement is provided as impurity_var_importance, and
    supports grouping. It combines the importance values for both categorical
    and continuous features into a single array.
    
    Co-authored-by: Rahul Iyer <[email protected]>
    Co-authored-by: Jingyi Mei <[email protected]>
    Co-authored-by: Arvind Sridhar <[email protected]>
    Co-authored-by: Nandish Jayaram <[email protected]>

----


> Add gini importance to DT
> -------------------------
>
>                 Key: MADLIB-1205
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1205
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Decision Tree, Module: Random Forest
>            Reporter: Rahul Iyer
>            Assignee: Rahul Iyer
>            Priority: Major
>             Fix For: v1.15
>
>
> From the Breiman resource that we use for random forest:
> {quote}Gini importance
> {quote}
> {quote}Every time a split of a node is made on variable m the gini impurity 
> criterion for the two descendent nodes is less than the parent node. Adding 
> up the gini decreases for each individual variable over all trees in the 
> forest gives a fast variable importance that is often very consistent with 
> the permutation importance measure.
> {quote}
> We can add a similar measure, called {{impurity_variable_importance}}, to 
> our DT code.
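
For readers unfamiliar with the measure quoted above, here is a minimal
sketch of the idea in Python. It is not MADlib's implementation. The split
representation (feature index plus per-class sample counts for the parent
and its two children) and the function names are hypothetical, chosen only
for illustration. Each decrease is weighted by the children's share of the
parent's samples, which is one common convention and an assumption here.

```python
def gini(counts):
    """Gini impurity of a node, given its per-class sample counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def split_decrease(parent, left, right):
    """Impurity decrease of one split.

    Each child's impurity is weighted by its share of the parent's samples
    (an assumed convention, not taken from the MADlib source).
    """
    n = sum(parent)
    if n == 0:
        return 0.0
    return (gini(parent)
            - (sum(left) / n) * gini(left)
            - (sum(right) / n) * gini(right))


def impurity_importance(forest, n_features):
    """Add up the impurity decreases per feature over all trees.

    `forest` is a list of trees; each tree is a list of splits of the
    hypothetical form (feature_index, parent_counts, left_counts,
    right_counts).
    """
    importance = [0.0] * n_features
    for tree in forest:
        for feature, parent, left, right in tree:
            importance[feature] += split_decrease(parent, left, right)
    return importance


# Two toy trees over two features and two classes.
forest = [
    [(0, [5, 5], [5, 0], [0, 5])],   # feature 0 separates the classes
    [(1, [4, 4], [2, 2], [2, 2]),    # feature 1 gives no improvement
     (0, [4, 4], [4, 0], [0, 4])],   # feature 0 separates the classes
]
importance = impurity_importance(forest, n_features=2)
```

In this toy forest every split on feature 0 produces pure children, so it
accumulates all of the importance, while the uninformative split on
feature 1 contributes nothing.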



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
