[
https://issues.apache.org/jira/browse/MADLIB-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rahul Iyer updated MADLIB-1246:
-------------------------------
Description:
From the Breiman resource that we use for random forest:
{quote}Gini importance
{quote}
{quote}Every time a split of a node is made on variable m the gini impurity
criterion for the two descendent nodes is less than the parent node. Adding up
the gini decreases for each individual variable over all trees in the forest
gives a fast variable importance that is often very consistent with the
permutation importance measure.
{quote}
We can add a similar measure to our RF, called
{{impurity_variable_importance}}, defined as the average of the per-tree
{{impurity_variable_importance}} values across all trees in the forest.
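A minimal Python sketch of the averaging described above, for illustration only. It is not MADlib code: the function names and the per-split tuple layout are hypothetical, and weighting each Gini decrease by the number of rows at the node is an assumption (the quoted text only says the decreases are added up).

```python
from collections import defaultdict


def gini(counts):
    """Gini impurity of a node given its per-class row counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def tree_impurity_importance(splits):
    """Sum the Gini decreases per variable for a single tree.

    `splits` is a hypothetical representation: a list of
    (variable, parent_counts, left_counts, right_counts) tuples,
    one per internal node of the tree.
    """
    importance = defaultdict(float)
    for variable, parent, left, right in splits:
        # Assumption: each impurity is weighted by the node's row count.
        decrease = (
            sum(parent) * gini(parent)
            - sum(left) * gini(left)
            - sum(right) * gini(right)
        )
        importance[variable] += decrease
    return dict(importance)


def forest_impurity_importance(trees):
    """Average the per-tree importances across all trees in the forest."""
    if not trees:
        return {}
    totals = defaultdict(float)
    for splits in trees:
        for variable, value in tree_impurity_importance(splits).items():
            totals[variable] += value
    # A variable that a tree never splits on contributes 0 for that tree.
    return {variable: value / len(trees) for variable, value in totals.items()}


if __name__ == "__main__":
    forest = [
        [("x1", [5, 5], [5, 0], [0, 5])],  # tree 1: one pure split on x1
        [("x2", [5, 5], [4, 1], [1, 4])],  # tree 2: one weaker split on x2
    ]
    print(forest_impurity_importance(forest))
```

In this sketch a variable that is absent from a tree counts as zero for that tree, so the average is always taken over the full number of trees.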
was:
From the Breiman resource that we use for random forest:
{quote}Gini importance
{quote}
{quote}Every time a split of a node is made on variable m the gini impurity
criterion for the two descendent nodes is less than the parent node. Adding up
the gini decreases for each individual variable over all trees in the forest
gives a fast variable importance that is often very consistent with the
permutation importance measure.
{quote}
We can add a similar measure to our DT code, called
{{impurity_variable_importance}}.
> Add impurity variable importance to RF
> --------------------------------------
>
> Key: MADLIB-1246
> URL: https://issues.apache.org/jira/browse/MADLIB-1246
> Project: Apache MADlib
> Issue Type: New Feature
> Components: Module: Random Forest
> Reporter: Rahul Iyer
> Assignee: Rahul Iyer
> Priority: Major
> Fix For: v1.15
>
>
> From the Breiman resource that we use for random forest:
> {quote}Gini importance
> {quote}
> {quote}Every time a split of a node is made on variable m the gini impurity
> criterion for the two descendent nodes is less than the parent node. Adding
> up the gini decreases for each individual variable over all trees in the
> forest gives a fast variable importance that is often very consistent with
> the permutation importance measure.
> {quote}
> We can add a similar measure to our RF, called
> {{impurity_variable_importance}}, defined as the average of the per-tree
> {{impurity_variable_importance}} values across all trees in the forest.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)