[
https://issues.apache.org/jira/browse/MADLIB-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Frank McQuillan closed MADLIB-1225.
-----------------------------------
> Sporadic install check failures in random forest
> ------------------------------------------------
>
> Key: MADLIB-1225
> URL: https://issues.apache.org/jira/browse/MADLIB-1225
> Project: Apache MADlib
> Issue Type: Bug
> Components: Module: Random Forest
> Reporter: Nandish Jayaram
> Priority: Major
> Fix For: v1.14
>
>
> Install check seems to fail for random forest sporadically. The failure
> happens for the test which deals with variable importance in the install
> check.
> The error in the log when a failure happens is:
> {code}
> SELECT
> assert(cat_var_importance[1] > con_var_importance[1], 'class should be
> important!'),
> assert(cat_var_importance[1] > cat_var_importance[2], 'class should be
> important!')
> FROM train_output_group;
> psql:/tmp/madlib.WW_EyD/recursive_partitioning/test/random_forest.sql_in.tmp:158:
> ERROR: Failed assertion: class should be important! (seg0 slice1
> 93e250c8-8924-4a80-5c68-1464f40b0395:25432 pid=91044)
> {code}
> The last RF install-check query that was run before the error was:
> {code}
> SELECT forest_train(
> 'dt_golf', -- source table
> 'train_output', -- output model table
> 'id', -- id column
> 'class::TEXT', -- response
> 'class, windy, temperature', -- features
> NULL, -- exclude columns
> NULL, -- no grouping
> 10, -- num of trees
> 1, -- num of random features
> TRUE, -- importance
> 3, -- num_permutations
> 10, -- max depth
> 1, -- min split
> 1, -- min bucket
> 8, -- number of bins per continuous variable
> 'max_surrogates=0',
> FALSE
> );
> SELECT * from train_output_summary;
> -[ RECORD 1 ]---------+--------------------------------
> method | forest_train
> is_classification | t
> source_table | dt_golf
> model_table | train_output
> id_col_name | id
> dependent_varname | class::TEXT
> independent_varnames | class,windy,temperature
> cat_features | class,windy
> con_features | temperature
> grouping_cols |
> num_trees | 10
> num_random_features | 1
> max_tree_depth | 10
> min_split | 1
> min_bucket | 1
> num_splits | 8
> verbose | f
> importance | t
> num_permutations | 3
> num_all_groups | 1
> num_failed_groups | 0
> total_rows_processed | 16
> total_rows_skipped | 0
> dependent_var_levels | "Don't Play","Play"
> dependent_var_type | text
> independent_var_types | text, boolean, double precision
> null_proxy | None
> SELECT * from train_output_group;
> -[ RECORD 1 ]------+---------------------------------------
> gid | 1
> success | t
> cat_n_levels | \{2,2}
> cat_levels_in_text | \{"Don't Play",Play,False,True}
> oob_error | 0.20000000000000000000
> cat_var_importance | \{0.0244444444444445,0.025487012987013}
> con_var_importance | \{0}
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)