[ https://issues.apache.org/jira/browse/MADLIB-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429156#comment-16429156 ]
Nandish Jayaram commented on MADLIB-1225: ----------------------------------------- The variable importance computation involves randomization inherently. So it is hard to reproduce this error consistently. But, I did try to run the offending query multiple times, and it looks like with the latest code, the failure happens around 4.3% of the time (26 failures in 600 runs). I tried to see if this is a result of some changes that have gone into RF/DT post 1.13 by running the same experiment with 1.13 code base. The failure happens there too, at around 4.8% (29 failures in 600 runs). The failures happen even when we tune various hyper params. For instance, I tried increasing the value of the following hyper params in the offending query: num_permutations and num_trees. I also tried to increase the number of rows in the training data. But, failure happens sporadically in all cases. > Sporadic install check failures in random forest > ------------------------------------------------ > > Key: MADLIB-1225 > URL: https://issues.apache.org/jira/browse/MADLIB-1225 > Project: Apache MADlib > Issue Type: Bug > Components: Module: Random Forest > Reporter: Nandish Jayaram > Priority: Major > Fix For: v1.14 > > > Install check seems to fail for random forest sporadically. The failure > happens for the test which deals with variable importance in the install > check. > The error in the log when a failure happens is: > {code} > SELECT > assert(cat_var_importance[1] > con_var_importance[1], 'class should be > important!'), > assert(cat_var_importance[1] > cat_var_importance[2], 'class should be > important!') > FROM train_output_group; > psql:/tmp/madlib.WW_EyD/recursive_partitioning/test/random_forest.sql_in.tmp:158: > ERROR: Failed assertion: class should be important! (seg0 slice1 > 93e250c8-8924-4a80-5c68-1464f40b0395:25432 pid=91044) > {code} > The last RF install-check query that was run before the error was: > {code} > SELECT forest_train( > 'dt_golf', -- source table > 'train_output', -- output model table > 'id', -- id column > 'class::TEXT', -- response > 'class, windy, temperature', -- features > NULL, -- exclude columns > NULL, -- no grouping > 10, -- num of trees > 1, -- num of random features > TRUE, -- importance > 3, -- num_permutations > 10, -- max depth > 1, -- min split > 1, -- min bucket > 8, -- number of bins per continuous variable > 'max_surrogates=0', > FALSE > ); > SELECT * from train_output_summary; > -[ RECORD 1 ]---------+-------------------------------- > method | forest_train > is_classification | t > source_table | dt_golf > model_table | train_output > id_col_name | id > dependent_varname | class::TEXT > independent_varnames | class,windy,temperature > cat_features | class,windy > con_features | temperature > grouping_cols | > num_trees | 10 > num_random_features | 1 > max_tree_depth | 10 > min_split | 1 > min_bucket | 1 > num_splits | 8 > verbose | f > importance | t > num_permutations | 3 > num_all_groups | 1 > num_failed_groups | 0 > total_rows_processed | 16 > total_rows_skipped | 0 > dependent_var_levels | "Don't Play","Play" > dependent_var_type | text > independent_var_types | text, boolean, double precision > null_proxy | None > SELECT * from train_output_group; > -[ RECORD 1 ]------+--------------------------------------- > gid | 1 > success | t > cat_n_levels | \{2,2} > cat_levels_in_text | \{"Don't Play",Play,False,True} > oob_error | 0.20000000000000000000 > cat_var_importance | \{0.0244444444444445,0.025487012987013} > con_var_importance | \{0} > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)