[ https://issues.apache.org/jira/browse/MADLIB-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Frank McQuillan updated MADLIB-1288:
------------------------------------
    Fix Version/s: (was: v2.0)
                   v1.16

> Set max itemset size to 10 by default in assoc rules
> ----------------------------------------------------
>
>                 Key: MADLIB-1288
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1288
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Module: Association Rules
>            Reporter: Frank McQuillan
>            Priority: Major
>             Fix For: v1.16
>
>
> Story
> As a data scientist,
> I want the maximum itemset size to default to 10,
> so that assoc rules does not run for a long time.
>
> Details
> We have had some complaints about how long assoc rules runs. This could be
> due to the implementation or to wrong parameter settings by the user, but
> it may also be due to a combinatorial explosion in the number of generated
> rules.
> In the R `arules` package, the `maxlen` parameter defaults to 10:
> https://cran.r-project.org/web/packages/arules/arules.pdf
> see page 10 "apriori - mining associations with apriori"
> "If the minimum support is chosen too low for the dataset,
> then the algorithm will try to create an extremely large set of
> itemsets/rules. This will result in
> very long run time and eventually the process will run out of memory. To
> prevent this, the default
> maximal length of itemsets/rules is restricted to 10 items (via the parameter
> element `maxlen=10`)..."
> `maxlen` is the same as the madlib param `max_itemset_size`:
> http://madlib.apache.org/docs/latest/group__grp__assoc__rules.html
>
> Interface
> Stays the same. The allowed values for `max_itemset_size` are:
> * any integer 2 or greater
> * if not specified, set to 10 (default)
> * `ALL` is also accepted as an input, which means generate itemsets of all
> sizes; this is the current behavior in 1.15.1
>
> Acceptance
> 1) Set the `max_itemset_size` parameter to 100 and run on a data set that
> creates rules with more than 10 items.
> 2) Set `max_itemset_size` to `NULL` and re-run; confirm that the default
> maximum rule size of 10 is respected.
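
The interface rules and the combinatorial-explosion concern above can be sketched in a few lines of Python. This is only an illustration, not MADlib's implementation: the helper names `resolve_max_itemset_size` and `candidate_itemset_bound` are hypothetical, and the count is a worst-case bound that ignores pruning by minimum support.

```python
from math import comb

DEFAULT_MAX_ITEMSET_SIZE = 10  # default proposed in this issue


def resolve_max_itemset_size(value=None):
    """Hypothetical helper: map a user-supplied value to an effective limit.

    Returns an int >= 2, or None to mean "no limit" (the `ALL` setting).
    """
    if value is None:  # not specified / NULL -> use the default
        return DEFAULT_MAX_ITEMSET_SIZE
    if isinstance(value, str):
        if value.strip().upper() == "ALL":
            return None  # generate itemsets of all sizes
        raise ValueError("max_itemset_size must be an integer >= 2 or 'ALL'")
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(value, bool) or not isinstance(value, int) or value < 2:
        raise ValueError("max_itemset_size must be an integer >= 2 or 'ALL'")
    return value


def candidate_itemset_bound(num_items, max_size):
    """Worst-case number of non-empty itemsets with at most max_size items.

    Assumes every combination is frequent, so this is an upper bound only.
    """
    limit = num_items if max_size is None else min(max_size, num_items)
    return sum(comb(num_items, k) for k in range(1, limit + 1))


if __name__ == "__main__":
    for setting in (None, 100, "ALL"):
        limit = resolve_max_itemset_size(setting)
        print(setting, "->", limit, candidate_itemset_bound(30, limit))
```

With no limit, the bound for n distinct items is 2^n - 1, which is why an unrestricted run can grow so quickly when the minimum support is low.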
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)