[ https://issues.apache.org/jira/browse/MADLIB-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frank McQuillan resolved MADLIB-1288.
-------------------------------------
    Resolution: Fixed

LGTM, see PR for tests

> Set max itemset size to 10 by default in assoc rules
> ----------------------------------------------------
>
>                 Key: MADLIB-1288
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1288
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Module: Association Rules
>            Reporter: Frank McQuillan
>            Priority: Minor
>             Fix For: v1.16
>
>
> Story
> As a data scientist,
> I want to default itemset size to 10,
> so that assoc rules does not run for a long time.
>
> Details
> We have had some complaints about how long assoc rules runs which could have
> to do with the implementation, or wrong parameter settings by the user, but
> may also be due to combinatorial explosion of number of generated rules.
> The R param `maxlen` is default to 10
> https://cran.r-project.org/web/packages/arules/arules.pdf
> see page 10 "apriori - mining associations with apriori"
> which is the same as the madlib param `max_itemset_size`
> http://madlib.apache.org/docs/latest/group__grp__assoc__rules.html
> "If the minimum support is chosen too low for the dataset,
> then the algorithm will try to create an extremely large set of
> itemsets/rules. This will result in
> very long run time and eventually the process will run out of memory. To
> prevent this, the default
> maximal length of itemsets/rules is restricted to 10 items (via the parameter
> element `maxlen=10`)..."
>
> Interface
> Stays the same. The allowed values for max_itemset_size are:
> * any number 2 or more
> * if not specified set to 10 (default)
> * if user wants all itemsets they can specify a big number like 1000 or 10000
> or whatever
>
> Acceptance
> 1) Set `max_itemset_size` parameter to 100 and run a data set that creates
> rules with more than 10 items.
> 2) Set `max_itemset_size` to `NULL` and re-run, confirm that default max rule
> size limit of 10 is respected.
> 3) Set `max_itemset_size` parameter to 10 and check it creates the same rules
> as #2 above.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
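
For illustration only, the Interface rules in the issue above can be sketched in Python. This is not MADlib's implementation: the names resolve_max_itemset_size and candidate_itemsets_upper_bound are hypothetical helpers invented for this note. The second function gives a rough upper bound on how many itemsets could be enumerated, which is one way to see the combinatorial explosion the issue mentions.

```python
from math import comb
from typing import Optional

# Default proposed in MADLIB-1288 (mirrors arules' maxlen=10).
DEFAULT_MAX_ITEMSET_SIZE = 10


def resolve_max_itemset_size(value: Optional[int]) -> int:
    """Hypothetical helper: apply the Interface rules from the issue.

    None (SQL NULL / not specified) falls back to the default of 10;
    any explicit value must be 2 or more.
    """
    if value is None:
        return DEFAULT_MAX_ITEMSET_SIZE
    if value < 2:
        raise ValueError("max_itemset_size must be 2 or more")
    return value


def candidate_itemsets_upper_bound(n_items: int, max_size: int) -> int:
    """Upper bound on distinct itemsets of size 1..max_size over n_items.

    Ignores support pruning, so a real apriori run would usually
    generate far fewer candidates than this.
    """
    largest = min(max_size, n_items)
    return sum(comb(n_items, k) for k in range(1, largest + 1))


if __name__ == "__main__":
    # Acceptance steps 2 and 3: NULL and an explicit 10 resolve identically.
    print(resolve_max_itemset_size(None) == resolve_max_itemset_size(10))
    # Effect of the cap on the worst-case search space for 50 distinct items.
    print(candidate_itemsets_upper_bound(50, 10))
    print(candidate_itemsets_upper_bound(50, 1000))
```

Under these assumptions, passing a large value such as 1000 simply lifts the cap to the number of distinct items, which matches the "specify a big number" escape hatch described under Interface.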