Frank McQuillan created MADLIB-1288:
---------------------------------------
             Summary: Set max itemset size to 10 by default in assoc rules
                 Key: MADLIB-1288
                 URL: https://issues.apache.org/jira/browse/MADLIB-1288
             Project: Apache MADlib
          Issue Type: Improvement
          Components: Module: Association Rules
            Reporter: Frank McQuillan
             Fix For: v2.0


Story

As a data scientist,
I want the maximum itemset size to default to 10,
so that assoc rules does not run for a long time.

Details

We have had some complaints about how long assoc rules runs. This could be due to the implementation or to wrong parameter settings by the user, but it may also be due to a combinatorial explosion in the number of generated rules.

The R param `maxlen` defaults to 10
https://cran.r-project.org/web/packages/arules/arules.pdf
see page 10 "apriori - mining associations with apriori"
which is the same as the madlib param `max_itemset_size`
http://madlib.apache.org/docs/latest/group__grp__assoc__rules.html

The arules documentation explains:

"If the minimum support is chosen too low for the dataset, then the algorithm will try to create an extremely large set of itemsets/rules. This will result in very long run time and eventually the process will run out of memory. To prevent this, the default maximal length of itemsets/rules is restricted to 10 items (via the parameter element `maxlen=10`)..."

Acceptance

1) Set the `max_itemset_size` parameter to 100 and run a data set that creates rules with more than 10 items.
2) Set `max_itemset_size` to `NULL` and re-run; confirm that the default max rule size limit of 10 is respected.
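
Illustration

To make the combinatorial explosion mentioned under Details concrete, here is a minimal Python sketch that counts the worst-case number of candidate itemsets with and without a size cap. The function name `count_itemsets` is hypothetical and is not part of MADlib or arules. The count is only an upper bound: a real Apriori run prunes candidates that fall below the minimum support, so actual numbers depend on the data.

```python
from math import comb
from typing import Optional


def count_itemsets(n_items: int, max_itemset_size: Optional[int] = None) -> int:
    """Worst-case number of non-empty itemsets drawn from n_items distinct items.

    max_itemset_size=None means no cap (illustrative only; the ticket proposes
    that an unset value should default to 10 in MADlib).
    """
    if max_itemset_size is None:
        limit = n_items
    else:
        limit = min(max_itemset_size, n_items)
    # Sum of binomial coefficients C(n, k) for k = 1..limit.
    return sum(comb(n_items, k) for k in range(1, limit + 1))


if __name__ == "__main__":
    n = 30  # hypothetical number of distinct items
    print("uncapped:", count_itemsets(n))
    print("capped at 10:", count_itemsets(n, 10))
```

Without a cap the bound is 2^n - 1, so it doubles with every additional distinct item. With a cap of 10 the bound grows only polynomially in n, which is the motivation for a conservative default.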