[ 
https://issues.apache.org/jira/browse/MADLIB-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frank McQuillan resolved MADLIB-1288.
-------------------------------------
    Resolution: Fixed

LGTM, see PR for tests

> Set max itemset size to 10 by default in assoc rules
> ----------------------------------------------------
>
>                 Key: MADLIB-1288
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1288
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Module: Association Rules
>            Reporter: Frank McQuillan
>            Priority: Minor
>             Fix For: v1.16
>
>
> Story
> As a data scientist,
> I want to default itemset size to 10,
> so that assoc rules does not run for a long time.
> Details
> We have had some complaints about how long assoc rules runs. This could be 
> due to the implementation or to wrong parameter settings by the user, but 
> it may also be due to a combinatorial explosion in the number of generated 
> rules.
> The R param `maxlen` defaults to 10
> https://cran.r-project.org/web/packages/arules/arules.pdf
> see page 10 "apriori - mining associations with apriori":
> "If the minimum support is chosen too low for the dataset,
> then the algorithm will try to create an extremely large set of 
> itemsets/rules. This will result in
> very long run time and eventually the process will run out of memory. To 
> prevent this, the default
> maximal length of itemsets/rules is restricted to 10 items (via the parameter 
> element `maxlen=10`)..."
> `maxlen` is the same as the madlib param `max_itemset_size`
> http://madlib.apache.org/docs/latest/group__grp__assoc__rules.html
> Interface
> Stays the same.  The allowed values for `max_itemset_size` are:
> * any number 2 or more
> * if not specified, it is set to 10 (default)
> * if the user wants all itemsets, they can specify a large number such as 
> 1000 or 10000
> Acceptance
> 1) Set the `max_itemset_size` parameter to 100 and run on a data set that 
> creates rules with more than 10 items.
> 2) Set `max_itemset_size` to `NULL` and re-run; confirm that the default 
> maximum rule size of 10 is respected.
> 3) Set the `max_itemset_size` parameter to 10 and check that it creates the 
> same rules as step 2 above.
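
The interface rules and the combinatorial concern in the issue above can be sketched in Python. This is only an illustration, not the MADlib implementation: `resolve_max_itemset_size` and `candidate_itemset_count` are hypothetical helper names, `None` stands in for SQL `NULL`, and the count is a simple upper bound that ignores support-based pruning.

```python
from math import comb

# Default described in the issue; the constant name is an assumption.
DEFAULT_MAX_ITEMSET_SIZE = 10


def resolve_max_itemset_size(value=None):
    """Hypothetical helper: map an unspecified (None) value to the default
    and reject values below 2, as the Interface section describes."""
    if value is None:
        return DEFAULT_MAX_ITEMSET_SIZE
    if value < 2:
        raise ValueError("max_itemset_size must be 2 or more")
    return value


def candidate_itemset_count(n_items, max_size):
    """Upper bound on the number of distinct itemsets containing between
    1 and max_size items drawn from n_items distinct items (no pruning)."""
    largest = min(max_size, n_items)
    return sum(comb(n_items, k) for k in range(1, largest + 1))


if __name__ == "__main__":
    # Acceptance steps 2 and 3: unspecified and 10 resolve to the same limit.
    print(resolve_max_itemset_size(None) == resolve_max_itemset_size(10))
    # With 20 distinct items, compare the capped and uncapped search spaces.
    print(candidate_itemset_count(20, 10), candidate_itemset_count(20, 100))
```

The cap only bounds the worst case. How many itemsets are actually generated still depends on the data and on the minimum support chosen.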



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
