Repository: madlib
Updated Branches:
  refs/heads/master d62e5516b -> e0f76db8b


add caution on run-times to assoc rules user docs re: max itemset size usage


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/e0f76db8
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/e0f76db8
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/e0f76db8

Branch: refs/heads/master
Commit: e0f76db8bf2d7ca478d972cef302939b6f2babb5
Parents: d62e551
Author: Frank McQuillan <fmcquil...@pivotal.io>
Authored: Tue Sep 18 15:02:18 2018 -0700
Committer: Frank McQuillan <fmcquil...@pivotal.io>
Committed: Tue Sep 18 15:02:18 2018 -0700

----------------------------------------------------------------------
 .../modules/assoc_rules/assoc_rules.sql_in         | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/madlib/blob/e0f76db8/src/ports/postgres/modules/assoc_rules/assoc_rules.sql_in
----------------------------------------------------------------------
diff --git a/src/ports/postgres/modules/assoc_rules/assoc_rules.sql_in b/src/ports/postgres/modules/assoc_rules/assoc_rules.sql_in
index ec3c330..bcd5464 100644
--- a/src/ports/postgres/modules/assoc_rules/assoc_rules.sql_in
+++ b/src/ports/postgres/modules/assoc_rules/assoc_rules.sql_in
@@ -161,6 +161,12 @@ Given a frequent itemset \f$ A \f$ generated from the Apriori algorithm, and all
 subsets \f$ B \f$ , we generate rules such that \f$ B \Rightarrow (A - B) \f$
 meets minimum confidence requirements.
 
+@note Beware of combinatorial explosion.  The Apriori algorithm can potentially
+generate a huge number of rules, even for fairly simple data sets, resulting
+in run-times that are unreasonably long.  To avoid this, it is recommended
+to cap the maximum itemset size to a small number to start with, then
+increase it gradually.  <em>Support</em> and <em>confidence</em> values are
+parameters that can also be used to control rule generation.
 
 @anchor syntax
 @par Function Syntax
@@ -257,14 +263,16 @@ This generates all association rules that satisfy the specified minimum
   \c conviction columns are calculated as described earlier.
   </dd>
 
-  <dt>verbose</dt>
+  <dt>verbose (optional)</dt>
   <dd>BOOLEAN, default: FALSE. Determines if details are printed for each iteration
   as the algorithm progresses.</dd>
 
-  <dt>max_itemset_size</dt>
+  <dt>max_itemset_size (optional)</dt>
   <dd>INTEGER, default: generate itemsets of all sizes. Determines the maximum size of frequent
   itemsets that are used for generating association rules. Must be 2 or more.
-  This parameter can be used to reduce run time for data sets where itemset size is large. </dd>
+  This parameter can be used to reduce run time for data sets where itemset size is large,
+  which is a common situation. If your query is not returning or is running too long,
+  try using a lower value for this parameter.</dd>
 </dl>
 
 
@@ -338,7 +346,8 @@ Result:
 (7 rows)
 </pre>
 
--# Limit association rules generated from itemsets of size at most 2:
+-# Limit association rules generated from itemsets of size at most 2.  This parameter is
+a good way to reduce long run times.
 <pre class="example">
 SELECT * FROM madlib.assoc_rules( .25,            -- Support
                                   .5,             -- Confidence
