[ https://issues.apache.org/jira/browse/SPARK-13385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhengruifeng updated SPARK-13385:
---------------------------------
    Priority: Major  (was: Minor)

> Enable AssociationRules to generate consequents with user-defined lengths
> -------------------------------------------------------------------------
>
>                 Key: SPARK-13385
>                 URL: https://issues.apache.org/jira/browse/SPARK-13385
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.6.0
>            Reporter: zhengruifeng
>            Assignee: zhengruifeng
>         Attachments: rule-generation.pdf
>
>
> AssociationRules should generate all association rules, with the number of
> iterations (the maximum consequent length) defined by the user, not just
> rules that have a single item as the consequent. For example:
> 39 804 ==> 413 743 819 #SUP: 1023 #CONF: 0.70117
> 39 743 ==> 413 804 819 #SUP: 1023 #CONF: 0.93939
> 39 413 ==> 743 804 819 #SUP: 1023 #CONF: 0.6007
> 819 ==> 39 413 743 804 #SUP: 1023 #CONF: 0.15418
> 804 ==> 39 413 743 819 #SUP: 1023 #CONF: 0.12997
> 743 ==> 39 413 804 819 #SUP: 1023 #CONF: 0.7276
> 39 ==> 413 743 804 819 #SUP: 1023 #CONF: 0.12874
> ...
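Every rule above is built from the same five items {39, 413, 743, 804, 819}, and all of them share #SUP 1023, so #SUP is presumably the support of that full itemset, while #CONF for a rule X ==> Y is sup(X ∪ Y) / sup(X). An itemset of k items yields 2^k − 2 rules with a non-empty antecedent and consequent (30 for these five items), but only k of them (5 here) have a single-item consequent. The Scala sketch below enumerates all of them by brute force; the toy supports and the names AllRules, Rule and rulesFrom are hypothetical and are not taken from the dataset or from spark-rules.

```scala
// Brute-force enumeration of every rule X ==> Y from one frequent itemset.
// Hypothetical illustration only: toy supports, not the T40I10D100K data.
object AllRules {
  type Itemset = Set[String]

  final case class Rule(antecedent: Itemset, consequent: Itemset, support: Long, confidence: Double)

  // Hypothetical toy supports; every subset of {a, b, c} must be present.
  val toySupport: Map[Itemset, Long] = Map(
    Set("a") -> 10L, Set("b") -> 8L, Set("c") -> 6L,
    Set("a", "b") -> 6L, Set("a", "c") -> 5L, Set("b", "c") -> 4L,
    Set("a", "b", "c") -> 4L
  )

  /** All rules X ==> Y with X ∪ Y == itemset and both sides non-empty. */
  def rulesFrom(itemset: Itemset, support: Map[Itemset, Long]): Seq[Rule] = {
    val full = support(itemset) // #SUP: support of the whole itemset
    itemset.subsets().toList
      .filter(y => y.nonEmpty && y.size < itemset.size) // non-empty, proper consequent
      .map { y =>
        val x = itemset -- y
        // #CONF: sup(X ∪ Y) / sup(X)
        Rule(x, y, full, full.toDouble / support(x))
      }
  }

  def main(args: Array[String]): Unit = {
    rulesFrom(Set("a", "b", "c"), toySupport).foreach { r =>
      val lhs = r.antecedent.toList.sorted.mkString(" ")
      val rhs = r.consequent.toList.sorted.mkString(" ")
      println(f"$lhs ==> $rhs #SUP: ${r.support} #CONF: ${r.confidence}%.5f")
    }
  }
}
```

For a 3-item toy itemset this prints 2^3 − 2 = 6 rules, three of which have a two-item consequent that a single-consequent generator would never report.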
> I have implemented this based on the rule-generation algorithm of Apriori:
> https://github.com/zhengruifeng/spark-rules
> It is compatible with the existing mllib.fpm APIs:
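The rule-generation step of Apriori avoids brute-force enumeration: for a fixed itemset, moving items from the antecedent into the consequent can only lower the confidence (the antecedent shrinks, so its support can only grow), so (k+1)-item consequents need only be built from k-item consequents that already met minConfidence. A minimal single-machine sketch of that level-wise loop follows. It is not the spark-rules implementation; LevelwiseRules, genRules and the toy supports are hypothetical, and the maxConsequent argument is assumed to mirror what setMaxConsequent does in the example.

```scala
// Level-wise (Apriori-style) rule generation for ONE frequent itemset.
// Hypothetical sketch, not the spark-rules code.
object LevelwiseRules {
  type Itemset = Set[String]

  final case class Rule(antecedent: Itemset, consequent: Itemset, confidence: Double)

  // Hypothetical toy supports for every subset of {a, b, c}.
  val toySupport: Map[Itemset, Long] = Map(
    Set("a") -> 10L, Set("b") -> 8L, Set("c") -> 6L,
    Set("a", "b") -> 6L, Set("a", "c") -> 5L, Set("b", "c") -> 4L,
    Set("a", "b", "c") -> 4L
  )

  def genRules(itemset: Itemset, support: Map[Itemset, Long],
               minConfidence: Double, maxConsequent: Int): List[Rule] = {
    val full = support(itemset).toDouble
    // Confidence of (itemset -- y) ==> y
    def conf(y: Itemset): Double = full / support(itemset -- y)

    val out = scala.collection.mutable.ArrayBuffer.empty[Rule]
    var level: Set[Itemset] = itemset.map(i => Set(i)) // 1-item consequents
    var k = 1
    // Stop when no candidates remain, at the user-defined limit,
    // or when the antecedent would become empty.
    while (level.nonEmpty && k <= maxConsequent && k < itemset.size) {
      val passed = level.filter(y => conf(y) >= minConfidence)
      passed.foreach(y => out += Rule(itemset -- y, y, conf(y)))
      // Join surviving k-consequents into (k+1)-consequents and keep a
      // candidate only if all of its k-subsets survived (prune step).
      level = for {
        a <- passed
        b <- passed
        c = a ++ b
        if c.size == k + 1 && c.subsets(k).forall(passed.contains)
      } yield c
      k += 1
    }
    out.toList
  }

  def main(args: Array[String]): Unit =
    genRules(Set("a", "b", "c"), toySupport, minConfidence = 0.5, maxConsequent = 15)
      .foreach(r => println(f"${r.antecedent.mkString(" ")} ==> ${r.consequent.mkString(" ")} #CONF: ${r.confidence}%.3f"))
}
```

With these toy supports and minConfidence = 0.7, only four of the six possible consequents are evaluated ({a}, {b}, {c} and {a, b}): {c} fails, so {a, c} and {b, c} are never formed.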
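> // AprioriRules comes from the spark-rules repository above, not from Spark 1.6
> import org.apache.spark.mllib.fpm._
> // Each line of the input file is one space-separated transaction
> val data = sc.textFile("hdfs://ns1/whale/T40I10D100K.dat")
> val transactions = data.map(s => s.trim.split(' ')).persist()
> // Mine frequent itemsets with FP-Growth (minimum support 1%)
> val fpg = new FPGrowth().setMinSupport(0.01)
> val model = fpg.run(transactions)
> // Generate rules with confidence >= 0.1; maxConsequent caps the consequent length
> val ar = new AprioriRules().setMinConfidence(0.1).setMaxConsequent(15)
> val results = ar.run(model.freqItemsets)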
> It outputs rule-generation information like this:
> 15/11/04 11:28:46 INFO AprioriRules: Candidates for 1-consequent rules : 312917
> 15/11/04 11:28:58 INFO AprioriRules: Generated 1-consequent rules : 306703
> 15/11/04 11:29:10 INFO AprioriRules: Candidates for 2-consequent rules : 707747
> 15/11/04 11:29:35 INFO AprioriRules: Generated 2-consequent rules : 704000
> 15/11/04 11:29:55 INFO AprioriRules: Candidates for 3-consequent rules : 1020253
> 15/11/04 11:30:38 INFO AprioriRules: Generated 3-consequent rules : 1014002
> 15/11/04 11:31:14 INFO AprioriRules: Candidates for 4-consequent rules : 972225
> 15/11/04 11:32:00 INFO AprioriRules: Generated 4-consequent rules : 956483
> 15/11/04 11:32:44 INFO AprioriRules: Candidates for 5-consequent rules : 653749
> 15/11/04 11:33:32 INFO AprioriRules: Generated 5-consequent rules : 626993
> 15/11/04 11:34:07 INFO AprioriRules: Candidates for 6-consequent rules : 331038
> 15/11/04 11:34:50 INFO AprioriRules: Generated 6-consequent rules : 314455
> 15/11/04 11:35:10 INFO AprioriRules: Candidates for 7-consequent rules : 138490
> 15/11/04 11:35:43 INFO AprioriRules: Generated 7-consequent rules : 136260
> 15/11/04 11:35:57 INFO AprioriRules: Candidates for 8-consequent rules : 48567
> 15/11/04 11:36:14 INFO AprioriRules: Generated 8-consequent rules : 47331
> 15/11/04 11:36:24 INFO AprioriRules: Candidates for 9-consequent rules : 12430
> 15/11/04 11:36:33 INFO AprioriRules: Generated 9-consequent rules : 11925
> 15/11/04 11:36:37 INFO AprioriRules: Candidates for 10-consequent rules : 2211
> 15/11/04 11:36:47 INFO AprioriRules: Generated 10-consequent rules : 2064
> 15/11/04 11:36:55 INFO AprioriRules: Candidates for 11-consequent rules : 246
> 15/11/04 11:36:58 INFO AprioriRules: Generated 11-consequent rules : 219
> 15/11/04 11:37:00 INFO AprioriRules: Candidates for 12-consequent rules : 13
> 15/11/04 11:37:03 INFO AprioriRules: Generated 12-consequent rules : 11
> 15/11/04 11:37:03 INFO AprioriRules: Candidates for 13-consequent rules : 0
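The log lends itself to a quick sanity check. Assuming that "Generated" counts the candidates that met minConfidence = 0.1, the sketch below tabulates how many candidates survive at each consequent length. The numbers are copied from the log; RuleLogSummary is only an illustrative name.

```scala
// Per-level summary of the AprioriRules log above.
// Assumption: "Generated" = candidates that passed minConfidence.
object RuleLogSummary {
  // (consequent length, candidates, generated); length 13 had 0 candidates
  // and no "Generated" line, so it is omitted.
  val levels: List[(Int, Long, Long)] = List(
    (1, 312917L, 306703L),
    (2, 707747L, 704000L),
    (3, 1020253L, 1014002L),
    (4, 972225L, 956483L),
    (5, 653749L, 626993L),
    (6, 331038L, 314455L),
    (7, 138490L, 136260L),
    (8, 48567L, 47331L),
    (9, 12430L, 11925L),
    (10, 2211L, 2064L),
    (11, 246L, 219L),
    (12, 13L, 11L)
  )

  val totalCandidates: Long = levels.map(_._2).sum
  val totalGenerated: Long = levels.map(_._3).sum

  def main(args: Array[String]): Unit = {
    levels.foreach { case (k, cand, gen) =>
      // Share of candidate rules that were kept at this consequent length
      println(f"$k%2d-consequent: $gen%8d of $cand%8d kept (${100.0 * gen / cand}%.1f%%)")
    }
    println(s"total: $totalGenerated of $totalCandidates candidate rules kept")
  }
}
```

Summed over the twelve logged levels, 4,120,446 of 4,199,886 candidate rules were kept; the counts peak at 3-item consequents and reach zero candidates at length 13, before the maxConsequent limit of 15.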



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
