[ 
https://issues.apache.org/jira/browse/SPARK-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-6381.
------------------------------
       Resolution: Duplicate
    Fix Version/s:     (was: 1.4.0)

(Don't set fix version, and 1.3.1 does not exist.)
Search JIRA first please. This was already implemented in SPARK-4001 as 
FP-growth. See also SPARK-2432.

> add Apriori algorithm to MLLib
> ------------------------------
>
>                 Key: SPARK-6381
>                 URL: https://issues.apache.org/jira/browse/SPARK-6381
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: zhangyouhua
>
> [~mengxr]
> There are many algorithms for association rule mining, for example FPGrowth
> and Apriori. These are classic machine learning algorithms and are very
> useful in big data mining. Spark 1.3 already implements FPGrowth for very
> large data sets, but it needs to build an FP-tree before mining frequent
> itemsets. So when the transaction data is smaller and sparse and minSupport
> is larger, we can choose the Apriori algorithm instead.
> How can the Apriori algorithm be parallelized?
> 1. Generate frequent items by filtering the input data using the minimum
>    support level.
>      private def genFreqItems[Item: ClassTag](data: RDD[Array[Item]],
>          minCount: Long, partitioner: Partitioner): Array[Item]
> 2. Generate frequent itemsets with Apriori; the extraction is done on
>    each partition.
>  2.1 Create the candidateSet from kFreqItems and k.
>      private def createCandidateSet[Item: ClassTag](
>          kFreqItems: Array[(Array[Item], Long)], k: Int)
>  2.2 Create kFreqItems from the candidateSet by scanning the data set.
>      private def scanDataSet[Item: ClassTag](dataSet: RDD[Array[Item]],
>          candidateSet: Array[Array[Item]], minCount: Double):
>          RDD[(Array[Item], Long)]
>  2.3 Filter the dataSet by the candidateSet.
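
The steps quoted above can be sketched on plain Scala collections. This is a
minimal single-machine illustration, not the reporter's implementation and not
MLlib code: it uses Seq and Set in place of RDD and Array, omits the
Partitioner, and the AprioriSketch object, the Itemset alias, and the run
driver are hypothetical names introduced here. Step 2.3 is read as "drop
transactions that contain no frequent k-itemset", which is an assumption,
since the description does not spell it out.

```scala
object AprioriSketch {
  // Hypothetical alias: an itemset is a set of item names.
  type Itemset = Set[String]

  // Step 1: count single items and keep those seen in at least minCount transactions.
  def genFreqItems(data: Seq[Set[String]], minCount: Long): Map[Itemset, Long] =
    data.flatten
      .groupBy(identity)
      .map { case (item, occurrences) => (Set(item), occurrences.size.toLong) }
      .filter { case (_, count) => count >= minCount }

  // Step 2.1: join frequent (k-1)-itemsets into k-item candidates, then prune
  // every candidate that has an infrequent (k-1)-subset.
  def createCandidateSet(prevFreq: Set[Itemset], k: Int): Set[Itemset] = {
    val joined = for {
      a <- prevFreq
      b <- prevFreq
      candidate = a | b
      if candidate.size == k
    } yield candidate
    joined.filter(c => c.subsets(k - 1).forall(prevFreq.contains))
  }

  // Step 2.2: scan the transactions, count each candidate, keep the frequent ones.
  def scanDataSet(data: Seq[Set[String]], candidates: Set[Itemset],
                  minCount: Long): Map[Itemset, Long] =
    candidates.iterator
      .map(c => (c, data.count(t => c.subsetOf(t)).toLong))
      .filter { case (_, count) => count >= minCount }
      .toMap

  // Hypothetical driver: repeat steps 2.1 to 2.3 until no frequent itemsets remain.
  def run(data: Seq[Set[String]], minCount: Long): Map[Itemset, Long] = {
    var level = genFreqItems(data, minCount)
    var result = level
    var remaining = data
    var k = 2
    while (level.nonEmpty) {
      val candidates = createCandidateSet(level.keySet, k)
      level = scanDataSet(remaining, candidates, minCount)
      result ++= level
      // Step 2.3 (assumed reading): a transaction without any frequent
      // k-itemset cannot contain a frequent (k+1)-itemset, so drop it.
      remaining = remaining.filter(t => level.keys.exists(_.subsetOf(t)))
      k += 1
    }
    result
  }
}
```

Calling AprioriSketch.run(transactions, minCount) returns a map from each
frequent itemset to its support count. In a distributed version the counting
in scanDataSet is the part that would run per partition, with the candidate
set broadcast to the workers.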



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
