[jira] [Commented] (SPARK-4001) Add Apriori algorithm to Spark MLlib

2014-12-25 Thread zhangyouhua (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14258697#comment-14258697
 ] 

zhangyouhua commented on SPARK-4001:


apriori and FPGrowth are commonly algorithm,there are two mainly method.one is 
calculate frequently item set,another is calculate association rules.here we 
have implementate formar,the latter will be submit as soon as possible.

 Add Apriori algorithm to Spark MLlib
 

 Key: SPARK-4001
 URL: https://issues.apache.org/jira/browse/SPARK-4001
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Jacky Li
Assignee: Jacky Li
 Attachments: Distributed frequent item mining algorithm based on 
 Spark.pptx


 Apriori is the classic algorithm for frequent item set mining in a 
 transactional data set.  It will be useful if Apriori algorithm is added to 
 MLLib in Spark



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4001) Add Apriori algorithm to Spark MLlib

2014-12-25 Thread zhangyouhua (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14258695#comment-14258695
 ] 

zhangyouhua commented on SPARK-4001:


apriori and FPGrowth are commonly algorithm,there are two mainly method.one is 
calculate frequently item set,another is calculate association rules.here we 
have implementate formar,the latter will be submit as soon as possible.

 Add Apriori algorithm to Spark MLlib
 

 Key: SPARK-4001
 URL: https://issues.apache.org/jira/browse/SPARK-4001
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Jacky Li
Assignee: Jacky Li
 Attachments: Distributed frequent item mining algorithm based on 
 Spark.pptx


 Apriori is the classic algorithm for frequent item set mining in a 
 transactional data set.  It will be useful if Apriori algorithm is added to 
 MLLib in Spark



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4001) Add Apriori algorithm to Spark MLlib

2014-12-25 Thread zhangyouhua (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14258696#comment-14258696
 ] 

zhangyouhua commented on SPARK-4001:


apriori and FPGrowth are commonly algorithm,there are two mainly method.one is 
calculate frequently item set,another is calculate association rules.here we 
have implementate formar,the latter will be submit as soon as possible.

 Add Apriori algorithm to Spark MLlib
 

 Key: SPARK-4001
 URL: https://issues.apache.org/jira/browse/SPARK-4001
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Jacky Li
Assignee: Jacky Li
 Attachments: Distributed frequent item mining algorithm based on 
 Spark.pptx


 Apriori is the classic algorithm for frequent item set mining in a 
 transactional data set.  It will be useful if Apriori algorithm is added to 
 MLLib in Spark



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4001) Add Apriori algorithm to Spark MLlib

2014-12-05 Thread Jacky Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14236501#comment-14236501
 ] 

Jacky Li commented on SPARK-4001:
-

Sure, Xiangrui. I will update it on next Monday while in office.

 Add Apriori algorithm to Spark MLlib
 

 Key: SPARK-4001
 URL: https://issues.apache.org/jira/browse/SPARK-4001
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Jacky Li
Assignee: Jacky Li

 Apriori is the classic algorithm for frequent item set mining in a 
 transactional data set.  It will be useful if Apriori algorithm is added to 
 MLLib in Spark



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4001) Add Apriori algorithm to Spark MLlib

2014-12-03 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233067#comment-14233067
 ] 

Xiangrui Meng commented on SPARK-4001:
--

[~jackylk] Could you share some performance testing results?

 Add Apriori algorithm to Spark MLlib
 

 Key: SPARK-4001
 URL: https://issues.apache.org/jira/browse/SPARK-4001
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Jacky Li
Assignee: Jacky Li

 Apriori is the classic algorithm for frequent item set mining in a 
 transactional data set.  It will be useful if Apriori algorithm is added to 
 MLLib in Spark



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4001) Add Apriori algorithm to Spark MLlib

2014-11-26 Thread Jacky Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226562#comment-14226562
 ] 

Jacky Li commented on SPARK-4001:
-

Thanks for your suggestion, Daniel. 
Here is the current status.
1. Currently I have implemented apriori and fp-growth by referring to YAFIM 
(http://pasa-bigdata.nju.edu.cn/people/ronggu/pub/YAFIM_ParLearning.pdf) and 
PFP (http://dl.acm.org/citation.cfm?id=1454027) 
For apriori, currently there are two versions implemented, one using broadcast 
variable and another one using cartisian join of two RDD, I am testing them 
using mushroom and webdoc open dataset (http://fimi.ua.ac.be/data/) to check 
the performance of them before deciding which one to contribute to MLlib.
I have updated the code in the PR (https://github.com/apache/spark/pull/2847), 
you are welcome to check it and try in your use case.
2. For the input part, currently the apriori algo is taking  
{{RDD\[Array\[String\]\]}} as the input dataset, but not containing basket_id 
or user_id. I am not sure whether it can easily fit into your use case. Can you 
give more detail of how you want to use it in collaborative filtering contexts? 


 Add Apriori algorithm to Spark MLlib
 

 Key: SPARK-4001
 URL: https://issues.apache.org/jira/browse/SPARK-4001
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Jacky Li
Assignee: Jacky Li

 Apriori is the classic algorithm for frequent item set mining in a 
 transactional data set.  It will be useful if Apriori algorithm is added to 
 MLLib in Spark



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4001) Add Apriori algorithm to Spark MLlib

2014-11-26 Thread Daniel Erenrich (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226793#comment-14226793
 ] 

Daniel Erenrich commented on SPARK-4001:


First a minor point: I'd suggest making it take an RDD of arrays of ints? 
Holding tons of strings around seems wasteful. The user can maintain a map from 
strings to ints. Or else can we just make the array contain comparables?

So the main issue is whether we are trying to do frequent itemset mining or 
association rule construction. I argue the latter is more common an operation 
and while the second requires the first there's no great extra cost to doing 
both. I'm actually unfamiliar with what can be done with just the former and 
not the latter.

If you equate baskets with users the connection between association rules and 
collaborative filtering becomes very clear.  I want to feed in someone's, say, 
movie viewing history and get reccomendations of the form you watched X and Y 
so you'll really like Z (where X,Y,Z is a frequent itemset).

The API could be made to match. Give me all the things this user bought in the 
format I described above and the prediction mode is here are all of things 
this person bought please apply as many rules as you can. If a user does care 
about the frequent item sets that would be additionally stored inside the model.

The alternative here is to make a set of frequent itemset miners and then make 
an association rule learner that takes their output. The only downside is that 
that suffers some perf loss (requiring an additional pass). I'll gladly write 
this version if we decide that's the way we should go.

 Add Apriori algorithm to Spark MLlib
 

 Key: SPARK-4001
 URL: https://issues.apache.org/jira/browse/SPARK-4001
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Jacky Li
Assignee: Jacky Li

 Apriori is the classic algorithm for frequent item set mining in a 
 transactional data set.  It will be useful if Apriori algorithm is added to 
 MLLib in Spark



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4001) Add Apriori algorithm to Spark MLlib

2014-11-25 Thread Daniel Erenrich (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225728#comment-14225728
 ] 

Daniel Erenrich commented on SPARK-4001:


I was about to start coding something like this when I noticed this ticket. 
What's the status here?

Association rule algorithms in general (and apriori in particular) are useful 
in collaborative filtering contexts (which mllib already has code for). 

As far as library cohesiveness, my though here is that we can frame the inputs 
to look near identical to the matrix facotarization code though with 
(basket_id, item_id) instead of (user_id, item_id). That input format would be 
inefficient though (so maybe we'd support a second more natural input format. 
This though would sidestep the concern the sklearn folks had.

 Add Apriori algorithm to Spark MLlib
 

 Key: SPARK-4001
 URL: https://issues.apache.org/jira/browse/SPARK-4001
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Jacky Li
Assignee: Jacky Li

 Apriori is the classic algorithm for frequent item set mining in a 
 transactional data set.  It will be useful if Apriori algorithm is added to 
 MLLib in Spark



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4001) Add Apriori algorithm to Spark MLlib

2014-10-21 Thread wangfei (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14178183#comment-14178183
 ] 

wangfei commented on SPARK-4001:




Thanks Sean Owen for explaining! Frequent itemset algorithm works by scanning 
the input data set, there is no probabilistic model in nature. 

To answer Xiangrui Meng’s earlier questions:
1.  These algorithm is used for finding major patterns / association rules 
in a data set. For a real use case, some analytic applications in telecom 
domain use them to find subscriber behavior from the data set combining service 
record, network traffic record, and demographic data. Please refer to this 
Chinese article for example:  http://www.ss-lw.com/wxxw-361.html
And, sometimes we use frequent itemset algorithm for preparing features input 
to other algorithm which selects feature and do other ML task like training a 
classifier, like this paper: http://dl.acm.org/citation.cfm?id=1401922, 

2.  Since Apriori is a basic algorithm for frequent itemset mining, I am 
not aware of any parallel implementation for it. But I think the algorithm fits 
Spark’s data parallel model since it only need to scan the input data set. And 
for FP-Growth, I do know there is a Parallel FP-Growth from Haoyuan Li: 
http://dl.acm.org/citation.cfm?id=1454027  . I think I probably will refer  to 
this paper to implement FP-Growth in Spark 

3.  The Apriori computation complexity is about O(N*k) where N is the 
number of item in input data and k is the depth of the frequent item tree to 
search. FP-Grwoth complexity is about O(N),  it is more efficient comparing to 
Apriori. For space efficiency, FP-growth is also more efficient than Apriori. 
But in case of smaller data and if frequent itemset is more, Apriori is more 
efficient. This is because FP-Growth need to construct a FP Tree out of the 
input data set, and it needs some time. And another advantage of Apriori is 
that it can output association rules while FP-Growth can not.

Although these two algorithms are basic algo (FP-Growth is more complex), I 
think it will be handy if mllib can include them since there is no frequent 
itemset mining algo in Spark yet, and especially in distributed environment. 
Please suggest how to handle this issue. Thanks a lot. 


 Add Apriori algorithm to Spark MLlib
 

 Key: SPARK-4001
 URL: https://issues.apache.org/jira/browse/SPARK-4001
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Jacky Li
Assignee: Jacky Li

 Apriori is the classic algorithm for frequent item set mining in a 
 transactional data set.  It will be useful if Apriori algorithm is added to 
 MLLib in Spark



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4001) Add Apriori algorithm to Spark MLlib

2014-10-21 Thread Jacky Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14178200#comment-14178200
 ] 

Jacky Li commented on SPARK-4001:
-


I accidentally use wangfei (my college) 's account to send the last comment, 
sorry for the inconvenience. 

Thanks @Sean Owen for explaining! Frequent itemset algorithm works by scanning 
the input data set, there is no probabilistic model in nature. 

To answer @Xiangrui Meng’s earlier questions:
1.  These algorithm is used for finding major patterns / association rules 
in a data set. For a real use case, some analytic applications in telecom 
domain use them to find subscriber behavior from the data set combining service 
record, network traffic record, and demographic data. Please refer to this 
Chinese article for example:  http://www.ss-lw.com/wxxw-361.html
And, sometimes we use frequent itemset algorithm for preparing features input 
to other algorithm which selects feature and do other ML task like training a 
classifier, like this paper: http://dl.acm.org/citation.cfm?id=1401922, 

2.  Since Apriori is a basic algorithm for frequent itemset mining, I am 
not aware of any parallel implementation for it. But I think the algorithm fits 
Spark’s data parallel model since it only need to scan the input data set. And 
for FP-Growth, I do know there is a Parallel FP-Growth from Haoyuan Li: 
http://dl.acm.org/citation.cfm?id=1454027  . I think I probably will refer  to 
this paper to implement FP-Growth in Spark 

3.  The Apriori computation complexity is about O(N*k) where N is the 
number of item in input data and k is the depth of the frequent item tree to 
search. FP-Grwoth complexity is about O(N),  it is more efficient comparing to 
Apriori. For space efficiency, FP-growth is also more efficient than Apriori. 
But in case of smaller data and if frequent itemset is more, Apriori is more 
efficient. This is because FP-Growth need to construct a FP Tree out of the 
input data set, and it needs some time. And another advantage of Apriori is 
that it can output association rules while FP-Growth can not.

Although these two algorithms are basic algo (FP-Growth is more complex), I 
think it will be handy if mllib can include them since there is no frequent 
itemset mining algo in Spark yet, and especially in distributed environment. 
Please suggest how to handle this issue. Thanks a lot. 

 Add Apriori algorithm to Spark MLlib
 

 Key: SPARK-4001
 URL: https://issues.apache.org/jira/browse/SPARK-4001
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Jacky Li
Assignee: Jacky Li

 Apriori is the classic algorithm for frequent item set mining in a 
 transactional data set.  It will be useful if Apriori algorithm is added to 
 MLLib in Spark



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4001) Add Apriori algorithm to Spark MLlib

2014-10-20 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14177081#comment-14177081
 ] 

Xiangrui Meng commented on SPARK-4001:
--

[~jackylk] Could you provide references and comment on its use cases? I'm not 
familiar with the database context. How does it compare to freq-item 
algorithms, e.g., http://dl.acm.org/citation.cfm?id=762473 ?


 Add Apriori algorithm to Spark MLlib
 

 Key: SPARK-4001
 URL: https://issues.apache.org/jira/browse/SPARK-4001
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Jacky Li
Assignee: Jacky Li

 Apriori is the classic algorithm for frequent item set mining in a 
 transactional data set.  It will be useful if Apriori algorithm is added to 
 MLLib in Spark



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4001) Add Apriori algorithm to Spark MLlib

2014-10-20 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14177640#comment-14177640
 ] 

Xiangrui Meng commented on SPARK-4001:
--

No.

 Add Apriori algorithm to Spark MLlib
 

 Key: SPARK-4001
 URL: https://issues.apache.org/jira/browse/SPARK-4001
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Jacky Li
Assignee: Jacky Li

 Apriori is the classic algorithm for frequent item set mining in a 
 transactional data set.  It will be useful if Apriori algorithm is added to 
 MLLib in Spark



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4001) Add Apriori algorithm to Spark MLlib

2014-10-20 Thread Jacky Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14177643#comment-14177643
 ] 

Jacky Li commented on SPARK-4001:
-

Maybe there is a misunderstand, I do not mean to use it in database by saying 
transactional data set. 
Apriori is the classic algorithm for freq-item mining,  and the frequent item 
sets determined by Apriori can be used to determine association rules which 
highlight general trends in the data set. This has applications in domains such 
as market basket analysis for cross-selling and up-selling.
 
I am not sure whether there is a freq-item algorithm in the mllib, so actually 
I am implementing Apriori and FP-Growth (not include in this JIRA thread). Will 
it be useful if contribute to mllib?

 Add Apriori algorithm to Spark MLlib
 

 Key: SPARK-4001
 URL: https://issues.apache.org/jira/browse/SPARK-4001
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Jacky Li
Assignee: Jacky Li

 Apriori is the classic algorithm for frequent item set mining in a 
 transactional data set.  It will be useful if Apriori algorithm is added to 
 MLLib in Spark



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4001) Add Apriori algorithm to Spark MLlib

2014-10-20 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14177658#comment-14177658
 ] 

Xiangrui Meng commented on SPARK-4001:
--

I'm asking because I'm not very familiar with this algorithm. I need some 
references to understand the following:

1) how important/popular the algorithm is (paper, use cases)
2) whether it is straight-forward to implement it in parallel
3) what is the storage and computation complexity
4) whether there are alternatives and how they compare

For example, for the one I mentioned (A simple algorithm for finding frequent 
elements in streams and bags):

1) it has 400 references on google scholar / it finds frequent items that 
appears above a threshold in two passes
2) it is very easy to implement in parallel
3) storage is O(1/p) and two-passes, where p is threshold on the frequency

 Add Apriori algorithm to Spark MLlib
 

 Key: SPARK-4001
 URL: https://issues.apache.org/jira/browse/SPARK-4001
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Jacky Li
Assignee: Jacky Li

 Apriori is the classic algorithm for frequent item set mining in a 
 transactional data set.  It will be useful if Apriori algorithm is added to 
 MLLib in Spark



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4001) Add Apriori algorithm to Spark MLlib

2014-10-20 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14177707#comment-14177707
 ] 

Sean Owen commented on SPARK-4001:
--

FWIW I do perceive Apriori to be *the* basic frequent itemset algorithm. I 
think this is the original paper -- at least it was on Wikipedia and looks like 
the right time / author: 
http://rakesh.agrawal-family.com/papers/vldb94apriori.pdf  It is very simple, 
and probably what you'd cook up if you invented a solution to the problem: 
http://en.wikipedia.org/wiki/Apriori_algorithm

Frequent itemset is not quite the same as a frequent item algorithm. From a 
bunch of sets of items, it tries to determine which subsets occur frequently.

FP-Growth is the other itemset algorithm I have ever heard of. It's more 
sophisticated. I don't have a paper reference.

If you're going to implement frequent itemsets, I think these are the two to 
start with. That said I perceive frequent itemsets to be kind of 90s and I 
have never had to use it myself. That is not to say they don't have use, and 
hey they're simple. I suppose my problem with this type of technique is that 
it's not really telling you whether the set occurred unusually frequently, just 
that it did in absolute terms. There is not a probabilistic element to these.


 Add Apriori algorithm to Spark MLlib
 

 Key: SPARK-4001
 URL: https://issues.apache.org/jira/browse/SPARK-4001
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Jacky Li
Assignee: Jacky Li

 Apriori is the classic algorithm for frequent item set mining in a 
 transactional data set.  It will be useful if Apriori algorithm is added to 
 MLLib in Spark



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4001) Add Apriori algorithm to Spark MLlib

2014-10-20 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14177801#comment-14177801
 ] 

Xiangrui Meng commented on SPARK-4001:
--

Thanks [~srowen]! That's helpful! I searched on Google and found some 
discussion in scikit-learn:

1. https://github.com/scikit-learn/scikit-learn/issues/2662
2. https://github.com/scikit-learn/scikit-learn/issues/2872

They didn't accept the patch for couple reasons: a) relevance, b) integration 
with other components.

[~jackylk] Could you comment on the real use cases that you are aware of and 
the complexity of the algorithm?

 Add Apriori algorithm to Spark MLlib
 

 Key: SPARK-4001
 URL: https://issues.apache.org/jira/browse/SPARK-4001
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Jacky Li
Assignee: Jacky Li

 Apriori is the classic algorithm for frequent item set mining in a 
 transactional data set.  It will be useful if Apriori algorithm is added to 
 MLLib in Spark



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4001) Add Apriori algorithm to Spark MLlib

2014-10-19 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176262#comment-14176262
 ] 

Apache Spark commented on SPARK-4001:
-

User 'jackylk' has created a pull request for this issue:
https://github.com/apache/spark/pull/2847

 Add Apriori algorithm to Spark MLlib
 

 Key: SPARK-4001
 URL: https://issues.apache.org/jira/browse/SPARK-4001
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Affects Versions: 1.1.0
Reporter: Jacky Li

 Apriori is the classic algorithm for frequent item set mining in a 
 transactional data set.  It will be useful if Apriori algorithm is added to 
 MLLib in Spark



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org