[jira] [Commented] (SPARK-24630) SPIP: Support SQLStreaming in Spark

2018-12-26 Thread Jacky Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16729261#comment-16729261
 ] 

Jacky Li commented on SPARK-24630:
--

Actually I encountered this scenario earlier, so we have implemented some 
commands for using Spark Streaming (or Structured Streaming) on Apache CarbonData. 
You can refer to the StreamSQL section in this 
[doc|http://carbondata.apache.org/streaming-guide.html] for more detail.

It would be good if the Spark community could use a similar or the same syntax; 
if so, a future version of CarbonData could migrate to Spark's syntax.
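
For reference, a rough sketch of what these CarbonData StreamSQL commands look like 
when issued through a Carbon-enabled Spark session (names like carbonSession, 
ingest_job, and the property keys are placeholders; the exact keywords are defined 
in the linked streaming guide, so treat the statements below as approximate rather 
than authoritative):

{code:scala}
// Approximate StreamSQL usage: a named stream is created on a source/sink table
// pair and runs as a continuous Structured Streaming job.
carbonSession.sql(
  """CREATE STREAM ingest_job ON TABLE sink_table
    |STMPROPERTIES('trigger' = 'ProcessingTime', 'interval' = '10 seconds')
    |AS SELECT * FROM source_table""".stripMargin)

carbonSession.sql("SHOW STREAMS ON TABLE sink_table").show()
carbonSession.sql("DROP STREAM ingest_job")
{code}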


> SPIP: Support SQLStreaming in Spark
> ---
>
> Key: SPARK-24630
> URL: https://issues.apache.org/jira/browse/SPARK-24630
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.2.0, 2.2.1
>Reporter: Jackey Lee
>Priority: Minor
>  Labels: SQLStreaming
> Attachments: SQLStreaming SPIP V2.pdf
>
>
> At present, KafkaSQL, Flink SQL (which is actually based on Calcite), 
> SQLStream, and StormSQL all provide a streaming SQL interface, with which users 
> with little knowledge of streaming can easily develop a stream processing 
> model. In Spark, we can also support a SQL API based on Structured Streaming.
> To support SQL Streaming, there are two key points: 
> 1. The parser should be able to parse streaming-type SQL. 
> 2. The Analyzer should be able to map metadata information to the corresponding 
> Relation. 






[jira] [Commented] (SPARK-24630) SPIP: Support SQLStreaming in Spark

2018-11-27 Thread Jacky Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701444#comment-16701444
 ] 

Jacky Li commented on SPARK-24630:
--

Besides the CREATE TABLE/STREAM statements that create the source and the sink, is 
there any syntax to manipulate the streaming job, like starting and stopping it? 
If I understand correctly, the INSERT statement is currently proposed to kick off 
the Structured Streaming job. Since this streaming job is continuous, I am 
wondering whether there is a way to show/desc/stop it.
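
For comparison, a minimal sketch of how the start/show/stop lifecycle looks today 
through the Structured Streaming programmatic API (the streaming DataFrame 
`events` and the paths are placeholders); presumably the proposed SQL syntax would 
need equivalents of these operations:

{code:scala}
// Start a continuous query (what the proposed INSERT statement would kick off)
val query = events.writeStream
  .format("parquet")
  .option("path", "/tmp/sqlstreaming/out")
  .option("checkpointLocation", "/tmp/sqlstreaming/ckpt")
  .start()

// "show": list the active queries on this session
spark.streams.active.foreach(q => println(s"${q.id}  ${q.name}"))

// "desc": inspect a query's current status and progress
println(query.status)

// "stop": terminate the continuous job
query.stop()
{code}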


> SPIP: Support SQLStreaming in Spark
> ---
>
> Key: SPARK-24630
> URL: https://issues.apache.org/jira/browse/SPARK-24630
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.2.0, 2.2.1
>Reporter: Jackey Lee
>Priority: Minor
>  Labels: SQLStreaming
> Attachments: SQLStreaming SPIP.pdf
>
>
> At present, KafkaSQL, Flink SQL (which is actually based on Calcite), 
> SQLStream, and StormSQL all provide a streaming SQL interface, with which users 
> with little knowledge of streaming can easily develop a stream processing 
> model. In Spark, we can also support a SQL API based on Structured Streaming.
> To support SQL Streaming, there are two key points: 
> 1. The parser should be able to parse streaming-type SQL. 
> 2. The Analyzer should be able to map metadata information to the corresponding 
> Relation. 






[jira] [Updated] (SPARK-16316) dataframe except API returning wrong result in spark 1.5.0

2016-06-29 Thread Jacky Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li updated SPARK-16316:
-
Description: 
Version: Spark 1.5.0
Use case: use the except API to subtract one DataFrame from another

scala> val dfa = sc.parallelize(1 to 100).map(x => (x, x)).toDF("i", "j")
dfa: org.apache.spark.sql.DataFrame = [i: int, j: int]

scala> val dfb = sc.parallelize(1 to 10).map(x => (x, x)).toDF("i", "j")
dfb: org.apache.spark.sql.DataFrame = [i: int, j: int]

scala> dfa.except(dfb).count
res13: Long = 0

It should return 90 instead of 0

While the following statement works fine:
scala> dfa.except(dfb).rdd.count
res13: Long = 90

I guess the bug may be somewhere in DataFrame.count.


  was:
Version: Spark 1.5.0
Use case: use the except API to subtract one DataFrame from another

scala> val dfa = sc.parallelize(1 to 100).map(x => (x, x)).toDF("i", "j")
dfa: org.apache.spark.sql.DataFrame = [i: int, j: int]

scala> val dfb = sc.parallelize(1 to 10).map(x => (x, x)).toDF("i", "j")
dfb: org.apache.spark.sql.DataFrame = [i: int, j: int]

scala> dfa.except(dfb).count
res13: Long = 0

It should return 90 instead of 0

While the following statement works fine:
scala> dfa.except(dfb).rdd.count
res13: Long = 0

I guess the bug may be somewhere in DataFrame.count.



> dataframe except API returning wrong result in spark 1.5.0
> --
>
> Key: SPARK-16316
> URL: https://issues.apache.org/jira/browse/SPARK-16316
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Jacky Li
>
> Version: Spark 1.5.0
> Use case: use the except API to subtract one DataFrame from another
> scala> val dfa = sc.parallelize(1 to 100).map(x => (x, x)).toDF("i", "j")
> dfa: org.apache.spark.sql.DataFrame = [i: int, j: int]
> scala> val dfb = sc.parallelize(1 to 10).map(x => (x, x)).toDF("i", "j")
> dfb: org.apache.spark.sql.DataFrame = [i: int, j: int]
> scala> dfa.except(dfb).count
> res13: Long = 0
> It should return 90 instead of 0
> While the following statement works fine:
> scala> dfa.except(dfb).rdd.count
> res13: Long = 90
> I guess the bug may be somewhere in DataFrame.count.






[jira] [Updated] (SPARK-16316) dataframe except API returning wrong result in spark 1.5.0

2016-06-29 Thread Jacky Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li updated SPARK-16316:
-
Description: 
Version: Spark 1.5.0
Use case: use the except API to subtract one DataFrame from another

scala> val dfa = sc.parallelize(1 to 100).map(x => (x, x)).toDF("i", "j")
dfa: org.apache.spark.sql.DataFrame = [i: int, j: int]

scala> val dfb = sc.parallelize(1 to 10).map(x => (x, x)).toDF("i", "j")
dfb: org.apache.spark.sql.DataFrame = [i: int, j: int]

scala> dfa.except(dfb).count
res13: Long = 0

It should return 90 instead of 0

While the following statement works fine:
scala> dfa.except(dfb).rdd.count
res13: Long = 0

I guess the bug may be somewhere in DataFrame.count.


  was:
Version: Spark 1.5.0
Use case: use the except API to subtract one DataFrame from another

scala> val dfa = sc.parallelize(1 to 100).map(x => (x, x)).toDF("i", "j")
dfa: org.apache.spark.sql.DataFrame = [i: int, j: int]

scala> val dfb = sc.parallelize(1 to 10).map(x => (x, x)).toDF("i", "j")
dfb: org.apache.spark.sql.DataFrame = [i: int, j: int]

scala> dfa.except(dfb).count
res13: Long = 0

It should return 90 instead of 0



> dataframe except API returning wrong result in spark 1.5.0
> --
>
> Key: SPARK-16316
> URL: https://issues.apache.org/jira/browse/SPARK-16316
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Jacky Li
>
> Version: Spark 1.5.0
> Use case: use the except API to subtract one DataFrame from another
> scala> val dfa = sc.parallelize(1 to 100).map(x => (x, x)).toDF("i", "j")
> dfa: org.apache.spark.sql.DataFrame = [i: int, j: int]
> scala> val dfb = sc.parallelize(1 to 10).map(x => (x, x)).toDF("i", "j")
> dfb: org.apache.spark.sql.DataFrame = [i: int, j: int]
> scala> dfa.except(dfb).count
> res13: Long = 0
> It should return 90 instead of 0
> While the following statement works fine:
> scala> dfa.except(dfb).rdd.count
> res13: Long = 0
> I guess the bug may be somewhere in DataFrame.count.






[jira] [Created] (SPARK-16316) dataframe except API returning wrong result in spark 1.5.0

2016-06-29 Thread Jacky Li (JIRA)
Jacky Li created SPARK-16316:


 Summary: dataframe except API returning wrong result in spark 1.5.0
 Key: SPARK-16316
 URL: https://issues.apache.org/jira/browse/SPARK-16316
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.5.0
Reporter: Jacky Li


Version: Spark 1.5.0
Use case: use the except API to subtract one DataFrame from another

scala> val dfa = sc.parallelize(1 to 100).map(x => (x, x)).toDF("i", "j")
dfa: org.apache.spark.sql.DataFrame = [i: int, j: int]

scala> val dfb = sc.parallelize(1 to 10).map(x => (x, x)).toDF("i", "j")
dfb: org.apache.spark.sql.DataFrame = [i: int, j: int]

scala> dfa.except(dfb).count
res13: Long = 0

It should return 90 instead of 0
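
A quick diagnostic sketch (not part of the original report): comparing the 
DataFrame count with the count computed through the RDD, and printing the plan, 
narrows the problem down to the DataFrame-side count:

{code:scala}
val diff = dfa.except(dfb)
diff.explain(true)   // inspect the logical/physical plan behind the wrong count
diff.count           // 0 on Spark 1.5.0 (the bug)
diff.rdd.count       // the expected 90, as noted in the updated description above
{code}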







[jira] [Created] (SPARK-6007) Add numRows param in DataFrame.show

2015-02-25 Thread Jacky Li (JIRA)
Jacky Li created SPARK-6007:
---

 Summary: Add numRows param in DataFrame.show
 Key: SPARK-6007
 URL: https://issues.apache.org/jira/browse/SPARK-6007
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.2.1
Reporter: Jacky Li
Priority: Minor
 Fix For: 1.3.0


Currently, DataFrame.show only displays 20 rows. It would be useful if the user 
could decide how many rows to show by passing the number as a parameter to show().
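
A minimal sketch of the proposed usage (per this ticket's Fix Version, the numRows 
parameter landed in 1.3.0):

{code:scala}
val df = sc.parallelize(1 to 100).map(x => (x, x)).toDF("i", "j")
df.show()    // current behaviour: prints only the first 20 rows
df.show(5)   // proposed: the caller chooses how many rows to print
{code}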






[jira] [Created] (SPARK-5939) Make FPGrowth example app take parameters

2015-02-21 Thread Jacky Li (JIRA)
Jacky Li created SPARK-5939:
---

 Summary: Make FPGrowth example app take parameters
 Key: SPARK-5939
 URL: https://issues.apache.org/jira/browse/SPARK-5939
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 1.2.1
Reporter: Jacky Li
Priority: Minor
 Fix For: 1.3.0









[jira] [Created] (SPARK-4787) Resource unreleased during failure in SparkContext initialization

2014-12-08 Thread Jacky Li (JIRA)
Jacky Li created SPARK-4787:
---

 Summary: Resource unreleased during failure in SparkContext 
initialization
 Key: SPARK-4787
 URL: https://issues.apache.org/jira/browse/SPARK-4787
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.1.0
Reporter: Jacky Li
 Fix For: 1.3.0


When a client creates a SparkContext, many vals are initialized during object 
construction. But when initializing one of these vals fails, for example by 
throwing an exception, the resources held by this SparkContext are not released 
properly.
For example, the SparkUI object is created and bound to the HTTP server during 
initialization using
{{ui.foreach(_.bind())}}
but if anything goes wrong after this code (say an exception is thrown while 
creating the DAGScheduler), the SparkUI server is not stopped, so the port bind 
will fail again when the client creates another SparkContext. Basically this 
leads to a situation where the client cannot create another SparkContext in the 
same process, which I think is not reasonable.

So I suggest refactoring the SparkContext code to release resources when there is 
a failure during initialization.
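
A self-contained sketch of the suggested pattern (placeholder types, not the 
actual SparkContext internals): release whatever was already acquired if a later 
initialization step throws.

{code:scala}
class SafeInit {
  private var server: Option[java.net.ServerSocket] = None

  def init(): Unit =
    try {
      server = Some(new java.net.ServerSocket(0)) // acquire a resource, like SparkUI binding its port
      riskyStep()                                  // a later step that may throw, like DAGScheduler creation
    } catch {
      case e: Throwable =>
        server.foreach(_.close())                  // release it, so the port can be reused by the next attempt
        server = None
        throw e
    }

  private def riskyStep(): Unit = throw new RuntimeException("initialization failure")
}
{code}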
 






[jira] [Updated] (SPARK-4001) Add Apriori algorithm to Spark MLlib

2014-12-08 Thread Jacky Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li updated SPARK-4001:

Attachment: Distributed frequent item mining algorithm based on Spark.pptx

[~mengxr] please check the attached file. We have tested it using an open data 
set from http://fimi.ua.ac.be/data/. Currently our test cluster is small (4 
nodes); we will test on a larger cluster later, if required.

> Add Apriori algorithm to Spark MLlib
> 
>
> Key: SPARK-4001
> URL: https://issues.apache.org/jira/browse/SPARK-4001
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Jacky Li
>Assignee: Jacky Li
> Attachments: Distributed frequent item mining algorithm based on 
> Spark.pptx
>
>
> Apriori is the classic algorithm for frequent itemset mining in a 
> transactional data set. It will be useful if the Apriori algorithm is added to 
> MLlib in Spark.






[jira] [Commented] (SPARK-4001) Add Apriori algorithm to Spark MLlib

2014-12-05 Thread Jacky Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236501#comment-14236501
 ] 

Jacky Li commented on SPARK-4001:
-

Sure, Xiangrui. I will update it next Monday when I am in the office.

> Add Apriori algorithm to Spark MLlib
> 
>
> Key: SPARK-4001
> URL: https://issues.apache.org/jira/browse/SPARK-4001
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Jacky Li
>Assignee: Jacky Li
>
> Apriori is the classic algorithm for frequent itemset mining in a 
> transactional data set. It will be useful if the Apriori algorithm is added to 
> MLlib in Spark.






[jira] [Created] (SPARK-4699) Make caseSensitive configurable in Analyzer.scala

2014-12-02 Thread Jacky Li (JIRA)
Jacky Li created SPARK-4699:
---

 Summary: Make caseSensitive configurable in Analyzer.scala
 Key: SPARK-4699
 URL: https://issues.apache.org/jira/browse/SPARK-4699
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.1.0
Reporter: Jacky Li
 Fix For: 1.2.0


Currently, case sensitivity is enabled by default in the Analyzer. It should be 
configurable by setting SQLConf in the client application.
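
A sketch of the intended usage; the spark.sql.caseSensitive key is the name this 
setting eventually got in Spark SQL, so treat the exact key as an assumption in 
the context of this ticket:

{code:scala}
// The client application chooses case sensitivity instead of relying on the hard-coded default
sqlContext.setConf("spark.sql.caseSensitive", "false")
sqlContext.sql("SELECT MyCol FROM t")   // with case sensitivity off, mycol/MyCol/MYCOL resolve alike
{code}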






[jira] [Created] (SPARK-4639) Pass maxIterations in as a parameter in Analyzer

2014-11-27 Thread Jacky Li (JIRA)
Jacky Li created SPARK-4639:
---

 Summary: Pass maxIterations in as a parameter in Analyzer
 Key: SPARK-4639
 URL: https://issues.apache.org/jira/browse/SPARK-4639
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.1.0
Reporter: Jacky Li
Priority: Minor
 Fix For: 1.3.0


fix a TODO in Analyzer: 
// TODO: pass this in as a parameter
 val fixedPoint = FixedPoint(100)
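
An illustrative sketch of the requested change (the class shape is simplified, not 
the real Analyzer signature): the iteration limit becomes a constructor parameter, 
with the old hard-coded value kept as the default.

{code:scala}
// Simplified stand-in for org.apache.spark.sql.catalyst.analysis.Analyzer
class Analyzer(maxIterations: Int = 100) {
  case class FixedPoint(maxIters: Int)
  // was: val fixedPoint = FixedPoint(100)  // TODO: pass this in as a parameter
  val fixedPoint = FixedPoint(maxIterations)
}
{code}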






[jira] [Commented] (SPARK-4001) Add Apriori algorithm to Spark MLlib

2014-11-26 Thread Jacky Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226562#comment-14226562
 ] 

Jacky Li commented on SPARK-4001:
-

Thanks for your suggestion, Daniel. 
Here is the current status.
1. Currently I have implemented Apriori and FP-Growth by referring to YAFIM 
(http://pasa-bigdata.nju.edu.cn/people/ronggu/pub/YAFIM_ParLearning.pdf) and 
PFP (http://dl.acm.org/citation.cfm?id=1454027). 
For Apriori, there are currently two versions implemented, one using a broadcast 
variable and another using a cartesian join of two RDDs. I am testing them on 
the mushroom and webdoc open datasets (http://fimi.ua.ac.be/data/) to compare 
their performance before deciding which one to contribute to MLlib.
I have updated the code in the PR (https://github.com/apache/spark/pull/2847); 
you are welcome to check it and try it in your use case.
2. For the input part, the Apriori algorithm currently takes 
{{RDD\[Array\[String\]\]}} as the input dataset, not containing basket_id 
or user_id (see the sketch below). I am not sure whether it can easily fit your 
use case. Can you give more detail on how you want to use it in collaborative 
filtering contexts? 
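
For illustration, a hypothetical sketch of that input format and call shape (the 
actual entry point and parameters are in the linked PR):

{code:scala}
import org.apache.spark.rdd.RDD

// One Array[String] per basket; no basket_id or user_id, as described above
val transactions: RDD[Array[String]] = sc.parallelize(Seq(
  Array("milk", "bread"),
  Array("milk", "bread", "butter"),
  Array("beer", "bread")
))

// Hypothetical call shape only -- see the PR for the real API
// val frequentItemsets = Apriori.run(transactions, minSupport = 0.5)
{code}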


> Add Apriori algorithm to Spark MLlib
> 
>
> Key: SPARK-4001
> URL: https://issues.apache.org/jira/browse/SPARK-4001
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Jacky Li
>Assignee: Jacky Li
>
> Apriori is the classic algorithm for frequent itemset mining in a 
> transactional data set. It will be useful if the Apriori algorithm is added to 
> MLlib in Spark.






[jira] [Created] (SPARK-4269) Make wait time in BroadcastHashJoin configurable

2014-11-06 Thread Jacky Li (JIRA)
Jacky Li created SPARK-4269:
---

 Summary: Make wait time in BroadcastHashJoin configurable
 Key: SPARK-4269
 URL: https://issues.apache.org/jira/browse/SPARK-4269
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
Reporter: Jacky Li
 Fix For: 1.2.0


In BroadcastHashJoin, a hard-coded value (5 minutes) is currently used to wait for 
the execution and broadcast of the small table.

In my opinion, it should be a configurable value, since the broadcast may exceed 5 
minutes in some cases, for example in a busy/congested network environment.
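
A sketch of how a configurable wait time could look from the client side; the key 
name matches the spark.sql.broadcastTimeout setting that later Spark releases 
expose, so treat it as an assumption in the context of this ticket:

{code:scala}
// Allow a longer broadcast wait (in seconds) on a congested network,
// instead of the hard-coded 5 minutes
sqlContext.setConf("spark.sql.broadcastTimeout", "600")
{code}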






[jira] [Commented] (SPARK-4001) Add Apriori algorithm to Spark MLlib

2014-10-21 Thread Jacky Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178200#comment-14178200
 ] 

Jacky Li commented on SPARK-4001:
-


I accidentally used wangfei's (my colleague's) account to send the last comment; 
sorry for the inconvenience. 

Thanks @Sean Owen for explaining! Frequent itemset algorithms work by scanning 
the input data set; there is no probabilistic model involved. 

To answer @Xiangrui Meng's earlier questions:
1. These algorithms are used for finding major patterns / association rules 
in a data set. For a real use case, some analytic applications in the telecom 
domain use them to find subscriber behavior in a data set combining service 
records, network traffic records, and demographic data. Please refer to this 
Chinese article for an example: http://www.ss-lw.com/wxxw-361.html
And sometimes we use frequent itemset algorithms to prepare feature input 
for other algorithms that select features and do other ML tasks, such as training 
a classifier, as in this paper: http://dl.acm.org/citation.cfm?id=1401922

2. Since Apriori is a basic algorithm for frequent itemset mining, I am 
not aware of any parallel implementation of it. But I think the algorithm fits 
Spark's data-parallel model, since it only needs to scan the input data set. 
For FP-Growth, I do know there is a Parallel FP-Growth paper from Haoyuan Li: 
http://dl.acm.org/citation.cfm?id=1454027 . I will probably refer to this paper 
when implementing FP-Growth in Spark. 

3. The computational complexity of Apriori is about O(N*k), where N is the 
number of items in the input data and k is the depth of the frequent-item tree to 
search. FP-Growth complexity is about O(N), so it is more efficient than Apriori. 
For space efficiency, FP-Growth is also more efficient than Apriori. 
But for smaller data, and when there are many frequent itemsets, Apriori is more 
efficient, because FP-Growth needs to construct an FP-Tree out of the input data 
set, which takes some time. Another advantage of Apriori is that it can output 
association rules while FP-Growth cannot.

Although these two algorithms are basic (FP-Growth is the more complex one), I 
think it will be handy if MLlib can include them, since there is no frequent 
itemset mining algorithm in Spark yet, especially for distributed environments. 
Please suggest how to handle this issue. Thanks a lot. 

> Add Apriori algorithm to Spark MLlib
> 
>
> Key: SPARK-4001
> URL: https://issues.apache.org/jira/browse/SPARK-4001
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Jacky Li
>Assignee: Jacky Li
>
> Apriori is the classic algorithm for frequent itemset mining in a 
> transactional data set. It will be useful if the Apriori algorithm is added to 
> MLlib in Spark.






[jira] [Commented] (SPARK-4001) Add Apriori algorithm to Spark MLlib

2014-10-20 Thread Jacky Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177643#comment-14177643
 ] 

Jacky Li commented on SPARK-4001:
-

Maybe there is a misunderstanding; I do not mean using it in a database when I say 
"transactional data set". 
Apriori is the classic algorithm for frequent itemset mining, and the frequent 
item sets determined by Apriori can be used to derive association rules which 
highlight general trends in the data set. This has applications in domains such 
as market basket analysis for cross-selling and up-selling.
 
I am not sure whether there is a frequent itemset algorithm in MLlib, so I am 
actually implementing Apriori and FP-Growth (the latter not included in this JIRA 
thread). Would it be useful to contribute them to MLlib?

> Add Apriori algorithm to Spark MLlib
> 
>
> Key: SPARK-4001
> URL: https://issues.apache.org/jira/browse/SPARK-4001
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Jacky Li
>Assignee: Jacky Li
>
> Apriori is the classic algorithm for frequent itemset mining in a 
> transactional data set. It will be useful if the Apriori algorithm is added to 
> MLlib in Spark.






[jira] [Created] (SPARK-4001) Add Apriori algorithm to Spark MLlib

2014-10-19 Thread Jacky Li (JIRA)
Jacky Li created SPARK-4001:
---

 Summary: Add Apriori algorithm to Spark MLlib
 Key: SPARK-4001
 URL: https://issues.apache.org/jira/browse/SPARK-4001
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Affects Versions: 1.1.0
Reporter: Jacky Li


Apriori is the classic algorithm for frequent itemset mining in a 
transactional data set. It will be useful if the Apriori algorithm is added to 
MLlib in Spark.


