[GitHub] spark issue #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-processing ...

2017-04-13 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/17575
  
Merged to master


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-processing ...

2017-04-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17575
  
**[Test build #3661 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3661/testReport)**
 for PR 17575 at commit 
[`d799d46`](https://github.com/apache/spark/commit/d799d460e215c017b4385e8ecbbca8b92128096a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-processing ...

2017-04-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17575
  
**[Test build #3661 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3661/testReport)**
 for PR 17575 at commit 
[`d799d46`](https://github.com/apache/spark/commit/d799d460e215c017b4385e8ecbbca8b92128096a).



[GitHub] spark issue #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-processing ...

2017-04-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17575
  
**[Test build #3658 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3658/testReport)**
 for PR 17575 at commit 
[`627bfe0`](https://github.com/apache/spark/commit/627bfe0b04180a9c7248df6a3af519f15261faa6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-processing ...

2017-04-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17575
  
**[Test build #3658 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3658/testReport)**
 for PR 17575 at commit 
[`627bfe0`](https://github.com/apache/spark/commit/627bfe0b04180a9c7248df6a3af519f15261faa6).



[GitHub] spark issue #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-processing ...

2017-04-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17575
  
**[Test build #3651 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3651/testReport)**
 for PR 17575 at commit 
[`8e5db6a`](https://github.com/apache/spark/commit/8e5db6af95545121d379dbca83bedc23cbd5e6c0).



[GitHub] spark issue #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-processing ...

2017-04-08 Thread Syrux
Github user Syrux commented on the issue:

https://github.com/apache/spark/pull/17575
  
Yo Sean, I already pushed the requested changes here, in case this is the
correct place to do so.
(I can just revert them if not.)

I added two new methods to make testing possible: first, a method that
finds all frequent items in a database; second, a method that actually cleans
the database using those frequent items. Although I didn't end up using the
first method, splitting it out makes the pre-processing much easier to
understand, so I left it in. Just tell me if I need to put that piece of code
back.

I also added tests for several kinds of sequence databases: one where each
itemset holds at most one item, one where itemsets can hold multiple items,
and one where cleaning empties the database entirely. Together they should
cover all cases.
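A minimal Python sketch of the split described above, written outside Spark purely for illustration. It assumes the 0-delimited internal format used by MLlib's PrefixSpan (frequent items mapped to 1-based integer ids, a `0` before the first itemset and after each kept itemset). The names `find_frequent_items`, `to_internal_repr` and `clean` are hypothetical stand-ins rather than the methods in the patch, plain lists stand in for RDDs, and ties in item frequency are broken by `repr` only to keep the sketch deterministic.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence

# A database is a list of sequences; a sequence is a list of itemsets.
Database = Sequence[Sequence[Sequence[Hashable]]]


def find_frequent_items(db: Database, min_count: int) -> List[Hashable]:
    """Return items contained in at least `min_count` sequences, most frequent first."""
    support: Counter = Counter()
    for seq in db:
        # Each item counts at most once per sequence (sequence support).
        support.update({item for itemset in seq for item in itemset})
    ranked = sorted(support.items(), key=lambda kv: (-kv[1], repr(kv[0])))
    return [item for item, count in ranked if count >= min_count]


def to_internal_repr(db: Database, item_to_int: Dict[Hashable, int]) -> List[List[int]]:
    """Drop infrequent items and encode each sequence as 0-delimited ints.

    Frequent items become item_to_int[item] + 1 (1-based), items inside an
    itemset are sorted, and a 0 closes only itemsets that kept an item.
    Sequences left with no frequent item are dropped from the database.
    """
    cleaned: List[List[int]] = []
    for seq in db:
        encoded = [0]
        for itemset in seq:
            ids = sorted(item_to_int[i] + 1 for i in itemset if i in item_to_int)
            if ids:  # an emptied itemset adds no delimiter
                encoded.extend(ids)
                encoded.append(0)
        if len(encoded) > 1:  # keep the sequence only if something survived
            cleaned.append(encoded)
    return cleaned


def clean(db: Database, min_count: int) -> List[List[int]]:
    """Chain both steps: frequent items -> integer ids -> cleaned database."""
    frequent = find_frequent_items(db, min_count)
    item_to_int = {item: idx for idx, item in enumerate(frequent)}
    return to_internal_repr(db, item_to_int)
```

For instance, `clean([[[1], [2], [3]], [[1], [4], [3]], [[2], [3]]], 2)` drops the infrequent item 4 together with its now-empty itemset, without leaving a doubled `0`, and returns `[[0, 2, 0, 3, 0, 1, 0], [0, 2, 0, 1, 0], [0, 3, 0, 1, 0]]`; raising `min_count` to 4 empties the database and returns `[]`. Keeping the two steps separate is what lets a test feed a hand-built `item_to_int` straight into the cleaning step.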

Of course, the new implementation passes the tests and the old one
doesn't.
Everything else remains as it was.

Tell me if the way I did it was ok. I hope it's up to standards :)



[GitHub] spark issue #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-processing ...

2017-04-08 Thread Syrux
Github user Syrux commented on the issue:

https://github.com/apache/spark/pull/17575
  
OK, should I create a new JIRA and push the additional tests there?
Or is this PR completely fine, since they are related to the current change?

Tell me, and I will get the change done asap :)



[GitHub] spark issue #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-processing ...

2017-04-08 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/17575
  
Even a simplistic test of this case would give a lot more confidence that 
it's correct. If it means opening up a `private[spark]` method or two to make 
testing possible, that seems reasonable. I don't think it needs significant 
change. Something needs to exercise this code path.



[GitHub] spark issue #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-processing ...

2017-04-08 Thread Syrux
Github user Syrux commented on the issue:

https://github.com/apache/spark/pull/17575
  
Yes, exactly: the current implementation adds too many unnecessary
delimiters. With this one-line change, delimiters are only placed where needed.
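A minimal Python sketch (outside Spark) contrasting the two delimiter placements, assuming the same 0-delimited encoding with 1-based item ids. `encode_sequence`, its `legacy` flag and `collapse_delimiters` are hypothetical helpers written only for this comparison; in the sketch, the whole fix is moving the `append(0)` under the "itemset kept something" condition.

```python
from typing import Dict, Hashable, List, Sequence


def encode_sequence(
    seq: Sequence[Sequence[Hashable]],
    item_to_int: Dict[Hashable, int],
    legacy: bool = False,
) -> List[int]:
    """Encode one sequence as 0-delimited, 1-based item ids.

    legacy=True appends a 0 after every itemset, even one the frequency
    filter emptied (the behaviour discussed in this thread); legacy=False
    appends it only after itemsets that kept at least one item.
    """
    encoded = [0]
    for itemset in seq:
        ids = sorted(item_to_int[i] + 1 for i in itemset if i in item_to_int)
        encoded.extend(ids)
        if ids or legacy:
            encoded.append(0)
    return encoded


def collapse_delimiters(encoded: List[int]) -> List[int]:
    """Remove runs of consecutive 0s, keeping a single delimiter each time."""
    out: List[int] = []
    for x in encoded:
        if not (x == 0 and out and out[-1] == 0):
            out.append(x)
    return out


item_to_int = {"a": 0, "c": 1}  # "b" is assumed infrequent, so it has no id
seq = [["a"], ["b"], ["b"], ["c"]]
print(encode_sequence(seq, item_to_int, legacy=True))  # [0, 1, 0, 0, 0, 2, 0]
print(encode_sequence(seq, item_to_int))               # [0, 1, 0, 2, 0]
```

Collapsing the repeated `0`s in the legacy output gives exactly the new output, so in this sketch the two encodings differ only in redundant delimiters: the arrays get shorter while the itemsets they describe stay the same.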

Currently there are no tests that verify whether the algorithm cleans the
sequences correctly. I only found that inefficiency by printing intermediate
values while I implemented other things in my local GitHub checkout.

If you want, I can add some tests, but that will require a small
refactor to move the cleaning part into its own method. Calling the current
method would directly run the main algorithm ... ^^'

Two of the existing tests did cover cases where runs of zeros were
left, but not at pertinent places (in the "Integer type, variable-size
itemsets" and "String type, variable-size itemsets" tests, cleaning removes a
five at the end of the third sequence, leaving two zeros instead of one).

I can, however, vouch that the previous code worked just fine: the old
implementation and this one produce the same results, and those results also
match the ones I obtained from a separate, standalone CP-based
implementation. This change just makes the pre-processing more
efficient.



[GitHub] spark issue #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-processing ...

2017-04-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17575
  
Can one of the admins verify this patch?

