[GitHub] spark issue #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-processing ...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/17575 Merged to master --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-processing ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17575 **[Test build #3661 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3661/testReport)** for PR 17575 at commit [`d799d46`](https://github.com/apache/spark/commit/d799d460e215c017b4385e8ecbbca8b92128096a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-processing ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17575 **[Test build #3661 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3661/testReport)** for PR 17575 at commit [`d799d46`](https://github.com/apache/spark/commit/d799d460e215c017b4385e8ecbbca8b92128096a).
[GitHub] spark issue #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-processing ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17575 **[Test build #3658 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3658/testReport)** for PR 17575 at commit [`627bfe0`](https://github.com/apache/spark/commit/627bfe0b04180a9c7248df6a3af519f15261faa6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-processing ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17575 **[Test build #3658 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3658/testReport)** for PR 17575 at commit [`627bfe0`](https://github.com/apache/spark/commit/627bfe0b04180a9c7248df6a3af519f15261faa6).
[GitHub] spark issue #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-processing ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17575 **[Test build #3651 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3651/testReport)** for PR 17575 at commit [`8e5db6a`](https://github.com/apache/spark/commit/8e5db6af95545121d379dbca83bedc23cbd5e6c0).
[GitHub] spark issue #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-processing ...
Github user Syrux commented on the issue: https://github.com/apache/spark/pull/17575 Yo Sean, I've already pushed the requested changes, in case this is the right place for them (I can just revert them if not). I added two new methods to make testing possible: first, a method that finds all frequent items in a database; second, a method that actually cleans the database using those frequent items. Although I didn't end up using the first method, the pre-processing method is now much clearer to understand, so I left the new method in. Just tell me if I need to put that piece of code back. I also added tests for several kinds of sequence databases: when there is at most one item per itemset, when there can be multiple items per itemset, and when cleaning empties the database. Together they should cover all cases. Of course, the new implementation passes the tests, and the old one doesn't. Everything else remains as is. Tell me if the way I did it was OK. I hope it's up to standards :)
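A minimal sketch of the split described in this comment, assuming plain Scala collections instead of Spark RDDs. `frequentItems` and `cleanDatabase` are hypothetical names, not the methods actually added in the PR. The sketch also assumes items are positive `Int`s, that an item's support is the number of sequences containing it, and that a cleaned sequence is a flat array with `0` as the itemset delimiter (leading and after each itemset), emitted only after non-empty itemsets. Sequences left with no frequent items are dropped, which is the "cleaning empties the database" case.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, collection-based sketch of the two helpers described above.
// Assumptions: items are positive Ints, 0 is reserved as the itemset delimiter,
// and a database is a Seq of sequences, each a Seq of itemsets (Seq[Int]).
object PrefixSpanCleaningSketch {

  // Items whose support (number of sequences containing them) reaches minCount.
  def frequentItems(db: Seq[Seq[Seq[Int]]], minCount: Long): Set[Int] =
    db.flatMap(sequence => sequence.flatten.distinct) // count each item once per sequence
      .groupBy(identity)
      .filter { case (_, occurrences) => occurrences.size >= minCount }
      .keySet

  // Keep only frequent items (sorted within each itemset), emit a delimiter only
  // after a non-empty itemset, and drop sequences with no frequent items at all.
  def cleanDatabase(db: Seq[Seq[Seq[Int]]], frequent: Set[Int]): Seq[Array[Int]] =
    db.flatMap { sequence =>
      val out = ArrayBuffer(0) // leading delimiter
      sequence.foreach { itemset =>
        val kept = itemset.filter(frequent.contains).sorted
        if (kept.nonEmpty) {
          out ++= kept
          out += 0
        }
      }
      // Only the leading delimiter left means nothing frequent survived.
      if (out.length > 1) Some(out.toArray) else None
    }
}
```

Splitting "find" from "clean" this way lets a test feed a hand-picked set of frequent items straight into the cleaning step, without running the mining algorithm itself.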
[GitHub] spark issue #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-processing ...
Github user Syrux commented on the issue: https://github.com/apache/spark/pull/17575 OK, should I create a new JIRA and push the additional tests there? Or is this PR fine, since the tests are related to the current change? Tell me, and I will get the change done ASAP :)
[GitHub] spark issue #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-processing ...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/17575 Even a simplistic test of this case would give a lot more confidence that it's correct. If that means opening up a `private[spark]` method or two to make testing possible, that seems reasonable. I don't think it needs a significant change. Something needs to exercise this code path.
[GitHub] spark issue #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-processing ...
Github user Syrux commented on the issue: https://github.com/apache/spark/pull/17575 Yes, exactly: the current implementation adds too many unnecessary delimiters. With this one-line change, delimiters are only placed where needed. Currently there are no tests that verify whether the algorithm cleans the sequences correctly. I only found that inefficiency by printing things out while implementing other features in my local copy. If you want, I can add some tests, but that will require a small refactor to move the cleaning step into its own method; calling the current method directly runs the main algorithm ... ^^' Two of the existing tests did cover cases where sequences of zeros were left, but not at pertinent places (in the Integer/String type, variable-size itemsets tests, a 5 is cleaned from the end of the third sequence, leaving two zeros instead of one). However, I can vouch that the previous code worked just fine: the old implementation and this one produce the same results, which also match the results I obtained from another standalone CP-based implementation. This change just makes the pre-processing more efficient.
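To make the delimiter issue concrete, here is a hedged sketch under stated assumptions: a sequence of itemsets is encoded as a flat `Int` array with `0` separating itemsets, items are positive `Int`s used as-is, and items within an itemset are sorted (any item remapping done by Spark's real code is omitted). `encodeOld` and `encodeNew` are hypothetical names for the behaviour before and after the one-line change, not Spark functions.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch contrasting delimiter placement before and after the change.
// Assumptions: items are positive Ints (no remapping), 0 delimits itemsets,
// and items inside an itemset are sorted.
object DelimiterSketch {

  // Before: a delimiter follows every itemset, even one emptied by filtering,
  // so runs of consecutive 0s can appear.
  def encodeOld(sequence: Seq[Seq[Int]], frequent: Set[Int]): Array[Int] = {
    val out = ArrayBuffer(0) // leading delimiter
    sequence.foreach { itemset =>
      out ++= itemset.filter(frequent.contains).sorted
      out += 0
    }
    out.toArray
  }

  // After: the delimiter is emitted only when the filtered itemset is non-empty.
  def encodeNew(sequence: Seq[Seq[Int]], frequent: Set[Int]): Array[Int] = {
    val out = ArrayBuffer(0) // leading delimiter
    sequence.foreach { itemset =>
      val kept = itemset.filter(frequent.contains).sorted
      if (kept.nonEmpty) {
        out ++= kept
        out += 0
      }
    }
    out.toArray
  }
}
```

For example, with frequent items `{1, 2, 3}`, filtering the infrequent item 5 out of `<(1 2)(5)(3)>` gives `0, 1, 2, 0, 0, 3, 0` with `encodeOld` and `0, 1, 2, 0, 3, 0` with `encodeNew`. Both encodings describe the same itemsets, which fits the claim that results are unchanged and only the pre-processed data gets smaller.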
[GitHub] spark issue #17575: [SPARK-20265][MLlib] Improve Prefix'span pre-processing ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17575 Can one of the admins verify this patch?