[GitHub] spark issue #16775: [SPARK-19433][ML] Periodic checkout datasets for long ml...

2017-02-22 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16775 I think we can solve this issue by tackling the codes in SQL. So close it for now. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If

[GitHub] spark issue #16775: [SPARK-19433][ML] Periodic checkout datasets for long ml...

2017-02-09 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16775 @viirya I believe this PR meshes with the refactoring and application to pregel GraphX algorithms in #15125. Basically, it moves the periodic checkpointing code from mllib into core and uses it in

[GitHub] spark issue #16775: [SPARK-19433][ML] Periodic checkout datasets for long ml...

2017-02-08 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16775 ping @mengxr @jkbradley @liancheng @MLnick May you take a look at this? Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If

[GitHub] spark issue #16775: [SPARK-19433][ML] Periodic checkout datasets for long ml...

2017-02-02 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16775 For the issue reported on mailing list, I found the root cause makes significant difference between 1.6 and current branch. The fix is at #16785. However, I think this patch is still useful.

[GitHub] spark issue #16775: [SPARK-19433][ML] Periodic checkout datasets for long ml...

2017-02-02 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16775 `StringIndexer` and `OneHotEncoder` are just used as example here. The concept is to have a pipeline with enough long stages. --- If your project is set up for it, you can reply to this email and

[GitHub] spark issue #16775: [SPARK-19433][ML] Periodic checkout datasets for long ml...

2017-02-02 Thread DavidArenburg
Github user DavidArenburg commented on the issue: https://github.com/apache/spark/pull/16775 Wouldn't it better to Vectorize `StringIndexer` and `OneHotEncoder`? Like for instance `.na.fill` or `.na.replace` operate over the whole data set at once instead of running it in a loop? I

[GitHub] spark issue #16775: [SPARK-19433][ML] Periodic checkout datasets for long ml...

2017-02-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16775 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16775: [SPARK-19433][ML] Periodic checkout datasets for long ml...

2017-02-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16775 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72277/ Test PASSed. ---

[GitHub] spark issue #16775: [SPARK-19433][ML] Periodic checkout datasets for long ml...

2017-02-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16775 **[Test build #72277 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72277/testReport)** for PR 16775 at commit

[GitHub] spark issue #16775: [SPARK-19433][ML] Periodic checkout datasets for long ml...

2017-02-02 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16775 **[Test build #72277 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72277/testReport)** for PR 16775 at commit

[GitHub] spark issue #16775: [SPARK-19433][ML] Periodic checkout datasets for long ml...

2017-02-02 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16775 retest this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so,

[GitHub] spark issue #16775: [SPARK-19433][ML] Periodic checkout datasets for long ml...

2017-02-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16775 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16775: [SPARK-19433][ML] Periodic checkout datasets for long ml...

2017-02-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16775 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72275/ Test FAILed. ---

[GitHub] spark issue #16775: [SPARK-19433][ML] Periodic checkout datasets for long ml...

2017-02-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16775 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72276/ Test FAILed. ---

[GitHub] spark issue #16775: [SPARK-19433][ML] Periodic checkout datasets for long ml...

2017-02-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16775 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16775: [SPARK-19433][ML] Periodic checkout datasets for long ml...

2017-02-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16775 **[Test build #72276 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72276/testReport)** for PR 16775 at commit

[GitHub] spark issue #16775: [SPARK-19433][ML] Periodic checkout datasets for long ml...

2017-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16775 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16775: [SPARK-19433][ML] Periodic checkout datasets for long ml...

2017-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16775 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72274/ Test PASSed. ---

[GitHub] spark issue #16775: [SPARK-19433][ML] Periodic checkout datasets for long ml...

2017-02-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16775 **[Test build #72275 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72275/testReport)** for PR 16775 at commit

[GitHub] spark issue #16775: [SPARK-19433][ML] Periodic checkout datasets for long ml...

2017-02-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16775 **[Test build #72274 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72274/testReport)** for PR 16775 at commit