[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14299 Merged to master --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/14299 Ok, thanks. Do you sen any missing task for it to be merged then ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14299 OK, if this happens normally enough that you would accumulate a lot of broadcasts that aren't going out of scope and hogging resources, I can see special-casing this with try-finally. I'm still surprised it's so common so quickly but am OK with it -- worst case it just doesn't hurt consistency much to worry about. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/14299 For the "normal failure case", the one we encounter is that when the vocabulary is too small the embedding learning raises. And it is pretty legitimate in our use cases to have some "too small vocabularies". --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14299 I am still not getting why the failure of this method is a normal execution path - when would it fail repeatedly? Yes the destroy should be non-blocking for consistency. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14299 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73770/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14299 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14299 **[Test build #73770 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73770/testReport)** for PR 14299 at commit [`65a238d`](https://github.com/apache/spark/commit/65a238d1cd09c96b41337d3d8f5162fca9faf2f4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14299 **[Test build #73770 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73770/testReport)** for PR 14299 at commit [`65a238d`](https://github.com/apache/spark/commit/65a238d1cd09c96b41337d3d8f5162fca9faf2f4). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/14299 The blocking flag is not the issue. The issue is that there was not try / finally and the destruction was only done if the fitting succeded and skipped otherwise. In this PR I move the destruction into a try / finally construct, all the while extracting the actual fitting into a dedicated method to keep nesting under control. Whether the destruction should be blocking or not is pretty debatable and I do not have any strong mind on the question, I choose to make them blocking "just to be sure" this is actually performed. If you'd rather keep the non blocking best effort current behaviour I'll perform the change. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14299 I wonder how that happens and why that fixes this though. It would be the same before and after except for the blocking flag. That should be false like all the other calls ... Unless that is somehow part of the issue? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/14299 Actually we *did* observe memory leak (severe enough to lead to application failure) due to this when doing thousands of Word2Vec learning in one Spark application, a few of them failing. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14299 Yes, but they are still cleaned up (eventually). It can only happen one place. Unless the exception path is common I don't know if it's worth changing this everywhere, because for consistency it would really apply everywhere, and we declined to do that before. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/14299 I'm fixing this (issue compiling Spark locally delay me). The whole point is that they are *not* destroyed within the method in case of exception. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14299 Oh, these broadcasts are already destroyed actually, inside the method. This isn't needed then. I kinda thought we had taken care of most or all of these already. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14299 **[Test build #3588 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3588/testReport)** for PR 14299 at commit [`6b8ae85`](https://github.com/apache/spark/commit/6b8ae85dc362ebef0f8d416a8e35970f57130a9f). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14299 **[Test build #3588 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3588/testReport)** for PR 14299 at commit [`6b8ae85`](https://github.com/apache/spark/commit/6b8ae85dc362ebef0f8d416a8e35970f57130a9f). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14299 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14299 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73699/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14299 **[Test build #73699 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73699/testReport)** for PR 14299 at commit [`6b8ae85`](https://github.com/apache/spark/commit/6b8ae85dc362ebef0f8d416a8e35970f57130a9f). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14299 **[Test build #73699 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73699/testReport)** for PR 14299 at commit [`6b8ae85`](https://github.com/apache/spark/commit/6b8ae85dc362ebef0f8d416a8e35970f57130a9f). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14299 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73685/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14299 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14299 **[Test build #73685 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73685/testReport)** for PR 14299 at commit [`9a0675f`](https://github.com/apache/spark/commit/9a0675f4fbf34d2095710628fb5ca3adf98f7f5b). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14299 **[Test build #73685 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73685/testReport)** for PR 14299 at commit [`9a0675f`](https://github.com/apache/spark/commit/9a0675f4fbf34d2095710628fb5ca3adf98f7f5b). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14299 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14299 **[Test build #73600 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73600/testReport)** for PR 14299 at commit [`e716700`](https://github.com/apache/spark/commit/e716700a5b684d6a985860ea164996471a7341ef). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14299 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73600/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14299 **[Test build #73600 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73600/testReport)** for PR 14299 at commit [`e716700`](https://github.com/apache/spark/commit/e716700a5b684d6a985860ea164996471a7341ef). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/14299 ok to test --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14299: [SPARK-16440][MLlib] Ensure broadcasted variables are de...
Github user AnthonyTruchet commented on the issue: https://github.com/apache/spark/pull/14299 @vanzin The ticket was filled a long time ago, I updated the PR to make it clearer. Is any manual linking in some other forge needed ? @thunterdb I have updated the review following your suggestion, without using ARM: ready fo review. I'll check with the team working on W2V how to could contribute to make this piece of code "follow better engineering practice" :-) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org