[GitHub] spark pull request: [SPARK-4759] Avoid using empty string as defau...
GitHub user andrewor14 opened a pull request: https://github.com/apache/spark/pull/3633 [SPARK-4759] Avoid using empty string as default preferred location See JIRA for reproduction. Our use of empty string as default preferred location in `CoalescedRDDPartition` causes the `TaskSetManager` to schedule the corresponding task on host `` (empty string). The intended semantics here, however, is that the partition does not have a preferred location, and the TSM should schedule the corresponding task accordingly. I tested this on master and 1.1. You can merge this pull request into a Git repository by running: $ git pull https://github.com/andrewor14/spark coalesce-preferred-loc Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3633.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3633 commit 2f7dfb603c000a204831748f1fbaa53ef52531c8 Author: Andrew Or and...@databricks.com Date: 2014-12-08T07:53:15Z Avoid using empty string as default preferred location This is causing the TaskSetManager to try to schedule certain tasks on the host `` (empty string). The intended semantics here, however, is that the partition does not have a preferred location, and the TSM should schedule the corresponding task accordingly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
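A minimal sketch of the fix's intent (illustrative only, not the actual Spark patch; `PartitionStub` is invented for the example): a partition with no preferred location should yield an empty sequence, not a sequence containing the empty string, so the scheduler never tries to match tasks against a host named "".

```scala
// Hypothetical stand-in for CoalescedRDDPartition's locality info
case class PartitionStub(preferredLocation: Option[String])

// "No preference" maps to an empty Seq, never to Seq("")
def preferredLocations(p: PartitionStub): Seq[String] =
  p.preferredLocation.toSeq

assert(preferredLocations(PartitionStub(None)).isEmpty)
assert(preferredLocations(PartitionStub(Some("host1"))) == Seq("host1"))
```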
[GitHub] spark pull request: [SPARK-4759] Avoid using empty string as defau...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3633#issuecomment-66035505 [Test build #24219 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24219/consoleFull) for PR 3633 at commit [`2f7dfb6`](https://github.com/apache/spark/commit/2f7dfb603c000a204831748f1fbaa53ef52531c8). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3154][STREAMING] Replace ConcurrentHash...
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/3634 [SPARK-3154][STREAMING] Replace ConcurrentHashMap with mutable.HashMap and remove @volatile from 'stopped' Since `sequenceNumberToProcessor` and `stopped` are both protected by the lock `sequenceNumberToProcessor`, `ConcurrentHashMap` and `volatile` are unnecessary. This PR updates them accordingly. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zsxwing/spark SPARK-3154 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3634.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3634 commit 0d087ac6ae18ed7766d08dc630aeb12279dbb4e7 Author: zsxwing zsxw...@gmail.com Date: 2014-12-08T08:02:14Z Replace ConcurrentHashMap with mutable.HashMap and remove @volatile from 'stopped'
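A sketch of the pattern the PR describes (class and method names here are illustrative, not the actual streaming code): when every read and write of the map and the flag happens inside the same `synchronized` block, a plain `mutable.HashMap` and a non-volatile `var` provide the same guarantees.

```scala
import scala.collection.mutable

class Tracker {
  // Both fields are only ever touched while holding the map's monitor,
  // so no ConcurrentHashMap and no @volatile are needed.
  private val sequenceNumberToProcessor = mutable.HashMap[Long, String]()
  private var stopped = false

  def register(seq: Long, processor: String): Boolean =
    sequenceNumberToProcessor.synchronized {
      if (stopped) false
      else { sequenceNumberToProcessor(seq) = processor; true }
    }

  def stop(): Unit = sequenceNumberToProcessor.synchronized { stopped = true }
}

val t = new Tracker
assert(t.register(1L, "p1"))
t.stop()
assert(!t.register(2L, "p2"))  // rejected after stop
```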
[GitHub] spark pull request: [SPARK-3154][STREAMING] Replace ConcurrentHash...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3634#issuecomment-66036411 [Test build #24220 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24220/consoleFull) for PR 3634 at commit [`0d087ac`](https://github.com/apache/spark/commit/0d087ac6ae18ed7766d08dc630aeb12279dbb4e7). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4759] Avoid using empty string as defau...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3633#issuecomment-66040668 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24219/ Test FAILed.
[GitHub] spark pull request: [SPARK-4759] Avoid using empty string as defau...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3633#issuecomment-66040660 [Test build #24219 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24219/consoleFull) for PR 3633 at commit [`2f7dfb6`](https://github.com/apache/spark/commit/2f7dfb603c000a204831748f1fbaa53ef52531c8). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: Add error message when making local dir unsucc...
GitHub user XuTingjun opened a pull request: https://github.com/apache/spark/pull/3635 Add error message when making local dir unsuccessfully You can merge this pull request into a Git repository by running: $ git pull https://github.com/XuTingjun/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3635.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3635 commit 1c51a0c78c8477f4aae83ec18212c773aed57701 Author: meiyoula 1039320...@qq.com Date: 2014-12-08T09:11:09Z Update DiskBlockManager.scala
[GitHub] spark pull request: Add error message when making local dir unsucc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3635#issuecomment-66041481 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-3640] [Streaming] [Kinesis] Allow users...
Github user aniketbhatnagar commented on the pull request: https://github.com/apache/spark/pull/3092#issuecomment-66043882 @cfregly, unfortunately, I have been stuck with some other work and haven't been able to test this yet. I will find time this week. Sorry for the delay.
[GitHub] spark pull request: [SPARK-3154][STREAMING] Replace ConcurrentHash...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3634#issuecomment-66043962 [Test build #24220 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24220/consoleFull) for PR 3634 at commit [`0d087ac`](https://github.com/apache/spark/commit/0d087ac6ae18ed7766d08dc630aeb12279dbb4e7). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3154][STREAMING] Replace ConcurrentHash...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3634#issuecomment-66043975 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24220/ Test PASSed.
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user akopich commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66109601 (1) Users implementing their own regularizers: OK. I'd prefer to make all the regularizer methods private[mllib]. (2) Regular and Robust in the same class: I understand what dynamic polymorphism is. Unfortunately, the getNewTheta() methods take different parameters in the robust and non-robust classes. More significantly, the user has to know which class the returned instance belongs to -- robust or non-robust. Without this knowledge, one has to cast the returned parameters (e.g. of type `DocumentParameters` to type `RobustDocumentParameters`) in order to access the `noise` field. That's why I see no way to provide the user with a single facade class. And thank you for mentioning visibility -- my fault. (3) PLSA and RobustPLSA code duplication: Thank you very much for reading the code. (4) Float vs. Double and linear algebra operations: OK. I'll use `Array[Array[Float]]` then. But you've mentioned it'd be nice to extract all the linear algebra code to `mllib/linalg/`. Could you please point at the parts of my code implementing linear algebra operations that should be moved to `mllib/linalg/`? BTW, I'm not sure that's possible, because `mllib/linalg/` relies on `trait Matrix` while my code relies on `Array[Array[Float]]`. (5) You've also said Enumerator should be private. I can certainly make it private and change the method `TopicModel.infer()` so that it consumes `RDD[Seq[String]]` instead of `RDD[Documents]` and calls `Enumerator` internally. But what if one wants to consecutively train ten models (in order to choose the best parameters)? Enumeration would then be performed 10 times. Isn't that a waste?
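The casting concern in point (2) can be sketched as follows (the class names follow the PR's `DocumentParameters`/`RobustDocumentParameters`, but the fields and bodies here are invented for illustration):

```scala
class DocumentParameters(val theta: Array[Float])
class RobustDocumentParameters(theta: Array[Float], val noise: Array[Float])
  extends DocumentParameters(theta)

// A single facade returning the base type forces callers to downcast:
def infer(robust: Boolean): DocumentParameters =
  if (robust) new RobustDocumentParameters(Array(0.5f), Array(0.1f))
  else new DocumentParameters(Array(0.5f))

// Accessing `noise` requires knowing the concrete type at the call site.
val params = infer(robust = true)
val noise = params.asInstanceOf[RobustDocumentParameters].noise
assert(noise.sameElements(Array(0.1f)))
```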
[GitHub] spark pull request: [SPARK-3154][STREAMING] Replace ConcurrentHash...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3634#issuecomment-66110750 LGTM
[GitHub] spark pull request: Add error message when making local dir unsucc...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3635#discussion_r21449898 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala --- @@ -67,11 +67,14 @@ private[spark] class DiskBlockManager(blockManager: BlockManager, conf: SparkConf) if (subDir == null) { subDir = subDirs(dirId).synchronized { val old = subDirs(dirId)(subDirId) - if (old != null) { + if (old != null && old.exists()) { old } else { val newDir = new File(localDirs(dirId), "%02x".format(subDirId)) - newDir.mkdir() + val foundLocalDir = newDir.mkdir() + if (!foundLocalDir) { --- End diff -- Indent has one too many spaces. The message should probably be a warning. It says "ignoring this directory", but the directory doesn't actually seem to be ignored? You also changed the semantics of the condition, to replace a value that was a non-existent dir. That seems reasonable, but this can replace it with a directory that can't be created for some reason. Is this not an exception condition?
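The reviewer's suggestion can be sketched as follows, under the assumption that a failed sub-directory creation should be fatal rather than merely logged (simplified from DiskBlockManager; the method name and error handling are illustrative, not the merged code):

```scala
import java.io.{File, IOException}

def getOrCreateSubDir(parent: File, subDirId: Int, old: File): File = {
  if (old != null && old.exists()) {
    old  // reuse the cached directory only if it still exists on disk
  } else {
    val newDir = new File(parent, "%02x".format(subDirId))
    // mkdir() returns false both on failure and if the dir already exists,
    // so double-check before treating it as an error condition.
    if (!newDir.mkdir() && !newDir.isDirectory) {
      throw new IOException(s"Failed to create local dir in $newDir")
    }
    newDir
  }
}
```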
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66111941 [Test build #24221 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24221/consoleFull) for PR 1269 at commit [`24b11a5`](https://github.com/apache/spark/commit/24b11a57bdd18bdeb0409000cb836235227e6d25). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66112025 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24221/ Test FAILed.
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66112020 [Test build #24221 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24221/consoleFull) for PR 1269 at commit [`24b11a5`](https://github.com/apache/spark/commit/24b11a57bdd18bdeb0409000cb836235227e6d25). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66112914 [Test build #24222 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24222/consoleFull) for PR 1269 at commit [`4a4a4f8`](https://github.com/apache/spark/commit/4a4a4f84da1954f585f2474ab3ee06c5b998c990). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66113672 [Test build #24222 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24222/consoleFull) for PR 1269 at commit [`4a4a4f8`](https://github.com/apache/spark/commit/4a4a4f84da1954f585f2474ab3ee06c5b998c990). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66113681 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24222/ Test FAILed.
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66114412 QA tests have started for PR 1269. This patch DID NOT merge cleanly! View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24223/consoleFull
[GitHub] spark pull request: [SPARK-3382] GradientDescent convergence toler...
GitHub user Lewuathe opened a pull request: https://github.com/apache/spark/pull/3636 [SPARK-3382] GradientDescent convergence tolerance GradientDescent can receive a convergence tolerance value. The default value is 0.0. When the loss value becomes less than the tolerance set by the user, iteration is terminated. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Lewuathe/spark gd-convergence-tolerance Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3636.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3636 commit 5433f71a3822b0fb16b910f64dc53ede8d539ebe Author: lewuathe lewua...@me.com Date: 2014-12-08T13:19:21Z [SPARK-3382] GradientDescent convergence tolerance
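A toy sketch of the termination rule the PR describes (not the MLlib implementation; this variant stops when the *change* in loss drops below the tolerance, a common reading of "convergence tolerance", and a tolerance of 0.0 disables early stopping):

```scala
def minimize(grad: Double => Double, loss: Double => Double,
             x0: Double, step: Double, maxIter: Int, tol: Double): Double = {
  var x = x0
  var prevLoss = loss(x)
  var i = 0
  var converged = false
  while (i < maxIter && !converged) {
    x -= step * grad(x)                 // one gradient step
    val l = loss(x)
    if (math.abs(prevLoss - l) < tol)   // tol = 0.0 never triggers
      converged = true
    prevLoss = l
    i += 1
  }
  x
}

// Minimizing f(x) = (x - 3)^2, whose gradient is 2(x - 3):
val xMin = minimize(x => 2 * (x - 3), x => (x - 3) * (x - 3), 0.0, 0.1, 1000, 1e-9)
assert(math.abs(xMin - 3.0) < 1e-2)
```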
[GitHub] spark pull request: [SPARK-3382] GradientDescent convergence toler...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3636#issuecomment-66115272 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-4001][MLlib] adding apriori and fp-grow...
Github user denmoroz commented on the pull request: https://github.com/apache/spark/pull/2847#issuecomment-66125004 Maybe it is better to use RDD[BitSet] as the transactions RDD? Then you can add a preprocessor trait and make any transformation from the source RDD to an RDD of BitSets -- for example, a transformation of RDD[Array[String]] to RDD[BitSet]. It seems to me that BitSet is a much better representation of transactions than Array[String] or Array[Int] or anything else.
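The suggested preprocessing step might look like this (an illustrative sketch, not code from the PR; the helper name is invented): assign every distinct item an index, then encode each transaction as a `BitSet`, so subset and intersection tests become cheap bitwise operations.

```scala
import scala.collection.immutable.BitSet

def toBitSets(transactions: Seq[Array[String]]): (Map[String, Int], Seq[BitSet]) = {
  // Build a dictionary from item to bit position
  val itemIndex = transactions.flatten.distinct.zipWithIndex.toMap
  // Encode each transaction as the set of its items' bit positions
  val encoded = transactions.map(t => BitSet(t.map(itemIndex): _*))
  (itemIndex, encoded)
}

val (index, sets) = toBitSets(Seq(Array("milk", "bread"), Array("bread", "eggs")))
// Membership ("does transaction 0 contain bread?") is a single bit test:
assert(sets(0).contains(index("bread")))
// Itemset containment is a subset check:
assert(BitSet(index("bread")).subsetOf(sets(1)))
```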
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66126236 QA results for PR 1269: - This patch FAILED unit tests. For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24223/consoleFull
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66126247 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24223/ Test FAILed.
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user akopich commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-66127106 @jkbradley, could you please have a look at the logs -- I have no idea why the PySpark tests failed.
[GitHub] spark pull request: [SPARK-4001][MLlib] adding apriori and fp-grow...
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/2847#issuecomment-66129552 As long as itemset mining is under consideration, has anybody tried a Spark implementation of Logical Itemset Mining: http://cvit.iiit.ac.in/papers/Chandrashekar2012Logical.pdf
[GitHub] spark pull request: [SPARK-4001][MLlib] adding apriori and fp-grow...
Github user denmoroz commented on the pull request: https://github.com/apache/spark/pull/2847#issuecomment-66130537 Do you use the SON algorithm for the parallel Apriori implementation? (http://importantfish.com/limited-pass-algorithms/)
[GitHub] spark pull request: [SPARK-4764] Ensure that files are fetched ato...
Github user preaudc commented on the pull request: https://github.com/apache/spark/pull/2855#issuecomment-66131633 Thanks for the review, @JoshRosen, I've created a new JIRA as requested.
[GitHub] spark pull request: [SPARK-4461][YARN] pass extra java options to ...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/3409#issuecomment-66134213 It's a matter of what's more obvious to a user who doesn't necessarily read the documentation. Adding in "clientmode" hopefully helps the user realize this config only does something in yarn-client mode.
[GitHub] spark pull request: SPARK-4338. [YARN] Ditch yarn-alpha.
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/3215#issuecomment-66135534 Seems like we are pretty close on the rc. I'm good with merging this. @andrewor14 any objections at this point?
[GitHub] spark pull request: SPARK-4770. [DOC] [YARN] spark.scheduler.minRe...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/3624#issuecomment-66139557 +1. Thanks Sandy!
[GitHub] spark pull request: SPARK-4770. [DOC] [YARN] spark.scheduler.minRe...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/3624#issuecomment-66140307 @pwendell is it ok to pull this doc change into 1.2?
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user ilganeli commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-66144913 Hi @JoshRosen - can I please get this run through Jenkins? Thanks!
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-66145968 [Test build #24224 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24224/consoleFull) for PR 3518 at commit [`ef3dd39`](https://github.com/apache/spark/commit/ef3dd39109aca93e899affef8716655aa7669ce0). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-66145560 Jenkins, this is ok to test.
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-66158875 [Test build #24224 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24224/consoleFull) for PR 3518 at commit [`ef3dd39`](https://github.com/apache/spark/commit/ef3dd39109aca93e899affef8716655aa7669ce0). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3694] RDD and Task serialization debugg...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3518#issuecomment-66158890 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24224/ Test PASSed.
[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3574#discussion_r21473307 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
@@ -1010,7 +1010,10 @@ private[spark] class BlockManager(
     info.synchronized {
       // required ? As of now, this will be invoked only for blocks which are ready
       // But in case this changes in future, adding for consistency sake.
-      if (!info.waitForReady()) {
+      if (blockInfo.get(blockId).isEmpty) {
+        logWarning(s"Block $blockId was already dropped.")
+        return None
+      } else if(!info.waitForReady()) {
--- End diff --
Minor style nit: this needs a space after the `if` and before the open paren: `if (!info...`.
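The guard being reviewed can be sketched outside Spark. This is a minimal illustration of the "check the map again before trusting a stale `info`" pattern; the map, `BlockInfo` type, and method below are simplified stand-ins, not the real `BlockManager` internals:

```scala
import scala.collection.concurrent.TrieMap

// Simplified stand-in for BlockManager's blockInfo map (names are illustrative).
case class BlockInfo(ready: Boolean)

val blockInfo = new TrieMap[String, BlockInfo]()
blockInfo.put("rdd_0_0", BlockInfo(ready = true))

// Return None early when the block has already been dropped from the map,
// instead of proceeding with info that no longer corresponds to a live block.
def getStatus(blockId: String): Option[BlockInfo] =
  blockInfo.get(blockId) match {
    case None =>
      println(s"Block $blockId was already dropped.")
      None
    case some => some
  }
```

The point of the early return is that the entry can disappear concurrently between two lookups, so the presence check must happen on the same map the drop mutates.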
[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3574#issuecomment-66162338 Jenkins, this is ok to test.
[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3574#issuecomment-66163112 [Test build #24225 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24225/consoleFull) for PR 3574 at commit [`55fa4ba`](https://github.com/apache/spark/commit/55fa4ba1e41eb36b1c4f867efbdd35c9b8a4f131). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4298][Core] - The spark-submit cannot r...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/3561#issuecomment-66163505 @JoshRosen I'm pretty sure we can definitely support the `hdfs://` URI model. I'll look and see if, given an `hdfs://` URI, Spark would already have some sort of Hadoop `Configuration` object representing the connection made, but, if not, can always make one. Also, can you help me understand why the tests failed? I'm seeing: `[error] (streaming/test:test) sbt.TestsFailedException: Tests unsuccessful` But that isn't really that helpful and, as with all the talk on the dev distro, I'm just wondering if it's the patch that fails or if it's a timing / sync issue (`./dev/run-tests` finishes without fail on my OSX machine).
[GitHub] spark pull request: [SPARK-4616][Core] - SPARK_CONF_DIR is not eff...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/3559#issuecomment-66163851 @JoshRosen Is there anything else needed for this patch to be pushed in? Any feedback / review would be great as well!
[GitHub] spark pull request: Add example that reads a local file, writes to...
Github user rnowling commented on the pull request: https://github.com/apache/spark/pull/3347#issuecomment-66167709 @andrewor14 Could you take a second look when you get a chance? Thanks!
[GitHub] spark pull request: [SPARK-4298][Core] - The spark-submit cannot r...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3561#issuecomment-66168555 Hmm, it looks like there's already a JIRA for that particular test's flakiness: [SPARK-1600](https://issues.apache.org/jira/browse/SPARK-1600).
[GitHub] spark pull request: [SPARK-4483][SQL]Optimization about reduce mem...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3375#discussion_r21477718 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashOuterJoin.scala ---
@@ -68,62 +68,59 @@ case class HashOuterJoin(
   @transient private[this] lazy val DUMMY_LIST = Seq[Row](null)
   @transient private[this] lazy val EMPTY_LIST = Seq.empty[Row]
+  @transient private[this] lazy val joinedRow = new JoinedRow()
--- End diff --
I believe that it is working now, but my objection is primarily to having mutable state stored inside of the task instead of local to a single execution. If we decide to be more clever about sharing task metadata in the future this could break in very subtle ways. Also, the cost of accessing a lazy val is almost certainly higher than accessing a local stack variable.
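The objection can be illustrated in miniature: a `lazy val` in the enclosing object is a single shared instance reused by every caller, while a value created inside a method is fresh per execution. `JoinedRowLike` and `SharedState` are invented names for illustration, not the `HashOuterJoin` internals:

```scala
// Mutable holder standing in for JoinedRow (illustrative only).
class JoinedRowLike { var left: Any = null }

object SharedState {
  // One instance for the whole object: every use sees (and can clobber)
  // the same mutable state.
  @transient lazy val joinedRow = new JoinedRowLike

  // Fresh instance per call: mutation stays local to one execution.
  def perExecution(): JoinedRowLike = new JoinedRowLike
}
```

If two executions ever shared the object, the lazy-val version would silently share the mutable row between them, which is the subtle breakage the comment warns about.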
[GitHub] spark pull request: [SPARK-4461][YARN] pass extra java options to ...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/3409#discussion_r21478911 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -358,6 +358,21 @@ private[spark] trait ClientBase extends Logging {
       if (libraryPaths.nonEmpty) {
         prefixEnv = Some(Utils.libraryPathEnvPrefix(libraryPaths))
       }
+    } else {
+      // Validate and include yarn am specific java options in yarn-client mode.
+      val amOptsKey = "spark.yarn.clientmode.am.extraJavaOptions"
+      val amOpts = sparkConf.getOption(amOptsKey)
+      amOpts.map { javaOpts =>
--- End diff --
I'd just simplify this as:

    sparkConf.getOption(amOptsKey).foreach { opts =>
      // validate
      // javaOpts += opts
    }

Hint: `map()` is more expensive than `foreach()` in general (because it returns something, unlike `foreach()`).
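The suggested `foreach` shape can be sketched with a plain `Map` standing in for `SparkConf`. The config key is the one proposed in the PR; the validation rule and the value are made up for illustration:

```scala
// Stand-in for sparkConf: getOption on a Map returns Option[String] too.
val sparkConfLike = Map("spark.yarn.clientmode.am.extraJavaOptions" -> "-Xmx512m")

var javaOpts = Seq.empty[String]

// foreach runs the side effect only when the key is present and returns Unit;
// map would build and return a new Option that is then thrown away.
sparkConfLike.get("spark.yarn.clientmode.am.extraJavaOptions").foreach { opts =>
  require(opts.startsWith("-"), "expected a JVM flag")  // placeholder validation
  javaOpts :+= opts
}
```

When the key is absent the closure simply never runs, so no separate emptiness check is needed.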
[GitHub] spark pull request: [SPARK-4764] Ensure that files are fetched ato...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2855#issuecomment-66173949 Thanks for creating the new JIRA. This looks good to me, so I'm going to merge it into `master` and `branch-1.1` for now (I've added a `backport-needed` label to the JIRA so that we remember to merge this into `branch-1.2` after the 1.2.0 vote ends). Thanks!
[GitHub] spark pull request: [SPARK-4461][YARN] pass extra java options to ...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/3409#issuecomment-66174011 LGTM aside from minor style issue.
[GitHub] spark pull request: [SPARK-4764] Ensure that files are fetched ato...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2855
[GitHub] spark pull request: [SPARK-3154][STREAMING] Replace ConcurrentHash...
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/3634#issuecomment-66175119 +1. Looks good!
[GitHub] spark pull request: [SPARK-3154][STREAMING] Replace ConcurrentHash...
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/3634#discussion_r21479585 --- Diff: external/flume-sink/src/main/scala/org/apache/spark/streaming/flume/sink/SparkAvroCallbackHandler.scala ---
@@ -47,8 +47,8 @@ private[flume] class SparkAvroCallbackHandler(val threads: Int, val channel: Cha
   val transactionExecutorOpt = Option(Executors.newFixedThreadPool(threads,
     new ThreadFactoryBuilder().setDaemon(true)
       .setNameFormat("Spark Sink Processor Thread - %d").build()))
-  private val sequenceNumberToProcessor =
-    new ConcurrentHashMap[CharSequence, TransactionProcessor]()
+  // Protected by `sequenceNumberToProcessor`
--- End diff --
Could use the `@GuardedBy("sequenceNumberToProcessor")` javax annotation.
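The locking discipline under review can be sketched without the Flume dependencies. This is an illustrative class (names invented, not the actual sink code): a plain mutable map whose every access is serialized by synchronizing on the map itself. With the jsr305 jar on the classpath, the comment could additionally be expressed as `@GuardedBy("sequenceNumberToProcessor")` from `javax.annotation.concurrent`:

```scala
import scala.collection.mutable

class Handler {
  // Protected by synchronizing on `sequenceNumberToProcessor`
  private val sequenceNumberToProcessor = mutable.HashMap[CharSequence, String]()

  def register(seq: CharSequence, proc: String): Unit =
    sequenceNumberToProcessor.synchronized {
      sequenceNumberToProcessor(seq) = proc
    }

  def lookup(seq: CharSequence): Option[String] =
    sequenceNumberToProcessor.synchronized {
      sequenceNumberToProcessor.get(seq)
    }
}
```

Unlike a `ConcurrentHashMap`, this makes compound check-then-act operations atomic as well, which is the usual reason for switching from a concurrent collection to an explicitly guarded one.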
[GitHub] spark pull request: [SPARK-3154][STREAMING] Replace ConcurrentHash...
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/3634#issuecomment-66175687 LGTM too, at your discretion you could replace the comment with the annotation or not. Will merge when addressed.
[GitHub] spark pull request: [SPARK-4789] [mllib] Standardize ML Prediction...
GitHub user jkbradley opened a pull request: https://github.com/apache/spark/pull/3637 [SPARK-4789] [mllib] Standardize ML Prediction APIs This is part (1) of the updates from the WIP PR in [https://github.com/apache/spark/pull/3427] Abstract classes for learning algorithms: * Classifier * Regressor * Predictor Traits for learning algorithms * ProbabilisticClassificationModel Concrete classes: learning algorithms * LinearRegression * LogisticRegression (updated to use new abstract classes) Concrete classes: other * LabeledPoint (adding weight to the old LabeledPoint) Other updates: * Modified ParamMap to sort parameters in toString Test Suites: * LabeledPointSuite * LinearRegressionSuite * LogisticRegressionSuite * + Java versions of above suites CC: @mengxr @etrain @shivaram You can merge this pull request into a Git repository by running: $ git pull https://github.com/jkbradley/spark ml-api-part1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3637.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3637 commit de1e3b4c39b42757e56345a6bab2bdeefaa3ca25 Author: Joseph K. Bradley jos...@databricks.com Date: 2014-11-24T07:18:52Z Added lots of classes for new ML API: Abstract classes for learning algorithms: * Classifier * Regressor * Predictor Traits for learning algorithms * HasDefaultEstimator * IterativeEstimator * IterativeSolver * ProbabilisticClassificationModel * WeakLearner Concrete classes: learning algorithms * AdaBoost (partly implemented) * NaiveBayes (rough implementation) * LinearRegression * LogisticRegression (updated to use new abstract classes) Concrete classes: evaluation * ClassificationEvaluator * RegressionEvaluator * PredictionEvaluator Concrete classes: other * LabeledPoint (adding weight to the old LabeledPoint) commit 6551244b96d8f70f1daacd0415318cf81fd5111a Author: Joseph K. 
Bradley jos...@databricks.com Date: 2014-11-24T07:30:31Z fixed compilation issues, but have not added tests yet commit 25b643d4b367fea5a3ba1b91564374c2b1b7a0f1 Author: Joseph K. Bradley jos...@databricks.com Date: 2014-12-01T18:31:41Z removing everything except for simple class hierarchy for classification commit e61e2738dcb2494be25cec2bd798c3e6e5156b73 Author: Joseph K. Bradley jos...@databricks.com Date: 2014-12-04T21:37:29Z Added LinearRegression and Regressor back from ml-api branch commit 272e62fb41fc8778f3a13f812d4262d9558a772b Author: Joseph K. Bradley jos...@databricks.com Date: 2014-12-05T00:11:02Z Modified ParamMap to sort parameters in toString. Cleaned up classes in class hierarchy, before implementing tests and examples. commit cc13d61f2a277b101f7422af240afa64dfb10236 Author: Joseph K. Bradley jos...@databricks.com Date: 2014-12-05T01:11:22Z Fixed bug from last commit (sorting paramMap by parameter names in toString). Fixed bug in persisting logreg data. Added threshold_internal to logreg for faster test-time prediction (avoiding map lookup). commit 09fb85fb7502a64a661c5f8ae4c941971ff861c8 Author: Joseph K. Bradley jos...@databricks.com Date: 2014-12-05T18:22:10Z Fixed issue with logreg threshold being set correctly commit a0faf022792524c5a33a20d7cb591a91a7ac160b Author: Joseph K. Bradley jos...@databricks.com Date: 2014-12-05T18:43:14Z Updated docs. Added LabeledPointSuite to spark.ml commit 3e961cb6616906940fd646639f818c58d29c04f6 Author: Joseph K. Bradley jos...@databricks.com Date: 2014-12-05T23:15:48Z * Changed semantics of Predictor.train() to merge the given paramMap with the embedded paramMap. * remove threshold_internal from logreg * Added Predictor.copy() * Extended LogisticRegressionSuite commit 8922966757e7b5d7588613f5dfc11cee267de1b4 Author: Joseph K. Bradley jos...@databricks.com Date: 2014-12-06T01:32:14Z added train() to Predictor subclasses which does not take a ParamMap. 
commit 0c45756e3614c027d662d70dfa11d736690dc837 Author: Joseph K. Bradley jos...@databricks.com Date: 2014-12-06T03:57:12Z * fixed LinearRegression train() to use embedded paramMap * added Predictor.predict(RDD[Vector]) method * updated Linear/LogisticRegressionSuites commit 6be36c16484478bdb9d847fd343d6b7319759b21 Author: Joseph K. Bradley jos...@databricks.com Date: 2014-12-06T06:18:30Z Added JavaLabeledPointSuite.java for spark.ml, and added constructor to LabeledPoint which defaults weight to 1.0 commit d8eaf7099a9be6157f90b11f82917ca5b604e1bd Author: Joseph K. Bradley jos...@databricks.com Date: 2014-12-08T19:09:03Z Added methods: * Classifier: batch predictRaw()
[GitHub] spark pull request: [MLLIB] [WIP] [SPARK-3702] Standardizing abstr...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3427#issuecomment-66177125 I just submitted the first part of this PR: [https://github.com/apache/spark/pull/3637/files]
[GitHub] spark pull request: [SPARK-4789] [mllib] Standardize ML Prediction...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3637#issuecomment-66177658 [Test build #24226 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24226/consoleFull) for PR 3637 at commit [`1e46094`](https://github.com/apache/spark/commit/1e46094fbf2534ff022cb843a811b3fbd7fb9d64). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4789] [mllib] Standardize ML Prediction...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3637#issuecomment-66177711 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24226/ Test FAILed.
[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3574#issuecomment-66177706 [Test build #24225 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24225/consoleFull) for PR 3574 at commit [`55fa4ba`](https://github.com/apache/spark/commit/55fa4ba1e41eb36b1c4f867efbdd35c9b8a4f131). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4789] [mllib] Standardize ML Prediction...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3637#issuecomment-66177709 [Test build #24226 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24226/consoleFull) for PR 3637 at commit [`1e46094`](https://github.com/apache/spark/commit/1e46094fbf2534ff022cb843a811b3fbd7fb9d64). * This patch **fails RAT tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class LabeledPoint(label: Double, features: Vector, weight: Double)`
[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3574#issuecomment-66177717 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24225/ Test PASSed.
[GitHub] spark pull request: [SPARK-3382] GradientDescent convergence toler...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3636#discussion_r21480864 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala ---
@@ -27,6 +27,8 @@ import org.apache.spark.rdd.RDD
 import org.apache.spark.mllib.linalg.{Vectors, Vector}
 import org.apache.spark.mllib.rdd.RDDFunctions._
+import scala.util.control.Breaks
--- End diff --
Please organize imports (Scala/Java, then non-Spark imports, then Spark)
[GitHub] spark pull request: [SPARK-3382] GradientDescent convergence toler...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3636#discussion_r21480867 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala ---
@@ -39,6 +41,7 @@ class GradientDescent private[mllib] (private var gradient: Gradient, private va
   private var numIterations: Int = 100
   private var regParam: Double = 0.0
   private var miniBatchFraction: Double = 1.0
+  private var convergenceTolerance: Double = 0.0
--- End diff --
I feel like the default should be > 0.0. Something small like 0.001 (a value pulled from libsvm [https://github.com/cjlin1/libsvm/blob/master/python/svm.py]) might be reasonable. Basically, I think that convergence tolerance is generally a better stopping criterion than numIterations, and having it > 0.0 will give it a chance of taking effect before numIterations.
[GitHub] spark pull request: [SPARK-3382] GradientDescent convergence toler...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3636#discussion_r21480907 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala --- @@ -182,34 +195,38 @@ object GradientDescent extends Logging {

    var regVal = updater.compute(
      weights, Vectors.dense(new Array[Double](weights.size)), 0, 1, regParam)._2

-    for (i <- 1 to numIterations) {
-      val bcWeights = data.context.broadcast(weights)
-      // Sample a subset (fraction miniBatchFraction) of the total data
-      // compute and sum up the subgradients on this subset (this is one map-reduce)
-      val (gradientSum, lossSum, miniBatchSize) = data.sample(false, miniBatchFraction, 42 + i)
-        .treeAggregate((BDV.zeros[Double](n), 0.0, 0L))(
-          seqOp = (c, v) => {
-            // c: (grad, loss, count), v: (label, features)
-            val l = gradient.compute(v._2, v._1, bcWeights.value, Vectors.fromBreeze(c._1))
-            (c._1, c._2 + l, c._3 + 1)
-          },
-          combOp = (c1, c2) => {
-            // c: (grad, loss, count)
-            (c1._1 += c2._1, c1._2 + c2._2, c1._3 + c2._3)
-          })
-
-      if (miniBatchSize > 0) {
-        /**
-         * NOTE(Xinghao): lossSum is computed using the weights from the previous iteration
-         * and regVal is the regularization value computed in the previous iteration as well.
-         */
-        stochasticLossHistory.append(lossSum / miniBatchSize + regVal)
-        val update = updater.compute(
-          weights, Vectors.fromBreeze(gradientSum / miniBatchSize.toDouble), stepSize, i, regParam)
-        weights = update._1
-        regVal = update._2
-      } else {
-        logWarning(s"Iteration ($i/$numIterations). The size of sampled batch is zero")
+    val b = new Breaks
+    b.breakable {
+      for (i <- 1 to numIterations) {
+        val bcWeights = data.context.broadcast(weights)
+        // Sample a subset (fraction miniBatchFraction) of the total data
+        // compute and sum up the subgradients on this subset (this is one map-reduce)
+        val (gradientSum, lossSum, miniBatchSize) = data.sample(false, miniBatchFraction, 42 + i)
+          .treeAggregate((BDV.zeros[Double](n), 0.0, 0L))(
+            seqOp = (c, v) => {
+              // c: (grad, loss, count), v: (label, features)
+              val l = gradient.compute(v._2, v._1, bcWeights.value, Vectors.fromBreeze(c._1))
+              (c._1, c._2 + l, c._3 + 1)
+            },
+            combOp = (c1, c2) => {
+              // c: (grad, loss, count)
+              (c1._1 += c2._1, c1._2 + c2._2, c1._3 + c2._3)
+            })
+
+        if (miniBatchSize > 0) {
+          /**
+           * NOTE(Xinghao): lossSum is computed using the weights from the previous iteration
+           * and regVal is the regularization value computed in the previous iteration as well.
+           */
+          stochasticLossHistory.append(lossSum / miniBatchSize + regVal)
+          val update = updater.compute(
+            weights, Vectors.fromBreeze(gradientSum / miniBatchSize.toDouble), stepSize, i, regParam)
+          weights = update._1
+          regVal = update._2
+          if (stochasticLossHistory.last < convergenceTolerance) b.break
--- End diff -- This is comparing convergenceTolerance with the objective from the last iteration. It should compare with the absolute value of the difference between the objective from the last iteration and the objective from the iteration before that.
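The corrected stopping test described in the review comment can be sketched as a small, self-contained function (names are illustrative, not the MLlib API): stop only once at least two objective values exist and the absolute difference between the last two falls below the tolerance.

```scala
// Sketch of the convergence test suggested in the review: compare the
// tolerance against the absolute change in the objective between the last
// two iterations, not against the last objective value itself.
object ConvergenceCheck {
  def converged(lossHistory: Seq[Double], tol: Double): Boolean = {
    if (lossHistory.size < 2) {
      false // no difference exists until two iterations have run
    } else {
      val recent = lossHistory.takeRight(2)
      math.abs(recent(1) - recent(0)) < tol
    }
  }
}
```

With this test, a loss sequence that is still dropping quickly (e.g. 2.0 then 1.0) does not trigger the break, while two nearly equal consecutive losses do.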
[GitHub] spark pull request: [SPARK-3382] GradientDescent convergence toler...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3636#discussion_r21480898 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala --- @@ -77,6 +80,14 @@ class GradientDescent private[mllib] (private var gradient: Gradient, private va } /** + * Set the convergence tolerance. Default 0.0 --- End diff -- It would be good to note what convergence tolerance is. In particular, can you please note that it is compared with the change in the objective between consecutive iterations?
[GitHub] spark pull request: [SPARK-3382] GradientDescent convergence toler...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/3636#discussion_r21480909 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala --- @@ -219,4 +236,17 @@ object GradientDescent extends Logging { (weights, stochasticLossHistory.toArray) } + + def runMiniBatchSGD( --- End diff -- It is odd to have an API with 2 different argument orders. Can this please be fixed in 1 of these 2 ways: (1) Keep the old argument order, and have convergenceTolerance come after initialWeights. (2) Remove this old method call completely, and update the code base where relevant. I vote for (1) for consistency.
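Option (1) can be illustrated with a heavily abbreviated pair of overloads (hypothetical parameter lists, not the real runMiniBatchSGD signature): the old argument order is kept, and the new parameter is appended at the end so existing call sites compile unchanged.

```scala
// Sketch of keeping the old argument order and appending the new parameter.
// The parameter lists are illustrative stand-ins, not the real MLlib API.
object MiniBatchSGD {
  // Old entry point: order unchanged, delegates with a default tolerance.
  def run(numIterations: Int, stepSize: Double, initialWeights: Double): Double =
    run(numIterations, stepSize, initialWeights, convergenceTolerance = 0.001)

  // New entry point: convergenceTolerance comes after initialWeights.
  def run(numIterations: Int, stepSize: Double, initialWeights: Double,
          convergenceTolerance: Double): Double = {
    // ... optimization loop elided; just return the weights here ...
    initialWeights
  }
}
```

Existing callers keep compiling against the three-argument overload, which is why keeping the old order is the consistency-preserving choice.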
[GitHub] spark pull request: [SPARK-3382] GradientDescent convergence toler...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3636#issuecomment-66178359 @Lewuathe Thanks for the PR! I added some inline comments. One more general comment: When using subsampling (miniBatchFraction < 1.0), testing against a convergenceTolerance can be dangerous because of the stochasticity. It would be good to add a check at the beginning of optimization to see if miniBatchFraction < 1.0 && convergenceTolerance > 0.0. If that is the case, then we should print a warning. Let me know when I should make another pass over the PR.
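The suggested guard could look like the following (an illustrative helper, not actual MLlib code), returning a warning message when a positive tolerance is combined with subsampling:

```scala
// Sketch of the suggested sanity check: a convergence tolerance together with
// miniBatchFraction < 1.0 makes the stopping test stochastic, so warn.
object SgdSettings {
  def subsamplingWarning(miniBatchFraction: Double,
                         convergenceTolerance: Double): Option[String] =
    if (miniBatchFraction < 1.0 && convergenceTolerance > 0.0) {
      Some("Testing convergence with miniBatchFraction < 1.0 can be unstable " +
        "because the objective is computed on a random sample.")
    } else {
      None
    }
}
```

Returning an `Option` keeps the check side-effect free; the caller would pass any message to `logWarning` at the start of optimization.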
[GitHub] spark pull request: [SPARK-4789] [mllib] Standardize ML Prediction...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3637#issuecomment-66179960 [Test build #24227 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24227/consoleFull) for PR 3637 at commit [`83109eb`](https://github.com/apache/spark/commit/83109ebef2fca4b6d28a83bf405c2edf1e5075db). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4737] Task set manager properly handles...
GitHub user mccheah opened a pull request: https://github.com/apache/spark/pull/3638 [SPARK-4737] Task set manager properly handles serialization errors Dealing with [SPARK-4737], the handling of serialization errors should not be the DAGScheduler's responsibility. The task set manager now catches the error and aborts the stage. If the TaskSetManager throws a TaskNotSerializableException, the TaskSchedulerImpl will return an empty list of task descriptions, because no tasks were started. The scheduler should abort the stage gracefully. Note that I'm not too familiar with this part of the codebase and its place in the overall architecture of the Spark stack. If implementing it this way will have any adverse side effects, please voice that loudly. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mccheah/spark task-set-manager-properly-handle-ser-err Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3638.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3638 commit 097e7a21e15d3adf45687bd58ff095088f0282f7 Author: mcheah mch...@palantir.com Date: 2014-12-06T01:45:41Z [SPARK-4737] Catching task serialization exception in TaskSetManager Our previous attempt at handling un-serializable tasks involved selectively sampling a task from a task set, and attempting to serialize it. If the serialization was successful, we assumed that all tasks in the task set would also be serializable. Unfortunately, this is not always the case. For example, ParallelCollectionRDD may have both empty and non-empty partitions, and the empty partitions would be serializable while the non-empty partitions actually contain non-serializable objects. This is one of many examples where sampling task serialization breaks. 
When task serialization exceptions occurred in the TaskSchedulerImpl and TaskSetManager, the result was that the exception was not caught and the entire scheduler would crash. It would restart, but in a bad state. There's no reason why the stage should not be aborted if any serialization error occurs when submitting a task set. If any task in a task set throws an exception upon serialization, the task set manager informs the DAGScheduler that the stage failed and aborts the stage. The TaskSchedulerImpl needs to return a set of task descriptions that were successfully submitted, but the set will be empty in the case of a serialization error. commit bf5e706918d92c761fa537a88bc15ec2c4cc7838 Author: mcheah mch...@palantir.com Date: 2014-12-08T20:39:45Z Fixing indentation.
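The mechanism the PR describes — eagerly serializing each task and converting a serialization failure into a dedicated exception the scheduler can catch — can be sketched like this. The helper names are illustrative stand-ins for the TaskSetManager internals:

```scala
// Sketch of the behavior described above: try to serialize a task with plain
// Java serialization; if it fails, wrap the error in a dedicated exception so
// the caller (the task set manager in the PR) can abort the stage cleanly
// instead of crashing the scheduler. Helper names are illustrative.
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

class TaskNotSerializableException(error: Throwable) extends Exception(error)

object TaskSerializer {
  def serialize(task: AnyRef): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    try {
      val out = new ObjectOutputStream(buffer)
      out.writeObject(task) // throws NotSerializableException on bad tasks
      out.close()
      buffer.toByteArray
    } catch {
      case e: NotSerializableException => throw new TaskNotSerializableException(e)
    }
  }
}
```

Serializing every task (rather than sampling one) is exactly what avoids the ParallelCollectionRDD pitfall described above, where one partition happens to serialize while another does not.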
[GitHub] spark pull request: [SPARK-4737] Task set manager properly handles...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3638#issuecomment-66184682 [Test build #24228 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24228/consoleFull) for PR 3638 at commit [`bf5e706`](https://github.com/apache/spark/commit/bf5e706918d92c761fa537a88bc15ec2c4cc7838). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4737] Task set manager properly handles...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3638#issuecomment-66184722 [Test build #24228 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24228/consoleFull) for PR 3638 at commit [`bf5e706`](https://github.com/apache/spark/commit/bf5e706918d92c761fa537a88bc15ec2c4cc7838). * This patch **fails RAT tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class TaskNotSerializableException(error: Throwable) extends Exception(error)`
[GitHub] spark pull request: [SPARK-4737] Task set manager properly handles...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3638#issuecomment-66184723 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24228/ Test FAILed.
[GitHub] spark pull request: [SPARK-4737] Task set manager properly handles...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3638#issuecomment-66186961 [Test build #24229 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24229/consoleFull) for PR 3638 at commit [`5f486f4`](https://github.com/apache/spark/commit/5f486f462233ae63987aa483e6d6eab342feef96). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4737] Task set manager properly handles...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3638#issuecomment-66187144 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24229/ Test FAILed.
[GitHub] spark pull request: [SPARK-4737] Task set manager properly handles...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3638#issuecomment-66187140 [Test build #24229 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24229/consoleFull) for PR 3638 at commit [`5f486f4`](https://github.com/apache/spark/commit/5f486f462233ae63987aa483e6d6eab342feef96). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class TaskNotSerializableException(error: Throwable) extends Exception(error)`
[GitHub] spark pull request: [SPARK-4737] Task set manager properly handles...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3638#issuecomment-66188975 [Test build #24230 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24230/consoleFull) for PR 3638 at commit [`94844d7`](https://github.com/apache/spark/commit/94844d736ed0d8322e2e0dda762961a9170d6a1d). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4789] [mllib] Standardize ML Prediction...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3637#issuecomment-6615 [Test build #24227 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24227/consoleFull) for PR 3637 at commit [`83109eb`](https://github.com/apache/spark/commit/83109ebef2fca4b6d28a83bf405c2edf1e5075db). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class LabeledPoint(label: Double, features: Vector, weight: Double) `
[GitHub] spark pull request: [SPARK-4789] [mllib] Standardize ML Prediction...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3637#issuecomment-66188897 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24227/ Test FAILed.
[GitHub] spark pull request: SPARK-3996: Shade Jetty in Spark deliverables.
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/3130#issuecomment-66189849 Wanted to follow up on this - the priority of getting this done was just increased for us.
[GitHub] spark pull request: [SPARK-4759] Avoid using empty string as defau...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3633#issuecomment-66192578 [Test build #24231 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24231/consoleFull) for PR 3633 at commit [`f370a4e`](https://github.com/apache/spark/commit/f370a4e710b1ff29a5749944a1557de233223dc6). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2309][MLlib] Generalize the binary logi...
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/1379#issuecomment-66192930 @avulanov I did a couple of performance tunings in the MLOR gradient calculation in my company's proprietary implementation, which is 4x faster than the open source one on GitHub that you tested. I'm trying to make it open source and merge it into Spark soon. (PS: simple polynomial expansion with MLOR can increase the mnist8m accuracy from 86% to 94% in my experiment. See Prof. CJ Lin's talk - https://www.youtube.com/watch?v=GCIJP0cLSmU )
[GitHub] spark pull request: [WIP] SPARK-2450 Adds exeuctor log links to We...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/3486#discussion_r21489595 --- Diff: core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala --- @@ -50,10 +50,16 @@ private[spark] class CoarseGrainedExecutorBackend(

   override def preStart() {
     logInfo("Connecting to driver: " + driverUrl)
     driver = context.actorSelection(driverUrl)
-    driver ! RegisterExecutor(executorId, hostPort, cores)
+    driver ! RegisterExecutor(executorId, hostPort, cores, extractLogUrls)
     context.system.eventStream.subscribe(self, classOf[RemotingLifecycleEvent])
   }

+  def extractLogUrls: Map[String, String] = {
+    val prefix = "SPARK_LOG_URL_"
--- End diff -- On a related note, I added proper command line parsing to CoarseGrainedExecutorBackend over in #3233, which could be a nicer alternative to env variables.
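The extraction sketched in the diff can be written as a pure function over an environment map (a testable variant; the PR reads from the process environment directly, and the key/value shape here is an assumption):

```scala
// Sketch of collecting executor log URLs from environment variables with the
// "SPARK_LOG_URL_" prefix, keyed by the lower-cased suffix. Taking the
// environment as a parameter (instead of reading sys.env) makes it testable.
object LogUrls {
  private val Prefix = "SPARK_LOG_URL_"

  def extractLogUrls(env: Map[String, String]): Map[String, String] =
    env.collect {
      case (key, url) if key.startsWith(Prefix) =>
        key.stripPrefix(Prefix).toLowerCase -> url
    }
}
```

In production the caller would pass `sys.env`, so e.g. `SPARK_LOG_URL_STDOUT` becomes the `"stdout"` entry of the map sent along with `RegisterExecutor`.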
[GitHub] spark pull request: [WIP] SPARK-2450 Adds exeuctor log links to We...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/3486#discussion_r21490039 --- Diff: core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala --- @@ -183,6 +193,16 @@ trait SparkListener { * Called when the driver receives task metrics from an executor in a heartbeat. */ def onExecutorMetricsUpdate(executorMetricsUpdate: SparkListenerExecutorMetricsUpdate) { } + + /** + * Called when the driver registers a new executor. + */ + def onExecutorAdded(executorAdded: SparkListenerExecutorAdded) { } --- End diff -- Hmmm. This is going to be one of those cases where it breaks existing code that extends this class. Not sure if there's a good workaround (even though it is marked as `@DeveloperApi`). :-/
[GitHub] spark pull request: [WIP] SPARK-2450 Adds exeuctor log links to We...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/3486#discussion_r21490459 --- Diff: core/src/main/scala/org/apache/spark/ui/exec/ExecutorsPage.scala --- @@ -79,6 +80,7 @@ private[ui] class ExecutorsPage(

         Shuffle Write
       </span>
     </th>
+    <th class="sorttable_nosort">Logs</th>
--- End diff -- Should this be conditioned on whether logs actually exist?
[GitHub] spark pull request: Cdh5
GitHub user orenmazor opened a pull request: https://github.com/apache/spark/pull/3639 Cdh5 https://github.com/Shopify/dataops/issues/2 You can merge this pull request into a Git repository by running: $ git pull https://github.com/Shopify/spark cdh5 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3639.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3639 commit 422de4cc2a823e16b86fd22095e35d1ebe842a12 Author: Harry Brundage harry.brund...@gmail.com Date: 2014-01-15T01:29:43Z Add compile script for packserv commit 4ffa04cc6cc7bb8086a422a94d4f2e4105a69786 Author: Harry Brundage harry.brund...@gmail.com Date: 2014-01-15T02:18:29Z Don't compile streaming when assembling cause it doesn't build against CDH4.4.0 commit b7bf08171e8eb796d86408ce5712175d781e0f8d Author: Harry Brundage harry.brund...@gmail.com Date: 2014-01-15T02:22:10Z Make script compile executable commit 65033e665c75f4e82b56c8113c99308f8b419704 Author: Harry Brundage harry.brund...@gmail.com Date: 2014-01-15T02:22:57Z Make script compile bash commit 9e6fc96f461864f4ffdd6c8aefaa53b6fd8c4ae0 Author: Mark Cooper mcoo...@quantcast.com Date: 2013-11-20T22:26:42Z Add a environment variable that allows for configuring a different path to Spark binaries when running Spark from a different location locally commit fdb0ce298048832f75b24b464fdf59fb791f869f Author: Harry Brundage harry.brund...@gmail.com Date: 2014-01-15T19:24:05Z Add fixed conf file with proper master and remote spark home commit a837356d7d84641ab504522e74cedc4b5d865aa3 Author: Harry Brundage harry.brund...@gmail.com Date: 2014-01-21T01:50:22Z Copy in hadoop core-site.xml so local clones know where to find hdfs. 
commit 4d0f3682e0931c21ba6e5b01fc42ee33a44453e1 Author: Harry Brundage harry.brund...@gmail.com Date: 2014-01-21T02:12:36Z Update the given spark env to actually work, and only if a custom master isn't provided. commit 01cf4c51f2c3c3089ee91dd64d6cab32dd17aa70 Author: Harry Brundage harry.brund...@gmail.com Date: 2014-01-21T04:22:11Z Allow controlling the number of cores pyspark uses using the `-c` option, like spark-shell. - Turns out there isn't actually a way right now to control the number of cores an interactive pyspark session uses, which is annoying if more than one person is trying to work on a cluster interactively at once. - Use the python 2.7 stdlib argparse library to pull out the -c option - This requires changing the bin/pyspark shell script to pass all arguments to the python script instead of allowing the python interpreter program to parse any of them. commit 91ddfb4c43a88a4cf0082e445e2e82bcde069969 Author: Harry Brundage harry.brund...@gmail.com Date: 2014-01-21T04:22:34Z Merge branch 'pyspark_cores' commit 0b44511492131b60f744527eee467fd147e4f4c0 Author: Harry Brundage harry.brund...@gmail.com Date: 2014-01-21T14:51:05Z Revert Merge branch 'pyspark_cores' This reverts commit 91ddfb4c43a88a4cf0082e445e2e82bcde069969, reversing changes made to 4d0f3682e0931c21ba6e5b01fc42ee33a44453e1. 
commit b4c5ff7e7d6d550743e3aa97710fa514744b0c6e Author: Harry Brundage harry.brund...@gmail.com Date: 2014-01-21T15:22:09Z Auto setup python and warn if the vpn isn't connected commit 4c2c45eaf14197b79cf5949bb370a74c52a38ff0 Author: Harry Brundage harry.brund...@gmail.com Date: 2014-01-21T15:34:05Z Add an applescript to the spark conf file that autoconnects the VPN if it can't find the interface the VPN should create commit 712b8856e4b14f88d34da569505c59884d8e8155 Author: Harry Brundage harry.brund...@gmail.com Date: 2014-01-21T19:24:21Z Check to see if Viscosity is a thing before trying to tell it to connect in spark env setup commit 986a60c0b9a880ee0fd7242e53458efbebfab73e Author: Harry Brundage harry.brund...@gmail.com Date: 2014-01-21T21:21:18Z Merge pull request #1 from Shopify/autoconnect_vpn Autoconnect VPN commit b944fc6ff5f5866b93e937f7d7629370c24944f0 Author: Dana Klassen klassen.d...@gmail.com Date: 2014-01-23T01:36:53Z change configuration to be set through environment variable commit 5eace91604360da5b446a96582f141b09ab109c1 Author: Erik Selin erik.se...@jadedpixel.com Date: 2014-01-23T04:14:17Z apply pr 494 and 496 commit 42dc1708daec21a3ba302f61f473afa57fb5c12c Author: Dana Klassen klassen.d...@gmail.com Date: 2014-01-23T12:15:27Z Merge pull request #2 from Shopify/config_hdfs Config hdfs commit 2b6c170b50f58ccdbe1e2faaf4ff3439bdf9e01e Author: Erik Selin tyr...@gmail.com Date: 2014-01-23T15:28:32Z Merge pull request #3 from Shopify/apply_494_and_496 apply pr 494 and 496 commit 25c5a0d90c5926133b32e43f5e6a8d1a58c0685c Author: Patrick Wendell pwend...@gmail.com
[GitHub] spark pull request: Cdh5
Github user orenmazor closed the pull request at: https://github.com/apache/spark/pull/3639
[GitHub] spark pull request: [WIP] SPARK-2450 Adds exeuctor log links to We...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/3486#discussion_r21490919 --- Diff: core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala --- @@ -183,6 +193,16 @@ trait SparkListener { * Called when the driver receives task metrics from an executor in a heartbeat. */ def onExecutorMetricsUpdate(executorMetricsUpdate: SparkListenerExecutorMetricsUpdate) { } + + /** + * Called when the driver registers a new executor. + */ + def onExecutorAdded(executorAdded: SparkListenerExecutorAdded) { } --- End diff -- BTW doesn't this break the build? There are a few listeners in Spark code itself (e.g. `EventLoggingListener`) which should have broken because of this. (BTW fixing that listener means you'll probably need to touch `JsonProtocol` to serialize these new events to the event log... and you'll need to be careful not to keep the log URLs in the replayed UIs since they'll most probably be broken links at that point. Meaning that probably the UI listener should nuke the log URLs when the executor removed message is handled.)
[GitHub] spark pull request: [WIP] SPARK-2450 Adds exeuctor log links to We...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/3486#discussion_r21491005 --- Diff: core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala --- @@ -183,6 +193,16 @@ trait SparkListener { * Called when the driver receives task metrics from an executor in a heartbeat. */ def onExecutorMetricsUpdate(executorMetricsUpdate: SparkListenerExecutorMetricsUpdate) { } + + /** + * Called when the driver registers a new executor. + */ + def onExecutorAdded(executorAdded: SparkListenerExecutorAdded) { } --- End diff -- Ah wait. I see. These methods have default implementations, so they'll only affect people extending `SparkListener` from Java. Still, we should probably save these events to the log for replay later.
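The observation above — that adding a method with a default no-op body to a trait leaves existing Scala subclasses source-compatible — can be seen in a tiny example (names are illustrative, not the real SparkListener):

```scala
// A trait gains a new method with a default no-op body; an existing subclass
// written before the method was added still compiles and runs unchanged.
trait Listener {
  def onStart(): Unit = { }
  def onExecutorAdded(executorId: String): Unit = { } // newly added default
}

class OldListener extends Listener {
  var started = false
  override def onStart(): Unit = { started = true }
  // No override of onExecutorAdded needed: the default no-op is inherited.
}
```

Java classes implementing the trait as an interface (pre-Java-8-default semantics) are the ones that break, which is the distinction drawn in the comment.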
[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3574#discussion_r21491375 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -1010,7 +1010,10 @@ private[spark] class BlockManager( info.synchronized { // required ? As of now, this will be invoked only for blocks which are ready --- End diff -- This comment actually refers to the `!info.waitForReady()` case, so I'd like to either move the comment or swap the order of these checks so that we check for `blockInfo.get(blockId).isEmpty` in the `else if` clause instead.
[GitHub] spark pull request: [SPARK-4714][CORE]: Add checking info is null ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3574#issuecomment-66199309 Left one minor code organization comment; aside from that, this looks good to me and should be ready to merge after you fix that up (I can do it if you don't have time, though; just let me know). There are a couple of edits that I'd like to make to the commit title / description before merging this, but I can do it myself on merge. Thanks for the careful analysis and for catching this issue!
[GitHub] spark pull request: [SPARK-4737] Task set manager properly handles...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3638#issuecomment-66201371 [Test build #24230 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24230/consoleFull) for PR 3638 at commit [`94844d7`](https://github.com/apache/spark/commit/94844d736ed0d8322e2e0dda762961a9170d6a1d). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class TaskNotSerializableException(error: Throwable) extends Exception(error)`
[GitHub] spark pull request: [SPARK-4737] Task set manager properly handles...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3638#issuecomment-66201380 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24230/ Test PASSed.
[GitHub] spark pull request: [SPARK-4461][YARN] pass extra java options to ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3409#issuecomment-66202544 [Test build #24232 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24232/consoleFull) for PR 3409 at commit [`e3f9abe`](https://github.com/apache/spark/commit/e3f9abeaa82018835cd9a7055adba0dabc451a24). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4789] [mllib] Standardize ML Prediction...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3637#issuecomment-66203211 The test failure reveals an issue in Spark SQL (ScalaReflection.scala:121 in schemaFor) where it gets confused if the case class includes multiple constructors. The default behavior should probably be to take the constructor with the most arguments, but I'll consult others about this. This PR may be on temporary hold...
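The ambiguity described above can be reproduced in isolation (this is an illustrative sketch, not the PR's actual class): a case class with an auxiliary constructor exposes two public constructors to reflection, so schema inference that assumes exactly one must choose. The "most arguments" heuristic would pick the primary constructor.

```scala
// Illustrative case class: primary constructor plus an auxiliary one,
// mimicking the shape that confuses reflection-based schema inference.
case class Point(label: Double, weight: Double) {
  def this(label: Double) = this(label, 1.0) // auxiliary constructor
}

object ConstructorDemo {
  // Reflection sees both public constructors, not just the primary one.
  val ctors = classOf[Point].getConstructors
  // The heuristic suggested above: take the constructor with the most arguments.
  val primary = ctors.maxBy(_.getParameterTypes.length)
}
```

With a single-constructor case class the ambiguity disappears, which is why it only surfaced once an auxiliary constructor was added.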
[GitHub] spark pull request: [SPARK-3382] GradientDescent convergence toler...
Github user Lewuathe commented on the pull request: https://github.com/apache/spark/pull/3636#issuecomment-66203442 @jkbradley Thank you for reviewing. I'll update these points soon.
[GitHub] spark pull request: [SPARK-1953][YARN]yarn client mode Application...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/3607#discussion_r21493950 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala --- @@ -54,8 +46,25 @@ private[spark] class ClientArguments(args: Array[String], sparkConf: SparkConf) loadEnvironmentArgs() validateArgs() + // Additional memory to allocate to containers + // For now, use driver's memory overhead as our AM container's memory overhead --- End diff -- This comment is no longer true
[GitHub] spark pull request: SPARK-4770. [DOC] [YARN] spark.scheduler.minRe...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3624#issuecomment-66204351 @tgravescs It should be fine to pull docs-only changes into `branch-1.2`. We're trying to hold off on merging code changes that aren't addressing 1.2.0 release blockers because we don't want to risk introducing new regressions and having to call new votes. If you do want to merge a code change that should eventually be backported into `branch-1.2`, just merge it into the other branches, leave its JIRA open with 1.2.1 listed in Target Version/s and not Fix Version/s, then add the `backport-needed` label to the issue so that we remember to come back to it after 1.2.0 is released.
[GitHub] spark pull request: [SPARK-4759] Avoid using empty string as defau...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3633#issuecomment-66204818 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24231/ Test PASSed.
[GitHub] spark pull request: [SPARK-4759] Avoid using empty string as defau...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3633#issuecomment-66204811 [Test build #24231 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24231/consoleFull) for PR 3633 at commit [`f370a4e`](https://github.com/apache/spark/commit/f370a4e710b1ff29a5749944a1557de233223dc6). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4789] [mllib] Standardize ML Prediction...
Github user Lewuathe commented on a diff in the pull request: https://github.com/apache/spark/pull/3637#discussion_r21494740 --- Diff: mllib/src/main/scala/org/apache/spark/ml/LabeledPoint.scala --- @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml + +import scala.beans.BeanInfo + +import org.apache.spark.annotation.AlphaComponent +import org.apache.spark.mllib.linalg.Vector + +/** + * :: AlphaComponent :: + * Class that represents an instance (data point) for prediction tasks. + * + * @param label Label to predict + * @param features List of features describing this instance + * @param weight Instance weight + */ +@AlphaComponent +@BeanInfo +case class LabeledPoint(label: Double, features: Vector, weight: Double) { --- End diff -- Why is the label of `LabeledPoint` assumed to be only `Double`? I think there are cases where the label is not a `Double`, such as one-of-k encoding. It seems better not to restrict it to the `Double` type. If I missed some alternatives, sorry for that and please let me know.
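The alternative raised in that comment can be sketched as follows (a hypothetical API, not what the patch implements): parameterizing the label type instead of fixing it to `Double` would let a one-of-k encoded label be carried directly. Here `Vector` is the plain Scala collection, used as a stand-in for an encoded label, not MLlib's `Vector`.

```scala
// Hypothetical generic variant of LabeledPoint: the label type L is a
// type parameter rather than being fixed to Double.
case class GenericLabeledPoint[L](label: L, features: Vector[Double], weight: Double)

object LabelDemo {
  // Regression-style label: L = Double.
  val regression = GenericLabeledPoint(3.14, Vector(1.0, 2.0), 1.0)
  // One-of-k encoded label: L = Vector[Double].
  val oneOfK = GenericLabeledPoint(Vector(0.0, 1.0, 0.0), Vector(1.0, 2.0), 1.0)
}
```

The trade-off is that downstream algorithms and SQL schema inference then need to handle an open-ended label type, which is likely part of why the patch fixes it to `Double`.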