[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...
Github user Shiti commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-59886741 The reason for this issue is that the Maven build definitions and plugin configurations of yarn-alpha and yarn-stable are the same as those of yarn-common. As a result, the `SettingKey scalaSource` of the child projects is set to that of the parent, and since `scalaSource` is a `SettingKey[File]`, we cannot register multiple Scala source directories for the same project. The scalastyle plugin depends on this `SettingKey` to locate Scala files. Modifying the settings for the yarn-specific projects in the Scala build file when required is a better approach to fixing this issue. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
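To illustrate the constraint Shiti describes: because `scalaSource` is a `SettingKey[File]`, it can name only one directory per configuration, so the multi-valued `unmanagedSourceDirectories` setting is what an sbt build must extend to compile several source trees in one project. The fragment below is a minimal, hypothetical sbt 0.13-style sketch (the directory names are illustrative, not the actual Spark build layout):

```scala
// scalaSource in Compile := ...        // SettingKey[File]: holds exactly ONE directory,
//                                      // so assigning it twice just overwrites the value.

// To pick up both a shared source tree and a version-specific one,
// append to the Seq-valued setting instead (hypothetical paths):
unmanagedSourceDirectories in Compile ++= Seq(
  baseDirectory.value / ".." / "common" / "src" / "main" / "scala",
  baseDirectory.value / "src" / "main" / "scala"
)
```

A plugin that reads only `scalaSource` (as scalastyle did here) will still see just the single directory, which is why the comment proposes adjusting the yarn-specific project settings directly.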
[GitHub] spark pull request: SPARK-2621. Update task InputMetrics increment...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-59886278 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21976/consoleFull) for PR 2087 at commit [`1ab662d`](https://github.com/apache/spark/commit/1ab662d8ae674407bfe0f8bbc14aedf1da60c030). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-2621. Update task InputMetrics increment...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-59885707 Jenkins, retest this please.
[GitHub] spark pull request: SPARK-2621. Update task InputMetrics increment...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-59885695 Cool, updated patch addresses comments. It looks like the failure was caused by a failure to fetch from git.
[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...
Github user yu-iskw closed the pull request at: https://github.com/apache/spark/pull/1964
[GitHub] spark pull request: [SPARK-3012] Standardized Distance Functions b...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-59885635 Because this patch does not fit the Spark design concept, I am closing this PR without merging. (http://apache-spark-developers-list.1001551.n3.nabble.com/Standardized-Distance-Functions-in-MLlib-td8697.html) Thank you very much for your cooperation.
[GitHub] spark pull request: [SPARK-4003] [SQL] add 3 types for java SQL co...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2850#issuecomment-59885505 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21975/consoleFull) for PR 2850 at commit [`bb0508f`](https://github.com/apache/spark/commit/bb0508f1382186c20ddb80b6032f3fce5c6cf6aa). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4003] [SQL] add 3 types for java SQL co...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/2850#issuecomment-59885063 retest this please.
[GitHub] spark pull request: [SPARK-3405] add subnet-id and vpc-id options ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2872#issuecomment-59884935 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2844#issuecomment-59884782 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21974/consoleFull) for PR 2844 at commit [`1e8268d`](https://github.com/apache/spark/commit/1e8268d6111e4ad45e2acfe47d837718f2170461). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3161][MLLIB] Adding a node Id caching m...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2868#issuecomment-59884738 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21966/ Test PASSed.
[GitHub] spark pull request: [SPARK-3405] add subnet-id and vpc-id options ...
GitHub user mvj101 opened a pull request: https://github.com/apache/spark/pull/2872 [SPARK-3405] add subnet-id and vpc-id options to spark_ec2.py Based on this gist: https://gist.github.com/amar-analytx/0b62543621e1f246c0a2 We use security group ids instead of security group to get around this issue: https://github.com/boto/boto/issues/350 You can merge this pull request into a Git repository by running: $ git pull https://github.com/mvj101/spark SPARK-3405 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2872.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2872 commit 52aaeec7b03251f3fcb4d1cf892df7c592e03408 Author: Mike Jennings Date: 2014-10-21T06:05:09Z [SPARK-3405] add subnet-id and vpc-id options to spark_ec2.py
[GitHub] spark pull request: [SPARK-4003] [SQL] add 3 types for java SQL co...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2850#issuecomment-59884695 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21968/consoleFull) for PR 2850 at commit [`bb0508f`](https://github.com/apache/spark/commit/bb0508f1382186c20ddb80b6032f3fce5c6cf6aa). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3161][MLLIB] Adding a node Id caching m...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2868#issuecomment-59884734 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21966/consoleFull) for PR 2868 at commit [`13585e8`](https://github.com/apache/spark/commit/13585e8738e35743c6c0ab482d34552f01939bd4). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class JavaFutureActionWrapper[S, T](futureAction: FutureAction[S], converter: S => T)` * ` class SerializableMapWrapper[A, B](underlying: collection.Map[A, B])` * ` case class ReconnectWorker(masterUrl: String) extends DeployMessage` * `class Predict(` * `case class EvaluatePython(`
[GitHub] spark pull request: [SPARK-4003] [SQL] add 3 types for java SQL co...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2850#issuecomment-59884702 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21968/ Test FAILed.
[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2241#issuecomment-59884283 @marmbrus in #2499, I regenerated the golden answers and changed some *.ql files because of the 0.13 changes; the tests passed on my local machine. @zhzhan I don't follow you — why replace the query plan?
[GitHub] spark pull request: [SPARK-3161][MLLIB] Adding a node Id caching m...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2868#issuecomment-59884328 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21965/consoleFull) for PR 2868 at commit [`6b05af0`](https://github.com/apache/spark/commit/6b05af042656b192e7b14954a433a75468df1d1c). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3161][MLLIB] Adding a node Id caching m...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2868#issuecomment-59884334 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21965/ Test PASSed.
[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2844#discussion_r19131009 --- Diff: core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala --- @@ -227,6 +217,7 @@ private object TorrentBroadcast extends Logging { * If removeFromDriver is true, also remove these persisted blocks on the driver. */ def unpersist(id: Long, removeFromDriver: Boolean, blocking: Boolean) = { +logInfo(s"Unpersisting TorrentBroadcast $id") --- End diff -- I'll try to get #2851 merged this week; I'm in the middle of some significant UI code cleanup and I'm planning to merge most of the existing UI patches or to re-implement them myself.
[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/2844#discussion_r19130957 --- Diff: core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala --- @@ -227,6 +217,7 @@ private object TorrentBroadcast extends Logging { * If removeFromDriver is true, also remove these persisted blocks on the driver. */ def unpersist(id: Long, removeFromDriver: Boolean, blocking: Boolean) = { +logInfo(s"Unpersisting TorrentBroadcast $id") --- End diff -- It's mostly for debugging which broadcasts have been removed and which have not. It can probably be made debug once we have a UI for this (#2851), but right now this is the only way to figure out whether a broadcast variable has been removed by looking at the driver logs. Also, it's just one line per broadcast variable (we have 2-3 lines per variable when it is created).
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59883998 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21964/ Test PASSed.
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59883993 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21964/consoleFull) for PR 2866 at commit [`c23897a`](https://github.com/apache/spark/commit/c23897aea7881eb819ec074073a4431ec8ba7eb5). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user Ishiihara commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59883834 @JoshRosen I have been looking into the compressed bitmap and already have a good idea of how to use a Roaring bitmap to perform the task. If this work is not urgent, can you give me a day or two to get the compressed bitmap part completed? Thanks!
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59883237 @rxin that's a fair solution, too, although the bitmap needs to be losslessly compressed. I could imagine cases where data is already partitioned but a user performs partition-preserving operations without specifying `preservesPartitioning`, then does a filtering operation that would otherwise benefit from partitioning. In these cases, you might have this extreme bimodal distribution where most blocks are zero but the remaining blocks might be big. In these cases, do you care about the exact sizes of those blocks? Probably not in most cases, since there will be few blocks. I'll look into folding this into the compressed version as you've suggested.
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-59883187 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21963/consoleFull) for PR 2520 at commit [`c5b2a33`](https://github.com/apache/spark/commit/c5b2a3399d5c57ea0b5e0d15dabf7ee28d1ffaa5). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-59883190 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21963/ Test PASSed.
[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/2844#discussion_r19130611 --- Diff: core/src/test/scala/org/apache/spark/broadcast/BroadcastSuite.scala --- @@ -84,6 +89,24 @@ class BroadcastSuite extends FunSuite with LocalSparkContext { assert(results.collect().toSet === (1 to numSlaves).map(x => (x, 10)).toSet) } + test("TorrentBroadcast's blockifyObject and unblockifyObject are inverses") { +import org.apache.spark.broadcast.TorrentBroadcast._ +val blockSize = 1024 +val conf = new SparkConf() +val compressionCodec = Some(new SnappyCompressionCodec(conf)) +val serializer = new JavaSerializer(conf) +val objects = for (size <- Gen.choose(1, 1024 * 10)) yield { --- End diff -- as discussed offline, maybe just use a random number generator here since Gen brings extra complexity but not much benefit in this specific case.
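rxin's suggestion — dropping ScalaCheck's `Gen` in favor of a plain random number generator for the round-trip test — could look roughly like the sketch below. Note this is a self-contained illustration: `blockify`/`unblockify` here are simplified stand-ins that chunk a byte array, not the real `TorrentBroadcast.blockifyObject`/`unBlockifyObject`, which also serialize and optionally compress.

```scala
import scala.util.Random

// Simplified stand-ins for the TorrentBroadcast helpers:
// split a byte array into fixed-size blocks, then glue it back together.
def blockify(bytes: Array[Byte], blockSize: Int): Array[Array[Byte]] =
  bytes.grouped(blockSize).toArray

def unblockify(blocks: Array[Array[Byte]]): Array[Byte] =
  blocks.flatten

// Instead of ScalaCheck's Gen, draw input sizes from a seeded Random,
// which keeps the property check simple and reproducible.
val rand = new Random(42)
val blockSize = 1024
for (_ <- 1 to 100) {
  val size = 1 + rand.nextInt(1024 * 10)
  val obj  = Array.fill[Byte](size)(rand.nextInt().toByte)
  val roundTripped = unblockify(blockify(obj, blockSize))
  assert(roundTripped.sameElements(obj), s"round trip failed for size $size")
}
```

The trade-off is exactly the one named in the comment: a seeded `Random` loses `Gen`'s shrinking on failure, but for a simple inverse-function check the reduced complexity is worth it.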
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-59883025 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21973/consoleFull) for PR 2520 at commit [`c5b2a33`](https://github.com/apache/spark/commit/c5b2a3399d5c57ea0b5e0d15dabf7ee28d1ffaa5). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4023] [MLlib] [PySpark] convert rdd int...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2870#discussion_r19130591 --- Diff: python/pyspark/mllib/tests.py --- @@ -202,6 +204,16 @@ def test_regression(self): self.assertTrue(dt_model.predict(features[3]) > 0) +class StatTests(PySparkTestCase): +# SPARK-4023 +def test_col_with_random_rdd(self): +data = RandomRDDs.normalVectorRDD(self.sc, 1000, 10, 10) +summary = Statistics.colStats(data) +self.assertEqual(1000, summary.count()) +mean = summary.mean() +self.assertTrue(all(abs(v) < 0.1 for v in mean)) --- End diff -- This is a non-deterministic test. For SPARK-4023, we only need to test `colStats` and other methods for RDDs of numpy arrays and python arrays.
[GitHub] spark pull request: [SPARK-3958] TorrentBroadcast cleanup / debugg...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/2844#discussion_r19130571 --- Diff: core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala --- @@ -227,6 +217,7 @@ private object TorrentBroadcast extends Logging { * If removeFromDriver is true, also remove these persisted blocks on the driver. */ def unpersist(id: Long, removeFromDriver: Boolean, blocking: Boolean) = { +logInfo(s"Unpersisting TorrentBroadcast $id") --- End diff -- I don't feel super strongly about this one, but given this is for "debugging" of exceptional cases, I feel it should be at debug level. If your worry is that the broadcast cleaner might clean up stuff prematurely, then I think we should log in the cleaner instead.
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-59882835 retest this please.
[GitHub] spark pull request: [SPARK-2759][CORE] Generic Binary File Support...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1658#issuecomment-59882709 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21972/consoleFull) for PR 1658 at commit [`8ac288b`](https://github.com/apache/spark/commit/8ac288bc09e779f1b4c96dcb497ee4eca962439f). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-2621. Update task InputMetrics increment...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-59882564 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21969/ Test FAILed.
[GitHub] spark pull request: [SPARK-4023] [MLlib] [PySpark] convert rdd int...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2870#issuecomment-59882352 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21961/consoleFull) for PR 2870 at commit [`0871576`](https://github.com/apache/spark/commit/087157620a85c14534ac76f44ff079df6151ea5b). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4023] [MLlib] [PySpark] convert rdd int...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2870#issuecomment-59882356 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21961/ Test PASSed.
[GitHub] spark pull request: [WIP] [SPARK-4031] Make torrent broadcast read...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2871#issuecomment-59882084 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21970/consoleFull) for PR 2871 at commit [`8792ed8`](https://github.com/apache/spark/commit/8792ed8399f9d1501bf4a38694531a8440d65448). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59882089 Actually instead of introducing a new one, what if we introduce a compressed bitmap that tracks zero-sized blocks, and then use avg size to track only non-zero blocks?
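rxin's suggestion can be sketched in plain Scala. All names here (`SketchMapStatus`, `compress`, `estimatedSize`) are illustrative, not Spark's actual `MapStatus` API, and a plain `immutable.BitSet` stands in for the compressed bitmap he proposes.

```scala
import scala.collection.immutable.BitSet

// Illustrative sketch of the idea: record exactly which blocks are zero-sized
// in a bitmap, and summarize the remaining blocks by their average size.
final case class SketchMapStatus(emptyBlocks: BitSet, avgNonEmptySize: Long) {
  // Size estimate for block `i`: exactly 0 if the bitmap marks it empty,
  // otherwise the average size of the non-empty blocks.
  def estimatedSize(i: Int): Long =
    if (emptyBlocks.contains(i)) 0L else avgNonEmptySize
}

def compress(sizes: Array[Long]): SketchMapStatus = {
  // Collect indices of zero-sized blocks into the bitmap.
  val empty = sizes.indices.filter(i => sizes(i) == 0L).foldLeft(BitSet.empty)(_ + _)
  // Average only over the non-zero blocks, so empty blocks don't drag it down.
  val nonEmpty = sizes.filter(_ > 0L)
  val avg = if (nonEmpty.isEmpty) 0L else nonEmpty.sum / nonEmpty.length
  SketchMapStatus(empty, avg)
}
```

The payoff of this design is that zero-sized blocks are reported as exactly zero (so reducers skip fetching them), while only one long plus a bitmap is shipped per map task instead of one size per block.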
[GitHub] spark pull request: SPARK-2621. Update task InputMetrics increment...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-59882086 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21971/consoleFull) for PR 2087 at commit [`1ab662d`](https://github.com/apache/spark/commit/1ab662d8ae674407bfe0f8bbc14aedf1da60c030). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3888] [PySpark] limit the memory used b...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2743#issuecomment-59882030 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21960/consoleFull) for PR 2743 at commit [`c10229e`](https://github.com/apache/spark/commit/c10229e8a4eaa6944ea7c432437cdfafdb702ef5). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59882025 Oh wow. Thanks for fixing this.
[GitHub] spark pull request: [SPARK-3888] [PySpark] limit the memory used b...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2743#issuecomment-59882034 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21960/ Test PASSed.
[GitHub] spark pull request: [WIP] [SPARK-4031] Make torrent broadcast read...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/2871#issuecomment-59881872 @JoshRosen -- yes, that should be fine. I will rebase once #2844 is checked in.
[GitHub] spark pull request: [WIP] [SPARK-4031] Make torrent broadcast read...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2871#issuecomment-59881787 This seems likely to merge-conflict with my PR #2844, so I'd like to merge that one first.
[GitHub] spark pull request: [WIP] [SPARK-4031] Make torrent broadcast read...
GitHub user shivaram opened a pull request: https://github.com/apache/spark/pull/2871

[WIP] [SPARK-4031] Make torrent broadcast read blocks on use.

This avoids reading broadcast variables when they are referenced in the closure but not used by the code.

Note: This is a WIP and a request for comments. I will update HttpBroadcast and add some tests if it sounds good. cc @rxin @JoshRosen for review

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shivaram/spark-1 broadcast-read-value

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2871.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2871

commit 8792ed8399f9d1501bf4a38694531a8440d65448
Author: Shivaram Venkataraman
Date: 2014-10-21T05:35:03Z

    Make torrent broadcast read blocks on use. This avoids reading broadcast variables when they are referenced in the closure but not used by the code.
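The idea in the PR title, deferring the read of broadcast blocks until the value is actually used, can be sketched with a `lazy val`. This is an illustrative toy, not Spark's `TorrentBroadcast` implementation; `LazyBroadcast` and `fetchBlocks` are made-up names.

```scala
// Sketch: the (expensive) block fetch runs only on the first access to
// `value`; `lazy val` both defers and caches the result. A closure that
// captures a LazyBroadcast but never calls .value never triggers the fetch.
class LazyBroadcast[T](fetchBlocks: () => T) {
  lazy val value: T = fetchBlocks()
}
```

Under this scheme, serializing the broadcast handle into a task closure costs nothing: the blocks are only fetched on executors whose tasks actually dereference the value.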
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-59881469 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21962/consoleFull) for PR 2520 at commit [`c5b2a33`](https://github.com/apache/spark/commit/c5b2a3399d5c57ea0b5e0d15dabf7ee28d1ffaa5). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-59881474 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21962/ Test FAILed.
[GitHub] spark pull request: [SPARK-4003] [SQL] add 3 types for java SQL co...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2850#issuecomment-59881463 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21968/consoleFull) for PR 2850 at commit [`bb0508f`](https://github.com/apache/spark/commit/bb0508f1382186c20ddb80b6032f3fce5c6cf6aa). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...
Github user zhzhan commented on the pull request: https://github.com/apache/spark/pull/2241#issuecomment-59881210 @scwf Did you also replace the query plan for hive0.13 in your other PR? I also saw some query plan changes in hive0.13.
[GitHub] spark pull request: [SPARK-3569][SQL] Add metadata field to Struct...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2701#issuecomment-59881165 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21967/consoleFull) for PR 2701 at commit [`611d3c2`](https://github.com/apache/spark/commit/611d3c20cf4aed9927b596d89b9ac96b2cbbcdec). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...
Github user zhzhan commented on the pull request: https://github.com/apache/spark/pull/2241#issuecomment-59880636 @marmbrus I think he refers to https://github.com/apache/spark/pull/2499
[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...
Github user zhzhan commented on the pull request: https://github.com/apache/spark/pull/2241#issuecomment-59880350 @scwf The golden answer is different in hive12 and hive13. We need some extra shim layer to handle that.
[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r19129853

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala ---
@@ -0,0 +1,157 @@
+package org.apache.spark.mllib.evaluation
+
+import scala.reflect.ClassTag
+
+import org.apache.spark.Logging
+import org.apache.spark.SparkContext._
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.rdd.RDD
+
+/**
+ * ::Experimental::
+ * Evaluator for ranking algorithms.
+ *
+ * @param predictionAndLabels an RDD of (predicted ranking, ground truth set) pairs.
+ */
+@Experimental
+class RankingMetrics[T: ClassTag](predictionAndLabels: RDD[(Array[T], Array[T])])
+  extends Logging with Serializable {
+
+  /**
+   * Compute the average precision of all the queries, truncated at ranking position k.
+   *
+   * If for a query, the ranking algorithm returns n (n < k) results, the precision value will be
+   * computed as #(relevant items retrived) / k. This formula also applies when the size of the
+   * ground truth set is less than k.
+   *
+   * If a query has an empty ground truth set, zero will be returned together with a log warning.
--- End diff --

`returned` -> `used as precision`. We don't `return` zero.
[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r19129857

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala ---
+  def precisionAt(k: Int): Double = {
+    require(k > 0, "ranking position k should be positive")
+    predictionAndLabels.map { case (pred, lab) =>
+      val labSet = lab.toSet
+      val n = math.min(pred.length, k)
+      var i = 0
+      var cnt = 0
+
+      while (i < n) {
+        if (labSet.contains(pred(i))) {
+          cnt += 1
+        }
+        i += 1
+      }
+      if (labSet.size == 0) {
--- End diff --

If `labSet` is empty, the `while` loop is wasted.

~~~
if (labelSet.nonEmpty) {
  val n = math.min(...)
  ...
  cnt.toDouble / k
} else {
  logWarning("...")
  0.0
}
~~~
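The restructuring the reviewer suggests (guard with `nonEmpty` so the counting loop is skipped for empty ground truth sets) looks like this on plain collections. This sketch averages a `Seq` instead of calling `.mean()` on an RDD, and drops the `logWarning` so it runs standalone; `precisionAt` here is a free function, not the class method.

```scala
// Precision@k averaged over queries, with the empty-ground-truth check hoisted
// so the counting work happens only when there is something to count.
def precisionAt[T](data: Seq[(Array[T], Array[T])], k: Int): Double = {
  require(k > 0, "ranking position k should be positive")
  val perQuery = data.map { case (pred, lab) =>
    val labSet = lab.toSet
    if (labSet.nonEmpty) {
      // Count relevant items among the top min(pred.length, k) predictions,
      // but always divide by k, matching the doc comment above.
      val n = math.min(pred.length, k)
      val hits = pred.take(n).count(labSet.contains)
      hits.toDouble / k
    } else {
      0.0 // empty ground truth contributes zero (Spark logs a warning here)
    }
  }
  perQuery.sum / perQuery.length
}
```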
[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r19129865

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala ---
+  /**
+   * Compute the average NDCG value of all the queries, truncated at ranking position k.
+   * The discounted cumulative gain at position k is computed as:
+   *   \sum_{i=1}^k (2^{relevance of ith item} - 1) / log(i + 1),
+   * and the NDCG is obtained by dividing the DCG value on the ground truth set. In the current
+   * implementation, the relevance value is binary.
+   *
+   * If for a query, the ranking algorithm returns n (n < k) results, the NDCG value at position n
+   * will be used. If the ground truth set contains n (n < k) results, the first n items will be
+   * used to compute the DCG value on the ground truth set.
+   *
+   * If a query has an empty ground truth set, zero will be returned together with a log warning.
--- End diff --

ditto:
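The NDCG@k described in the quoted doc comment (binary relevance, so each relevant hit at 1-based position i contributes (2^1 - 1)/log(i + 1) = 1/log(i + 1)) can be sketched for a single query as follows. `ndcgAt` is an illustrative standalone function, not the class's RDD-based method, and it applies the empty-ground-truth guard up front as the reviewer asks.

```scala
// NDCG truncated at k for one (predicted ranking, ground truth set) pair.
def ndcgAt[T](pred: Array[T], lab: Array[T], k: Int): Double = {
  require(k > 0, "ranking position k should be positive")
  val labSet = lab.toSet
  if (labSet.isEmpty) return 0.0 // empty ground truth: zero, nothing to scan

  // DCG over the top min(pred.length, k) predictions; 0-based index i sits at
  // 1-based rank i + 1, so its discount is log(i + 2).
  val n = math.min(pred.length, k)
  var dcg = 0.0
  var i = 0
  while (i < n) {
    if (labSet.contains(pred(i))) dcg += 1.0 / math.log(i + 2)
    i += 1
  }

  // Ideal DCG: the first min(|labSet|, k) positions are all relevant.
  val idealN = math.min(labSet.size, k)
  var idcg = 0.0
  var j = 0
  while (j < idealN) { idcg += 1.0 / math.log(j + 2); j += 1 }

  dcg / idcg
}
```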
[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r19129866

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala ---
+   * If a query has an empty ground truth set, zero will be returned together with a log warning.
+   *
+   * See the followi
[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r19129859

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala ---
+  lazy val meanAveragePrecision: Double = {
+    predictionAndLabels.map { case (pred, lab) =>
+      val labSet = lab.toSet
+      val labSetSize = labSet.size
+      var i = 0
+      var cnt = 0
+      var precSum = 0.0
+      val n = pred.length
+
+      while (i < n) {
+        if (labSet.contains(pred(i))) {
+          cnt += 1
+          precSum += cnt.toDouble / (i + 1)
+        }
+        i += 1
+      }
+      if (labSetSize == 0) {
--- End diff --

ditto (do not go through the while loop if labSet is empty)
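The same guard applied to the average-precision computation quoted above can be sketched per query like this; `averagePrecision` is a hypothetical standalone helper (Spark averages this quantity over an RDD to get MAP, and logs a warning instead of silently returning zero).

```scala
// Average precision for one query: at each relevant hit, take the running
// precision at that rank, then normalize by the ground truth set size.
def averagePrecision[T](pred: Array[T], lab: Array[T]): Double = {
  val labSet = lab.toSet
  if (labSet.isEmpty) return 0.0 // empty ground truth: skip the scan entirely

  var hits = 0
  var precSum = 0.0
  var i = 0
  while (i < pred.length) {
    if (labSet.contains(pred(i))) {
      hits += 1
      precSum += hits.toDouble / (i + 1) // precision at this hit's 1-based rank
    }
    i += 1
  }
  precSum / labSet.size
}
```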
[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r19129863 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala --- @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.mllib.evaluation + +import scala.reflect.ClassTag + +import org.apache.spark.Logging +import org.apache.spark.SparkContext._ +import org.apache.spark.annotation.Experimental +import org.apache.spark.rdd.RDD + +/** + * ::Experimental:: + * Evaluator for ranking algorithms. + * + * @param predictionAndLabels an RDD of (predicted ranking, ground truth set) pairs. + */ +@Experimental +class RankingMetrics[T: ClassTag](predictionAndLabels: RDD[(Array[T], Array[T])]) + extends Logging with Serializable { + + /** + * Compute the average precision of all the queries, truncated at ranking position k. + * + * If for a query, the ranking algorithm returns n (n < k) results, the precision value will be + * computed as #(relevant items retrived) / k. This formula also applies when the size of the + * ground truth set is less than k. 
+ * + * If a query has an empty ground truth set, zero will be returned together with a log warning. + * + * See the following paper for detail: + * + * IR evaluation methods for retrieving highly relevant documents. + *K. Jarvelin and J. Kekalainen + * + * @param k the position to compute the truncated precision, must be positive + * @return the average precision at the first k ranking positions + */ + def precisionAt(k: Int): Double = { +require (k > 0,"ranking position k should be positive") +predictionAndLabels.map { case (pred, lab) => + val labSet = lab.toSet + val n = math.min(pred.length, k) + var i = 0 + var cnt = 0 + + while (i < n) { +if (labSet.contains(pred(i))) { + cnt += 1 +} +i += 1 + } + if (labSet.size == 0) { +logWarning("Empty ground truth set, check input data") +0.0 + } else { +cnt.toDouble / k + } +}.mean + } + + /** + * Returns the mean average precision (MAP) of all the queries. + * If a query has an empty ground truth set, the average precision will be zero and a log + * warining is generated. + */ + lazy val meanAveragePrecision: Double = { +predictionAndLabels.map { case (pred, lab) => + val labSet = lab.toSet + val labSetSize = labSet.size + var i = 0 + var cnt = 0 + var precSum = 0.0 + val n = pred.length + + while (i < n) { +if (labSet.contains(pred(i))) { + cnt += 1 + precSum += cnt.toDouble / (i + 1) +} +i += 1 + } + if (labSetSize == 0) { +logWarning("Empty ground truth set, check input data") +0.0 + } else { +precSum / labSet.size + } +}.mean + } + + /** + * Compute the average NDCG value of all the queries, truncated at ranking position k. + * The discounted cumulative gain at position k is computed as: + *\sum_{i=1}^k (2^{relevance of ith item} - 1) / log(i + 1), + * and the NDCG is obtained by dividing the DCG value on the ground truth set. In the current + * implementation, the relevance value is binary. + * + * If for a query, the ranking algorithm returns n (n < k) results, the NDCG value at position n + * will be used. 
If the ground truth set contains n (n < k) results, the first n items will be + * used to compute the DCG value on the ground truth set. --- End diff -- This paragraph is not necessary because those cases are compatible with the definition of NDCG.
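For context, the binary-relevance NDCG@k described in the scaladoc under review can be sketched as follows. This is a hypothetical Python transcription, not the actual MLlib code; a base-2 logarithm is assumed for the discount:

```python
import math

def ndcg_at_k(pred, lab, k):
    """NDCG truncated at rank k with binary relevance: each relevant item
    at 0-based position i contributes 1 / log2(i + 2) to the DCG, which is
    then normalized by the ideal DCG of the ground truth set."""
    if k <= 0:
        raise ValueError("ranking position k should be positive")
    lab_set = set(lab)
    if not lab_set:
        return 0.0
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(pred[:k]) if item in lab_set)
    # Ideal ranking: the first min(k, |lab_set|) positions are all relevant.
    idcg = sum(1.0 / math.log2(j + 2) for j in range(min(k, len(lab_set))))
    return dcg / idcg
```

This also illustrates the reviewer's point: returning fewer than k results, or a ground truth set smaller than k, needs no special-casing, since both fall out of the DCG/IDCG definitions.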
[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r19129850 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala --- @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.mllib.evaluation + +import scala.reflect.ClassTag + +import org.apache.spark.Logging +import org.apache.spark.SparkContext._ +import org.apache.spark.annotation.Experimental +import org.apache.spark.rdd.RDD + +/** + * ::Experimental:: + * Evaluator for ranking algorithms. + * + * @param predictionAndLabels an RDD of (predicted ranking, ground truth set) pairs. + */ +@Experimental +class RankingMetrics[T: ClassTag](predictionAndLabels: RDD[(Array[T], Array[T])]) + extends Logging with Serializable { + + /** + * Compute the average precision of all the queries, truncated at ranking position k. + * + * If for a query, the ranking algorithm returns n (n < k) results, the precision value will be + * computed as #(relevant items retrived) / k. 
This formula also applies when the size of the --- End diff -- `retrived` -> `retrieved`
[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r19129869 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala --- @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.mllib.evaluation + +import scala.reflect.ClassTag + +import org.apache.spark.Logging +import org.apache.spark.SparkContext._ +import org.apache.spark.annotation.Experimental +import org.apache.spark.rdd.RDD + +/** + * ::Experimental:: + * Evaluator for ranking algorithms. + * + * @param predictionAndLabels an RDD of (predicted ranking, ground truth set) pairs. + */ +@Experimental +class RankingMetrics[T: ClassTag](predictionAndLabels: RDD[(Array[T], Array[T])]) + extends Logging with Serializable { + + /** + * Compute the average precision of all the queries, truncated at ranking position k. + * + * If for a query, the ranking algorithm returns n (n < k) results, the precision value will be + * computed as #(relevant items retrived) / k. This formula also applies when the size of the + * ground truth set is less than k. 
+ * + * If a query has an empty ground truth set, zero will be returned together with a log warning. + * + * See the following paper for detail: + * + * IR evaluation methods for retrieving highly relevant documents. + *K. Jarvelin and J. Kekalainen + * + * @param k the position to compute the truncated precision, must be positive + * @return the average precision at the first k ranking positions + */ + def precisionAt(k: Int): Double = { +require (k > 0,"ranking position k should be positive") +predictionAndLabels.map { case (pred, lab) => + val labSet = lab.toSet + val n = math.min(pred.length, k) + var i = 0 + var cnt = 0 + + while (i < n) { +if (labSet.contains(pred(i))) { + cnt += 1 +} +i += 1 + } + if (labSet.size == 0) { +logWarning("Empty ground truth set, check input data") +0.0 + } else { +cnt.toDouble / k + } +}.mean + } + + /** + * Returns the mean average precision (MAP) of all the queries. + * If a query has an empty ground truth set, the average precision will be zero and a log + * warining is generated. + */ + lazy val meanAveragePrecision: Double = { +predictionAndLabels.map { case (pred, lab) => + val labSet = lab.toSet + val labSetSize = labSet.size + var i = 0 + var cnt = 0 + var precSum = 0.0 + val n = pred.length + + while (i < n) { +if (labSet.contains(pred(i))) { + cnt += 1 + precSum += cnt.toDouble / (i + 1) +} +i += 1 + } + if (labSetSize == 0) { +logWarning("Empty ground truth set, check input data") +0.0 + } else { +precSum / labSet.size + } +}.mean + } + + /** + * Compute the average NDCG value of all the queries, truncated at ranking position k. + * The discounted cumulative gain at position k is computed as: + *\sum_{i=1}^k (2^{relevance of ith item} - 1) / log(i + 1), + * and the NDCG is obtained by dividing the DCG value on the ground truth set. In the current + * implementation, the relevance value is binary. + * + * If for a query, the ranking algorithm returns n (n < k) results, the NDCG value at position n + * will be used. 
If the ground truth set contains n (n < k) results, the first n items will be + * used to compute the DCG value on the ground truth set. + * + * If a query has an empty ground truth set, zero will be returned together with a log warning. + * + * See the followi
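The per-query average precision accumulated in the `meanAveragePrecision` loop quoted above can be sketched standalone. This is a hypothetical Python transcription (plain lists in place of an RDD): at each relevant hit, add hits-so-far divided by the 1-based position, then divide by the ground truth set size:

```python
def average_precision(pred, lab):
    """Average precision of a single query, mirroring the Scala loop above."""
    lab_set = set(lab)
    if not lab_set:
        # the real code logs a warning here and returns zero
        return 0.0
    hits = 0
    prec_sum = 0.0
    for i, item in enumerate(pred):
        if item in lab_set:
            hits += 1
            prec_sum += hits / (i + 1)
    return prec_sum / len(lab_set)
```

The mean of this value over all queries gives the MAP returned by `meanAveragePrecision`.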
[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r19129854 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala --- @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.mllib.evaluation + +import scala.reflect.ClassTag + +import org.apache.spark.Logging +import org.apache.spark.SparkContext._ +import org.apache.spark.annotation.Experimental +import org.apache.spark.rdd.RDD + +/** + * ::Experimental:: + * Evaluator for ranking algorithms. + * + * @param predictionAndLabels an RDD of (predicted ranking, ground truth set) pairs. + */ +@Experimental +class RankingMetrics[T: ClassTag](predictionAndLabels: RDD[(Array[T], Array[T])]) + extends Logging with Serializable { + + /** + * Compute the average precision of all the queries, truncated at ranking position k. + * + * If for a query, the ranking algorithm returns n (n < k) results, the precision value will be + * computed as #(relevant items retrived) / k. This formula also applies when the size of the + * ground truth set is less than k. 
+ * + * If a query has an empty ground truth set, zero will be returned together with a log warning. + * + * See the following paper for detail: + * + * IR evaluation methods for retrieving highly relevant documents. +K. Jarvelin and J. Kekalainen + * + * @param k the position to compute the truncated precision, must be positive + * @return the average precision at the first k ranking positions + */ + def precisionAt(k: Int): Double = { +require (k > 0,"ranking position k should be positive") --- End diff -- `require(k > 0, "ranking ...` (remove space before `(` and add space after `,`)
[GitHub] spark pull request: [SPARK-3161][MLLIB] Adding a node Id caching m...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2868#issuecomment-59880277 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21966/consoleFull) for PR 2868 at commit [`13585e8`](https://github.com/apache/spark/commit/13585e8738e35743c6c0ab482d34552f01939bd4). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/2241#issuecomment-59880136 @scwf, which PR?
[GitHub] spark pull request: [SPARK-3161][MLLIB] Adding a node Id caching m...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2868#issuecomment-59879975 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21965/consoleFull) for PR 2868 at commit [`6b05af0`](https://github.com/apache/spark/commit/6b05af042656b192e7b14954a433a75468df1d1c). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3161][MLLIB] Adding a node Id caching m...
Github user codedeft commented on the pull request: https://github.com/apache/spark/pull/2868#issuecomment-59879666 test this please
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59879704 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21964/consoleFull) for PR 2866 at commit [`c23897a`](https://github.com/apache/spark/commit/c23897aea7881eb819ec074073a4431ec8ba7eb5). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2241#issuecomment-59879567 We can reproduce the golden answers for Hive 0.13 as I did in my closed PR; how about that?
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59879396 ;retest
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59879420 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-59879138 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21963/consoleFull) for PR 2520 at commit [`c5b2a33`](https://github.com/apache/spark/commit/c5b2a3399d5c57ea0b5e0d15dabf7ee28d1ffaa5). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3161][MLLIB] Adding a node Id caching m...
Github user codedeft commented on the pull request: https://github.com/apache/spark/pull/2868#issuecomment-59878898 Seems like there are lots of "line too long" messages. Will address this.
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-59878761 retest this please.
[GitHub] spark pull request: [SPARK-4023] [MLlib] [PySpark] convert rdd int...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2870#issuecomment-59878295 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21961/consoleFull) for PR 2870 at commit [`0871576`](https://github.com/apache/spark/commit/087157620a85c14534ac76f44ff079df6151ea5b). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-59878293 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21962/consoleFull) for PR 2520 at commit [`c5b2a33`](https://github.com/apache/spark/commit/c5b2a3399d5c57ea0b5e0d15dabf7ee28d1ffaa5). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3888] [PySpark] limit the memory used b...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2743#issuecomment-59878151 @davies ah I see, thanks. This should have triggered the old one.
[GitHub] spark pull request: [SPARK-3888] [PySpark] limit the memory used b...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2743#issuecomment-59878121 @pwendell There are two PullRequestBuilder plugins: one works, but the other (called NewSparkPullRequestBuilder) is still failing.
[GitHub] spark pull request: [SPARK-3888] [PySpark] limit the memory used b...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2743#issuecomment-59878023 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21960/consoleFull) for PR 2743 at commit [`c10229e`](https://github.com/apache/spark/commit/c10229e8a4eaa6944ea7c432437cdfafdb702ef5). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-59877987 retest this please.
[GitHub] spark pull request: [SPARK-4023] [MLlib] [PySpark] convert rdd int...
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/2870 [SPARK-4023] [MLlib] [PySpark] convert rdd into RDD of Vector Convert the input rdd to RDD of Vector. cc @mengxr You can merge this pull request into a Git repository by running: $ git pull https://github.com/davies/spark fix4023 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2870.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2870 commit 087157620a85c14534ac76f44ff079df6151ea5b Author: Davies Liu Date: 2014-10-21T04:35:15Z convert rdd into RDD of Vector
[GitHub] spark pull request: [SPARK-3161][MLLIB] Adding a node Id caching m...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2868#issuecomment-59877810 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21959/consoleFull) for PR 2868 at commit [`9ea76df`](https://github.com/apache/spark/commit/9ea76df661a93b1ebdf5ce5a764c7549b2fcbfd0). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3161][MLLIB] Adding a node Id caching m...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2868#issuecomment-59877812 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21959/ Test FAILed.
[GitHub] spark pull request: [SPARK-3161][MLLIB] Adding a node Id caching m...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2868#issuecomment-59877748 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21959/consoleFull) for PR 2868 at commit [`9ea76df`](https://github.com/apache/spark/commit/9ea76df661a93b1ebdf5ce5a764c7549b2fcbfd0). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3888] [PySpark] limit the memory used b...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2743#issuecomment-59877740 @davies this should have been fixed, not sure what is going on.
[GitHub] spark pull request: [SPARK-3888] [PySpark] limit the memory used b...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2743#issuecomment-59877728 jenkins, test this please.
[GitHub] spark pull request: [SPARK-3161][MLLIB] Adding a node Id caching m...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2868#issuecomment-59877519 Jenkins, add to whitelist.
[GitHub] spark pull request: [SPARK-3161][MLLIB] Adding a node Id caching m...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2868#issuecomment-59877524 test this please
[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/2241#issuecomment-59877085 I think this is looking pretty good, but I'm not okay with merging it before the tests are passing for Hive 13. Let me take a look and see how hard that will be.
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-59876973 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21955/consoleFull)** for PR 2520 at commit [`c5b2a33`](https://github.com/apache/spark/commit/c5b2a3399d5c57ea0b5e0d15dabf7ee28d1ffaa5) after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-3677] [BUILD] [YARN] pom.xml and SparkB...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2520#issuecomment-59876974 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21955/
[GitHub] spark pull request: [SPARK-3888] [PySpark] limit the memory used b...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2743#issuecomment-59876465 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/409/consoleFull) for PR 2743 at commit [`c10229e`](https://github.com/apache/spark/commit/c10229e8a4eaa6944ea7c432437cdfafdb702ef5).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class JavaFutureActionWrapper[S, T](futureAction: FutureAction[S], converter: S => T)`
  * `case class ReconnectWorker(masterUrl: String) extends DeployMessage`
[GitHub] spark pull request: [SPARK-3888] [PySpark] limit the memory used b...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2743#issuecomment-59876269 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/409/consoleFull) for PR 2743 at commit [`c10229e`](https://github.com/apache/spark/commit/c10229e8a4eaa6944ea7c432437cdfafdb702ef5).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59876264 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21957/consoleFull) for PR 2866 at commit [`c23897a`](https://github.com/apache/spark/commit/c23897aea7881eb819ec074073a4431ec8ba7eb5).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4019] Fix MapStatus compression bug tha...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2866#issuecomment-59876271 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21957/
[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2696#issuecomment-59875888 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21956/
[GitHub] spark pull request: [SPARK-2321] [WIP] Stable pull-based progress ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2696#issuecomment-59875884 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21956/consoleFull) for PR 2696 at commit [`787444c`](https://github.com/apache/spark/commit/787444c4ee20693a8f8c4fb5320ee4c4133a0d91).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class SparkContext(config: SparkConf) extends SparkStatusAPI with Logging`
  * `class JobUIData(`
  * `public final class JavaStatusAPITest`
  * `public static final class IdentityWithDelay implements Function`
[GitHub] spark pull request: [SPARK-4012] call tryOrExit instead of logUnca...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2864#issuecomment-59875800 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21958/
[GitHub] spark pull request: [SPARK-4012] call tryOrExit instead of logUnca...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2864#issuecomment-59875799 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21958/consoleFull) for PR 2864 at commit [`3893a7e`](https://github.com/apache/spark/commit/3893a7e051674df70124b09c386c13afdc5ab3d8).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class JavaFutureActionWrapper[S, T](futureAction: FutureAction[S], converter: S => T)`
  * `case class ReconnectWorker(masterUrl: String) extends DeployMessage`
[GitHub] spark pull request: [SPARK-4016] Allow user to show/hide UI metric...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2867#issuecomment-59872654 @kayousterhout This might integrate nicely with my #2852, which introduces some new abstractions to simplify the web UI's table rendering code. With my framework, I think you could generate the ids used to show/hide columns automatically, rather than needing a class that holds a bunch of hard-coded strings.
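The idea in the comment above — deriving each column's show/hide id from the column definition itself, instead of maintaining a class full of hard-coded id strings — can be sketched roughly as follows. This is an illustrative sketch, not the actual #2852 API; `UIColumn` and `toggleId` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: each table column knows its display name, and the DOM
// id used to show/hide it is derived from that name, so no separate class of
// id-string constants is needed.
final class UIColumn {
    final String name;

    UIColumn(String name) { this.name = name; }

    // Derive a stable, CSS-friendly id: lowercase the name and collapse any
    // run of non-alphanumeric characters into a single hyphen.
    String toggleId() {
        return "col-" + name.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", "-");
    }
}

public class ToggleIdDemo {
    public static void main(String[] args) {
        List<UIColumn> columns = Arrays.asList(
                new UIColumn("Task Deserialization Time"),
                new UIColumn("GC Time"));
        for (UIColumn c : columns) {
            System.out.println(c.toggleId());
        }
        // Prints:
        // col-task-deserialization-time
        // col-gc-time
    }
}
```

The benefit of deriving ids this way is that adding a new column to the table definition automatically yields a matching toggle id, so the rendering code and the show/hide checkboxes cannot drift apart.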
[GitHub] spark pull request: [SPARK-4016] Allow user to show/hide UI metric...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2867#issuecomment-59872492 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21952/
[GitHub] spark pull request: [SPARK-4016] Allow user to show/hide UI metric...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2867#issuecomment-59872487 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21952/consoleFull) for PR 2867 at commit [`e989560`](https://github.com/apache/spark/commit/e989560562b473624159e4e3554ec9898884a247).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4012] call tryOrExit instead of logUnca...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2864#issuecomment-59871965 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21958/consoleFull) for PR 2864 at commit [`3893a7e`](https://github.com/apache/spark/commit/3893a7e051674df70124b09c386c13afdc5ab3d8).
* This patch merges cleanly.