[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3079
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3079#issuecomment-67382250 Do please reopen, though, once you have something that is passing tests :)
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3079#issuecomment-67382177 hi @erikerlandson, thanks for working on this. It would be great to have a solution to this long running problem. Since it looks like there is still some work to be done, I propose we close this issue for now. We are on a renewed mission to keep the PR queue small and limited to active PRs so that things don't fall through the cracks.
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/3079#issuecomment-63881800 For reference, this other issue has some overlap: https://issues.apache.org/jira/browse/SPARK-4514
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/3079#discussion_r20062337

--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -113,8 +117,12 @@ class RangePartitioner[K : Ordering : ClassTag, V](
   private var ordering = implicitly[Ordering[K]]

   // An array of upper bounds for the first (partitions - 1) partitions
-  private var rangeBounds: Array[K] = {
-    if (partitions <= 1) {
+  @volatile private var valRB: Array[K] = null
--- End diff --

`valRB` is a kinda confusing name. I think the convention would be to name it `_rangeBounds`. E.g. https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/FutureAction.scala#L111
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3079#issuecomment-61719969

[Test build #22892 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22892/consoleFull) for PR 3079 at commit [`2183325`](https://github.com/apache/spark/commit/2183325884c69878184fcf55d368339379269f35).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3079#issuecomment-61719975 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22892/ Test FAILed.
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3079#issuecomment-61704937

[Test build #22892 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22892/consoleFull) for PR 3079 at commit [`2183325`](https://github.com/apache/spark/commit/2183325884c69878184fcf55d368339379269f35).

* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3079#issuecomment-61675446

[Test build #22880 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22880/consoleFull) for PR 3079 at commit [`0fc30fe`](https://github.com/apache/spark/commit/0fc30fe411056090e4e18c96beeb3049ee750b24).

* This patch **fails RAT tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3079#issuecomment-61675448 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22880/ Test FAILed.
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3079#issuecomment-61675397

[Test build #22880 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22880/consoleFull) for PR 3079 at commit [`0fc30fe`](https://github.com/apache/spark/commit/0fc30fe411056090e4e18c96beeb3049ee750b24).

* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3079#issuecomment-61565392

[Test build #22828 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22828/consoleFull) for PR 3079 at commit [`019ac27`](https://github.com/apache/spark/commit/019ac27689132cdd0a7858259b168b9c91d3ba7a).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3079#issuecomment-61565401 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22828/ Test FAILed.
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3079#issuecomment-61556754 @erikerlandson I think you also need -Phive for the tests to run. It is possible some other things changed (or even that that test case changed with the upgrade to hive 13). Perhaps you can include a dummy change to the Hive code to make sure those tests are run in this PR so we can see what Jenkins thinks?
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3079#issuecomment-61556278

[Test build #22828 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22828/consoleFull) for PR 3079 at commit [`019ac27`](https://github.com/apache/spark/commit/019ac27689132cdd0a7858259b168b9c91d3ba7a).

* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/3079#issuecomment-61556289 @marmbrus, @scwf, FWIW, the `correlationoptimizer14` test appears to be working for me. I ran it using: `env _RUN_SQL_TESTS=true _SQL_TESTS_ONLY=true ./dev/run-tests > ~/run-tests-1021.txt 2>&1` Not sure why, but running `sbt -Dspark.hive.whitelist=correlationoptimizer14 hive/test` was not causing the test to run in my environment.
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/3079#issuecomment-61555496 Reboot of #1689
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
GitHub user erikerlandson opened a pull request: https://github.com/apache/spark/pull/3079

[SPARK-1021] Defer the data-driven computation of partition bounds in sortByKey() until evaluation.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/erikerlandson/spark spark-1021-pr

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3079.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3079

commit 019ac27689132cdd0a7858259b168b9c91d3ba7a
Author: Erik Erlandson
Date: 2014-07-30T22:59:27Z

    [SPARK-1021] Defer the data-driven computation of partition bounds in sortByKey() until evaluation.
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1689#issuecomment-61508261 @marmbrus, FWIW, the `correlationoptimizer14` test appears to be working for me. I ran it using: `env _RUN_SQL_TESTS=true _SQL_TESTS_ONLY=true ./dev/run-tests > ~/run-tests-1021.txt 2>&1` Not sure why, but running `sbt -Dspark.hive.whitelist=correlationoptimizer14 hive/test` was not causing the test to run in my environment.
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1689#issuecomment-57110142 @rxin @marmbrus I will check it out
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1689#issuecomment-57108427 I reverted this commit. @erikerlandson mind taking a look at this problem?
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/1689#issuecomment-57106886 Since this PR was merged, the correlationoptimizer14 test has been hanging. We might want to consider rolling back. You can reproduce the problem as follows: `sbt -Dspark.hive.whitelist=correlationoptimizer14 hive/test`
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1689#issuecomment-57043930 Have either of you thought about how to coordinate this with Josh's work on SPARK-3626? https://github.com/apache/spark/pull/2482
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1689#issuecomment-57043862

BTW, one thing that would be great to add is a test that makes sure we don't block the main DAG scheduler thread. The reason I think we don't block is that we call rdd.partitions.length in submitJob:

```scala
/**
 * Submit a job to the job scheduler and get a JobWaiter object back. The JobWaiter object
 * can be used to block until the job finishes executing or can be used to cancel the job.
 */
def submitJob[T, U](
    rdd: RDD[T],
    func: (TaskContext, Iterator[T]) => U,
    partitions: Seq[Int],
    callSite: CallSite,
    allowLocal: Boolean,
    resultHandler: (Int, U) => Unit,
    properties: Properties = null): JobWaiter[U] = {
  // Check to make sure we are not launching a task on a partition that does not exist.
  val maxPartitions = rdd.partitions.length
```
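As an illustration of the kind of regression test being suggested here, the sketch below is an assumption for this write-up rather than anything in the PR: it submits an action on a sorted RDD and a trivial job concurrently, and checks that the trivial job still completes promptly, i.e. that the deferred bound computation is not holding up the scheduler. The helper name `schedulerNotBlocked` and the local `SparkContext` named `sc` are hypothetical.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._ // pair-RDD implicits for sortByKey in Spark 1.x

// Hypothetical test sketch: if the data-driven bound computation blocked the DAG
// scheduler event loop, the "quick" job below could not finish while the sort runs.
def schedulerNotBlocked(sc: SparkContext): Boolean = {
  val sorted = sc.parallelize(1 to 100000).map(x => (x, x)).sortByKey()
  val slow  = Future { sorted.count() }                  // triggers sampling + shuffle
  val quick = Future { sc.parallelize(1 to 10).count() } // independent, trivial job
  Await.result(quick, 30.seconds) == 10L && Await.result(slow, 2.minutes) == 100000L
}
```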
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1689
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1689#issuecomment-57043822 @erikerlandson I'm going to merge this first. Maybe we can do the cleanup later.
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1689#discussion_r18122214

--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -113,8 +113,12 @@ class RangePartitioner[K : Ordering : ClassTag, V](
   private var ordering = implicitly[Ordering[K]]

   // An array of upper bounds for the first (partitions - 1) partitions
-  private var rangeBounds: Array[K] = {
-    if (partitions <= 1) {
+  @volatile private var valRB: Array[K] = null
--- End diff --

Maybe we can rename valRB to _rangeBounds and use this directly in getPartitions?
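A minimal, self-contained sketch of the naming and access pattern being proposed — assumed here for illustration, not the actual Partitioner code: a `@volatile` backing field `_rangeBounds` plus an accessor that computes the bounds at most once, so hot paths such as `getPartition` can read the cached field directly.

```scala
// Sketch only: `computeBounds` stands in for the data-driven sampling job.
class LazyRangeBounds[K](computeBounds: () => Array[K]) extends Serializable {
  @volatile private var _rangeBounds: Array[K] = null

  def rangeBounds: Array[K] = {
    if (_rangeBounds == null) this.synchronized {
      if (_rangeBounds == null) {
        // Double-checked locking on a volatile field: compute at most once,
        // even if several threads ask for the bounds concurrently.
        _rangeBounds = computeBounds()
      }
    }
    _rangeBounds
  }
}
```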
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1689#discussion_r18122212

--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -113,8 +113,12 @@ class RangePartitioner[K : Ordering : ClassTag, V](
   private var ordering = implicitly[Ordering[K]]

   // An array of upper bounds for the first (partitions - 1) partitions
-  private var rangeBounds: Array[K] = {
-    if (partitions <= 1) {
+  @volatile private var valRB: Array[K] = null
--- End diff --

this is going to be called once for every record on workers actually.
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1689#discussion_r18122197

--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -222,7 +228,8 @@ class RangePartitioner[K : Ordering : ClassTag, V](
   }

   @throws(classOf[IOException])
-  private def readObject(in: ObjectInputStream) {
+  private def readObject(in: ObjectInputStream): Unit = this.synchronized {
+    if (valRB != null) return
--- End diff --

that's not possible
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1689#issuecomment-57043705 Actually I looked at it again. I don't think it would block the scheduler because we compute partitions outside the scheduler thread. This approach looks good to me!
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1689#issuecomment-55805772 So far the best idea I have for (2) is to set some kind of time-out on the evaluation. The bound computation uses subsampling that will (when all goes well) cap the computation at constant time. If the timeout triggers, some sub-optimal fallback for partitioning might be used. Or just fail the entire evaluation.
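A hedged sketch of this time-out idea, using nothing beyond standard Scala futures; the helper name `boundsWithTimeout` and the `computeBounds` stand-in are illustrative assumptions, not part of the patch: run the bound computation in a `Future` and fall back to a trivial partitioning if it does not finish in time.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global
import java.util.concurrent.TimeoutException

// Illustrative helper: `computeBounds` stands in for the sampling-based computation.
def boundsWithTimeout[K](computeBounds: () => Array[K], limit: Duration): Array[K] = {
  val f = Future { computeBounds() }
  try {
    Await.result(f, limit)
  } catch {
    case _: TimeoutException =>
      // Sub-optimal fallback mentioned above: no bounds, i.e. everything in one partition.
      Array.empty[K]
  }
}
```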
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1689#issuecomment-55797086 Yea, I don't think we need to fully solve 3 here. My main concern with this set of changes is 2, since a single badly behaved RDD can potentially block the (unfortunately single-threaded) scheduler forever. Let me think about this a little bit and get back to you. If you have an idea about how to fix that, feel free to suggest it.
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1689#issuecomment-55628401 Or, maybe just look into playing the same game with the cogrouped RDDs that I did with sortByKey. Don't get into invoking `defaultPartitioner` until somebody asks for the output RDD's partitioner, etc.
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1689#issuecomment-55627362 Hi @rxin,

1) SimpleFutureAction is still referred to in the submitJob method, but that doesn't appear to be invoked anywhere. I was reluctant to get rid of it, as it's all experimental, and I could envision use cases for it.

2) I see your point. I don't currently have any clever ideas to avoid that scenario when it happens.

3) Very interesting -- so this scenario is triggered because `defaultPartitioner` starts examining input RDD partitioners, which sets off the job when it trips over the data-driven partitioning computation from `sortByKey`. My impression is that this whack-a-mole with non-laziness stems from a combination of (a) data-dependent partitioners and (b) methods that refer to input partitioners as part of the construction of new RDDs. It *might* be possible to thread some design changes around so that references to partitioning are consistently encapsulated in a Future. Functions such as `defaultPartitioner` would then also have to return a Future, etc. Or, even more generally, somehow encapsulate *all* RDD initialization in a Future, with the idea that these futures would finally unwind when some action was invoked. However, that seems (imo) outside the scope of this particular JIRA/PR. Maybe we could start another umbrella JIRA to track possible solutions along these lines.

Another orthogonal thought -- you can short-circuit all this by providing a partitioner instead of forcing it to be computed from data. That's not as sexy, or as widely applicable, as some deeper fix to the problem, but users can do it now as a workaround when it's feasible; see the sketch below.
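To make that workaround concrete, here is an assumed example (not taken from the PR): passing an explicit partitioner to `join` means `defaultPartitioner` is never consulted, so constructing the joined RDD does not have to inspect the sorted RDD's data-driven partitioner. The partition count of 8 and the helper name are arbitrary choices for illustration.

```scala
import org.apache.spark.{HashPartitioner, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits in Spark 1.x

// Assumes a live SparkContext named sc.
def joinWithExplicitPartitioner(sc: SparkContext) = {
  val sorted = sc.parallelize(1 to 1000).map(x => (x, x)).sortByKey()
  val small  = sc.parallelize(1 to 10).map(x => (x, x))
  // join(other, partitioner) bypasses defaultPartitioner entirely.
  sorted.join(small, new HashPartitioner(8))
}
```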
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1689#issuecomment-55464403

[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20236/consoleFull) for PR 1689 at commit [`50b6da6`](https://github.com/apache/spark/commit/50b6da6234188a147508654b08e6b67cbf01fbec).

* This patch **passes** unit tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1689#issuecomment-55458226 @erikerlandson thanks for looking at this. A few questions:

1. After this pull request, does anything still use SimpleFutureAction?

2. If I understand this correctly, this could potentially block the single-threaded scheduler from doing anything else while waiting for the rangeBounds to be computed. Any comment on this?

3. This is still not always lazy, right? See a test case:

```scala
sc.parallelize(1 to 1000).map(x => (x, x)).sortByKey().join(sc.parallelize(1 to 10).map(x => (x, x)))
```
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1689#issuecomment-55457077

[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20236/consoleFull) for PR 1689 at commit [`50b6da6`](https://github.com/apache/spark/commit/50b6da6234188a147508654b08e6b67cbf01fbec).

* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1689#issuecomment-55456438 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1689#issuecomment-54694535 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1689#issuecomment-52400243

[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18675/consoleFull) for PR 1689 at commit [`f3448e4`](https://github.com/apache/spark/commit/f3448e47b671570cc95c99f809b68b9382c5cd1b).

* This patch **passes** unit tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1689#issuecomment-52397817

[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18675/consoleFull) for PR 1689 at commit [`f3448e4`](https://github.com/apache/spark/commit/f3448e47b671570cc95c99f809b68b9382c5cd1b).

* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1689#issuecomment-52342401

[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18615/consoleFull) for PR 1689 at commit [`09f0637`](https://github.com/apache/spark/commit/09f0637ac5ff986701d76c874b6567313022a0ab).

* This patch **passes** unit tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/1689#issuecomment-52339006 Excellent! I'll try to find some time to review this soon.
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user erikerlandson commented on the pull request: https://github.com/apache/spark/pull/1689#issuecomment-52336202 Latest push updates the RangePartitioner sampling job to be async, and updates the async action functions so that they will properly enclose the sampling job induced by calling `partitions`.
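As a rough illustration of what an asynchronous sampling pass can look like from the public API — an assumption for illustration only, not the code in this push — one of Spark's existing async actions can launch a sampling job without blocking the calling thread and return a `FutureAction`.

```scala
import org.apache.spark.{FutureAction, SparkContext}
import org.apache.spark.SparkContext._ // brings in the async-action implicits in Spark 1.x

// Illustrative only: launch a sampling job asynchronously; the caller gets a
// FutureAction back and can register a callback instead of blocking on the result.
def asyncSampleCount(sc: SparkContext): FutureAction[Long] = {
  val keys = sc.parallelize(1 to 100000)
  keys.sample(withReplacement = false, fraction = 0.01).countAsync()
}
```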
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1689#issuecomment-52336221

[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18615/consoleFull) for PR 1689 at commit [`09f0637`](https://github.com/apache/spark/commit/09f0637ac5ff986701d76c874b6567313022a0ab).

* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/1689#discussion_r15931660

--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -222,7 +228,8 @@ class RangePartitioner[K : Ordering : ClassTag, V](
   }

   @throws(classOf[IOException])
-  private def readObject(in: ObjectInputStream) {
+  private def readObject(in: ObjectInputStream): Unit = this.synchronized {
+    if (valRB != null) return
--- End diff --

I was also assuming readObject might be called from multiple threads. Can that happen?
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user erikerlandson commented on a diff in the pull request: https://github.com/apache/spark/pull/1689#discussion_r15931609

--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -113,8 +113,12 @@ class RangePartitioner[K : Ordering : ClassTag, V](
   private var ordering = implicitly[Ordering[K]]

   // An array of upper bounds for the first (partitions - 1) partitions
-  private var rangeBounds: Array[K] = {
-    if (partitions <= 1) {
+  @volatile private var valRB: Array[K] = null
--- End diff --

My understanding of getPartitions was that it executes once, and is therefore "allowed to be expensive". Also, isn't rangeBounds generally only returning a reference to the array? (except for the first time, where it's computed)
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1689#discussion_r15920203

--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -113,8 +113,12 @@ class RangePartitioner[K : Ordering : ClassTag, V](
   private var ordering = implicitly[Ordering[K]]

   // An array of upper bounds for the first (partitions - 1) partitions
-  private var rangeBounds: Array[K] = {
-    if (partitions <= 1) {
+  @volatile private var valRB: Array[K] = null
--- End diff --

It wouldn't surprise me if this performance figure varied with different combinations of hardware and Java version; but for at least one such combination, volatile reads are roughly 2-3x as costly as non-volatile reads as long as they are uncontended -- much more expensive when there are concurrent writes to contend with. http://brooker.co.za/blog/2012/09/10/volatile.html
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1689#discussion_r15919599

--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -113,8 +113,12 @@ class RangePartitioner[K : Ordering : ClassTag, V](
   private var ordering = implicitly[Ordering[K]]

   // An array of upper bounds for the first (partitions - 1) partitions
-  private var rangeBounds: Array[K] = {
-    if (partitions <= 1) {
+  @volatile private var valRB: Array[K] = null
--- End diff --

Any idea on volatile's impact on read performance? rangeBounds is read multiple times in getPartition.
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1689#discussion_r15919352

--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -222,7 +228,8 @@ class RangePartitioner[K : Ordering : ClassTag, V](
   }

   @throws(classOf[IOException])
-  private def readObject(in: ObjectInputStream) {
+  private def readObject(in: ObjectInputStream): Unit = this.synchronized {
+    if (valRB != null) return
--- End diff --

Do we not want to deserialize valRB if it is not null? Are you worried rangeBounds might be called while the deserialization is happening?
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1689#issuecomment-51424177

QA results for PR 1689:

- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18089/consoleFull
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1689#issuecomment-51421389 QA tests have started for PR 1689. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18089/consoleFull
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1689#discussion_r15900503

--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -113,8 +113,13 @@ class RangePartitioner[K : Ordering : ClassTag, V](
   private var ordering = implicitly[Ordering[K]]

   // An array of upper bounds for the first (partitions - 1) partitions
-  private var rangeBounds: Array[K] = {
-    if (partitions <= 1) {
+  private var valRB: Array[K] = Array()
--- End diff --

Can we perhaps make this thread safe?
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1689#issuecomment-50829158

QA results for PR 1689:

- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17611/consoleFull
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1689#issuecomment-50824621 QA tests have started for PR 1689. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17611/consoleFull
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/1689#issuecomment-50824343 Jenkins, this is ok to test.
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
GitHub user erikerlandson opened a pull request: https://github.com/apache/spark/pull/1689

[SPARK-1021] Defer the data-driven computation of partition bounds in sortByKey() until evaluation.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/erikerlandson/spark spark-1021-pr

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1689.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1689

commit fa9bbca42d423b3cc9b10063073f7ab7507b1922
Author: Erik Erlandson
Date: 2014-07-30T22:59:27Z

    [SPARK-1021] Defer the data-driven computation of partition bounds in sortByKey() until evaluation.
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1689#issuecomment-50765803 Can one of the admins verify this patch?