[GitHub] spark issue #14558: [SPARK-16508][SparkR] Fix warnings on undocumented/dupli...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14558 **[Test build #63423 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63423/consoleFull)** for PR 14558 at commit [`82e2f09`](https://github.com/apache/spark/commit/82e2f09517e9f3d726af0046d251748f892f59c8). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14554: [SPARK-16964][SQL] Remove private[sql] and privat...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/14554#discussion_r74007849

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlanInfo.scala

```diff
@@ -47,7 +47,7 @@ class SparkPlanInfo(
   }
 }

-private[sql] object SparkPlanInfo {
+private[execution] object SparkPlanInfo {
```

If it is something internal to the implementation (i.e. not part of some public field for expressions, commands, or query plans), I'm keeping it private to tighten visibility.
[GitHub] spark pull request #14558: [SPARK-16508][SparkR] Fix warnings on undocumente...
GitHub user junyangq opened a pull request: https://github.com/apache/spark/pull/14558 [SPARK-16508][SparkR] Fix warnings on undocumented/duplicated arguments by CRAN-check

## What changes were proposed in this pull request? This PR tries to fix all the "undocumented/duplicated arguments" warnings reported by the CRAN check. Most have been resolved; the few that remain will be handled soon.

## How was this patch tested? R unit tests and the check-cran.sh script.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/junyangq/spark SPARK-16508-branch-2.0 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14558.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14558

commit 82e2f09517e9f3d726af0046d251748f892f59c8 Author: Junyang Qian Date: 2016-08-09T04:52:34Z Fix part of undocumented/duplicated arguments warnings by CRAN-check
[GitHub] spark issue #14557: [SPARK-16709][CORE] Kill the running task if stage faile...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14557 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63421/ Test FAILed.
[GitHub] spark issue #14557: [SPARK-16709][CORE] Kill the running task if stage faile...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14557 **[Test build #63421 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63421/consoleFull)** for PR 14557 at commit [`1a1ea2f`](https://github.com/apache/spark/commit/1a1ea2f598e7db4ab1b856b420dca36b796c2a1c). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14557: [SPARK-16709][CORE] Kill the running task if stage faile...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14557 Merged build finished. Test FAILed.
[GitHub] spark issue #14557: [SPARK-16709][CORE] Kill the running task if stage faile...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14557 **[Test build #63421 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63421/consoleFull)** for PR 14557 at commit [`1a1ea2f`](https://github.com/apache/spark/commit/1a1ea2f598e7db4ab1b856b420dca36b796c2a1c).
[GitHub] spark issue #14517: [SPARK-16931][PYTHON] PySpark APIS for bucketBy and sort...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14517 **[Test build #63422 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63422/consoleFull)** for PR 14517 at commit [`31c43e6`](https://github.com/apache/spark/commit/31c43e6f3d9544478142990b4968fb105d8a03d4).
[GitHub] spark pull request #14557: [SPARK-16709][CORE] Kill the running task if stag...
GitHub user shenh062326 opened a pull request: https://github.com/apache/spark/pull/14557 [SPARK-16709][CORE] Kill the running task if stage failed

## What changes were proposed in this pull request? As described in SPARK-16709, when a stage fails while one of its tasks is still running, the retried stage reruns that task. This can cause a TaskCommitDeniedException, and the task then retries forever. Here is the log:

```
16/07/28 05:22:15 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 175, 10.215.146.81, partition 1,PROCESS_LOCAL, 1930 bytes)
16/07/28 05:28:35 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.1 (TID 207, 10.196.147.232, partition 1,PROCESS_LOCAL, 1930 bytes)
16/07/28 05:28:48 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 175) in 393261 ms on 10.215.146.81 (3/50)
16/07/28 05:34:11 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 1.1 (TID 207, 10.196.147.232): TaskCommitDenied (Driver denied task commit) for job: 1, partition: 1, attemptNumber: 207
```

1. Task 1.0 in stage 1.0 starts.
2. Stage 1.0 fails; stage 1.1 starts.
3. Task 1.0 in stage 1.1 starts.
4. Task 1.0 in stage 1.0 finishes.
5. Task 1.0 in stage 1.1 fails with a TaskCommitDenied exception, then retries forever.

## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

You can merge this pull request into a Git repository by running: $ git pull https://github.com/shenh062326/spark SPARK-16709 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14557.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14557

commit 1a1ea2f598e7db4ab1b856b420dca36b796c2a1c Author: hongshen Date: 2016-08-09T06:44:14Z Kill the running task if stage failed.
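The retry-forever behavior in step 5 can be illustrated with a toy model. The sketch below is an assumption for illustration, not Spark's actual OutputCommitCoordinator: it only captures the idea that the first task attempt to ask for a partition wins the commit right, so once the leftover stage-1.0 task commits, every rerun from stage 1.1 is denied.

```scala
// Toy model (hypothetical, not Spark's real OutputCommitCoordinator):
// the first task attempt that asks to commit a partition wins the commit
// right; every other attempt is denied from then on.
import scala.collection.mutable

class ToyCommitCoordinator {
  // partition -> (stageAttempt, taskAttempt) that holds the commit right
  private val winners = mutable.Map.empty[Int, (Int, Int)]

  def canCommit(partition: Int, stageAttempt: Int, taskAttempt: Int): Boolean =
    synchronized {
      winners.get(partition) match {
        case None =>
          winners(partition) = (stageAttempt, taskAttempt)
          true
        case Some(winner) =>
          winner == (stageAttempt, taskAttempt)
      }
    }
}

val coord = new ToyCommitCoordinator
// Step 4: the leftover task from stage 1.0 finishes first and takes the commit right.
val oldAttemptCommits = coord.canCommit(partition = 1, stageAttempt = 0, taskAttempt = 175)
// Step 5: the stage 1.1 rerun is denied, and so is every subsequent retry.
val retryCommits = coord.canCommit(partition = 1, stageAttempt = 1, taskAttempt = 207)
```

Killing the still-running stage-1.0 task when the stage fails, as this PR proposes, prevents the stale attempt from ever claiming the commit right.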
[GitHub] spark issue #14556: [SPARK-16966][Core] Make App Name to the valid name inst...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14556 Can one of the admins verify this patch?
[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/14555#discussion_r74006226

Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala

```diff
@@ -560,11 +554,25 @@ class SparseVector @Since("2.0.0") (
     @Since("2.0.0") val indices: Array[Int],
     @Since("2.0.0") val values: Array[Double]) extends Vector {

-  require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
-    s" indices match the dimension of the values. You provided ${indices.length} indices and " +
-    s" ${values.length} values.")
-  require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
-    s"which exceeds the specified vector size ${size}.")
+  validate()
```

Sure, but why bother writing a method? It's invoked once, directly above. This is just constructor code.
[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...
Github user zjffdu commented on a diff in the pull request: https://github.com/apache/spark/pull/14555#discussion_r74006057

Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala

```diff
@@ -560,11 +554,25 @@ class SparseVector @Since("2.0.0") (
     @Since("2.0.0") val indices: Array[Int],
     @Since("2.0.0") val values: Array[Double]) extends Vector {

-  require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
-    s" indices match the dimension of the values. You provided ${indices.length} indices and " +
-    s" ${values.length} values.")
-  require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
-    s"which exceeds the specified vector size ${size}.")
+  validate()
```

What do you mean? This method is called when the SparseVector is created, so it can refer to any variable in SparseVector.
[GitHub] spark pull request #14556: [SPARK-16966][Core] Make App Name to the valid na...
GitHub user Sherry302 opened a pull request: https://github.com/apache/spark/pull/14556 [SPARK-16966][Core] Make App Name to the valid name instead of a rand…

## What changes were proposed in this pull request? In SparkSession, before setting "spark.app.name" to "java.util.UUID.randomUUID().toString", sparkConf.contains("spark.app.name") should be checked instead of options.contains("spark.app.name").

## How was this patch tested? Manual. E.g.: ./bin/spark-submit --name myApplicationTest --verbose --executor-cores 3 --num-executors 1 --master yarn --deploy-mode client --class org.apache.spark.examples.SparkKMeans examples/target/original-spark-examples_2.11-2.1.0-SNAPSHOT.jar The application "org.apache.spark.examples.SparkKMeans" above did not invoke ".appName()". Before this commit, the App Name in the history server UI was a random UUID, 70c06dc5-1b99-4b4a-a826-ea27497e977b. With this commit, the App Name is the valid name "myApplicationTest".

…omUUID when 'spark.app.name' exists

You can merge this pull request into a Git repository by running: $ git pull https://github.com/Sherry302/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14556.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14556

commit a21937be7de24a353a3e8c9bbe7471b31a1f4719 Author: Weiqing Yang Date: 2016-08-09T06:42:39Z [SPARK-16966][Core] Make App Name to the valid name instead of a randomUUID when 'spark.app.name' exists
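The proposed lookup order can be sketched as follows. This is a hypothetical simplification, not SparkSession's actual code: `resolveAppName` and the plain `Map` parameters are stand-ins for the builder options and SparkConf, just to show that a name already present in the conf (e.g. set by spark-submit --name) should win over a freshly generated random UUID.

```scala
// Hypothetical simplification of the fix: fall back to a random UUID only
// when neither the builder options nor the SparkConf carry spark.app.name.
import java.util.UUID

def resolveAppName(options: Map[String, String], sparkConf: Map[String, String]): String =
  options.getOrElse("spark.app.name",
    sparkConf.getOrElse("spark.app.name", UUID.randomUUID().toString))

// --name myApplicationTest populates the conf, so no UUID is generated:
val name = resolveAppName(Map.empty, Map("spark.app.name" -> "myApplicationTest"))
```

Checking only `options`, as the code did before this change, skips the conf entry and always falls through to the UUID for applications that never call `.appName()`.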
[GitHub] spark pull request #14551: [SPARK-16961][CORE] Fixed off-by-one error that b...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/14551#discussion_r74005725

Diff: core/src/test/scala/org/apache/spark/util/UtilsSuite.scala

```diff
@@ -874,4 +874,38 @@ class UtilsSuite extends SparkFunSuite with ResetSystemProperties with Logging {
     }
   }
 }
+
+  test("chi square test of randomizeInPlace") {
+    // Parameters
+    val arraySize = 10
+    val numTrials = 1000
+    val threshold = 0.05
+    val seed = 1L
+
+    // results[i][j]: how many times Utils.randomize moves an element from position j to position i
+    val results: Array[Array[Long]] = Array.ofDim(arraySize, arraySize)
+
+    // This must be seeded because even a fair random process will fail this test with
+    // probability equal to the value of `threshold`, which is inconvenient for a unit test.
+    val rand = new java.util.Random(seed)
+    val range = 0 until arraySize
+
+    for {
+      _ <- 0 until numTrials
+      trial = Utils.randomizeInPlace(range.toArray, rand)
+      i <- range
+    } results(i)(trial(i)) += 1L
+
+    val chi = new org.apache.commons.math3.stat.inference.ChiSquareTest()
+
+    // We expect an even distribution; this array will be rescaled by `chiSquareTest`
+    val expected: Array[Double] = Array.fill(arraySize * arraySize)(1.0)
+    val observed: Array[Long] = results.flatMap(x => x)
```

`flatten`?
[GitHub] spark pull request #14551: [SPARK-16961][CORE] Fixed off-by-one error that b...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/14551#discussion_r74005704

Diff: core/src/test/scala/org/apache/spark/util/UtilsSuite.scala

```diff
+  test("chi square test of randomizeInPlace") {
+    // Parameters
+    val arraySize = 10
+    val numTrials = 1000
+    val threshold = 0.05
+    val seed = 1L
+
+    // results[i][j]: how many times Utils.randomize moves an element from position j to position i
+    val results: Array[Array[Long]] = Array.ofDim(arraySize, arraySize)
+
+    // This must be seeded because even a fair random process will fail this test with
+    // probability equal to the value of `threshold`, which is inconvenient for a unit test.
+    val rand = new java.util.Random(seed)
+    val range = 0 until arraySize
+
+    for {
+      _ <- 0 until numTrials
+      trial = Utils.randomizeInPlace(range.toArray, rand)
+      i <- range
+    } results(i)(trial(i)) += 1L
+
+    val chi = new org.apache.commons.math3.stat.inference.ChiSquareTest()
```

import; remove types below
[GitHub] spark pull request #14551: [SPARK-16961][CORE] Fixed off-by-one error that b...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/14551#discussion_r74005614

Diff: core/src/test/scala/org/apache/spark/util/UtilsSuite.scala

```diff
+  test("chi square test of randomizeInPlace") {
+    // Parameters
+    val arraySize = 10
+    val numTrials = 1000
+    val threshold = 0.05
+    val seed = 1L
+
+    // results[i][j]: how many times Utils.randomize moves an element from position j to position i
+    val results: Array[Array[Long]] = Array.ofDim(arraySize, arraySize)
```

Some minor style things -- just omit the type here
[GitHub] spark pull request #14551: [SPARK-16961][CORE] Fixed off-by-one error that b...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/14551#discussion_r74005685

Diff: core/src/test/scala/org/apache/spark/util/UtilsSuite.scala

```diff
+    val rand = new java.util.Random(seed)
+    val range = 0 until arraySize
+
+    for {
+      _ <- 0 until numTrials
+      trial = Utils.randomizeInPlace(range.toArray, rand)
```

I think this ends up being a little hard to grok. Just do two nested loops
[GitHub] spark issue #14554: [SPARK-16964][SQL] Remove private[sql] and private[spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14554 Merged build finished. Test PASSed.
[GitHub] spark pull request #14551: [SPARK-16961][CORE] Fixed off-by-one error that b...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/14551#discussion_r74005625

Diff: core/src/test/scala/org/apache/spark/util/UtilsSuite.scala

```diff
+    // This must be seeded because even a fair random process will fail this test with
+    // probability equal to the value of `threshold`, which is inconvenient for a unit test.
+    val rand = new java.util.Random(seed)
```

import java.util.Random
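Taken together, the review comments on this test (import instead of fully qualified names, drop the redundant explicit types, use `flatten`, replace the for-comprehension with two nested loops) might produce something like the sketch below. To keep it self-contained, `randomizeInPlace` is a local Fisher-Yates stand-in for Spark's `Utils.randomizeInPlace`, and the chi-square statistic is computed by hand rather than with commons-math3's `ChiSquareTest`; both substitutions are assumptions for illustration.

```scala
import java.util.Random

// Local stand-in for Utils.randomizeInPlace: a seeded Fisher-Yates shuffle.
def randomizeInPlace[T](arr: Array[T], rand: Random): Array[T] = {
  for (i <- (arr.length - 1) to 1 by -1) {
    val j = rand.nextInt(i + 1)
    val tmp = arr(j); arr(j) = arr(i); arr(i) = tmp
  }
  arr
}

val arraySize = 10
val numTrials = 1000
val rand = new Random(1L)

// results(i)(j): how often the shuffle moves the element at position j to position i
val results = Array.ofDim[Long](arraySize, arraySize)
for (_ <- 0 until numTrials) {
  val trial = randomizeInPlace((0 until arraySize).toArray, rand)
  for (i <- 0 until arraySize) results(i)(trial(i)) += 1
}

// Under a fair shuffle every cell has the same expected count, so the
// chi-square statistic is sum((observed - expected)^2 / expected).
val expectedCount = numTrials.toDouble / arraySize
val chiSquared = results.flatten
  .map(obs => (obs - expectedCount) * (obs - expectedCount) / expectedCount)
  .sum
```

In the real test the statistic (or its p-value) would be compared against the `threshold` parameter; the point of the sketch is the reshaped loop structure, not the exact acceptance criterion.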
[GitHub] spark issue #14554: [SPARK-16964][SQL] Remove private[sql] and private[spark...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14554 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63415/ Test PASSed.
[GitHub] spark issue #14554: [SPARK-16964][SQL] Remove private[sql] and private[spark...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14554 **[Test build #63415 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63415/consoleFull)** for PR 14554 at commit [`d9ba88f`](https://github.com/apache/spark/commit/d9ba88f465898e78a79db8390213dc831f841ae2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/14555#discussion_r74005289

Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala

```diff
-  require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
-    s" indices match the dimension of the values. You provided ${indices.length} indices and " +
-    s" ${values.length} values.")
-  require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
-    s"which exceeds the specified vector size ${size}.")
+  validate()
+
+  private def validate(): Unit = {
+    require(size >= 0, "The size of the requested sparse vector must be greater than 0.")
+    require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
+      s" indices match the dimension of the values. You provided ${indices.length} indices and " +
+      s" ${values.length} values.")
+    require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
+      s"which exceeds the specified vector size ${size}.")
+
+    var prev = -1
+    indices.foreach { i =>
+      require(i >= 0, s"Found negative indice: $i.")
+      require(prev < i, s"Found duplicate indices: $i.")
+      prev = i
+    }
+    require(prev < size, s"You may not write an element to index $prev because the declared " +
```

If you're doing it this way, then just check whether the first index is >= 0; since the indices are increasing, if it is, all of them are. This message could also be a little more straightforward: "found index that exceeds size" or something.
[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/14555#discussion_r74005178

Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala

```diff
+    var prev = -1
+    indices.foreach { i =>
+      require(i >= 0, s"Found negative indice: $i.")
```

index, not indice
[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/14555#discussion_r74005184

Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala

```diff
-  require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
-    s" indices match the dimension of the values. You provided ${indices.length} indices and " +
-    s" ${values.length} values.")
-  require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
-    s"which exceeds the specified vector size ${size}.")
+  validate()
```

What's the value in this method?
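Folding the review feedback together (plain constructor code instead of a `validate()` method, "index" rather than "indice", only the first index checked for negativity, plainer messages), the checks might look like the sketch below. This `SparseVector` is a stripped-down stand-in for `org.apache.spark.ml.linalg.SparseVector`, written only to exercise the validation logic, not the real class.

```scala
// Stand-in class to illustrate the validation as plain constructor code.
class SparseVector(val size: Int, val indices: Array[Int], val values: Array[Double]) {
  require(size >= 0, "The size of the requested sparse vector must be non-negative.")
  require(indices.length == values.length,
    s"Sparse vectors require one value per index. You provided ${indices.length} indices " +
    s"and ${values.length} values.")
  require(indices.length <= size,
    s"You provided ${indices.length} indices and values, which exceeds the vector size $size.")

  // Indices must be non-negative and strictly increasing; because they are
  // increasing, checking the first element covers the negativity check.
  if (indices.nonEmpty) require(indices(0) >= 0, s"Found negative index: ${indices(0)}.")
  private var prev = -1
  indices.foreach { i =>
    require(prev < i, s"Found duplicate index: $i.")
    prev = i
  }
  require(prev < size, s"Found index $prev that exceeds the declared vector size $size.")
}

// A valid vector constructs; an index beyond the declared size is rejected.
val ok = new SparseVector(3, Array(0, 2), Array(1.0, 2.0))
val rejected =
  try { new SparseVector(2, Array(0, 2), Array(1.0, 2.0)); false }
  catch { case _: IllegalArgumentException => true }
```
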
[GitHub] spark issue #13701: [SPARK-15639][SPARK-16321][SQL] Push down filter at RowG...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13701 Merged build finished. Test PASSed.
[GitHub] spark issue #13701: [SPARK-15639][SPARK-16321][SQL] Push down filter at RowG...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13701 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63412/
[GitHub] spark issue #13701: [SPARK-15639][SPARK-16321][SQL] Push down filter at RowG...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13701 **[Test build #63412 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63412/consoleFull)** for PR 13701 at commit [`cee74b7`](https://github.com/apache/spark/commit/cee74b7b9cba73a91d9120add0cfe8e3226f19a6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #14531: [SPARK-16943] [SPARK-16942] [SQL] Fix multiple bu...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14531#discussion_r74004679

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -80,13 +83,49 @@ case class CreateTableLikeCommand(
         s"Source table in CREATE TABLE LIKE cannot be temporary: '$sourceTable'")
     }

-    val tableToCreate = catalog.getTableMetadata(sourceTable).copy(
-      identifier = targetTable,
-      tableType = CatalogTableType.MANAGED,
-      createTime = System.currentTimeMillis,
-      lastAccessTime = -1).withNewStorage(locationUri = None)
+    val sourceTableDesc = catalog.getTableMetadata(sourceTable)

-    catalog.createTable(tableToCreate, ifNotExists)
+    sourceTableDesc.tableType match {
+      case CatalogTableType.MANAGED | CatalogTableType.EXTERNAL | CatalogTableType.VIEW => // OK
+      case o => throw new AnalysisException(
+        s"CREATE TABLE LIKE is not allowed when the source table is ${o.name}")
+    }
+
+    if (DDLUtils.isDatasourceTable(sourceTableDesc) &&
+        sourceTableDesc.tableType != CatalogTableType.MANAGED) {
+      throw new AnalysisException(
+        "CREATE TABLE LIKE is not allowed when the source table is an external table created " +
+          "using the datasource API")
+    }
+
+    // For EXTERNAL_TABLE, the table properties has a particular field. To change it
+    // to a MANAGED_TABLE, we need to remove it; Otherwise, it will be EXTERNAL_TABLE,
+    // even if we set the tableType to MANAGED
+    // (metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L1095-L1105)
+    // Table comment is stored as a table property. To clean it, we also should remove them.
+    val newTableProp =
+      sourceTableDesc.properties.filterKeys(key => key != "EXTERNAL" && key != "comment")
+    val newSerdeProp =
+      if (DDLUtils.isDatasourceTable(sourceTableDesc)) {
+        val newPath = catalog.defaultTablePath(targetTable)
+        sourceTableDesc.storage.properties ++ Map("path" -> newPath)
+      } else {
+        sourceTableDesc.storage.properties
+      }
+    val newTableDesc =
+      sourceTableDesc.copy(
--- End diff --

I think it's clearer to list, in the class doc, all the fields that need to be retained.
[GitHub] spark pull request #14531: [SPARK-16943] [SPARK-16942] [SQL] Fix multiple bu...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14531#discussion_r74004636

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -80,13 +83,49 @@ case class CreateTableLikeCommand(
         s"Source table in CREATE TABLE LIKE cannot be temporary: '$sourceTable'")
     }

-    val tableToCreate = catalog.getTableMetadata(sourceTable).copy(
-      identifier = targetTable,
-      tableType = CatalogTableType.MANAGED,
-      createTime = System.currentTimeMillis,
-      lastAccessTime = -1).withNewStorage(locationUri = None)
+    val sourceTableDesc = catalog.getTableMetadata(sourceTable)

-    catalog.createTable(tableToCreate, ifNotExists)
+    sourceTableDesc.tableType match {
+      case CatalogTableType.MANAGED | CatalogTableType.EXTERNAL | CatalogTableType.VIEW => // OK
+      case o => throw new AnalysisException(
+        s"CREATE TABLE LIKE is not allowed when the source table is ${o.name}")
+    }
+
+    if (DDLUtils.isDatasourceTable(sourceTableDesc) &&
+        sourceTableDesc.tableType != CatalogTableType.MANAGED) {
+      throw new AnalysisException(
+        "CREATE TABLE LIKE is not allowed when the source table is an external table created " +
+          "using the datasource API")
+    }
+
+    // For EXTERNAL_TABLE, the table properties has a particular field. To change it
+    // to a MANAGED_TABLE, we need to remove it; Otherwise, it will be EXTERNAL_TABLE,
+    // even if we set the tableType to MANAGED
+    // (metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L1095-L1105)
+    // Table comment is stored as a table property. To clean it, we also should remove them.
+    val newTableProp =
+      sourceTableDesc.properties.filterKeys(key => key != "EXTERNAL" && key != "comment")
+    val newSerdeProp =
+      if (DDLUtils.isDatasourceTable(sourceTableDesc)) {
+        val newPath = catalog.defaultTablePath(targetTable)
+        sourceTableDesc.storage.properties ++ Map("path" -> newPath)
+      } else {
+        sourceTableDesc.storage.properties
+      }
+    val newTableDesc =
+      sourceTableDesc.copy(
--- End diff --

Which fields in `storage` should we retain?
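The `filterKeys` call under discussion drops the metastore-specific `EXTERNAL` flag and the `comment` property so that the copied table becomes a plain managed table without inheriting the source's comment. A minimal sketch of that filtering logic, with plain Python dicts standing in for the Scala `Map` operations (the function name is illustrative, and for brevity it folds the storage-path override into the same map rather than keeping a separate serde-properties map as the real code does):

```python
def copy_table_properties(source_props, new_path=None):
    """Drop keys that must not be inherited by CREATE TABLE LIKE.

    Mirrors the filterKeys(...) step in the diff: 'EXTERNAL' and 'comment'
    are removed; optionally a new 'path' replaces the source table's path.
    """
    props = {k: v for k, v in source_props.items()
             if k not in ("EXTERNAL", "comment")}
    if new_path is not None:
        props["path"] = new_path
    return props

result = copy_table_properties(
    {"EXTERNAL": "TRUE", "comment": "src table", "owner": "alice"},
    new_path="/warehouse/target")
# -> {'owner': 'alice', 'path': '/warehouse/target'}
```

The point cloud-fan raises still applies to a sketch like this: it is much easier to review a whitelist of fields that *are* retained than a blacklist of fields that are dropped.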
[GitHub] spark issue #14531: [SPARK-16943] [SPARK-16942] [SQL] Fix multiple bugs in C...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14531 Merged build finished. Test PASSed.
[GitHub] spark issue #14531: [SPARK-16943] [SPARK-16942] [SQL] Fix multiple bugs in C...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14531 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63413/
[GitHub] spark issue #14531: [SPARK-16943] [SPARK-16942] [SQL] Fix multiple bugs in C...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14531 **[Test build #63413 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63413/consoleFull)** for PR 14531 at commit [`b820be8`](https://github.com/apache/spark/commit/b820be831ecdecb3261bf9eb1171ac8545748aa3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #14537: [SPARK-16948][SQL] Querying empty partitioned orc tables...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14537 @rajeshbalamohan, the changes to `HiveMetastoreCatalog.scala` look reasonable. This mirrors the behavior of this method before the `if (fileType.equals("parquet"))` expression was introduced in 1e886159849e3918445d3fdc3c4cef86c6c1a236. @tejasapatil, can you help review this PR? I ask because you're the author of 1e886159849e3918445d3fdc3c4cef86c6c1a236, which is where the code in question in `HiveMetastoreCatalog.scala` was written.
[GitHub] spark issue #13701: [SPARK-15639][SPARK-16321][SQL] Push down filter at RowG...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13701 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63410/
[GitHub] spark issue #13701: [SPARK-15639][SPARK-16321][SQL] Push down filter at RowG...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13701 Merged build finished. Test PASSed.
[GitHub] spark issue #13701: [SPARK-15639][SPARK-16321][SQL] Push down filter at RowG...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13701 **[Test build #63410 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63410/consoleFull)** for PR 13701 at commit [`0b38ba1`](https://github.com/apache/spark/commit/0b38ba18bb51f4e6ee9dabe00c377601ae32777e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #14537: [SPARK-16948][SQL] Querying empty partitioned orc...
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14537#discussion_r74003788

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -294,7 +294,9 @@ private[hive] class HiveMetastoreCatalog(sparkSession: SparkSession) extends Log
         ParquetFileFormat.mergeMetastoreParquetSchema(metastoreSchema, inferred)
       }.getOrElse(metastoreSchema)
     } else {
-      defaultSource.inferSchema(sparkSession, options, fileCatalog.allFiles()).get
+      val inferredSchema =
--- End diff --

There's some code duplicated in both branches of this `if` expression. Can you refactor it to remove the duplication, please?
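The duplication mallman points out follows a common shape: both branches of an `if` perform the same inference step and differ only in post-processing, so the shared call can be hoisted out. A generic sketch of that refactor in illustrative Python (not the actual `HiveMetastoreCatalog` code; `infer`, `merge`, and `default` are stand-ins for schema inference, metastore-schema merging, and the fallback schema):

```python
def infer_schema_before(is_parquet, infer, merge, default):
    # Before: the inference call appears in both branches.
    if is_parquet:
        inferred = infer()
        return merge(inferred) if inferred is not None else default
    else:
        inferred = infer()
        return inferred if inferred is not None else default

def infer_schema_after(is_parquet, infer, merge, default):
    # After: the shared call is hoisted out; only the branch-specific
    # post-processing (merging with the metastore schema) stays conditional.
    inferred = infer()
    if inferred is None:
        return default
    return merge(inferred) if is_parquet else inferred
```

Both versions compute the same result; the second makes it obvious that only the merge step depends on the file format.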
[GitHub] spark issue #14520: [SPARK-16934][ML][MLLib] Improve LogisticCostFun to avoi...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14520 @sethah Thanks for your careful review! This PR already passes bcFeaturesStd and bcCoeffs as constructor args to the `LogisticAggregator`, like your PR #14109. Do you mean adding two more members to `LogisticAggregator`, like `@transient lazy val featureStd = bcFeatureStd.value` and `@transient lazy val coeffs = bcCoeff.value`? As for explicitly destroying the broadcasts, I will add that soon! Thanks.
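The `@transient lazy val` members under discussion exist so that the aggregator serializes only the small broadcast handle, and each deserialized copy re-derives the large value locally on first access instead of shipping it with every task. A rough Python analogue of that idea (the `FakeBroadcast` class and all names here are illustrative stand-ins, not PySpark's API):

```python
class FakeBroadcast:
    """Stand-in for a broadcast variable: a small handle to a large value."""
    def __init__(self, value):
        self._value = value

    @property
    def value(self):
        return self._value


class Aggregator:
    """Holds only the broadcast handle; the payload is fetched lazily and
    cached per instance - the analogue of a `@transient lazy val` member."""
    def __init__(self, bc_coeffs):
        self._bc_coeffs = bc_coeffs
        self._coeffs = None  # transient cache, rebuilt after deserialization

    @property
    def coeffs(self):
        if self._coeffs is None:
            self._coeffs = self._bc_coeffs.value  # one lookup, then cached
        return self._coeffs

    def add(self, x):
        # Uses the cached value instead of hitting the broadcast on every call.
        return sum(c * xi for c, xi in zip(self.coeffs, x))
```

The explicit `destroy` WeichenXu mentions is the complementary cleanup step: once the job no longer needs the broadcast, releasing it frees memory on the driver and executors.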
[GitHub] spark issue #14546: [SPARK-16955][SQL] Using ordinals in ORDER BY and GROUP ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14546 Merged build finished. Test PASSed.
[GitHub] spark issue #14546: [SPARK-16955][SQL] Using ordinals in ORDER BY and GROUP ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14546 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63408/
[GitHub] spark issue #14546: [SPARK-16955][SQL] Using ordinals in ORDER BY and GROUP ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14546 **[Test build #63408 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63408/consoleFull)** for PR 14546 at commit [`7c3d732`](https://github.com/apache/spark/commit/7c3d732e1d84708faecc314b2548803dbbebe84f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.