[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20499 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20499 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87196/ Test PASSed.
[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20499 **[Test build #87196 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87196/testReport)** for PR 20499 at commit [`9f49b05`](https://github.com/apache/spark/commit/9f49b05de312495c84fccec93c82d0af8205eff3). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class _NoValueType(object):`
[GitHub] spark issue #20527: [SPARK-23348][SQL] append data using saveAsTable should ...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/20527 LGTM, only some nits.
[GitHub] spark pull request #20527: [SPARK-23348][SQL] append data using saveAsTable ...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/20527#discussion_r166853858 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -132,6 +134,32 @@ class InMemoryCatalogedDDLSuite extends DDLSuite with SharedSQLContext with Befo checkAnswer(spark.table("t"), Row(Row("a", 1)) :: Nil) } } + + // TODO: This test is copied from HiveDDLSuite, unify it later. + test("SPARK-23348: append data to data source table with saveAsTable") { --- End diff -- Do we also want to cover the following case: ``` 2) Target tables have column metadata ```?
[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20499 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/702/ Test PASSed.
[GitHub] spark issue #20534: [SPARK-23319][TESTS][BRANCH-2.3] Explicitly specify Pand...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20534 Merged to branch-2.3.
[GitHub] spark pull request #20534: [SPARK-23319][TESTS][BRANCH-2.3] Explicitly speci...
Github user HyukjinKwon closed the pull request at: https://github.com/apache/spark/pull/20534
[GitHub] spark issue #20525: [SPARK-23271][SQL] Parquet output contains only _SUCCESS ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20525 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/700/ Test PASSed.
[GitHub] spark issue #20525: [SPARK-23271][SQL] Parquet output contains only _SUCCESS ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20525 Merged build finished. Test PASSed.
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20382 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/701/ Test PASSed.
[GitHub] spark issue #20509: [SPARK-23268][SQL][followup] Reorganize packages in data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20509 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87188/ Test PASSed.
[GitHub] spark pull request #20527: [SPARK-23348][SQL] append data using saveAsTable ...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/20527#discussion_r166852743 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala --- @@ -346,37 +349,11 @@ case class PreprocessTableInsertion(conf: SQLConf) extends Rule[LogicalPlan] wit """.stripMargin) } - castAndRenameChildOutput(insert.copy(partition = normalizedPartSpec), expectedColumns) + insert.copy(query = newQuery, partition = normalizedPartSpec) --- End diff -- nit: don't need to copy the `newQuery` if it is the same as `query`.
[GitHub] spark pull request #20532: [SPARK-23353][CORE] Allow ExecutorMetricsUpdate e...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/20532#discussion_r166852617 --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala --- @@ -53,10 +53,21 @@ package object config { .booleanConf .createWithDefault(false) - private[spark] val EVENT_LOG_BLOCK_UPDATES = -ConfigBuilder("spark.eventLog.logBlockUpdates.enabled") - .booleanConf - .createWithDefault(false) + private[spark] val EVENT_LOG_BLOCK_UPDATES_FRACTION = +ConfigBuilder("spark.eventLog.logBlockUpdates.fraction") + .doc("Expected number of times each blockUpdated event is chosen to log, " + +"fraction must be [0, 1]. 0 by default, means disabled") + .doubleConf + .checkValue(_ >= 0, "The fraction must not be negative") --- End diff -- >how about control the max number of events recorded per time split? I think that approach still makes it hard to balance the user's requirements against event log size: Spark may drop the events the user needs at a specific time. IMO, a plain "true"/"false" switch might be a feasible solution - either dump all the events or ignore them. For normal users the default (false) should be enough, but if you want further analysis you can enable it, accepting the risk of a large event file. For the configuration, I think we could use something like "spark.eventLog.logVerboseEvent.enabled" to control dumping of all the verbose events.
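The boolean alternative proposed above reduces to a simple gate in the event-logging path. A hedged Python sketch, assuming the config name `spark.eventLog.logVerboseEvent.enabled` taken from the comment; the function name and dict-based config are illustrative, not Spark's actual implementation:

```python
# Event types the comment treats as "verbose" (high-volume, per-heartbeat).
VERBOSE_EVENTS = {
    "SparkListenerBlockUpdated",
    "SparkListenerExecutorMetricsUpdate",
}

def should_log(event_type, conf):
    """Decide whether an event reaches the event log.

    Verbose events are dropped unless the user opted in; everything
    else is always logged.
    """
    if event_type in VERBOSE_EVENTS:
        # Default False: normal users keep a small event log.
        return conf.get("spark.eventLog.logVerboseEvent.enabled", False)
    return True
```

The design trade-off matches the comment: one flag, all-or-nothing, instead of a per-event fraction that may drop exactly the events a user needed.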
[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20499 Merged build finished. Test PASSed.
[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20499 **[Test build #87201 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87201/testReport)** for PR 20499 at commit [`885b4d0`](https://github.com/apache/spark/commit/885b4d00af53dfd0148c431fdacce9a2789f32a2).
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20382 **[Test build #87202 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87202/testReport)** for PR 20382 at commit [`874c91c`](https://github.com/apache/spark/commit/874c91c41942972cabb85be175f929fc62e74af7).
[GitHub] spark issue #20525: [SPARK-23271][SQL] Parquet output contains only _SUCCESS ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20525 **[Test build #87200 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87200/testReport)** for PR 20525 at commit [`cb73001`](https://github.com/apache/spark/commit/cb730014ee1d951b481cc6af65c21fa37d94bcb4).
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20382 Merged build finished. Test PASSed.
[GitHub] spark issue #20509: [SPARK-23268][SQL][followup] Reorganize packages in data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20509 Merged build finished. Test PASSed.
[GitHub] spark issue #20509: [SPARK-23268][SQL][followup] Reorganize packages in data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20509 **[Test build #87188 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87188/testReport)** for PR 20509 at commit [`ec64554`](https://github.com/apache/spark/commit/ec645544fc940bcb58d9afec876245dc31a95166). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20382 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/699/ Test PASSed.
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20382 Merged build finished. Test PASSed.
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20382 **[Test build #87199 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87199/testReport)** for PR 20382 at commit [`fdc9b9c`](https://github.com/apache/spark/commit/fdc9b9c8a1dcc749be97cfd1c46a502c33bf4bb9).
[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20499 LGTM, waiting for more feedback.
[GitHub] spark pull request #20525: [SPARK-23271][SQL] Parquet output contains only _S...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/20525#discussion_r166849342 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala --- @@ -19,6 +19,7 @@ package org.apache.spark.sql.execution.datasources import org.apache.spark.sql.{QueryTest, Row} import org.apache.spark.sql.test.SharedSQLContext +import org.apache.spark.sql.types.{StringType, StructField, StructType} --- End diff -- will remove
[GitHub] spark pull request #20525: [SPARK-23271][SQL] Parquet output contains only _S...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/20525#discussion_r166849297 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala --- @@ -190,9 +190,18 @@ object FileFormatWriter extends Logging { global = false, child = plan).execute() } - val ret = new Array[WriteTaskResult](rdd.partitions.length) + + // SPARK-23271 If we are attempting to write a zero partition rdd, create a dummy single + // partition rdd to make sure we at least set up one write task to write the metadata. + val finalRdd = if (rdd.partitions.length == 0) { --- End diff -- Sure.
[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20499 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/698/ Test PASSed.
[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20499 **[Test build #87198 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87198/testReport)** for PR 20499 at commit [`a349d07`](https://github.com/apache/spark/commit/a349d078a78efe0763b90b932c8d0b26f6aa4c86).
[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20499 Merged build finished. Test PASSed.
[GitHub] spark issue #20525: [SPARK-23271][SQL] Parquet output contains only _SUCCESS ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20525 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/697/ Test PASSed.
[GitHub] spark pull request #20516: [SPARK-23343][CORE][TEST] Increase the exception ...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/20516#discussion_r166849307 --- Diff: core/src/test/scala/org/apache/spark/network/netty/NettyBlockTransferServiceSuite.scala --- @@ -77,16 +79,53 @@ class NettyBlockTransferServiceSuite verifyServicePort(expectedPort = service0.port + 1, actualPort = service1.port) } + test("can bind to two max specific ports") { +service0 = createService(port = 65535) +service1 = createService(port = 65535) +verifyServicePort(expectedPort = 65535, actualPort = service0.port) +// see `Utils.userPort` the user port to try when trying to bind a service, +// the max privileged port is 1024. +verifyServicePort(expectedPort = 1024, actualPort = service1.port) + } + + test("can't bind to a privileged port") { +intercept[IllegalArgumentException] { + service0 = createService(port = 23) +} + } + + test("turn off spark.port.maxRetries, bind repeat port is fail") { +val port = 17634 + Random.nextInt(1) +logInfo("random port for test: " + port) +service0 = createService(port) + +// `service0.port` is occupied, bind repeat port throw BindException. +intercept[BindException] { + val conf = new SparkConf() +.set("spark.app.id", s"test-${getClass.getName}") +.set("spark.testing", "true") +.set("spark.port.maxRetries", "0") + + val securityManager = new SecurityManager(conf) + val blockDataManager = mock(classOf[BlockDataManager]) + val service = new NettyBlockTransferService(conf, securityManager, "localhost", "localhost", +service0.port, 1) + service.init(blockDataManager) +} + } + private def verifyServicePort(expectedPort: Int, actualPort: Int): Unit = { actualPort should be >= expectedPort // avoid testing equality in case of simultaneous tests +// `spark.testing` is true, // the default value for `spark.port.maxRetries` is 100 under test actualPort should be <= (expectedPort + 100) } private def createService(port: Int): NettyBlockTransferService = { val conf = new SparkConf() .set("spark.app.id", s"test-${getClass.getName}") + .set("spark.testing", "true") --- End diff -- +1
[GitHub] spark issue #20534: [SPARK-23319][TESTS][BRANCH-2.3] Explicitly specify Pand...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20534 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87185/ Test PASSed.
[GitHub] spark issue #20525: [SPARK-23271][SQL] Parquet output contains only _SUCCESS ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20525 Merged build finished. Test PASSed.
[GitHub] spark pull request #20525: [SPARK-23271][SQL] Parquet output contains only _S...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20525#discussion_r166849190 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala --- @@ -190,9 +190,18 @@ object FileFormatWriter extends Logging { global = false, child = plan).execute() } - val ret = new Array[WriteTaskResult](rdd.partitions.length) + + // SPARK-23271 If we are attempting to write a zero partition rdd, create a dummy single + // partition rdd to make sure we at least set up one write task to write the metadata. + val finalRdd = if (rdd.partitions.length == 0) { --- End diff -- how about `val rddWithNonEmptyPartitions ...`
[GitHub] spark issue #20534: [SPARK-23319][TESTS][BRANCH-2.3] Explicitly specify Pand...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20534 Merged build finished. Test PASSed.
[GitHub] spark issue #20532: [SPARK-23353][CORE] Allow ExecutorMetricsUpdate events t...
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/20532 @jerryshao and @jiangxb1987 , thanks for your advice. In 2.1.x, the two update events (BlockUpdated & ExecutorMetricsUpdate) are both silent (never logged). In 2.2.x, only the BlockUpdated event has a configuration to enable logging. So the fair way is just to add an enable configuration for logging ExecutorMetrics for further use. But actually, we are refactoring the heartbeat to report more executor information to the driver, and that information will be logged to the event log for analysis. The update events are so numerous, and not sequential, that we don't need all of them for analysis. For example, we report heap usage after GC for all executors; 10% of the events is enough to get the result we want. So just adding an "enabled" configuration for ExecutorMetricsUpdate is too simplistic for a big production cluster. BTW, the work I mentioned above will be contributed back to the community soon as well.
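The "10% of events" idea above amounts to independently keeping each event with a configured probability, matching the `spark.eventLog.logBlockUpdates.fraction` proposal in the diff under review. A small Python sketch of that sampling (function name and fixed seed are illustrative, not from the PR):

```python
import random

def sample_events(events, fraction, rng=random.Random(42)):
    """Keep each event independently with probability `fraction`.

    fraction = 0.0 disables logging entirely; 1.0 keeps everything,
    mirroring the proposed config's "must be [0, 1], 0 means disabled".
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    return [e for e in events if rng.random() < fraction]
```

Because each event is an independent Bernoulli trial, the expected share kept equals the fraction, but which particular events survive is random: that is exactly the downside jerryshao raises above.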
[GitHub] spark pull request #20525: [SPARK-23271][SQL] Parquet output contains only _S...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20525#discussion_r166849056 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala --- @@ -19,6 +19,7 @@ package org.apache.spark.sql.execution.datasources import org.apache.spark.sql.{QueryTest, Row} import org.apache.spark.sql.test.SharedSQLContext +import org.apache.spark.sql.types.{StringType, StructField, StructType} --- End diff -- unnecessary change.
[GitHub] spark issue #20534: [SPARK-23319][TESTS][BRANCH-2.3] Explicitly specify Pand...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20534 **[Test build #87185 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87185/testReport)** for PR 20534 at commit [`c110e34`](https://github.com/apache/spark/commit/c110e34f0137a031bcff602d6855b0b28febe9ab). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20516: [SPARK-23343][CORE][TEST] Increase the exception ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20516#discussion_r166848789 --- Diff: core/src/test/scala/org/apache/spark/network/netty/NettyBlockTransferServiceSuite.scala --- @@ -77,16 +79,53 @@ class NettyBlockTransferServiceSuite verifyServicePort(expectedPort = service0.port + 1, actualPort = service1.port) } + test("can bind to two max specific ports") { +service0 = createService(port = 65535) +service1 = createService(port = 65535) +verifyServicePort(expectedPort = 65535, actualPort = service0.port) +// see `Utils.userPort` the user port to try when trying to bind a service, +// the max privileged port is 1024. +verifyServicePort(expectedPort = 1024, actualPort = service1.port) + } + + test("can't bind to a privileged port") { +intercept[IllegalArgumentException] { + service0 = createService(port = 23) +} + } + + test("turn off spark.port.maxRetries, bind repeat port is fail") { +val port = 17634 + Random.nextInt(1) +logInfo("random port for test: " + port) +service0 = createService(port) + +// `service0.port` is occupied, bind repeat port throw BindException. +intercept[BindException] { + val conf = new SparkConf() +.set("spark.app.id", s"test-${getClass.getName}") +.set("spark.testing", "true") +.set("spark.port.maxRetries", "0") + + val securityManager = new SecurityManager(conf) + val blockDataManager = mock(classOf[BlockDataManager]) + val service = new NettyBlockTransferService(conf, securityManager, "localhost", "localhost", +service0.port, 1) + service.init(blockDataManager) +} + } + private def verifyServicePort(expectedPort: Int, actualPort: Int): Unit = { actualPort should be >= expectedPort // avoid testing equality in case of simultaneous tests +// `spark.testing` is true, // the default value for `spark.port.maxRetries` is 100 under test actualPort should be <= (expectedPort + 100) } private def createService(port: Int): NettyBlockTransferService = { val conf = new SparkConf() .set("spark.app.id", s"test-${getClass.getName}") + .set("spark.testing", "true") --- End diff -- I think the test framework will set `spark.testing`, in `./project/SparkBuild.scala:795:javaOptions in Test += "-Dspark.testing=1"`
[GitHub] spark issue #20525: [SPARK-23271][SQL] Parquet output contains only _SUCCESS ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20525 **[Test build #87195 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87195/testReport)** for PR 20525 at commit [`92f490e`](https://github.com/apache/spark/commit/92f490e334445f6887970e32de49f93a8da75bde).
[GitHub] spark issue #20499: [SPARK-23328][PYTHON] Disallow default value None in na....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20499 **[Test build #87196 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87196/testReport)** for PR 20499 at commit [`9f49b05`](https://github.com/apache/spark/commit/9f49b05de312495c84fccec93c82d0af8205eff3).
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20477 **[Test build #87197 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87197/testReport)** for PR 20477 at commit [`0efd5d3`](https://github.com/apache/spark/commit/0efd5d3919e24a480a9771ddd1d81bef11341e94).
[GitHub] spark issue #20525: [SPARK-23271][SQL] Parquet output contains only _SUCCESS ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20525 Merged build finished. Test PASSed.
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20477 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/696/ Test PASSed.
[GitHub] spark issue #20525: [SPARK-23271][SQL] Parquet output contains only _SUCCESS ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20525 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/695/ Test PASSed.
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20477 Merged build finished. Test PASSed.
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20477 also cc @tdas @jose-torres @zsxwing
[GitHub] spark issue #20535: [SPARK-23341][SQL] define some standard options for data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20535 **[Test build #87194 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87194/testReport)** for PR 20535 at commit [`e92b6b2`](https://github.com/apache/spark/commit/e92b6b2083c4dbf31c27c961096a45cd8d84f16e).
[GitHub] spark issue #20535: [SPARK-23341][SQL] define some standard options for data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20535 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/694/ Test PASSed.
[GitHub] spark issue #20535: [SPARK-23341][SQL] define some standard options for data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20535 Merged build finished. Test PASSed.
[GitHub] spark issue #20516: [SPARK-23343][CORE][TEST] Increase the exception test fo...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/20516 @jiangxb1987 thank you for reviewing it. First, the `100` in the revised `(expectedPort + 100)` only applies after `spark.testing` is set (it is the default of `spark.port.maxRetries` under test). Second, I added some boundary tests, such as port 65535, where `Utils.userPort` wraps around to generate the next port when the base port is 65535.
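The wrap-around ("self circulation") of `Utils.userPort` mentioned above is simple modular arithmetic over the non-privileged port range [1024, 65535]: retrying past 65535 restarts at 1024, which is why the test expects the second bind at base port 65535 to land on 1024. A Python sketch of that behaviour (this mirrors the described semantics, not necessarily Spark's exact code):

```python
MIN_USER_PORT = 1024   # ports below this are privileged
MAX_PORT = 65535

def user_port(base, offset):
    """Candidate port for the offset-th retry, wrapping 65535 -> 1024."""
    span = (MAX_PORT + 1) - MIN_USER_PORT   # size of the usable range
    return (base + offset - MIN_USER_PORT) % span + MIN_USER_PORT
```

For example, with base 65535 the first retry (`offset = 1`) wraps to 1024 rather than overflowing past the valid port range.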
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20477 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87189/ Test FAILed.
[GitHub] spark pull request #20532: [SPARK-23353][CORE] Allow ExecutorMetricsUpdate e...
Github user LantaoJin commented on a diff in the pull request: https://github.com/apache/spark/pull/20532#discussion_r166844625 --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala --- @@ -53,10 +53,21 @@ package object config { .booleanConf .createWithDefault(false) - private[spark] val EVENT_LOG_BLOCK_UPDATES = -ConfigBuilder("spark.eventLog.logBlockUpdates.enabled") - .booleanConf - .createWithDefault(false) + private[spark] val EVENT_LOG_BLOCK_UPDATES_FRACTION = +ConfigBuilder("spark.eventLog.logBlockUpdates.fraction") + .doc("Expected number of times each blockUpdated event is chosen to log, " + +"fraction must be [0, 1]. 0 by default, means disabled") + .doubleConf + .checkValue(_ >= 0, "The fraction must not be negative") --- End diff -- Agree, I think a max limitation is necessary.
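The review point is that `checkValue(_ >= 0, ...)` only enforces the lower bound, while the documented contract is `[0, 1]`. A minimal Python sketch of the intended behaviour, with both bounds checked and Bernoulli sampling of events (names here are illustrative, not Spark's actual internals):

```python
import random

def validate_fraction(fraction):
    # The missing max limitation: enforce both ends of the documented [0, 1] range.
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("The fraction must be in [0, 1]")
    return fraction

def should_log(fraction, rng=random):
    # Bernoulli sampling: each blockUpdated event is logged with
    # probability `fraction`; 0 disables logging entirely, 1 logs everything.
    return rng.random() < fraction
```

With only `_ >= 0`, a value like `2.0` would pass validation yet behave identically to `1.0`, which is the kind of silent misconfiguration the upper-bound check prevents.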
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20477 Merged build finished. Test FAILed.
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20477 **[Test build #87189 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87189/testReport)** for PR 20477 at commit [`2b4a095`](https://github.com/apache/spark/commit/2b4a0956216ebc5e2a04c72c65d8f2c484b8abcd). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20525: [SPARK-23271][SQL] Parquet output contains only _S...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20525#discussion_r166843930 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala --- @@ -72,6 +73,26 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext { } } + Seq("orc", "parquet").foreach { format => +test(s"SPARK-23271 empty RDD when saved should write a metadata only file - $format") { + withTempDir { inputPath => +withTempPath { outputPath => + val anySchema = StructType(StructField("anyName", StringType) :: Nil) + val df = spark.read.schema(anySchema).csv(inputPath.toString) --- End diff -- an easier way to create an empty dataframe: `spark.emptyDataFrame.select(lit(1).as("i"))`
[GitHub] spark pull request #20525: [SPARK-23271][SQL] Parquet output contains only _S...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20525#discussion_r166843705 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala --- @@ -190,9 +190,18 @@ object FileFormatWriter extends Logging { global = false, child = plan).execute() } - val ret = new Array[WriteTaskResult](rdd.partitions.length) + + // SPARK-23271 If we are attempting to write a zero partition rdd, create a dummy single + // partition rdd to make sure we at least set up one write task to write the metadata. + val finalRdd = if (rdd.partitions.length == 0) { +sparkSession.sparkContext.parallelize(Array.empty[InternalRow]) --- End diff -- `sparkSession.sparkContext.parallelize(Array.empty[InternalRow], 1)`
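The SPARK-23271 fix being reviewed boils down to: if the dataset has zero partitions, substitute one dummy empty partition so that at least one write task runs and emits the metadata/`_SUCCESS` files. A plain-Python sketch of that idea (the function names are illustrative, not Spark's):

```python
def partitions_to_write(partitions):
    # Mirror of the fix: never hand the writer a zero-partition dataset;
    # an empty dataset becomes a single dummy empty partition.
    return partitions if partitions else [[]]

def write(partitions):
    # One "write task" per partition; even the dummy empty partition
    # produces a task, which is what writes the output metadata.
    return [{"rows_written": len(part), "metadata_written": True}
            for part in partitions_to_write(partitions)]
```

The review comment's refinement (`parallelize(Array.empty[InternalRow], 1)`) pins the dummy RDD to exactly one partition rather than relying on the default parallelism.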
[GitHub] spark issue #19077: [SPARK-21860][core]Improve memory reuse for heap memory ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19077 LGTM
[GitHub] spark issue #20532: [SPARK-23353][CORE] Allow ExecutorMetricsUpdate events t...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/20532 I agree with @jiangxb1987 . @LantaoJin would you please elaborate on the usage scenario of dumping executor metrics to the event log? It seems the history server doesn't necessarily leverage such information.
[GitHub] spark issue #19077: [SPARK-21860][core]Improve memory reuse for heap memory ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19077 Merged build finished. Test PASSed.
[GitHub] spark issue #19077: [SPARK-21860][core]Improve memory reuse for heap memory ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19077 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87183/ Test PASSed.
[GitHub] spark issue #20525: [SPARK-23271][SQL] Parquet output contains only _SUCCESS ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20525 **[Test build #87193 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87193/testReport)** for PR 20525 at commit [`37343a8`](https://github.com/apache/spark/commit/37343a84ee87cf6663c5cd2f67c3ac27af9fdfdf).
[GitHub] spark issue #19077: [SPARK-21860][core]Improve memory reuse for heap memory ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19077 **[Test build #87183 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87183/testReport)** for PR 19077 at commit [`acfe551`](https://github.com/apache/spark/commit/acfe5513f0d83d83aed6899de2bb42262442e3a1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20525: [SPARK-23271][SQL] Parquet output contains only _SUCCESS ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20525 Merged build finished. Test PASSed.
[GitHub] spark issue #20525: [SPARK-23271][SQL] Parquet output contains only _SUCCESS ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20525 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/693/ Test PASSed.
[GitHub] spark issue #19077: [SPARK-21860][core]Improve memory reuse for heap memory ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19077 **[Test build #87192 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87192/testReport)** for PR 19077 at commit [`78ede3f`](https://github.com/apache/spark/commit/78ede3fae243c5379eaea1c86584e200ce697c19).
[GitHub] spark issue #19077: [SPARK-21860][core]Improve memory reuse for heap memory ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19077 Merged build finished. Test PASSed.
[GitHub] spark pull request #20508: [SPARK-23335][SQL] Should not convert to double w...
Github user caneGuy commented on a diff in the pull request: https://github.com/apache/spark/pull/20508#discussion_r166839312 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -327,6 +327,14 @@ object TypeCoercion { // Skip nodes who's children have not been resolved yet. case e if !e.childrenResolved => e + // For integralType should not convert to double which will cause precision loss. + case a @ BinaryArithmetic(left @ StringType(), right @ IntegralType()) => --- End diff -- @wangyum Sorry for bothering you, I will take some time to fix this later.
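The precision-loss concern behind this coercion rule can be reproduced outside Spark: an IEEE-754 double has a 53-bit significand, so a large enough integral string cannot round-trip through a double the way it can through a 64-bit integer.

```python
# 17 decimal digits exceed the 53-bit significand of a double
# (2**53 is about 9.0e15), so the nearest double is a different integer.
s = "12345678901234567"

exact = int(s)          # integral path: value preserved exactly
lossy = int(float(s))   # string -> double path: value rounded

print(exact == lossy)   # False
```

This is why coercing a `StringType` operand of integral arithmetic to `DoubleType` (rather than an integral type) silently changes results for large values.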
[GitHub] spark issue #19077: [SPARK-21860][core]Improve memory reuse for heap memory ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19077 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/692/ Test PASSed.
[GitHub] spark pull request #19077: [SPARK-21860][core]Improve memory reuse for heap ...
Github user 10110346 commented on a diff in the pull request: https://github.com/apache/spark/pull/19077#discussion_r166836868 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/HeapMemoryAllocator.java --- @@ -46,9 +46,12 @@ private boolean shouldPool(long size) { @Override public MemoryBlock allocate(long size) throws OutOfMemoryError { -if (shouldPool(size)) { +int numWords = (int) ((size + 7) / 8); --- End diff -- L51: assert (alignedSize >= size); I think `assert (alignedSize >= size)` can make sure `(size + 7) / 8` doesn't exceed max int.
[GitHub] spark pull request #19077: [SPARK-21860][core]Improve memory reuse for heap ...
Github user 10110346 commented on a diff in the pull request: https://github.com/apache/spark/pull/19077#discussion_r166836613 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/HeapMemoryAllocator.java --- @@ -46,9 +46,12 @@ private boolean shouldPool(long size) { @Override public MemoryBlock allocate(long size) throws OutOfMemoryError { -if (shouldPool(size)) { +int numWords = (int) ((size + 7) / 8); +long alignedSize = numWords * 8; --- End diff -- yeah, thanks
[GitHub] spark issue #20535: [SPARK-23341][SQL] define some standard options for data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20535 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87191/ Test FAILed.
[GitHub] spark issue #20535: [SPARK-23341][SQL] define some standard options for data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20535 **[Test build #87191 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87191/testReport)** for PR 20535 at commit [`6644e49`](https://github.com/apache/spark/commit/6644e49ce41e971103298fe3966e921765a82804). * This patch **fails to generate documentation**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20535: [SPARK-23341][SQL] define some standard options for data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20535 Merged build finished. Test FAILed.
[GitHub] spark issue #20525: [SPARK-23271][SQL] Parquet output contains only _SUCCESS ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20525 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87182/ Test PASSed.
[GitHub] spark issue #20525: [SPARK-23271][SQL] Parquet output contains only _SUCCESS ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20525 Merged build finished. Test PASSed.
[GitHub] spark issue #20525: [SPARK-23271][SQL] Parquet output contains only _SUCCESS ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20525 **[Test build #87182 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87182/testReport)** for PR 20525 at commit [`9536469`](https://github.com/apache/spark/commit/953646935dfdb127113cf061d1aeed349f671971). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20535: [SPARK-23341][SQL] define some standard options for data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20535 **[Test build #87191 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87191/testReport)** for PR 20535 at commit [`6644e49`](https://github.com/apache/spark/commit/6644e49ce41e971103298fe3966e921765a82804).
[GitHub] spark issue #20535: [SPARK-23341][SQL] define some standard options for data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20535 Merged build finished. Test PASSed.
[GitHub] spark issue #20535: [SPARK-23341][SQL] define some standard options for data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20535 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/691/ Test PASSed.
[GitHub] spark pull request #19077: [SPARK-21860][core]Improve memory reuse for heap ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19077#discussion_r166834552 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/HeapMemoryAllocator.java --- @@ -46,9 +46,12 @@ private boolean shouldPool(long size) { @Override public MemoryBlock allocate(long size) throws OutOfMemoryError { -if (shouldPool(size)) { +int numWords = (int) ((size + 7) / 8); --- End diff -- let's add some check to make sure `(size + 7) / 8` doesn't exceed max int.
[GitHub] spark pull request #19077: [SPARK-21860][core]Improve memory reuse for heap ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19077#discussion_r166834595 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/HeapMemoryAllocator.java --- @@ -46,9 +46,12 @@ private boolean shouldPool(long size) { @Override public MemoryBlock allocate(long size) throws OutOfMemoryError { -if (shouldPool(size)) { +int numWords = (int) ((size + 7) / 8); +long alignedSize = numWords * 8; --- End diff -- `numWords * 8L`, to avoid overflow
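The two review comments above are about the same arithmetic: round a requested allocation size up to a multiple of 8 bytes (one word), while guarding the `int` narrowing in `(int) ((size + 7) / 8)` and the `int * int` multiplication (hence `numWords * 8L`). A Python sketch of that alignment logic, with Java's `Integer.MAX_VALUE` modeled explicitly since Python integers don't overflow:

```python
JAVA_INT_MAX = 2**31 - 1  # Integer.MAX_VALUE, modeled by hand

def aligned_size(size):
    # Round `size` up to a multiple of 8 bytes (one JVM word).
    num_words = (size + 7) // 8
    # The guard the review asks for: in Java, casting the word count to
    # int would silently truncate when it exceeds Integer.MAX_VALUE.
    if num_words > JAVA_INT_MAX:
        raise ValueError("size too large: word count exceeds max int")
    # In Java this must be `num_words * 8L` so the product is computed
    # in long arithmetic rather than overflowing int.
    return num_words * 8

print(aligned_size(13))  # 16
```

Note the suggested `assert (alignedSize >= size)` would indeed catch both failure modes after the fact, since either truncation or overflow makes the computed aligned size smaller than the request.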
[GitHub] spark pull request #20540: Branch 2.3
Github user zhangzg187 closed the pull request at: https://github.com/apache/spark/pull/20540
[GitHub] spark pull request #20540: Branch 2.3
GitHub user zhangzg187 opened a pull request: https://github.com/apache/spark/pull/20540 Branch 2.3 ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/spark branch-2.3 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20540.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20540 commit cd92913f345c8d932d3c651626c7f803e6abdcdb Author: jerryshao Date: 2018-01-04T19:39:42Z [SPARK-21475][CORE][2ND ATTEMPT] Change to use NIO's Files API for external shuffle service ## What changes were proposed in this pull request? This PR is the second attempt of #18684 , NIO's Files API doesn't override `skip` method for `InputStream`, so it will bring in performance issue (mentioned in #20119). But using `FileInputStream`/`FileOutputStream` will also bring in memory issue (https://dzone.com/articles/fileinputstream-fileoutputstream-considered-harmful), which is severe for long running external shuffle service. So here in this proposal, only fixing the external shuffle service related code. ## How was this patch tested? Existing tests. Author: jerryshao Closes #20144 from jerryshao/SPARK-21475-v2. (cherry picked from commit 93f92c0ed7442a4382e97254307309977ff676f8) Signed-off-by: Shixiong Zhu commit bc4bef472de0e99f74a80954d694c3d1744afe3a Author: Marcelo Vanzin Date: 2018-01-04T22:19:00Z [SPARK-22850][CORE] Ensure queued events are delivered to all event queues. 
The code in LiveListenerBus was queueing events before start in the queues themselves; so in situations like the following: bus.post(someEvent) bus.addToEventLogQueue(listener) bus.start() "someEvent" would not be delivered to "listener" if that was the first listener in the queue, because the queue wouldn't exist when the event was posted. This change buffers the events before starting the bus in the bus itself, so that they can be delivered to all registered queues when the bus is started. Also tweaked the unit tests to cover the behavior above. Author: Marcelo Vanzin Closes #20039 from vanzin/SPARK-22850. (cherry picked from commit d2cddc88eac32f26b18ec26bb59e85c6f09a8c88) Signed-off-by: Imran Rashid commit 2ab4012adda941ebd637bd248f65cefdf4aaf110 Author: Marcelo Vanzin Date: 2018-01-04T23:00:09Z [SPARK-22948][K8S] Move SparkPodInitContainer to correct package. Author: Marcelo Vanzin Closes #20156 from vanzin/SPARK-22948. (cherry picked from commit 95f9659abe8845f9f3f42fd7ababd79e55c52489) Signed-off-by: Marcelo Vanzin commit 84707f0c6afa9c5417e271657ff930930f82213c Author: Yinan Li Date: 2018-01-04T23:35:20Z [SPARK-22953][K8S] Avoids adding duplicated secret volumes when init-container is used ## What changes were proposed in this pull request? User-specified secrets are mounted into both the main container and init-container (when it is used) in a Spark driver/executor pod, using the `MountSecretsBootstrap`. Because `MountSecretsBootstrap` always adds new secret volumes for the secrets to the pod, the same secret volumes get added twice, one when mounting the secrets to the main container, and the other when mounting the secrets to the init-container. This PR fixes the issue by separating `MountSecretsBootstrap.mountSecrets` out into two methods: `addSecretVolumes` for adding secret volumes to a pod and `mountSecrets` for mounting secret volumes to a container, respectively. 
`addSecretVolumes` is only called once for each pod, whereas `mountSecrets` is called individually for the main container and the init-container (if it is used). Ref: https://github.com/apache-spark-on-k8s/spark/issues/594. ## How was this patch tested? Unit tested and manually tested. vanzin This replaces https://github.com/apache/spark/pull/20148. hex108 foxish kimoonkim Author: Yinan Li Closes #20159 from liyinan926/master. (cherry picked from commit
[GitHub] spark pull request #20531: [SPARK-23352][PYTHON] Explicitly specify supporte...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/20531#discussion_r166831572 --- Diff: python/pyspark/sql/udf.py --- @@ -112,15 +112,31 @@ def returnType(self): else: self._returnType_placeholder = _parse_datatype_string(self._returnType) -if self.evalType == PythonEvalType.SQL_GROUPED_MAP_PANDAS_UDF \ -and not isinstance(self._returnType_placeholder, StructType): -raise ValueError("Invalid returnType: returnType must be a StructType for " - "pandas_udf with function type GROUPED_MAP") -elif self.evalType == PythonEvalType.SQL_GROUPED_AGG_PANDAS_UDF \ -and isinstance(self._returnType_placeholder, (StructType, ArrayType, MapType)): -raise NotImplementedError( -"ArrayType, StructType and MapType are not supported with " -"PandasUDFType.GROUPED_AGG") +if self.evalType == PythonEvalType.SQL_GROUPED_MAP_PANDAS_UDF: --- End diff -- nit: I'd prefer to keep the check order by the definition in `PythonEvalType` if you don't have a special reason. E.g., ``` if self.evalType == PythonEvalType.SQL_SCALAR_PANDAS_UDF: ... elif self.evalType == PythonEvalType.SQL_GROUPED_MAP_PANDAS_UDF: ... elif self.evalType == PythonEvalType.SQL_GROUPED_AGG_PANDAS_UDF: ... ```
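The suggested branch ordering can be sketched without a Spark installation. The constants below are illustrative stand-ins for `PythonEvalType` (the values 200/201/202 are an assumption here), and a plain string stands in for Spark's `DataType` objects:

```python
# Assumed stand-ins for PythonEvalType; the real constants live in pyspark.
SQL_SCALAR_PANDAS_UDF = 200
SQL_GROUPED_MAP_PANDAS_UDF = 201
SQL_GROUPED_AGG_PANDAS_UDF = 202

UNSUPPORTED_AGG_TYPES = ("array", "struct", "map")

def check_return_type(eval_type, return_type):
    # Branches follow the definition order of the eval types, as the
    # review suggests, so new eval types slot in predictably.
    if eval_type == SQL_SCALAR_PANDAS_UDF:
        pass  # scalar pandas UDFs: no extra return-type restriction here
    elif eval_type == SQL_GROUPED_MAP_PANDAS_UDF:
        if return_type != "struct":
            raise ValueError("Invalid returnType: returnType must be a "
                             "StructType for pandas_udf with function type "
                             "GROUPED_MAP")
    elif eval_type == SQL_GROUPED_AGG_PANDAS_UDF:
        if return_type in UNSUPPORTED_AGG_TYPES:
            raise NotImplementedError(
                "ArrayType, StructType and MapType are not supported "
                "with PandasUDFType.GROUPED_AGG")
    return return_type
```

Ordering the `if`/`elif` chain by definition order is purely a readability convention; the validation outcome per eval type is unchanged.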
[GitHub] spark pull request #20525: [SPARK-23271][SQL] Parquet output contains only _S...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/20525#discussion_r166834243 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala --- @@ -190,9 +190,13 @@ object FileFormatWriter extends Logging { global = false, child = plan).execute() } - val ret = new Array[WriteTaskResult](rdd.partitions.length) + + // SPARK-23271 If we are attempting to write a zero partition rdd, change the number of + // partition to 1 to make sure we at least set up one write task to write the metadata. + val finalRdd = if (rdd.partitions.length == 0) rdd.repartition(1) else rdd --- End diff -- @cloud-fan @jiangxb1987 Thanks a **LOT**. This works perfectly.
[GitHub] spark pull request #20531: [SPARK-23352][PYTHON] Explicitly specify supporte...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/20531#discussion_r166832153 --- Diff: python/pyspark/sql/tests.py --- @@ -4562,14 +4585,14 @@ def test_basic(self): self.assertPandasEqual(expected4.toPandas(), result4.toPandas()) def test_unsupported_types(self): -from pyspark.sql.types import ArrayType, DoubleType, MapType +from pyspark.sql.types import DoubleType, MapType from pyspark.sql.functions import pandas_udf, PandasUDFType with QuietTest(self.sc): with self.assertRaisesRegexp(NotImplementedError, 'not supported'): -@pandas_udf(ArrayType(DoubleType()), PandasUDFType.GROUPED_AGG) +@pandas_udf(ArrayType(ArrayType(TimestampType())), PandasUDFType.GROUPED_AGG) def mean_and_std_udf(v): --- End diff -- nit: should rename this?
[GitHub] spark issue #20516: [SPARK-23343][CORE][TEST] Increase the exception test fo...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/20516 Does this PR add any value or fix any bugs?
[GitHub] spark issue #20532: [SPARK-23353][CORE] Allow ExecutorMetricsUpdate events t...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/20532 I'm also worried that if we want to sample more events in the future, we have to add more configs following this way, which doesn't sound like a perfect choice.
[GitHub] spark pull request #20532: [SPARK-23353][CORE] Allow ExecutorMetricsUpdate e...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/20532#discussion_r166833429 --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala --- @@ -53,10 +53,21 @@ package object config { .booleanConf .createWithDefault(false) - private[spark] val EVENT_LOG_BLOCK_UPDATES = -ConfigBuilder("spark.eventLog.logBlockUpdates.enabled") - .booleanConf - .createWithDefault(false) + private[spark] val EVENT_LOG_BLOCK_UPDATES_FRACTION = +ConfigBuilder("spark.eventLog.logBlockUpdates.fraction") + .doc("Expected number of times each blockUpdated event is chosen to log, " + +"fraction must be [0, 1]. 0 by default, means disabled") + .doubleConf + .checkValue(_ >= 0, "The fraction must not be negative") --- End diff -- It is actually hard for users to set a sample ratio that balances the event log size against the analysis requirement; how about controlling the max number of events recorded per time interval?
[GitHub] spark issue #19077: [SPARK-21860][core]Improve memory reuse for heap memory ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19077 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/690/ Test PASSed.
[GitHub] spark issue #19077: [SPARK-21860][core]Improve memory reuse for heap memory ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19077 Merged build finished. Test PASSed.
[GitHub] spark issue #20532: [SPARK-23353][CORE] Allow ExecutorMetricsUpdate events t...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/20532 Emmm... in case we want to sample more events, does that mean we shall add a new config for each sampled event type?
[GitHub] spark pull request #19077: [SPARK-21860][core]Improve memory reuse for heap ...
Github user 10110346 commented on a diff in the pull request: https://github.com/apache/spark/pull/19077#discussion_r166832418 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/HeapMemoryAllocator.java --- @@ -46,9 +47,10 @@ private boolean shouldPool(long size) { @Override public MemoryBlock allocate(long size) throws OutOfMemoryError { -if (shouldPool(size)) { --- End diff -- I agree with you. I have updated, thanks.
[GitHub] spark issue #19077: [SPARK-21860][core]Improve memory reuse for heap memory ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19077 **[Test build #87190 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87190/testReport)** for PR 19077 at commit [`ad211b8`](https://github.com/apache/spark/commit/ad211b860ae4869aca8be747924578c944ba8974).
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20477 Merged build finished. Test PASSed.
[GitHub] spark issue #20477: [SPARK-23303][SQL] improve the explain result for data s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20477 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/689/ Test PASSed.
[GitHub] spark issue #20509: [SPARK-23268][SQL][followup] Reorganize packages in data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20509 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/688/ Test PASSed.
[GitHub] spark issue #20509: [SPARK-23268][SQL][followup] Reorganize packages in data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20509 Merged build finished. Test PASSed.