[GitHub] spark pull request #21866: [SPARK-24768][FollowUp][SQL]Avro migration follow...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21866#discussion_r204987968
    --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala ---
    @@ -56,7 +56,7 @@ private[avro] class AvroFileFormat extends FileFormat with DataSourceRegister {
           spark: SparkSession,
           options: Map[String, String],
           files: Seq[FileStatus]): Option[StructType] = {
    -    val conf = spark.sparkContext.hadoopConfiguration
    +    val conf = spark.sessionState.newHadoopConf()
    --- End diff --
Sure, will do it. Can I keep it simple and only check for occurrences of `= spark.sparkContext.hadoopConfiguration`? Otherwise there are too many usages of `sparkContext.hadoopConfiguration`.
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
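The "simple" check proposed in this comment could be approximated by a plain substring scan over source lines. The sketch below is only an illustration under that assumption: `HadoopConfUsageCheck` and `findViolations` are hypothetical names, and this is not the rule that Spark's build actually uses.

```scala
object HadoopConfUsageCheck {
  // The exact text the comment proposes to look for.
  private val Forbidden = "= spark.sparkContext.hadoopConfiguration"

  /** Returns (1-based line number, trimmed line) for every line containing the pattern. */
  def findViolations(source: String): Seq[(Int, String)] =
    source.split("\n").toSeq.zipWithIndex.collect {
      case (line, idx) if line.contains(Forbidden) => (idx + 1, line.trim)
    }

  def main(args: Array[String]): Unit = {
    val sample =
      """val ok = spark.sessionState.newHadoopConf()
        |val conf = spark.sparkContext.hadoopConfiguration""".stripMargin
    findViolations(sample).foreach { case (n, l) => println(s"line $n: $l") }
  }
}
```

Matching on the assignment form keeps the check narrow, which is the trade-off the comment describes: other uses of `sparkContext.hadoopConfiguration` are deliberately not flagged.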
[GitHub] spark issue #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` when the...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21850 LGTM
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21822 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93526/ Test PASSed.
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21822 Merged build finished. Test PASSed.
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21822 **[Test build #93526 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93526/testReport)** for PR 21822 at commit [`75fb114`](https://github.com/apache/spark/commit/75fb1145fa84725fb931752906ece34ae6028290). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21822 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93527/ Test PASSed.
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21822 Merged build finished. Test PASSed.
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21822 **[Test build #93527 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93527/testReport)** for PR 21822 at commit [`f2f1a97`](https://github.com/apache/spark/commit/f2f1a97e447a41e8b9b6c094376d32b32af00991). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21866: [SPARK-24768][FollowUp][SQL]Avro migration followup: cha...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21866 retest this please
[GitHub] spark issue #21866: [SPARK-24768][FollowUp][SQL]Avro migration followup: cha...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21866 retest this please
[GitHub] spark issue #21729: [SPARK-24755][Core] Executor loss can cause task to not ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21729 For other reviewers, this is merged to master/2.3
[GitHub] spark issue #21866: [SPARK-24768][FollowUp][SQL]Avro migration followup: cha...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21866 Merged build finished. Test FAILed.
[GitHub] spark issue #21866: [SPARK-24768][FollowUp][SQL]Avro migration followup: cha...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21866 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93528/ Test FAILed.
[GitHub] spark issue #21866: [SPARK-24768][FollowUp][SQL]Avro migration followup: cha...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21866 **[Test build #93528 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93528/testReport)** for PR 21866 at commit [`cff6f2a`](https://github.com/apache/spark/commit/cff6f2a0459e8cc4e48f28bde8103ea44ce5a1ab). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` when the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21850 Merged build finished. Test PASSed.
[GitHub] spark issue #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` when the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21850 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93525/ Test PASSed.
[GitHub] spark issue #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` when the...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21850 **[Test build #93525 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93525/testReport)** for PR 21850 at commit [`59fada7`](https://github.com/apache/spark/commit/59fada75fb59b1c3dabdac0a5d22b35c8f139a44). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21221: [SPARK-23429][CORE] Add executor memory metrics t...
Github user rezasafi commented on a diff in the pull request: https://github.com/apache/spark/pull/21221#discussion_r204976606
    --- Diff: core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala ---
    @@ -251,6 +261,215 @@ class EventLoggingListenerSuite extends SparkFunSuite with LocalSparkContext wit
         }
       }
    +  /**
    +   * Test stage executor metrics logging functionality. This checks that peak
    +   * values from SparkListenerExecutorMetricsUpdate events during a stage are
    +   * logged in a StageExecutorMetrics event for each executor at stage completion.
    +   */
    +  private def testStageExecutorMetricsEventLogging() {
    +    val conf = getLoggingConf(testDirPath, None)
    +    val logName = "stageExecutorMetrics-test"
    +    val eventLogger = new EventLoggingListener(logName, None, testDirPath.toUri(), conf)
    +    val listenerBus = new LiveListenerBus(conf)
    +
    +    // expected StageExecutorMetrics, for the given stage id and executor id
    +    val expectedMetricsEvents: Map[(Int, String), SparkListenerStageExecutorMetrics] =
    +      Map(
    +        ((0, "1"),
    +          new SparkListenerStageExecutorMetrics("1", 0, 0,
    +            Array(5000L, 50L, 50L, 20L, 50L, 10L, 100L, 30L, 70L, 20L))),
    +        ((0, "2"),
    +          new SparkListenerStageExecutorMetrics("2", 0, 0,
    +            Array(7000L, 70L, 50L, 20L, 10L, 10L, 50L, 30L, 80L, 40L))),
    +        ((1, "1"),
    +          new SparkListenerStageExecutorMetrics("1", 1, 0,
    +            Array(7000L, 70L, 50L, 30L, 60L, 30L, 80L, 55L, 50L, 0L))),
    +        ((1, "2"),
    +          new SparkListenerStageExecutorMetrics("2", 1, 0,
    +            Array(7000L, 70L, 50L, 40L, 10L, 30L, 50L, 60L, 40L, 40L))))
    +
    +    // Events to post.
    +    val events = Array(
    +      SparkListenerApplicationStart("executionMetrics", None,
    +        1L, "update", None),
    +      createExecutorAddedEvent(1),
    +      createExecutorAddedEvent(2),
    +      createStageSubmittedEvent(0),
    +      // receive 3 metric updates from each executor with just stage 0 running,
    +      // with different peak updates for each executor
    +      createExecutorMetricsUpdateEvent(1,
    +        Array(4000L, 50L, 20L, 0L, 40L, 0L, 60L, 0L, 70L, 20L)),
    +      createExecutorMetricsUpdateEvent(2,
    +        Array(1500L, 50L, 20L, 0L, 0L, 0L, 20L, 0L, 70L, 0L)),
    +      // exec 1: new stage 0 peaks for metrics at indexes: 2, 4, 6
    +      createExecutorMetricsUpdateEvent(1,
    +        Array(4000L, 50L, 50L, 0L, 50L, 0L, 100L, 0L, 70L, 20L)),
    +      // exec 2: new stage 0 peaks for metrics at indexes: 0, 4, 6
    +      createExecutorMetricsUpdateEvent(2,
    +        Array(2000L, 50L, 10L, 0L, 10L, 0L, 30L, 0L, 70L, 0L)),
    +      // exec 1: new stage 0 peaks for metrics at indexes: 5, 7
    +      createExecutorMetricsUpdateEvent(1,
    +        Array(2000L, 40L, 50L, 0L, 40L, 10L, 90L, 10L, 50L, 0L)),
    +      // exec 2: new stage 0 peaks for metrics at indexes: 0, 5, 6, 7, 8
    +      createExecutorMetricsUpdateEvent(2,
    +        Array(3500L, 50L, 15L, 0L, 10L, 10L, 35L, 10L, 80L, 0L)),
    +      // now start stage 1, one more metric update for each executor, and new
    +      // peaks for some stage 1 metrics (as listed), initialize stage 1 peaks
    +      createStageSubmittedEvent(1),
    +      // exec 1: new stage 0 peaks for metrics at indexes: 0, 3, 7
    --- End diff --
Are this comment and the one at line 322 correct? Shouldn't they say stage 1?
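The doc comment in the quoted test says that peak values from the metrics updates received during a stage are logged per executor at stage completion. A minimal sketch of that idea, assuming "peak" means the element-wise maximum over the updates seen so far, is shown below. `PeakMetrics`, `mergePeaks` and `peaks` are hypothetical names, not Spark APIs, and the input is limited to the three executor 1 updates visible in the truncated diff, so the result need not match the test's expected arrays.

```scala
object PeakMetrics {
  /** Element-wise maximum of two equally sized metric arrays. */
  def mergePeaks(current: Array[Long], update: Array[Long]): Array[Long] = {
    require(current.length == update.length, "metric arrays must have the same length")
    current.zip(update).map { case (a, b) => math.max(a, b) }
  }

  /** Folds a non-empty sequence of updates into per-metric peaks. */
  def peaks(updates: Seq[Array[Long]]): Array[Long] = {
    require(updates.nonEmpty, "at least one update is required")
    updates.reduce((a, b) => mergePeaks(a, b))
  }

  def main(args: Array[String]): Unit = {
    // The three executor 1 updates quoted in the diff while only stage 0 is running.
    val exec1Updates = Seq(
      Array(4000L, 50L, 20L, 0L, 40L, 0L, 60L, 0L, 70L, 20L),
      Array(4000L, 50L, 50L, 0L, 50L, 0L, 100L, 0L, 70L, 20L),
      Array(2000L, 40L, 50L, 0L, 40L, 10L, 90L, 10L, 50L, 0L))
    println(peaks(exec1Updates).mkString(", "))
  }
}
```

Under this reading, an update only counts as a "new peak" at the indexes where it exceeds the running maximum, which is what the per-update comments in the test enumerate.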
[GitHub] spark issue #21869: [SPARK-24891][FOLLOWUP][HOT-FIX][2.3] Fix the Compilatio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21869 **[Test build #93534 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93534/testReport)** for PR 21869 at commit [`a45bf36`](https://github.com/apache/spark/commit/a45bf360d923e8b187f6579b4a73d24a9222198a).
[GitHub] spark issue #21869: [SPARK-24891][FOLLOWUP][HOT-FIX][2.3] Fix the Compilatio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21869 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1301/ Test PASSed.
[GitHub] spark issue #21869: [SPARK-24891][FOLLOWUP][HOT-FIX][2.3] Fix the Compilatio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21869 Merged build finished. Test PASSed.
[GitHub] spark pull request #21869: [SPARK-24891][SQL] Fix HandleNullInputsForUDF rul...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/21869
    [SPARK-24891][SQL] Fix HandleNullInputsForUDF rule
    ## What changes were proposed in this pull request?
    (Please fill in changes proposed in this fix)
    ## How was this patch tested?
    (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
    (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
    Please review http://spark.apache.org/contributing.html before opening a pull request.
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/gatorsmile/spark testSPARK-24891
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/21869.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #21869
commit a45bf360d923e8b187f6579b4a73d24a9222198a
Author: Xiao Li
Date: 2018-07-25T04:04:25Z
    fix build failure.
[GitHub] spark issue #21853: [SPARK-23957][SQL] Sorts in subqueries are redundant and...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21853 LGTM Thanks! Merged to master.
[GitHub] spark issue #21853: [SPARK-23957][SQL] Sorts in subqueries are redundant and...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/21853 Thank you very much @gatorsmile and @maropu
[GitHub] spark issue #21863: [SPARK-18874][SQL][FOLLOW-UP] Improvement type mismatche...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/21863 @gatorsmile Got it. Thank you.
[GitHub] spark pull request #21049: [SPARK-23957][SQL] Remove redundant sort operator...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21049
[GitHub] spark pull request #21853: [SPARK-23957][SQL] Sorts in subqueries are redund...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21853
[GitHub] spark issue #21803: [SPARK-24849][SPARK-24911][SQL] Converting a value of St...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21803 yup the current change sounds okay.
[GitHub] spark issue #21853: [SPARK-23957][SQL] Sorts in subqueries are redundant and...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21853 Ideally, this is a perfect fix. We can make it more general to remove all the unnecessary sorts during the query planning. However, this optimization is still nice to have since the sorts removed by this PR are not rare.
[GitHub] spark issue #21834: [SPARK-22814][SQL] Support Date/Timestamp in a JDBC part...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21834 Currently, no. Is it ok that the log level is `INFO`?
[GitHub] spark pull request #21867: [SPARK-24307][CORE] Add conf to revert to old cod...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21867#discussion_r204972249
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -731,7 +731,14 @@ private[spark] class BlockManager(
           }
           if (data != null) {
    -        return Some(ChunkedByteBuffer.fromManagedBuffer(data, chunkSize))
    +        // SPARK-24307 undocumented "escape-hatch" in case there are any issues in converting to
    +        // to ChunkedByteBuffer, to go back to old code-path. Can be removed post Spark 2.4 if
    +        // new path is stable.
    +        if (conf.getBoolean("spark.fetchToNioBuffer", false)) {
    --- End diff --
Since this condition is immutable, can we define a new variable whose value is assigned outside this method to reduce overhead?
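The suggestion in this comment amounts to reading the flag once and reusing the result. The sketch below illustrates that pattern under stated assumptions: `SimpleConf` and `Fetcher` are hypothetical stand-ins (not Spark's `SparkConf` or `BlockManager`), the `lookups` counter exists only to make the difference observable, and the strings returned by `fetch` merely label the two code paths named in the diff comment.

```scala
// Hypothetical stand-in for a configuration object; NOT Spark's SparkConf.
final class SimpleConf(settings: Map[String, String]) {
  var lookups = 0 // counts reads, for illustration only

  def getBoolean(key: String, default: Boolean): Boolean = {
    lookups += 1
    settings.get(key).map(_.toBoolean).getOrElse(default)
  }
}

// Hypothetical fetcher: the flag is evaluated once, when the instance is built,
// instead of on every call to fetch.
final class Fetcher(conf: SimpleConf) {
  private val fetchToNioBuffer: Boolean =
    conf.getBoolean("spark.fetchToNioBuffer", false)

  def fetch(blockId: String): String =
    if (fetchToNioBuffer) s"$blockId via NIO buffer"
    else s"$blockId via ChunkedByteBuffer"
}

object FetcherDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SimpleConf(Map("spark.fetchToNioBuffer" -> "true"))
    val fetcher = new Fetcher(conf)
    Seq("rdd_0_0", "rdd_0_1", "rdd_0_2").foreach(id => println(fetcher.fetch(id)))
    println(s"config lookups: ${conf.lookups}")
  }
}
```

This only works because the setting does not change after construction, which is the premise of the comment; a setting that can change at runtime would have to be re-read.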
[GitHub] spark issue #21803: [SPARK-24849][SPARK-24911][SQL] Converting a value of St...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21803 It is nice to have. Actually, I believe we need to fix the bug in `SHOW CREATE TABLE`, which is widely used.
[GitHub] spark issue #21863: [SPARK-18874][SQL][FOLLOW-UP] Improvement type mismatche...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21863 @dilipbiswal This PR is to fix a message. It is nice to have.
[GitHub] spark pull request #21848: [SPARK-24890] [SQL] Short circuiting the `if` con...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21848
[GitHub] spark issue #21848: [SPARK-24890] [SQL] Short circuiting the `if` condition ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21848 LGTM Thanks! Merged to master.
[GitHub] spark pull request #21848: [SPARK-24890] [SQL] Short circuiting the `if` con...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21848#discussion_r204971225
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
    @@ -403,14 +404,14 @@ object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper {
               e.copy(branches = newBranches)
             }
    -      case e @ CaseWhen(branches, _) if branches.headOption.map(_._1) == Some(TrueLiteral) =>
    +      case CaseWhen(branches, _) if branches.headOption.map(_._1).contains(TrueLiteral) =>
    --- End diff --
Normally, we avoid adding unneeded refactoring in such a PR. Please avoid it next time. Thanks!
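For readers comparing the two guard forms in this diff, the sketch below shows that `opt.map(f) == Some(x)` and `opt.map(f).contains(x)` give the same answer on a simplified branch list. It is an illustration only: `HeadConditionCheck` is a hypothetical name, and a plain `Boolean` stands in for Catalyst's `TrueLiteral` expression.

```scala
object HeadConditionCheck {
  // Simplified branch: (condition, value). A Boolean replaces Catalyst expressions here.
  type Branch = (Boolean, String)

  /** Form used before the change: compare the mapped Option with Some(true). */
  def viaMapEquals(branches: Seq[Branch]): Boolean =
    branches.headOption.map(_._1) == Some(true)

  /** Form used after the change: Option.contains. */
  def viaContains(branches: Seq[Branch]): Boolean =
    branches.headOption.map(_._1).contains(true)

  def main(args: Array[String]): Unit = {
    val cases = Seq(
      Seq.empty[Branch],
      Seq((true, "a"), (false, "b")),
      Seq((false, "a"), (true, "b")))
    cases.foreach(c => println(s"${viaMapEquals(c)} ${viaContains(c)}"))
  }
}
```

Because the two forms are interchangeable here, the rewrite reads as a pure refactoring, which is the kind of change the reviewer asks to keep out of a focused PR.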
[GitHub] spark issue #21834: [SPARK-22814][SQL] Support Date/Timestamp in a JDBC part...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21834 @maropu Do we have a log message for users to know the generated where clauses? If not, could you add one?
[GitHub] spark issue #21803: [SPARK-24849][SPARK-24911][SQL] Converting a value of St...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21803 I am okay now just for clarification ~
[GitHub] spark issue #21848: [SPARK-24890] [SQL] Short circuiting the `if` condition ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21848 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93523/ Test PASSed.
[GitHub] spark issue #21848: [SPARK-24890] [SQL] Short circuiting the `if` condition ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21848 Merged build finished. Test PASSed.
[GitHub] spark issue #21848: [SPARK-24890] [SQL] Short circuiting the `if` condition ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21848 **[Test build #93523 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93523/testReport)** for PR 21848 at commit [`b4f1431`](https://github.com/apache/spark/commit/b4f143180adc0196aa16650efc399226b463699f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21803: [SPARK-24849][SPARK-24911][SQL] Converting a valu...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21803#discussion_r204969836
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala ---
    @@ -155,6 +155,18 @@ package object util {
       def toPrettySQL(e: Expression): String = usePrettyExpression(e).sql
    +
    +  def escapeSingleQuotedString(str: String): String = {
    --- End diff --
Why is this moved under `catalyst`?
[GitHub] spark pull request #21818: [SPARK-24860][SQL] Support setting of partitionOv...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21818#discussion_r204969835
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -1335,7 +1335,9 @@ object SQLConf {
             "overwriting. In dynamic mode, Spark doesn't delete partitions ahead, and only overwrite " +
             "those partitions that have data written into it at runtime. By default we use static " +
             "mode to keep the same behavior of Spark prior to 2.3. Note that this config doesn't " +
    -        "affect Hive serde tables, as they are always overwritten with dynamic mode.")
    +        "affect Hive serde tables, as they are always overwritten with dynamic mode. This can " +
    +        "also be set as an output option for a data source using key partitionOverwriteMode, " +
    --- End diff --
We also need to explain the precedence between the option and the SQLConf setting.
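The precedence question raised here can be made concrete with a small sketch. It assumes, purely for illustration, that a per-write option named `partitionOverwriteMode` overrides the session-level setting when present; the quoted diff does not state the rule, and `OverwriteModeResolution`, `parse` and `resolve` are hypothetical names rather than Spark APIs.

```scala
import java.util.Locale

object OverwriteModeResolution {
  sealed trait Mode
  case object Static extends Mode
  case object Dynamic extends Mode

  /** Parses a mode name case-insensitively. */
  def parse(name: String): Mode = name.toUpperCase(Locale.ROOT) match {
    case "STATIC" => Static
    case "DYNAMIC" => Dynamic
    case other => throw new IllegalArgumentException(s"Unknown mode: $other")
  }

  /** ASSUMPTION: the write option, if set, wins over the session-level mode. */
  def resolve(writeOptions: Map[String, String], sessionMode: Mode): Mode =
    writeOptions.get("partitionOverwriteMode").map(parse).getOrElse(sessionMode)

  def main(args: Array[String]): Unit = {
    println(resolve(Map("partitionOverwriteMode" -> "dynamic"), Static))
    println(resolve(Map.empty, Static))
  }
}
```

Whichever rule is chosen, spelling it out in the config description, as the reviewer requests, avoids users having to read the resolution code to find out.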
[GitHub] spark issue #21803: [SPARK-24849][SPARK-24911][SQL] Converting a value of St...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21803 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93522/ Test PASSed.
[GitHub] spark issue #21803: [SPARK-24849][SPARK-24911][SQL] Converting a value of St...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21803 Merged build finished. Test PASSed.
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21822 Merged build finished. Test PASSed.
[GitHub] spark issue #21834: [SPARK-22814][SQL] Support Date/Timestamp in a JDBC part...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21834 **[Test build #93532 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93532/testReport)** for PR 21834 at commit [`1041a38`](https://github.com/apache/spark/commit/1041a38571eb4daf66a23d37d5bf51a1abb8d74c).
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21822 **[Test build #93533 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93533/testReport)** for PR 21822 at commit [`f2f1a97`](https://github.com/apache/spark/commit/f2f1a97e447a41e8b9b6c094376d32b32af00991).
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21822 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1300/ Test PASSed.
[GitHub] spark issue #21803: [SPARK-24849][SPARK-24911][SQL] Converting a value of St...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21803 **[Test build #93522 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93522/testReport)** for PR 21803 at commit [`738e97c`](https://github.com/apache/spark/commit/738e97cdc1801c95b8b9d87ad00c6c8aeaf0f20b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21834: [SPARK-22814][SQL] Support Date/Timestamp in a JDBC part...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21834 Merged build finished. Test PASSed.
[GitHub] spark issue #21834: [SPARK-22814][SQL] Support Date/Timestamp in a JDBC part...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21834 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1299/ Test PASSed.
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21822 retest this please
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21822 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93524/ Test FAILed.
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21822 Merged build finished. Test FAILed.
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21822 **[Test build #93524 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93524/testReport)** for PR 21822 at commit [`abfd0a8`](https://github.com/apache/spark/commit/abfd0a8cd16c54fa19e7c66bfc2fb1f1c6b85a12). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `trait AnalysisHelper extends QueryPlan[LogicalPlan] `
[GitHub] spark issue #20949: [SPARK-19018][SQL] Add support for custom encoding on cs...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20949 **[Test build #93531 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93531/testReport)** for PR 20949 at commit [`025958a`](https://github.com/apache/spark/commit/025958a7d9e8a741875db2af8878f60cb07409d3).
[GitHub] spark issue #21818: [SPARK-24860][SQL] Support setting of partitionOverWrite...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21818 **[Test build #93530 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93530/testReport)** for PR 21818 at commit [`a4ebf9d`](https://github.com/apache/spark/commit/a4ebf9dc21a44f784a2c1d884c2b396c95c664f0).
[GitHub] spark issue #21818: [SPARK-24860][SQL] Support setting of partitionOverWrite...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21818 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1298/ Test PASSed.
[GitHub] spark issue #21818: [SPARK-24860][SQL] Support setting of partitionOverWrite...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21818 Merged build finished. Test PASSed.
[GitHub] spark pull request #21866: [SPARK-24768][FollowUp][SQL]Avro migration follow...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21866#discussion_r204967223 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala --- @@ -56,7 +56,7 @@ private[avro] class AvroFileFormat extends FileFormat with DataSourceRegister { spark: SparkSession, options: Map[String, String], files: Seq[FileStatus]): Option[StructType] = { -val conf = spark.sparkContext.hadoopConfiguration +val conf = spark.sessionState.newHadoopConf() --- End diff -- @gengliangwang Could you please help with this?
[GitHub] spark pull request #21851: [SPARK-24891][SQL] Fix HandleNullInputsForUDF rul...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21851
[GitHub] spark issue #21818: [SPARK-24860][SQL] Support setting of partitionOverWrite...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21818 retest this please
[GitHub] spark issue #20949: [SPARK-19018][SQL] Add support for custom encoding on cs...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20949 retest this please
[GitHub] spark issue #21851: [SPARK-24891][SQL] Fix HandleNullInputsForUDF rule
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21851 LGTM Thanks! Merged to master
[GitHub] spark issue #21542: [SPARK-24529][Build][test-maven] Add spotbugs into maven...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21542 This was reverted in favour of https://github.com/apache/spark/pull/21865 and SPARK-24895 for now.
[GitHub] spark issue #21865: [SPARK-24895] Remove spotbugs plugin
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21865 Thank you all. I couldn't foresee this problem.
[GitHub] spark issue #21868: [SPARK-24906][SQL] Adaptively enlarge split / partition ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21868 Can one of the admins verify this patch?
[GitHub] spark pull request #21868: [SPARK-24906][SQL] Adaptively enlarge split / par...
GitHub user habren opened a pull request: https://github.com/apache/spark/pull/21868 [SPARK-24906][SQL] Adaptively enlarge split / partition size for Parq… Please refer to https://issues.apache.org/jira/browse/SPARK-24906 for more details and tests. For a columnar file format such as Parquet, when Spark SQL reads a table, each split is 128 MB by default because spark.sql.files.maxPartitionBytes defaults to 128MB. Even when the user sets it to a larger value, such as 512MB, a task may read only a few MB or even hundreds of KB, because the table (Parquet) may consist of dozens of columns while the SQL query needs only a few of them, and Spark prunes the unnecessary columns. In this case, Spark's DataSourceScanExec can enlarge maxPartitionBytes adaptively. For example, suppose there are 40 columns, 20 of integer type and 20 of long type. When a query reads one integer column and one long column, maxPartitionBytes should be 20 times larger: (20*4+20*8) / (4+8) = 20. With this optimization, the number of tasks is smaller and the job runs faster. More importantly, for a very large cluster (more than 10 thousand nodes), it relieves the RM's scheduling pressure. Here is the test. The table named test2 has more than 40 columns and receives more than 5 TB of data each hour. When we issue a very simple query `select count(device_id) from test2 where date=20180708 and hour='23'`, there are 72176 tasks and the job takes 4.8 minutes. Most tasks last less than 1 second and read less than 1.5 MB of data. After the optimization, there are only 1615 tasks and the job takes only 30 seconds, almost 10 times faster. The median amount of data read is 44.2MB.
https://issues.apache.org/jira/browse/SPARK-24906 You can merge this pull request into a Git repository by running: $ git pull https://github.com/habren/spark SPARK-24906 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21868.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21868 commit 9ff34525e346e6e1cbe4b12fc6f972a163fd920e Author: 郭俊 Date: 2018-07-25T02:07:38Z [SPARK-24906][SQL] Adaptively enlarge split / partition size for Parquet scan
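The arithmetic in the pull request description can be sketched as follows. This is only an illustration of the ratio (20*4+20*8) / (4+8), not the code from the patch; the object name `SplitEnlargement`, the function `enlargementFactor`, and the use of fixed per-type byte widths (4 for integer, 8 for long) are assumptions made for this example.

```scala
object SplitEnlargement {
  // Hypothetical helper: ratio of the total row width to the width of the
  // columns the query actually reads, using fixed byte widths per column.
  def enlargementFactor(allColumnWidths: Seq[Int], selectedColumnWidths: Seq[Int]): Double =
    allColumnWidths.sum.toDouble / selectedColumnWidths.sum

  def main(args: Array[String]): Unit = {
    // 20 integer columns (4 bytes each) and 20 long columns (8 bytes each).
    val allColumns = Seq.fill(20)(4) ++ Seq.fill(20)(8)
    // The query reads one integer column and one long column.
    val selected = Seq(4, 8)

    val factor = enlargementFactor(allColumns, selected)
    val defaultMaxPartitionBytes = 128L * 1024 * 1024 // 128MB default from the description

    println(s"enlargement factor: $factor")
    println(s"enlarged maxPartitionBytes: ${(defaultMaxPartitionBytes * factor).toLong}")
  }
}
```

With the numbers from the description the factor is 240 / 12 = 20, so a 128MB split would grow to 20 times that size under this simplified model.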
[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21320 gentle ping @mallman since the code freeze is close
[GitHub] spark issue #21863: [SPARK-18874][SQL][FOLLOW-UP] Improvement type mismatche...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21863 **[Test build #93529 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93529/testReport)** for PR 21863 at commit [`e55d700`](https://github.com/apache/spark/commit/e55d7007fe7932f527347250f483f54dfde07355).
[GitHub] spark issue #21863: [SPARK-18874][SQL][FOLLOW-UP] Improvement type mismatche...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21863 Merged build finished. Test PASSed.
[GitHub] spark issue #21863: [SPARK-18874][SQL][FOLLOW-UP] Improvement type mismatche...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21863 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1297/ Test PASSed.
[GitHub] spark pull request #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream forma...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21546#discussion_r204961977 --- Diff: python/pyspark/serializers.py --- @@ -184,27 +184,67 @@ def loads(self, obj): raise NotImplementedError -class ArrowSerializer(FramedSerializer): +class BatchOrderSerializer(Serializer): --- End diff -- Since you verified that the performance difference is trivial, I don't think it's a hard requirement to merge this, though. At least, I would just push this in.
[GitHub] spark pull request #21866: [SPARK-24768][FollowUp][SQL]Avro migration follow...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21866#discussion_r204961291 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala --- @@ -56,7 +56,7 @@ private[avro] class AvroFileFormat extends FileFormat with DataSourceRegister { spark: SparkSession, options: Map[String, String], files: Seq[FileStatus]): Option[StructType] = { -val conf = spark.sparkContext.hadoopConfiguration +val conf = spark.sessionState.newHadoopConf() --- End diff -- can we add a linter rule for this?
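One possible shape for such a check, as a rough sketch only: a plain textual match on assignments of the form `= spark.sparkContext.hadoopConfiguration`. The object `HadoopConfLint` and its `violations` function are hypothetical names for this illustration; this is not an existing Scalastyle rule or anything taken from the pull request.

```scala
object HadoopConfLint {
  // Assumed pattern: an assignment whose right-hand side is
  // spark.sparkContext.hadoopConfiguration (the form removed in the diff).
  private val Forbidden = """=\s*spark\.sparkContext\.hadoopConfiguration\b""".r

  // Returns the 1-based line numbers on which the forbidden pattern appears.
  def violations(source: String): Seq[Int] =
    source.split("\n").toSeq.zipWithIndex.collect {
      case (line, index) if Forbidden.findFirstIn(line).isDefined => index + 1
    }

  def main(args: Array[String]): Unit = {
    val sample =
      "val conf = spark.sparkContext.hadoopConfiguration\n" +
      "val better = spark.sessionState.newHadoopConf()"
    println(s"flagged lines: ${violations(sample).mkString(", ")}")
  }
}
```

A narrow pattern like this only catches the direct assignment form; other uses of `sparkContext.hadoopConfiguration` would need a broader rule and would likely produce many more matches.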
[GitHub] spark pull request #21546: [SPARK-23030][SQL][PYTHON] Use Arrow stream forma...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21546#discussion_r204960838 --- Diff: python/pyspark/serializers.py --- @@ -184,27 +184,67 @@ def loads(self, obj): raise NotImplementedError -class ArrowSerializer(FramedSerializer): +class BatchOrderSerializer(Serializer): --- End diff -- Ah, okay. I think I understand the benefit, but my impression is that this is something we were already doing. Also, if this is something we could apply to other functionality too, then it sounds to me like somewhat orthogonal work that should be done separately. Another concern is, for example, how likely we are to hit this OOM, because I usually expect the data for createDataFrame from a Pandas DataFrame, or for toPandas, to be small. If the changes were small, it would have been okay with me, but these are fairly large changes that look like they affect a lot of code from the Scala side to the Python side.
[GitHub] spark pull request #21834: [SPARK-22814][SQL] Support Date/Timestamp in a JD...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21834#discussion_r204960461 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala --- @@ -1341,6 +1352,70 @@ class JDBCSuite extends QueryTest checkAnswer( sql("select name, theid from queryOption"), Row("fred", 1) :: Nil) + } + + test("SPARK-22814 support date/timestamp types in partitionColumn") { +val expectedResult = Seq( + ("2018-07-06", "2018-07-06 05:50:00.0"), + ("2018-07-06", "2018-07-06 08:10:08.0"), + ("2018-07-08", "2018-07-08 13:32:01.0"), + ("2018-07-12", "2018-07-12 09:51:15.0") +).map { case (date, timestamp) => + Row(Date.valueOf(date), Timestamp.valueOf(timestamp)) +} + +// DataType partition column +val df1 = spark.read.format("jdbc") + .option("url", urlWithUserAndPass) + .option("dbtable", "TEST.DATETIME") + .option("partitionColumn", "d") + .option("lowerBound", "2018-07-06") + .option("upperBound", "2018-07-20") + .option("numPartitions", 3) + .load() +df1.logicalPlan match { + case LogicalRelation(JDBCRelation(_, parts, _), _, _, _) => +val whereClauses = parts.map(_.asInstanceOf[JDBCPartition].whereClause).toSet +assert(whereClauses === Set( + """"D" < '2018-07-10' or "D" is null""", + """"D" >= '2018-07-10' AND "D" < '2018-07-14'""", + """"D" >= '2018-07-14'""")) +} +checkAnswer(df1, expectedResult) + +// TimestampType partition column +val df2 = spark.read.format("jdbc") + .option("url", urlWithUserAndPass) + .option("dbtable", "TEST.DATETIME") + .option("partitionColumn", "t") + .option("lowerBound", "2018-07-04 03:30:00.0") + .option("upperBound", "2018-07-27 14:11:05.0") + .option("numPartitions", 2) + .load() + +df2.logicalPlan match { + case LogicalRelation(JDBCRelation(_, parts, _), _, _, _) => +val whereClauses = parts.map(_.asInstanceOf[JDBCPartition].whereClause).toSet +assert(whereClauses === Set( + """"T" < '2018-07-15 20:50:32.5' or "T" is null""", --- End diff -- ok
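The date boundaries expected by the test above (`2018-07-10` and `2018-07-14` for bounds `2018-07-06` to `2018-07-20` with 3 partitions) are consistent with an even stride of whole days between the bounds. The following is a simplified, self-contained sketch of that idea, not Spark's actual partitioning code; `DatePartitionBounds` and `whereClauses` are hypothetical names, and the integer-division stride is an assumption that happens to reproduce the values in the test.

```scala
import java.time.LocalDate
import java.time.temporal.ChronoUnit

object DatePartitionBounds {
  // Simplified model: split [lower, upper) into numPartitions strides of whole
  // days and build one WHERE clause per partition. The first partition also
  // takes NULLs and the last one is open-ended, as in the clauses in the test.
  def whereClauses(
      column: String,
      lower: LocalDate,
      upper: LocalDate,
      numPartitions: Int): Seq[String] = {
    require(numPartitions >= 1, "numPartitions must be positive")
    val stride = ChronoUnit.DAYS.between(lower, upper) / numPartitions
    val boundaries = (1 until numPartitions).map(i => lower.plusDays(stride * i))
    (0 until numPartitions).map { i =>
      val lowerClause = if (i == 0) None else Some(s"$column >= '${boundaries(i - 1)}'")
      val upperClause =
        if (i == numPartitions - 1) None else Some(s"$column < '${boundaries(i)}'")
      (lowerClause, upperClause) match {
        case (None, Some(u)) => s"$u or $column is null"
        case (Some(l), Some(u)) => s"$l AND $u"
        case (Some(l), None) => l
        case (None, None) => "" // single partition: no filter in this sketch
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val clauses = whereClauses(
      "\"D\"", LocalDate.parse("2018-07-06"), LocalDate.parse("2018-07-20"), 3)
    clauses.foreach(println)
  }
}
```

Here the stride is 14 days / 3 = 4 days under integer division, which gives the two boundaries seen in the assertions. The timestamp case in the test follows the same pattern with a finer time unit.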
[GitHub] spark pull request #21858: [SPARK-24899][SQL][DOC] Add example of monotonica...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21858#discussion_r204959753 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -1150,16 +1150,48 @@ object functions { /** * A column expression that generates monotonically increasing 64-bit integers. * - * The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. + * The generated IDs are guaranteed to be monotonically increasing and unique, but not + * consecutive (unless all rows are in the same single partition which you rarely want due to + * the volume of the data). * The current implementation puts the partition ID in the upper 31 bits, and the record number * within each partition in the lower 33 bits. The assumption is that the data frame has * less than 1 billion partitions, and each partition has less than 8 billion records. * - * As an example, consider a `DataFrame` with two partitions, each with 3 records. - * This expression would return the following IDs: - * * {{{ - * 0, 1, 2, 8589934592 (1L << 33), 8589934593, 8589934594. + * // Create a dataset with four partitions, each with two rows. + * val q = spark.range(start = 0, end = 8, step = 1, numPartitions = 4) + * + * // Make sure that every partition has the same number of rows + * q.mapPartitions(rows => Iterator(rows.size)).foreachPartition(rows => assert(rows.next == 2)) + * q.select(monotonically_increasing_id).show --- End diff -- eh @jaceklaskowski, wouldn't this one be enough as an example?
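The bit layout described in the Scaladoc above (partition ID in the upper 31 bits, record number in the lower 33 bits) can be illustrated without a Spark session. This is a minimal sketch of the layout only, not the implementation behind `monotonically_increasing_id`; `MonotonicIdLayout` and `idFor` are names made up for this example.

```scala
object MonotonicIdLayout {
  // Partition ID in the upper bits, record number within the partition in the
  // lower 33 bits, as described in the Scaladoc quoted in the diff.
  def idFor(partitionId: Int, recordNumber: Long): Long =
    (partitionId.toLong << 33) + recordNumber

  def main(args: Array[String]): Unit = {
    // Two partitions with three records each, the example from the removed
    // Scaladoc lines.
    val ids = for (partition <- 0 until 2; record <- 0 until 3) yield idFor(partition, record)
    println(ids.mkString(", "))
    // prints: 0, 1, 2, 8589934592, 8589934593, 8589934594
  }
}
```

The jump from 2 to 8589934592 (1L << 33) is why the IDs are unique and increasing but not consecutive across partitions.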
[GitHub] spark pull request #21867: [SPARK-24307][CORE] Add conf to revert to old cod...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21867#discussion_r204959300 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -731,7 +731,14 @@ private[spark] class BlockManager( } if (data != null) { -return Some(ChunkedByteBuffer.fromManagedBuffer(data, chunkSize)) +// SPARK-24307 undocumented "escape-hatch" in case there are any issues in converting to +// to ChunkedByteBuffer, to go back to old code-path. Can be removed post Spark 2.4 if +// new path is stable. +if (conf.getBoolean("spark.fetchToNioBuffer", false)) { --- End diff -- Can we have a better prefix, rather than just `spark.`?
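The escape-hatch pattern in this diff, a boolean setting that defaults to the new code path and can switch back to the old one, might look roughly like the sketch below. `SimpleConf` is a stand-in for a real configuration object, the returned path labels are invented for illustration, and the assumption that `true` selects the old NIO-buffer path is inferred from the key name and the code comment, since the body of the `if` is not shown in the diff.

```scala
object FetchPathSelector {
  // Minimal stand-in for a configuration object with a getBoolean accessor.
  final case class SimpleConf(settings: Map[String, String]) {
    def getBoolean(key: String, default: Boolean): Boolean =
      settings.get(key).map(_.toBoolean).getOrElse(default)
  }

  // Key as it appears in the diff under review; the review comment asks for a
  // more specific prefix, so the final name may differ.
  val EscapeHatchKey = "spark.fetchToNioBuffer"

  // Assumption: the flag defaults to false (new path) and true re-enables the
  // old path. The labels are illustrative only.
  def chooseFetchPath(conf: SimpleConf): String =
    if (conf.getBoolean(EscapeHatchKey, false)) "old-nio-buffer-path"
    else "new-chunked-byte-buffer-path"

  def main(args: Array[String]): Unit = {
    println(chooseFetchPath(SimpleConf(Map.empty)))
    println(chooseFetchPath(SimpleConf(Map(EscapeHatchKey -> "true"))))
  }
}
```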
[GitHub] spark issue #21867: [SPARK-24307][CORE] Add conf to revert to old code.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21867 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93520/ Test PASSed.
[GitHub] spark issue #21866: [SPARK-24768][FollowUp][SQL]Avro migration followup: cha...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21866 **[Test build #93528 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93528/testReport)** for PR 21866 at commit [`cff6f2a`](https://github.com/apache/spark/commit/cff6f2a0459e8cc4e48f28bde8103ea44ce5a1ab).
[GitHub] spark issue #21867: [SPARK-24307][CORE] Add conf to revert to old code.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21867 Merged build finished. Test PASSed.
[GitHub] spark issue #21867: [SPARK-24307][CORE] Add conf to revert to old code.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21867 **[Test build #93520 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93520/testReport)** for PR 21867 at commit [`bc2ea46`](https://github.com/apache/spark/commit/bc2ea46b291fe2aea6b9d254dc0fdb4e81f90ebd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21866: [SPARK-24768][FollowUp][SQL]Avro migration followup: cha...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21866 Merged build finished. Test PASSed.
[GitHub] spark issue #21866: [SPARK-24768][FollowUp][SQL]Avro migration followup: cha...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21866 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1296/ Test PASSed.
[GitHub] spark issue #21866: [SPARK-24768][FollowUp][SQL]Avro migration followup: cha...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21866 retest this please
[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r204957474 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -751,7 +751,8 @@ object TypeCoercion { */ case class ConcatCoercion(conf: SQLConf) extends TypeCoercionRule { -override protected def coerceTypes(plan: LogicalPlan): LogicalPlan = plan transform { case p => +override protected def coerceTypes( + plan: LogicalPlan): LogicalPlan = plan resolveOperatorsDown { case p => --- End diff -- I'm using a weird wrapping here to minimize the diff.
[GitHub] spark pull request #21474: [SPARK-24297][CORE] Fetch-to-disk by default for ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21474
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21822 **[Test build #93527 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93527/testReport)** for PR 21822 at commit [`f2f1a97`](https://github.com/apache/spark/commit/f2f1a97e447a41e8b9b6c094376d32b32af00991).
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21822 Merged build finished. Test PASSed.
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21822 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1295/ Test PASSed.
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21822 I changed the way we do the checks in test to use a thread local rather than checking the stacktrace, so they should run faster now. Also added test cases for the various new methods. Also moved the relevant code into AnalysisHelper for better code structure. This should be ready now if tests pass.
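The general idea of replacing a stack-trace inspection with a thread-local flag can be sketched as below. This is an illustration of the technique under assumed names (`AnalyzerGuard`, `markInAnalyzer`, `isInAnalyzer`, `assertInAnalyzer`); it is not the code added to AnalysisHelper in the pull request.

```scala
object AnalyzerGuard {
  // Per-thread flag that is true only while the current thread is running
  // inside the guarded region. Reading it is a cheap lookup, unlike walking
  // the stack trace.
  private val inAnalyzer = new ThreadLocal[Boolean] {
    override def initialValue(): Boolean = false
  }

  // Runs body with the flag set, restoring the previous value afterwards so
  // that nested calls and exceptions leave the flag consistent.
  def markInAnalyzer[T](body: => T): T = {
    val previous = inAnalyzer.get()
    inAnalyzer.set(true)
    try body finally inAnalyzer.set(previous)
  }

  def isInAnalyzer: Boolean = inAnalyzer.get()

  // A test-only style check: fail fast when called outside the guarded region.
  def assertInAnalyzer(): Unit =
    if (!isInAnalyzer) {
      throw new IllegalStateException("called outside of the analyzer")
    }

  def main(args: Array[String]): Unit = {
    println(s"outside: $isInAnalyzer")
    markInAnalyzer {
      assertInAnalyzer()
      println(s"inside: $isInAnalyzer")
    }
  }
}
```

Restoring the previous value in a `finally` block, rather than resetting to `false`, keeps the flag correct when guarded regions are nested.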
[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21822 **[Test build #93526 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93526/testReport)** for PR 21822 at commit [`75fb114`](https://github.com/apache/spark/commit/75fb1145fa84725fb931752906ece34ae6028290).
[GitHub] spark issue #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` when the...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21850 **[Test build #93525 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93525/testReport)** for PR 21850 at commit [`59fada7`](https://github.com/apache/spark/commit/59fada75fb59b1c3dabdac0a5d22b35c8f139a44).
[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r204955869 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -787,6 +782,7 @@ class Analyzer( right case Some((oldRelation, newRelation)) => val attributeRewrites = AttributeMap(oldRelation.output.zip(newRelation.output)) + // TODO(rxin): Why do we need transformUp here? --- End diff -- @cloud-fan ?
[GitHub] spark issue #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` when the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21850 Merged build finished. Test PASSed.
[GitHub] spark issue #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` when the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21850 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1294/ Test PASSed.
[GitHub] spark issue #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` when the...
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/21850 retest this please