[GitHub] [spark] AmplabJenkins commented on issue #25743: [SPARK-29036][SQL]SparkThriftServer cancel job after execute() thread interrupted
AmplabJenkins commented on issue #25743: [SPARK-29036][SQL]SparkThriftServer cancel job after execute() thread interrupted URL: https://github.com/apache/spark/pull/25743#issuecomment-530889195 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/110519/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #25743: [SPARK-29036][SQL]SparkThriftServer cancel job after execute() thread interrupted
AmplabJenkins commented on issue #25743: [SPARK-29036][SQL]SparkThriftServer cancel job after execute() thread interrupted URL: https://github.com/apache/spark/pull/25743#issuecomment-530889190 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #25743: [SPARK-29036][SQL]SparkThriftServer cancel job after execute() thread interrupted
SparkQA removed a comment on issue #25743: [SPARK-29036][SQL]SparkThriftServer cancel job after execute() thread interrupted URL: https://github.com/apache/spark/pull/25743#issuecomment-530878720 **[Test build #110519 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110519/testReport)** for PR 25743 at commit [`202f5ee`](https://github.com/apache/spark/commit/202f5eef963820af574bcdfad62da4e00255d8ba).
[GitHub] [spark] SparkQA commented on issue #25743: [SPARK-29036][SQL]SparkThriftServer cancel job after execute() thread interrupted
SparkQA commented on issue #25743: [SPARK-29036][SQL]SparkThriftServer cancel job after execute() thread interrupted URL: https://github.com/apache/spark/pull/25743#issuecomment-530889036 **[Test build #110519 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110519/testReport)** for PR 25743 at commit [`202f5ee`](https://github.com/apache/spark/commit/202f5eef963820af574bcdfad62da4e00255d8ba).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] dilipbiswal commented on issue #25554: [SPARK-28796][DOC]Document DROP DATABASE statement in SQL Reference
dilipbiswal commented on issue #25554: [SPARK-28796][DOC]Document DROP DATABASE statement in SQL Reference URL: https://github.com/apache/spark/pull/25554#issuecomment-530887863 LGTM
[GitHub] [spark] cloud-fan commented on a change in pull request #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen
cloud-fan commented on a change in pull request #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen URL: https://github.com/apache/spark/pull/25766#discussion_r323816539
## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/debug/DebuggingSuite.scala
##
@@ -90,4 +90,30 @@ class DebuggingSuite extends SharedSparkSession {
       | id LongType: {}
       |""".stripMargin))
   }
+
+  test("Prints bytecode statistics in debugCodegen") {
+    Seq(("SELECT sum(v) FROM VALUES(1) t(v)", (0, 0)),
+      // We expect HashAggregate uses an inner class for fast hash maps
+      // in partial aggregates with keys.
Review comment: I'd like to avoid end-to-end tests in this case. It's highly coupled with how we codegen these operators and is easy to break if we change the implementation in the future. Can we add some UT that calls `CodeGenerator.compile` directly?
[GitHub] [spark] cloud-fan commented on a change in pull request #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen
cloud-fan commented on a change in pull request #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen URL: https://github.com/apache/spark/pull/25766#discussion_r323814030
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
##
@@ -1336,11 +1347,13 @@ object CodeGenerator extends Logging {
     val codeAttr = Utils.classForName("org.codehaus.janino.util.ClassFile$CodeAttribute")
     val codeAttrField = codeAttr.getDeclaredField("code")
     codeAttrField.setAccessible(true)
-    val codeSizes = classes.flatMap { case (_, classBytes) =>
-      CodegenMetrics.METRIC_GENERATED_CLASS_BYTECODE_SIZE.update(classBytes.length)
+    val codeStats = classes.map { case (_, classBytes) =>
Review comment: I would like to make the code more readable, by
```
val (classSizes, maxMethodSizes, constPoolSize) = classes.mapunzip3
ByteCodeStats(
  maxClassCodeSize = classSizes.max,
  maxMethodCodeSize = maxMethodSizes.max,
  maxConstPoolSize = constPoolSize.max,
  // Minus 2 for `GeneratedClass` and an outer-most generated class
  numInnerClasses = classSizes.size - 2)
```
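The `unzip3` suggestion in the review comment above can be tried outside Spark with a minimal sketch. Everything here is an assumption made for illustration: the `ByteCodeStats` case class is a hypothetical stand-in with the four fields named in the comment, `summarize` is a hypothetical helper, and the per-class triples are taken as already computed rather than extracted from class bytes.

```scala
// Hypothetical stand-in for the ByteCodeStats class named in the review comment.
final case class ByteCodeStats(
    maxClassCodeSize: Int,
    maxMethodCodeSize: Int,
    maxConstPoolSize: Int,
    numInnerClasses: Int)

object ByteCodeStatsSketch {
  // Each triple is (class size, max method size, constant pool size) for one
  // compiled class. How these are extracted is out of scope for this sketch.
  def summarize(perClass: Seq[(Int, Int, Int)]): ByteCodeStats = {
    require(perClass.nonEmpty, "at least one compiled class is expected")
    // unzip3 splits a sequence of triples into three parallel sequences.
    val (classSizes, maxMethodSizes, constPoolSizes) = perClass.unzip3
    ByteCodeStats(
      maxClassCodeSize = classSizes.max,
      maxMethodCodeSize = maxMethodSizes.max,
      maxConstPoolSize = constPoolSizes.max,
      // Minus 2 for `GeneratedClass` and an outer-most generated class,
      // as stated in the review comment.
      numInnerClasses = classSizes.size - 2)
  }
}
```

Splitting the triples once with `unzip3` keeps each statistic a one-line `max` over its own sequence, which is the readability gain the comment asks for.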
[GitHub] [spark] cloud-fan commented on issue #25751: [SPARK-29042][Core] Sampling-based RDD with unordered input should be INDETERMINATE
cloud-fan commented on issue #25751: [SPARK-29042][Core] Sampling-based RDD with unordered input should be INDETERMINATE URL: https://github.com/apache/spark/pull/25751#issuecomment-530882523 make sense, LGTM
[GitHub] [spark] cloud-fan commented on a change in pull request #25751: [SPARK-29042][Core] Sampling-based RDD with unordered input should be INDETERMINATE
cloud-fan commented on a change in pull request #25751: [SPARK-29042][Core] Sampling-based RDD with unordered input should be INDETERMINATE URL: https://github.com/apache/spark/pull/25751#discussion_r323810510
## File path: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
##
@@ -2779,6 +2779,45 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi
       .contains("Spark cannot rollback the ShuffleMapStage 1"))
   }
+  test("SPARK-29042: Sampled RDD with unordered input should be indeterminate") {
+    val shuffleMapRdd1 = new MyRDD(sc, 2, Nil, indeterminate = false)
+
+    val shuffleDep1 = new ShuffleDependency(shuffleMapRdd1, new HashPartitioner(2))
+    val shuffleId1 = shuffleDep1.shuffleId
+    val shuffleMapRdd2 = new MyRDD(sc, 2, List(shuffleDep1), tracker = mapOutputTracker)
+
+    assert(shuffleMapRdd2.outputDeterministicLevel == DeterministicLevel.UNORDERED)
+
+    val sampledRdd = shuffleMapRdd2.sample(true, 0.3, 1000L)
+    assert(sampledRdd.outputDeterministicLevel == DeterministicLevel.INDETERMINATE)
Review comment: I think we can stop here. We have enough test coverage for test rerun when the RDD is INDETERMINATE. We just need to prove that the sampled RDD with unordered input is INDETERMINATE.
[GitHub] [spark] alaazbair edited a comment on issue #25682: [SPARK-28842][DOC]Cleanup the formatting/trailing spaces in the K8s integration testing guide
alaazbair edited a comment on issue #25682: [SPARK-28842][DOC]Cleanup the formatting/trailing spaces in the K8s integration testing guide URL: https://github.com/apache/spark/pull/25682#issuecomment-528546021 @holdenk Could you please review my PR.
[GitHub] [spark] cloud-fan commented on a change in pull request #25651: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider
cloud-fan commented on a change in pull request #25651: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider URL: https://github.com/apache/spark/pull/25651#discussion_r323808715
## File path: external/avro/src/main/scala/org/apache/spark/sql/v2/avro/AvroDataSourceV2.scala
##
@@ -35,7 +36,10 @@ class AvroDataSourceV2 extends FileDataSourceV2 {
     AvroTable(tableName, sparkSession, options, paths, None, fallbackFileFormat)
   }
-  override def getTable(options: CaseInsensitiveStringMap, schema: StructType): Table = {
+  override def getTable(
+      options: CaseInsensitiveStringMap,
+      schema: StructType,
+      partitions: Array[Transform]): Table = {
Review comment: Or we can make table properties case insensitive.
[GitHub] [spark] cloud-fan commented on a change in pull request #25651: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider
cloud-fan commented on a change in pull request #25651: [SPARK-28948][SQL] Support passing all Table metadata in TableProvider URL: https://github.com/apache/spark/pull/25651#discussion_r323808342
## File path: external/avro/src/main/scala/org/apache/spark/sql/v2/avro/AvroDataSourceV2.scala
##
@@ -35,7 +36,10 @@ class AvroDataSourceV2 extends FileDataSourceV2 {
     AvroTable(tableName, sparkSession, options, paths, None, fallbackFileFormat)
   }
-  override def getTable(options: CaseInsensitiveStringMap, schema: StructType): Table = {
+  override def getTable(
+      options: CaseInsensitiveStringMap,
+      schema: StructType,
+      partitions: Array[Transform]): Table = {
Review comment: But we do have a problem here. Table properties are case sensitive while scan options are case insensitive. Think about 2 cases:
1. `spark.read.format("myFormat").options(...).schema(...).load()`. We need to get the table with the user-specified options and schema. When scanning the table, we need to use the user-specified options as scan options. The problem is, `DataFrameReader.options` specifies both table properties and scan options in this case.
2. `CREATE TABLE t USING myFormat TABLEPROP ...` and then `spark.read.options(...).table("t")`. In this case, `DataFrameReader.options` only specifies scan options.
Ideally, `TableProvider.getTable` takes table properties which should be case sensitive. However, `DataFrameReader.options` also specifies scan options which should be case insensitive. I don't have a good idea now. Maybe it's OK to treat this as a special table which accepts case insensitive table properties.
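The mismatch described in the comment above can be illustrated with a minimal, Spark-free sketch. `CaseInsensitiveOptions` is a hypothetical, simplified stand-in written for this example (it is not Spark's `CaseInsensitiveStringMap`), and the keys and paths are made up.

```scala
import java.util.Locale

// Hypothetical, simplified case-insensitive view over string options.
// Keys are lower-cased on construction and on lookup.
final class CaseInsensitiveOptions(original: Map[String, String]) {
  private val lowered: Map[String, String] =
    original.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }

  def get(key: String): Option[String] =
    lowered.get(key.toLowerCase(Locale.ROOT))

  def size: Int = lowered.size
}

object OptionsSketch {
  // Case-sensitive table properties: "Path" and "path" are distinct keys.
  val tableProps: Map[String, String] =
    Map("Path" -> "/data/a", "path" -> "/data/b")

  // The same entries seen as case-insensitive scan options: the two keys
  // collapse into one, so one of the two values is silently dropped.
  val scanOptions = new CaseInsensitiveOptions(tableProps)
}
```

Here `OptionsSketch.tableProps` holds two entries while `OptionsSketch.scanOptions` holds one, which is why a single `options` argument cannot faithfully carry both kinds of setting unless table properties are also treated as case insensitive.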
[GitHub] [spark] viirya commented on issue #25751: [SPARK-29042][Core] Sampling-based RDD with unordered input should be INDETERMINATE
viirya commented on issue #25751: [SPARK-29042][Core] Sampling-based RDD with unordered input should be INDETERMINATE URL: https://github.com/apache/spark/pull/25751#issuecomment-530879739 It is a problem in ML applications. In ML, sample is used to prepare training data. The ML algorithm fits the model based on the sampled data. If rerun sample tasks produce different output during model fitting, the ML results will be unreliable and buggy. Each sample is random, but once the data has been sampled, the output should be determinate.
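Why a seeded sample over unordered input is not repeatable can be seen in a small sketch using plain Scala collections. This is a Bernoulli-style toy written for this example, not Spark's sampler; the names `sample` and `keptPositions` are hypothetical.

```scala
import scala.util.Random

object SampleOrderSketch {
  // Keep an element when the next pseudo-random draw is below `fraction`.
  // Draws are consumed in input order, so a fixed seed selects positions,
  // not values.
  def sample[A](input: Seq[A], fraction: Double, seed: Long): Seq[A] = {
    val rng = new Random(seed)
    input.filter(_ => rng.nextDouble() < fraction)
  }

  // The positions that a given seed selects for an input of length n.
  def keptPositions(n: Int, fraction: Double, seed: Long): Seq[Int] =
    sample(0 until n, fraction, seed)
}
```

With the same seed and the same input order, `sample` returns the same elements on every run. If a rerun task sees the same rows in a different order, the same positions are kept but they now hold different rows, so the sampled data can change between attempts.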
[GitHub] [spark] SparkQA commented on issue #25743: [SPARK-29036][SQL]SparkThriftServer cancel job after execute() thread interrupted
SparkQA commented on issue #25743: [SPARK-29036][SQL]SparkThriftServer cancel job after execute() thread interrupted URL: https://github.com/apache/spark/pull/25743#issuecomment-530878720 **[Test build #110519 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110519/testReport)** for PR 25743 at commit [`202f5ee`](https://github.com/apache/spark/commit/202f5eef963820af574bcdfad62da4e00255d8ba).
[GitHub] [spark] AmplabJenkins commented on issue #25743: [SPARK-29036][SQL]SparkThriftServer cancel job after execute() thread interrupted
AmplabJenkins commented on issue #25743: [SPARK-29036][SQL]SparkThriftServer cancel job after execute() thread interrupted URL: https://github.com/apache/spark/pull/25743#issuecomment-530877843 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/15494/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25743: [SPARK-29036][SQL]SparkThriftServer cancel job after execute() thread interrupted
AmplabJenkins removed a comment on issue #25743: [SPARK-29036][SQL]SparkThriftServer cancel job after execute() thread interrupted URL: https://github.com/apache/spark/pull/25743#issuecomment-530877843 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/15494/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25743: [SPARK-29036][SQL]SparkThriftServer cancel job after execute() thread interrupted
AmplabJenkins commented on issue #25743: [SPARK-29036][SQL]SparkThriftServer cancel job after execute() thread interrupted URL: https://github.com/apache/spark/pull/25743#issuecomment-530877832 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25743: [SPARK-29036][SQL]SparkThriftServer cancel job after execute() thread interrupted
AmplabJenkins removed a comment on issue #25743: [SPARK-29036][SQL]SparkThriftServer cancel job after execute() thread interrupted URL: https://github.com/apache/spark/pull/25743#issuecomment-530877832 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen
AmplabJenkins removed a comment on issue #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen URL: https://github.com/apache/spark/pull/25766#issuecomment-530870831 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/110515/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen
AmplabJenkins commented on issue #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen URL: https://github.com/apache/spark/pull/25766#issuecomment-530870821 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen
AmplabJenkins removed a comment on issue #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen URL: https://github.com/apache/spark/pull/25766#issuecomment-530870821 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen
SparkQA removed a comment on issue #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen URL: https://github.com/apache/spark/pull/25766#issuecomment-530801110 **[Test build #110515 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110515/testReport)** for PR 25766 at commit [`fa4234c`](https://github.com/apache/spark/commit/fa4234c0cbdb8aaeb1360d7565f6db5eebe87f30).
[GitHub] [spark] AmplabJenkins commented on issue #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen
AmplabJenkins commented on issue #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen URL: https://github.com/apache/spark/pull/25766#issuecomment-530870831 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/110515/ Test FAILed.
[GitHub] [spark] SparkQA commented on issue #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen
SparkQA commented on issue #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen URL: https://github.com/apache/spark/pull/25766#issuecomment-530870441 **[Test build #110515 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110515/testReport)** for PR 25766 at commit [`fa4234c`](https://github.com/apache/spark/commit/fa4234c0cbdb8aaeb1360d7565f6db5eebe87f30).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `case class ByteCodeStats(`
   * ` * Returns the bytecode statistics (max class bytecode size, max method bytecode size,`
[GitHub] [spark] mgaido91 commented on a change in pull request #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen
mgaido91 commented on a change in pull request #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen URL: https://github.com/apache/spark/pull/25766#discussion_r323786212
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala
##
@@ -81,11 +82,14 @@ package object debug {
   def writeCodegen(append: String => Unit, plan: SparkPlan): Unit = {
     val codegenSeq = codegenStringSeq(plan)
     append(s"Found ${codegenSeq.size} WholeStageCodegen subtrees.\n")
-    for (((subtree, code), i) <- codegenSeq.zipWithIndex) {
-      append(s"== Subtree ${i + 1} / ${codegenSeq.size} ==\n")
+    for (((subtree, code, codeStats), i) <- codegenSeq.zipWithIndex) {
+      val codeStatsStr = s"maxClassCodeSize:${codeStats.maxClassCodeSize} " +
Review comment: thank you
[GitHub] [spark] kiszk commented on issue #25022: [SPARK-24695][SQL] Move `CalendarInterval` to org.apache.spark.sql.types package
kiszk commented on issue #25022: [SPARK-24695][SQL] Move `CalendarInterval` to org.apache.spark.sql.types package URL: https://github.com/apache/spark/pull/25022#issuecomment-530862520 ping @priyankagargnitk
[GitHub] [spark] cloud-fan commented on a change in pull request #25651: [SPARK-28948][SQL] support data source v2 in CREATE TABLE USING
cloud-fan commented on a change in pull request #25651: [SPARK-28948][SQL] support data source v2 in CREATE TABLE USING URL: https://github.com/apache/spark/pull/25651#discussion_r323785172
## File path: external/avro/src/main/scala/org/apache/spark/sql/v2/avro/AvroDataSourceV2.scala
##
@@ -35,7 +36,10 @@ class AvroDataSourceV2 extends FileDataSourceV2 {
     AvroTable(tableName, sparkSession, options, paths, None, fallbackFileFormat)
   }
-  override def getTable(options: CaseInsensitiveStringMap, schema: StructType): Table = {
+  override def getTable(
+      options: CaseInsensitiveStringMap,
+      schema: StructType,
+      partitions: Array[Transform]): Table = {
Review comment: read options should be passed in `Table.newScanBuilder`. The `options` here is the table properties.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25734: [SPARK-28939][SQL][2.4] Propagate SQLConf for plans executed by toRdd
AmplabJenkins removed a comment on issue #25734: [SPARK-28939][SQL][2.4] Propagate SQLConf for plans executed by toRdd URL: https://github.com/apache/spark/pull/25734#issuecomment-530859446 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25734: [SPARK-28939][SQL][2.4] Propagate SQLConf for plans executed by toRdd
AmplabJenkins removed a comment on issue #25734: [SPARK-28939][SQL][2.4] Propagate SQLConf for plans executed by toRdd URL: https://github.com/apache/spark/pull/25734#issuecomment-530859453 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/110512/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25734: [SPARK-28939][SQL][2.4] Propagate SQLConf for plans executed by toRdd
AmplabJenkins commented on issue #25734: [SPARK-28939][SQL][2.4] Propagate SQLConf for plans executed by toRdd URL: https://github.com/apache/spark/pull/25734#issuecomment-530859453 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/110512/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25734: [SPARK-28939][SQL][2.4] Propagate SQLConf for plans executed by toRdd
AmplabJenkins commented on issue #25734: [SPARK-28939][SQL][2.4] Propagate SQLConf for plans executed by toRdd URL: https://github.com/apache/spark/pull/25734#issuecomment-530859446 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #25734: [SPARK-28939][SQL][2.4] Propagate SQLConf for plans executed by toRdd
SparkQA removed a comment on issue #25734: [SPARK-28939][SQL][2.4] Propagate SQLConf for plans executed by toRdd URL: https://github.com/apache/spark/pull/25734#issuecomment-530767235 **[Test build #110512 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110512/testReport)** for PR 25734 at commit [`1b145e2`](https://github.com/apache/spark/commit/1b145e2158679dc27fce07a8ddf17f6341175afe).
[GitHub] [spark] SparkQA commented on issue #25734: [SPARK-28939][SQL][2.4] Propagate SQLConf for plans executed by toRdd
SparkQA commented on issue #25734: [SPARK-28939][SQL][2.4] Propagate SQLConf for plans executed by toRdd URL: https://github.com/apache/spark/pull/25734#issuecomment-530858897 **[Test build #110512 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110512/testReport)** for PR 25734 at commit [`1b145e2`](https://github.com/apache/spark/commit/1b145e2158679dc27fce07a8ddf17f6341175afe). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25774: [SPARK-29069][SQL] ResolveInsertInto should not do table lookup
AmplabJenkins removed a comment on issue #25774: [SPARK-29069][SQL] ResolveInsertInto should not do table lookup URL: https://github.com/apache/spark/pull/25774#issuecomment-530852507 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/15493/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25774: [SPARK-29069][SQL] ResolveInsertInto should not do table lookup
AmplabJenkins removed a comment on issue #25774: [SPARK-29069][SQL] ResolveInsertInto should not do table lookup URL: https://github.com/apache/spark/pull/25774#issuecomment-530852496 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25774: [SPARK-29069][SQL] ResolveInsertInto should not do table lookup
SparkQA commented on issue #25774: [SPARK-29069][SQL] ResolveInsertInto should not do table lookup URL: https://github.com/apache/spark/pull/25774#issuecomment-530853522 **[Test build #110518 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110518/testReport)** for PR 25774 at commit [`2753af5`](https://github.com/apache/spark/commit/2753af5c2adbbb0c27decda3afb7a06e8ff1a31f).
[GitHub] [spark] AmplabJenkins commented on issue #25774: [SPARK-29069][SQL] ResolveInsertInto should not do table lookup
AmplabJenkins commented on issue #25774: [SPARK-29069][SQL] ResolveInsertInto should not do table lookup URL: https://github.com/apache/spark/pull/25774#issuecomment-530852507 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/15493/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25774: [SPARK-29069][SQL] ResolveInsertInto should not do table lookup
AmplabJenkins commented on issue #25774: [SPARK-29069][SQL] ResolveInsertInto should not do table lookup URL: https://github.com/apache/spark/pull/25774#issuecomment-530852496 Merged build finished. Test PASSed.
[GitHub] [spark] cloud-fan commented on issue #25774: [SPARK-29069][SQL] ResolveInsertInto should not do table lookup
cloud-fan commented on issue #25774: [SPARK-29069][SQL] ResolveInsertInto should not do table lookup URL: https://github.com/apache/spark/pull/25774#issuecomment-530850602 cc @brkyvz @rdblue @gengliangwang @HyukjinKwon
[GitHub] [spark] cloud-fan commented on a change in pull request #25774: [SPARK-29069][SQL] ResolveInsertInto should not do table lookup
cloud-fan commented on a change in pull request #25774: [SPARK-29069][SQL] ResolveInsertInto should not do table lookup
URL: https://github.com/apache/spark/pull/25774#discussion_r323770221
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ##
@@ -671,6 +671,15 @@ class Analyzer(
 case scala.Right(tableOpt) => tableOpt
 }
 v2TableOpt.map(DataSourceV2Relation.create).getOrElse(u)
+
+ case i @ InsertIntoStatement(u: UnresolvedRelation, _, _, _, _) if i.query.resolved =>
Review comment: Similar to `ResolveRelations`, `ResolveTables` should handle both `UnresolvedRelation` and `InsertIntoStatement(UnresolvedRelation, ...)`.
[GitHub] [spark] cloud-fan commented on a change in pull request #25774: [SPARK-29069][SQL] ResolveInsertInto should not do table lookup
cloud-fan commented on a change in pull request #25774: [SPARK-29069][SQL] ResolveInsertInto should not do table lookup
URL: https://github.com/apache/spark/pull/25774#discussion_r323770329
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ##
@@ -785,41 +794,28 @@ class Analyzer(
 object ResolveInsertInto extends Rule[LogicalPlan] {
 override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
- case i @ InsertIntoStatement(u: UnresolvedRelation, _, _, _, _) if i.query.resolved =>
-lookupV2Relation(u.multipartIdentifier) match {
- case scala.Left((_, _, Some(v2Table: Table))) =>
-resolveV2Insert(i, v2Table)
- case scala.Right(Some(v2Table: Table)) =>
-resolveV2Insert(i, v2Table)
- case _ =>
-i
+ case i @ InsertIntoStatement(r: DataSourceV2Relation, _, _, _, _) if i.query.resolved =>
+// ifPartitionNotExists is append with validation, but validation is not supported
Review comment: just indentation changes.
[GitHub] [spark] cloud-fan opened a new pull request #25774: [SPARK-29069][SQL] ResolveInsertInto should not do table lookup
cloud-fan opened a new pull request #25774: [SPARK-29069][SQL] ResolveInsertInto should not do table lookup
URL: https://github.com/apache/spark/pull/25774
### What changes were proposed in this pull request?
It is clearer to do table lookup only in the `ResolveTables` rule (for v2 tables) and the `ResolveRelations` rule (for v1 tables). `ResolveInsertInto` should only resolve an `InsertIntoStatement` whose relation is already resolved.
### Why are the changes needed?
To make the code simpler.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Existing tests.
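The split of responsibilities described in this PR can be illustrated with a small, self-contained sketch. This is a toy model under stated assumptions, not Spark's analyzer: `Plan`, `ResolvedRelation`, `InsertInto`, `AppendData`, `catalog`, and both function names are hypothetical stand-ins chosen only to mirror the idea that lookup happens in one rule and the insert rule matches only already-resolved relations.

```scala
// Toy model only: these types are hypothetical and are not Spark's classes.
sealed trait Plan
case class UnresolvedRelation(name: String) extends Plan
case class ResolvedRelation(name: String) extends Plan
case class InsertInto(table: Plan, query: String) extends Plan
case class AppendData(table: ResolvedRelation, query: String) extends Plan

object RuleSketch {
  // Hypothetical catalog: the only place where table names are looked up.
  val catalog: Set[String] = Set("t1")

  // Lookup rule: handles a bare relation and a relation nested in an insert.
  def resolveTables(plan: Plan): Plan = plan match {
    case UnresolvedRelation(n) if catalog(n) => ResolvedRelation(n)
    case InsertInto(UnresolvedRelation(n), q) if catalog(n) =>
      InsertInto(ResolvedRelation(n), q)
    case other => other
  }

  // Insert rule: performs no lookup; it fires only on resolved relations.
  def resolveInsertInto(plan: Plan): Plan = plan match {
    case InsertInto(r: ResolvedRelation, q) => AppendData(r, q)
    case other => other
  }

  def main(args: Array[String]): Unit = {
    val known = InsertInto(UnresolvedRelation("t1"), "q")
    println(resolveInsertInto(resolveTables(known)))

    // An unknown table is left untouched by both rules.
    val unknown = InsertInto(UnresolvedRelation("missing"), "q")
    println(resolveInsertInto(resolveTables(unknown)))
  }
}
```

In this sketch the insert rule never consults the catalog, so a relation that no lookup rule resolved simply passes through unchanged.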
[GitHub] [spark] SparkQA commented on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-hadoop3.2] Move Hive test jars to local file
SparkQA commented on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-hadoop3.2] Move Hive test jars to local file URL: https://github.com/apache/spark/pull/25690#issuecomment-530843065 **[Test build #110517 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110517/testReport)** for PR 25690 at commit [`5b2766a`](https://github.com/apache/spark/commit/5b2766a2259e7f8b776f97e4b152562069796e18).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-hadoop3.2] Move Hive test jars to local file
AmplabJenkins removed a comment on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-hadoop3.2] Move Hive test jars to local file URL: https://github.com/apache/spark/pull/25690#issuecomment-530842326 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/15492/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-hadoop3.2] Move Hive test jars to local file
AmplabJenkins commented on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-hadoop3.2] Move Hive test jars to local file URL: https://github.com/apache/spark/pull/25690#issuecomment-530842315 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-hadoop3.2] Move Hive test jars to local file
AmplabJenkins commented on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-hadoop3.2] Move Hive test jars to local file URL: https://github.com/apache/spark/pull/25690#issuecomment-530842326 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/15492/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-hadoop3.2] Move Hive test jars to local file
AmplabJenkins removed a comment on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-hadoop3.2] Move Hive test jars to local file URL: https://github.com/apache/spark/pull/25690#issuecomment-530842315 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-hadoop3.2] Move Hive test jars to local file
SparkQA commented on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-hadoop3.2] Move Hive test jars to local file URL: https://github.com/apache/spark/pull/25690#issuecomment-530839668 **[Test build #110516 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110516/testReport)** for PR 25690 at commit [`5b2766a`](https://github.com/apache/spark/commit/5b2766a2259e7f8b776f97e4b152562069796e18).
[GitHub] [spark] wangyum commented on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-hadoop3.2] Move Hive test jars to local file
wangyum commented on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-hadoop3.2] Move Hive test jars to local file URL: https://github.com/apache/spark/pull/25690#issuecomment-530839400 retest this please
[GitHub] [spark] AmplabJenkins removed a comment on issue #25772: [SPARK-29065][SQL][TEST] Extend `EXTRACT` benchmark
AmplabJenkins removed a comment on issue #25772: [SPARK-29065][SQL][TEST] Extend `EXTRACT` benchmark URL: https://github.com/apache/spark/pull/25772#issuecomment-530837299 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25772: [SPARK-29065][SQL][TEST] Extend `EXTRACT` benchmark
AmplabJenkins removed a comment on issue #25772: [SPARK-29065][SQL][TEST] Extend `EXTRACT` benchmark URL: https://github.com/apache/spark/pull/25772#issuecomment-530837311 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/110511/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25772: [SPARK-29065][SQL][TEST] Extend `EXTRACT` benchmark
AmplabJenkins commented on issue #25772: [SPARK-29065][SQL][TEST] Extend `EXTRACT` benchmark URL: https://github.com/apache/spark/pull/25772#issuecomment-530837311 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/110511/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25772: [SPARK-29065][SQL][TEST] Extend `EXTRACT` benchmark
AmplabJenkins commented on issue #25772: [SPARK-29065][SQL][TEST] Extend `EXTRACT` benchmark URL: https://github.com/apache/spark/pull/25772#issuecomment-530837299 Merged build finished. Test PASSed.
[GitHub] [spark] srowen commented on issue #22282: [SPARK-23539][SS] Add support for Kafka headers in Structured Streaming
srowen commented on issue #22282: [SPARK-23539][SS] Add support for Kafka headers in Structured Streaming URL: https://github.com/apache/spark/pull/22282#issuecomment-530836992 I think it's OK @dongjinleekr, just needs a rebase now.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-hadoop3.2] Move Hive test jars to local file
AmplabJenkins removed a comment on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-hadoop3.2] Move Hive test jars to local file URL: https://github.com/apache/spark/pull/25690#issuecomment-530836651 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/110514/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-hadoop3.2] Move Hive test jars to local file
AmplabJenkins removed a comment on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-hadoop3.2] Move Hive test jars to local file URL: https://github.com/apache/spark/pull/25690#issuecomment-530836638 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #25772: [SPARK-29065][SQL][TEST] Extend `EXTRACT` benchmark
SparkQA removed a comment on issue #25772: [SPARK-29065][SQL][TEST] Extend `EXTRACT` benchmark URL: https://github.com/apache/spark/pull/25772#issuecomment-530759715 **[Test build #110511 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110511/testReport)** for PR 25772 at commit [`766110a`](https://github.com/apache/spark/commit/766110ad07bc1a9911b80a179033df6ad9c924fb).
[GitHub] [spark] AmplabJenkins commented on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-hadoop3.2] Move Hive test jars to local file
AmplabJenkins commented on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-hadoop3.2] Move Hive test jars to local file URL: https://github.com/apache/spark/pull/25690#issuecomment-530836651 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/110514/ Test FAILed.
[GitHub] [spark] SparkQA commented on issue #25772: [SPARK-29065][SQL][TEST] Extend `EXTRACT` benchmark
SparkQA commented on issue #25772: [SPARK-29065][SQL][TEST] Extend `EXTRACT` benchmark URL: https://github.com/apache/spark/pull/25772#issuecomment-530836603 **[Test build #110511 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110511/testReport)** for PR 25772 at commit [`766110a`](https://github.com/apache/spark/commit/766110ad07bc1a9911b80a179033df6ad9c924fb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-hadoop3.2] Move Hive test jars to local file
AmplabJenkins commented on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-hadoop3.2] Move Hive test jars to local file URL: https://github.com/apache/spark/pull/25690#issuecomment-530836638 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-hadoop3.2] Move Hive test jars to local file
SparkQA commented on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-hadoop3.2] Move Hive test jars to local file URL: https://github.com/apache/spark/pull/25690#issuecomment-530836226 **[Test build #110514 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110514/testReport)** for PR 25690 at commit [`5b2766a`](https://github.com/apache/spark/commit/5b2766a2259e7f8b776f97e4b152562069796e18). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-hadoop3.2] Move Hive test jars to local file
SparkQA removed a comment on issue #25690: [SPARK-27831][FOLLOW-UP][SQL][TEST][test-hadoop3.2] Move Hive test jars to local file URL: https://github.com/apache/spark/pull/25690#issuecomment-530788398 **[Test build #110514 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110514/testReport)** for PR 25690 at commit [`5b2766a`](https://github.com/apache/spark/commit/5b2766a2259e7f8b776f97e4b152562069796e18).
[GitHub] [spark] srowen commented on a change in pull request #25770: [SPARK-29064][CORE] Add PrometheusResource to export Executor metrics
srowen commented on a change in pull request #25770: [SPARK-29064][CORE] Add PrometheusResource to export Executor metrics
URL: https://github.com/apache/spark/pull/25770#discussion_r323752067
## File path: core/src/main/scala/org/apache/spark/ui/SparkUI.scala ##
@@ -66,6 +66,7 @@ private[spark] class SparkUI private (
 addStaticHandler(SparkUI.STATIC_RESOURCE_DIR)
 attachHandler(createRedirectHandler("/", "/jobs/", basePath = basePath))
 attachHandler(ApiRootResource.getServletHandler(this))
+attachHandler(PrometheusResource.getServletHandler(this))
Review comment: Should this be more optional, like only attached if one configures something to use Prometheus?
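The opt-in behaviour the reviewer asks about could look roughly like the following sketch. It is an illustration under stated assumptions, not the PR's code: the flag name `example.ui.prometheus.enabled`, the `handlersFor` function, and the string handler names are all hypothetical, and a plain `Map` stands in for the real configuration object.

```scala
object OptionalHandlerSketch {
  // Hypothetical flag name, used only for illustration.
  val PrometheusFlag = "example.ui.prometheus.enabled"

  // Returns the handlers to attach; the Prometheus one is added only on opt-in.
  def handlersFor(conf: Map[String, String]): Seq[String] = {
    val always = Seq("static", "redirect", "api")
    // Absent flag means disabled, so the default stays unchanged.
    val enabled = conf.get(PrometheusFlag).exists(_.toBoolean)
    if (enabled) always :+ "prometheus" else always
  }

  def main(args: Array[String]): Unit = {
    println(handlersFor(Map.empty))
    println(handlersFor(Map(PrometheusFlag -> "true")))
  }
}
```

Defaulting to disabled keeps existing deployments unaffected unless the flag is set explicitly.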
[GitHub] [spark] srowen commented on a change in pull request #25769: [SPARK-29032][CORE] Add PrometheusServlet to monitor Master/Worker/Driver
srowen commented on a change in pull request #25769: [SPARK-29032][CORE] Add PrometheusServlet to monitor Master/Worker/Driver
URL: https://github.com/apache/spark/pull/25769#discussion_r323751789
## File path: core/src/main/scala/org/apache/spark/metrics/MetricsConfig.scala ##
@@ -43,6 +43,12 @@ private[spark] class MetricsConfig(conf: SparkConf) extends Logging {
 prop.setProperty("*.sink.servlet.path", "/metrics/json")
 prop.setProperty("master.sink.servlet.path", "/metrics/master/json")
 prop.setProperty("applications.sink.servlet.path", "/metrics/applications/json")
+
+prop.setProperty("*.sink.prometheusServlet.class",
Review comment: Actually, one last question: does this cause prometheusServlet to always be configured and available? I had thought this would be more opt-in, so that it wouldn't be on by default unless configured. Just checking whether that's true.
[GitHub] [spark] cloud-fan commented on issue #25764: [SPARK-29060][SQL] Add tree traversal helper for adaptive spark plans
cloud-fan commented on issue #25764: [SPARK-29060][SQL] Add tree traversal helper for adaptive spark plans URL: https://github.com/apache/spark/pull/25764#issuecomment-530835150 thanks, merging to master!
[GitHub] [spark] cloud-fan closed pull request #25764: [SPARK-29060][SQL] Add tree traversal helper for adaptive spark plans
cloud-fan closed pull request #25764: [SPARK-29060][SQL] Add tree traversal helper for adaptive spark plans URL: https://github.com/apache/spark/pull/25764
[GitHub] [spark] srowen commented on issue #25769: [SPARK-29032][CORE] Add PrometheusServlet to monitor Master/Worker/Driver
srowen commented on issue #25769: [SPARK-29032][CORE] Add PrometheusServlet to monitor Master/Worker/Driver URL: https://github.com/apache/spark/pull/25769#issuecomment-530834494 @itsvikramagr the difference is that that change actually added Prometheus and all its dependencies to Spark. This just uses Prometheus if it's present.
[GitHub] [spark] cloud-fan commented on a change in pull request #25764: [SPARK-29060][SQL] Add tree traversal helper for adaptive spark plans
cloud-fan commented on a change in pull request #25764: [SPARK-29060][SQL] Add tree traversal helper for adaptive spark plans URL: https://github.com/apache/spark/pull/25764#discussion_r323748844 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanHelper.scala ## @@ -0,0 +1,130 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.adaptive + +import org.apache.spark.sql.execution.SparkPlan + +/** + * This class provides utility methods related to tree traversal of an [[AdaptiveSparkPlanExec]] + * plan. Unlike their counterparts in [[org.apache.spark.sql.catalyst.trees.TreeNode]] or + * [[org.apache.spark.sql.catalyst.plans.QueryPlan]], these methods traverse down leaf nodes of + * adaptive plans, i.e., [[AdaptiveSparkPlanExec]] and [[QueryStageExec]]. + */ +trait AdaptiveSparkPlanHelper { + + /** + * Find the first [[SparkPlan]] that satisfies the condition specified by `f`. + * The condition is recursively applied to this node and all of its children (pre-order). 
+ */ + def find(p: SparkPlan)(f: SparkPlan => Boolean): Option[SparkPlan] = if (f(p)) { +Some(p) + } else { +allChildren(p).foldLeft(Option.empty[SparkPlan]) { (l, r) => l.orElse(find(r)(f)) } + } + + /** + * Runs the given function on this node and then recursively on children. + * @param f the function to be applied to each node in the tree. + */ + def foreach(p: SparkPlan)(f: SparkPlan => Unit): Unit = { +f(p) +allChildren(p).foreach(foreach(_)(f)) + } + + /** + * Runs the given function recursively on children then on this node. + * @param f the function to be applied to each node in the tree. + */ + def foreachUp(p: SparkPlan)(f: SparkPlan => Unit): Unit = { +allChildren(p).foreach(foreachUp(_)(f)) +f(p) + } + + /** + * Returns a Seq containing the result of applying the given function to each + * node in this tree in a preorder traversal. + * @param f the function to be applied. + */ + def map[A](p: SparkPlan)(f: SparkPlan => A): Seq[A] = { +val ret = new collection.mutable.ArrayBuffer[A]() +foreach(p)(ret += f(_)) +ret + } + + /** + * Returns a Seq by applying a function to all nodes in this tree and using the elements of the + * resulting collections. + */ + def flatMap[A](p: SparkPlan)(f: SparkPlan => TraversableOnce[A]): Seq[A] = { +val ret = new collection.mutable.ArrayBuffer[A]() +foreach(p)(ret ++= f(_)) +ret + } + + /** + * Returns a Seq containing the result of applying a partial function to all elements in this + * tree on which the function is defined. + */ + def collect[B](p: SparkPlan)(pf: PartialFunction[SparkPlan, B]): Seq[B] = { +val ret = new collection.mutable.ArrayBuffer[B]() +val lifted = pf.lift +foreach(p)(node => lifted(node).foreach(ret.+=)) +ret + } + + /** + * Returns a Seq containing the leaves in this tree. 
+ */ + def collectLeaves(p: SparkPlan): Seq[SparkPlan] = { +collect(p) { case plan if allChildren(plan).isEmpty => plan } + } + + /** + * Finds and returns the first [[SparkPlan]] of the tree for which the given partial function + * is defined (pre-order), and applies the partial function to it. + */ + def collectFirst[B](p: SparkPlan)(pf: PartialFunction[SparkPlan, B]): Option[B] = { +val lifted = pf.lift +lifted(p).orElse { + allChildren(p).foldLeft(Option.empty[B]) { (l, r) => l.orElse(collectFirst(r)(pf)) } +} + } + + /** + * Returns a sequence containing the result of applying a partial function to all elements in this + * plan, also considering all the plans in its (nested) subqueries + */ + def collectInPlanAndSubqueries[B](p: SparkPlan)(f: PartialFunction[SparkPlan, B]): Seq[B] = { +(p +: subqueriesAll(p)).flatMap(collect(_)(f)) + } + + /** + * Returns a sequence containing the subqueries in this plan, also including the (nested) + * subquries in its children + */ + def subqueriesAll(p: SparkPlan): Seq[SparkPlan] = { Review comment: nvm, this is consistent with `QueryPlan.subqueriesAll`.
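The pre-order `find`, `collect`, and `collectLeaves` helpers quoted in the diff above can be illustrated outside Spark with a minimal Java sketch. `Node` and `TreeHelper` are hypothetical stand-ins, not Spark classes, and the `children` field plays the role that `allChildren` plays in the diff; a partial function is approximated by a function that returns `Optional`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical stand-in for a plan node; not a Spark class.
final class Node {
    final String name;
    final List<Node> children;

    Node(String name, Node... children) {
        this.name = name;
        this.children = List.of(children);
    }
}

final class TreeHelper {
    // Pre-order search: test the node itself first, then each child from left to right.
    static Optional<Node> find(Node p, Predicate<Node> f) {
        if (f.test(p)) {
            return Optional.of(p);
        }
        for (Node child : p.children) {
            Optional<Node> hit = find(child, f);
            if (hit.isPresent()) {
                return hit;
            }
        }
        return Optional.empty();
    }

    // Pre-order collect: keep the mapped value for every node where pf returns a value.
    static <B> List<B> collect(Node p, Function<Node, Optional<B>> pf) {
        List<B> out = new ArrayList<>();
        collectInto(p, pf, out);
        return out;
    }

    private static <B> void collectInto(Node p, Function<Node, Optional<B>> pf, List<B> out) {
        pf.apply(p).ifPresent(out::add);
        for (Node child : p.children) {
            collectInto(child, pf, out);
        }
    }

    // Leaves are the nodes without children.
    static List<Node> collectLeaves(Node p) {
        return collect(p, n -> n.children.isEmpty() ? Optional.of(n) : Optional.empty());
    }
}
```

Passing the node explicitly, as the trait in the diff does with `p: SparkPlan`, keeps the traversal logic separate from the node type, so the helper can decide for itself what counts as a child.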
[GitHub] [spark] srowen commented on issue #25759: [SPARK-19147][CORE] netty throw NPE
srowen commented on issue #25759: [SPARK-19147][CORE] netty throw NPE URL: https://github.com/apache/spark/pull/25759#issuecomment-530829156 Also please improve the title of this PR
[GitHub] [spark] srowen commented on a change in pull request #25759: [SPARK-19147][CORE] netty throw NPE
srowen commented on a change in pull request #25759: [SPARK-19147][CORE] netty throw NPE URL: https://github.com/apache/spark/pull/25759#discussion_r323742872 ## File path: common/network-common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java ## @@ -192,7 +192,20 @@ public TransportClient createClient(String remoteHost, int remotePort) logger.info("Found inactive connection to {}, creating a new one.", resolvedAddress); } } - clientPool.clients[clientIndex] = createClient(resolvedAddress); + try { +clientPool.clients[clientIndex] = createClient(resolvedAddress); + } catch (Exception e) { +// createClient() is called by task and close() is called by executor. +// When stop the executor, close() will set workerGroup = null, +// NPE will occur in createClient which generate many exception in log. +// For exception occurs after close(), treated it as an expected Exception +// and transform it to InterruptedException which can be processed by Executor. +// See SPARK-19147 +if (workerGroup == null) { + throw new InterruptedException(e.getMessage()); Review comment: This is still going to generate an exception in the logs, no? Should it just be a log warning? I think this is too indirect. Why not throw `IllegalStateException` in `createClient` instead in this case and catch it specifically?
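The alternative raised in the review comment, failing fast with `IllegalStateException` once the factory has been closed and catching only that type, might look roughly like the sketch below. `ClientFactory`, its fields, and `connectOrNull` are hypothetical simplifications for illustration and are not the real `TransportClientFactory` API.

```java
// Hypothetical, heavily simplified stand-in for a client factory; not Spark's TransportClientFactory.
final class ClientFactory implements AutoCloseable {
    // Plays the role of workerGroup in the diff: close() sets it to null.
    private volatile Object workerGroup = new Object();

    String createClient(String address) {
        // Read the field once so a concurrent close() cannot change it between check and use.
        Object group = workerGroup;
        if (group == null) {
            // Signal the closed state explicitly instead of letting a NullPointerException surface later.
            throw new IllegalStateException("factory already closed; cannot connect to " + address);
        }
        return "client(" + address + ")";
    }

    @Override
    public void close() {
        workerGroup = null;
    }

    // Caller side: catch only the expected "closed" condition and log a one-line warning.
    static String connectOrNull(ClientFactory factory, String address) {
        try {
            return factory.createClient(address);
        } catch (IllegalStateException e) {
            System.err.println("WARN: " + e.getMessage());
            return null;
        }
    }
}
```

Catching the specific exception type keeps unrelated failures, such as a genuine connection error, visible to the caller, which a blanket `catch (Exception e)` would not.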
[GitHub] [spark] MaxGekk edited a comment on issue #25716: [SPARK-29012][SQL] Support special timestamp values
MaxGekk edited a comment on issue #25716: [SPARK-29012][SQL] Support special timestamp values URL: https://github.com/apache/spark/pull/25716#issuecomment-530826650 I have some performance-related concerns about using the config. In the current implementation, the decision is pretty cheap: it just compares the first byte. If we use the config, we will need to retrieve it and compare its value with another string, which can bring visible overhead even if PostgreSQL compatibility mode is turned off, here: https://github.com/apache/spark/pull/25716/files#diff-da60f07e1826788aaeb07f295fae4b8aR223 Are you absolutely sure about using this config in the PR?
[GitHub] [spark] MaxGekk edited a comment on issue #25716: [SPARK-29012][SQL] Support special timestamp values
MaxGekk edited a comment on issue #25716: [SPARK-29012][SQL] Support special timestamp values URL: https://github.com/apache/spark/pull/25716#issuecomment-530826650 I have some performance-related concerns about using the config. In the current implementation, the decision is pretty cheap: it just compares the first byte. If we use the config, we will need to retrieve it and compare its value with another string, which can bring visible overhead even if PostgreSQL compatibility mode is turned off. Are you absolutely sure about using this config in the PR?
[GitHub] [spark] MaxGekk commented on issue #25716: [SPARK-29012][SQL] Support special timestamp values
MaxGekk commented on issue #25716: [SPARK-29012][SQL] Support special timestamp values URL: https://github.com/apache/spark/pull/25716#issuecomment-530826650 I have some performance-related concerns about using the config. In the current implementation, the decision is pretty cheap: it just compares the first byte. If we use the config, we will need to retrieve it and compare its value with another string, which can bring visible overhead even if PostgreSQL compatibility mode is turned off.
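The cost argument in the comment above rests on a first-byte check that lets ordinary timestamps skip the special-value path entirely. One way such a guard could work is sketched below; `SpecialValuePrefilter` and the list of special values are assumptions made for illustration, not code or names taken from the PR.

```java
import java.nio.charset.StandardCharsets;
import java.util.Set;

// Hypothetical illustration of a first-byte guard; not the PR's implementation.
final class SpecialValuePrefilter {
    // Assumed list of special values, lower case only; the PR's actual list may differ.
    private static final Set<String> SPECIAL =
        Set.of("epoch", "now", "today", "tomorrow", "yesterday");

    // Cheap guard: only input starting with one of these bytes can match an entry in SPECIAL.
    static boolean mayBeSpecial(byte[] utf8) {
        if (utf8.length == 0) {
            return false;
        }
        switch (utf8[0]) {
            case 'e':
            case 'n':
            case 't':
            case 'y':
                return true;
            default:
                return false;
        }
    }

    static boolean isSpecial(byte[] utf8) {
        // Input that starts with a digit, as ordinary timestamps do, returns here
        // without allocating a String or performing a set lookup.
        if (!mayBeSpecial(utf8)) {
            return false;
        }
        return SPECIAL.contains(new String(utf8, StandardCharsets.UTF_8));
    }
}
```

A config lookup placed in front of this guard would run for every input, including the common digit-leading case, which is the overhead the comment is concerned about.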
[GitHub] [spark] wangyum commented on a change in pull request #25743: [SPARK-29036][SQL]SparkThriftServer cancel job after execute() thread interrupted
wangyum commented on a change in pull request #25743: [SPARK-29036][SQL]SparkThriftServer cancel job after execute() thread interrupted URL: https://github.com/apache/spark/pull/25743#discussion_r323740056 ## File path: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala ## @@ -267,6 +267,9 @@ private[hive] class SparkExecuteStatementOperation( // Actually do need to catch Throwable as some failures don't inherit from Exception and // HiveServer will silently swallow them. case e: Throwable => +if (statementId != null) { Review comment: Could we add a comment explaining why we need this change?
[GitHub] [spark] AmplabJenkins removed a comment on issue #25754: [SPARK-29048] Improve performance on Column.isInCollection() with a large size collection
AmplabJenkins removed a comment on issue #25754: [SPARK-29048] Improve performance on Column.isInCollection() with a large size collection URL: https://github.com/apache/spark/pull/25754#issuecomment-530825209 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25754: [SPARK-29048] Improve performance on Column.isInCollection() with a large size collection
AmplabJenkins removed a comment on issue #25754: [SPARK-29048] Improve performance on Column.isInCollection() with a large size collection URL: https://github.com/apache/spark/pull/25754#issuecomment-530825216 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/110510/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25754: [SPARK-29048] Improve performance on Column.isInCollection() with a large size collection
AmplabJenkins commented on issue #25754: [SPARK-29048] Improve performance on Column.isInCollection() with a large size collection URL: https://github.com/apache/spark/pull/25754#issuecomment-530825216 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/110510/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25754: [SPARK-29048] Improve performance on Column.isInCollection() with a large size collection
AmplabJenkins commented on issue #25754: [SPARK-29048] Improve performance on Column.isInCollection() with a large size collection URL: https://github.com/apache/spark/pull/25754#issuecomment-530825209 Merged build finished. Test PASSed.
[GitHub] [spark] cloud-fan commented on issue #25710: [SPARK-29008][SQL] Define an individual method for each common subexpression in HashAggregateExec
cloud-fan commented on issue #25710: [SPARK-29008][SQL] Define an individual method for each common subexpression in HashAggregateExec URL: https://github.com/apache/spark/pull/25710#issuecomment-530824963 LGTM, cc @rednaxelafx to take another look
[GitHub] [spark] SparkQA removed a comment on issue #25754: [SPARK-29048] Improve performance on Column.isInCollection() with a large size collection
SparkQA removed a comment on issue #25754: [SPARK-29048] Improve performance on Column.isInCollection() with a large size collection URL: https://github.com/apache/spark/pull/25754#issuecomment-530751633 **[Test build #110510 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110510/testReport)** for PR 25754 at commit [`ab3e5d4`](https://github.com/apache/spark/commit/ab3e5d4e5119ec05553c9a2f8cdf3b6544f699ed).
[GitHub] [spark] SparkQA commented on issue #25754: [SPARK-29048] Improve performance on Column.isInCollection() with a large size collection
SparkQA commented on issue #25754: [SPARK-29048] Improve performance on Column.isInCollection() with a large size collection URL: https://github.com/apache/spark/pull/25754#issuecomment-530824372 **[Test build #110510 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110510/testReport)** for PR 25754 at commit [`ab3e5d4`](https://github.com/apache/spark/commit/ab3e5d4e5119ec05553c9a2f8cdf3b6544f699ed). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25772: [SPARK-29065][SQL][TEST] Extend `EXTRACT` benchmark
AmplabJenkins removed a comment on issue #25772: [SPARK-29065][SQL][TEST] Extend `EXTRACT` benchmark URL: https://github.com/apache/spark/pull/25772#issuecomment-530812586 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25772: [SPARK-29065][SQL][TEST] Extend `EXTRACT` benchmark
AmplabJenkins removed a comment on issue #25772: [SPARK-29065][SQL][TEST] Extend `EXTRACT` benchmark URL: https://github.com/apache/spark/pull/25772#issuecomment-530812593 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/110509/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25772: [SPARK-29065][SQL][TEST] Extend `EXTRACT` benchmark
AmplabJenkins commented on issue #25772: [SPARK-29065][SQL][TEST] Extend `EXTRACT` benchmark URL: https://github.com/apache/spark/pull/25772#issuecomment-530812593 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/110509/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25772: [SPARK-29065][SQL][TEST] Extend `EXTRACT` benchmark
AmplabJenkins commented on issue #25772: [SPARK-29065][SQL][TEST] Extend `EXTRACT` benchmark URL: https://github.com/apache/spark/pull/25772#issuecomment-530812586 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #25772: [SPARK-29065][SQL][TEST] Extend `EXTRACT` benchmark
SparkQA removed a comment on issue #25772: [SPARK-29065][SQL][TEST] Extend `EXTRACT` benchmark URL: https://github.com/apache/spark/pull/25772#issuecomment-530737271 **[Test build #110509 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110509/testReport)** for PR 25772 at commit [`e54c41a`](https://github.com/apache/spark/commit/e54c41af5f0f7bde357c151dfd7ebdb060fda83a).
[GitHub] [spark] SparkQA commented on issue #25772: [SPARK-29065][SQL][TEST] Extend `EXTRACT` benchmark
SparkQA commented on issue #25772: [SPARK-29065][SQL][TEST] Extend `EXTRACT` benchmark URL: https://github.com/apache/spark/pull/25772#issuecomment-530811832 **[Test build #110509 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110509/testReport)** for PR 25772 at commit [`e54c41a`](https://github.com/apache/spark/commit/e54c41af5f0f7bde357c151dfd7ebdb060fda83a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] maropu commented on a change in pull request #25710: [SPARK-29008][SQL] Define an individual method for each common subexpression in HashAggregateExec
maropu commented on a change in pull request #25710: [SPARK-29008][SQL] Define an individual method for each common subexpression in HashAggregateExec URL: https://github.com/apache/spark/pull/25710#discussion_r323718636 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala ## @@ -419,4 +419,27 @@ class WholeStageCodegenSuite extends QueryTest with SharedSparkSession { } } } + + test("Give up splitting subexpression code if a parameter length goes over the limit") { +withSQLConf( +SQLConf.CODEGEN_SPLIT_AGGREGATE_FUNC.key -> "false", Review comment: Yea, we need to. If that flag is true, `HashAggregateExec` throws an exception in this test: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L327
[GitHub] [spark] maropu commented on issue #25716: [SPARK-29012][SQL] Support special timestamp values
maropu commented on issue #25716: [SPARK-29012][SQL] Support special timestamp values URL: https://github.com/apache/spark/pull/25716#issuecomment-530805052 How about holding this PR until this weekend for @gengliangwang's work? I personally think we don't have any reason to rush to merge this.
[GitHub] [spark] maropu commented on issue #25772: [SPARK-29065][SQL][TEST] Extend `EXTRACT` benchmark
maropu commented on issue #25772: [SPARK-29065][SQL][TEST] Extend `EXTRACT` benchmark URL: https://github.com/apache/spark/pull/25772#issuecomment-530803392 Thanks! Merged to master.
[GitHub] [spark] maropu closed pull request #25772: [SPARK-29065][SQL][TEST] Extend `EXTRACT` benchmark
maropu closed pull request #25772: [SPARK-29065][SQL][TEST] Extend `EXTRACT` benchmark URL: https://github.com/apache/spark/pull/25772
[GitHub] [spark] AmplabJenkins removed a comment on issue #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen
AmplabJenkins removed a comment on issue #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen URL: https://github.com/apache/spark/pull/25766#issuecomment-530802868 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen
AmplabJenkins removed a comment on issue #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen URL: https://github.com/apache/spark/pull/25766#issuecomment-530802874 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/15491/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen
AmplabJenkins commented on issue #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen URL: https://github.com/apache/spark/pull/25766#issuecomment-530802874 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/15491/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen
AmplabJenkins commented on issue #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen URL: https://github.com/apache/spark/pull/25766#issuecomment-530802868 Merged build finished. Test PASSed.
[GitHub] [spark] maropu commented on a change in pull request #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen
maropu commented on a change in pull request #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen URL: https://github.com/apache/spark/pull/25766#discussion_r323711557 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ## @@ -1353,19 +1370,17 @@ object CodeGenerator extends Logging { byteCodeSize } } -Some(stats) +(classCodeSize, methodCodeSizes.max, constPoolSize) } catch { case NonFatal(e) => logWarning("Error calculating stats of compiled class.", e) - None + (classCodeSize, -1, -1) } -}.flatten - -if (codeSizes.nonEmpty) { - codeSizes.max -} else { - 0 } + +ByteCodeStats(codeStats.reduce[(Int, Int, Int)] { case (v1, v2) => + (Math.max(v1._1, v2._1), Math.max(v1._2, v2._2), Math.max(v1._3, v2._3)) Review comment: Currently, this PR prints statistics per whole-stage codegen entry, so the current one looks OK to me.
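The `reduce` in the diff above folds per-class statistics into one triple by taking the maximum of each component independently. A minimal Java sketch of that aggregation follows; `CodeStats` and its field names are hypothetical and only mirror the three positions of the tuple in the diff.

```java
import java.util.List;

// Hypothetical holder for the three tuple positions in the diff; not Spark's ByteCodeStats.
final class CodeStats {
    final int classCodeSize;
    final int maxMethodCodeSize;
    final int constPoolSize;

    CodeStats(int classCodeSize, int maxMethodCodeSize, int constPoolSize) {
        this.classCodeSize = classCodeSize;
        this.maxMethodCodeSize = maxMethodCodeSize;
        this.constPoolSize = constPoolSize;
    }

    // Component-wise maximum: each field is compared on its own, so the result
    // may combine values that came from different classes.
    static CodeStats max(CodeStats a, CodeStats b) {
        return new CodeStats(
            Math.max(a.classCodeSize, b.classCodeSize),
            Math.max(a.maxMethodCodeSize, b.maxMethodCodeSize),
            Math.max(a.constPoolSize, b.constPoolSize));
    }

    // Like a reduce without an initial value, this fails on an empty list.
    static CodeStats reduce(List<CodeStats> stats) {
        return stats.stream()
            .reduce(CodeStats::max)
            .orElseThrow(() -> new IllegalArgumentException("no stats to reduce"));
    }
}
```

In the diff, a failed calculation contributes `-1` for the second and third components; under a component-wise maximum those entries never win against a real, non-negative size, so one failure does not hide the statistics of the other classes.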
[GitHub] [spark] maropu commented on a change in pull request #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen
maropu commented on a change in pull request #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen
URL: https://github.com/apache/spark/pull/25766#discussion_r323710328

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ##
@@ -1353,19 +1370,17 @@ object CodeGenerator extends Logging {
       byteCodeSize
     }
   }
-  Some(stats)
+  (classCodeSize, methodCodeSizes.max, constPoolSize)

Review comment:
How about the latest code? I added a new metric (# of inner classes), so is using a tuple in that part OK?
[GitHub] [spark] SparkQA commented on issue #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen
SparkQA commented on issue #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen URL: https://github.com/apache/spark/pull/25766#issuecomment-530801110 **[Test build #110515 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/110515/testReport)** for PR 25766 at commit [`fa4234c`](https://github.com/apache/spark/commit/fa4234c0cbdb8aaeb1360d7565f6db5eebe87f30).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen
AmplabJenkins removed a comment on issue #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen URL: https://github.com/apache/spark/pull/25766#issuecomment-530800419 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/15490/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen
AmplabJenkins removed a comment on issue #25766: [SPARK-29061][SQL] Prints bytecode statistics in debugCodegen URL: https://github.com/apache/spark/pull/25766#issuecomment-530800415 Merged build finished. Test PASSed.