[GitHub] spark issue #23218: [SPARK-26266][BUILD] Update to Scala 2.12.8
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23218
Surprisingly, all three are due to consistent JVM crashes. It seems that Scala 2.12.8 or Spark has some unstable code somewhere.
- https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99679/consoleFull
```
[info] - SPARK-17641: collect functions should not collect null values (231 milliseconds)
10:51:04.251 WARN org.apache.spark.sql.execution.window.WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
10:51:04.262 WARN org.apache.spark.sql.execution.window.WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x7fa843744e44, pid=116353, tid=140360030242560
```
- https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4451/consoleFull
```
[info] - read from textfile (508 milliseconds)
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x7f60ec641e44, pid=40380, tid=140053491689216
#
```
- https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4452/consoleFull
```
[info] - SPARK-21996 read from text files generated by file sink -- file name has space (532 milliseconds)
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x7f399e84ee44, pid=106264, tid=139883238606592
#
```
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/23213
Yea, it seems it's longer by ~4 times:
```
23:25:43.880 WARN org.apache.spark.sql.SQLQueryTestSuite:
=== Codegen/Interpreter Time Metrics ===
Total time: 602.64531157 seconds

Configs    Run Time (seconds)
spark.sql.codegen.wholeStage=true,spark.sql.codegen.factoryMode=NO_CODEGEN    156414789416
spark.sql.codegen.wholeStage=false,spark.sql.codegen.factoryMode=CODEGEN_ONLY    138343055840
spark.sql.codegen.wholeStage=true,spark.sql.codegen.factoryMode=CODEGEN_ONLY    171905020550
spark.sql.codegen.wholeStage=false,spark.sql.codegen.factoryMode=NO_CODEGEN    135982445764
```
https://github.com/apache/spark/commit/7a69e0b6700fc5c7ad3acef35137f220b8804fd6
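A note on units in the log above: the per-config column is labeled seconds, but the four values only add up to the reported total if they are read as nanoseconds. The following is a minimal Scala sketch of that check; the object and value names are hypothetical and not part of Spark.

```scala
// Hypothetical check; CodegenTimeCheck is illustrative and not a Spark class.
object CodegenTimeCheck {
  // Per-config run times copied from the log (assumed here to be nanoseconds).
  val runTimesNs: Seq[Long] = Seq(
    156414789416L, // wholeStage=true,  factoryMode=NO_CODEGEN
    138343055840L, // wholeStage=false, factoryMode=CODEGEN_ONLY
    171905020550L, // wholeStage=true,  factoryMode=CODEGEN_ONLY
    135982445764L  // wholeStage=false, factoryMode=NO_CODEGEN
  )

  // Sum in nanoseconds, then convert to seconds for comparison with "Total time".
  val totalNs: Long = runTimesNs.sum
  val totalSeconds: Double = totalNs / 1e9

  def main(args: Array[String]): Unit = {
    println(f"Total: $totalSeconds%.8f seconds")
    runTimesNs.foreach(ns => println(f"${ns / 1e9}%.3f s"))
  }
}
```

Under that reading each of the four configurations takes roughly 136 to 172 seconds, which is consistent with the suite running about four times longer overall.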
[GitHub] spark pull request #23203: [SPARK-26252][PYTHON] Add support to run specific...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23203
[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23203 Thank you @cloud-fan, @viirya, @srowen, and @BryanCutler.
[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23203 Merged to master.
[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/23213 I'm looking into that now ;) Just give me more time to check.
[GitHub] spark issue #23222: [SPARK-20636] Add the rule TransposeWindow to the optimi...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/23222 We can compare the plans and see whether the rule takes effect.
[GitHub] spark issue #23222: [SPARK-20636] Add the rule TransposeWindow to the optimi...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23222 That PR also added an end-to-end test. Does this mean that test is not valid?
[GitHub] spark pull request #17899: [SPARK-20636] Add new optimization rule to transp...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17899#discussion_r238950241
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -734,6 +734,28 @@ object CollapseWindow extends Rule[LogicalPlan] {
 }
 }
+/**
+ * Transpose Adjacent Window Expressions.
--- End diff --
Why is this rule useful?
[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23213 Do you know how long `SQLQueryTestSuite` takes? We are making it about 4 times longer here, so it would be good to know the overhead.
[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22514#discussion_r238949362
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ---
@@ -181,62 +180,39 @@ case class RelationConversions(
 conf: SQLConf,
 sessionCatalog: HiveSessionCatalog) extends Rule[LogicalPlan] {
 private def isConvertible(relation: HiveTableRelation): Boolean = {
-val serde = relation.tableMeta.storage.serde.getOrElse("").toLowerCase(Locale.ROOT)
-serde.contains("parquet") && conf.getConf(HiveUtils.CONVERT_METASTORE_PARQUET) ||
- serde.contains("orc") && conf.getConf(HiveUtils.CONVERT_METASTORE_ORC)
+isConvertible(relation.tableMeta)
 }
- // Return true for Apache ORC and Hive ORC-related configuration names.
- // Note that Spark doesn't support configurations like `hive.merge.orcfile.stripe.level`.
- private def isOrcProperty(key: String) =
-key.startsWith("orc.") || key.contains(".orc.")
-
- private def isParquetProperty(key: String) =
-key.startsWith("parquet.") || key.contains(".parquet.")
-
- private def convert(relation: HiveTableRelation): LogicalRelation = {
-val serde = relation.tableMeta.storage.serde.getOrElse("").toLowerCase(Locale.ROOT)
-
-// Consider table and storage properties. For properties existing in both sides, storage
-// properties will supersede table properties.
-if (serde.contains("parquet")) {
- val options = relation.tableMeta.properties.filterKeys(isParquetProperty) ++
-relation.tableMeta.storage.properties + (ParquetOptions.MERGE_SCHEMA ->
- conf.getConf(HiveUtils.CONVERT_METASTORE_PARQUET_WITH_SCHEMA_MERGING).toString)
- sessionCatalog.metastoreCatalog
-.convertToLogicalRelation(relation, options, classOf[ParquetFileFormat], "parquet")
-} else {
- val options = relation.tableMeta.properties.filterKeys(isOrcProperty) ++
-relation.tableMeta.storage.properties
- if (conf.getConf(SQLConf.ORC_IMPLEMENTATION) == "native") {
-sessionCatalog.metastoreCatalog.convertToLogicalRelation(
- relation,
- options,
- classOf[org.apache.spark.sql.execution.datasources.orc.OrcFileFormat],
- "orc")
- } else {
-sessionCatalog.metastoreCatalog.convertToLogicalRelation(
- relation,
- options,
- classOf[org.apache.spark.sql.hive.orc.OrcFileFormat],
- "orc")
- }
-}
+ private def isConvertible(tableMeta: CatalogTable): Boolean = {
+val serde = tableMeta.storage.serde.getOrElse("").toLowerCase(Locale.ROOT)
+serde.contains("parquet") && SQLConf.get.getConf(HiveUtils.CONVERT_METASTORE_PARQUET) ||
+ serde.contains("orc") && SQLConf.get.getConf(HiveUtils.CONVERT_METASTORE_ORC)
 }
+ private val metastoreCatalog = sessionCatalog.metastoreCatalog
+
 override def apply(plan: LogicalPlan): LogicalPlan = {
 plan resolveOperators {
 // Write path
 case InsertIntoTable(r: HiveTableRelation, partition, query, overwrite, ifPartitionNotExists)
 // Inserting into partitioned table is not supported in Parquet/Orc data source (yet).
 if query.resolved && DDLUtils.isHiveTable(r.tableMeta) && !r.isPartitioned && isConvertible(r) =>
-InsertIntoTable(convert(r), partition, query, overwrite, ifPartitionNotExists)
+InsertIntoTable(metastoreCatalog.convert(r), partition,
+ query, overwrite, ifPartitionNotExists)
 // Read path
 case relation: HiveTableRelation
 if DDLUtils.isHiveTable(relation.tableMeta) && isConvertible(relation) =>
-convert(relation)
+metastoreCatalog.convert(relation)
+
+ // CTAS
+ case CreateTable(tableDesc, mode, Some(query))
+ if DDLUtils.isHiveTable(tableDesc) && tableDesc.partitionColumnNames.isEmpty &&
+isConvertible(tableDesc) =>
--- End diff --
We usually don't write a migration guide for perf optimizations. Otherwise it's annoying to write one for each optimization and ask users to turn it off if something goes wrong. I think we only do that when there are known issues.
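The check at the center of this diff decides whether a Hive table is handed to the native Parquet or ORC reader, based on the serde name and a per-format flag. A minimal Spark-free sketch of that logic follows; `Conf` and its two fields are hypothetical stand-ins for the `HiveUtils.CONVERT_METASTORE_PARQUET` and `HiveUtils.CONVERT_METASTORE_ORC` settings and are not the real API.

```scala
import java.util.Locale

// Simplified stand-in for the isConvertible check in the diff; names are hypothetical.
object ConvertibleSketch {
  // Assumed flags: one switch per file format, mirroring the two conf entries in the diff.
  final case class Conf(convertMetastoreParquet: Boolean, convertMetastoreOrc: Boolean)

  def isConvertible(serde: Option[String], conf: Conf): Boolean = {
    // A missing serde is treated as the empty string, so it never matches.
    val name = serde.getOrElse("").toLowerCase(Locale.ROOT)
    (name.contains("parquet") && conf.convertMetastoreParquet) ||
      (name.contains("orc") && conf.convertMetastoreOrc)
  }
}
```

In this sketch, turning a flag off makes the check return false for that format, which is the kind of switch a user would flip if the converted path misbehaves.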
[GitHub] spark issue #23224: [MINOR][SQL][TEST] WholeStageCodegen metrics should be t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23224 **[Test build #99699 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99699/testReport)** for PR 23224 at commit [`021728c`](https://github.com/apache/spark/commit/021728ccc70cf971592c560cfc5492dedbdc362a).
[GitHub] spark issue #23224: [MINOR][SQL][TEST] WholeStageCodegen metrics should be t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23224 Can one of the admins verify this patch?
[GitHub] spark pull request #23224: [MINOR][SQL][TEST] WholeStageCodegen metrics shou...
GitHub user seancxmao opened a pull request: https://github.com/apache/spark/pull/23224
[MINOR][SQL][TEST] WholeStageCodegen metrics should be tested with whole-stage codegen enabled
## What changes were proposed in this pull request?
In `org.apache.spark.sql.execution.metric.SQLMetricsSuite`, there's a test case named "WholeStageCodegen metrics". However, it is executed with whole-stage codegen disabled. This PR fixes this by enabling whole-stage codegen for this test case.
## How was this patch tested?
Tested locally using existing test cases.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/seancxmao/spark codegen-metrics
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/23224.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #23224
commit 021728ccc70cf971592c560cfc5492dedbdc362a
Author: seancxmao
Date: 2018-12-05T06:28:02Z
[MINOR][SQL][TEST] WholeStageCodegen metrics should be tested with whole-stage codegen enabled
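The fix described here amounts to running the test body with a configuration key switched on and restoring the previous value afterwards. The sketch below shows that scoping pattern in plain Scala; `ConfScope`, `settings`, and `withConf` are hypothetical names, and the map is only a stand-in for a real session configuration. The key `spark.sql.codegen.wholeStage` is taken from the log output earlier in this thread.

```scala
import scala.collection.mutable

// Hypothetical, Spark-free stand-in for a "run this body with a config set" test helper.
object ConfScope {
  // Assumed starting state: whole-stage codegen is off, as in the test before the fix.
  val settings: mutable.Map[String, String] =
    mutable.Map("spark.sql.codegen.wholeStage" -> "false")

  def withConf[T](key: String, value: String)(body: => T): T = {
    val previous = settings.get(key) // remember the old value, if any
    settings(key) = value
    try {
      body
    } finally {
      // Restore the earlier state even if the body throws.
      previous match {
        case Some(old) => settings(key) = old
        case None => settings.remove(key)
      }
    }
  }
}
```

A test body wrapped in `withConf("spark.sql.codegen.wholeStage", "true") { ... }` would see the flag enabled only for its own duration, so other tests in the suite are unaffected.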
[GitHub] spark pull request #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/23108#discussion_r238944485
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala ---
@@ -186,6 +186,82 @@ abstract class OrcSuite extends OrcTest with BeforeAndAfterAll {
 }
 }
+
+ protected def testORCTableLocation(isConvertMetastore: Boolean): Unit = {
--- End diff --
Since this test helper function is only used in `HiveOrcSourceSuite`, can we move this into `HiveOrcSourceSuite`?
[GitHub] spark issue #23211: [SPARK-19712][SQL] Move PullupCorrelatedPredicates and R...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/23211 @wangyum Thanks. Can you please tell me how you generated this? Also, is it possible to get the runtimes of these queries to see if there are any regressions?
[GitHub] spark pull request #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/23108#discussion_r238944132
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -2370,4 +2370,51 @@ class HiveDDLSuite
 ))
 }
 }
+
+ test("SPARK-25993 Add test cases for resolution of Parquet table location") {
+withTempPath { path =>
+val someDF1 = Seq((1, 1, "parq1"), (2, 2, "parq2")).toDF("c1", "c2", "c3").repartition(1)
+withTable("tbl1", "tbl2", "tbl3") {
+val dataDir = s"${path.getCanonicalPath}/l3/l2/l1/"
+val parentDir = s"${path.getCanonicalPath}/l3/l2/"
+val l3Dir = s"${path.getCanonicalPath}/l3/"
+val wildcardParentDir = new File(s"${path}/l3/l2/*").toURI
+val wildcardL3Dir = new File(s"${path}/l3/*").toURI
+someDF1.write.parquet(dataDir)
+val parentDirStatement =
+ s"""
+ |CREATE EXTERNAL TABLE tbl1(
+ | c1 int,
+ | c2 int,
+ | c3 string)
+ |STORED AS parquet
+ |LOCATION '${parentDir}'""".stripMargin
+sql(parentDirStatement)
+checkAnswer(sql("select * from tbl1"), Nil)
+
+val wildcardStatement =
+ s"""
+ |CREATE EXTERNAL TABLE tbl2(
+ | c1 int,
+ | c2 int,
+ | c3 string)
+ |STORED AS parquet
+ |LOCATION '${wildcardParentDir}'""".stripMargin
+sql(wildcardStatement)
+checkAnswer(sql("select * from tbl2"),
+ (1 to 2).map(i => Row(i, i, s"parq$i")))
+
+val wildcardL3Statement =
+s"""
--- End diff --
indentation?
[GitHub] spark pull request #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/23108#discussion_r238944067
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -2370,4 +2370,51 @@ class HiveDDLSuite
 ))
 }
 }
+
+ test("SPARK-25993 Add test cases for resolution of Parquet table location") {
+withTempPath { path =>
+val someDF1 = Seq((1, 1, "parq1"), (2, 2, "parq2")).toDF("c1", "c2", "c3").repartition(1)
--- End diff --
Indentation.
[GitHub] spark pull request #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/23108#discussion_r238944097
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -2370,4 +2370,51 @@ class HiveDDLSuite
 ))
 }
 }
+
+ test("SPARK-25993 Add test cases for resolution of Parquet table location") {
+withTempPath { path =>
+val someDF1 = Seq((1, 1, "parq1"), (2, 2, "parq2")).toDF("c1", "c2", "c3").repartition(1)
+withTable("tbl1", "tbl2", "tbl3") {
+val dataDir = s"${path.getCanonicalPath}/l3/l2/l1/"
--- End diff --
indentation?
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22683 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99695/ Test FAILed.
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22683 Build finished. Test FAILed.
[GitHub] spark pull request #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/23108#discussion_r238943983
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -2370,4 +2370,51 @@ class HiveDDLSuite
 ))
 }
 }
+
+ test("SPARK-25993 Add test cases for resolution of Parquet table location") {
--- End diff --
Also, for the full test coverage, can we have the following combination like ORC, too?
```
Seq(true, false).foreach { convertMetastore =>
```
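The suggested `Seq(true, false).foreach` loop registers the same test once per flag value, so both the converted and the non-converted path are covered. A minimal sketch of the pattern without a test framework is below; `register` is a hypothetical stand-in for a framework's `test(name) { ... }` call, and the test name is the one proposed elsewhere in this review.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch: `register` stands in for a test framework's test(name) { ... }.
object ConvertMetastoreCases {
  val registered: ArrayBuffer[String] = ArrayBuffer.empty[String]

  def register(name: String)(body: => Unit): Unit = {
    registered += name
    body
  }

  // One test per flag value; the value is part of the name so failures are easy to tell apart.
  Seq(true, false).foreach { convertMetastore =>
    register(s"CREATE EXTERNAL TABLE with subdirectories - convertMetastore=$convertMetastore") {
      // A real test would set the conversion flag to `convertMetastore` and run its checks here.
    }
  }
}
```

Putting the flag value into the test name is a design choice rather than a requirement; it keeps the two generated tests distinguishable in reports.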
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22683
**[Test build #99695 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99695/testReport)** for PR 22683 at commit [`235b2fb`](https://github.com/apache/spark/commit/235b2fbf20dae9c7a2177992b24765085fb2f221).
* This patch **fails Spark unit tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark pull request #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/23108#discussion_r238943694
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala ---
@@ -190,4 +190,12 @@ class HiveOrcSourceSuite extends OrcSuite with TestHiveSingleton {
 }
 }
 }
+
+ test("SPARK-25993 Add test cases for resolution of ORC table location") {
--- End diff --
Please change this to `CREATE EXTERNAL TABLE with subdirectories`, too.
[GitHub] spark pull request #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/23108#discussion_r238943607
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -2370,4 +2370,51 @@ class HiveDDLSuite
 ))
 }
 }
+
+ test("SPARK-25993 Add test cases for resolution of Parquet table location") {
--- End diff --
Also, let's replace the test case name with `CREATE EXTERNAL TABLE with subdirectories`.
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22683 Build finished. Test FAILed.
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22683 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99696/ Test FAILed.
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22683
**[Test build #99696 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99696/testReport)** for PR 22683 at commit [`4c4674e`](https://github.com/apache/spark/commit/4c4674e1abfa28a01d733f4ae60039410e769fc8).
* This patch **fails Spark unit tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark pull request #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/23108#discussion_r238943270
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -2370,4 +2370,51 @@ class HiveDDLSuite
 ))
 }
 }
+
+ test("SPARK-25993 Add test cases for resolution of Parquet table location") {
--- End diff --
Maybe `HiveParquetSourceSuite`? That's the one similar to `OrcSourceSuite`.
[GitHub] spark issue #23223: [SPARK-26269][YARN]Yarnallocator should have same blackl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23223 Can one of the admins verify this patch?
[GitHub] spark issue #23223: Yarnallocator should have same blacklist behaviour with ...
Github user Ngone51 commented on the issue: https://github.com/apache/spark/pull/23223 ping @attilapiros @vanzin @jerryshao for a kind review.
[GitHub] spark issue #23223: Yarnallocator should have same blacklist behaviour with ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23223 Can one of the admins verify this patch?
[GitHub] spark pull request #23223: Yarnallocator should have same blacklist behaviou...
GitHub user Ngone51 opened a pull request: https://github.com/apache/spark/pull/23223
Yarnallocator should have same blacklist behaviour with yarn to maxmize use of cluster resource
## What changes were proposed in this pull request?
As I mentioned in jira [SPARK-26269](https://issues.apache.org/jira/browse/SPARK-26269), in order to maximize the use of cluster resources, this PR tries to make `YarnAllocator` have the same blacklist behaviour as YARN.
## How was this patch tested?
Added.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/Ngone51/spark dev-YarnAllocator-should-have-same-blacklist-behaviour-with-YARN
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/23223.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #23223
commit 9f88e1c22876e4cdb1a0a6e952930e76f3206e96
Author: wuyi
Date: 2018-12-04T16:17:35Z
YarnAllocator should have same blacklist behaviour with YARN
commit 65a70dcbb7993731104deab2592a5b969a31414e
Author: Ngone51
Date: 2018-12-05T06:11:06Z
fix ut
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22683 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99693/ Test FAILed.
[GitHub] spark issue #23211: [SPARK-19712][SQL] Move PullupCorrelatedPredicates and R...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/23211 I generated the TPC-DS plans to compare the differences after this patch to help review: https://github.com/wangyum/spark/commit/7e7a1fe24e8970830c67f80604ce238caa035b85#diff-1a4e6beba801fa647e1dcbd61ed7e5bf
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22683 Build finished. Test FAILed.
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22683
**[Test build #99693 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99693/testReport)** for PR 22683 at commit [`8f11891`](https://github.com/apache/spark/commit/8f11891396d47ee9f404283e30922f9f16bc612a).
* This patch **fails Spark unit tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22514 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99694/ Test PASSed.
[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22514 Merged build finished. Test PASSed.
[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22514
**[Test build #99694 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99694/testReport)** for PR 22514 at commit [`57fc943`](https://github.com/apache/spark/commit/57fc94383ad3c66e5b93f40378d8c94aaa726e7a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23213 Merged build finished. Test PASSed.
[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23213 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99692/ Test PASSed.
[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23213
**[Test build #99692 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99692/testReport)** for PR 23213 at commit [`808af50`](https://github.com/apache/spark/commit/808af50d756583bd69b7dd7ca1e1ae09d2457b41).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #23222: [SPARK-20636] Add the rule TransposeWindow to the optimi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23222 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5747/ Test PASSed.
[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22514#discussion_r238933039 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala --- @@ -181,62 +180,39 @@ case class RelationConversions( conf: SQLConf, sessionCatalog: HiveSessionCatalog) extends Rule[LogicalPlan] { private def isConvertible(relation: HiveTableRelation): Boolean = { -val serde = relation.tableMeta.storage.serde.getOrElse("").toLowerCase(Locale.ROOT) -serde.contains("parquet") && conf.getConf(HiveUtils.CONVERT_METASTORE_PARQUET) || - serde.contains("orc") && conf.getConf(HiveUtils.CONVERT_METASTORE_ORC) +isConvertible(relation.tableMeta) } - // Return true for Apache ORC and Hive ORC-related configuration names. - // Note that Spark doesn't support configurations like `hive.merge.orcfile.stripe.level`. - private def isOrcProperty(key: String) = -key.startsWith("orc.") || key.contains(".orc.") - - private def isParquetProperty(key: String) = -key.startsWith("parquet.") || key.contains(".parquet.") - - private def convert(relation: HiveTableRelation): LogicalRelation = { -val serde = relation.tableMeta.storage.serde.getOrElse("").toLowerCase(Locale.ROOT) - -// Consider table and storage properties. For properties existing in both sides, storage -// properties will supersede table properties. 
-if (serde.contains("parquet")) { - val options = relation.tableMeta.properties.filterKeys(isParquetProperty) ++ -relation.tableMeta.storage.properties + (ParquetOptions.MERGE_SCHEMA -> - conf.getConf(HiveUtils.CONVERT_METASTORE_PARQUET_WITH_SCHEMA_MERGING).toString) - sessionCatalog.metastoreCatalog -.convertToLogicalRelation(relation, options, classOf[ParquetFileFormat], "parquet") -} else { - val options = relation.tableMeta.properties.filterKeys(isOrcProperty) ++ -relation.tableMeta.storage.properties - if (conf.getConf(SQLConf.ORC_IMPLEMENTATION) == "native") { -sessionCatalog.metastoreCatalog.convertToLogicalRelation( - relation, - options, - classOf[org.apache.spark.sql.execution.datasources.orc.OrcFileFormat], - "orc") - } else { -sessionCatalog.metastoreCatalog.convertToLogicalRelation( - relation, - options, - classOf[org.apache.spark.sql.hive.orc.OrcFileFormat], - "orc") - } -} + private def isConvertible(tableMeta: CatalogTable): Boolean = { +val serde = tableMeta.storage.serde.getOrElse("").toLowerCase(Locale.ROOT) +serde.contains("parquet") && SQLConf.get.getConf(HiveUtils.CONVERT_METASTORE_PARQUET) || + serde.contains("orc") && SQLConf.get.getConf(HiveUtils.CONVERT_METASTORE_ORC) } + private val metastoreCatalog = sessionCatalog.metastoreCatalog + override def apply(plan: LogicalPlan): LogicalPlan = { plan resolveOperators { // Write path case InsertIntoTable(r: HiveTableRelation, partition, query, overwrite, ifPartitionNotExists) // Inserting into partitioned table is not supported in Parquet/Orc data source (yet). 
if query.resolved && DDLUtils.isHiveTable(r.tableMeta) && !r.isPartitioned && isConvertible(r) => -InsertIntoTable(convert(r), partition, query, overwrite, ifPartitionNotExists) +InsertIntoTable(metastoreCatalog.convert(r), partition, + query, overwrite, ifPartitionNotExists) // Read path case relation: HiveTableRelation if DDLUtils.isHiveTable(relation.tableMeta) && isConvertible(relation) => -convert(relation) +metastoreCatalog.convert(relation) + + // CTAS + case CreateTable(tableDesc, mode, Some(query)) + if DDLUtils.isHiveTable(tableDesc) && tableDesc.partitionColumnNames.isEmpty && +isConvertible(tableDesc) => --- End diff -- Since the regression was already introduced, we need to add a conf and migration guide.
[GitHub] spark issue #23222: [SPARK-20636] Add the rule TransposeWindow to the optimi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23222 Merged build finished. Test PASSed.
[GitHub] spark issue #23222: [SPARK-20636] Add the rule TransposeWindow to the optimi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23222 **[Test build #99698 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99698/testReport)** for PR 23222 at commit [`1270e89`](https://github.com/apache/spark/commit/1270e89026d80c862137c03edbeee53e56f3ed6d).
[GitHub] spark issue #23222: [SPARK-20636] Add the rule TransposeWindow to the optimi...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/23222 cc @ptkool @jiangxb1987 @cloud-fan
[GitHub] spark pull request #23222: [SPARK-20636] Add the rule TransposeWindow to the...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/23222 [SPARK-20636] Add the rule TransposeWindow to the optimization batch ## What changes were proposed in this pull request? This PR is a follow-up of the PR https://github.com/apache/spark/pull/17899. It adds the rule to the optimizer batch. ## How was this patch tested? The existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/gatorsmile/spark followupSPARK-20636 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23222.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23222 commit 1270e89026d80c862137c03edbeee53e56f3ed6d Author: gatorsmile Date: 2018-12-05T05:07:00Z add the rule TransposeWindow to the batch
[GitHub] spark issue #23120: [SPARK-26151][SQL] Return partial results for bad CSV re...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23120 Hi @MaxGekk, since this changes the result (although it makes it better), do you mind adding a migration guide? Thanks!
[GitHub] spark issue #22721: [SPARK-19784][SPARK-25403][SQL] Refresh the table even t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22721 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99690/ Test PASSed.
[GitHub] spark issue #22721: [SPARK-19784][SPARK-25403][SQL] Refresh the table even t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22721 Merged build finished. Test PASSed.
[GitHub] spark issue #22721: [SPARK-19784][SPARK-25403][SQL] Refresh the table even t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22721 **[Test build #99690 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99690/testReport)** for PR 22721 at commit [`c91c154`](https://github.com/apache/spark/commit/c91c15493b30e49e81fbf9097b37bf0b4bdafc79). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23108 Merged build finished. Test PASSed.
[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23108 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99688/ Test PASSed.
[GitHub] spark issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERN...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23108 **[Test build #99688 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99688/testReport)** for PR 23108 at commit [`fe472c8`](https://github.com/apache/spark/commit/fe472c81a21700ff52c84808437b85d02d6871ed). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23221: [SPARK-24243][CORE] Expose exceptions from InProcessAppH...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23221 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99691/ Test FAILed.
[GitHub] spark issue #23221: [SPARK-24243][CORE] Expose exceptions from InProcessAppH...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23221 Merged build finished. Test FAILed.
[GitHub] spark issue #23221: [SPARK-24243][CORE] Expose exceptions from InProcessAppH...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23221 **[Test build #99691 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99691/testReport)** for PR 23221 at commit [`e58fc91`](https://github.com/apache/spark/commit/e58fc919355c48d2d3b1cacb4d0ee18036cacbc6). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23203 Merged build finished. Test PASSed.
[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23203 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99697/ Test PASSed.
[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23203 **[Test build #99697 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99697/testReport)** for PR 23203 at commit [`bd23e01`](https://github.com/apache/spark/commit/bd23e01078deb90bcdba654ff82047603a462b2e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...
Github user DaveDeCaprio commented on the issue: https://github.com/apache/spark/pull/23169 @HeartSaVioR I added tests for the default case and for a truncated plan.
[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23169 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99686/ Test PASSed.
[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23169 Merged build finished. Test PASSed.
[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23169 **[Test build #99686 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99686/testReport)** for PR 23169 at commit [`22fe117`](https://github.com/apache/spark/commit/22fe117656ea004757efaffd847f81dc01df8433). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23203 Merged build finished. Test PASSed.
[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23203 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5746/ Test PASSed.
[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23203 **[Test build #99697 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99697/testReport)** for PR 23203 at commit [`bd23e01`](https://github.com/apache/spark/commit/bd23e01078deb90bcdba654ff82047603a462b2e).
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23088 Thanks @vanzin @srowen
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22683 **[Test build #99696 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99696/testReport)** for PR 22683 at commit [`4c4674e`](https://github.com/apache/spark/commit/4c4674e1abfa28a01d733f4ae60039410e769fc8).
[GitHub] spark issue #22721: [SPARK-19784][SPARK-25403][SQL] Refresh the table even t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22721 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99689/ Test FAILed.
[GitHub] spark issue #22721: [SPARK-19784][SPARK-25403][SQL] Refresh the table even t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22721 Merged build finished. Test FAILed.
[GitHub] spark issue #22721: [SPARK-19784][SPARK-25403][SQL] Refresh the table even t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22721 **[Test build #99689 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99689/testReport)** for PR 22721 at commit [`c601b67`](https://github.com/apache/spark/commit/c601b674ec1c0e288c0b3852dcdb511c64bfa6a5). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22514 Merged build finished. Test PASSed.
[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22514 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5745/ Test PASSed.
[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22514 **[Test build #99694 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99694/testReport)** for PR 22514 at commit [`57fc943`](https://github.com/apache/spark/commit/57fc94383ad3c66e5b93f40378d8c94aaa726e7a).
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22683 **[Test build #99695 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99695/testReport)** for PR 22683 at commit [`235b2fb`](https://github.com/apache/spark/commit/235b2fbf20dae9c7a2177992b24765085fb2f221).
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22683 **[Test build #99693 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99693/testReport)** for PR 22683 at commit [`8f11891`](https://github.com/apache/spark/commit/8f11891396d47ee9f404283e30922f9f16bc612a).
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r238909822 --- Diff: core/src/main/scala/org/apache/spark/shuffle/metrics.scala --- @@ -50,3 +50,57 @@ private[spark] trait ShuffleWriteMetricsReporter { private[spark] def decBytesWritten(v: Long): Unit private[spark] def decRecordsWritten(v: Long): Unit } + + +/** + * A proxy class of ShuffleWriteMetricsReporter which proxy all metrics updating to the input + * reporters. + */ +private[spark] class GroupedShuffleWriteMetricsReporter( --- End diff -- For the write metrics, it's different: here the default reporter calls the SQL one, which means hacking the default reporter to register external reporters. Maybe we should not change the read side and just create a special `PairShuffleWriteMetricsReporter` that updates both the SQL reporter and the default reporter. Another idea is for `ShuffleDependency` to carry a `reporter => reporter` function instead of a reporter. Then we can create a SQL reporter which takes another reporter (similar to the read side), and put the SQL reporter's constructor in `ShuffleDependency`.
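The two ideas in the comment above can be sketched roughly as follows. This is only an illustration under stated assumptions: `WriteMetricsReporter`, `CountingReporter`, `PairWriteMetricsReporter`, `DependencySketch` and `ReporterDemo` are hypothetical, trimmed-down stand-ins, not Spark's actual `ShuffleWriteMetricsReporter` or `ShuffleDependency`, and only two of the reporter methods are modelled.

```scala
// Hypothetical, trimmed-down stand-in for a shuffle write metrics reporter.
// Assumption: the real Spark trait is private[spark] and has more methods.
trait WriteMetricsReporter {
  def incBytesWritten(v: Long): Unit
  def incRecordsWritten(v: Long): Unit
}

// A simple reporter that accumulates counters in memory.
final class CountingReporter extends WriteMetricsReporter {
  var bytes: Long = 0L
  var records: Long = 0L
  override def incBytesWritten(v: Long): Unit = bytes += v
  override def incRecordsWritten(v: Long): Unit = records += v
}

// Idea 1: a pair reporter forwards every update to both underlying reporters.
final class PairWriteMetricsReporter(
    first: WriteMetricsReporter,
    second: WriteMetricsReporter) extends WriteMetricsReporter {
  override def incBytesWritten(v: Long): Unit = {
    first.incBytesWritten(v)
    second.incBytesWritten(v)
  }
  override def incRecordsWritten(v: Long): Unit = {
    first.incRecordsWritten(v)
    second.incRecordsWritten(v)
  }
}

// Idea 2: the dependency carries a `reporter => reporter` function, so the
// default reporter is wrapped only when the task creates it.
final case class DependencySketch(
    wrapReporter: WriteMetricsReporter => WriteMetricsReporter)

object ReporterDemo {
  // Returns (default bytes, default records, sql bytes, sql records).
  def run(): (Long, Long, Long, Long) = {
    val default = new CountingReporter
    val sql = new CountingReporter
    val dep = DependencySketch(d => new PairWriteMetricsReporter(sql, d))
    val reporter = dep.wrapReporter(default)
    reporter.incBytesWritten(128L)
    reporter.incRecordsWritten(2L)
    (default.bytes, default.records, sql.bytes, sql.records)
  }
}
```

In this sketch the function-based variant keeps the default reporter unaware of the SQL one; whether that fits the real `ShuffleDependency` is left open by the discussion.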
[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22514#discussion_r238909363 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala --- @@ -181,62 +180,39 @@ case class RelationConversions( conf: SQLConf, sessionCatalog: HiveSessionCatalog) extends Rule[LogicalPlan] { private def isConvertible(relation: HiveTableRelation): Boolean = { -val serde = relation.tableMeta.storage.serde.getOrElse("").toLowerCase(Locale.ROOT) -serde.contains("parquet") && conf.getConf(HiveUtils.CONVERT_METASTORE_PARQUET) || - serde.contains("orc") && conf.getConf(HiveUtils.CONVERT_METASTORE_ORC) +isConvertible(relation.tableMeta) } - // Return true for Apache ORC and Hive ORC-related configuration names. - // Note that Spark doesn't support configurations like `hive.merge.orcfile.stripe.level`. - private def isOrcProperty(key: String) = -key.startsWith("orc.") || key.contains(".orc.") - - private def isParquetProperty(key: String) = -key.startsWith("parquet.") || key.contains(".parquet.") - - private def convert(relation: HiveTableRelation): LogicalRelation = { -val serde = relation.tableMeta.storage.serde.getOrElse("").toLowerCase(Locale.ROOT) - -// Consider table and storage properties. For properties existing in both sides, storage -// properties will supersede table properties. 
-if (serde.contains("parquet")) { - val options = relation.tableMeta.properties.filterKeys(isParquetProperty) ++ -relation.tableMeta.storage.properties + (ParquetOptions.MERGE_SCHEMA -> - conf.getConf(HiveUtils.CONVERT_METASTORE_PARQUET_WITH_SCHEMA_MERGING).toString) - sessionCatalog.metastoreCatalog -.convertToLogicalRelation(relation, options, classOf[ParquetFileFormat], "parquet") -} else { - val options = relation.tableMeta.properties.filterKeys(isOrcProperty) ++ -relation.tableMeta.storage.properties - if (conf.getConf(SQLConf.ORC_IMPLEMENTATION) == "native") { -sessionCatalog.metastoreCatalog.convertToLogicalRelation( - relation, - options, - classOf[org.apache.spark.sql.execution.datasources.orc.OrcFileFormat], - "orc") - } else { -sessionCatalog.metastoreCatalog.convertToLogicalRelation( - relation, - options, - classOf[org.apache.spark.sql.hive.orc.OrcFileFormat], - "orc") - } -} + private def isConvertible(tableMeta: CatalogTable): Boolean = { +val serde = tableMeta.storage.serde.getOrElse("").toLowerCase(Locale.ROOT) +serde.contains("parquet") && SQLConf.get.getConf(HiveUtils.CONVERT_METASTORE_PARQUET) || + serde.contains("orc") && SQLConf.get.getConf(HiveUtils.CONVERT_METASTORE_ORC) } + private val metastoreCatalog = sessionCatalog.metastoreCatalog + override def apply(plan: LogicalPlan): LogicalPlan = { plan resolveOperators { // Write path case InsertIntoTable(r: HiveTableRelation, partition, query, overwrite, ifPartitionNotExists) // Inserting into partitioned table is not supported in Parquet/Orc data source (yet). 
if query.resolved && DDLUtils.isHiveTable(r.tableMeta) && !r.isPartitioned && isConvertible(r) => -InsertIntoTable(convert(r), partition, query, overwrite, ifPartitionNotExists) +InsertIntoTable(metastoreCatalog.convert(r), partition, + query, overwrite, ifPartitionNotExists) // Read path case relation: HiveTableRelation if DDLUtils.isHiveTable(relation.tableMeta) && isConvertible(relation) => -convert(relation) +metastoreCatalog.convert(relation) + + // CTAS + case CreateTable(tableDesc, mode, Some(query)) + if DDLUtils.isHiveTable(tableDesc) && tableDesc.partitionColumnNames.isEmpty && +isConvertible(tableDesc) => --- End diff -- ok.
[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22514#discussion_r238908877 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala --- @@ -181,62 +180,39 @@ case class RelationConversions( conf: SQLConf, sessionCatalog: HiveSessionCatalog) extends Rule[LogicalPlan] { private def isConvertible(relation: HiveTableRelation): Boolean = { -val serde = relation.tableMeta.storage.serde.getOrElse("").toLowerCase(Locale.ROOT) -serde.contains("parquet") && conf.getConf(HiveUtils.CONVERT_METASTORE_PARQUET) || - serde.contains("orc") && conf.getConf(HiveUtils.CONVERT_METASTORE_ORC) +isConvertible(relation.tableMeta) } - // Return true for Apache ORC and Hive ORC-related configuration names. - // Note that Spark doesn't support configurations like `hive.merge.orcfile.stripe.level`. - private def isOrcProperty(key: String) = -key.startsWith("orc.") || key.contains(".orc.") - - private def isParquetProperty(key: String) = -key.startsWith("parquet.") || key.contains(".parquet.") - - private def convert(relation: HiveTableRelation): LogicalRelation = { -val serde = relation.tableMeta.storage.serde.getOrElse("").toLowerCase(Locale.ROOT) - -// Consider table and storage properties. For properties existing in both sides, storage -// properties will supersede table properties. 
-if (serde.contains("parquet")) { - val options = relation.tableMeta.properties.filterKeys(isParquetProperty) ++ -relation.tableMeta.storage.properties + (ParquetOptions.MERGE_SCHEMA -> - conf.getConf(HiveUtils.CONVERT_METASTORE_PARQUET_WITH_SCHEMA_MERGING).toString) - sessionCatalog.metastoreCatalog -.convertToLogicalRelation(relation, options, classOf[ParquetFileFormat], "parquet") -} else { - val options = relation.tableMeta.properties.filterKeys(isOrcProperty) ++ -relation.tableMeta.storage.properties - if (conf.getConf(SQLConf.ORC_IMPLEMENTATION) == "native") { -sessionCatalog.metastoreCatalog.convertToLogicalRelation( - relation, - options, - classOf[org.apache.spark.sql.execution.datasources.orc.OrcFileFormat], - "orc") - } else { -sessionCatalog.metastoreCatalog.convertToLogicalRelation( - relation, - options, - classOf[org.apache.spark.sql.hive.orc.OrcFileFormat], - "orc") - } -} + private def isConvertible(tableMeta: CatalogTable): Boolean = { +val serde = tableMeta.storage.serde.getOrElse("").toLowerCase(Locale.ROOT) +serde.contains("parquet") && SQLConf.get.getConf(HiveUtils.CONVERT_METASTORE_PARQUET) || + serde.contains("orc") && SQLConf.get.getConf(HiveUtils.CONVERT_METASTORE_ORC) } + private val metastoreCatalog = sessionCatalog.metastoreCatalog + override def apply(plan: LogicalPlan): LogicalPlan = { plan resolveOperators { // Write path case InsertIntoTable(r: HiveTableRelation, partition, query, overwrite, ifPartitionNotExists) // Inserting into partitioned table is not supported in Parquet/Orc data source (yet). 
if query.resolved && DDLUtils.isHiveTable(r.tableMeta) && !r.isPartitioned && isConvertible(r) => -InsertIntoTable(convert(r), partition, query, overwrite, ifPartitionNotExists) +InsertIntoTable(metastoreCatalog.convert(r), partition, + query, overwrite, ifPartitionNotExists) // Read path case relation: HiveTableRelation if DDLUtils.isHiveTable(relation.tableMeta) && isConvertible(relation) => -convert(relation) +metastoreCatalog.convert(relation) + + // CTAS + case CreateTable(tableDesc, mode, Some(query)) + if DDLUtils.isHiveTable(tableDesc) && tableDesc.partitionColumnNames.isEmpty && +isConvertible(tableDesc) => --- End diff -- I don't mind adding `HiveUtils.CONVERT_METASTORE_ORC_CTAS`; maybe we can do it in a follow-up?
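As a rough illustration of gating the CTAS conversion behind its own flag, here is a minimal sketch that does not depend on Spark. `ConversionConf`, `ConversionSketch` and the single `convertCtas` flag are hypothetical names chosen for this example; the thread only names `HiveUtils.CONVERT_METASTORE_ORC_CTAS` as a possible conf, and the real rule works on `CatalogTable` and `SQLConf` rather than on these stand-ins.

```scala
import java.util.Locale

// Hypothetical flags standing in for the existing Parquet/ORC conversion
// settings plus a separate CTAS switch (an assumption, not the final conf).
final case class ConversionConf(
    convertParquet: Boolean,
    convertOrc: Boolean,
    convertCtas: Boolean)

object ConversionSketch {
  // Mirrors the serde check in the diff: a table is convertible when its
  // serde names Parquet or ORC and the matching flag is enabled.
  def isConvertible(serde: Option[String], conf: ConversionConf): Boolean = {
    val s = serde.getOrElse("").toLowerCase(Locale.ROOT)
    (s.contains("parquet") && conf.convertParquet) ||
      (s.contains("orc") && conf.convertOrc)
  }

  // CTAS is converted only when the extra flag is on, the target table is
  // unpartitioned, and the serde itself is convertible.
  def isConvertibleCtas(
      serde: Option[String],
      partitionColumns: Seq[String],
      conf: ConversionConf): Boolean =
    conf.convertCtas && partitionColumns.isEmpty && isConvertible(serde, conf)
}
```

With such a flag, users who relied on the previous CTAS behaviour could switch the conversion off, which is the kind of escape hatch a migration guide entry would describe.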
[GitHub] spark issue #23210: [SPARK-26233][SQL] CheckOverflow when encoding a decimal...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23210 a late LGTM
[GitHub] spark issue #23218: [SPARK-26266][BUILD] Update to Scala 2.12.8
Github user srowen commented on the issue: https://github.com/apache/spark/pull/23218 Hm, one failure was due to a JVM crash, but it failed twice consistently, with sbt just exiting with status 134. No other failures are logged. Not sure what to make of that!
[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23213 **[Test build #99692 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99692/testReport)** for PR 23213 at commit [`808af50`](https://github.com/apache/spark/commit/808af50d756583bd69b7dd7ca1e1ae09d2457b41).
[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23213 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5744/ Test PASSed.
[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23213 Merged build finished. Test PASSed.
[GitHub] spark issue #21486: [SPARK-24387][Core] Heartbeat-timeout executor is added ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21486 Can one of the admins verify this patch?
[GitHub] spark pull request #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixe...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/23213#discussion_r238905795 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala --- @@ -53,6 +55,133 @@ class ExplainSuite extends QueryTest with SharedSQLContext { checkKeywordsExistsInExplain(df, keywords = "InMemoryRelation", "StorageLevel(disk, memory, deserialized, 1 replicas)") } + + test("optimized plan should show the rewritten aggregate expression") { +withTempView("test_agg") { + sql( +""" + |CREATE TEMPORARY VIEW test_agg AS SELECT * FROM VALUES + | (1, true), (1, false), + | (2, true), + | (3, false), (3, null), + | (4, null), (4, null), + | (5, null), (5, true), (5, false) AS test_agg(k, v) +""".stripMargin) + + // simple explain of queries having every/some/any aggregates. Optimized + // plan should show the rewritten aggregate expression. + val df = sql("SELECT k, every(v), some(v), any(v) FROM test_agg GROUP BY k") + checkKeywordsExistsInExplain(df, +"Aggregate [k#x], [k#x, min(v#x) AS every(v)#x, max(v#x) AS some(v)#x, " + --- End diff -- I forgot to set `extended` to true in explain...
[GitHub] spark pull request #23216: [SPARK-26264][CORE]It is better to add @transient...
Github user 10110346 closed the pull request at: https://github.com/apache/spark/pull/23216
[GitHub] spark issue #23216: [SPARK-26264][CORE]It is better to add @transient to fie...
Github user 10110346 commented on the issue: https://github.com/apache/spark/pull/23216 Ok, I will close this PR, thank you very much
[GitHub] spark issue #23216: [SPARK-26264][CORE]It is better to add @transient to fie...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/23216 I think we should just leave it. The `@transient` on `ShuffleMapTask`'s `locs` is merely superfluous here; I'm not sure it's worth changing.
[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22514#discussion_r238902415

    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ---
    @@ -181,62 +180,39 @@ case class RelationConversions(
         conf: SQLConf,
         sessionCatalog: HiveSessionCatalog) extends Rule[LogicalPlan] {
       private def isConvertible(relation: HiveTableRelation): Boolean = {
    -    val serde = relation.tableMeta.storage.serde.getOrElse("").toLowerCase(Locale.ROOT)
    -    serde.contains("parquet") && conf.getConf(HiveUtils.CONVERT_METASTORE_PARQUET) ||
    -      serde.contains("orc") && conf.getConf(HiveUtils.CONVERT_METASTORE_ORC)
    +    isConvertible(relation.tableMeta)
       }

    -  // Return true for Apache ORC and Hive ORC-related configuration names.
    -  // Note that Spark doesn't support configurations like `hive.merge.orcfile.stripe.level`.
    -  private def isOrcProperty(key: String) =
    -    key.startsWith("orc.") || key.contains(".orc.")
    -
    -  private def isParquetProperty(key: String) =
    -    key.startsWith("parquet.") || key.contains(".parquet.")
    -
    -  private def convert(relation: HiveTableRelation): LogicalRelation = {
    -    val serde = relation.tableMeta.storage.serde.getOrElse("").toLowerCase(Locale.ROOT)
    -
    -    // Consider table and storage properties. For properties existing in both sides, storage
    -    // properties will supersede table properties.
    -    if (serde.contains("parquet")) {
    -      val options = relation.tableMeta.properties.filterKeys(isParquetProperty) ++
    -        relation.tableMeta.storage.properties + (ParquetOptions.MERGE_SCHEMA ->
    -        conf.getConf(HiveUtils.CONVERT_METASTORE_PARQUET_WITH_SCHEMA_MERGING).toString)
    -      sessionCatalog.metastoreCatalog
    -        .convertToLogicalRelation(relation, options, classOf[ParquetFileFormat], "parquet")
    -    } else {
    -      val options = relation.tableMeta.properties.filterKeys(isOrcProperty) ++
    -        relation.tableMeta.storage.properties
    -      if (conf.getConf(SQLConf.ORC_IMPLEMENTATION) == "native") {
    -        sessionCatalog.metastoreCatalog.convertToLogicalRelation(
    -          relation,
    -          options,
    -          classOf[org.apache.spark.sql.execution.datasources.orc.OrcFileFormat],
    -          "orc")
    -      } else {
    -        sessionCatalog.metastoreCatalog.convertToLogicalRelation(
    -          relation,
    -          options,
    -          classOf[org.apache.spark.sql.hive.orc.OrcFileFormat],
    -          "orc")
    -      }
    -    }
    +  private def isConvertible(tableMeta: CatalogTable): Boolean = {
    +    val serde = tableMeta.storage.serde.getOrElse("").toLowerCase(Locale.ROOT)
    +    serde.contains("parquet") && SQLConf.get.getConf(HiveUtils.CONVERT_METASTORE_PARQUET) ||
    +      serde.contains("orc") && SQLConf.get.getConf(HiveUtils.CONVERT_METASTORE_ORC)
       }

    +  private val metastoreCatalog = sessionCatalog.metastoreCatalog
    +
       override def apply(plan: LogicalPlan): LogicalPlan = {
         plan resolveOperators {
           // Write path
           case InsertIntoTable(r: HiveTableRelation, partition, query, overwrite, ifPartitionNotExists)
             // Inserting into partitioned table is not supported in Parquet/Orc data source (yet).
               if query.resolved && DDLUtils.isHiveTable(r.tableMeta) &&
                 !r.isPartitioned && isConvertible(r) =>
    -        InsertIntoTable(convert(r), partition, query, overwrite, ifPartitionNotExists)
    +        InsertIntoTable(metastoreCatalog.convert(r), partition,
    +          query, overwrite, ifPartitionNotExists)

           // Read path
           case relation: HiveTableRelation
               if DDLUtils.isHiveTable(relation.tableMeta) && isConvertible(relation) =>
    -        convert(relation)
    +        metastoreCatalog.convert(relation)
    +
    +      // CTAS
    +      case CreateTable(tableDesc, mode, Some(query))
    +          if DDLUtils.isHiveTable(tableDesc) && tableDesc.partitionColumnNames.isEmpty &&
    +            isConvertible(tableDesc) =>
    --- End diff --

hmm, the optimization is already controlled by configs like `HiveUtils.CONVERT_METASTORE_ORC` and `HiveUtils.CONVERT_METASTORE_PARQUET`. Do we need another config for it?
[GitHub] spark pull request #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat i...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23217
[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23217 thanks, merging to master!
[GitHub] spark pull request #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixe...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/23213#discussion_r238899777

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
    @@ -2899,6 +2899,144 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
           }
         }
       }
    +
    +  private def checkKeywordsExistsInExplain(df: DataFrame, keywords: String*): Unit = {
    +    val output = new java.io.ByteArrayOutputStream()
    +    Console.withOut(output) {
    +      df.explain(extended = true)
    +    }
    +    val normalizedOutput = output.toString.replaceAll("#\\d+", "#x")
    +    for (key <- keywords) {
    +      assert(normalizedOutput.contains(key))
    +    }
    +  }
    +
    +  test("optimized plan should show the rewritten aggregate expression") {
    --- End diff --

updated! Thanks, guys!
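The normalization step in `checkKeywordsExistsInExplain` can be tried without a Spark session. The sketch below applies the same regex from the diff to a plain string. The object name, the `missingKeywords` helper, and the sample plan text are hypothetical and exist only to show the idea: expression IDs such as `#12` differ between runs, so they are rewritten to `#x` before matching.

```scala
// Hypothetical stand-alone sketch; only the regex is taken from the diff.
object ExplainNormalizeSketch {
  // Replace run-specific expression IDs ("#12", "#345", ...) with a stable "#x".
  def normalizeExprIds(plan: String): String = plan.replaceAll("#\\d+", "#x")

  // Returns the keywords that do NOT occur in the normalized plan text.
  def missingKeywords(plan: String, keywords: String*): Seq[String] = {
    val normalized = normalizeExprIds(plan)
    keywords.filterNot(k => normalized.contains(k))
  }
}
```

Returning the missing keywords instead of asserting inside the loop is a design choice of this sketch only; it makes a failure message list everything that was absent at once.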
[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22514#discussion_r238899698

    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ---
    @@ -181,62 +180,39 @@ case class RelationConversions(
         conf: SQLConf,
         sessionCatalog: HiveSessionCatalog) extends Rule[LogicalPlan] {
       private def isConvertible(relation: HiveTableRelation): Boolean = {
    -    val serde = relation.tableMeta.storage.serde.getOrElse("").toLowerCase(Locale.ROOT)
    -    serde.contains("parquet") && conf.getConf(HiveUtils.CONVERT_METASTORE_PARQUET) ||
    -      serde.contains("orc") && conf.getConf(HiveUtils.CONVERT_METASTORE_ORC)
    +    isConvertible(relation.tableMeta)
       }

    -  // Return true for Apache ORC and Hive ORC-related configuration names.
    -  // Note that Spark doesn't support configurations like `hive.merge.orcfile.stripe.level`.
    -  private def isOrcProperty(key: String) =
    -    key.startsWith("orc.") || key.contains(".orc.")
    -
    -  private def isParquetProperty(key: String) =
    -    key.startsWith("parquet.") || key.contains(".parquet.")
    -
    -  private def convert(relation: HiveTableRelation): LogicalRelation = {
    -    val serde = relation.tableMeta.storage.serde.getOrElse("").toLowerCase(Locale.ROOT)
    -
    -    // Consider table and storage properties. For properties existing in both sides, storage
    -    // properties will supersede table properties.
    -    if (serde.contains("parquet")) {
    -      val options = relation.tableMeta.properties.filterKeys(isParquetProperty) ++
    -        relation.tableMeta.storage.properties + (ParquetOptions.MERGE_SCHEMA ->
    -        conf.getConf(HiveUtils.CONVERT_METASTORE_PARQUET_WITH_SCHEMA_MERGING).toString)
    -      sessionCatalog.metastoreCatalog
    -        .convertToLogicalRelation(relation, options, classOf[ParquetFileFormat], "parquet")
    -    } else {
    -      val options = relation.tableMeta.properties.filterKeys(isOrcProperty) ++
    -        relation.tableMeta.storage.properties
    -      if (conf.getConf(SQLConf.ORC_IMPLEMENTATION) == "native") {
    -        sessionCatalog.metastoreCatalog.convertToLogicalRelation(
    -          relation,
    -          options,
    -          classOf[org.apache.spark.sql.execution.datasources.orc.OrcFileFormat],
    -          "orc")
    -      } else {
    -        sessionCatalog.metastoreCatalog.convertToLogicalRelation(
    -          relation,
    -          options,
    -          classOf[org.apache.spark.sql.hive.orc.OrcFileFormat],
    -          "orc")
    -      }
    -    }
    +  private def isConvertible(tableMeta: CatalogTable): Boolean = {
    +    val serde = tableMeta.storage.serde.getOrElse("").toLowerCase(Locale.ROOT)
    +    serde.contains("parquet") && SQLConf.get.getConf(HiveUtils.CONVERT_METASTORE_PARQUET) ||
    +      serde.contains("orc") && SQLConf.get.getConf(HiveUtils.CONVERT_METASTORE_ORC)
       }

    +  private val metastoreCatalog = sessionCatalog.metastoreCatalog
    +
       override def apply(plan: LogicalPlan): LogicalPlan = {
         plan resolveOperators {
           // Write path
           case InsertIntoTable(r: HiveTableRelation, partition, query, overwrite, ifPartitionNotExists)
             // Inserting into partitioned table is not supported in Parquet/Orc data source (yet).
               if query.resolved && DDLUtils.isHiveTable(r.tableMeta) &&
                 !r.isPartitioned && isConvertible(r) =>
    -        InsertIntoTable(convert(r), partition, query, overwrite, ifPartitionNotExists)
    +        InsertIntoTable(metastoreCatalog.convert(r), partition,
    +          query, overwrite, ifPartitionNotExists)

           // Read path
           case relation: HiveTableRelation
               if DDLUtils.isHiveTable(relation.tableMeta) && isConvertible(relation) =>
    -        convert(relation)
    +        metastoreCatalog.convert(relation)
    +
    +      // CTAS
    +      case CreateTable(tableDesc, mode, Some(query))
    +          if DDLUtils.isHiveTable(tableDesc) && tableDesc.partitionColumnNames.isEmpty &&
    +            isConvertible(tableDesc) =>
    --- End diff --

It's not a new optimization... It's an optimization we dropped in 2.3 by mistake. I'm fine with adding a config with default value `true`.
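To make the proposal concrete, here is a rough stand-alone sketch of how an extra switch with default `true` could gate the CTAS case on top of the existing serde check. Everything here is hypothetical: `Flags`, `convertCtas`, and `shouldConvertCtas` are made-up names, no such Spark config is confirmed by this thread, and the real rule works on `CatalogTable` and `SQLConf` rather than on plain values.

```scala
import java.util.Locale

// Hypothetical model, not Spark code.
object CtasConversionSketch {
  // convertParquet / convertOrc stand in for the two existing HiveUtils settings;
  // convertCtas is the additional switch discussed above, defaulting to true.
  final case class Flags(convertParquet: Boolean, convertOrc: Boolean, convertCtas: Boolean = true)

  // Same shape as isConvertible in the diff. Note that `&&` binds tighter than `||`,
  // so this reads as (parquet && flag) || (orc && flag).
  def isConvertible(serde: Option[String], flags: Flags): Boolean = {
    val s = serde.getOrElse("").toLowerCase(Locale.ROOT)
    s.contains("parquet") && flags.convertParquet || s.contains("orc") && flags.convertOrc
  }

  // CTAS is converted only for non-partitioned tables, and only if the new switch is on.
  def shouldConvertCtas(serde: Option[String], partitioned: Boolean, flags: Flags): Boolean =
    flags.convertCtas && !partitioned && isConvertible(serde, flags)
}
```

With the default left untouched, behaviour matches the existing two settings alone; setting `convertCtas = false` turns off only the CTAS path and leaves the read and insert paths governed by the existing settings.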
[GitHub] spark issue #23216: [SPARK-26264][CORE]It is better to add @transient to fie...
Github user 10110346 commented on the issue: https://github.com/apache/spark/pull/23216

> Are you sure it's even a field in the class? it looks like it's only used to define this:
>
> ```
> @transient private[this] val preferredLocs: Seq[TaskLocation] = {
>   if (locs == null) Nil else locs.toSet.toSeq
> }
> ```
>
> I'd expect Scala would not generate a field. Indeed the thing it is used to make is transient.

Yeah, it would not generate a field, thanks @srowen

By the way, is it better to remove `@transient` for `ShuffleMapTask`?
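The point about field generation can be checked with reflection. The following is a small sketch under the assumption (true of the Scala 2.x compilers I am aware of, but worth verifying on your own version) that a plain constructor parameter used only during construction is not kept as a field, while one read from a method is. The class names are hypothetical and `String` stands in for `TaskLocation`.

```scala
import java.lang.reflect.Modifier

// Hypothetical classes for illustration; not the real Spark task classes.
object TransientFieldSketch {
  // `locs` is only read while initializing `preferredLocs`, i.e. inside the constructor.
  class OnlyInInitializer(locs: Seq[String]) extends Serializable {
    @transient private[this] val preferredLocs: Seq[String] =
      if (locs == null) Nil else locs.toSet.toSeq
    def count: Int = preferredLocs.size
  }

  // Here `locs` is read from a method, so it has to survive construction as a field.
  class UsedInMethod(locs: Seq[String]) extends Serializable {
    def count: Int = if (locs == null) 0 else locs.size
  }

  // Names of the fields the compiler actually emitted for a class.
  def fieldNames(c: Class[_]): Set[String] = c.getDeclaredFields.map(_.getName).toSet

  // True if the class has a field with this name that carries the JVM `transient` modifier.
  def isTransientField(c: Class[_], name: String): Boolean =
    c.getDeclaredFields.exists(f => f.getName == name && Modifier.isTransient(f.getModifiers))
}
```

If the assumption holds, `fieldNames(classOf[TransientFieldSketch.OnlyInInitializer])` contains `preferredLocs` but not `locs`, which is why an annotation on such a parameter has nothing to attach to.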