[GitHub] spark pull request #16417: [SPARK-19014][SQL] support complex aggregate buff...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16417#discussion_r94109096 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java --- @@ -201,6 +210,25 @@ public void setNullAt(int i) { Platform.putLong(baseObject, getFieldOffset(i), 0); } + public void setNullData(int ordinal) { --- End diff -- Might be confused with `setNullAt`, add a short comment for this? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
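The reviewer's point above is that `setNullAt` and the new `setNullData` are easy to confuse. Below is a toy Java sketch of one plausible distinction, based only on the visible diff (where `setNullAt` also zeroes the field's 8-byte slot): the assumption that `setNullData` flips the null bit without clearing the slot is a guess, and all names and semantics here are illustrative, not the PR's actual `UnsafeRow` code.

```java
import java.util.BitSet;

// A toy fixed-width row: a null bitset plus one 8-byte slot per field.
// This only illustrates the naming concern from the review; setNullData's
// exact behavior here is an assumption, not the PR's real semantics.
class ToyRow {
    private final BitSet nullBits;
    private final long[] slots;

    ToyRow(int numFields) {
        nullBits = new BitSet(numFields);
        slots = new long[numFields];
    }

    void setLong(int i, long v) { nullBits.clear(i); slots[i] = v; }

    // Like UnsafeRow.setNullAt in the diff: mark null AND zero the slot.
    void setNullAt(int i) { nullBits.set(i); slots[i] = 0L; }

    // Hypothetical setNullData: mark null but leave the slot untouched
    // (e.g. the slot might still encode offset/size of out-of-line data).
    void setNullData(int i) { nullBits.set(i); }

    boolean isNullAt(int i) { return nullBits.get(i); }
    long rawSlot(int i) { return slots[i]; }
}
```

If this reading is right, the short doc comment the reviewer asks for would state exactly this contrast: one call scrubs the value region, the other only marks nullability.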
[GitHub] spark issue #16432: [SPARK-19021][YARN] Generailize HDFSCredentialProvider t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16432 **[Test build #70712 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70712/testReport)** for PR 16432 at commit [`0c1ec23`](https://github.com/apache/spark/commit/0c1ec23626b74544e456778262030cee1c623b30).
[GitHub] spark pull request #16432: [SPARK-19021][YARN] Generailize HDFSCredentialPro...
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/16432 [SPARK-19021][YARN] Generailize HDFSCredentialProvider to support non HDFS security filesystems Change-Id: I85d7963c4980cf9660f495f377ba27227da4b1b1 Currently Spark can only get the token renewal interval from secure HDFS (hdfs://). If Spark runs with other secure file systems such as webHDFS (webhdfs://), WASB (wasb://), or ADLS, it ignores those tokens and cannot obtain their renewal intervals, which makes Spark unable to work with those secure clusters. So instead of only checking for the HDFS token, we should generalize the provider to support different `DelegationTokenIdentifier`s. ## How was this patch tested? Manually verified in a secure cluster. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jerryshao/apache-spark SPARK-19021 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16432.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16432 commit 0c1ec23626b74544e456778262030cee1c623b30 Author: jerryshao Date: 2016-12-29T07:30:19Z Generailize HDFSCredentialProvider to support non HDFS security FS Change-Id: I85d7963c4980cf9660f495f377ba27227da4b1b1
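The generalization described above can be pictured with a toy before/after sketch of the matching logic: instead of accepting only the HDFS token kind, accept any delegation-token kind. All token-kind strings and class names below are made up for illustration; the real change works against Hadoop's `Token`/`DelegationTokenIdentifier` API, not this simplified model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the PR's generalization. Kind strings are illustrative.
class TokenRenewal {
    static final Set<String> DELEGATION_KINDS = new HashSet<>(Arrays.asList(
        "HDFS_DELEGATION_TOKEN", "WEBHDFS delegation",
        "WASB delegation", "ADL delegation"));

    // Old behavior: only the hdfs:// token kind is considered for
    // computing a renewal interval; all others are ignored.
    static List<String> hdfsOnly(List<String> tokenKinds) {
        List<String> out = new ArrayList<>();
        for (String k : tokenKinds)
            if (k.equals("HDFS_DELEGATION_TOKEN")) out.add(k);
        return out;
    }

    // Generalized behavior: any recognized delegation token kind counts,
    // so webHDFS / WASB / ADLS tokens also get renewal intervals.
    static List<String> generalized(List<String> tokenKinds) {
        List<String> out = new ArrayList<>();
        for (String k : tokenKinds)
            if (DELEGATION_KINDS.contains(k)) out.add(k);
        return out;
    }
}
```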
[GitHub] spark issue #16415: [SPARK-19007]Speedup and optimize the GradientBoostedTre...
Github user zdh2292390 commented on the issue: https://github.com/apache/spark/pull/16415 @jkbradley There is another problem: `predErrorCheckpointer` does not unpersist the RDDs remaining in its queue after the loop, so two RDDs are left cached once the loop finishes.
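The leak described above can be modeled with a toy checkpointer that, like Spark's `PeriodicRDDCheckpointer`, keeps the last few persisted datasets queued. This is a minimal sketch with illustrative names, not Spark's actual class; it just shows why an explicit cleanup step after the training loop is needed.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Set;

// Toy model of the reported leak: the checkpointer keeps the last two
// persisted datasets in a queue; anything still queued when the loop
// ends stays cached unless it is explicitly unpersisted.
class ToyCheckpointer {
    private final ArrayDeque<Integer> queue = new ArrayDeque<>();
    final Set<Integer> cached = new HashSet<>();

    void update(int rddId) {
        cached.add(rddId);          // persist the new dataset
        queue.addLast(rddId);
        while (queue.size() > 2) {  // keep at most the last two
            cached.remove(queue.removeFirst()); // unpersist the oldest
        }
    }

    // The missing cleanup step: unpersist whatever is still queued.
    void unpersistAll() {
        while (!queue.isEmpty()) cached.remove(queue.removeFirst());
    }
}
```

Running ten loop iterations leaves exactly two datasets cached, matching the comment's observation, until `unpersistAll()` (or its real-world equivalent) runs.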
[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/16422#discussion_r94107668 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -586,6 +587,122 @@ case class DescribeTableCommand( } } +/** + * Command that looks like + * {{{ + * DESCRIBE [EXTENDED|FORMATTED] table_name column_name; + * }}} + */ +case class DescribeColumnCommand( +table: TableIdentifier, +column: String, +isExtended: Boolean, +isFormatted: Boolean) + extends RunnableCommand { + + override val output: Seq[Attribute] = +// Column names are based on Hive. +if (isFormatted) { + Seq( +AttributeReference("col_name", StringType, nullable = false, + new MetadataBuilder().putString("comment", "name of the column").build())(), +AttributeReference("data_type", StringType, nullable = false, + new MetadataBuilder().putString("comment", "data type of the column").build())(), +AttributeReference("min", StringType, nullable = true, + new MetadataBuilder().putString("comment", "min value of the column").build())(), +AttributeReference("max", StringType, nullable = true, + new MetadataBuilder().putString("comment", "max value of the column").build())(), +AttributeReference("num_nulls", StringType, nullable = true, + new MetadataBuilder().putString("comment", "number of nulls of the column").build())(), +AttributeReference("distinct_count", StringType, nullable = true, + new MetadataBuilder().putString("comment", "distinct count of the column").build())(), +AttributeReference("avg_col_len", StringType, nullable = true, + new MetadataBuilder().putString("comment", +"average length of the values of the column").build())(), +AttributeReference("max_col_len", StringType, nullable = true, + new MetadataBuilder().putString("comment", +"max length of the values of the column").build())(), +AttributeReference("comment", StringType, nullable = true, + new MetadataBuilder().putString("comment", "comment of the column").build())()) +} else { 
+ Seq( +AttributeReference("col_name", StringType, nullable = false, + new MetadataBuilder().putString("comment", "name of the column").build())(), +AttributeReference("data_type", StringType, nullable = false, + new MetadataBuilder().putString("comment", "data type of the column").build())(), +AttributeReference("comment", StringType, nullable = true, + new MetadataBuilder().putString("comment", "comment of the column").build())()) +} + + override def run(sparkSession: SparkSession): Seq[Row] = { +val result = new ArrayBuffer[Row] --- End diff -- yea I'll delete it.
[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/16422#discussion_r94107442 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -586,6 +587,122 @@ case class DescribeTableCommand( } } +/** + * Command that looks like + * {{{ + * DESCRIBE [EXTENDED|FORMATTED] table_name column_name; + * }}} + */ +case class DescribeColumnCommand( +table: TableIdentifier, +column: String, +isExtended: Boolean, --- End diff -- I will remove it since the result is the same with or without `isExtended`.
[GitHub] spark issue #16417: [SPARK-19014][SQL] support complex aggregate buffer in H...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16417 Merged build finished. Test PASSed.
[GitHub] spark issue #16417: [SPARK-19014][SQL] support complex aggregate buffer in H...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16417 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70706/
[GitHub] spark issue #16417: [SPARK-19014][SQL] support complex aggregate buffer in H...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16417 **[Test build #70706 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70706/testReport)** for PR 16417 at commit [`8a5fe25`](https://github.com/apache/spark/commit/8a5fe253215eabc1f2686d02e4787c1fc692866f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/16422#discussion_r94106620 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -300,10 +300,12 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder { * Create a [[DescribeTableCommand]] logical plan. */ override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan = withOrigin(ctx) { -// Describe column are not supported yet. Return null and let the parser decide -// what to do with this (create an exception or pass it on to a different system). if (ctx.describeColName != null) { - null + DescribeColumnCommand( --- End diff -- yes we should
[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/16422#discussion_r94106492 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -586,6 +587,122 @@ case class DescribeTableCommand( } } +/** + * Command that looks like + * {{{ + * DESCRIBE [EXTENDED|FORMATTED] table_name column_name; + * }}} + */ +case class DescribeColumnCommand( +table: TableIdentifier, +column: String, +isExtended: Boolean, +isFormatted: Boolean) + extends RunnableCommand { + + override val output: Seq[Attribute] = +// Column names are based on Hive. --- End diff -- I got these names by running Hive. I can't find any documentation about them, but I'll add a link to the corresponding Hive JIRA.
[GitHub] spark issue #16387: [SPARK-18986][Core] ExternalAppendOnlyMap shouldn't fail...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16387 cc @JoshRosen Can you take a look? Thanks.
[GitHub] spark pull request #13758: [SPARK-16043][SQL] Prepare GenericArrayData imple...
Github user kiszk closed the pull request at: https://github.com/apache/spark/pull/13758
[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/13758 The issues this PR addresses have been solved by other approaches (i.e. using `UnsafePrimitiveArray`).
[GitHub] spark issue #16431: [SPARK-19020] [SQL] Cardinality estimation of aggregate ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16431 **[Test build #70710 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70710/testReport)** for PR 16431 at commit [`c064595`](https://github.com/apache/spark/commit/c0645952aef0455ac0ad6cefc6de9e624813c7bc).
[GitHub] spark issue #16233: [SPARK-18801][SQL] Add `View` operator to help resolve a...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/16233 It seems good to know what issues need to be addressed before we can switch to this new approach.
[GitHub] spark issue #16430: [SPARK-17077] [SQL] Cardinality estimation for project o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16430 **[Test build #70711 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70711/testReport)** for PR 16430 at commit [`d222020`](https://github.com/apache/spark/commit/d222020c8b225174724f3ec4ad1b1fee3c0128e8).
[GitHub] spark issue #16431: [SPARK-19020] [SQL] Cardinality estimation of aggregate ...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/16431 cc @rxin @hvanhovell @cloud-fan @srinathshankar
[GitHub] spark pull request #16431: [SPARK-19020] [SQL] Cardinality estimation of agg...
GitHub user wzhfy opened a pull request: https://github.com/apache/spark/pull/16431 [SPARK-19020] [SQL] Cardinality estimation of aggregate operator ## What changes were proposed in this pull request? Support cardinality estimation of the aggregate operator. ## How was this patch tested? Add test cases. You can merge this pull request into a Git repository by running: $ git pull https://github.com/wzhfy/spark aggEstimation Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16431.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16431 commit c0645952aef0455ac0ad6cefc6de9e624813c7bc Author: Zhenhua Wang Date: 2016-12-29T06:34:40Z agg estimation
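The PR itself carries the details, but a common textbook estimate for aggregate cardinality — offered here only as a hedged sketch of what such an estimation might compute, not necessarily the PR's exact formula — caps the output row count by both the child's row count and the product of the grouping columns' distinct counts.

```java
// Hedged sketch of a standard GROUP BY cardinality estimate: the output
// cannot have more rows than the input, nor more rows than the number of
// distinct grouping-key combinations. Not necessarily the PR's formula.
class AggEstimate {
    static long estimate(long childRows, long[] groupColDistinctCounts) {
        if (groupColDistinctCounts.length == 0) return 1; // global aggregate
        long product = 1;
        for (long d : groupColDistinctCounts) {
            product *= d;
            // Crude overflow guard: once the product reaches (or wraps
            // past) the child row count, the input count is the cap.
            if (product >= childRows || product <= 0) return childRows;
        }
        return Math.min(childRows, product);
    }
}
```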
[GitHub] spark issue #16371: [SPARK-18932][SQL] Support partial aggregation for colle...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16371 **[Test build #70709 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70709/testReport)** for PR 16371 at commit [`965a4a9`](https://github.com/apache/spark/commit/965a4a93ddc79f4f34a7a32a3b3ef560ebb08665).
[GitHub] spark issue #16228: [SPARK-17076] [SQL] Cardinality estimation for join base...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16228 **[Test build #70708 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70708/testReport)** for PR 16228 at commit [`df839f8`](https://github.com/apache/spark/commit/df839f8d6143bd45f82b1beecf8ab151abe8ea35).
[GitHub] spark issue #16371: [SPARK-18932][SQL] Support partial aggregation for colle...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16371 **[Test build #70707 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70707/testReport)** for PR 16371 at commit [`f60e824`](https://github.com/apache/spark/commit/f60e824e6913edb21ae5e0fa6207dcaccdf8de20).
[GitHub] spark issue #16233: [SPARK-18801][SQL] Add `View` operator to help resolve a...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/16233 Changes look good to me. @gatorsmile @hvanhovell what do you think?
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r94104622 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -510,32 +510,93 @@ class Analyzer( * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog. */ object ResolveRelations extends Rule[LogicalPlan] { -private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = { + +// If the unresolved relation is running directly on files, we just return the original +// UnresolvedRelation, the plan will get resolved later. Else we look up the table from catalog +// and change the default database name if it is a view. +// We usually look up a table from the default database if the table identifier has an empty +// database part, for a view the default database should be the currentDb when the view was +// created. When the case comes to resolving a nested view, the view may have different default +// database with that the referenced view has, so we need to use the variable `defaultDatabase` +// to track the current default database. +// When the relation we resolve is a view, we fetch the view.desc(which is a CatalogTable), and +// then set the value of `CatalogTable.viewDefaultDatabase` to the variable `defaultDatabase`, +// we look up the relations that the view references using the default database. +// For example: +// |- view1 (defaultDatabase = db1) +// |- operator +// |- table2 (defaultDatabase = db1) +// |- view2 (defaultDatabase = db2) +//|- view3 (defaultDatabase = db3) +// |- view4 (defaultDatabase = db4) +// In this case, the view `view1` is a nested view, it directly references `table2`、`view2` +// and `view4`, the view `view2` references `view3`. On resolving the table, we look up the +// relations `table2`、`view2`、`view4` using the default database `db1`, and look up the +// relation `view3` using the default database `db2`. 
+// +// Note this is compatible with the views defined by older versions of Spark(before 2.2), which +// have empty defaultDatabase and all the relations in viewText have database part defined. +def resolveRelation( +plan: LogicalPlan, +defaultDatabase: Option[String] = None): LogicalPlan = plan match { + case u @ UnresolvedRelation(table: TableIdentifier, _) if isRunningDirectlyOnFiles(table) => +u + case u: UnresolvedRelation => +val relation = lookupTableFromCatalog(u, defaultDatabase) +resolveRelation(relation, defaultDatabase) + // Hive support is required to resolve a persistent view, the logical plan returned by + // catalog.lookupRelation() should be: --- End diff -- oh, we are not parsing view text when we do not have hive support.
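The resolution rule described in the quoted comment — each relation is looked up with its enclosing view's default database, and the default switches when resolution enters a nested view — can be sketched as a toy resolver. Running it on the `view1`..`view4` example from the comment reproduces the databases listed there. This is an illustration only, not the Analyzer's actual code.

```java
import java.util.*;

// Toy model of nested-view resolution: relations referenced by a view
// are resolved against that view's default database.
class ViewResolver {
    // view name -> its default database (set when the view was created)
    final Map<String, String> viewDefaultDb = new HashMap<>();
    // view name -> the relations its view text references
    final Map<String, List<String>> viewRefs = new HashMap<>();
    // resolved relation -> the database used to look it up
    final Map<String, String> resolvedWith = new LinkedHashMap<>();

    void resolve(String relation, String defaultDb) {
        resolvedWith.put(relation, defaultDb);
        if (viewDefaultDb.containsKey(relation)) {
            // Entering a nested view: switch to that view's default db
            // for everything its definition references.
            String nestedDb = viewDefaultDb.get(relation);
            List<String> refs =
                viewRefs.getOrDefault(relation, Collections.emptyList());
            for (String ref : refs) resolve(ref, nestedDb);
        }
    }
}
```

With the tree from the comment, `table2`, `view2`, and `view4` come out resolved with `db1`, and `view3` with `db2` — matching the behavior the comment describes.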
[GitHub] spark pull request #16371: [SPARK-18932][SQL] Support partial aggregation fo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16371#discussion_r94104643 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala --- @@ -29,12 +30,10 @@ import org.apache.spark.sql.types._ /** * The Collect aggregate function collects all seen expression values into a list of values. * - * The operator is bound to the slower sort based aggregation path because the number of - * elements (and their memory usage) can not be determined in advance. This also means that the - * collected elements are stored on heap, and that too many elements can cause GC pauses and - * eventually Out of Memory Errors. + * We have to store all the collected elements in memory, and that too many elements can cause GC + * paused and eventually OutOfMemory Errors. */ -abstract class Collect extends ImperativeAggregate { +abstract class Collect[T] extends TypedImperativeAggregate[T] { --- End diff -- yeah, that's a good idea.
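For context on what the diff is doing: a `TypedImperativeAggregate` lets the aggregation buffer be an arbitrary Java object, updated per input row and merged across partial aggregates. Below is a toy collect_list-style buffer as a hedged sketch of the idea — a simplification, not Spark's actual contract or method signatures — which also shows why, as the diff's comment warns, everything stays on the heap.

```java
import java.util.ArrayList;
import java.util.List;

// Toy collect_list-style typed aggregate: the buffer is a plain Java
// list, so very large groups risk GC pressure and eventually OOM.
class CollectListAgg {
    List<Object> createBuffer() { return new ArrayList<>(); }

    void update(List<Object> buffer, Object input) {
        if (input != null) buffer.add(input); // nulls are typically skipped
    }

    void merge(List<Object> buffer, List<Object> other) {
        buffer.addAll(other); // combine partial aggregation results
    }

    List<Object> eval(List<Object> buffer) { return buffer; }
}
```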
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r94103444 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -510,32 +510,93 @@ class Analyzer( * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog. */ object ResolveRelations extends Rule[LogicalPlan] { -private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = { + +// If the unresolved relation is running directly on files, we just return the original +// UnresolvedRelation, the plan will get resolved later. Else we look up the table from catalog +// and change the default database name if it is a view. +// We usually look up a table from the default database if the table identifier has an empty +// database part, for a view the default database should be the currentDb when the view was +// created. When the case comes to resolving a nested view, the view may have different default --- End diff -- oh, nvm, even if we have `database.viewname`, dbname and viewname will be inside the table identifier.
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r94103404 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -510,32 +510,93 @@ class Analyzer( * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog. */ object ResolveRelations extends Rule[LogicalPlan] { -private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = { + +// If the unresolved relation is running directly on files, we just return the original +// UnresolvedRelation, the plan will get resolved later. Else we look up the table from catalog +// and change the default database name if it is a view. +// We usually look up a table from the default database if the table identifier has an empty +// database part, for a view the default database should be the currentDb when the view was +// created. When the case comes to resolving a nested view, the view may have different default --- End diff -- btw, will we allow cases like `CREATE VIEW database.viewname`?
[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14158 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70704/
[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14158 Merged build finished. Test PASSed.
[GitHub] spark pull request #15996: [SPARK-18567][SQL] Simplify CreateDataSourceTable...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15996
[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14158 **[Test build #70704 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70704/testReport)** for PR 14158 at commit [`83cbb58`](https://github.com/apache/spark/commit/83cbb588320d945d34dfd28e49eba90bcf63b064).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15996: [SPARK-18567][SQL] Simplify CreateDataSourceTableAsSelec...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/15996 LGTM. Merging to master.
[GitHub] spark issue #16430: [SPARK-17077] [SQL] Cardinality estimation for project o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16430 Merged build finished. Test PASSed.
[GitHub] spark issue #16430: [SPARK-17077] [SQL] Cardinality estimation for project o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16430 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70703/ Test PASSed.
[GitHub] spark issue #16430: [SPARK-17077] [SQL] Cardinality estimation for project o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16430 **[Test build #70703 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70703/testReport)** for PR 16430 at commit [`12a48fa`](https://github.com/apache/spark/commit/12a48fa528bfcd8119251cd585bcef9dd555d3a3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/16233#discussion_r94102704
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -510,32 +510,93 @@ class Analyzer(
    * Replaces [[UnresolvedRelation]]s with concrete relations from the catalog.
    */
   object ResolveRelations extends Rule[LogicalPlan] {
-    private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan = {
+
+    // If the unresolved relation is running directly on files, we just return the original
+    // UnresolvedRelation, the plan will get resolved later. Else we look up the table from catalog
+    // and change the default database name if it is a view.
+    // We usually look up a table from the default database if the table identifier has an empty
+    // database part, for a view the default database should be the currentDb when the view was
+    // created. When the case comes to resolving a nested view, the view may have a different default
+    // database from the one the referenced view has, so we need to use the variable `defaultDatabase`
+    // to track the current default database.
+    // When the relation we resolve is a view, we fetch the view.desc (which is a CatalogTable), and
+    // then set the value of `CatalogTable.viewDefaultDatabase` to the variable `defaultDatabase`,
+    // we look up the relations that the view references using the default database.
+    // For example:
+    // |- view1 (defaultDatabase = db1)
+    //   |- operator
+    //     |- table2 (defaultDatabase = db1)
+    //     |- view2 (defaultDatabase = db2)
+    //       |- view3 (defaultDatabase = db3)
+    //   |- view4 (defaultDatabase = db4)
+    // In this case, the view `view1` is a nested view, it directly references `table2`, `view2`
+    // and `view4`, the view `view2` references `view3`. On resolving the table, we look up the
+    // relations `table2`, `view2`, `view4` using the default database `db1`, and look up the
+    // relation `view3` using the default database `db2`.
+    //
+    // Note this is compatible with the views defined by older versions of Spark (before 2.2), which
+    // have an empty defaultDatabase and all the relations in viewText have the database part defined.
+    def resolveRelation(
+        plan: LogicalPlan,
+        defaultDatabase: Option[String] = None): LogicalPlan = plan match {
+      case u @ UnresolvedRelation(table: TableIdentifier, _) if isRunningDirectlyOnFiles(table) =>
+        u
+      case u: UnresolvedRelation =>
+        val relation = lookupTableFromCatalog(u, defaultDatabase)
+        resolveRelation(relation, defaultDatabase)
+      // Hive support is required to resolve a persistent view, the logical plan returned by
+      // catalog.lookupRelation() should be:
--- End diff --
Sorry. I probably missed something. A persistent view is stored in the external catalog. So, we can always have persistent views, right? (we have an InMemoryCatalog, which is another external catalog. It is not very useful though. But it is still an external catalog)
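The default-database threading described in the comment above can be sketched outside of Spark. The following is a toy model, not the Analyzer's actual code: `Relation`, `TableRef`, `ViewDef`, and `lookups` are all invented names, and the real rule walks `LogicalPlan` nodes against a `SessionCatalog`. It only illustrates how each view's `defaultDatabase` qualifies the unqualified relations it references, matching the `view1`/`db1` example in the quoted diff.

```scala
// Toy model of nested-view resolution. All names here are hypothetical;
// Spark's real rule operates on LogicalPlan nodes and the SessionCatalog.
sealed trait Relation
case class TableRef(db: Option[String], name: String) extends Relation
case class ViewDef(name: String, defaultDatabase: String, refs: List[Relation]) extends Relation

object NestedViewResolution {
  // Qualify every relation in the tree. An unqualified reference picks up the
  // default database of the view that *contains* it, so recursing into a view
  // switches `defaultDb` to that view's own defaultDatabase.
  def lookups(rel: Relation, defaultDb: String): List[String] = rel match {
    case TableRef(db, name)        => List(s"${db.getOrElse(defaultDb)}.$name")
    case ViewDef(name, viewDb, rs) => s"$defaultDb.$name" :: rs.flatMap(lookups(_, viewDb))
  }

  def main(args: Array[String]): Unit = {
    // Mirrors the tree in the review comment: view1 (db1) references table2,
    // view2 (db2) and view4 (db4); view2 references view3 (db3).
    val view3 = ViewDef("view3", "db3", List(TableRef(None, "t3")))
    val view2 = ViewDef("view2", "db2", List(view3))
    val view4 = ViewDef("view4", "db4", List(TableRef(None, "t4")))
    val view1 = ViewDef("view1", "db1", List(TableRef(None, "table2"), view2, view4))

    // table2, view2 and view4 are qualified with db1; view3 with db2.
    println(lookups(view1, "db1").mkString(", "))
    // → db1.view1, db1.table2, db1.view2, db2.view3, db3.t3, db1.view4, db4.t4
  }
}
```

This also shows why pre-2.2 views stay compatible: when every reference already carries an explicit database part, the `defaultDb` parameter is never consulted.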
[GitHub] spark pull request #16371: [SPARK-18932][SQL] Support partial aggregation fo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16371#discussion_r94102016
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala ---
@@ -44,39 +43,52 @@ abstract class Collect extends ImperativeAggregate {
 
   override def dataType: DataType = ArrayType(child.dataType)
 
-  override def supportsPartial: Boolean = false
-
-  override def aggBufferAttributes: Seq[AttributeReference] = Nil
-
-  override def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes)
-
-  override def inputAggBufferAttributes: Seq[AttributeReference] = Nil
-
   // Both `CollectList` and `CollectSet` are non-deterministic since their results depend on the
   // actual order of input rows.
   override def deterministic: Boolean = false
 
-  protected[this] val buffer: Growable[Any] with Iterable[Any]
-
-  override def initialize(b: InternalRow): Unit = {
-    buffer.clear()
-  }
-
-  override def update(b: InternalRow, input: InternalRow): Unit = {
-    // Do not allow null values. We follow the semantics of Hive's collect_list/collect_set here.
-    // See: org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMkCollectionEvaluator
-    val value = child.eval(input)
-    if (value != null) {
-      buffer += value
+  protected def _serialize(obj: Iterable[Any]): Array[Byte] = {
--- End diff --
Also, the serialized format (child.dataType, LongType) of `Percentile` is different from that of `CollectList` and `CollectSet` (i.e., child.dataType).
[GitHub] spark issue #14461: [SPARK-16856] [WEBUI] [CORE] Link the application's exec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14461 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70698/ Test PASSed.
[GitHub] spark issue #14461: [SPARK-16856] [WEBUI] [CORE] Link the application's exec...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14461 Merged build finished. Test PASSed.
[GitHub] spark issue #14461: [SPARK-16856] [WEBUI] [CORE] Link the application's exec...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14461 **[Test build #70698 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70698/testReport)** for PR 14461 at commit [`d598e55`](https://github.com/apache/spark/commit/d598e55cbc60b53319742cca7b3843e155f68221).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16422#discussion_r94101640
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -586,6 +587,122 @@ case class DescribeTableCommand(
   }
 }
+
+/**
+ * Command that looks like
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+    table: TableIdentifier,
+    column: String,
+    isExtended: Boolean,
+    isFormatted: Boolean)
+  extends RunnableCommand {
+
+  override val output: Seq[Attribute] =
+    // Column names are based on Hive.
+    if (isFormatted) {
+      Seq(
+        AttributeReference("col_name", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "name of the column").build())(),
+        AttributeReference("data_type", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "data type of the column").build())(),
+        AttributeReference("min", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "min value of the column").build())(),
+        AttributeReference("max", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "max value of the column").build())(),
+        AttributeReference("num_nulls", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "number of nulls of the column").build())(),
+        AttributeReference("distinct_count", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "distinct count of the column").build())(),
+        AttributeReference("avg_col_len", StringType, nullable = true,
+          new MetadataBuilder().putString("comment",
+            "average length of the values of the column").build())(),
+        AttributeReference("max_col_len", StringType, nullable = true,
+          new MetadataBuilder().putString("comment",
+            "max length of the values of the column").build())(),
+        AttributeReference("comment", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "comment of the column").build())())
+    } else {
+      Seq(
+        AttributeReference("col_name", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "name of the column").build())(),
+        AttributeReference("data_type", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "data type of the column").build())(),
+        AttributeReference("comment", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "comment of the column").build())())
+    }
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val result = new ArrayBuffer[Row]
+    val catalog = sparkSession.sessionState.catalog
+    // Get the attribute referring to the given column
+    val attribute = sparkSession.sessionState.executePlan(
--- End diff --
I don't get it, so you get the attribute just for the column comment, name and data type? I think `CatalogTable.schema` already has this information.
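To make the formatted-vs-basic output split in the diff above concrete, here is a hedged sketch in plain Scala. `ColStats` and `describeColumnRow` are invented names for illustration only; the real command builds `AttributeReference`s and reads its statistics from the catalog, and printing missing statistics as the string "NULL" is an assumption borrowed from Hive's behavior, not taken from the diff.

```scala
// Hypothetical stand-in for catalog column statistics; not Spark's real classes.
case class ColStats(min: Option[String], max: Option[String], numNulls: Option[Long],
                    distinctCount: Option[Long], avgLen: Option[Double], maxLen: Option[Long])

object DescribeColumn {
  // Basic output: col_name, data_type, comment.
  // Formatted output inserts min, max, num_nulls, distinct_count, avg_col_len,
  // max_col_len before the comment, printing "NULL" for unavailable statistics.
  def describeColumnRow(
      name: String, dataType: String, comment: Option[String],
      stats: Option[ColStats], formatted: Boolean): Seq[String] = {
    def orNull[A](v: Option[A]): String = v.map(_.toString).getOrElse("NULL")
    if (formatted) {
      Seq(name, dataType,
        orNull(stats.flatMap(_.min)), orNull(stats.flatMap(_.max)),
        orNull(stats.flatMap(_.numNulls)), orNull(stats.flatMap(_.distinctCount)),
        orNull(stats.flatMap(_.avgLen)), orNull(stats.flatMap(_.maxLen)),
        orNull(comment))
    } else {
      Seq(name, dataType, orNull(comment))
    }
  }
}
```

The point of the two branches is the same as in the command: the result row always has exactly as many fields as the declared `output` schema, three when basic and nine when formatted.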
[GitHub] spark pull request #15664: [SPARK-18123][SQL] Use db column names instead of...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15664#discussion_r94101576
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala ---
@@ -211,6 +211,55 @@ object JdbcUtils extends Logging {
   }
 
   /**
+   * Returns the schema if the table already exists in the JDBC database.
+   */
+  def getSchema(conn: Connection, url: String, table: String): Option[StructType] = {
+    val dialect = JdbcDialects.get(url)
+
+    try {
+      val statement = conn.prepareStatement(dialect.getSchemaQuery(table))
+      try {
+        Some(getSchema(statement.executeQuery(), dialect))
--- End diff --
Then, we can keep the existing way. See https://github.com/apache/spark/compare/master...gatorsmile:pr-15664Changed1
[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16422#discussion_r94101475
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -586,6 +587,122 @@ case class DescribeTableCommand(
   }
 }
+
+/**
+ * Command that looks like
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+    table: TableIdentifier,
+    column: String,
+    isExtended: Boolean,
+    isFormatted: Boolean)
+  extends RunnableCommand {
+
+  override val output: Seq[Attribute] =
+    // Column names are based on Hive.
+    if (isFormatted) {
+      Seq(
+        AttributeReference("col_name", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "name of the column").build())(),
+        AttributeReference("data_type", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "data type of the column").build())(),
+        AttributeReference("min", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "min value of the column").build())(),
+        AttributeReference("max", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "max value of the column").build())(),
+        AttributeReference("num_nulls", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "number of nulls of the column").build())(),
+        AttributeReference("distinct_count", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "distinct count of the column").build())(),
+        AttributeReference("avg_col_len", StringType, nullable = true,
+          new MetadataBuilder().putString("comment",
+            "average length of the values of the column").build())(),
+        AttributeReference("max_col_len", StringType, nullable = true,
+          new MetadataBuilder().putString("comment",
+            "max length of the values of the column").build())(),
+        AttributeReference("comment", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "comment of the column").build())())
+    } else {
+      Seq(
+        AttributeReference("col_name", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "name of the column").build())(),
+        AttributeReference("data_type", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "data type of the column").build())(),
+        AttributeReference("comment", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "comment of the column").build())())
+    }
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val result = new ArrayBuffer[Row]
--- End diff --
Why do we create an `ArrayBuffer`? Doesn't it always return a single row?
[GitHub] spark pull request #16371: [SPARK-18932][SQL] Support partial aggregation fo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16371#discussion_r94101455
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala ---
@@ -44,39 +43,52 @@ abstract class Collect extends ImperativeAggregate {
 
   override def dataType: DataType = ArrayType(child.dataType)
 
-  override def supportsPartial: Boolean = false
-
-  override def aggBufferAttributes: Seq[AttributeReference] = Nil
-
-  override def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes)
-
-  override def inputAggBufferAttributes: Seq[AttributeReference] = Nil
-
   // Both `CollectList` and `CollectSet` are non-deterministic since their results depend on the
   // actual order of input rows.
   override def deterministic: Boolean = false
 
-  protected[this] val buffer: Growable[Any] with Iterable[Any]
-
-  override def initialize(b: InternalRow): Unit = {
-    buffer.clear()
-  }
-
-  override def update(b: InternalRow, input: InternalRow): Unit = {
-    // Do not allow null values. We follow the semantics of Hive's collect_list/collect_set here.
-    // See: org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMkCollectionEvaluator
-    val value = child.eval(input)
-    if (value != null) {
-      buffer += value
+  protected def _serialize(obj: Iterable[Any]): Array[Byte] = {
--- End diff --
We can generalize this to cover `Percentile`, but we may not use `T <: Growable[Any] with Iterable[Any]` but `T <: Iterable[Any]` then.
[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16422#discussion_r94101445
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -586,6 +587,122 @@ case class DescribeTableCommand(
   }
 }
+
+/**
+ * Command that looks like
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+    table: TableIdentifier,
+    column: String,
+    isExtended: Boolean,
--- End diff --
where do we use `isExtended`?
[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16422#discussion_r94101427
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -586,6 +587,122 @@ case class DescribeTableCommand(
   }
 }
+
+/**
+ * Command that looks like
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+    table: TableIdentifier,
+    column: String,
+    isExtended: Boolean,
+    isFormatted: Boolean)
+  extends RunnableCommand {
+
+  override val output: Seq[Attribute] =
+    // Column names are based on Hive.
--- End diff --
can you add a link to the hive spec about this?
[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14204 Merged build finished. Test PASSed.
[GitHub] spark issue #16417: [SPARK-19014][SQL] support complex aggregate buffer in H...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16417 **[Test build #70706 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70706/testReport)** for PR 16417 at commit [`8a5fe25`](https://github.com/apache/spark/commit/8a5fe253215eabc1f2686d02e4787c1fc692866f).
[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14204 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70697/ Test PASSed.
[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16422#discussion_r94101415
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -586,6 +587,122 @@ case class DescribeTableCommand(
   }
 }
+
+/**
+ * Command that looks like
--- End diff --
please follow other commands and add more description.
[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14204 **[Test build #70697 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70697/testReport)** for PR 14204 at commit [`11fe868`](https://github.com/apache/spark/commit/11fe868569431c0bc957e08dfb551a7a37a8aaf0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16371: [SPARK-18932][SQL] Support partial aggregation fo...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16371#discussion_r94101411
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala ---
@@ -44,39 +43,52 @@ abstract class Collect extends ImperativeAggregate {
 
   override def dataType: DataType = ArrayType(child.dataType)
 
-  override def supportsPartial: Boolean = false
-
-  override def aggBufferAttributes: Seq[AttributeReference] = Nil
-
-  override def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes)
-
-  override def inputAggBufferAttributes: Seq[AttributeReference] = Nil
-
   // Both `CollectList` and `CollectSet` are non-deterministic since their results depend on the
   // actual order of input rows.
   override def deterministic: Boolean = false
 
-  protected[this] val buffer: Growable[Any] with Iterable[Any]
-
-  override def initialize(b: InternalRow): Unit = {
-    buffer.clear()
-  }
-
-  override def update(b: InternalRow, input: InternalRow): Unit = {
-    // Do not allow null values. We follow the semantics of Hive's collect_list/collect_set here.
-    // See: org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMkCollectionEvaluator
-    val value = child.eval(input)
-    if (value != null) {
-      buffer += value
+  protected def _serialize(obj: Iterable[Any]): Array[Byte] = {
--- End diff --
`Percentile` takes `OpenHashMap`, which has a bit different behavior than `Growable[]`.
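The `_serialize` signature under discussion (`Iterable[Any] => Array[Byte]`) can be illustrated with a standalone round-trip. This is a sketch only: the actual Collect implementation does not use JDK serialization for row values, and `BufferSerde` is an invented name. It shows why constraining the buffer type to `Iterable[Any]` (rather than `Growable[Any] with Iterable[Any]`) is enough for serialization, since both list-backed and map-backed buffers can be flattened the same way.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object BufferSerde {
  // Flatten the buffer to a List (so Growable-backed and map-backed buffers
  // look the same on the wire) and serialize it with the JDK.
  def serialize(obj: Iterable[Any]): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    try oos.writeObject(obj.toList) finally oos.close()
    bos.toByteArray
  }

  def deserialize(bytes: Array[Byte]): Iterable[Any] = {
    val ois = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try ois.readObject().asInstanceOf[List[Any]] finally ois.close()
  }
}
```

A map-valued buffer such as `Percentile`'s (value, count) pairs fits the same signature because a map iterates as an `Iterable` of tuples.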
[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16422#discussion_r94101398
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -300,10 +300,12 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
    * Create a [[DescribeTableCommand]] logical plan.
    */
   override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan = withOrigin(ctx) {
-    // Describe column are not supported yet. Return null and let the parser decide
-    // what to do with this (create an exception or pass it on to a different system).
     if (ctx.describeColName != null) {
-      null
+      DescribeColumnCommand(
--- End diff --
shall we throw an exception here if a partition spec is given?
[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14452 Because the actual improvement from this de-duplication depends on the complexity of disjunctive predicate pushdown and CTE subqueries, if we can't have a general rule to decide whether to de-duplicate or not, I think we can have a config (default off) to enable/disable this subquery de-duplication.
[GitHub] spark issue #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.namedtup...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16429 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70705/ Test PASSed.
[GitHub] spark issue #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.namedtup...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16429 Merged build finished. Test PASSed.
[GitHub] spark issue #16429: [WIP][SPARK-19019][PYTHON] Fix hijacked `collections.nam...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16429 **[Test build #70705 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70705/testReport)** for PR 16429 at commit [`b688e89`](https://github.com/apache/spark/commit/b688e89ffa266fe4713e58b1052a2416f75cbb81). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16397: [SPARK-18922][TESTS] Fix more path-related test failures...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16397 I just checked that each is fine in a concatenated log file - https://gist.github.com/HyukjinKwon/8851815ede9dcae80632a5378b74d1ae
[GitHub] spark issue #15664: [SPARK-18123][SQL] Use db column names instead of RDD co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15664 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70695/ Test PASSed.
[GitHub] spark issue #15664: [SPARK-18123][SQL] Use db column names instead of RDD co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15664 Merged build finished. Test PASSed.
[GitHub] spark issue #16427: [SPARK-19012][SQL] Fix `createTempViewCommand` to throw ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16427 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70693/ Test PASSed.
[GitHub] spark issue #16427: [SPARK-19012][SQL] Fix `createTempViewCommand` to throw ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16427 Merged build finished. Test PASSed.
[GitHub] spark issue #15664: [SPARK-18123][SQL] Use db column names instead of RDD co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15664 **[Test build #70695 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70695/testReport)** for PR 15664 at commit [`86ab6df`](https://github.com/apache/spark/commit/86ab6df26c052ef979e42006ae9d4159a01e83b6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16427: [SPARK-19012][SQL] Fix `createTempViewCommand` to throw ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16427 **[Test build #70693 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70693/testReport)** for PR 16427 at commit [`2ec0035`](https://github.com/apache/spark/commit/2ec0035f160047ba00cdb756b845c05486af9626). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16429: [WIP][SPARK-19019][PYTHON] Fix hijacked `collections.nam...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16429 **[Test build #70705 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70705/testReport)** for PR 16429 at commit [`b688e89`](https://github.com/apache/spark/commit/b688e89ffa266fe4713e58b1052a2416f75cbb81).
[GitHub] spark issue #16429: [WIP][SPARK-19019][PYTHON] Fix hijacked `collections.nam...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16429 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70701/ Test PASSed.
[GitHub] spark issue #16429: [WIP][SPARK-19019][PYTHON] Fix hijacked `collections.nam...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16429 Merged build finished. Test PASSed.
[GitHub] spark issue #16429: [WIP][SPARK-19019][PYTHON] Fix hijacked `collections.nam...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16429 **[Test build #70701 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70701/testReport)** for PR 16429 at commit [`9ac9d01`](https://github.com/apache/spark/commit/9ac9d012fc16496a5bea2d660b791c69a39b5f81). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16430: [SPARK-17077] [SQL] Cardinality estimation for project o...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/16430 cc @rxin @hvanhovell @cloud-fan
[GitHub] spark pull request #15664: [SPARK-18123][SQL] Use db column names instead of...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15664#discussion_r94099067 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala --- @@ -211,6 +211,55 @@ object JdbcUtils extends Logging { } /** + * Returns the schema if the table already exists in the JDBC database. + */ + def getSchema(conn: Connection, url: String, table: String): Option[StructType] = { +val dialect = JdbcDialects.get(url) + +try { + val statement = conn.prepareStatement(dialect.getSchemaQuery(table)) + try { +Some(getSchema(statement.executeQuery(), dialect)) --- End diff -- yes.
[GitHub] spark issue #16429: [WIP][SPARK-19019][PYTHON] Fix hijacked `collections.nam...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16429 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70702/ Test FAILed.
[GitHub] spark issue #16429: [WIP][SPARK-19019][PYTHON] Fix hijacked `collections.nam...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16429 **[Test build #70702 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70702/testReport)** for PR 16429 at commit [`f4c56c8`](https://github.com/apache/spark/commit/f4c56c8c0ae3a925e389dadc75243f0cbb551f62). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16429: [WIP][SPARK-19019][PYTHON] Fix hijacked `collections.nam...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16429 Merged build finished. Test FAILed.
[GitHub] spark pull request #15664: [SPARK-18123][SQL] Use db column names instead of...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15664#discussion_r94099004 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala --- @@ -211,6 +211,55 @@ object JdbcUtils extends Logging { } /** + * Returns the schema if the table already exists in the JDBC database. + */ + def getSchema(conn: Connection, url: String, table: String): Option[StructType] = { +val dialect = JdbcDialects.get(url) + +try { + val statement = conn.prepareStatement(dialect.getSchemaQuery(table)) + try { +Some(getSchema(statement.executeQuery(), dialect)) --- End diff -- For unsupported types, it will throw an `SQLException`, right?
[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14158 **[Test build #70704 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70704/testReport)** for PR 14158 at commit [`83cbb58`](https://github.com/apache/spark/commit/83cbb588320d945d34dfd28e49eba90bcf63b064).
[GitHub] spark issue #16430: [SPARK-17077] [SQL] Cardinality estimation for project o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16430 **[Test build #70703 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70703/testReport)** for PR 16430 at commit [`12a48fa`](https://github.com/apache/spark/commit/12a48fa528bfcd8119251cd585bcef9dd555d3a3).
[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...
Github user nblintao commented on the issue: https://github.com/apache/spark/pull/14158 retest this please
[GitHub] spark pull request #16430: [SPARK-17077] [SQL] Cardinality estimation projec...
GitHub user wzhfy opened a pull request: https://github.com/apache/spark/pull/16430 [SPARK-17077] [SQL] Cardinality estimation for project operator ## What changes were proposed in this pull request? Support cardinality estimation for the project operator. ## How was this patch tested? Add a test case. You can merge this pull request into a Git repository by running: $ git pull https://github.com/wzhfy/spark projectEstimation Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16430.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16430 commit 12a48fa528bfcd8119251cd585bcef9dd555d3a3 Author: Zhenhua Wang Date: 2016-12-29T03:07:11Z project estimation
[GitHub] spark pull request #15664: [SPARK-18123][SQL] Use db column names instead of...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15664#discussion_r94098773 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala --- @@ -211,6 +211,55 @@ object JdbcUtils extends Logging { } /** + * Returns the schema if the table already exists in the JDBC database. + */ + def getSchema(conn: Connection, url: String, table: String): Option[StructType] = { +val dialect = JdbcDialects.get(url) + +try { + val statement = conn.prepareStatement(dialect.getSchemaQuery(table)) + try { +Some(getSchema(statement.executeQuery(), dialect)) + } catch { +case _: SQLException => None + } finally { +statement.close() + } +} catch { + case _: SQLException => None +} + } + + /** + * Returns a schema using rddSchema's column sequence and tableSchema's column names. + * + * When appending data into some case-sensitive DBMSs like PostgreSQL/Oracle, we need to respect + * the existing case-sensitive column names instead of RDD column names for user convenience. + * See SPARK-18123 for more details. + */ + def normalizeSchema( + rddSchema: StructType, + tableSchema: StructType, + caseSensitive: Boolean): StructType = { +val nameMap = tableSchema.fields.map(f => f.name -> f).toMap +val lowercaseNameMap = tableSchema.fields.map(f => f.name.toLowerCase -> f).toMap + +var schema = new StructType() +rddSchema.fields.foreach { f => + if (nameMap.isDefinedAt(f.name)) { +// identical names +schema = schema.add(nameMap(f.name)) + } else if (!caseSensitive && lowercaseNameMap.isDefinedAt(f.name.toLowerCase)) { +// identical names in a case-insensitive way +schema = schema.add(lowercaseNameMap(f.name.toLowerCase)) + } else { +throw new AnalysisException(s"""Column "${f.name}" not found""") --- End diff -- The error message looks ambiguous. Maybe `Column "${f.name}" in RDD not found in table schema`? 
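The `normalizeSchema` logic under review — match each RDD column against the existing table's columns, exactly or case-insensitively, and fail with a message naming both sides — can be sketched in plain Java. This is an illustrative reconstruction with invented names, not the actual `JdbcUtils` code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the column-name normalization discussed in the
// review: resolve RDD column names against the table's columns, preserving
// the table's casing, and raise a clear error when a column is missing.
public class NormalizeColumns {

    static List<String> normalize(List<String> rddCols, List<String> tableCols,
                                  boolean caseSensitive) {
        Map<String, String> exact = new HashMap<>();
        Map<String, String> lower = new HashMap<>();
        for (String c : tableCols) {
            exact.put(c, c);
            lower.put(c.toLowerCase(), c);
        }
        List<String> result = new ArrayList<>();
        for (String c : rddCols) {
            if (exact.containsKey(c)) {
                result.add(exact.get(c));               // identical name
            } else if (!caseSensitive && lower.containsKey(c.toLowerCase())) {
                result.add(lower.get(c.toLowerCase())); // match ignoring case, keep table casing
            } else {
                // Error message names both sides, as the reviewer suggests.
                throw new IllegalArgumentException(
                    "Column \"" + c + "\" in RDD not found in table schema");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Case-insensitive match keeps the table's casing: prints [id, NAME]
        System.out.println(normalize(Arrays.asList("ID", "name"),
                                     Arrays.asList("id", "NAME"), false));
    }
}
```

Returning the table's casing (rather than the RDD's) is what makes appends work against case-sensitive DBMSs like PostgreSQL and Oracle, per SPARK-18123.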
[GitHub] spark pull request #15664: [SPARK-18123][SQL] Use db column names instead of...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15664#discussion_r94098761 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala --- @@ -568,10 +617,9 @@ object JdbcUtils extends Logging { conn.setAutoCommit(false) // Everything in the same db transaction. conn.setTransactionIsolation(finalIsolationLevel) } - val stmt = insertStatement(conn, table, rddSchema, dialect) - val setters: Array[JDBCValueSetter] = rddSchema.fields.map(_.dataType) -.map(makeSetter(conn, dialect, _)).toArray - val numFields = rddSchema.fields.length + val stmt = insertStatement(conn, table, schema, dialect) --- End diff -- This is my rough idea: https://github.com/apache/spark/compare/master...gatorsmile:pr-15664Changed1
[GitHub] spark issue #16371: [SPARK-18932][SQL] Support partial aggregation for colle...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16371 @cloud-fan @marmbrus ok. I will first address @hvanhovell's comments.
[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14158 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70700/ Test FAILed.
[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14158 Merged build finished. Test FAILed.
[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14158 **[Test build #70700 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70700/testReport)** for PR 14158 at commit [`83cbb58`](https://github.com/apache/spark/commit/83cbb588320d945d34dfd28e49eba90bcf63b064). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16415: [SPARK-19007]Speedup and optimize the GradientBoostedTre...
Github user zdh2292390 commented on the issue: https://github.com/apache/spark/pull/16415 @jkbradley I tested with MEMORY_ONLY and got the same problem: `(ExecutorLostFailure (executor 6 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 10.2 GB of 10 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.)` and it took more than an hour. So should my code just change the storageLevel in predErrorCheckpointer?
[GitHub] spark pull request #15664: [SPARK-18123][SQL] Use db column names instead of...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15664#discussion_r94098595 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala --- @@ -211,6 +211,55 @@ object JdbcUtils extends Logging { } /** + * Returns the schema if the table already exists in the JDBC database. + */ + def getSchema(conn: Connection, url: String, table: String): Option[StructType] = { +val dialect = JdbcDialects.get(url) + +try { + val statement = conn.prepareStatement(dialect.getSchemaQuery(table)) + try { +Some(getSchema(statement.executeQuery(), dialect)) --- End diff -- `getSchema` will throw an exception when the schema contains an unsupported type. Now we use it to check if the table exists. Does that change the current behavior? E.g., an insertion that worked before now fails.
[GitHub] spark issue #16371: [SPARK-18932][SQL] Support partial aggregation for colle...
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16371 +1 I think we can move forward.
[GitHub] spark issue #16371: [SPARK-18932][SQL] Support partial aggregation for colle...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16371 I think we should support partial aggregation for collect, in the future we can define an interface to declare "should partial aggregate", so that a single collect function won't cause a 2-phase aggregation. I think they are orthogonal and we can go ahead.
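The "partial aggregation" being discussed for collect means each partition builds its own buffer, and those partial buffers are merged into the final result. A minimal sketch of the two phases, with invented names that do not mirror Spark's internal `ImperativeAggregate` API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of 2-phase aggregation for a collect_list-style
// function: per-partition partial buffers, then a final merge.
public class CollectPartial {

    // Phase 1: build the partial buffer within one partition. Nulls are
    // skipped, matching the Hive collect_list/collect_set semantics noted
    // in the diff.
    static List<Object> updatePartition(Iterable<?> rows) {
        List<Object> buffer = new ArrayList<>();
        for (Object v : rows) {
            if (v != null) buffer.add(v);
        }
        return buffer;
    }

    // Phase 2: merge the partial buffers from all partitions.
    static List<Object> merge(List<List<Object>> partials) {
        List<Object> result = new ArrayList<>();
        for (List<Object> p : partials) result.addAll(p);
        return result;
    }

    public static void main(String[] args) {
        List<Object> p1 = updatePartition(Arrays.asList(1, null, 2)); // nulls dropped
        List<Object> p2 = updatePartition(Arrays.asList(3, 4));
        System.out.println(merge(Arrays.asList(p1, p2))); // prints [1, 2, 3, 4]
    }
}
```

The interface cloud-fan mentions would let a function opt out of this split, so a query with only a collect runs as a single phase instead of paying for a partial step that cannot shrink the data.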
[GitHub] spark issue #16415: [SPARK-19007]Speedup and optimize the GradientBoostedTre...
Github user zdh2292390 commented on the issue: https://github.com/apache/spark/pull/16415 @jkbradley Can I ask what the other use cases are behind "The periodic checkpointer caches more because of some other use cases"?
[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13758 I think we can close this now.
[GitHub] spark issue #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATTED tabl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16422 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70691/ Test PASSed.
[GitHub] spark issue #16429: [WIP][SPARK-19019][PYTHON] Fix hijacked collections.name...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16429 **[Test build #70702 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70702/testReport)** for PR 16429 at commit [`f4c56c8`](https://github.com/apache/spark/commit/f4c56c8c0ae3a925e389dadc75243f0cbb551f62).
[GitHub] spark issue #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATTED tabl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16422 Merged build finished. Test PASSed.
[GitHub] spark issue #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATTED tabl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16422 **[Test build #70691 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70691/testReport)** for PR 16422 at commit [`d41a9cd`](https://github.com/apache/spark/commit/d41a9cd5980c5c9cadefd65a09d059ff8b9266e4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13909
[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13909 thanks, merging to master!
[GitHub] spark issue #16429: [WIP][SPARK-19019][PYTHON] Fix hijacked collections.name...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16429 **[Test build #70701 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70701/testReport)** for PR 16429 at commit [`9ac9d01`](https://github.com/apache/spark/commit/9ac9d012fc16496a5bea2d660b791c69a39b5f81).
[GitHub] spark issue #14461: [SPARK-16856] [WEBUI] [CORE] Link the application's exec...
Github user nblintao commented on the issue: https://github.com/apache/spark/pull/14461 @ajbozarth I finally have a chance to rebase it in the winter break. Could you please have a look? Thanks!
[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...
Github user nblintao commented on the issue: https://github.com/apache/spark/pull/14158 @ajbozarth I finally have a chance to rebase it in the winter break. Could you please have a look? Thanks!
[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...
Github user nblintao commented on the issue: https://github.com/apache/spark/pull/14204 @ajbozarth I finally have a chance to rebase it in the winter break. Could you please have a look? Thanks!