[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14116#discussion_r70209333

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/systemcatalog/InformationSchema.scala ---
    @@ -0,0 +1,337 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.systemcatalog
    +
    +import java.sql.{Date, Timestamp}
    +
    +import scala.collection.mutable.ArrayBuffer
    +
    +import org.apache.commons.lang3.StringUtils
    +
    +import org.apache.spark.rdd.RDD
    +import org.apache.spark.sql._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions.Alias
    +import org.apache.spark.sql.catalyst.plans.logical.Project
    +import org.apache.spark.sql.execution.datasources._
    +import org.apache.spark.sql.sources._
    +import org.apache.spark.sql.types._
    +
    +/**
    + * INFORMATION_SCHEMA is a database consisting of views which provide information about all of
    + * the tables, views, and columns in a database.
    + */
    +object InformationSchema {
    +  var INFORMATION_SCHEMA = "information_schema"

    --- End diff --

    Oh, thank you for the review! Right.
---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14116#discussion_r70209363

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -401,7 +401,9 @@ class SessionCatalog(
         val db = formatDatabaseName(name.database.getOrElse(currentDb))
         val table = formatTableName(name.table)
         val relation =
    -      if (name.database.isDefined || !tempTables.contains(table)) {
    +      if (db == "information_schema") {
    +        tempTables(s"$db.$table")

    --- End diff --

    Indeed. My bad.
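For context, the special-case lookup under discussion can be sketched as follows. This is a simplified, hypothetical model (a plain `String` stands in for `LogicalPlan`, and the helper names are illustrative, not Spark's actual API): temporary views backing `information_schema` are registered under a `"db.table"` key, so the catalog must check the database name before falling back to the plain temp-table lookup.

```scala
import scala.collection.mutable

object LookupSketch {
  val INFORMATION_SCHEMA = "information_schema"

  // name -> plan; String stands in for LogicalPlan in this sketch
  val tempTables = mutable.HashMap[String, String]()

  def lookup(database: Option[String], table: String, currentDb: String): String = {
    val db = database.getOrElse(currentDb).toLowerCase
    if (db == INFORMATION_SCHEMA) {
      tempTables(s"$db.$table")  // keyed as "information_schema.tables", etc.
    } else if (database.isEmpty && tempTables.contains(table)) {
      tempTables(table)          // ordinary temporary view
    } else {
      s"catalog:$db.$table"      // defer to the external catalog
    }
  }
}
```

The hard-coded `"information_schema"` literal in the real diff is what prompts the later suggestion to hoist the name into a shared constant.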
[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14116#discussion_r70209455

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/systemcatalog/InformationSchema.scala ---
    @@ -0,0 +1,337 @@
    [...]
    +object InformationSchema {
    +  var INFORMATION_SCHEMA = "information_schema"
    +  /**
    +   * Register INFORMATION_SCHEMA database.
    +   */
    +  def registerInformationSchema(sparkSession: SparkSession) {
    +    sparkSession.sql("CREATE DATABASE IF NOT EXISTS information_schema")

    --- End diff --

    Yep.
[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14116#discussion_r70209478

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/systemcatalog/InformationSchema.scala ---
    @@ -0,0 +1,337 @@
    [...]
    +object InformationSchema {
    +  var INFORMATION_SCHEMA = "information_schema"
    +  /**
    +   * Register INFORMATION_SCHEMA database.
    +   */
    +  def registerInformationSchema(sparkSession: SparkSession) {

    --- End diff --

    Right.
[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14116#discussion_r70209567

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -401,7 +401,9 @@ class SessionCatalog(
         val db = formatDatabaseName(name.database.getOrElse(currentDb))
         val table = formatTableName(name.table)
         val relation =
    -      if (name.database.isDefined || !tempTables.contains(table)) {
    +      if (db == "information_schema") {
    +        tempTables(s"$db.$table")

    --- End diff --

    Ah, there is a reason. Catalyst cannot see `InformationSchema`; `InformationSchema` is designed to live outside of catalyst.
[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14116#discussion_r70209581

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -425,7 +427,9 @@ class SessionCatalog(
       def tableExists(name: TableIdentifier): Boolean = synchronized {
         val db = formatDatabaseName(name.database.getOrElse(currentDb))
         val table = formatTableName(name.table)
    -    if (name.database.isDefined || !tempTables.contains(table)) {
    +    if (db == "information_schema") {

    --- End diff --

    Here, too.
[GitHub] spark issue #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHEMA
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14116

    **[Test build #62081 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62081/consoleFull)** for PR 14116 at commit [`34fe4ed`](https://github.com/apache/spark/commit/34fe4ed774e726000acb7af8c4e386619027dc17).
[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...
Github user lw-lin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14116#discussion_r70210589

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -401,7 +401,9 @@ class SessionCatalog(
         val db = formatDatabaseName(name.database.getOrElse(currentDb))
         val table = formatTableName(name.table)
         val relation =
    -      if (name.database.isDefined || !tempTables.contains(table)) {
    +      if (db == "information_schema") {
    +        tempTables(s"$db.$table")

    --- End diff --

    Then I guess the constant `InformationSchema.INFORMATION_SCHEMA` should live in `sql/catalyst` rather than in `sql/core`?
[GitHub] spark issue #14131: [SPARK-16318][SQL] Implement all remaining xpath functio...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14131

    **[Test build #62076 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62076/consoleFull)** for PR 14131 at commit [`4d6f654`](https://github.com/apache/spark/commit/4d6f6544be4373a32150fd6d59ba539d3fcb6aab).

    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.
[GitHub] spark issue #14131: [SPARK-16318][SQL] Implement all remaining xpath functio...
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14131

    Test FAILed.
    Refer to this link for build results (access rights to CI server needed):
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62076/
    Test FAILed.
[GitHub] spark issue #14131: [SPARK-16318][SQL] Implement all remaining xpath functio...
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14131

    Merged build finished. Test FAILed.
[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...
Github user lw-lin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14116#discussion_r70212106

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/systemcatalog/InformationSchema.scala ---
    @@ -0,0 +1,336 @@
    [...]
    +object InformationSchema {
    +  val INFORMATION_SCHEMA = "information_schema"
    +  /**
    +   * Register INFORMATION_SCHEMA database.
    +   */
    +  def registerInformationSchema(sparkSession: SparkSession): Unit = {
    +    sparkSession.sql(s"CREATE DATABASE IF NOT EXISTS $INFORMATION_SCHEMA")
    +    registerView(sparkSession, new DatabasesRelationProvider, Seq("schemata", "databases"))
    +    registerView(sparkSession, new TablesRelationProvider, Seq("tables"))
    +    registerView(sparkSession, new ViewsRelationProvider, Seq("views"))
    +    registerView(sparkSession, new ColumnsRelationProvider, Seq("columns"))
    +    registerView(sparkSession, new SessionVariablesRelationProvider, Seq("session_variables"))
    +  }
    +
    +  /**
    +   * Register an INFORMATION_SCHEMA relation provider as a temporary view of Spark Catalog.
    +   */
    +  private def registerView(
    +      sparkSession: SparkSession,
    +      relationProvider: SchemaRelationProvider,
    +      names: Seq[String]) {
    +    val plan =
    +      LogicalRelation(relationProvider.createRelation(sparkSession.sqlContext, null, null)).analyze
    +    val projectList = plan.output.zip(plan.schema).map {
    +      case (attr, col) => Alias(attr, col.name)()
    +    }
    +    sparkSession.sessionState.executePlan(Project(projectList, plan))
    +    for (name <- names) {
    +      // TODO(dongjoon): This is a hack to give a database concept for Spark temporary views.
    +      // We should generalize this later.
    +      sparkSession.sessionState.catalog.createTempView(s"$INFORMATION_SCHEMA.$name",
    +        plan, overrideIfExists = true)
    +    }
    +  }
    +
    +  /**
    +   * Compile filter array into single string condition.
    +   */
    +  private[systemcatalog] def getConditionExpressionString(filters: Array[Filter]): String = {
    +    val str = filters.flatMap(InformationSchema.compileFilter).map(p => s"($p)").mkString(" AND ")
    +    if (str.length == 0) "TRUE" else str
    +  }
    +
    +  /**
    +   * Convert filter into string expression.
    +   */
    +  private[systemcatalog] def compileFilter(f: Filter): Option[String] = {

    --- End diff --

    This whole function does have great merit, but I feel @liancheng had implemented something similar before? @liancheng, could you confirm? Thanks!
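The filter-compilation approach being reviewed can be sketched with a stand-in `Filter` ADT. This is a hedged, simplified model: the hypothetical case classes below mirror only a few of the `org.apache.spark.sql.sources.Filter` subclasses that the real code pattern-matches on.

```scala
// Stand-in filter ADT for illustration; not Spark's actual Filter hierarchy.
sealed trait Filter
case class EqualTo(attr: String, value: Any) extends Filter
case class GreaterThan(attr: String, value: Any) extends Filter
case class IsNull(attr: String) extends Filter

object FilterSketch {
  private def quote(value: Any): String = value match {
    case s: String => s"'$s'"
    case v         => v.toString
  }

  // Convert a single filter into a SQL condition string, if supported.
  def compileFilter(f: Filter): Option[String] = f match {
    case EqualTo(a, v)     => Some(s"$a = ${quote(v)}")
    case GreaterThan(a, v) => Some(s"$a > ${quote(v)}")
    case IsNull(a)         => Some(s"$a IS NULL")
  }

  // Mirrors getConditionExpressionString: AND the compiled pieces,
  // defaulting to TRUE when nothing could be compiled.
  def getConditionExpressionString(filters: Array[Filter]): String = {
    val str = filters.flatMap(compileFilter).map(p => s"($p)").mkString(" AND ")
    if (str.isEmpty) "TRUE" else str
  }
}
```

Returning `Option[String]` lets unsupported filters be silently skipped, which is safe because the remaining conjuncts only narrow the result conservatively.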
[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14034

    **[Test build #62078 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62078/consoleFull)** for PR 14034 at commit [`2e6f8d8`](https://github.com/apache/spark/commit/2e6f8d8c8b5007302415b7fd984a38fc51be44bf).

    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.
[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034

    Test FAILed.
    Refer to this link for build results (access rights to CI server needed):
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62078/
    Test FAILed.
[GitHub] spark issue #14130: [SPARK-16477] Bump master version to 2.1.0-SNAPSHOT
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14130

    **[Test build #62075 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62075/consoleFull)** for PR 14130 at commit [`2915bf1`](https://github.com/apache/spark/commit/2915bf1e79a28a1df0fbc895068fcf8bee2095b0).

    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.
[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034

    Merged build finished. Test FAILed.
[GitHub] spark issue #14130: [SPARK-16477] Bump master version to 2.1.0-SNAPSHOT
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14130

    Merged build finished. Test PASSed.
[GitHub] spark issue #14130: [SPARK-16477] Bump master version to 2.1.0-SNAPSHOT
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14130

    Test PASSed.
    Refer to this link for build results (access rights to CI server needed):
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62075/
    Test PASSed.
[GitHub] spark issue #14130: [SPARK-16477] Bump master version to 2.1.0-SNAPSHOT
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14130

    Hm, I tried this recently and it failed MiMa. I think this has to bump the base MiMa version too, but then it will fail because 2.0.0 isn't released. I thought we'd have to wait.
[GitHub] spark issue #14114: [SPARK-16458][SQL] SessionCatalog should support `listCo...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14114

    **[Test build #62077 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62077/consoleFull)** for PR 14114 at commit [`cac342e`](https://github.com/apache/spark/commit/cac342e7de11d8ecdbec7813591cce1397595a36).

    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.
[GitHub] spark issue #14114: [SPARK-16458][SQL] SessionCatalog should support `listCo...
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14114

    Test PASSed.
    Refer to this link for build results (access rights to CI server needed):
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62077/
    Test PASSed.
[GitHub] spark issue #14114: [SPARK-16458][SQL] SessionCatalog should support `listCo...
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14114

    Merged build finished. Test PASSed.
[GitHub] spark issue #14114: [SPARK-16458][SQL] SessionCatalog should support `listCo...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14114

    **[Test build #62079 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62079/consoleFull)** for PR 14114 at commit [`ac5f5cb`](https://github.com/apache/spark/commit/ac5f5cbebfbe48a307e21fc094dba4aa8fa86ddd).

    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.
[GitHub] spark issue #14114: [SPARK-16458][SQL] SessionCatalog should support `listCo...
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14114

    Test PASSed.
    Refer to this link for build results (access rights to CI server needed):
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62079/
    Test PASSed.
[GitHub] spark issue #14114: [SPARK-16458][SQL] SessionCatalog should support `listCo...
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14114

    Merged build finished. Test PASSed.
[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14116#discussion_r70216583

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -401,7 +401,9 @@ class SessionCatalog(
         val db = formatDatabaseName(name.database.getOrElse(currentDb))
         val table = formatTableName(name.table)
         val relation =
    -      if (name.database.isDefined || !tempTables.contains(table)) {
    +      if (db == "information_schema") {
    +        tempTables(s"$db.$table")

    --- End diff --

    That's a good idea! I made a `SessionCatalog` object in the following PR to store `DEFAULT`: https://github.com/apache/spark/pull/14115/files#diff-b3f9800839b9b9a1df9da9cbfc01adf8R38 We can move `INFORMATION_SCHEMA` there! Thanks.
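A minimal sketch of that suggestion, assuming the well-known database names are hoisted into a companion object in `sql/catalyst` (the object layout and helper name below are illustrative, not the merged code):

```scala
// Sketch: shared well-known database names living in sql/catalyst, so that
// both SessionCatalog and the sql/core InformationSchema code can reference
// a single constant instead of repeating the "information_schema" literal.
object SessionCatalog {
  val DEFAULT_DATABASE = "default"
  val INFORMATION_SCHEMA = "information_schema"

  // Convenience check for the lookupRelation / tableExists special cases.
  def isInformationSchema(db: String): Boolean =
    db.equalsIgnoreCase(INFORMATION_SCHEMA)
}
```

Since `sql/core` already depends on `sql/catalyst`, this direction avoids the dependency problem noted earlier (catalyst cannot see `InformationSchema`).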
[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14116#discussion_r70216745 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/systemcatalog/InformationSchema.scala --- @@ -0,0 +1,336 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.systemcatalog + +import java.sql.{Date, Timestamp} + +import scala.collection.mutable.ArrayBuffer + +import org.apache.commons.lang3.StringUtils + +import org.apache.spark.rdd.RDD +import org.apache.spark.sql._ +import org.apache.spark.sql.catalyst.dsl.plans._ +import org.apache.spark.sql.catalyst.expressions.Alias +import org.apache.spark.sql.catalyst.plans.logical.Project +import org.apache.spark.sql.execution.datasources._ +import org.apache.spark.sql.sources._ +import org.apache.spark.sql.types._ + +/** + * INFORMATION_SCHEMA is a database consisting views which provide information about all of the + * tables, views, columns in a database. + */ +object InformationSchema { + val INFORMATION_SCHEMA = "information_schema" + /** + * Register INFORMATION_SCHEMA database. 
+ */ + def registerInformationSchema(sparkSession: SparkSession): Unit = { +sparkSession.sql(s"CREATE DATABASE IF NOT EXISTS $INFORMATION_SCHEMA") +registerView(sparkSession, new DatabasesRelationProvider, Seq("schemata", "databases")) +registerView(sparkSession, new TablesRelationProvider, Seq("tables")) +registerView(sparkSession, new ViewsRelationProvider, Seq("views")) +registerView(sparkSession, new ColumnsRelationProvider, Seq("columns")) +registerView(sparkSession, new SessionVariablesRelationProvider, Seq("session_variables")) + } + + /** + * Register a INFORMATION_SCHEMA relation provider as a temporary view of Spark Catalog. + */ + private def registerView( + sparkSession: SparkSession, + relationProvider: SchemaRelationProvider, + names: Seq[String]) { +val plan = + LogicalRelation(relationProvider.createRelation(sparkSession.sqlContext, null, null)).analyze +val projectList = plan.output.zip(plan.schema).map { + case (attr, col) => Alias(attr, col.name)() +} +sparkSession.sessionState.executePlan(Project(projectList, plan)) +for (name <- names) { + // TODO(dongjoon): This is a hack to give a database concept for Spark temporary views. + // We should generalize this later. + sparkSession.sessionState.catalog.createTempView(s"$INFORMATION_SCHEMA.$name", +plan, overrideIfExists = true) +} + } + + /** + * Compile filter array into single string condition. + */ + private[systemcatalog] def getConditionExpressionString(filters: Array[Filter]): String = { +val str = filters.flatMap(InformationSchema.compileFilter).map(p => s"($p)").mkString(" AND ") +if (str.length == 0) "TRUE" else str + } + + /** + * Convert filter into string expression. + */ + private[systemcatalog] def compileFilter(f: Filter): Option[String] = { --- End diff -- This one comes from the JDBC module. I intentionally copied the code rather than calling it.
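To make the registration step above concrete, here is a hedged sketch of how the proposed views would be consumed. It assumes the PR's `registerInformationSchema` is available and has run against a `SparkSession` named `spark`; the view names come from the diff, and the table name in the last query is illustrative — none of this is a merged API.

```scala
// Hypothetical usage of the proposed INFORMATION_SCHEMA views, assuming the
// registration shown in the diff above is on the classpath.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.systemcatalog.InformationSchema

val spark = SparkSession.builder().appName("info-schema-sketch").getOrCreate()
InformationSchema.registerInformationSchema(spark)

// Per the diff, "schemata"/"databases", "tables", "views", "columns", and
// "session_variables" are registered as temporary views under information_schema.
spark.sql("SELECT * FROM information_schema.tables").show()
spark.sql("SELECT * FROM information_schema.columns WHERE table_name = 'src'").show()
```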
[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14034 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62080/ Test PASSed.
[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14034 **[Test build #62080 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62080/consoleFull)** for PR 14034 at commit [`d66870b`](https://github.com/apache/spark/commit/d66870bb274a206f16d33f56214246b17953a90e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14034 Merged build finished. Test PASSed.
[GitHub] spark issue #14048: [SPARK-16370][SQL] Union queries should not be executed ...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/14048 A more fruitful approach might be to introduce a separate operator for multi-insert. Let's leave that for another day :)...
[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14034
[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14034 thanks, merging to master and 2.0!
[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14116#discussion_r70218352 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/systemcatalog/InformationSchema.scala --- @@ -0,0 +1,336 @@ (same hunk as quoted above, ending at) + private[systemcatalog] def compileFilter(f: Filter): Option[String] = { --- End diff -- Oh I see. Then if we won't dedup this for now, let's leave a comment saying that if one piece of code changes, please don't forget to change the other. What do you think?
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/14132 [SPARK-16475][SQL] Broadcast Hint for SQL Queries ## What changes were proposed in this pull request? A broadcast hint is a way for users to manually annotate a query and suggest a join method to the query optimizer. It is very useful when the query optimizer cannot make an optimal decision with respect to join methods due to conservativeness or the lack of proper statistics. The DataFrame API has had a broadcast hint since Spark 1.5. However, we do not have equivalent functionality in SQL queries. We propose adding a Hive-style broadcast hint to Spark SQL. For more information, please see the [design document](https://issues.apache.org/jira/secure/attachment/12817061/BroadcastHintinSparkSQL.pdf). - [x] SUPPORT `MAPJOIN` SYNTAX - [ ] SUPPORT `BROADCAST JOIN` SYNTAX ## How was this patch tested? Pass the Jenkins tests with new test cases. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dongjoon-hyun/spark SPARK-16475 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14132.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14132 commit 6bd704e5d281bd8c7a1bddaee300688d016bc597 Author: Dongjoon Hyun Date: 2016-07-11T08:04:51Z [SPARK-16475][SQL] Broadcast Hint for SQL Queries
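For context, here is a hedged sketch of the two forms side by side: the DataFrame-side broadcast hint that already exists, and the SQL syntax this PR proposes. The SQL form is a proposal, not a merged feature, and the table/column names below are illustrative.

```scala
// Sketch: existing DataFrame broadcast hint vs. the SQL hint proposed by this PR.
// Assumes tables "large" and "small" exist in the current catalog.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

val spark = SparkSession.builder().appName("hint-sketch").getOrCreate()
val large = spark.table("large")
val small = spark.table("small")

// Available since Spark 1.5: mark the small side for broadcast in the DataFrame API.
val joined = large.join(broadcast(small), Seq("key"))

// Proposed by this PR: a Hive-style MAPJOIN hint embedded in the SQL text.
spark.sql("SELECT /*+ MAPJOIN(small) */ * FROM large JOIN small ON large.key = small.key")
```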
[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14132 **[Test build #62082 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62082/consoleFull)** for PR 14132 at commit [`6bd704e`](https://github.com/apache/spark/commit/6bd704e5d281bd8c7a1bddaee300688d016bc597).
[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14116#discussion_r70220191 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/systemcatalog/InformationSchema.scala --- @@ -0,0 +1,336 @@ (same hunk as quoted above, ending at) + private[systemcatalog] def compileFilter(f: Filter): Option[String] = { --- End diff -- Definitely, it would be good. This PR is still waiting on some dependency PRs. While discussing, we can refactor `compileFilter` into some `Utils` functions, too. @rxin, any advice on this?
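The filter-to-string compilation being discussed follows the JDBC data source's pattern. A minimal sketch, covering only a subset of the `Filter` cases and assuming Spark's `sources` package on the classpath, might look like:

```scala
import org.apache.spark.sql.sources._

// Sketch of the copied JDBC-style filter compiler: each supported Filter maps
// to a SQL predicate string; unsupported filters compile to None and are
// simply dropped rather than causing a failure.
def compileFilter(f: Filter): Option[String] = f match {
  case EqualTo(attr, value) => Some(s"$attr = '$value'")
  case IsNull(attr)         => Some(s"$attr IS NULL")
  case IsNotNull(attr)      => Some(s"$attr IS NOT NULL")
  case And(left, right) =>
    for (l <- compileFilter(left); r <- compileFilter(right)) yield s"($l) AND ($r)"
  case _ => None
}

// Mirrors getConditionExpressionString in the diff: AND the compiled pieces
// together, falling back to TRUE when nothing compiles.
def conditionString(filters: Array[Filter]): String = {
  val s = filters.flatMap(compileFilter).map(p => s"($p)").mkString(" AND ")
  if (s.isEmpty) "TRUE" else s
}
```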
[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14132 cc @rxin and @hvanhovell.
[GitHub] spark issue #13950: [SPARK-15487] [Web UI] Spark Master UI to reverse proxy ...
Github user gurvindersingh commented on the issue: https://github.com/apache/spark/pull/13950 @tgravescs Yeah, the proposal is only for standalone mode, where the worker & application UI are now accessed through the master UI. Looking at the authn/z settings for standalone, I don't see this patch interfering with any of those. The addFilters() function sets the ui.filters for the '/*' path, so it will apply in this case too. Beyond this, let me know if I missed anything.
[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14132 If the direction is right, I can move on to adding `BROADCAST JOIN` syntax.
[GitHub] spark issue #14127: [SPARK-15467][build] update janino version to 3.0.0
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14127 Sounds like a small set of changes; not sure why it was a major version change. _shrug_
[GitHub] spark pull request #14104: [SPARK-16438] Add Asynchronous Actions documentat...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/14104#discussion_r7076 --- Diff: docs/programming-guide.md --- @@ -1099,6 +1099,9 @@ for details. +Spark provide asynchronous actions to execute two or more actions concurrently, these actions are execute asynchronously without blocking calling thread. Refer to the RDD API doc for asynchronous actions ([Scala](api/scala/index.html#org.apache.spark.rdd.AsyncRDDActions),[Java](api/java/org/apache/spark/rdd/AsyncRDDActions.html)) --- End diff -- I still don't think this is accurate. Spark can execute actions concurrently without this API. This merely makes the call non-blocking for the caller. It really adds very little beyond calling the normal API and wrapping in a Future (though it's more useful for the Java API where that's harder). Hence why I thought it was unofficially deprecated.
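The point in the review comment can be illustrated: from Scala, `AsyncRDDActions` mostly saves the caller a `Future` wrapper. A sketch, assuming an active `SparkContext` named `sc`:

```scala
// Sketch comparing the dedicated async API with a plain Future wrapper.
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import org.apache.spark.FutureAction

val rdd = sc.parallelize(1 to 100)

// The dedicated async API (via the implicit conversion to AsyncRDDActions):
val asyncCount: FutureAction[Long] = rdd.countAsync()

// Roughly equivalent from Scala: wrap the blocking action in a Future yourself,
// which is why the comment above argues the async API adds little in Scala
// (it remains handier from Java, where this wrapping is more awkward).
val wrappedCount: Future[Long] = Future { rdd.count() }
```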
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r70222526 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -945,8 +955,12 @@ SIMPLE_COMMENT : '--' ~[\r\n]* '\r'? '\n'? -> channel(HIDDEN) ; +BRACKETED_EMPTY_COMMENT --- End diff -- Do we need this because the BRACKETED_COMMENT rule is now expecting at least one character?
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r70222831 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -945,8 +955,12 @@ SIMPLE_COMMENT : '--' ~[\r\n]* '\r'? '\n'? -> channel(HIDDEN) ; +BRACKETED_EMPTY_COMMENT --- End diff -- Yes. Could you give me some workaround advice?
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r70223258 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -945,8 +955,12 @@ SIMPLE_COMMENT : '--' ~[\r\n]* '\r'? '\n'? -> channel(HIDDEN) ; +BRACKETED_EMPTY_COMMENT +: '/**/' -> channel(HIDDEN) +; + BRACKETED_COMMENT -: '/*' .*? '*/' -> channel(HIDDEN) +: '/*' ~[+] .*? '*/' -> channel(HIDDEN) --- End diff -- It might be easier to introduce a HINT_PREFIX rule (`'/*+'`) and place this before the BRACKETED_COMMENT rule.
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r70223712 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -347,6 +347,14 @@ querySpecification windows?) ; +hint +: '/*+' mapJoinHint '*/' +; + +mapJoinHint +: MAPJOIN '(' broadcastedTables+=tableIdentifier (',' broadcastedTables+=tableIdentifier)* ')' --- End diff -- Is the MAPJOIN optimization the only one we are going to support? This is fine if we aren't. We might want to make this more general (and move the logic into the planner) if we are.
[GitHub] spark pull request #14133: [SPARK-15889] [STREAMING] Follow-up fix to errone...
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/14133 [SPARK-15889] [STREAMING] Follow-up fix to erroneous condition in StreamTest ## What changes were proposed in this pull request? A second form of AssertQuery now actually invokes the condition; avoids a build warning too ## How was this patch tested? Jenkins; running StreamTest You can merge this pull request into a Git repository by running: $ git pull https://github.com/srowen/spark SPARK-15889.2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14133.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14133 commit 719012fd16e574fad8c25ba6b9ecb23df07649de Author: Sean Owen Date: 2016-07-11T09:10:09Z Fix condition; in StreamTest to actually invoke condition
[GitHub] spark issue #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHEMA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14116 **[Test build #62081 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62081/consoleFull)** for PR 14116 at commit [`34fe4ed`](https://github.com/apache/spark/commit/34fe4ed774e726000acb7af8c4e386619027dc17). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHEMA
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14116 Merged build finished. Test PASSed.
[GitHub] spark issue #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHEMA
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14116 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62081/ Test PASSed.
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r70225259 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -339,8 +339,24 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { case SqlBaseParser.SELECT => // Regular select +// Broadcast hints +var withBroadcastedTable = relation --- End diff -- A few thoughts here: - Should we introduce a separate LogicalPlan (like `With`) and move this logic into an Analyzer rule? The reason that I am proposing this is because we are transforming the `LogicalPlan` here. It is simpler to do it here though. - Lets move the logic here into a separate method. Name it `withHints`, model it after the other `with*` functions, and use `optionalMap(hint)(withHint)`.
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r70226031 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -945,8 +955,12 @@ SIMPLE_COMMENT : '--' ~[\r\n]* '\r'? '\n'? -> channel(HIDDEN) ; +BRACKETED_EMPTY_COMMENT +: '/**/' -> channel(HIDDEN) +; + BRACKETED_COMMENT -: '/*' .*? '*/' -> channel(HIDDEN) +: '/*' ~[+] .*? '*/' -> channel(HIDDEN) --- End diff -- Oh, I see.
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r70226224 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -347,6 +347,14 @@ querySpecification windows?) ; +hint +: '/*+' mapJoinHint '*/' +; + +mapJoinHint +: MAPJOIN '(' broadcastedTables+=tableIdentifier (',' broadcastedTables+=tableIdentifier)* ')' --- End diff -- Yep. The MAPJOIN optimization will be handled in the Optimizer. This PR just accepts the basic syntax.
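For context, the SQL surface this grammar rule accepts looks roughly like the following. This is an illustrative sketch only - the table names are made up, and as noted above the hint is merely parsed at this stage; the actual broadcast decision happens later:

```scala
// Assuming a SparkSession named `spark` and registered temp views `big` and `small`.
// The /*+ MAPJOIN(...) */ hint asks the planner to broadcast the named table(s).
val hinted = spark.sql(
  """SELECT /*+ MAPJOIN(small) */ big.id, small.name
    |FROM big JOIN small ON big.id = small.id""".stripMargin)
```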
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r70226307 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -347,6 +347,14 @@ querySpecification windows?) ; +hint +: '/*+' mapJoinHint '*/' +; + +mapJoinHint +: MAPJOIN '(' broadcastedTables+=tableIdentifier (',' broadcastedTables+=tableIdentifier)* ')' --- End diff -- Currently, that is the scope of this PR. We can generalize it if needed.
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r70226914 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -339,8 +339,24 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { case SqlBaseParser.SELECT => // Regular select +// Broadcast hints +var withBroadcastedTable = relation +if (ctx.hint != null) { + val broadcastedTables = + hint.mapJoinHint.broadcastedTables.asScala.map(visitTableIdentifier) + for (table <- broadcastedTables) { +var stop = false +withBroadcastedTable = withBroadcastedTable.transformDown { + case r @ BroadcastHint(UnresolvedRelation(_, _)) => r + case r @ UnresolvedRelation(t, _) if !stop && t == table => --- End diff -- What happens if we use the same table multiple times? Does only the first one get a broadcast hint? What does Hive do? What will happen if we do something like this: ```SQL SELECT /*+ MAPJOIN(tbl_b) */ * FROM tbl_a A LEFT JOIN tbl_b B ON B.id = A.id LEFT JOIN (SELECT XA.id FROM tbl_c XA LEFT ANTI JOIN tbl_b XB ON XB.id = XA.id) C ON C.id = A.id ``` Will both of the `tbl_b` instances be broadcasted?
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r70227049 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/BroadcastJoinSuite.scala --- @@ -153,4 +153,38 @@ class BroadcastJoinSuite extends QueryTest with SQLTestUtils { cases.foreach(assertBroadcastJoin) } } + + test("select hint") { --- End diff -- Move this to `org.apache.spark.sql.catalyst.parser.PlanParserSuite`
[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/14132 This looks pretty good!
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r70227159 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -339,8 +339,24 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { case SqlBaseParser.SELECT => // Regular select +// Broadcast hints +var withBroadcastedTable = relation --- End diff -- For the first one, that could be possible. I did it here because it's simple, and I want to remove unmatched MAPJOIN hints at an early stage (you can see that in the test cases). We will do the more complex, real optimization in the Optimizer layer. For the second one, sure, that would be better. I'll fix it like that.
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r70227231 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -347,6 +347,14 @@ querySpecification windows?) ; +hint +: '/*+' mapJoinHint '*/' +; + +mapJoinHint +: MAPJOIN '(' broadcastedTables+=tableIdentifier (',' broadcastedTables+=tableIdentifier)* ')' --- End diff -- Ok - let's keep it this way then.
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r70227379 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -339,8 +339,24 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { case SqlBaseParser.SELECT => // Regular select +// Broadcast hints +var withBroadcastedTable = relation +if (ctx.hint != null) { + val broadcastedTables = + hint.mapJoinHint.broadcastedTables.asScala.map(visitTableIdentifier) + for (table <- broadcastedTables) { +var stop = false +withBroadcastedTable = withBroadcastedTable.transformDown { + case r @ BroadcastHint(UnresolvedRelation(_, _)) => r + case r @ UnresolvedRelation(t, _) if !stop && t == table => --- End diff -- Wow. Thank you for the case. I missed that. Let me check the behavior. This could be a good test-case candidate.
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r70227441 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/BroadcastJoinSuite.scala --- @@ -153,4 +153,38 @@ class BroadcastJoinSuite extends QueryTest with SQLTestUtils { cases.foreach(assertBroadcastJoin) } } + + test("select hint") { --- End diff -- No problem!
[GitHub] spark issue #14104: [SPARK-16438] Add Asynchronous Actions documentation
Github user phalodi commented on the issue: https://github.com/apache/spark/pull/14104 @srowen I know it's nothing major - it just calls normal functions in a future - but a naive user learning Scala and Spark for the first time doesn't know what futures are, so at least we should add a reference to them. If you suggest it, I'll simply add a reference link without any statement about blocking and non-blocking, so at least users know an API like this exists.
[GitHub] spark issue #14104: [SPARK-16438] Add Asynchronous Actions documentation
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14104 Although I don't think it's essential to mention this API (it's in the API doc), it wouldn't hurt to point to it here. The mention should be accurate, though. I think it's correct to say that the Spark RDD API also exposes asynchronous versions of some actions, like foreach, which immediately return a handle to the caller that can be used to wait for completion.
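A small sketch of the asynchronous-action API being discussed - `countAsync` returns a `FutureAction` (a `scala.concurrent.Future`) immediately instead of blocking. This assumes a local Spark session and is illustrative only:

```scala
import scala.concurrent.Await
import scala.concurrent.duration._

import org.apache.spark.sql.SparkSession

object AsyncActionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("async").getOrCreate()
    val rdd = spark.sparkContext.parallelize(1 to 1000)

    // Returns a handle immediately instead of blocking the caller
    val futureCount = rdd.countAsync()

    // ... the caller can do other work here ...

    // Wait for completion only when the result is actually needed
    val n = Await.result(futureCount, 1.minute)
    println(s"count = $n")
    spark.stop()
  }
}
```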
[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14132 Thank you for the quick review! I'll let you know after updating.
[GitHub] spark issue #14115: [SPARK-16459][SQL] Prevent dropping current database
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14115 Hi, @hvanhovell . Could you review this PR, too?
[GitHub] spark issue #14104: [SPARK-16438] Add Asynchronous Actions documentation
Github user phalodi commented on the issue: https://github.com/apache/spark/pull/14104 @srowen so do you think we should not add this API reference to the document?
[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14132 **[Test build #62082 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62082/consoleFull)** for PR 14132 at commit [`6bd704e`](https://github.com/apache/spark/commit/6bd704e5d281bd8c7a1bddaee300688d016bc597). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14104: [SPARK-16438] Add Asynchronous Actions documentation
Github user phalodi commented on the issue: https://github.com/apache/spark/pull/14104 @srowen sure, I will make the appropriate changes and push again
[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14132 Merged build finished. Test FAILed.
[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14132 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62082/ Test FAILed.
[GitHub] spark issue #14131: [SPARK-16318][SQL] Implement all remaining xpath functio...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14131 retest this please
[GitHub] spark issue #14131: [SPARK-16318][SQL] Implement all remaining xpath functio...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14131 looks like we also need to backport the improvement to `checkEvaluation`
[GitHub] spark pull request #13494: [SPARK-15752] [SQL] Optimize metadata only query ...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13494#discussion_r70230056 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala --- @@ -0,0 +1,143 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.catalog.{CatalogRelation, SessionCatalog} +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.expressions.aggregate._ +import org.apache.spark.sql.catalyst.plans.logical._ +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation} +import org.apache.spark.sql.internal.SQLConf + +/** + * This rule optimizes the execution of queries that can be answered by looking only at + * partition-level metadata. This applies when all the columns scanned are partition columns, and + * the query has an aggregate operator that satisfies the following conditions: + * 1. aggregate expression is partition columns. + * e.g. SELECT col FROM tbl GROUP BY col. + * 2. 
aggregate function on partition columns with DISTINCT. + e.g. SELECT col1, count(DISTINCT col2) FROM tbl GROUP BY col1. + 3. aggregate function on partition columns which have same result w or w/o DISTINCT keyword. + e.g. SELECT col1, Max(col2) FROM tbl GROUP BY col1. + */ +case class OptimizeMetadataOnlyQuery( +catalog: SessionCatalog, +conf: SQLConf) extends Rule[LogicalPlan] { + + def apply(plan: LogicalPlan): LogicalPlan = { +if (!conf.optimizerMetadataOnly) { + return plan +} + +plan.transform { + case a @ Aggregate(_, aggExprs, child @ PartitionedRelation(partAttrs, relation)) => +// We only apply this optimization when only partitioned attributes are scanned. +if (a.references.subsetOf(partAttrs)) { + val aggFunctions = aggExprs.flatMap(_.collect { +case agg: AggregateExpression => agg + }) + val isAllDistinctAgg = aggFunctions.forall { agg => +agg.isDistinct || (agg.aggregateFunction match { + // `Max` and `Min` are always distinct aggregate functions no matter they have + // DISTINCT keyword or not, as the result will be same. + case _: Max => true --- End diff -- First/Last?
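As a usage sketch, a query this rule could answer from partition metadata alone might look like the following. The config key is inferred from the diff's `conf.optimizerMetadataOnly`, and the table layout is an assumption for illustration:

```scala
// Assuming a SparkSession `spark` and a table `events` partitioned by column `dt`.
spark.sql("SET spark.sql.optimizer.metadataOnly=true")

// All referenced columns are partition columns, and MAX gives the same result
// with or without DISTINCT, so this can be answered from partition metadata
// without scanning any data files.
val latest = spark.sql("SELECT MAX(dt) FROM events")
```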
[GitHub] spark pull request #13704: [SPARK-15985][SQL] Reduce runtime overhead of a p...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13704#discussion_r70230473 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2018,6 +2018,8 @@ class Analyzer( fail(child, DateType, walkedTypePath) case (StringType, to: NumericType) => fail(child, to, walkedTypePath) + case (from: ArrayType, to: ArrayType) if !from.containsNull => --- End diff -- Got it. I will add unit tests later. They require a combination of `Cast` and `SimplifyCasts`.
[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14132 Oops. Five errors, too. I'll fix these tomorrow.
[GitHub] spark issue #14104: [SPARK-16438] Add Asynchronous Actions documentation
Github user phalodi commented on the issue: https://github.com/apache/spark/pull/14104 @srowen now it's perfect, I think - at last, from a table down to one line. But yeah, you are right, it's better to just mention that Spark provides it and not duplicate the table of actions.
[GitHub] spark pull request #14115: [SPARK-16459][SQL] Prevent dropping current datab...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/14115#discussion_r70231360 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -878,7 +885,8 @@ class SessionCatalog( * This is mainly used for tests. */ private[sql] def reset(): Unit = synchronized { -val default = "default" +val default = DEFAULT_DATABASE --- End diff -- Shouldn't we just replace the `val default` by `DEFAULT_DATABASE`?
[GitHub] spark pull request #13704: [SPARK-15985][SQL] Reduce runtime overhead of a p...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13704#discussion_r70231367 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1441,6 +1441,26 @@ object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper { object SimplifyCasts extends Rule[LogicalPlan] { def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions { case Cast(e, dataType) if e.dataType == dataType => e +case Cast(e, dataType) => + (e.dataType, dataType) match { +case (fromDt: ArrayType, toDt: ArrayType) => --- End diff -- how about ``` case c @ Cast(e, dataType) => (e.dataType, dataType) match { case (ArrayType(from, false), ArrayType(to, true)) if from == to => e case (MapType(fromKey, fromValue, false), MapType(toKey, toValue, true)) if fromKey == toKey && fromValue == toValue => e case _ => c } ```
[GitHub] spark pull request #14115: [SPARK-16459][SQL] Prevent dropping current datab...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14115#discussion_r70232467 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -878,7 +885,8 @@ class SessionCatalog( * This is mainly used for tests. */ private[sql] def reset(): Unit = synchronized { -val default = "default" +val default = DEFAULT_DATABASE --- End diff -- Oh, sure!
[GitHub] spark issue #14115: [SPARK-16459][SQL] Prevent dropping current database
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14115 Thank you for the review, @hvanhovell . I replaced the local variable.
[GitHub] spark pull request #14134: [Spark-16479] Add Example for asynchronous action
GitHub user phalodi opened a pull request: https://github.com/apache/spark/pull/14134 [Spark-16479] Add Example for asynchronous action ## What changes were proposed in this pull request? Add examples for asynchronous actions ## How was this patch tested? Ran all test cases You can merge this pull request into a Git repository by running: $ git pull https://github.com/phalodi/spark SPARK-16479 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14134.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14134 commit aba737f9be520ec33d0841d718f8250fbfeb09e1 Author: sandy Date: 2016-07-08T09:15:12Z Add Asynchronous Actions documentation commit 4afcec803218376200559730e8f1c9873641c140 Author: sandy Date: 2016-07-10T19:04:03Z fix statement to more clear about blocking statement commit 81d1f716e279dab5f5edfdc5eb3624f28d0387f1 Author: sandy Date: 2016-07-11T05:56:11Z remove some statement related to table commit db09b2c3bfb441cf20dfebd5a7def79252c49cec Author: sandy Date: 2016-07-11T10:00:39Z make changes in programming guide doc for asynchronous actions commit 1a54ff52564639a750ffa81e9159f2ae3a6248ee Author: sandy Date: 2016-07-11T10:18:47Z add asynchronous actions example
[GitHub] spark pull request #14134: [Spark-16479] Add Example for asynchronous action
Github user phalodi closed the pull request at: https://github.com/apache/spark/pull/14134
[GitHub] spark pull request #13704: [SPARK-15985][SQL] Reduce runtime overhead of a p...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13704#discussion_r70234587

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1441,6 +1441,26 @@ object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper {
 object SimplifyCasts extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
     case Cast(e, dataType) if e.dataType == dataType => e
+    case Cast(e, dataType) =>
+      (e.dataType, dataType) match {
+        case (fromDt: ArrayType, toDt: ArrayType) =>
--- End diff --

thanks, I like this simple one
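The `SimplifyCasts` rule quoted above drops a cast whose target type already matches the child expression's type. A toy, self-contained sketch of that idea, with illustrative class names rather than Spark's actual Catalyst types:

```scala
// Minimal stand-in for an expression tree; NOT Spark's real classes.
sealed trait DataType
case object IntType extends DataType
case object StringType extends DataType

sealed trait Expr { def dataType: DataType }
case class Literal(value: Any, dataType: DataType) extends Expr
case class Cast(child: Expr, dataType: DataType) extends Expr

// A cast to the expression's own type is a no-op and can be removed.
def simplifyCasts(e: Expr): Expr = e match {
  case Cast(child, dt) if child.dataType == dt => simplifyCasts(child) // redundant cast
  case Cast(child, dt)                         => Cast(simplifyCasts(child), dt)
  case other                                   => other
}

// simplifyCasts(Cast(Literal(1, IntType), IntType)) == Literal(1, IntType)
```

The quoted diff extends the same rule to peel redundant casts between structurally compatible `ArrayType`s as well; this sketch only shows the base case.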
[GitHub] spark issue #14115: [SPARK-16459][SQL] Prevent dropping current database
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/14115 LGTM - pending jenkins.
[GitHub] spark issue #14129: [SPARK-16280][SQL][WIP] Implement histogram_numeric SQL ...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/14129 @tilumi could you show some benchmarks for this? I think that this will have some performance problems. Using an array in a DeclarativeAggregate is not the most efficient.
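The kind of micro-benchmark being asked for can be gathered with a small timing harness; a generic sketch (all names illustrative, and this is not Spark's own `Benchmark` utility):

```scala
// Tiny micro-benchmark harness: warm up the JIT, then report the best
// wall-clock time over a few runs of the body.
object BenchSketch {
  def time[A](label: String, warmup: Int = 3, runs: Int = 5)(body: => A): A = {
    (1 to warmup).foreach(_ => body)                 // warm-up iterations
    val samplesMs = (1 to runs).map { _ =>
      val t0 = System.nanoTime()
      body
      (System.nanoTime() - t0) / 1e6                 // elapsed millis
    }
    println(f"$label: best ${samplesMs.min}%.2f ms over $runs runs")
    body
  }

  def main(args: Array[String]): Unit = {
    val data = Array.fill(1000000)(scala.util.Random.nextDouble())
    // Stand-in workload; the real comparison would time the aggregate
    // with and without the array-backed buffer.
    time("sum")(data.sum)
  }
}
```

Reporting the minimum rather than the mean is a common choice for JVM micro-benchmarks, since it filters out GC and JIT noise.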
[GitHub] spark pull request #14135: [Spark-16479] Add Example for asynchronous action
GitHub user phalodi opened a pull request: https://github.com/apache/spark/pull/14135 [Spark-16479] Add Example for asynchronous action

## What changes were proposed in this pull request?

Add an example for asynchronous actions

## How was this patch tested?

Run test cases

You can merge this pull request into a Git repository by running: $ git pull https://github.com/phalodi/spark SPARK-16479 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14135.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14135

commit 1215cd9ade669ff7e78a8380ea42faf5bfa16387 Author: sandy Date: 2016-07-11T10:35:43Z add example for async action
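The example the PR adds is not shown in this message; a minimal sketch of what RDD asynchronous actions look like (assuming a local SparkContext, not necessarily the PR's exact code):

```scala
import scala.concurrent.{Await, ExecutionContext}
import scala.concurrent.duration.Duration
import scala.util.{Failure, Success}

import org.apache.spark.{SparkConf, SparkContext}

object AsyncActionsSketch {
  def main(args: Array[String]): Unit = {
    implicit val ec: ExecutionContext = ExecutionContext.global
    val sc = new SparkContext(new SparkConf().setAppName("async").setMaster("local[2]"))
    val rdd = sc.parallelize(1 to 1000)

    // countAsync returns a FutureAction immediately instead of blocking
    // the driver thread the way count() does.
    val futureCount = rdd.countAsync()
    futureCount.onComplete {
      case Success(n)  => println(s"count = $n")
      case Failure(ex) => println(s"count failed: $ex")
    }

    // takeAsync likewise runs in the background; here we block explicitly
    // only when we actually need the result.
    val futureFirst = rdd.takeAsync(5)
    val first = Await.result(futureFirst, Duration.Inf)
    println(s"first five: ${first.mkString(", ")}")

    sc.stop()
  }
}
```

`FutureAction` extends `scala.concurrent.Future`, so the usual combinators (`onComplete`, `map`, `Await.result`) apply.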
[GitHub] spark issue #14135: [Spark-16479] Add Example for asynchronous action
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14135 Can one of the admins verify this patch?
[GitHub] spark pull request #13494: [SPARK-15752] [SQL] Optimize metadata only query ...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13494#discussion_r70236562 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala --- @@ -0,0 +1,143 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.catalog.{CatalogRelation, SessionCatalog} +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.expressions.aggregate._ +import org.apache.spark.sql.catalyst.plans.logical._ +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation} +import org.apache.spark.sql.internal.SQLConf + +/** + * This rule optimizes the execution of queries that can be answered by looking only at + * partition-level metadata. This applies when all the columns scanned are partition columns, and + * the query has an aggregate operator that satisfies the following conditions: + * 1. aggregate expression is partition columns. + * e.g. SELECT col FROM tbl GROUP BY col. + * 2. 
aggregate function on partition columns with DISTINCT. + * e.g. SELECT col1, count(DISTINCT col2) FROM tbl GROUP BY col1. + * 3. aggregate function on partition columns which have same result w or w/o DISTINCT keyword. + * e.g. SELECT col1, Max(col2) FROM tbl GROUP BY col1. + */ +case class OptimizeMetadataOnlyQuery( +catalog: SessionCatalog, +conf: SQLConf) extends Rule[LogicalPlan] { + + def apply(plan: LogicalPlan): LogicalPlan = { +if (!conf.optimizerMetadataOnly) { + return plan +} + +plan.transform { + case a @ Aggregate(_, aggExprs, child @ PartitionedRelation(partAttrs, relation)) => +// We only apply this optimization when only partitioned attributes are scanned. +if (a.references.subsetOf(partAttrs)) { + val aggFunctions = aggExprs.flatMap(_.collect { +case agg: AggregateExpression => agg + }) + val isAllDistinctAgg = aggFunctions.forall { agg => +agg.isDistinct || (agg.aggregateFunction match { + // `Max` and `Min` are always distinct aggregate functions no matter they have + // DISTINCT keyword or not, as the result will be same. + case _: Max => true + case _: Min => true + case _ => false +}) + } + if (isAllDistinctAgg) { + a.withNewChildren(Seq(replaceTableScanWithPartitionMetadata(child, relation))) + } else { +a + } +} else { + a +} +} + } + + /** + * Transform the given plan, find its table scan nodes that matches the given relation, and then + * replace the table scan node with its corresponding partition values. 
+ */ + private def replaceTableScanWithPartitionMetadata( + child: LogicalPlan, + relation: LogicalPlan): LogicalPlan = { +child transform { + case plan if plan eq relation => +relation match { + case l @ LogicalRelation(fsRelation: HadoopFsRelation, _, _) => +val partColumns = fsRelation.partitionSchema.map(_.name.toLowerCase).toSet +val partAttrs = l.output.filter(a => partColumns.contains(a.name.toLowerCase)) +val partitionData = fsRelation.location.listFiles(filters = Nil) +LocalRelation(partAttrs, partitionData.map(_.values)) + + case relation: CatalogRelation => +val partColumns = relation.catalogTable.partitionColumnNames.map(_.toLowerCase).toSet +val partAttrs = relation.output.filter(a => partColumns.contains(a.name.toLowerCase)) +val partiti
[GitHub] spark pull request #13494: [SPARK-15752] [SQL] Optimize metadata only query ...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13494#discussion_r70236613 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala --- @@ -0,0 +1,143 @@
[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13704 **[Test build #62087 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62087/consoleFull)** for PR 13704 at commit [`1bbe859`](https://github.com/apache/spark/commit/1bbe859804d999e30b8ed7f51b13121e30118d5a).
[GitHub] spark issue #14131: [SPARK-16318][SQL] Implement all remaining xpath functio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14131 **[Test build #62085 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62085/consoleFull)** for PR 14131 at commit [`4d6f654`](https://github.com/apache/spark/commit/4d6f6544be4373a32150fd6d59ba539d3fcb6aab).
[GitHub] spark issue #14129: [SPARK-16280][SQL][WIP] Implement histogram_numeric SQL ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14129 **[Test build #3178 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3178/consoleFull)** for PR 14129 at commit [`08065d8`](https://github.com/apache/spark/commit/08065d87f9a04f7e03e319950e8caa5c960defb0).
[GitHub] spark issue #14133: [SPARK-15889] [STREAMING] Follow-up fix to erroneous con...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14133 **[Test build #62083 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62083/consoleFull)** for PR 14133 at commit [`719012f`](https://github.com/apache/spark/commit/719012fd16e574fad8c25ba6b9ecb23df07649de).
[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13704 **[Test build #62084 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62084/consoleFull)** for PR 13704 at commit [`c31729f`](https://github.com/apache/spark/commit/c31729f361b5774f36834daeddb338f49377130e).
[GitHub] spark issue #14115: [SPARK-16459][SQL] Prevent dropping current database
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14115 **[Test build #3177 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3177/consoleFull)** for PR 14115 at commit [`19a3160`](https://github.com/apache/spark/commit/19a31601c248d239aafcdcba1123c7a7c585c924).
[GitHub] spark issue #14115: [SPARK-16459][SQL] Prevent dropping current database
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14115 **[Test build #62086 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62086/consoleFull)** for PR 14115 at commit [`19a3160`](https://github.com/apache/spark/commit/19a31601c248d239aafcdcba1123c7a7c585c924).
[GitHub] spark pull request #13494: [SPARK-15752] [SQL] Optimize metadata only query ...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13494#discussion_r70238437 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala --- @@ -0,0 +1,143 @@
+          case l @ LogicalRelation(fsRelation: HadoopFsRelation, _, _) =>
+            val partColumns = fsRelation.partitionSchema.map(_.name.toLowerCase).toSet
+            val partAttrs = l.output.filter(a => partColumns.contains(a.name.toLowerCase))
--- End diff --
There is no need to determine the `partAttrs` again; we can just pass them as an argument.
[GitHub] spark pull request #14136: [SPARK-16282][SQL] Implement percentile SQL funct...
GitHub user jiangxb1987 opened a pull request: https://github.com/apache/spark/pull/14136 [SPARK-16282][SQL] Implement percentile SQL function.

## What changes were proposed in this pull request?

Implement the percentile SQL function. It computes the exact percentile(s) of expr at pc, with pc in the range [0, 1].

## How was this patch tested?

Added new test cases in DataFrameAggregateSuite.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/jiangxb1987/spark percentile Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14136.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14136

commit 1ae3df7463f36ac15c4cb3138bec3bade7eae600 Author: jiangxb1987 Date: 2016-07-11T11:11:26Z [SPARK-16282][SQL] Implement percentile SQL function.
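The PR's implementation is not shown here; as a standalone illustration of the exact-percentile semantics described (Hive-style linear interpolation between adjacent order statistics), a sketch:

```scala
// Exact percentile of a small in-memory dataset: sort, compute the
// fractional rank pc * (n - 1), and interpolate linearly between the
// two neighbouring values. Illustrative only, not the PR's code.
def percentile(values: Seq[Double], pc: Double): Double = {
  require(pc >= 0.0 && pc <= 1.0, "pc must be in [0, 1]")
  require(values.nonEmpty, "values must be non-empty")
  val sorted = values.sorted
  val pos = pc * (sorted.length - 1)        // fractional rank
  val lower = math.floor(pos).toInt
  val upper = math.ceil(pos).toInt
  val frac = pos - lower
  sorted(lower) * (1 - frac) + sorted(upper) * frac
}

// percentile(Seq(1.0, 2.0, 3.0, 4.0), 0.5) == 2.5
```

A distributed implementation cannot simply sort locally like this; holding and merging the values per group is exactly what the aggregate buffer in the PR has to manage.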
[GitHub] spark issue #14136: [SPARK-16282][SQL] Implement percentile SQL function.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14136 **[Test build #62088 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62088/consoleFull)** for PR 14136 at commit [`1ae3df7`](https://github.com/apache/spark/commit/1ae3df7463f36ac15c4cb3138bec3bade7eae600).
[GitHub] spark issue #14136: [SPARK-16282][SQL] Implement percentile SQL function.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14136 Merged build finished. Test FAILed.