[GitHub] spark issue #13706: [SPARK-15988] [SQL] Implement DDL commands: Create/Drop ...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/13706 cc @gatorsmile could you take a look at the way this interacts with the session catalog and the function registry? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13706#discussion_r118846647 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/macros.scala --- @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.command + +import org.apache.spark.sql.{AnalysisException, Row, SparkSession} +import org.apache.spark.sql.catalyst.analysis._ +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.types.StructField + +/** + * This class provides arguments and body expression of the macro function. + */ +case class MacroFunctionWrapper(columns: Seq[StructField], macroFunction: Expression) + +/** + * The DDL command that creates a macro. + * To create a temporary macro, the syntax of using this command in SQL is: + * {{{ + *CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) expression; + * }}} + */ +case class CreateMacroCommand( +macroName: String, +funcWrapper: MacroFunctionWrapper) + extends RunnableCommand { + + override def run(sparkSession: SparkSession): Seq[Row] = { +val catalog = sparkSession.sessionState.catalog +val columns = funcWrapper.columns.map { col => + AttributeReference(col.name, col.dataType, col.nullable, col.metadata)() } +val colToIndex: Map[String, Int] = columns.map(_.name).zipWithIndex.toMap +if (colToIndex.size != columns.size) { + throw new AnalysisException(s"Cannot support duplicate colNames " + +s"for CREATE TEMPORARY MACRO $macroName, actual columns: ${columns.mkString(",")}") +} +val macroFunction = funcWrapper.macroFunction.transform { + case u: UnresolvedAttribute => +val index = colToIndex.get(u.name).getOrElse( + throw new AnalysisException(s"Cannot find colName: ${u} " + +s"for CREATE TEMPORARY MACRO $macroName, actual columns: ${columns.mkString(",")}")) +BoundReference(index, columns(index).dataType, columns(index).nullable) + case u: UnresolvedFunction => +sparkSession.sessionState.catalog.lookupFunction(u.name, u.children) + case s: SubqueryExpression => +throw new AnalysisException(s"Cannot support Subquery: ${s} " + + s"for CREATE TEMPORARY MACRO $macroName") + case u: UnresolvedGenerator => +throw new AnalysisException(s"Cannot support Generator: ${u} " + + s"for CREATE TEMPORARY MACRO $macroName") +} + +val macroInfo = columns.mkString(",") + " -> " + funcWrapper.macroFunction.toString +val info = new ExpressionInfo(macroInfo, macroName, true) +val builder = (children: Seq[Expression]) => { + if (children.size != columns.size) { +throw new AnalysisException(s"Actual number of columns: ${children.size} != " + + s"expected number of columns: ${columns.size} for Macro $macroName") + } + macroFunction.transform { +// Skip to validate the input type because check it at runtime. --- End diff -- On a related note, we are currently not sure if the macro produces a valid expression. Maybe we should run analysis on the macro expression to make sure it does not fail every query later on, e.g.: ```scala val resolvedMacroFunction = try { val plan = Project(Alias(macroFunction, "m")() :: Nil, OneRowRelation) val analyzed @ Project(Seq(named), OneRowRelation) = sparkSession.sessionState.analyzer.execute(plan) sparkSession.sessionState.analyzer.checkAnalysis(analyzed) named.children.head } catch { case a: AnalysisException => ... } ``` Note that we cannot use generato
[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13706#discussion_r11884 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/macros.scala --- @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.command + +import org.apache.spark.sql.{AnalysisException, Row, SparkSession} +import org.apache.spark.sql.catalyst.analysis._ +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.types.StructField + +/** + * This class provides arguments and body expression of the macro function. + */ +case class MacroFunctionWrapper(columns: Seq[StructField], macroFunction: Expression) + +/** + * The DDL command that creates a macro. + * To create a temporary macro, the syntax of using this command in SQL is: + * {{{ + *CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) expression; + * }}} + */ +case class CreateMacroCommand( +macroName: String, +funcWrapper: MacroFunctionWrapper) + extends RunnableCommand { + + override def run(sparkSession: SparkSession): Seq[Row] = { +val catalog = sparkSession.sessionState.catalog +val columns = funcWrapper.columns.map { col => + AttributeReference(col.name, col.dataType, col.nullable, col.metadata)() } +val colToIndex: Map[String, Int] = columns.map(_.name).zipWithIndex.toMap +if (colToIndex.size != columns.size) { + throw new AnalysisException(s"Cannot support duplicate colNames " + +s"for CREATE TEMPORARY MACRO $macroName, actual columns: ${columns.mkString(",")}") +} +val macroFunction = funcWrapper.macroFunction.transform { + case u: UnresolvedAttribute => +val index = colToIndex.get(u.name).getOrElse( + throw new AnalysisException(s"Cannot find colName: ${u} " + +s"for CREATE TEMPORARY MACRO $macroName, actual columns: ${columns.mkString(",")}")) +BoundReference(index, columns(index).dataType, columns(index).nullable) + case u: UnresolvedFunction => +sparkSession.sessionState.catalog.lookupFunction(u.name, u.children) + case s: SubqueryExpression => +throw new AnalysisException(s"Cannot support Subquery: ${s} " + + s"for CREATE TEMPORARY MACRO $macroName") + case u: UnresolvedGenerator => +throw new AnalysisException(s"Cannot support Generator: ${u} " + + s"for CREATE TEMPORARY MACRO $macroName") +} + +val macroInfo = columns.mkString(",") + " -> " + funcWrapper.macroFunction.toString --- End diff -- Can you give an example of what this would look like? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13706#discussion_r118844341 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/macros.scala --- @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.command + +import org.apache.spark.sql.{AnalysisException, Row, SparkSession} +import org.apache.spark.sql.catalyst.analysis._ +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.types.StructField + +/** + * This class provides arguments and body expression of the macro function. + */ +case class MacroFunctionWrapper(columns: Seq[StructField], macroFunction: Expression) + +/** + * The DDL command that creates a macro. + * To create a temporary macro, the syntax of using this command in SQL is: + * {{{ + *CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) expression; + * }}} + */ +case class CreateMacroCommand( +macroName: String, +funcWrapper: MacroFunctionWrapper) + extends RunnableCommand { + + override def run(sparkSession: SparkSession): Seq[Row] = { +val catalog = sparkSession.sessionState.catalog +val columns = funcWrapper.columns.map { col => + AttributeReference(col.name, col.dataType, col.nullable, col.metadata)() } --- End diff -- Nit: put `}` on a new line --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13706#discussion_r118844463 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/macros.scala --- @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.command + +import org.apache.spark.sql.{AnalysisException, Row, SparkSession} +import org.apache.spark.sql.catalyst.analysis._ +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.types.StructField + +/** + * This class provides arguments and body expression of the macro function. + */ +case class MacroFunctionWrapper(columns: Seq[StructField], macroFunction: Expression) + +/** + * The DDL command that creates a macro. + * To create a temporary macro, the syntax of using this command in SQL is: + * {{{ + *CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) expression; + * }}} + */ +case class CreateMacroCommand( +macroName: String, +funcWrapper: MacroFunctionWrapper) + extends RunnableCommand { + + override def run(sparkSession: SparkSession): Seq[Row] = { +val catalog = sparkSession.sessionState.catalog +val columns = funcWrapper.columns.map { col => + AttributeReference(col.name, col.dataType, col.nullable, col.metadata)() } +val colToIndex: Map[String, Int] = columns.map(_.name).zipWithIndex.toMap +if (colToIndex.size != columns.size) { + throw new AnalysisException(s"Cannot support duplicate colNames " + +s"for CREATE TEMPORARY MACRO $macroName, actual columns: ${columns.mkString(",")}") +} +val macroFunction = funcWrapper.macroFunction.transform { + case u: UnresolvedAttribute => +val index = colToIndex.get(u.name).getOrElse( + throw new AnalysisException(s"Cannot find colName: ${u} " + +s"for CREATE TEMPORARY MACRO $macroName, actual columns: ${columns.mkString(",")}")) +BoundReference(index, columns(index).dataType, columns(index).nullable) + case u: UnresolvedFunction => +sparkSession.sessionState.catalog.lookupFunction(u.name, u.children) + case s: SubqueryExpression => +throw new AnalysisException(s"Cannot support Subquery: ${s} " + + s"for CREATE TEMPORARY MACRO $macroName") + case u: UnresolvedGenerator => +throw new AnalysisException(s"Cannot support Generator: ${u} " + + s"for CREATE TEMPORARY MACRO $macroName") +} + +val macroInfo = columns.mkString(",") + " -> " + funcWrapper.macroFunction.toString +val info = new ExpressionInfo(macroInfo, macroName, true) +val builder = (children: Seq[Expression]) => { + if (children.size != columns.size) { --- End diff -- It is slightly better to `columns.size` in a separate variable, so we do not include `columns` in the closure. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13706#discussion_r118844485 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -716,6 +716,37 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) { } /** + * Create a [[CreateMacroCommand]] command. + * + * For example: + * {{{ + * CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) expression; + * }}} + */ + override def visitCreateMacro(ctx: CreateMacroContext): LogicalPlan = withOrigin(ctx) { +val arguments = Option(ctx.colTypeList).map(visitColTypeList(_)) --- End diff -- Nit: you can avoid `(_)`... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13706#discussion_r118845133 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -1516,6 +1516,35 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { ) } + test("create/drop temporary macro") { --- End diff -- Should we also test a combination of temporary macros/functions...? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13706#discussion_r118845109 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -1516,6 +1516,35 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { ) } + test("create/drop temporary macro") { --- End diff -- Can you also add a case for a macro without parameters? E.g.: `create temporary macro c() as 3E9` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13706#discussion_r118844675 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala --- @@ -107,6 +110,14 @@ class SimpleFunctionRegistry extends FunctionRegistry { functionBuilders.remove(name).isDefined } + override def dropMacro(name: String): Boolean = synchronized { --- End diff -- A drop function can currently also drop a macro. Can you make sure that this cannot happen? Maybe we should consolidate this into a single drop function with a `macro` flag. cc @gatorsmile WDYT? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13706#discussion_r118844322 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -1516,6 +1516,35 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { ) } + test("create/drop temporary macro") { --- End diff -- Can you use `SQLQueryTestSuite` instead? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13706#discussion_r118844357 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/macros.scala --- @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.command + +import org.apache.spark.sql.{AnalysisException, Row, SparkSession} +import org.apache.spark.sql.catalyst.analysis._ +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.types.StructField + +/** + * This class provides arguments and body expression of the macro function. + */ +case class MacroFunctionWrapper(columns: Seq[StructField], macroFunction: Expression) + +/** + * The DDL command that creates a macro. + * To create a temporary macro, the syntax of using this command in SQL is: + * {{{ + *CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) expression; + * }}} + */ +case class CreateMacroCommand( +macroName: String, +funcWrapper: MacroFunctionWrapper) + extends RunnableCommand { + + override def run(sparkSession: SparkSession): Seq[Row] = { +val catalog = sparkSession.sessionState.catalog +val columns = funcWrapper.columns.map { col => + AttributeReference(col.name, col.dataType, col.nullable, col.metadata)() } +val colToIndex: Map[String, Int] = columns.map(_.name).zipWithIndex.toMap +if (colToIndex.size != columns.size) { + throw new AnalysisException(s"Cannot support duplicate colNames " + +s"for CREATE TEMPORARY MACRO $macroName, actual columns: ${columns.mkString(",")}") +} +val macroFunction = funcWrapper.macroFunction.transform { + case u: UnresolvedAttribute => +val index = colToIndex.get(u.name).getOrElse( --- End diff -- We should respect the case-sensitivity settings here. So a lookup might not be the best idea. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13706#discussion_r118844274 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/macros.scala --- @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.command + +import org.apache.spark.sql.{AnalysisException, Row, SparkSession} +import org.apache.spark.sql.catalyst.analysis._ +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.types.StructField + +/** + * This class provides arguments and body expression of the macro function. + */ +case class MacroFunctionWrapper(columns: Seq[StructField], macroFunction: Expression) + +/** + * The DDL command that creates a macro. + * To create a temporary macro, the syntax of using this command in SQL is: + * {{{ + *CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) expression; + * }}} + */ +case class CreateMacroCommand( +macroName: String, +funcWrapper: MacroFunctionWrapper) + extends RunnableCommand { + + override def run(sparkSession: SparkSession): Seq[Row] = { +val catalog = sparkSession.sessionState.catalog +val columns = funcWrapper.columns.map { col => + AttributeReference(col.name, col.dataType, col.nullable, col.metadata)() } +val colToIndex: Map[String, Int] = columns.map(_.name).zipWithIndex.toMap +if (colToIndex.size != columns.size) { + throw new AnalysisException(s"Cannot support duplicate colNames " + +s"for CREATE TEMPORARY MACRO $macroName, actual columns: ${columns.mkString(",")}") +} +val macroFunction = funcWrapper.macroFunction.transform { + case u: UnresolvedAttribute => +val index = colToIndex.get(u.name).getOrElse( + throw new AnalysisException(s"Cannot find colName: ${u} " + +s"for CREATE TEMPORARY MACRO $macroName, actual columns: ${columns.mkString(",")}")) +BoundReference(index, columns(index).dataType, columns(index).nullable) + case u: UnresolvedFunction => +sparkSession.sessionState.catalog.lookupFunction(u.name, u.children) + case s: SubqueryExpression => +throw new AnalysisException(s"Cannot support Subquery: ${s} " + + s"for CREATE TEMPORARY MACRO $macroName") + case u: UnresolvedGenerator => --- End diff -- Is this what Hive does? I really don't see why we should not support this. Please note that we cannot use generators if we decide that an expression has to be a fully resolved expression at creation time. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13706#discussion_r118845638 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -1090,6 +1090,24 @@ class SessionCatalog( } } + /** Create a temporary macro. */ + def createTempMacro( + name: String, + info: ExpressionInfo, + functionBuilder: FunctionBuilder): Unit = { +if (functionRegistry.functionExists(name)) { --- End diff -- I am not entirely sure if we should throw an exception here. It unfortunately depends on the semantics you follow, SQL will throw an exception, whereas the Dataframe API will just overwrite the function. Let's follow Hive for now. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13706#discussion_r118846490 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/macros.scala --- @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.command + +import org.apache.spark.sql.{AnalysisException, Row, SparkSession} +import org.apache.spark.sql.catalyst.analysis._ +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.types.StructField + +/** + * This class provides arguments and body expression of the macro function. + */ +case class MacroFunctionWrapper(columns: Seq[StructField], macroFunction: Expression) + +/** + * The DDL command that creates a macro. + * To create a temporary macro, the syntax of using this command in SQL is: + * {{{ + *CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) expression; + * }}} + */ +case class CreateMacroCommand( +macroName: String, +funcWrapper: MacroFunctionWrapper) + extends RunnableCommand { + + override def run(sparkSession: SparkSession): Seq[Row] = { +val catalog = sparkSession.sessionState.catalog +val columns = funcWrapper.columns.map { col => + AttributeReference(col.name, col.dataType, col.nullable, col.metadata)() } +val colToIndex: Map[String, Int] = columns.map(_.name).zipWithIndex.toMap +if (colToIndex.size != columns.size) { + throw new AnalysisException(s"Cannot support duplicate colNames " + +s"for CREATE TEMPORARY MACRO $macroName, actual columns: ${columns.mkString(",")}") +} +val macroFunction = funcWrapper.macroFunction.transform { + case u: UnresolvedAttribute => +val index = colToIndex.get(u.name).getOrElse( + throw new AnalysisException(s"Cannot find colName: ${u} " + +s"for CREATE TEMPORARY MACRO $macroName, actual columns: ${columns.mkString(",")}")) +BoundReference(index, columns(index).dataType, columns(index).nullable) + case u: UnresolvedFunction => +sparkSession.sessionState.catalog.lookupFunction(u.name, u.children) + case s: SubqueryExpression => +throw new AnalysisException(s"Cannot support Subquery: ${s} " + + s"for CREATE TEMPORARY MACRO $macroName") + case u: UnresolvedGenerator => +throw new AnalysisException(s"Cannot support Generator: ${u} " + + s"for CREATE TEMPORARY MACRO $macroName") +} + +val macroInfo = columns.mkString(",") + " -> " + funcWrapper.macroFunction.toString +val info = new ExpressionInfo(macroInfo, macroName, true) +val builder = (children: Seq[Expression]) => { + if (children.size != columns.size) { +throw new AnalysisException(s"Actual number of columns: ${children.size} != " + + s"expected number of columns: ${columns.size} for Macro $macroName") + } + macroFunction.transform { +// Skip to validate the input type because check it at runtime. --- End diff -- How do we check at runtime? The current code does not seem to respect the types passed, and rely on the macro's expression to do some type validation, this means you can pass anything to the macro and the user can end up with an unexpected result: ```sql create macro plus(a int, b int) as a + b; select plus(1.0, 1.0) as result -- This returns a decimal, and not an int as expected ``` So I think we should at least validate the input expressions. The hacky way would be to add casts, and have the analyzer fail if the cast cannot be made (this is terrible UX). A better way to would be to create some sentinel expression that makes sure the ana
[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13706#discussion_r118844406 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/macros.scala --- @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.command + +import org.apache.spark.sql.{AnalysisException, Row, SparkSession} +import org.apache.spark.sql.catalyst.analysis._ +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.types.StructField + +/** + * This class provides arguments and body expression of the macro function. + */ +case class MacroFunctionWrapper(columns: Seq[StructField], macroFunction: Expression) + +/** + * The DDL command that creates a macro. + * To create a temporary macro, the syntax of using this command in SQL is: + * {{{ + *CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) expression; + * }}} + */ +case class CreateMacroCommand( +macroName: String, +funcWrapper: MacroFunctionWrapper) + extends RunnableCommand { + + override def run(sparkSession: SparkSession): Seq[Row] = { +val catalog = sparkSession.sessionState.catalog +val columns = funcWrapper.columns.map { col => --- End diff -- It might easier to use `StructType().toAttributes` here --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13706#discussion_r118844209 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/macros.scala --- @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.command + +import org.apache.spark.sql.{AnalysisException, Row, SparkSession} +import org.apache.spark.sql.catalyst.analysis._ +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.types.StructField + +/** + * This class provides arguments and body expression of the macro function. + */ +case class MacroFunctionWrapper(columns: Seq[StructField], macroFunction: Expression) --- End diff -- Why do we need this? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18079: [SPARK-20841][SQL] Support table column aliases i...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18079#discussion_r118842688 --- Diff: sql/core/src/test/resources/sql-tests/inputs/aliases.sql --- @@ -0,0 +1,17 @@ +-- Test data. --- End diff -- How about we name this file `table-aliases.sql`; that seems a little bit less confusing. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18079: [SPARK-20841][SQL] Support table column aliases i...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18079#discussion_r118842345 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -676,9 +676,12 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging * Create an aliased table reference. This is typically used in FROM clauses. */ override def visitTableName(ctx: TableNameContext): LogicalPlan = withOrigin(ctx) { -val table = UnresolvedRelation(visitTableIdentifier(ctx.tableIdentifier)) - -val tableWithAlias = Option(ctx.strictIdentifier).map(_.getText) match { +val tableId = visitTableIdentifier(ctx.tableIdentifier) +val table = Option(ctx.tableAlias.identifierList) match { --- End diff -- ...or something like this: ```scala val outputNames = Option(ctx.tableAlias.identifierList).map(visitIdentifierList).getOrElse(Nil) val table = UnresolvedRelation(visitTableIdentifier(ctx.tableIdentifier), outputNames) ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18079: [SPARK-20841][SQL] Support table column aliases i...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18079#discussion_r118842292 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -676,9 +676,12 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging * Create an aliased table reference. This is typically used in FROM clauses. */ override def visitTableName(ctx: TableNameContext): LogicalPlan = withOrigin(ctx) { -val table = UnresolvedRelation(visitTableIdentifier(ctx.tableIdentifier)) - -val tableWithAlias = Option(ctx.strictIdentifier).map(_.getText) match { +val tableId = visitTableIdentifier(ctx.tableIdentifier) +val table = Option(ctx.tableAlias.identifierList) match { --- End diff -- Just if/else? Seems a bit heavy weight to wrap in an option... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18079: [SPARK-20841][SQL] Support table column aliases i...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18079#discussion_r118842186 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -711,7 +711,7 @@ nonReserved | ADD | OVER | PARTITION | RANGE | ROWS | PRECEDING | FOLLOWING | CURRENT | ROW | LAST | FIRST | AFTER | MAP | ARRAY | STRUCT -| LATERAL | WINDOW | REDUCE | TRANSFORM | USING | SERDE | SERDEPROPERTIES | RECORDREADER --- End diff -- I have added as much to the non-reserved keyword list as possible (without creating ambiguities). The reason for this is that many datasources (for instance twitter4j) unfortunately use reserved keywords for column names, and working with these was quite cumbersome. I took the pragmatic approach. If we want to change this, then we need to do the same Hive did and create a config flag. We remove them for Spark 3.0... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18072: [SPARK-20857][SQL] Generic resolved hint node
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/18072 LGTM - pending jenkins --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18069: [SPARK-20850][SQL]Improve division and multiplication mi...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/18069 I don't think this makes any sense: - The parser is not the place where you should make this change. This should be done in `TypeCoercion`. Now we create different semantics between SQL and the DataFrame API. - This is a breaking change to result type of `Divide`, and will probably break Hive compatibility. This is going break to user code in a lot of places. I also think it is weird that we are going to always return `Double` when a `Decimal` is completely valid. - I think MySql returns a `Decimal` and not a `Double`. So I am going to -1 this. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117813681 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1188,6 +1188,12 @@ class Dataset[T] private[sql]( def col(colName: String): Column = colName match { case "*" => Column(ResolvedStar(queryExecution.analyzed.output)) +case ParserUtils.escapedIdentifier(columnNameRegex) --- End diff -- I don't think sql a-like syntax is really useful here. How about we create a special cased `col` function that takes a regex? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18057: [SPARK-20786][SQL][Backport-2.2]Improve ceil and floor h...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/18057 ok to test --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18054: [SPARK-20763][SQL][Backport-2.1] The function of `month`...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/18054 ok to test --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18005: [SPARK-20773][SQL] ParquetWriteSupport.writeFields is qu...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/18005 LGTM - merging to master/2.2. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18030: [SPARK-20798] GenerateUnsafeProjection should check if a...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/18030 LGTM - merging to master. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117367232 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala --- @@ -84,6 +84,33 @@ case class UnresolvedTableValuedFunction( } /** + * Represents all of the input attributes to a given relational operator, for example in + * "SELECT `(id)?+.+` FROM ...". + * + * @param table an optional table that should be the target of the expansion. If omitted all + * tables' columns are produced. + */ +case class UnresolvedRegex(expr: String, table: Option[String]) extends Star with Unevaluable { + override def expand(input: LogicalPlan, resolver: Resolver): Seq[NamedExpression] = { +val expandedAttributes: Seq[Attribute] = table match { + // If there is no table specified, use all input attributes that match expr + case None => input.output.filter(_.name.matches(expr)) + // If there is a table, pick out attributes that are part of this table that match expr + case Some(t) => input.output.filter(_.qualifier.filter(resolver(_, t)).nonEmpty) --- End diff -- `input.output.filter(_.qualifier.exists(resolver(_, t)))` is a bit more concise. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117379878 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1230,24 +1230,49 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Create a dereference expression. The return type depends on the type of the parent, this can - * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an - * [[UnresolvedExtractValue]] if the parent is some expression. + * Create a dereference expression. The return type depends on the type of the parent. + * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or + * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression, + * it can be [[UnresolvedExtractValue]]. */ override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) { val attr = ctx.fieldName.getText expression(ctx.base) match { - case UnresolvedAttribute(nameParts) => + case unresolved_attr @ UnresolvedAttribute(nameParts) => +if (conf.supportQuotedIdentifiers) { + val escapedIdentifier = "`(.+)`".r + val ret = Option(ctx.fieldName.getStart).map(_.getText match { +case r@escapedIdentifier(i) => + UnresolvedRegex(i, Some(unresolved_attr.name)) +case _ => + UnresolvedAttribute(nameParts :+ attr) + }) + return ret.get +} + UnresolvedAttribute(nameParts :+ attr) case e => UnresolvedExtractValue(e, Literal(attr)) } } /** - * Create an [[UnresolvedAttribute]] expression. + * Create an [[UnresolvedAttribute]] expression or a [[UnresolvedRegex]] if it is a regex + * quoted in `` */ override def visitColumnReference(ctx: ColumnReferenceContext): Expression = withOrigin(ctx) { +if (conf.supportQuotedIdentifiers) { + val escapedIdentifier = "`(.+)`".r --- End diff -- We don't need to compile the same regex over and over. Can you move this to the ParserUtils... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117367155 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala --- @@ -84,6 +84,33 @@ case class UnresolvedTableValuedFunction( } /** + * Represents all of the input attributes to a given relational operator, for example in + * "SELECT `(id)?+.+` FROM ...". + * + * @param table an optional table that should be the target of the expansion. If omitted all + * tables' columns are produced. + */ +case class UnresolvedRegex(expr: String, table: Option[String]) extends Star with Unevaluable { --- End diff -- `expr` is the pattern right? Maybe we should give it a better name. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117380037 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1230,24 +1230,49 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Create a dereference expression. The return type depends on the type of the parent, this can - * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an - * [[UnresolvedExtractValue]] if the parent is some expression. + * Create a dereference expression. The return type depends on the type of the parent. + * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or + * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression, + * it can be [[UnresolvedExtractValue]]. */ override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) { val attr = ctx.fieldName.getText expression(ctx.base) match { - case UnresolvedAttribute(nameParts) => + case unresolved_attr @ UnresolvedAttribute(nameParts) => +if (conf.supportQuotedIdentifiers) { + val escapedIdentifier = "`(.+)`".r + val ret = Option(ctx.fieldName.getStart).map(_.getText match { --- End diff -- Using an option here does not add a thing. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117366828 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala --- @@ -84,6 +84,33 @@ case class UnresolvedTableValuedFunction( } /** + * Represents all of the input attributes to a given relational operator, for example in + * "SELECT `(id)?+.+` FROM ...". + * + * @param table an optional table that should be the target of the expansion. If omitted all + * tables' columns are produced. + */ +case class UnresolvedRegex(expr: String, table: Option[String]) extends Star with Unevaluable { + override def expand(input: LogicalPlan, resolver: Resolver): Seq[NamedExpression] = { +val expandedAttributes: Seq[Attribute] = table match { + // If there is no table specified, use all input attributes that match expr + case None => input.output.filter(_.name.matches(expr)) + // If there is a table, pick out attributes that are part of this table that match expr + case Some(t) => input.output.filter(_.qualifier.filter(resolver(_, t)).nonEmpty) +.filter(_.name.matches(expr)) +} + +expandedAttributes.zip(input.output).map { --- End diff -- An `Attribute` is always a `NamedExpression`, why do we need this? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117368022 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1230,24 +1230,49 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Create a dereference expression. The return type depends on the type of the parent, this can - * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an - * [[UnresolvedExtractValue]] if the parent is some expression. + * Create a dereference expression. The return type depends on the type of the parent. + * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or + * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression, + * it can be [[UnresolvedExtractValue]]. */ override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) { val attr = ctx.fieldName.getText expression(ctx.base) match { - case UnresolvedAttribute(nameParts) => + case unresolved_attr @ UnresolvedAttribute(nameParts) => --- End diff -- Please use a guard, e.g.: `case unresolved_attr @ UnresolvedAttribute(nameParts) if conf.supportQuotedIdentifiers => `. That makes the logic down the line much simpler. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117380055 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1230,24 +1230,49 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Create a dereference expression. The return type depends on the type of the parent, this can - * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an - * [[UnresolvedExtractValue]] if the parent is some expression. + * Create a dereference expression. The return type depends on the type of the parent. + * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or + * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression, + * it can be [[UnresolvedExtractValue]]. */ override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) { val attr = ctx.fieldName.getText expression(ctx.base) match { - case UnresolvedAttribute(nameParts) => + case unresolved_attr @ UnresolvedAttribute(nameParts) => +if (conf.supportQuotedIdentifiers) { + val escapedIdentifier = "`(.+)`".r + val ret = Option(ctx.fieldName.getStart).map(_.getText match { +case r@escapedIdentifier(i) => + UnresolvedRegex(i, Some(unresolved_attr.name)) +case _ => + UnresolvedAttribute(nameParts :+ attr) + }) + return ret.get +} + UnresolvedAttribute(nameParts :+ attr) case e => UnresolvedExtractValue(e, Literal(attr)) } } /** - * Create an [[UnresolvedAttribute]] expression. + * Create an [[UnresolvedAttribute]] expression or a [[UnresolvedRegex]] if it is a regex + * quoted in `` */ override def visitColumnReference(ctx: ColumnReferenceContext): Expression = withOrigin(ctx) { +if (conf.supportQuotedIdentifiers) { + val escapedIdentifier = "`(.+)`".r + val ret = Option(ctx.getStart).map(_.getText match { --- End diff -- Using an option here does not add a thing. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117367722 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1230,24 +1230,49 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Create a dereference expression. The return type depends on the type of the parent, this can - * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an - * [[UnresolvedExtractValue]] if the parent is some expression. + * Create a dereference expression. The return type depends on the type of the parent. + * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or + * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression, + * it can be [[UnresolvedExtractValue]]. */ override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) { val attr = ctx.fieldName.getText expression(ctx.base) match { - case UnresolvedAttribute(nameParts) => + case unresolved_attr @ UnresolvedAttribute(nameParts) => +if (conf.supportQuotedIdentifiers) { + val escapedIdentifier = "`(.+)`".r --- End diff -- We don't need to compile the same regex over and over. Can you move this to the ParserUtils... I am also wondering if we shouldn't do the match in the parser it self. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18019: [SPARK-20748][SQL] Add built-in SQL function CH[A...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18019#discussion_r117289000 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -1268,6 +1268,59 @@ case class Ascii(child: Expression) extends UnaryExpression with ImplicitCastInp } /** + * Returns the ASCII character having the binary equivalent to n. + * If n is larger than 256 the result is equivalent to chr(n % 256) + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(expr) - Returns the ASCII character having the binary equivalent to `expr`. If n is larger than 256 the result is equivalent to chr(n % 256)", + extended = """ +Examples: + > SELECT _FUNC_(65); + A + """) +// scalastyle:on line.size.limit +case class Chr(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { + + override def dataType: DataType = StringType + override def inputTypes: Seq[DataType] = Seq(LongType) + + protected override def nullSafeEval(value: Any): Any = { +val longVal = value.asInstanceOf[Long] +val shortVal = if (longVal > 255) (longVal % 256).toShort else longVal.toShort +val stringVal = if (shortVal == 0) { + String.valueOf('\u') +} else if (shortVal < 0) { + "" +} else { + shortVal.toChar.toString +} +UTF8String.fromString(stringVal) + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +defineCodeGen(ctx, ev, lon => { + s""" +long longVal = (long)${lon}; +short shortVal; +if (longVal > 255) { + shortVal = (short)(longVal % 256); +} else { + (short)longVal; +} +if (shortVal == 0) { + ${ev.value} = String.valueOf('\u'); --- End diff -- You really should put this in a constant field. It does not make sense to create the same constant over and over... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18019: [SPARK-20748][SQL] Add built-in SQL function CH[A...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18019#discussion_r117378356 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -1268,6 +1268,59 @@ case class Ascii(child: Expression) extends UnaryExpression with ImplicitCastInp } /** + * Returns the ASCII character having the binary equivalent to n. + * If n is larger than 256 the result is equivalent to chr(n % 256) + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(expr) - Returns the ASCII character having the binary equivalent to `expr`. If n is larger than 256 the result is equivalent to chr(n % 256)", + extended = """ +Examples: + > SELECT _FUNC_(65); + A + """) +// scalastyle:on line.size.limit +case class Chr(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { + + override def dataType: DataType = StringType + override def inputTypes: Seq[DataType] = Seq(LongType) + + protected override def nullSafeEval(value: Any): Any = { +val longVal = value.asInstanceOf[Long] +val shortVal = if (longVal > 255) (longVal % 256).toShort else longVal.toShort +val stringVal = if (shortVal == 0) { + String.valueOf('\u') +} else if (shortVal < 0) { + "" +} else { + shortVal.toChar.toString +} +UTF8String.fromString(stringVal) + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +defineCodeGen(ctx, ev, lon => { + s""" +long longVal = (long)${lon}; +short shortVal; +if (longVal > 255) { --- End diff -- It might be easier to something like this: ```java if (lon < 0) { ${ev.value} = UTF8String.fromString(""); // This should be put in a constant } else if (($lon & 0xFF) == 0) { ${ev.value} = UTF8String.fromString("\u"); // This should be put in a variable } else { char c = (char)($lon & 0xFF); // This is quite slow, it might be better to create the bytes directly. ${ev.value} = UTF8String.fromString(String.valueOf(c)); } ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18019: [SPARK-20748][SQL] Add built-in SQL function CH[A...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18019#discussion_r117378811 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala --- @@ -117,6 +117,41 @@ class StringFunctionsSuite extends QueryTest with SharedSQLContext { Row(97, 0)) } + test("string ch[a]r function") { --- End diff -- Please move this to catalyst. Look at the `StringExpressionsSuite`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18019: [SPARK-20748][SQL] Add built-in SQL function CH[A...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18019#discussion_r117375988 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -1268,6 +1268,59 @@ case class Ascii(child: Expression) extends UnaryExpression with ImplicitCastInp } /** + * Returns the ASCII character having the binary equivalent to n. + * If n is larger than 256 the result is equivalent to chr(n % 256) + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(expr) - Returns the ASCII character having the binary equivalent to `expr`. If n is larger than 256 the result is equivalent to chr(n % 256)", + extended = """ +Examples: + > SELECT _FUNC_(65); + A + """) +// scalastyle:on line.size.limit +case class Chr(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { + + override def dataType: DataType = StringType + override def inputTypes: Seq[DataType] = Seq(LongType) + + protected override def nullSafeEval(value: Any): Any = { +val longVal = value.asInstanceOf[Long] +val shortVal = if (longVal > 255) (longVal % 256).toShort else longVal.toShort +val stringVal = if (shortVal == 0) { + String.valueOf('\u') +} else if (shortVal < 0) { + "" +} else { + shortVal.toChar.toString +} +UTF8String.fromString(stringVal) + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +defineCodeGen(ctx, ev, lon => { + s""" +long longVal = (long)${lon}; +short shortVal; +if (longVal > 255) { + shortVal = (short)(longVal % 256); +} else { + (short)longVal; +} +if (shortVal == 0) { + ${ev.value} = String.valueOf('\u'); +} else if (shortVal < 0) { + ${ev.value} = ""; +} else { + ${ev.value} = (short)shortVal; --- End diff -- You cannot assign a `short` to a `UTF8String` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18019: [SPARK-20748][SQL] Add built-in SQL function CH[A...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18019#discussion_r117378749 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2138,6 +2138,40 @@ object functions { def ascii(e: Column): Column = withExpr { Ascii(e.expr) } /** + * Returns the ASCII character having the binary equivalent to expr. --- End diff -- let's not do this just yet; adding it to the function registry is enough. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18019: [SPARK-20748][SQL] Add built-in SQL function CH[A...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18019#discussion_r117250411 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -1268,6 +1268,59 @@ case class Ascii(child: Expression) extends UnaryExpression with ImplicitCastInp } /** + * Returns the ASCII character having the binary equivalent to n. + * If n is larger than 256 the result is equivalent to chr(n % 256) + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(expr) - Returns the ASCII character having the binary equivalent to `expr`. If n is larger than 256 the result is equivalent to chr(n % 256)", + extended = """ +Examples: + > SELECT _FUNC_(65); + A + """) +// scalastyle:on line.size.limit +case class Chr(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { + + override def dataType: DataType = StringType + override def inputTypes: Seq[DataType] = Seq(LongType) + + protected override def nullSafeEval(value: Any): Any = { +val longVal = value.asInstanceOf[Long] +val shortVal = if (longVal > 255) (longVal % 256).toShort else longVal.toShort +val stringVal = if (shortVal == 0) { + String.valueOf('\u') +} else if (shortVal < 0) { + "" +} else { + shortVal.toChar.toString +} +UTF8String.fromString(stringVal) + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { --- End diff -- I am 100% this is not being tested. Can you check how other expression test the code they generate? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18019: [SPARK-20748][SQL] Add built-in SQL function CH[A...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18019#discussion_r117376928 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -1268,6 +1268,59 @@ case class Ascii(child: Expression) extends UnaryExpression with ImplicitCastInp } /** + * Returns the ASCII character having the binary equivalent to n. + * If n is larger than 256 the result is equivalent to chr(n % 256) + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(expr) - Returns the ASCII character having the binary equivalent to `expr`. If n is larger than 256 the result is equivalent to chr(n % 256)", + extended = """ +Examples: + > SELECT _FUNC_(65); + A + """) +// scalastyle:on line.size.limit +case class Chr(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { + + override def dataType: DataType = StringType + override def inputTypes: Seq[DataType] = Seq(LongType) + + protected override def nullSafeEval(value: Any): Any = { +val longVal = value.asInstanceOf[Long] +val shortVal = if (longVal > 255) (longVal % 256).toShort else longVal.toShort +val stringVal = if (shortVal == 0) { + String.valueOf('\u') +} else if (shortVal < 0) { + "" +} else { + shortVal.toChar.toString +} +UTF8String.fromString(stringVal) + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +defineCodeGen(ctx, ev, lon => { --- End diff -- `defineCodeGen` does not work here. This is a statement instead of an expression. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18019: [SPARK-20748][SQL] Add built-in SQL function CH[A...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18019#discussion_r117375494 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -1268,6 +1268,59 @@ case class Ascii(child: Expression) extends UnaryExpression with ImplicitCastInp } /** + * Returns the ASCII character having the binary equivalent to n. + * If n is larger than 256 the result is equivalent to chr(n % 256) + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(expr) - Returns the ASCII character having the binary equivalent to `expr`. If n is larger than 256 the result is equivalent to chr(n % 256)", + extended = """ +Examples: + > SELECT _FUNC_(65); + A + """) +// scalastyle:on line.size.limit +case class Chr(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { + + override def dataType: DataType = StringType + override def inputTypes: Seq[DataType] = Seq(LongType) + + protected override def nullSafeEval(value: Any): Any = { +val longVal = value.asInstanceOf[Long] +val shortVal = if (longVal > 255) (longVal % 256).toShort else longVal.toShort +val stringVal = if (shortVal == 0) { + String.valueOf('\u') +} else if (shortVal < 0) { + "" +} else { + shortVal.toChar.toString +} +UTF8String.fromString(stringVal) + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +defineCodeGen(ctx, ev, lon => { + s""" +long longVal = (long)${lon}; +short shortVal; +if (longVal > 255) { + shortVal = (short)(longVal % 256); +} else { + (short)longVal; +} +if (shortVal == 0) { + ${ev.value} = String.valueOf('\u'); +} else if (shortVal < 0) { + ${ev.value} = ""; --- End diff -- You cannot assign a `String` to a `UTF8String`. Also make this a constant. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18019: [SPARK-20748][SQL] Add built-in SQL function CH[A...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18019#discussion_r117376367 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -1268,6 +1268,59 @@ case class Ascii(child: Expression) extends UnaryExpression with ImplicitCastInp } /** + * Returns the ASCII character having the binary equivalent to n. + * If n is larger than 256 the result is equivalent to chr(n % 256) + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(expr) - Returns the ASCII character having the binary equivalent to `expr`. If n is larger than 256 the result is equivalent to chr(n % 256)", + extended = """ +Examples: + > SELECT _FUNC_(65); + A + """) +// scalastyle:on line.size.limit +case class Chr(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { + + override def dataType: DataType = StringType + override def inputTypes: Seq[DataType] = Seq(LongType) + + protected override def nullSafeEval(value: Any): Any = { +val longVal = value.asInstanceOf[Long] +val shortVal = if (longVal > 255) (longVal % 256).toShort else longVal.toShort +val stringVal = if (shortVal == 0) { + String.valueOf('\u') +} else if (shortVal < 0) { + "" +} else { + shortVal.toChar.toString +} +UTF8String.fromString(stringVal) + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +defineCodeGen(ctx, ev, lon => { + s""" +long longVal = (long)${lon}; --- End diff -- Cast not needed :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18019: [SPARK-20748][SQL] Add built-in SQL function CH[A...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18019#discussion_r117250008 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -1268,6 +1268,59 @@ case class Ascii(child: Expression) extends UnaryExpression with ImplicitCastInp } /** + * Returns the ASCII character having the binary equivalent to n. + * If n is larger than 256 the result is equivalent to chr(n % 256) + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(expr) - Returns the ASCII character having the binary equivalent to `expr`. If n is larger than 256 the result is equivalent to chr(n % 256)", + extended = """ +Examples: + > SELECT _FUNC_(65); + A + """) +// scalastyle:on line.size.limit +case class Chr(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { + + override def dataType: DataType = StringType + override def inputTypes: Seq[DataType] = Seq(LongType) + + protected override def nullSafeEval(value: Any): Any = { +val longVal = value.asInstanceOf[Long] +val shortVal = if (longVal > 255) (longVal % 256).toShort else longVal.toShort +val stringVal = if (shortVal == 0) { + String.valueOf('\u') +} else if (shortVal < 0) { + "" +} else { + shortVal.toChar.toString +} +UTF8String.fromString(stringVal) + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +defineCodeGen(ctx, ev, lon => { + s""" +long longVal = (long)${lon}; +short shortVal; +if (longVal > 255) { --- End diff -- What about negative values? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18030: [SPARK-20798] GenerateUnsafeProjection should check if a...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/18030 ok to test --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18030: [SPARK-20798] GenerateUnsafeProjection should check if a...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/18030 LGTM - pending jenkins --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18012: [SPARK-20779][Examples]The ASF header placed in an incor...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/18012 ok to test --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18012: [SPARK-20779][Examples]The ASF header placed in an incor...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/18012 (let's see if jenkins picks this up) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18019: [SPARK-20748][SQL] Add built-in SQL function CH[A...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18019#discussion_r117076510 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -1268,6 +1268,59 @@ case class Ascii(child: Expression) extends UnaryExpression with ImplicitCastInp } /** + * Returns the ASCII character having the binary equivalent to n. + * If n is larger than 256 the result is equivalent to chr(n % 256) + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(n) - Returns the ASCII character having the binary equivalent to `n`. If n is larger than 256 the result is equivalent to chr(n % 256)", + extended = """ +Examples: + > SELECT _FUNC_(65); + A + """) +// scalastyle:on line.size.limit +case class Chr(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { + + override def dataType: DataType = StringType + override def inputTypes: Seq[DataType] = Seq(LongType) + + protected override def nullSafeEval(value: Any): Any = { +val longVal = value.asInstanceOf[Long] +val shortVal = if (longVal > 255) (longVal % 256).toShort else longVal.toShort +val stringVal = if (shortVal == 0) { + String.valueOf('\u') +} else if (shortVal < 0) { + "" +} else { + shortVal.toChar.toString +} +UTF8String.fromString(stringVal) + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +defineCodeGen(ctx, ev, str => { + s""" +long longVal = ${java.lang.Long.parseLong(new String(str.getBytes))}; --- End diff -- Isn't the input supposed to be a long? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18005: [SPARK-20773][SQL] ParquetWriteSupport.writeFields is qu...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/18005 What do you mean by catalyst blew up? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18005: [SPARK-20773][SQL] ParquetWriteSupport.writeFields is qu...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/18005 LGTM pending jenkins --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18005: [SPARK-20773][SQL] ParquetWriteSupport.writeField...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18005#discussion_r117029865 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetWriteSupport.scala --- @@ -90,7 +90,7 @@ private[parquet] class ParquetWriteSupport extends WriteSupport[InternalRow] wit } -this.rootFieldWriters = schema.map(_.dataType).map(makeWriter) +this.rootFieldWriters = schema.map(_.dataType).map(makeWriter).toArray[ValueWriter] --- End diff -- Either call toIndexedSeq or make the rootFieldWriters an Array. Both are fine. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18005: [SPARK-20773][SQL] ParquetWriteSupport.writeField...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18005#discussion_r117029723 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetWriteSupport.scala --- @@ -90,7 +90,7 @@ private[parquet] class ParquetWriteSupport extends WriteSupport[InternalRow] wit } -this.rootFieldWriters = schema.map(_.dataType).map(makeWriter).toArray +this.rootFieldWriters = schema.map(_.dataType).map(makeWriter).toArray[ValueWriter] --- End diff -- Either call `toIndexedSeq` or make the `rootFieldWriters` an Array. Both are fine. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil handle the value which is...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/18016 +1 on Sean's comment. This is **not** a parser issue. Can you just fix this by adding `LongType` to `Ceil.inputTypes`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18005: [SPARK-20773][SQL] ParquetWriteSupport.writeFields is qu...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/18005 Can you also make sure that we do not use a `Seq` for struct writing? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18005: [SPARK-20773][SQL] ParquetWriteSupport.writeFields is qu...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/18005 ok to test --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17935: [SPARK-20690][SQL] Subqueries in FROM should have...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/17935#discussion_r116851595 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -473,7 +473,7 @@ identifierComment relationPrimary : tableIdentifier sample? (AS? strictIdentifier)? #tableName -| '(' queryNoWith ')' sample? (AS? strictIdentifier)? #aliasedQuery +| '(' queryNoWith ')' sample? (AS? strictIdentifier) #aliasedQuery --- End diff -- Yay, unclear semantics. Ok, that is fine for now. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18006: [SQL][TRIVIAL] Lower parser log level to debug
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/18006 [SQL][TRIVIAL] Lower parser log level to debug ## What changes were proposed in this pull request? Currently the parser logs the query it is parsing at `info` level. This is too high, this PR lowers the log level to `debug`. ## How was this patch tested? Existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/hvanhovell/spark lower_parser_log_level Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18006.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18006 commit 58ef75503f66aed7fb347a78277d6ca6b45942e8 Author: Herman van Hovell <hvanhov...@databricks.com> Date: 2017-05-16T20:10:00Z Lower parser log level to debug --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/17993#discussion_r116794140 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -54,6 +54,59 @@ object ConstantFolding extends Rule[LogicalPlan] { } } +/** + * Substitutes [[Attribute Attributes]] which can be statically evaluated with their corresponding + * value in conjunctive [[Expression Expressions]] + * eg. + * {{{ + * SELECT * FROM table WHERE i = 5 AND j = i + 3 + * ==> SELECT * FROM table WHERE i = 5 AND j = 8 + * }}} + */ +object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper { + + def containsNonConjunctionPredicates(expression: Expression): Boolean = expression match { +case Not(_) => true +case Or(_, _) => true +case _ => + var result = false + expression.children.foreach { +case Not(_) => result = true +case Or(_, _) => result = true +case other => result = result || containsNonConjunctionPredicates(other) + } + result + } + + def apply(plan: LogicalPlan): LogicalPlan = plan transform { +case q: LogicalPlan => q transformExpressionsUp { + case and @ (left And right) +if !containsNonConjunctionPredicates(left) && !containsNonConjunctionPredicates(right) => + +val leftEntries = left.collect { + case e @ EqualTo(left: AttributeReference, right: Literal) => ((left, right), e) + case e @ EqualTo(left: Literal, right: AttributeReference) => ((right, left), e) +} +val rightEntries = right.collect { + case e @ EqualTo(left: AttributeReference, right: Literal) => ((left, right), e) + case e @ EqualTo(left: Literal, right: AttributeReference) => ((right, left), e) +} +val constantsMap = AttributeMap(leftEntries.map(_._1) ++ rightEntries.map(_._1)) +val predicates = (leftEntries.map(_._2) ++ rightEntries.map(_._2)).toSet + +def replaceConstants(expression: Expression) = expression transform { + case a: AttributeReference if constantsMap.contains(a) => +constantsMap.get(a).getOrElse(a) +} + +and transform { + case e @ EqualTo(_, _) if !predicates.contains(e) && +e.references.exists(ref => constantsMap.contains(ref)) => --- End diff -- Building the `references` map is more expensive, shall we just skip this? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/17993#discussion_r116793889 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -54,6 +54,59 @@ object ConstantFolding extends Rule[LogicalPlan] { } } +/** + * Substitutes [[Attribute Attributes]] which can be statically evaluated with their corresponding + * value in conjunctive [[Expression Expressions]] + * eg. + * {{{ + * SELECT * FROM table WHERE i = 5 AND j = i + 3 + * ==> SELECT * FROM table WHERE i = 5 AND j = 8 + * }}} + */ +object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper { + + def containsNonConjunctionPredicates(expression: Expression): Boolean = expression match { +case Not(_) => true +case Or(_, _) => true +case _ => + var result = false + expression.children.foreach { +case Not(_) => result = true +case Or(_, _) => result = true +case other => result = result || containsNonConjunctionPredicates(other) + } + result + } + + def apply(plan: LogicalPlan): LogicalPlan = plan transform { +case q: LogicalPlan => q transformExpressionsUp { + case and @ (left And right) +if !containsNonConjunctionPredicates(left) && !containsNonConjunctionPredicates(right) => + +val leftEntries = left.collect { + case e @ EqualTo(left: AttributeReference, right: Literal) => ((left, right), e) + case e @ EqualTo(left: Literal, right: AttributeReference) => ((right, left), e) +} +val rightEntries = right.collect { + case e @ EqualTo(left: AttributeReference, right: Literal) => ((left, right), e) + case e @ EqualTo(left: Literal, right: AttributeReference) => ((right, left), e) +} +val constantsMap = AttributeMap(leftEntries.map(_._1) ++ rightEntries.map(_._1)) +val predicates = (leftEntries.map(_._2) ++ rightEntries.map(_._2)).toSet + +def replaceConstants(expression: Expression) = expression transform { + case a: AttributeReference if constantsMap.contains(a) => --- End diff -- What happens if I do something stupid like `i = 1 and ((j = 1) = (j = i))`? I think `j = 1` might replaced by `1 = 1`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/17993#discussion_r116792863 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -54,6 +54,59 @@ object ConstantFolding extends Rule[LogicalPlan] { } } +/** + * Substitutes [[Attribute Attributes]] which can be statically evaluated with their corresponding + * value in conjunctive [[Expression Expressions]] + * eg. + * {{{ + * SELECT * FROM table WHERE i = 5 AND j = i + 3 + * ==> SELECT * FROM table WHERE i = 5 AND j = 8 + * }}} + */ +object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper { + + def containsNonConjunctionPredicates(expression: Expression): Boolean = expression match { +case Not(_) => true +case Or(_, _) => true +case _ => + var result = false + expression.children.foreach { +case Not(_) => result = true +case Or(_, _) => result = true +case other => result = result || containsNonConjunctionPredicates(other) + } + result + } + + def apply(plan: LogicalPlan): LogicalPlan = plan transform { +case q: LogicalPlan => q transformExpressionsUp { + case and @ (left And right) +if !containsNonConjunctionPredicates(left) && !containsNonConjunctionPredicates(right) => + +val leftEntries = left.collect { + case e @ EqualTo(left: AttributeReference, right: Literal) => ((left, right), e) + case e @ EqualTo(left: Literal, right: AttributeReference) => ((right, left), e) +} +val rightEntries = right.collect { + case e @ EqualTo(left: AttributeReference, right: Literal) => ((left, right), e) + case e @ EqualTo(left: Literal, right: AttributeReference) => ((right, left), e) +} +val constantsMap = AttributeMap(leftEntries.map(_._1) ++ rightEntries.map(_._1)) +val predicates = (leftEntries.map(_._2) ++ rightEntries.map(_._2)).toSet + +def replaceConstants(expression: Expression) = expression transform { + case a: AttributeReference if constantsMap.contains(a) => --- End diff -- I don't think the double lookup is necessary. `constantsMap.get(a).getOrElse(a)` should cover this. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/17993#discussion_r116792029 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -54,6 +54,59 @@ object ConstantFolding extends Rule[LogicalPlan] { } } +/** + * Substitutes [[Attribute Attributes]] which can be statically evaluated with their corresponding + * value in conjunctive [[Expression Expressions]] + * eg. + * {{{ + * SELECT * FROM table WHERE i = 5 AND j = i + 3 + * ==> SELECT * FROM table WHERE i = 5 AND j = 8 + * }}} + */ +object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper { + + def containsNonConjunctionPredicates(expression: Expression): Boolean = expression match { +case Not(_) => true +case Or(_, _) => true +case _ => + var result = false + expression.children.foreach { +case Not(_) => result = true +case Or(_, _) => result = true +case other => result = result || containsNonConjunctionPredicates(other) + } + result + } + + def apply(plan: LogicalPlan): LogicalPlan = plan transform { +case q: LogicalPlan => q transformExpressionsUp { + case and @ (left And right) +if !containsNonConjunctionPredicates(left) && !containsNonConjunctionPredicates(right) => + +val leftEntries = left.collect { --- End diff -- Lets put the collect in a function, so we can avoid the repetition. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/17993#discussion_r116790844 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -54,6 +54,59 @@ object ConstantFolding extends Rule[LogicalPlan] { } } +/** + * Substitutes [[Attribute Attributes]] which can be statically evaluated with their corresponding + * value in conjunctive [[Expression Expressions]] + * eg. + * {{{ + * SELECT * FROM table WHERE i = 5 AND j = i + 3 + * ==> SELECT * FROM table WHERE i = 5 AND j = 8 + * }}} + */ +object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper { + + def containsNonConjunctionPredicates(expression: Expression): Boolean = expression match { +case Not(_) => true +case Or(_, _) => true +case _ => + var result = false + expression.children.foreach { +case Not(_) => result = true +case Or(_, _) => result = true +case other => result = result || containsNonConjunctionPredicates(other) + } + result + } + + def apply(plan: LogicalPlan): LogicalPlan = plan transform { +case q: LogicalPlan => q transformExpressionsUp { + case and @ (left And right) --- End diff -- `case and: And if containsNonConjunctionPredicates(and)`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/17993#discussion_r116790653 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala --- @@ -54,6 +54,59 @@ object ConstantFolding extends Rule[LogicalPlan] { } } +/** + * Substitutes [[Attribute Attributes]] which can be statically evaluated with their corresponding + * value in conjunctive [[Expression Expressions]] + * eg. + * {{{ + * SELECT * FROM table WHERE i = 5 AND j = i + 3 + * ==> SELECT * FROM table WHERE i = 5 AND j = 8 + * }}} + */ +object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper { + + def containsNonConjunctionPredicates(expression: Expression): Boolean = expression match { --- End diff -- This might be more straightfoward: ```scala expression.find { case _: Not | _: Or => true }.isDefined ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17998: [SPARK-20703][SQL][WIP] Add an operator for writing data...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/17998 @shaneknapp is amplap jenkins down? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17935: [SPARK-20690][SQL] Subqueries in FROM should have...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/17935#discussion_r116756100 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -473,7 +473,7 @@ identifierComment relationPrimary : tableIdentifier sample? (AS? strictIdentifier)? #tableName -| '(' queryNoWith ')' sample? (AS? strictIdentifier)? #aliasedQuery +| '(' queryNoWith ')' sample? (AS? strictIdentifier) #aliasedQuery --- End diff -- Should we also force an alias for the next line? `'(' relation ')' sample? (AS? strictIdentifier)? ` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17899: [SPARK-20636] Add new optimization rule to flip a...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/17899#discussion_r116719332 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/TransposeWindowSuite.scala --- @@ -0,0 +1,101 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.optimizer + +import org.apache.spark.sql.catalyst.dsl.expressions._ +import org.apache.spark.sql.catalyst.dsl.plans._ +import org.apache.spark.sql.catalyst.expressions.{RowFrame, SpecifiedWindowFrame, UnboundedFollowing, UnboundedPreceding, UnspecifiedFrame} +import org.apache.spark.sql.catalyst.plans.PlanTest +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan} +import org.apache.spark.sql.catalyst.rules.RuleExecutor + + +class TransposeWindowSuite extends PlanTest { + object Optimize extends RuleExecutor[LogicalPlan] { +val batches = + Batch("CollapseProject", FixedPoint(100), CollapseProject, RemoveRedundantProject) :: + Batch("FlipWindow", Once, CollapseWindow, TransposeWindow) :: Nil + } + + val testRelation = LocalRelation('a.string, 'b.string, 'c.int, 'd.string) + + val a = testRelation.output(0) + val b = testRelation.output(1) + val c = testRelation.output(2) + val d = testRelation.output(3) + + val partitionSpec1 = Seq(a) + val partitionSpec2 = Seq(a, b) + val partitionSpec3 = Seq(d) + + val orderSpec1 = Seq(d.asc) + val orderSpec2 = Seq(d.desc) + + test("flip two adjacent windows with compatible partitions in multiple selects") { --- End diff -- I am not entirely sure if we need this test. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17899: [SPARK-20636] Add new optimization rule to flip a...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/17899#discussion_r116719142 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/TransposeWindowSuite.scala --- @@ -0,0 +1,101 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.optimizer + +import org.apache.spark.sql.catalyst.dsl.expressions._ +import org.apache.spark.sql.catalyst.dsl.plans._ +import org.apache.spark.sql.catalyst.expressions.{RowFrame, SpecifiedWindowFrame, UnboundedFollowing, UnboundedPreceding, UnspecifiedFrame} +import org.apache.spark.sql.catalyst.plans.PlanTest +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan} +import org.apache.spark.sql.catalyst.rules.RuleExecutor + + +class TransposeWindowSuite extends PlanTest { + object Optimize extends RuleExecutor[LogicalPlan] { +val batches = + Batch("CollapseProject", FixedPoint(100), CollapseProject, RemoveRedundantProject) :: + Batch("FlipWindow", Once, CollapseWindow, TransposeWindow) :: Nil + } + + val testRelation = LocalRelation('a.string, 'b.string, 'c.int, 'd.string) + + val a = testRelation.output(0) + val b = testRelation.output(1) + val c = testRelation.output(2) + val d = testRelation.output(3) + + val partitionSpec1 = Seq(a) + val partitionSpec2 = Seq(a, b) + val partitionSpec3 = Seq(d) + + val orderSpec1 = Seq(d.asc) + val orderSpec2 = Seq(d.desc) + + test("flip two adjacent windows with compatible partitions in multiple selects") { +val wexpr1 = windowExpr(sum('c), windowSpec(partitionSpec2, Seq.empty, UnspecifiedFrame)) +val wexpr2 = windowExpr(sum('c), windowSpec(partitionSpec1, Seq.empty, UnspecifiedFrame)) + +val query = testRelation + .select('a, 'b, 'c, wexpr1.as('sum_a_2)) + .select('a, 'b, 'c, 'sum_a_2, wexpr2.as('sum_a_1)) + +val optimized = Optimize.execute(query.analyze) + +val query2 = testRelation + .select('a, 'b, 'c) + .select('a, 'b, 'c, wexpr2.as('sum_a_1)) + .select('a, 'b, 'c, wexpr1.as('sum_a_2), 'sum_a_1) + .select('a, 'b, 'c, 'sum_a_2, 'sum_a_1) + +val correctAnswer = Optimize.execute(query2.analyze) + +comparePlans(optimized, correctAnswer) + } + + test("flip two adjacent windows with compatible partitions") { +val query = testRelation + .window(Seq(sum(c).as('sum_a_2)), partitionSpec2, orderSpec2) + .window(Seq(sum(c).as('sum_a_1)), partitionSpec1, orderSpec1) + +val analyzed = query.analyze +val optimized = Optimize.execute(analyzed) + +val correctAnswer = testRelation + .window(Seq(sum(c).as('sum_a_1)), partitionSpec1, orderSpec1) + .window(Seq(sum(c).as('sum_a_2)), partitionSpec2, orderSpec2) + .select('a, 'b, 'c, 'd, 'sum_a_2, 'sum_a_1) + +comparePlans(optimized, correctAnswer.analyze) + } + + test("don't flip two adjacent windows with incompatible partitions") { +val query = testRelation + .window(Seq(sum(c).as('sum_a_2)), partitionSpec3, Seq.empty) + .window(Seq(sum(c).as('sum_a_1)), partitionSpec1, Seq.empty) + +val analyzed = query.analyze +val optimized = Optimize.execute(analyzed) + +val correctAnswer = testRelation + .window(Seq(sum(c).as('sum_a_2)), partitionSpec3, Seq.empty) + .window(Seq(sum(c).as('sum_a_1)), partitionSpec1, Seq.empty) + +comparePlans(optimized, correctAnswer.analyze) --- End diff -- `comparePlans(optimized, analyzed)`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infras
[GitHub] spark pull request #17899: [SPARK-20636] Add new optimization rule to flip a...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/17899#discussion_r116718934 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala --- @@ -423,4 +423,25 @@ class DataFrameWindowFunctionsSuite extends QueryTest with SharedSQLContext { df.select(selectList: _*).where($"value" < 2), Seq(Row(3, "1", null, 3.0, 4.0, 3.0), Row(5, "1", false, 4.0, 5.0, 5.0))) } + + test("window functions in multiple selects") { --- End diff -- Why this test? It does not really add anything new. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17899: [SPARK-20636] Add new optimization rule to flip a...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/17899#discussion_r116718847 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -609,6 +610,19 @@ object CollapseWindow extends Rule[LogicalPlan] { } /** + * Transpose Adjacent Window Expressions. + * - If the partition spec of the parent Window expression is a subset of the partition spec + * of the child window expression, transpose them. + */ +object TransposeWindow extends Rule[LogicalPlan] { + def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { +case w1 @ Window(we1, ps1, os1, w2 @ Window(we2, ps2, os2, grandChild)) +if ps1.length < ps2.length && ps2.containsSlice(ps1) => --- End diff -- We might be able to get a little more milage out of the rule by using `semanticEquals` for comparing the partition expressions, e.g.: ```scala def sliceSemanticEquals(ps1: Seq[Expression], ps2: Seq[Expression]): Boolean = ps1.zip(ps2).forall { case (l, r) => l.semanticEquals(r) } ... sliceSemanticEquals(ps1, ps2) ``` You could even get more leverage if you do not consider the order of the partition spec. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17899: [SPARK-20636] Add new optimization rule to flip a...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/17899#discussion_r116696350 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -609,6 +610,19 @@ object CollapseWindow extends Rule[LogicalPlan] { } /** + * Transpose Adjacent Window Expressions. + * - If the partition spec of the parent Window expression is a subset of the partition spec + * of the child window expression, transpose them. + */ +object TransposeWindow extends Rule[LogicalPlan] { + def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { +case w1 @ Window(we1, ps1, os1, w2 @ Window(we2, ps2, os2, grandChild)) +if ps1.length < ps2.length && ps2.containsSlice(ps1) => --- End diff -- You need to check that the windows are independent, e.g.: `w1.references.intersect(w2.windowOutputSet).isEmpty` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17899: [SPARK-20636] Add new optimization rule to flip a...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/17899#discussion_r116693306 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -609,6 +610,19 @@ object CollapseWindow extends Rule[LogicalPlan] { } /** + * Transpose Adjacent Window Expressions. + * - If the partition spec of the parent Window expression is a subset of the partition spec + * of the child window expression, transpose them. + */ +object TransposeWindow extends Rule[LogicalPlan] { + def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { +case w1 @ Window(we1, ps1, os1, w2 @ Window(we2, ps2, os2, grandChild)) +if ps1.length < ps2.length && ps2.containsSlice(ps1) => + Project(w1.output, Window(we2, ps2, os2, Window(we1, ps1, os1, grandChild))) --- End diff -- This probably warrants a follow-up that tries to move projections that are wedged in between two window clauses. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17899: [SPARK-20636] Add new optimization rule to flip adjacent...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/17899 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/17993 it is weird that jenkins is not kicking off --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/17993 ok to test --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17935: [SPARK-20690][SQL] Analyzer shouldn't add missing...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/17935#discussion_r116561216 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala --- @@ -868,6 +868,29 @@ class SubquerySuite extends QueryTest with SharedSQLContext { Row(3, 3.0, 2, 3.0) :: Row(3, 3.0, 2, 3.0) :: Nil) } + test("SPARK-20690: Do not add missing attributes through subqueries") { +withTempView("onerow") { + Seq(1).toDF("c1").createOrReplaceTempView("onerow") + + val e = intercept[AnalysisException] { +sql( + """ +| select 1 +| from (select 1 from onerow t1 LIMIT 1) --- End diff -- Yeah, this seems confusing. Subqueries should be have an alias. Let's try to add that. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17964: [SPARK-20725][SQL] partial aggregate should behave corre...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/17964 @cloud-fan can you backport this to 2.1? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17964: [SPARK-20725][SQL] partial aggregate should behave corre...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/17964 LGTM - merging to master/2.2/2.1 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17964: [SPARK-20725][SQL] partial aggregate should behave corre...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/17964 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17960: [SPARK-20719] [SQL] Support LIMIT ALL
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/17960 LGTM - merging to master. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17964: [SPARK-20725][SQL] partial aggregate should behav...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/17964#discussion_r116306324 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala --- @@ -429,17 +429,13 @@ object QueryPlan { * with its referenced ordinal from input attributes. It's similar to `BindReferences` but we * do not use `BindReferences` here as the plan may take the expression as a parameter with type * `Attribute`, and replace it with `BoundReference` will cause error. + * Note that, we may have missing attributes, e.g. in the final aggregate of 2-phase aggregation, + * we should normalize missing attributes too, with expr id -1. */ def normalizeExprId[T <: Expression](e: T, input: AttributeSeq): T = { e.transformUp { case s: SubqueryExpression => s.canonicalize(input) - case ar: AttributeReference => -val ordinal = input.indexOf(ar.exprId) -if (ordinal == -1) { - ar -} else { - ar.withExprId(ExprId(ordinal)) -} + case ar: AttributeReference => ar.withExprId(ExprId(input.indexOf(ar.exprId))) --- End diff -- Ok, so this works because of the way we plan aggregates and I am totally fine with this. I am slightly worried about non-complete aggregate expression that cannot be resolved and wreak havok further down the line because `sameResult` falsely evaluated to `true`. Can we special case non-complete aggregate expressions? From an architectural point of view it might be better to add this as a normalize function to Expression. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17960: [SPARK-20719] [SQL] Support LIMIT ALL
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/17960#discussion_r116300475 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -279,7 +279,12 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { // LIMIT withWindow.optional(limit) { - Limit(typedVisit(limit), withWindow) + if (ALL != null) { +// LIMIT ALL is the same as omitting the LIMIT clause --- End diff -- Can this be true? I don't think you need to check `ALL` if `limit != null` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17953: [SPARK-20680][SQL] Spark-sql do not support for v...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/17953#discussion_r116138305 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1504,6 +1504,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { case ("decimal", precision :: Nil) => DecimalType(precision.getText.toInt, 0) case ("decimal", precision :: scale :: Nil) => DecimalType(precision.getText.toInt, scale.getText.toInt) + case ("void", Nil) => NullType --- End diff -- Apparently Hive can have null typed columns. So this should be the location where you'd want to change this. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17953: [SPARK-20680][SQL] Spark-sql do not support for void col...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/17953 @LantaoJin Can you add a description and a test case for this? You can take a look at the OrcSourceSuite to get an idea how to work with Hive. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17953: [SPARK-20680][SQL] Spark-sql do not support for void col...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/17953 ok to test --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17939: [SPARK-19447] Remove remaining references to generated r...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/17939 LGTM - merging to master/2.2. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17920: [SPARK-17685][SQL] Make SortMergeJoinExec's currentVars ...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/17920 LGTM - merging to master/2.2. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17921: [SPARK-19876][BUILD] Move Trigger.java to java source hi...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/17921 LGTM - merging to master/2.2 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17899: [SPARK-20636] Add new optimization rule to flip a...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/17899#discussion_r115370303 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -609,6 +610,19 @@ object CollapseWindow extends Rule[LogicalPlan] { } /** + * Flip Adjacent Window Expressions. + * - If the partition spec of the parent Window expression is a subset of the partition spec + * of the child window expression, flip them. + */ +object FlipWindow extends Rule[LogicalPlan] { + def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { +case w1 @ Window(we1, ps1, os1, w2 @ Window(we2, ps2, os2, grandChild)) +if ps2.containsSlice(ps1) => --- End diff -- You are also changing the order of the columns. You will need to add a projection on top to be sure. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17666: [SPARK-20311][SQL] Support aliases for table valu...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/17666#discussion_r115361494 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala --- @@ -57,19 +57,21 @@ object ResolveTableValuedFunctions extends Rule[LogicalPlan] { * A TVF maps argument lists to resolver functions that accept those arguments. Using a map * here allows for function overloading. */ - private type TVF = Map[ArgumentList, Seq[Any] => LogicalPlan] + private type TVF = Map[ArgumentList, (UnresolvedTableValuedFunction, Seq[Any]) => LogicalPlan] /** * TVF builder. */ - private def tvf(args: (String, DataType)*)(pf: PartialFunction[Seq[Any], LogicalPlan]) - : (ArgumentList, Seq[Any] => LogicalPlan) = { + private def tvf(args: (String, DataType)*)( + pf: PartialFunction[(UnresolvedTableValuedFunction, Seq[Any]), LogicalPlan]) --- End diff -- I also think that we should separate the aliasing from constructing the table valued function --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17666: [SPARK-20311][SQL] Support aliases for table valu...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/17666#discussion_r115361167 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala --- @@ -57,19 +57,21 @@ object ResolveTableValuedFunctions extends Rule[LogicalPlan] { * A TVF maps argument lists to resolver functions that accept those arguments. Using a map * here allows for function overloading. */ - private type TVF = Map[ArgumentList, Seq[Any] => LogicalPlan] + private type TVF = Map[ArgumentList, (UnresolvedTableValuedFunction, Seq[Any]) => LogicalPlan] /** * TVF builder. */ - private def tvf(args: (String, DataType)*)(pf: PartialFunction[Seq[Any], LogicalPlan]) - : (ArgumentList, Seq[Any] => LogicalPlan) = { + private def tvf(args: (String, DataType)*)( + pf: PartialFunction[(UnresolvedTableValuedFunction, Seq[Any]), LogicalPlan]) --- End diff -- Hmmm... this signature is kind of complex. Can we try to use some kind of class/case class that encapsulates this? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17666: [SPARK-20311][SQL] Support aliases for table valu...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/17666#discussion_r115358211 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala --- @@ -441,4 +440,15 @@ class AnalysisSuite extends AnalysisTest with ShouldMatchers { checkAnalysis(SubqueryAlias("tbl", testRelation).as("tbl2"), testRelation) } + + test("SPARK-20311 range(N) as alias") { +def rangeWithAliases(outputNames: Seq[String]): LogicalPlan = { + SubqueryAlias("t", UnresolvedTableValuedFunction("range", Literal(7) :: Nil, outputNames)) --- End diff -- Should we also test different cases? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #17666: [SPARK-20311][SQL] Support aliases for table valu...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/17666#discussion_r115357687 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -498,12 +498,16 @@ case class Sort( /** Factory for constructing new `Range` nodes. */ object Range { - def apply(start: Long, end: Long, step: Long, numSlices: Option[Int]): Range = { -val output = StructType(StructField("id", LongType, nullable = false) :: Nil).toAttributes + + def apply(start: Long, end: Long, step: Long, numSlices: Option[Int], outputName: Option[String]) --- End diff -- Why make this an option? It is not optional right? The next function should either call Range(start, end, step, Some(numSlices), "id") or this function should have a default parameter. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/17736 For some reference. In 1.6 we used the Catalyst SqlParser to parse the expression in `Dataframe.filter()`, and we used the Hive (ANTLR based) parser for parsing for SQL commands. In Spark 2.0 we moved all of this to a single parser. When porting the parser, I followed the rules in the Hive parser (incl. the unescaping logic), and this fell through the cracks. Java/scala normal strings make things mind meltingly confusing. I think it is fair that we provide an option to disable the parser's unescaping as a way to get out of this. This might not be the best solution if you use regexes in both pure SQL and in scala at the same time, but it at least is an improvement. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/17770 I am not a giant fan of the `resolveOperators*` approach, is is yet another code path that does something similar to the `transfrom*` code path, it introduces some mutable state, and I have been bitten by it before. I personally think that a barrier node (that is be a no-op for any rule) is a far more elegant solution. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17836: [SPARK-20566][SQL] ColumnVector should support `appendFl...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/17836 cc @michal-databricks --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17838: [SPARK-20567] Lazily bind in GenerateExec
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/17838 LGTM - merging to master/2.2 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17823: [SPARK-20548] Disable ReplSuite.newProductSeqEncoder wit...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/17823 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17784: [SPARK-20492][SQL] Do not print empty parentheses for in...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/17784 Yes, it can. Merging to master/2.2. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org