[GitHub] spark issue #13706: [SPARK-15988] [SQL] Implement DDL commands: Create/Drop ...

2017-05-28 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/13706
  
cc @gatorsmile could you take a look at the way this interacts with the 
session catalog and the function registry?



[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...

2017-05-28 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/13706#discussion_r118846647
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/macros.scala ---
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import org.apache.spark.sql.{AnalysisException, Row, SparkSession}
+import org.apache.spark.sql.catalyst.analysis._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.types.StructField
+
+/**
+ * This class provides arguments and body expression of the macro function.
+ */
+case class MacroFunctionWrapper(columns: Seq[StructField], macroFunction: 
Expression)
+
+/**
+ * The DDL command that creates a macro.
+ * To create a temporary macro, the syntax of using this command in SQL is:
+ * {{{
+ *CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) 
expression;
+ * }}}
+ */
+case class CreateMacroCommand(
+macroName: String,
+funcWrapper: MacroFunctionWrapper)
+  extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val catalog = sparkSession.sessionState.catalog
+val columns = funcWrapper.columns.map { col =>
+  AttributeReference(col.name, col.dataType, col.nullable, 
col.metadata)() }
+val colToIndex: Map[String, Int] = 
columns.map(_.name).zipWithIndex.toMap
+if (colToIndex.size != columns.size) {
+  throw new AnalysisException(s"Cannot support duplicate colNames " +
+s"for CREATE TEMPORARY MACRO $macroName, actual columns: 
${columns.mkString(",")}")
+}
+val macroFunction = funcWrapper.macroFunction.transform {
+  case u: UnresolvedAttribute =>
+val index = colToIndex.get(u.name).getOrElse(
+  throw new AnalysisException(s"Cannot find colName: ${u} " +
+s"for CREATE TEMPORARY MACRO $macroName, actual columns: 
${columns.mkString(",")}"))
+BoundReference(index, columns(index).dataType, 
columns(index).nullable)
+  case u: UnresolvedFunction =>
+sparkSession.sessionState.catalog.lookupFunction(u.name, 
u.children)
+  case s: SubqueryExpression =>
+throw new AnalysisException(s"Cannot support Subquery: ${s} " +
+  s"for CREATE TEMPORARY MACRO $macroName")
+  case u: UnresolvedGenerator =>
+throw new AnalysisException(s"Cannot support Generator: ${u} " +
+  s"for CREATE TEMPORARY MACRO $macroName")
+}
+
+val macroInfo = columns.mkString(",") + " -> " + 
funcWrapper.macroFunction.toString
+val info = new ExpressionInfo(macroInfo, macroName, true)
+val builder = (children: Seq[Expression]) => {
+  if (children.size != columns.size) {
+throw new AnalysisException(s"Actual number of columns: 
${children.size} != " +
+  s"expected number of columns: ${columns.size} for Macro 
$macroName")
+  }
+  macroFunction.transform {
+// Skip to validate the input type because check it at runtime.
--- End diff --

On a related note, we are currently not sure if the macro produces a valid 
expression. Maybe we should run analysis on the macro expression to make sure 
it does not fail every query later on, e.g.:
```scala

val resolvedMacroFunction = try {
  val plan = Project(Alias(macroFunction, "m")() :: Nil, OneRowRelation)
  val analyzed @ Project(Seq(named), OneRowRelation) =
    sparkSession.sessionState.analyzer.execute(plan)
  sparkSession.sessionState.analyzer.checkAnalysis(analyzed)
  named.children.head
} catch {
  case a: AnalysisException =>
    ...
}
```
Note that we cannot use generators ...

[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...

2017-05-28 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/13706#discussion_r11884
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/macros.scala ---
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import org.apache.spark.sql.{AnalysisException, Row, SparkSession}
+import org.apache.spark.sql.catalyst.analysis._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.types.StructField
+
+/**
+ * This class provides arguments and body expression of the macro function.
+ */
+case class MacroFunctionWrapper(columns: Seq[StructField], macroFunction: 
Expression)
+
+/**
+ * The DDL command that creates a macro.
+ * To create a temporary macro, the syntax of using this command in SQL is:
+ * {{{
+ *CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) 
expression;
+ * }}}
+ */
+case class CreateMacroCommand(
+macroName: String,
+funcWrapper: MacroFunctionWrapper)
+  extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val catalog = sparkSession.sessionState.catalog
+val columns = funcWrapper.columns.map { col =>
+  AttributeReference(col.name, col.dataType, col.nullable, 
col.metadata)() }
+val colToIndex: Map[String, Int] = 
columns.map(_.name).zipWithIndex.toMap
+if (colToIndex.size != columns.size) {
+  throw new AnalysisException(s"Cannot support duplicate colNames " +
+s"for CREATE TEMPORARY MACRO $macroName, actual columns: 
${columns.mkString(",")}")
+}
+val macroFunction = funcWrapper.macroFunction.transform {
+  case u: UnresolvedAttribute =>
+val index = colToIndex.get(u.name).getOrElse(
+  throw new AnalysisException(s"Cannot find colName: ${u} " +
+s"for CREATE TEMPORARY MACRO $macroName, actual columns: 
${columns.mkString(",")}"))
+BoundReference(index, columns(index).dataType, 
columns(index).nullable)
+  case u: UnresolvedFunction =>
+sparkSession.sessionState.catalog.lookupFunction(u.name, 
u.children)
+  case s: SubqueryExpression =>
+throw new AnalysisException(s"Cannot support Subquery: ${s} " +
+  s"for CREATE TEMPORARY MACRO $macroName")
+  case u: UnresolvedGenerator =>
+throw new AnalysisException(s"Cannot support Generator: ${u} " +
+  s"for CREATE TEMPORARY MACRO $macroName")
+}
+
+val macroInfo = columns.mkString(",") + " -> " + 
funcWrapper.macroFunction.toString
--- End diff --

Can you give an example of what this would look like?



[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...

2017-05-28 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/13706#discussion_r118844341
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/macros.scala ---
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import org.apache.spark.sql.{AnalysisException, Row, SparkSession}
+import org.apache.spark.sql.catalyst.analysis._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.types.StructField
+
+/**
+ * This class provides arguments and body expression of the macro function.
+ */
+case class MacroFunctionWrapper(columns: Seq[StructField], macroFunction: 
Expression)
+
+/**
+ * The DDL command that creates a macro.
+ * To create a temporary macro, the syntax of using this command in SQL is:
+ * {{{
+ *CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) 
expression;
+ * }}}
+ */
+case class CreateMacroCommand(
+macroName: String,
+funcWrapper: MacroFunctionWrapper)
+  extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val catalog = sparkSession.sessionState.catalog
+val columns = funcWrapper.columns.map { col =>
+  AttributeReference(col.name, col.dataType, col.nullable, 
col.metadata)() }
--- End diff --

Nit: put `}` on a new line
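
For reference, this is the formatting the nit asks for (same code, closing brace moved to its own line):
```scala
val columns = funcWrapper.columns.map { col =>
  AttributeReference(col.name, col.dataType, col.nullable, col.metadata)()
}
```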



[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...

2017-05-28 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/13706#discussion_r118844463
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/macros.scala ---
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import org.apache.spark.sql.{AnalysisException, Row, SparkSession}
+import org.apache.spark.sql.catalyst.analysis._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.types.StructField
+
+/**
+ * This class provides arguments and body expression of the macro function.
+ */
+case class MacroFunctionWrapper(columns: Seq[StructField], macroFunction: 
Expression)
+
+/**
+ * The DDL command that creates a macro.
+ * To create a temporary macro, the syntax of using this command in SQL is:
+ * {{{
+ *CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) 
expression;
+ * }}}
+ */
+case class CreateMacroCommand(
+macroName: String,
+funcWrapper: MacroFunctionWrapper)
+  extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val catalog = sparkSession.sessionState.catalog
+val columns = funcWrapper.columns.map { col =>
+  AttributeReference(col.name, col.dataType, col.nullable, 
col.metadata)() }
+val colToIndex: Map[String, Int] = 
columns.map(_.name).zipWithIndex.toMap
+if (colToIndex.size != columns.size) {
+  throw new AnalysisException(s"Cannot support duplicate colNames " +
+s"for CREATE TEMPORARY MACRO $macroName, actual columns: 
${columns.mkString(",")}")
+}
+val macroFunction = funcWrapper.macroFunction.transform {
+  case u: UnresolvedAttribute =>
+val index = colToIndex.get(u.name).getOrElse(
+  throw new AnalysisException(s"Cannot find colName: ${u} " +
+s"for CREATE TEMPORARY MACRO $macroName, actual columns: 
${columns.mkString(",")}"))
+BoundReference(index, columns(index).dataType, 
columns(index).nullable)
+  case u: UnresolvedFunction =>
+sparkSession.sessionState.catalog.lookupFunction(u.name, 
u.children)
+  case s: SubqueryExpression =>
+throw new AnalysisException(s"Cannot support Subquery: ${s} " +
+  s"for CREATE TEMPORARY MACRO $macroName")
+  case u: UnresolvedGenerator =>
+throw new AnalysisException(s"Cannot support Generator: ${u} " +
+  s"for CREATE TEMPORARY MACRO $macroName")
+}
+
+val macroInfo = columns.mkString(",") + " -> " + 
funcWrapper.macroFunction.toString
+val info = new ExpressionInfo(macroInfo, macroName, true)
+val builder = (children: Seq[Expression]) => {
+  if (children.size != columns.size) {
--- End diff --

It is slightly better to store `columns.size` in a separate variable, so we do 
not include `columns` in the closure.
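
A minimal sketch of the suggestion (the name `expectedNumColumns` is just an illustration, and the builder body assumes the macro substitutes `BoundReference`s with the caller's arguments, which the quoted diff cuts off):
```scala
val expectedNumColumns = columns.size
val builder = (children: Seq[Expression]) => {
  if (children.size != expectedNumColumns) {
    throw new AnalysisException(s"Actual number of columns: ${children.size} != " +
      s"expected number of columns: $expectedNumColumns for Macro $macroName")
  }
  // Replace the bound parameter references with the caller's arguments.
  macroFunction.transform {
    case b: BoundReference => children(b.ordinal)
  }
}
```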



[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...

2017-05-28 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/13706#discussion_r118844485
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -716,6 +716,37 @@ class SparkSqlAstBuilder(conf: SQLConf) extends 
AstBuilder(conf) {
   }
 
   /**
+   * Create a [[CreateMacroCommand]] command.
+   *
+   * For example:
+   * {{{
+   *   CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) 
expression;
+   * }}}
+   */
+  override def visitCreateMacro(ctx: CreateMacroContext): LogicalPlan = 
withOrigin(ctx) {
+val arguments = Option(ctx.colTypeList).map(visitColTypeList(_))
--- End diff --

Nit: you can avoid `(_)`...
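
That is, the method reference works without the placeholder:
```scala
// Equivalent to `.map(visitColTypeList(_))`, just without the explicit placeholder.
val arguments = Option(ctx.colTypeList).map(visitColTypeList)
```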



[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...

2017-05-28 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/13706#discussion_r118845133
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala 
---
@@ -1516,6 +1516,35 @@ abstract class DDLSuite extends QueryTest with 
SQLTestUtils {
 )
   }
 
+  test("create/drop temporary macro") {
--- End diff --

Should we also test a combination of temporary macros/functions...?



[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...

2017-05-28 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/13706#discussion_r118845109
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala 
---
@@ -1516,6 +1516,35 @@ abstract class DDLSuite extends QueryTest with 
SQLTestUtils {
 )
   }
 
+  test("create/drop temporary macro") {
--- End diff --

Can you also add a case for a macro without parameters? E.g.: `create 
temporary macro c() as 3E9`
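
A sketch of what such a test could look like in `DDLSuite`, assuming the `CREATE TEMPORARY MACRO name(...) expression` syntax from the file's scaladoc and the usual `QueryTest` helpers (the macro name is just an example):
```scala
test("create/drop temporary macro without parameters") {
  sql("CREATE TEMPORARY MACRO c() 3E9")
  checkAnswer(sql("SELECT c()"), sql("SELECT 3E9"))
  sql("DROP TEMPORARY MACRO c")
}
```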



[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...

2017-05-28 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/13706#discussion_r118844675
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
 ---
@@ -107,6 +110,14 @@ class SimpleFunctionRegistry extends FunctionRegistry {
 functionBuilders.remove(name).isDefined
   }
 
+  override def dropMacro(name: String): Boolean = synchronized {
--- End diff --

A drop function can currently also drop a macro. Can you make sure that 
this cannot happen?

Maybe we should consolidate this into a single drop function with a `macro` 
flag. cc @gatorsmile WDYT?
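
A self-contained sketch of the kind of consolidation meant here (not the PR's code; names and types are illustrative):
```scala
import scala.collection.mutable

class RegistrySketch {
  private val builders = mutable.Map.empty[String, AnyRef]
  private val macroNames = mutable.Set.empty[String]

  /** Drop an entry only if its kind matches, so DROP FUNCTION cannot silently remove a macro. */
  def drop(name: String, isMacro: Boolean): Boolean = synchronized {
    if (macroNames.contains(name) != isMacro) {
      false
    } else {
      macroNames.remove(name)
      builders.remove(name).isDefined
    }
  }
}
```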



[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...

2017-05-28 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/13706#discussion_r118844322
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala 
---
@@ -1516,6 +1516,35 @@ abstract class DDLSuite extends QueryTest with 
SQLTestUtils {
 )
   }
 
+  test("create/drop temporary macro") {
--- End diff --

Can you use `SQLQueryTestSuite` instead?



[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...

2017-05-28 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/13706#discussion_r118844357
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/macros.scala ---
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import org.apache.spark.sql.{AnalysisException, Row, SparkSession}
+import org.apache.spark.sql.catalyst.analysis._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.types.StructField
+
+/**
+ * This class provides arguments and body expression of the macro function.
+ */
+case class MacroFunctionWrapper(columns: Seq[StructField], macroFunction: 
Expression)
+
+/**
+ * The DDL command that creates a macro.
+ * To create a temporary macro, the syntax of using this command in SQL is:
+ * {{{
+ *CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) 
expression;
+ * }}}
+ */
+case class CreateMacroCommand(
+macroName: String,
+funcWrapper: MacroFunctionWrapper)
+  extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val catalog = sparkSession.sessionState.catalog
+val columns = funcWrapper.columns.map { col =>
+  AttributeReference(col.name, col.dataType, col.nullable, 
col.metadata)() }
+val colToIndex: Map[String, Int] = 
columns.map(_.name).zipWithIndex.toMap
+if (colToIndex.size != columns.size) {
+  throw new AnalysisException(s"Cannot support duplicate colNames " +
+s"for CREATE TEMPORARY MACRO $macroName, actual columns: 
${columns.mkString(",")}")
+}
+val macroFunction = funcWrapper.macroFunction.transform {
+  case u: UnresolvedAttribute =>
+val index = colToIndex.get(u.name).getOrElse(
--- End diff --

We should respect the case-sensitivity settings here. So a lookup might not 
be the best idea.
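
A sketch of a lookup that honours the case-sensitivity setting, assuming the session's `Resolver` is reachable from the command (illustrative only, not the PR's code):
```scala
val resolver = sparkSession.sessionState.conf.resolver
val macroFunction = funcWrapper.macroFunction.transform {
  case u: UnresolvedAttribute =>
    // Compare names through the resolver instead of an exact map lookup.
    val index = columns.indexWhere(c => resolver(c.name, u.name))
    if (index < 0) {
      throw new AnalysisException(s"Cannot find colName: $u for CREATE TEMPORARY MACRO $macroName")
    }
    BoundReference(index, columns(index).dataType, columns(index).nullable)
}
```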



[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...

2017-05-28 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/13706#discussion_r118844274
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/macros.scala ---
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import org.apache.spark.sql.{AnalysisException, Row, SparkSession}
+import org.apache.spark.sql.catalyst.analysis._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.types.StructField
+
+/**
+ * This class provides arguments and body expression of the macro function.
+ */
+case class MacroFunctionWrapper(columns: Seq[StructField], macroFunction: 
Expression)
+
+/**
+ * The DDL command that creates a macro.
+ * To create a temporary macro, the syntax of using this command in SQL is:
+ * {{{
+ *CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) 
expression;
+ * }}}
+ */
+case class CreateMacroCommand(
+macroName: String,
+funcWrapper: MacroFunctionWrapper)
+  extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val catalog = sparkSession.sessionState.catalog
+val columns = funcWrapper.columns.map { col =>
+  AttributeReference(col.name, col.dataType, col.nullable, 
col.metadata)() }
+val colToIndex: Map[String, Int] = 
columns.map(_.name).zipWithIndex.toMap
+if (colToIndex.size != columns.size) {
+  throw new AnalysisException(s"Cannot support duplicate colNames " +
+s"for CREATE TEMPORARY MACRO $macroName, actual columns: 
${columns.mkString(",")}")
+}
+val macroFunction = funcWrapper.macroFunction.transform {
+  case u: UnresolvedAttribute =>
+val index = colToIndex.get(u.name).getOrElse(
+  throw new AnalysisException(s"Cannot find colName: ${u} " +
+s"for CREATE TEMPORARY MACRO $macroName, actual columns: 
${columns.mkString(",")}"))
+BoundReference(index, columns(index).dataType, 
columns(index).nullable)
+  case u: UnresolvedFunction =>
+sparkSession.sessionState.catalog.lookupFunction(u.name, 
u.children)
+  case s: SubqueryExpression =>
+throw new AnalysisException(s"Cannot support Subquery: ${s} " +
+  s"for CREATE TEMPORARY MACRO $macroName")
+  case u: UnresolvedGenerator =>
--- End diff --

Is this what Hive does? I really don't see why we should not support this.

Please note that we cannot use generators if we decide that an expression 
has to be a fully resolved expression at creation time.



[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...

2017-05-28 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/13706#discussion_r118845638
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -1090,6 +1090,24 @@ class SessionCatalog(
 }
   }
 
+  /** Create a temporary macro. */
+  def createTempMacro(
+  name: String,
+  info: ExpressionInfo,
+  functionBuilder: FunctionBuilder): Unit = {
+if (functionRegistry.functionExists(name)) {
--- End diff --

I am not entirely sure if we should throw an exception here. It unfortunately 
depends on the semantics you follow: SQL will throw an exception, whereas the 
DataFrame API will just overwrite the function. Let's follow Hive for now.



[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...

2017-05-28 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/13706#discussion_r118846490
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/macros.scala ---
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import org.apache.spark.sql.{AnalysisException, Row, SparkSession}
+import org.apache.spark.sql.catalyst.analysis._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.types.StructField
+
+/**
+ * This class provides arguments and body expression of the macro function.
+ */
+case class MacroFunctionWrapper(columns: Seq[StructField], macroFunction: 
Expression)
+
+/**
+ * The DDL command that creates a macro.
+ * To create a temporary macro, the syntax of using this command in SQL is:
+ * {{{
+ *CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) 
expression;
+ * }}}
+ */
+case class CreateMacroCommand(
+macroName: String,
+funcWrapper: MacroFunctionWrapper)
+  extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val catalog = sparkSession.sessionState.catalog
+val columns = funcWrapper.columns.map { col =>
+  AttributeReference(col.name, col.dataType, col.nullable, 
col.metadata)() }
+val colToIndex: Map[String, Int] = 
columns.map(_.name).zipWithIndex.toMap
+if (colToIndex.size != columns.size) {
+  throw new AnalysisException(s"Cannot support duplicate colNames " +
+s"for CREATE TEMPORARY MACRO $macroName, actual columns: 
${columns.mkString(",")}")
+}
+val macroFunction = funcWrapper.macroFunction.transform {
+  case u: UnresolvedAttribute =>
+val index = colToIndex.get(u.name).getOrElse(
+  throw new AnalysisException(s"Cannot find colName: ${u} " +
+s"for CREATE TEMPORARY MACRO $macroName, actual columns: 
${columns.mkString(",")}"))
+BoundReference(index, columns(index).dataType, 
columns(index).nullable)
+  case u: UnresolvedFunction =>
+sparkSession.sessionState.catalog.lookupFunction(u.name, 
u.children)
+  case s: SubqueryExpression =>
+throw new AnalysisException(s"Cannot support Subquery: ${s} " +
+  s"for CREATE TEMPORARY MACRO $macroName")
+  case u: UnresolvedGenerator =>
+throw new AnalysisException(s"Cannot support Generator: ${u} " +
+  s"for CREATE TEMPORARY MACRO $macroName")
+}
+
+val macroInfo = columns.mkString(",") + " -> " + 
funcWrapper.macroFunction.toString
+val info = new ExpressionInfo(macroInfo, macroName, true)
+val builder = (children: Seq[Expression]) => {
+  if (children.size != columns.size) {
+throw new AnalysisException(s"Actual number of columns: 
${children.size} != " +
+  s"expected number of columns: ${columns.size} for Macro 
$macroName")
+  }
+  macroFunction.transform {
+// Skip to validate the input type because check it at runtime.
--- End diff --

How do we check at runtime? The current code does not seem to respect the 
types passed; it relies on the macro's expression to do some type validation, 
which means you can pass anything to the macro and end up with an unexpected 
result:
```sql
create macro plus(a int, b int) as a + b;
select plus(1.0, 1.0) as result -- This returns a decimal, and not an int as expected
```
So I think we should at least validate the input expressions. The hacky way 
would be to add casts, and have the analyzer fail if the cast cannot be made 
(this is terrible UX). A better way would be to create some sentinel 
expression that makes sure the analyzer ...
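
A sketch of the "add casts" variant mentioned above (illustrative only, and assuming the builder substitutes `BoundReference`s with the caller's arguments): each argument is wrapped in a `Cast` to the declared parameter type, so the analyzer rejects incompatible arguments instead of silently changing the result type.
```scala
val builder = (children: Seq[Expression]) => {
  // Cast every argument to the declared parameter type before substitution.
  val castedArgs = children.zip(columns).map { case (arg, col) =>
    Cast(arg, col.dataType)
  }
  macroFunction.transform {
    case b: BoundReference => castedArgs(b.ordinal)
  }
}
```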

[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...

2017-05-28 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/13706#discussion_r118844406
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/macros.scala ---
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import org.apache.spark.sql.{AnalysisException, Row, SparkSession}
+import org.apache.spark.sql.catalyst.analysis._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.types.StructField
+
+/**
+ * This class provides arguments and body expression of the macro function.
+ */
+case class MacroFunctionWrapper(columns: Seq[StructField], macroFunction: 
Expression)
+
+/**
+ * The DDL command that creates a macro.
+ * To create a temporary macro, the syntax of using this command in SQL is:
+ * {{{
+ *CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) 
expression;
+ * }}}
+ */
+case class CreateMacroCommand(
+macroName: String,
+funcWrapper: MacroFunctionWrapper)
+  extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val catalog = sparkSession.sessionState.catalog
+val columns = funcWrapper.columns.map { col =>
--- End diff --

It might be easier to use `StructType().toAttributes` here
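
For example (assuming `funcWrapper.columns` is the `Seq[StructField]` held by the wrapper):
```scala
import org.apache.spark.sql.types.StructType

val columns = StructType(funcWrapper.columns).toAttributes
```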



[GitHub] spark pull request #13706: [SPARK-15988] [SQL] Implement DDL commands: Creat...

2017-05-28 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/13706#discussion_r118844209
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/macros.scala ---
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import org.apache.spark.sql.{AnalysisException, Row, SparkSession}
+import org.apache.spark.sql.catalyst.analysis._
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.types.StructField
+
+/**
+ * This class provides arguments and body expression of the macro function.
+ */
+case class MacroFunctionWrapper(columns: Seq[StructField], macroFunction: 
Expression)
--- End diff --

Why do we need this?



[GitHub] spark pull request #18079: [SPARK-20841][SQL] Support table column aliases i...

2017-05-28 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/18079#discussion_r118842688
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/aliases.sql ---
@@ -0,0 +1,17 @@
+-- Test data.
--- End diff --

How about we name this file `table-aliases.sql`; that seems a little bit 
less confusing.



[GitHub] spark pull request #18079: [SPARK-20841][SQL] Support table column aliases i...

2017-05-28 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/18079#discussion_r118842345
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ---
@@ -676,9 +676,12 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
* Create an aliased table reference. This is typically used in FROM 
clauses.
*/
   override def visitTableName(ctx: TableNameContext): LogicalPlan = 
withOrigin(ctx) {
-val table = 
UnresolvedRelation(visitTableIdentifier(ctx.tableIdentifier))
-
-val tableWithAlias = Option(ctx.strictIdentifier).map(_.getText) match 
{
+val tableId = visitTableIdentifier(ctx.tableIdentifier)
+val table = Option(ctx.tableAlias.identifierList) match {
--- End diff --

...or something like this:
```scala
val outputNames =
  Option(ctx.tableAlias.identifierList).map(visitIdentifierList).getOrElse(Nil)
val table = UnresolvedRelation(visitTableIdentifier(ctx.tableIdentifier), outputNames)
```



[GitHub] spark pull request #18079: [SPARK-20841][SQL] Support table column aliases i...

2017-05-28 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/18079#discussion_r118842292
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ---
@@ -676,9 +676,12 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
* Create an aliased table reference. This is typically used in FROM 
clauses.
*/
   override def visitTableName(ctx: TableNameContext): LogicalPlan = 
withOrigin(ctx) {
-val table = 
UnresolvedRelation(visitTableIdentifier(ctx.tableIdentifier))
-
-val tableWithAlias = Option(ctx.strictIdentifier).map(_.getText) match 
{
+val tableId = visitTableIdentifier(ctx.tableIdentifier)
+val table = Option(ctx.tableAlias.identifierList) match {
--- End diff --

Just if/else? It seems a bit heavyweight to wrap this in an `Option`...



[GitHub] spark pull request #18079: [SPARK-20841][SQL] Support table column aliases i...

2017-05-28 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/18079#discussion_r118842186
  
--- Diff: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -711,7 +711,7 @@ nonReserved
 | ADD
 | OVER | PARTITION | RANGE | ROWS | PRECEDING | FOLLOWING | CURRENT | 
ROW | LAST | FIRST | AFTER
 | MAP | ARRAY | STRUCT
-| LATERAL | WINDOW | REDUCE | TRANSFORM | USING | SERDE | 
SERDEPROPERTIES | RECORDREADER
--- End diff --

I have added as many keywords to the non-reserved keyword list as possible (without 
creating ambiguities). The reason for this is that many datasources (for 
instance twitter4j) unfortunately use reserved keywords for column names, and 
working with these was quite cumbersome. I took the pragmatic approach.

If we want to change this, then we need to do the same thing Hive did and create 
a config flag; we could then remove the keywords in Spark 3.0...



[GitHub] spark issue #18072: [SPARK-20857][SQL] Generic resolved hint node

2017-05-23 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/18072
  
LGTM - pending jenkins



[GitHub] spark issue #18069: [SPARK-20850][SQL]Improve division and multiplication mi...

2017-05-23 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/18069
  
I don't think this makes any sense:

- The parser is not the place where you should make this change; it should be 
done in `TypeCoercion`. As it stands, this creates different semantics between SQL 
and the DataFrame API.
- This is a breaking change to the result type of `Divide`, and will probably 
break Hive compatibility. It is going to break user code in a lot of places. 
I also think it is weird that we would always return a `Double` when a 
`Decimal` is completely valid.
- I think MySQL returns a `Decimal` and not a `Double`.

So I am going to -1 this.



[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-22 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117813681
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1188,6 +1188,12 @@ class Dataset[T] private[sql](
   def col(colName: String): Column = colName match {
 case "*" =>
   Column(ResolvedStar(queryExecution.analyzed.output))
+case ParserUtils.escapedIdentifier(columnNameRegex)
--- End diff --

I don't think SQL-like syntax is really useful here. How about we create 
a special-cased `col` function that takes a regex?
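
A sketch of what such a method on `Dataset` might look like, reusing the `UnresolvedRegex` added in this PR (the method name `colRegex` is only an example):
```scala
// Hypothetical Dataset method: resolve a regex against this Dataset's columns.
def colRegex(colNameRegex: String): Column = {
  Column(UnresolvedRegex(colNameRegex, None))
}
```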



[GitHub] spark issue #18057: [SPARK-20786][SQL][Backport-2.2]Improve ceil and floor h...

2017-05-22 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/18057
  
ok to test



[GitHub] spark issue #18054: [SPARK-20763][SQL][Backport-2.1] The function of `month`...

2017-05-22 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/18054
  
ok to test



[GitHub] spark issue #18005: [SPARK-20773][SQL] ParquetWriteSupport.writeFields is qu...

2017-05-19 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/18005
  
LGTM - merging to master/2.2. Thanks!



[GitHub] spark issue #18030: [SPARK-20798] GenerateUnsafeProjection should check if a...

2017-05-19 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/18030
  
LGTM - merging to master. Thanks!



[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-18 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117367232
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
 ---
@@ -84,6 +84,33 @@ case class UnresolvedTableValuedFunction(
 }
 
 /**
+ * Represents all of the input attributes to a given relational operator, 
for example in
+ * "SELECT `(id)?+.+` FROM ...".
+ *
+ * @param table an optional table that should be the target of the 
expansion.  If omitted all
+ *  tables' columns are produced.
+ */
+case class UnresolvedRegex(expr: String, table: Option[String]) extends 
Star with Unevaluable {
+  override def expand(input: LogicalPlan, resolver: Resolver): 
Seq[NamedExpression] = {
+val expandedAttributes: Seq[Attribute] = table match {
+  // If there is no table specified, use all input attributes that 
match expr
+  case None => input.output.filter(_.name.matches(expr))
+  // If there is a table, pick out attributes that are part of this 
table that match expr
+  case Some(t) => input.output.filter(_.qualifier.filter(resolver(_, 
t)).nonEmpty)
--- End diff --

`input.output.filter(_.qualifier.exists(resolver(_, t)))` is a bit more 
concise.



[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-18 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117379878
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ---
@@ -1230,24 +1230,49 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
   }
 
   /**
-   * Create a dereference expression. The return type depends on the type 
of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an 
[[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type 
of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a 
[[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some 
other expression,
+   * it can be [[UnresolvedExtractValue]].
*/
   override def visitDereference(ctx: DereferenceContext): Expression = 
withOrigin(ctx) {
 val attr = ctx.fieldName.getText
 expression(ctx.base) match {
-  case UnresolvedAttribute(nameParts) =>
+  case unresolved_attr @ UnresolvedAttribute(nameParts) =>
+if (conf.supportQuotedIdentifiers) {
+  val escapedIdentifier = "`(.+)`".r
+  val ret = Option(ctx.fieldName.getStart).map(_.getText match {
+case r@escapedIdentifier(i) =>
+  UnresolvedRegex(i, Some(unresolved_attr.name))
+case _ =>
+  UnresolvedAttribute(nameParts :+ attr)
+  })
+  return ret.get
+}
+
 UnresolvedAttribute(nameParts :+ attr)
   case e =>
 UnresolvedExtractValue(e, Literal(attr))
 }
   }
 
   /**
-   * Create an [[UnresolvedAttribute]] expression.
+   * Create an [[UnresolvedAttribute]] expression or a [[UnresolvedRegex]] 
if it is a regex
+   * quoted in ``
*/
   override def visitColumnReference(ctx: ColumnReferenceContext): 
Expression = withOrigin(ctx) {
+if (conf.supportQuotedIdentifiers) {
+  val escapedIdentifier = "`(.+)`".r
--- End diff --

We don't need to compile the same regex over and over. Can you move this to 
the ParserUtils...
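
For instance, the pattern could be defined once as a `val` (the `ParserUtils.escapedIdentifier` that the `Dataset.col` change in this PR already references) instead of recompiling it on every visit. A minimal, self-contained sketch:
```scala
object ParserUtilsSketch {
  // Compiled once and reused as an extractor for identifiers written as `...`.
  val escapedIdentifier: scala.util.matching.Regex = "`(.+)`".r
}
```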



[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-18 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117367155
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
 ---
@@ -84,6 +84,33 @@ case class UnresolvedTableValuedFunction(
 }
 
 /**
+ * Represents all of the input attributes to a given relational operator, 
for example in
+ * "SELECT `(id)?+.+` FROM ...".
+ *
+ * @param table an optional table that should be the target of the 
expansion.  If omitted all
+ *  tables' columns are produced.
+ */
+case class UnresolvedRegex(expr: String, table: Option[String]) extends 
Star with Unevaluable {
--- End diff --

`expr` is the pattern right? Maybe we should give it a better name.



[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-18 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117380037
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ---
@@ -1230,24 +1230,49 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
   }
 
   /**
-   * Create a dereference expression. The return type depends on the type 
of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an 
[[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type 
of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a 
[[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some 
other expression,
+   * it can be [[UnresolvedExtractValue]].
*/
   override def visitDereference(ctx: DereferenceContext): Expression = 
withOrigin(ctx) {
 val attr = ctx.fieldName.getText
 expression(ctx.base) match {
-  case UnresolvedAttribute(nameParts) =>
+  case unresolved_attr @ UnresolvedAttribute(nameParts) =>
+if (conf.supportQuotedIdentifiers) {
+  val escapedIdentifier = "`(.+)`".r
+  val ret = Option(ctx.fieldName.getStart).map(_.getText match {
--- End diff --

Using an option here does not add a thing.
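
For illustration, a self-contained sketch of matching the token text directly,
with no `Option(...).map(...).get` round trip (the regex is the one from the diff):

```scala
object QuotedIdentifierDemo extends App {
  val escapedIdentifier = "`(.+)`".r

  // Pattern match on the token directly; wrapping it in an Option first adds nothing.
  def classify(token: String): String = token match {
    case escapedIdentifier(pattern) => s"regex column: $pattern"
    case other => s"plain column: $other"
  }

  println(classify("`(id)?+.+`")) // regex column: (id)?+.+
  println(classify("name"))       // plain column: name
}
```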


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-18 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117366828
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
 ---
@@ -84,6 +84,33 @@ case class UnresolvedTableValuedFunction(
 }
 
 /**
+ * Represents all of the input attributes to a given relational operator, 
for example in
+ * "SELECT `(id)?+.+` FROM ...".
+ *
+ * @param table an optional table that should be the target of the 
expansion.  If omitted all
+ *  tables' columns are produced.
+ */
+case class UnresolvedRegex(expr: String, table: Option[String]) extends 
Star with Unevaluable {
+  override def expand(input: LogicalPlan, resolver: Resolver): 
Seq[NamedExpression] = {
+val expandedAttributes: Seq[Attribute] = table match {
+  // If there is no table specified, use all input attributes that 
match expr
+  case None => input.output.filter(_.name.matches(expr))
+  // If there is a table, pick out attributes that are part of this 
table that match expr
+  case Some(t) => input.output.filter(_.qualifier.filter(resolver(_, 
t)).nonEmpty)
+.filter(_.name.matches(expr))
+}
+
+expandedAttributes.zip(input.output).map {
--- End diff --

An `Attribute` is always a `NamedExpression`, why do we need this?
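
For illustration, a sketch of the expansion that returns the filtered attributes
directly; since `Attribute` extends `NamedExpression`, the extra mapping is not
needed (the filter mirrors the diff above, the other names are illustrative):

```scala
import org.apache.spark.sql.catalyst.analysis.Resolver
import org.apache.spark.sql.catalyst.expressions.{Attribute, NamedExpression}

def expandRegex(
    output: Seq[Attribute],
    pattern: String,
    table: Option[String],
    resolver: Resolver): Seq[NamedExpression] = {
  output
    .filter(a => table.forall(t => a.qualifier.exists(resolver(_, t))))
    .filter(_.name.matches(pattern)) // Seq[Attribute] already is a Seq[NamedExpression]
}
```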


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-18 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117368022
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ---
@@ -1230,24 +1230,49 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
   }
 
   /**
-   * Create a dereference expression. The return type depends on the type 
of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an 
[[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type 
of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a 
[[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some 
other expression,
+   * it can be [[UnresolvedExtractValue]].
*/
   override def visitDereference(ctx: DereferenceContext): Expression = 
withOrigin(ctx) {
 val attr = ctx.fieldName.getText
 expression(ctx.base) match {
-  case UnresolvedAttribute(nameParts) =>
+  case unresolved_attr @ UnresolvedAttribute(nameParts) =>
--- End diff --

Please use a guard, e.g.: `case unresolved_attr @ 
UnresolvedAttribute(nameParts) if conf.supportQuotedIdentifiers => `. That 
makes the logic down the line much simpler.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-18 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117380055
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ---
@@ -1230,24 +1230,49 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
   }
 
   /**
-   * Create a dereference expression. The return type depends on the type 
of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an 
[[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type 
of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a 
[[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some 
other expression,
+   * it can be [[UnresolvedExtractValue]].
*/
   override def visitDereference(ctx: DereferenceContext): Expression = 
withOrigin(ctx) {
 val attr = ctx.fieldName.getText
 expression(ctx.base) match {
-  case UnresolvedAttribute(nameParts) =>
+  case unresolved_attr @ UnresolvedAttribute(nameParts) =>
+if (conf.supportQuotedIdentifiers) {
+  val escapedIdentifier = "`(.+)`".r
+  val ret = Option(ctx.fieldName.getStart).map(_.getText match {
+case r@escapedIdentifier(i) =>
+  UnresolvedRegex(i, Some(unresolved_attr.name))
+case _ =>
+  UnresolvedAttribute(nameParts :+ attr)
+  })
+  return ret.get
+}
+
 UnresolvedAttribute(nameParts :+ attr)
   case e =>
 UnresolvedExtractValue(e, Literal(attr))
 }
   }
 
   /**
-   * Create an [[UnresolvedAttribute]] expression.
+   * Create an [[UnresolvedAttribute]] expression or a [[UnresolvedRegex]] 
if it is a regex
+   * quoted in ``
*/
   override def visitColumnReference(ctx: ColumnReferenceContext): 
Expression = withOrigin(ctx) {
+if (conf.supportQuotedIdentifiers) {
+  val escapedIdentifier = "`(.+)`".r
+  val ret = Option(ctx.getStart).map(_.getText match {
--- End diff --

Using an option here does not add a thing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-18 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117367722
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ---
@@ -1230,24 +1230,49 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
   }
 
   /**
-   * Create a dereference expression. The return type depends on the type 
of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an 
[[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type 
of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a 
[[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some 
other expression,
+   * it can be [[UnresolvedExtractValue]].
*/
   override def visitDereference(ctx: DereferenceContext): Expression = 
withOrigin(ctx) {
 val attr = ctx.fieldName.getText
 expression(ctx.base) match {
-  case UnresolvedAttribute(nameParts) =>
+  case unresolved_attr @ UnresolvedAttribute(nameParts) =>
+if (conf.supportQuotedIdentifiers) {
+  val escapedIdentifier = "`(.+)`".r
--- End diff --

We don't need to compile the same regex over and over. Can you move this to 
the ParserUtils...

I am also wondering if we shouldn't do the match in the parser itself.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18019: [SPARK-20748][SQL] Add built-in SQL function CH[A...

2017-05-18 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/18019#discussion_r117289000
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -1268,6 +1268,59 @@ case class Ascii(child: Expression) extends 
UnaryExpression with ImplicitCastInp
 }
 
 /**
+ * Returns the ASCII character having the binary equivalent to n.
+ * If n is larger than 256 the result is equivalent to chr(n % 256)
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(expr) - Returns the ASCII character having the binary 
equivalent to `expr`. If n is larger than 256 the result is equivalent to chr(n 
% 256)",
+  extended = """
+Examples:
+  > SELECT _FUNC_(65);
+   A
+ """)
+// scalastyle:on line.size.limit
+case class Chr(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
+
+  override def dataType: DataType = StringType
+  override def inputTypes: Seq[DataType] = Seq(LongType)
+
+  protected override def nullSafeEval(value: Any): Any = {
+val longVal = value.asInstanceOf[Long]
+val shortVal = if (longVal > 255) (longVal % 256).toShort else 
longVal.toShort
+val stringVal = if (shortVal == 0) {
+  String.valueOf('\u0000')
+} else if (shortVal < 0) {
+  ""
+} else {
+  shortVal.toChar.toString
+}
+UTF8String.fromString(stringVal)
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+defineCodeGen(ctx, ev, lon => {
+  s"""
+long longVal = (long)${lon};
+short shortVal;
+if (longVal > 255) {
+  shortVal = (short)(longVal % 256);
+} else {
+  (short)longVal;
+}
+if (shortVal == 0) {
+  ${ev.value} = String.valueOf('\u0000');
--- End diff --

You really should put this in a constant field. It does not make sense to 
create the same constant over and over...
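
As an illustration, the two result strings could be hoisted into constants that
both `nullSafeEval` and the generated code reference (the object and field names
here are made up, and how they get exposed to codegen is left open):

```scala
import org.apache.spark.unsafe.types.UTF8String

object ChrConstants {
  // Built once instead of once per evaluated row.
  val NullChar: UTF8String = UTF8String.fromString("\u0000")
  val Empty: UTF8String = UTF8String.fromString("")
}
```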


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18019: [SPARK-20748][SQL] Add built-in SQL function CH[A...

2017-05-18 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/18019#discussion_r117378356
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -1268,6 +1268,59 @@ case class Ascii(child: Expression) extends 
UnaryExpression with ImplicitCastInp
 }
 
 /**
+ * Returns the ASCII character having the binary equivalent to n.
+ * If n is larger than 256 the result is equivalent to chr(n % 256)
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(expr) - Returns the ASCII character having the binary 
equivalent to `expr`. If n is larger than 256 the result is equivalent to chr(n 
% 256)",
+  extended = """
+Examples:
+  > SELECT _FUNC_(65);
+   A
+ """)
+// scalastyle:on line.size.limit
+case class Chr(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
+
+  override def dataType: DataType = StringType
+  override def inputTypes: Seq[DataType] = Seq(LongType)
+
+  protected override def nullSafeEval(value: Any): Any = {
+val longVal = value.asInstanceOf[Long]
+val shortVal = if (longVal > 255) (longVal % 256).toShort else 
longVal.toShort
+val stringVal = if (shortVal == 0) {
+  String.valueOf('\u0000')
+} else if (shortVal < 0) {
+  ""
+} else {
+  shortVal.toChar.toString
+}
+UTF8String.fromString(stringVal)
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+defineCodeGen(ctx, ev, lon => {
+  s"""
+long longVal = (long)${lon};
+short shortVal;
+if (longVal > 255) {
--- End diff --

It might be easier to do something like this:
```java
if ($lon < 0) {
  ${ev.value} = UTF8String.fromString(""); // This should be put in a constant
} else if (($lon & 0xFF) == 0) {
  ${ev.value} = UTF8String.fromString("\u0000"); // This should be put in a variable
} else {
  char c = (char)($lon & 0xFF);
  // This is quite slow, it might be better to create the bytes directly.
  ${ev.value} = UTF8String.fromString(String.valueOf(c));
}
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18019: [SPARK-20748][SQL] Add built-in SQL function CH[A...

2017-05-18 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/18019#discussion_r117378811
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/StringFunctionsSuite.scala ---
@@ -117,6 +117,41 @@ class StringFunctionsSuite extends QueryTest with 
SharedSQLContext {
   Row(97, 0))
   }
 
+  test("string ch[a]r function") {
--- End diff --

Please move this to catalyst. Look at the `StringExpressionsSuite`.
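
A sketch of such a test inside `StringExpressionsSuite`, assuming the `Chr`
semantics from this PR; `checkEvaluation` exercises both the interpreted and the
generated code path:

```scala
test("chr") {
  checkEvaluation(Chr(Literal(65L)), "A")
  checkEvaluation(Chr(Literal(321L)), "A") // 321 % 256 == 65
  checkEvaluation(Chr(Literal(0L)), "\u0000")
  checkEvaluation(Chr(Literal.create(null, LongType)), null)
}
```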


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18019: [SPARK-20748][SQL] Add built-in SQL function CH[A...

2017-05-18 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/18019#discussion_r117375988
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -1268,6 +1268,59 @@ case class Ascii(child: Expression) extends 
UnaryExpression with ImplicitCastInp
 }
 
 /**
+ * Returns the ASCII character having the binary equivalent to n.
+ * If n is larger than 256 the result is equivalent to chr(n % 256)
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(expr) - Returns the ASCII character having the binary 
equivalent to `expr`. If n is larger than 256 the result is equivalent to chr(n 
% 256)",
+  extended = """
+Examples:
+  > SELECT _FUNC_(65);
+   A
+ """)
+// scalastyle:on line.size.limit
+case class Chr(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
+
+  override def dataType: DataType = StringType
+  override def inputTypes: Seq[DataType] = Seq(LongType)
+
+  protected override def nullSafeEval(value: Any): Any = {
+val longVal = value.asInstanceOf[Long]
+val shortVal = if (longVal > 255) (longVal % 256).toShort else 
longVal.toShort
+val stringVal = if (shortVal == 0) {
+  String.valueOf('\u0000')
+} else if (shortVal < 0) {
+  ""
+} else {
+  shortVal.toChar.toString
+}
+UTF8String.fromString(stringVal)
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+defineCodeGen(ctx, ev, lon => {
+  s"""
+long longVal = (long)${lon};
+short shortVal;
+if (longVal > 255) {
+  shortVal = (short)(longVal % 256);
+} else {
+  (short)longVal;
+}
+if (shortVal == 0) {
+  ${ev.value} = String.valueOf('\u0000');
+} else if (shortVal < 0) {
+  ${ev.value} = "";
+} else {
+  ${ev.value} = (short)shortVal;
--- End diff --

You cannot assign a `short` to a `UTF8String`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18019: [SPARK-20748][SQL] Add built-in SQL function CH[A...

2017-05-18 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/18019#discussion_r117378749
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -2138,6 +2138,40 @@ object functions {
   def ascii(e: Column): Column = withExpr { Ascii(e.expr) }
 
   /**
+   * Returns the ASCII character having the binary equivalent to expr.
--- End diff --

let's not do this just yet; adding it to the function registry is enough.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18019: [SPARK-20748][SQL] Add built-in SQL function CH[A...

2017-05-18 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/18019#discussion_r117250411
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -1268,6 +1268,59 @@ case class Ascii(child: Expression) extends 
UnaryExpression with ImplicitCastInp
 }
 
 /**
+ * Returns the ASCII character having the binary equivalent to n.
+ * If n is larger than 256 the result is equivalent to chr(n % 256)
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(expr) - Returns the ASCII character having the binary 
equivalent to `expr`. If n is larger than 256 the result is equivalent to chr(n 
% 256)",
+  extended = """
+Examples:
+  > SELECT _FUNC_(65);
+   A
+ """)
+// scalastyle:on line.size.limit
+case class Chr(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
+
+  override def dataType: DataType = StringType
+  override def inputTypes: Seq[DataType] = Seq(LongType)
+
+  protected override def nullSafeEval(value: Any): Any = {
+val longVal = value.asInstanceOf[Long]
+val shortVal = if (longVal > 255) (longVal % 256).toShort else 
longVal.toShort
+val stringVal = if (shortVal == 0) {
+  String.valueOf('\u0000')
+} else if (shortVal < 0) {
+  ""
+} else {
+  shortVal.toChar.toString
+}
+UTF8String.fromString(stringVal)
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
--- End diff --

I am 100% sure this is not being tested. Can you check how other expressions
test the code they generate?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18019: [SPARK-20748][SQL] Add built-in SQL function CH[A...

2017-05-18 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/18019#discussion_r117376928
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -1268,6 +1268,59 @@ case class Ascii(child: Expression) extends 
UnaryExpression with ImplicitCastInp
 }
 
 /**
+ * Returns the ASCII character having the binary equivalent to n.
+ * If n is larger than 256 the result is equivalent to chr(n % 256)
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(expr) - Returns the ASCII character having the binary 
equivalent to `expr`. If n is larger than 256 the result is equivalent to chr(n 
% 256)",
+  extended = """
+Examples:
+  > SELECT _FUNC_(65);
+   A
+ """)
+// scalastyle:on line.size.limit
+case class Chr(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
+
+  override def dataType: DataType = StringType
+  override def inputTypes: Seq[DataType] = Seq(LongType)
+
+  protected override def nullSafeEval(value: Any): Any = {
+val longVal = value.asInstanceOf[Long]
+val shortVal = if (longVal > 255) (longVal % 256).toShort else 
longVal.toShort
+val stringVal = if (shortVal == 0) {
+  String.valueOf('\u0000')
+} else if (shortVal < 0) {
+  ""
+} else {
+  shortVal.toChar.toString
+}
+UTF8String.fromString(stringVal)
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+defineCodeGen(ctx, ev, lon => {
--- End diff --

`defineCodeGen` does not work here. This is a statement instead of an 
expression.
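
A sketch of the same generation going through `nullSafeCodeGen`, which accepts a
block of statements and lets the template assign `${ev.value}` itself (how 0 and
negative inputs are handled below is an assumption, not taken from this PR):

```scala
override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
  nullSafeCodeGen(ctx, ev, lon => {
    s"""
      if ($lon < 0) {
        ${ev.value} = UTF8String.EMPTY_UTF8;
      } else if (($lon & 0xFF) == 0) {
        ${ev.value} = UTF8String.fromString(String.valueOf(Character.MIN_VALUE));
      } else {
        ${ev.value} = UTF8String.fromString(String.valueOf((char)($lon & 0xFF)));
      }
    """
  })
}
```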


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18019: [SPARK-20748][SQL] Add built-in SQL function CH[A...

2017-05-18 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/18019#discussion_r117375494
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -1268,6 +1268,59 @@ case class Ascii(child: Expression) extends 
UnaryExpression with ImplicitCastInp
 }
 
 /**
+ * Returns the ASCII character having the binary equivalent to n.
+ * If n is larger than 256 the result is equivalent to chr(n % 256)
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(expr) - Returns the ASCII character having the binary 
equivalent to `expr`. If n is larger than 256 the result is equivalent to chr(n 
% 256)",
+  extended = """
+Examples:
+  > SELECT _FUNC_(65);
+   A
+ """)
+// scalastyle:on line.size.limit
+case class Chr(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
+
+  override def dataType: DataType = StringType
+  override def inputTypes: Seq[DataType] = Seq(LongType)
+
+  protected override def nullSafeEval(value: Any): Any = {
+val longVal = value.asInstanceOf[Long]
+val shortVal = if (longVal > 255) (longVal % 256).toShort else 
longVal.toShort
+val stringVal = if (shortVal == 0) {
+  String.valueOf('\u0000')
+} else if (shortVal < 0) {
+  ""
+} else {
+  shortVal.toChar.toString
+}
+UTF8String.fromString(stringVal)
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+defineCodeGen(ctx, ev, lon => {
+  s"""
+long longVal = (long)${lon};
+short shortVal;
+if (longVal > 255) {
+  shortVal = (short)(longVal % 256);
+} else {
+  (short)longVal;
+}
+if (shortVal == 0) {
+  ${ev.value} = String.valueOf('\u0000');
+} else if (shortVal < 0) {
+  ${ev.value} = "";
--- End diff --

You cannot assign a `String` to a `UTF8String`. Also make this a constant.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18019: [SPARK-20748][SQL] Add built-in SQL function CH[A...

2017-05-18 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/18019#discussion_r117376367
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -1268,6 +1268,59 @@ case class Ascii(child: Expression) extends 
UnaryExpression with ImplicitCastInp
 }
 
 /**
+ * Returns the ASCII character having the binary equivalent to n.
+ * If n is larger than 256 the result is equivalent to chr(n % 256)
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(expr) - Returns the ASCII character having the binary 
equivalent to `expr`. If n is larger than 256 the result is equivalent to chr(n 
% 256)",
+  extended = """
+Examples:
+  > SELECT _FUNC_(65);
+   A
+ """)
+// scalastyle:on line.size.limit
+case class Chr(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
+
+  override def dataType: DataType = StringType
+  override def inputTypes: Seq[DataType] = Seq(LongType)
+
+  protected override def nullSafeEval(value: Any): Any = {
+val longVal = value.asInstanceOf[Long]
+val shortVal = if (longVal > 255) (longVal % 256).toShort else 
longVal.toShort
+val stringVal = if (shortVal == 0) {
+  String.valueOf('\u0000')
+} else if (shortVal < 0) {
+  ""
+} else {
+  shortVal.toChar.toString
+}
+UTF8String.fromString(stringVal)
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+defineCodeGen(ctx, ev, lon => {
+  s"""
+long longVal = (long)${lon};
--- End diff --

Cast not needed :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18019: [SPARK-20748][SQL] Add built-in SQL function CH[A...

2017-05-18 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/18019#discussion_r117250008
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -1268,6 +1268,59 @@ case class Ascii(child: Expression) extends 
UnaryExpression with ImplicitCastInp
 }
 
 /**
+ * Returns the ASCII character having the binary equivalent to n.
+ * If n is larger than 256 the result is equivalent to chr(n % 256)
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(expr) - Returns the ASCII character having the binary 
equivalent to `expr`. If n is larger than 256 the result is equivalent to chr(n 
% 256)",
+  extended = """
+Examples:
+  > SELECT _FUNC_(65);
+   A
+ """)
+// scalastyle:on line.size.limit
+case class Chr(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
+
+  override def dataType: DataType = StringType
+  override def inputTypes: Seq[DataType] = Seq(LongType)
+
+  protected override def nullSafeEval(value: Any): Any = {
+val longVal = value.asInstanceOf[Long]
+val shortVal = if (longVal > 255) (longVal % 256).toShort else 
longVal.toShort
+val stringVal = if (shortVal == 0) {
+  String.valueOf('\u0000')
+} else if (shortVal < 0) {
+  ""
+} else {
+  shortVal.toChar.toString
+}
+UTF8String.fromString(stringVal)
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+defineCodeGen(ctx, ev, lon => {
+  s"""
+long longVal = (long)${lon};
+short shortVal;
+if (longVal > 255) {
--- End diff --

What about negative values?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18030: [SPARK-20798] GenerateUnsafeProjection should check if a...

2017-05-18 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/18030
  
ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18030: [SPARK-20798] GenerateUnsafeProjection should check if a...

2017-05-18 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/18030
  
LGTM - pending jenkins


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18012: [SPARK-20779][Examples]The ASF header placed in an incor...

2017-05-18 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/18012
  
ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18012: [SPARK-20779][Examples]The ASF header placed in an incor...

2017-05-18 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/18012
  
(let's see if jenkins picks this up)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18019: [SPARK-20748][SQL] Add built-in SQL function CH[A...

2017-05-17 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/18019#discussion_r117076510
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -1268,6 +1268,59 @@ case class Ascii(child: Expression) extends 
UnaryExpression with ImplicitCastInp
 }
 
 /**
+ * Returns the ASCII character having the binary equivalent to n.
+ * If n is larger than 256 the result is equivalent to chr(n % 256)
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(n) - Returns the ASCII character having the binary 
equivalent to `n`. If n is larger than 256 the result is equivalent to chr(n % 
256)",
+  extended = """
+Examples:
+  > SELECT _FUNC_(65);
+   A
+ """)
+// scalastyle:on line.size.limit
+case class Chr(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
+
+  override def dataType: DataType = StringType
+  override def inputTypes: Seq[DataType] = Seq(LongType)
+
+  protected override def nullSafeEval(value: Any): Any = {
+val longVal = value.asInstanceOf[Long]
+val shortVal = if (longVal > 255) (longVal % 256).toShort else 
longVal.toShort
+val stringVal = if (shortVal == 0) {
+  String.valueOf('\u0000')
+} else if (shortVal < 0) {
+  ""
+} else {
+  shortVal.toChar.toString
+}
+UTF8String.fromString(stringVal)
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+defineCodeGen(ctx, ev, str => {
+  s"""
+long longVal = ${java.lang.Long.parseLong(new 
String(str.getBytes))};
--- End diff --

Isn't the input supposed to be a long?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18005: [SPARK-20773][SQL] ParquetWriteSupport.writeFields is qu...

2017-05-17 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/18005
  
What do you mean by catalyst blew up?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18005: [SPARK-20773][SQL] ParquetWriteSupport.writeFields is qu...

2017-05-17 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/18005
  
LGTM pending jenkins


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18005: [SPARK-20773][SQL] ParquetWriteSupport.writeField...

2017-05-17 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/18005#discussion_r117029865
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetWriteSupport.scala
 ---
@@ -90,7 +90,7 @@ private[parquet] class ParquetWriteSupport extends 
WriteSupport[InternalRow] wit
 }
 
 
-this.rootFieldWriters = schema.map(_.dataType).map(makeWriter)
+this.rootFieldWriters = 
schema.map(_.dataType).map(makeWriter).toArray[ValueWriter]
--- End diff --

Either call toIndexedSeq or make the rootFieldWriters an Array. Both are 
fine.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18005: [SPARK-20773][SQL] ParquetWriteSupport.writeField...

2017-05-17 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/18005#discussion_r117029723
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetWriteSupport.scala
 ---
@@ -90,7 +90,7 @@ private[parquet] class ParquetWriteSupport extends 
WriteSupport[InternalRow] wit
 }
 
 
-this.rootFieldWriters = schema.map(_.dataType).map(makeWriter).toArray
+this.rootFieldWriters = 
schema.map(_.dataType).map(makeWriter).toArray[ValueWriter]
--- End diff --

Either call `toIndexedSeq` or make the `rootFieldWriters` an Array. Both 
are fine.
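
For illustration, the Array option is just a change to the field declaration, so
the per-field loop indexes a plain array (`ValueWriter` is the alias already
defined in this file):

```scala
// Declared as an Array so writeFields does O(1) array reads instead of Seq.apply calls.
private var rootFieldWriters: Array[ValueWriter] = _
```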


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil handle the value which is...

2017-05-17 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/18016
  
+1 on Sean's comment. This is **not** a parser issue. Can you just fix this 
by adding `LongType` to `Ceil.inputTypes`?
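
A sketch of that fix, with the accepted types grouped in a `TypeCollection` (the
exact set of types, and whether `dataType` needs adjusting as well, is left to
the PR):

```scala
// Let ceil accept LONG inputs directly instead of failing to resolve integral arguments.
override def inputTypes: Seq[AbstractDataType] =
  Seq(TypeCollection(LongType, DoubleType, DecimalType))
```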


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18005: [SPARK-20773][SQL] ParquetWriteSupport.writeFields is qu...

2017-05-16 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/18005
  
Can you also make sure that we do not use a `Seq` for struct writing?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18005: [SPARK-20773][SQL] ParquetWriteSupport.writeFields is qu...

2017-05-16 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/18005
  
ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17935: [SPARK-20690][SQL] Subqueries in FROM should have...

2017-05-16 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/17935#discussion_r116851595
  
--- Diff: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -473,7 +473,7 @@ identifierComment
 
 relationPrimary
 : tableIdentifier sample? (AS? strictIdentifier)?  #tableName
-| '(' queryNoWith ')' sample? (AS? strictIdentifier)?  #aliasedQuery
+| '(' queryNoWith ')' sample? (AS? strictIdentifier)   #aliasedQuery
--- End diff --

Yay, unclear semantics. Ok, that is fine for now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18006: [SQL][TRIVIAL] Lower parser log level to debug

2017-05-16 Thread hvanhovell
GitHub user hvanhovell opened a pull request:

https://github.com/apache/spark/pull/18006

[SQL][TRIVIAL] Lower parser log level to debug

## What changes were proposed in this pull request?
Currently the parser logs the query it is parsing at `info` level. This is 
too high, this PR lowers the log level to `debug`.
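
The change itself is roughly a one-liner in the parser entry point (the exact
method and message are assumed from the description):

```scala
// was: logInfo(s"Parsing command: $command")
logDebug(s"Parsing command: $command")
```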

## How was this patch tested?
Existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hvanhovell/spark lower_parser_log_level

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18006.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18006


commit 58ef75503f66aed7fb347a78277d6ca6b45942e8
Author: Herman van Hovell <hvanhov...@databricks.com>
Date:   2017-05-16T20:10:00Z

Lower parser log level to debug




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

2017-05-16 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/17993#discussion_r116794140
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -54,6 +54,59 @@ object ConstantFolding extends Rule[LogicalPlan] {
   }
 }
 
+/**
+ * Substitutes [[Attribute Attributes]] which can be statically evaluated 
with their corresponding
+ * value in conjunctive [[Expression Expressions]]
+ * eg.
+ * {{{
+ *   SELECT * FROM table WHERE i = 5 AND j = i + 3
+ *   ==>  SELECT * FROM table WHERE i = 5 AND j = 8
+ * }}}
+ */
+object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
+
+  def containsNonConjunctionPredicates(expression: Expression): Boolean = 
expression match {
+case Not(_) => true
+case Or(_, _) => true
+case _ =>
+  var result = false
+  expression.children.foreach {
+case Not(_) => result = true
+case Or(_, _) => result = true
+case other => result = result || 
containsNonConjunctionPredicates(other)
+  }
+  result
+  }
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+case q: LogicalPlan => q transformExpressionsUp {
+  case and @ (left And right)
+if !containsNonConjunctionPredicates(left) && 
!containsNonConjunctionPredicates(right) =>
+
+val leftEntries = left.collect {
+  case e @ EqualTo(left: AttributeReference, right: Literal) => 
((left, right), e)
+  case e @ EqualTo(left: Literal, right: AttributeReference) => 
((right, left), e)
+}
+val rightEntries = right.collect {
+  case e @ EqualTo(left: AttributeReference, right: Literal) => 
((left, right), e)
+  case e @ EqualTo(left: Literal, right: AttributeReference) => 
((right, left), e)
+}
+val constantsMap = AttributeMap(leftEntries.map(_._1) ++ 
rightEntries.map(_._1))
+val predicates = (leftEntries.map(_._2) ++ 
rightEntries.map(_._2)).toSet
+
+def replaceConstants(expression: Expression) = expression 
transform {
+  case a: AttributeReference if constantsMap.contains(a) =>
+constantsMap.get(a).getOrElse(a)
+}
+
+and transform {
+  case e @ EqualTo(_, _) if !predicates.contains(e) &&
+e.references.exists(ref => constantsMap.contains(ref)) =>
--- End diff --

Building the `references` map is more expensive, shall we just skip this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

2017-05-16 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/17993#discussion_r116793889
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -54,6 +54,59 @@ object ConstantFolding extends Rule[LogicalPlan] {
   }
 }
 
+/**
+ * Substitutes [[Attribute Attributes]] which can be statically evaluated 
with their corresponding
+ * value in conjunctive [[Expression Expressions]]
+ * eg.
+ * {{{
+ *   SELECT * FROM table WHERE i = 5 AND j = i + 3
+ *   ==>  SELECT * FROM table WHERE i = 5 AND j = 8
+ * }}}
+ */
+object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
+
+  def containsNonConjunctionPredicates(expression: Expression): Boolean = 
expression match {
+case Not(_) => true
+case Or(_, _) => true
+case _ =>
+  var result = false
+  expression.children.foreach {
+case Not(_) => result = true
+case Or(_, _) => result = true
+case other => result = result || 
containsNonConjunctionPredicates(other)
+  }
+  result
+  }
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+case q: LogicalPlan => q transformExpressionsUp {
+  case and @ (left And right)
+if !containsNonConjunctionPredicates(left) && 
!containsNonConjunctionPredicates(right) =>
+
+val leftEntries = left.collect {
+  case e @ EqualTo(left: AttributeReference, right: Literal) => 
((left, right), e)
+  case e @ EqualTo(left: Literal, right: AttributeReference) => 
((right, left), e)
+}
+val rightEntries = right.collect {
+  case e @ EqualTo(left: AttributeReference, right: Literal) => 
((left, right), e)
+  case e @ EqualTo(left: Literal, right: AttributeReference) => 
((right, left), e)
+}
+val constantsMap = AttributeMap(leftEntries.map(_._1) ++ 
rightEntries.map(_._1))
+val predicates = (leftEntries.map(_._2) ++ 
rightEntries.map(_._2)).toSet
+
+def replaceConstants(expression: Expression) = expression 
transform {
+  case a: AttributeReference if constantsMap.contains(a) =>
--- End diff --

What happens if I do something stupid like `i = 1 and ((j = 1) = (j = i))`? 
I think `j = 1` might be replaced by `1 = 1`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

2017-05-16 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/17993#discussion_r116792863
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -54,6 +54,59 @@ object ConstantFolding extends Rule[LogicalPlan] {
   }
 }
 
+/**
+ * Substitutes [[Attribute Attributes]] which can be statically evaluated 
with their corresponding
+ * value in conjunctive [[Expression Expressions]]
+ * eg.
+ * {{{
+ *   SELECT * FROM table WHERE i = 5 AND j = i + 3
+ *   ==>  SELECT * FROM table WHERE i = 5 AND j = 8
+ * }}}
+ */
+object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
+
+  def containsNonConjunctionPredicates(expression: Expression): Boolean = 
expression match {
+case Not(_) => true
+case Or(_, _) => true
+case _ =>
+  var result = false
+  expression.children.foreach {
+case Not(_) => result = true
+case Or(_, _) => result = true
+case other => result = result || 
containsNonConjunctionPredicates(other)
+  }
+  result
+  }
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+case q: LogicalPlan => q transformExpressionsUp {
+  case and @ (left And right)
+if !containsNonConjunctionPredicates(left) && 
!containsNonConjunctionPredicates(right) =>
+
+val leftEntries = left.collect {
+  case e @ EqualTo(left: AttributeReference, right: Literal) => 
((left, right), e)
+  case e @ EqualTo(left: Literal, right: AttributeReference) => 
((right, left), e)
+}
+val rightEntries = right.collect {
+  case e @ EqualTo(left: AttributeReference, right: Literal) => 
((left, right), e)
+  case e @ EqualTo(left: Literal, right: AttributeReference) => 
((right, left), e)
+}
+val constantsMap = AttributeMap(leftEntries.map(_._1) ++ 
rightEntries.map(_._1))
+val predicates = (leftEntries.map(_._2) ++ 
rightEntries.map(_._2)).toSet
+
+def replaceConstants(expression: Expression) = expression 
transform {
+  case a: AttributeReference if constantsMap.contains(a) =>
--- End diff --

I don't think the double lookup is necessary. 
`constantsMap.get(a).getOrElse(a)` should cover this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

2017-05-16 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/17993#discussion_r116792029
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -54,6 +54,59 @@ object ConstantFolding extends Rule[LogicalPlan] {
   }
 }
 
+/**
+ * Substitutes [[Attribute Attributes]] which can be statically evaluated 
with their corresponding
+ * value in conjunctive [[Expression Expressions]]
+ * eg.
+ * {{{
+ *   SELECT * FROM table WHERE i = 5 AND j = i + 3
+ *   ==>  SELECT * FROM table WHERE i = 5 AND j = 8
+ * }}}
+ */
+object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
+
+  def containsNonConjunctionPredicates(expression: Expression): Boolean = 
expression match {
+case Not(_) => true
+case Or(_, _) => true
+case _ =>
+  var result = false
+  expression.children.foreach {
+case Not(_) => result = true
+case Or(_, _) => result = true
+case other => result = result || 
containsNonConjunctionPredicates(other)
+  }
+  result
+  }
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+case q: LogicalPlan => q transformExpressionsUp {
+  case and @ (left And right)
+if !containsNonConjunctionPredicates(left) && 
!containsNonConjunctionPredicates(right) =>
+
+val leftEntries = left.collect {
--- End diff --

Let's put the collect in a function, so we can avoid the repetition.
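
An illustrative helper (it would sit inside `ConstantPropagation`; the name is
made up):

```scala
// Collects (attribute, literal) bindings plus the originating EqualTo predicates,
// so both sides of the And can reuse the same extraction.
def collectConstantEqualities(
    e: Expression): Seq[((AttributeReference, Literal), Expression)] = {
  e.collect {
    case eq @ EqualTo(a: AttributeReference, l: Literal) => ((a, l), eq)
    case eq @ EqualTo(l: Literal, a: AttributeReference) => ((a, l), eq)
  }
}

// val leftEntries  = collectConstantEqualities(left)
// val rightEntries = collectConstantEqualities(right)
```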


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

2017-05-16 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/17993#discussion_r116790844
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -54,6 +54,59 @@ object ConstantFolding extends Rule[LogicalPlan] {
   }
 }
 
+/**
+ * Substitutes [[Attribute Attributes]] which can be statically evaluated 
with their corresponding
+ * value in conjunctive [[Expression Expressions]]
+ * eg.
+ * {{{
+ *   SELECT * FROM table WHERE i = 5 AND j = i + 3
+ *   ==>  SELECT * FROM table WHERE i = 5 AND j = 8
+ * }}}
+ */
+object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
+
+  def containsNonConjunctionPredicates(expression: Expression): Boolean = 
expression match {
+case Not(_) => true
+case Or(_, _) => true
+case _ =>
+  var result = false
+  expression.children.foreach {
+case Not(_) => result = true
+case Or(_, _) => result = true
+case other => result = result || 
containsNonConjunctionPredicates(other)
+  }
+  result
+  }
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+case q: LogicalPlan => q transformExpressionsUp {
+  case and @ (left And right)
--- End diff --

`case and: And if containsNonConjunctionPredicates(and)`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17993: [SPARK-20758][SQL] Add Constant propagation optim...

2017-05-16 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/17993#discussion_r116790653
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -54,6 +54,59 @@ object ConstantFolding extends Rule[LogicalPlan] {
   }
 }
 
+/**
+ * Substitutes [[Attribute Attributes]] which can be statically evaluated 
with their corresponding
+ * value in conjunctive [[Expression Expressions]]
+ * eg.
+ * {{{
+ *   SELECT * FROM table WHERE i = 5 AND j = i + 3
+ *   ==>  SELECT * FROM table WHERE i = 5 AND j = 8
+ * }}}
+ */
+object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper {
+
+  def containsNonConjunctionPredicates(expression: Expression): Boolean = 
expression match {
--- End diff --

This might be more straightforward:
```scala
expression.find {
  case _: Not | _: Or => true
}.isDefined
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17998: [SPARK-20703][SQL][WIP] Add an operator for writing data...

2017-05-16 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/17998
  
@shaneknapp is amplab jenkins down?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17935: [SPARK-20690][SQL] Subqueries in FROM should have...

2017-05-16 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/17935#discussion_r116756100
  
--- Diff: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -473,7 +473,7 @@ identifierComment
 
 relationPrimary
 : tableIdentifier sample? (AS? strictIdentifier)?  #tableName
-| '(' queryNoWith ')' sample? (AS? strictIdentifier)?  #aliasedQuery
+| '(' queryNoWith ')' sample? (AS? strictIdentifier)   #aliasedQuery
--- End diff --

Should we also force an alias for the next line? `'(' relation ')' sample? (AS? strictIdentifier)?`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17899: [SPARK-20636] Add new optimization rule to flip a...

2017-05-16 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/17899#discussion_r116719332
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/TransposeWindowSuite.scala
 ---
@@ -0,0 +1,101 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions.{RowFrame, 
SpecifiedWindowFrame, UnboundedFollowing, UnboundedPreceding, UnspecifiedFrame}
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, 
LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+
+
+class TransposeWindowSuite extends PlanTest {
+  object Optimize extends RuleExecutor[LogicalPlan] {
+val batches =
+  Batch("CollapseProject", FixedPoint(100), CollapseProject, 
RemoveRedundantProject) ::
+  Batch("FlipWindow", Once, CollapseWindow, TransposeWindow) :: Nil
+  }
+
+  val testRelation = LocalRelation('a.string, 'b.string, 'c.int, 'd.string)
+
+  val a = testRelation.output(0)
+  val b = testRelation.output(1)
+  val c = testRelation.output(2)
+  val d = testRelation.output(3)
+
+  val partitionSpec1 = Seq(a)
+  val partitionSpec2 = Seq(a, b)
+  val partitionSpec3 = Seq(d)
+
+  val orderSpec1 = Seq(d.asc)
+  val orderSpec2 = Seq(d.desc)
+
+  test("flip two adjacent windows with compatible partitions in multiple 
selects") {
--- End diff --

I am not entirely sure if we need this test.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17899: [SPARK-20636] Add new optimization rule to flip a...

2017-05-16 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/17899#discussion_r116719142
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/TransposeWindowSuite.scala
 ---
@@ -0,0 +1,101 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.expressions.{RowFrame, 
SpecifiedWindowFrame, UnboundedFollowing, UnboundedPreceding, UnspecifiedFrame}
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, 
LogicalPlan}
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+
+
+class TransposeWindowSuite extends PlanTest {
+  object Optimize extends RuleExecutor[LogicalPlan] {
+val batches =
+  Batch("CollapseProject", FixedPoint(100), CollapseProject, 
RemoveRedundantProject) ::
+  Batch("FlipWindow", Once, CollapseWindow, TransposeWindow) :: Nil
+  }
+
+  val testRelation = LocalRelation('a.string, 'b.string, 'c.int, 'd.string)
+
+  val a = testRelation.output(0)
+  val b = testRelation.output(1)
+  val c = testRelation.output(2)
+  val d = testRelation.output(3)
+
+  val partitionSpec1 = Seq(a)
+  val partitionSpec2 = Seq(a, b)
+  val partitionSpec3 = Seq(d)
+
+  val orderSpec1 = Seq(d.asc)
+  val orderSpec2 = Seq(d.desc)
+
+  test("flip two adjacent windows with compatible partitions in multiple 
selects") {
+val wexpr1 = windowExpr(sum('c), windowSpec(partitionSpec2, Seq.empty, 
UnspecifiedFrame))
+val wexpr2 = windowExpr(sum('c), windowSpec(partitionSpec1, Seq.empty, 
UnspecifiedFrame))
+
+val query = testRelation
+  .select('a, 'b, 'c, wexpr1.as('sum_a_2))
+  .select('a, 'b, 'c, 'sum_a_2, wexpr2.as('sum_a_1))
+
+val optimized = Optimize.execute(query.analyze)
+
+val query2 = testRelation
+  .select('a, 'b, 'c)
+  .select('a, 'b, 'c, wexpr2.as('sum_a_1))
+  .select('a, 'b, 'c, wexpr1.as('sum_a_2), 'sum_a_1)
+  .select('a, 'b, 'c, 'sum_a_2, 'sum_a_1)
+
+val correctAnswer = Optimize.execute(query2.analyze)
+
+comparePlans(optimized, correctAnswer)
+  }
+
+  test("flip two adjacent windows with compatible partitions") {
+val query = testRelation
+  .window(Seq(sum(c).as('sum_a_2)), partitionSpec2, orderSpec2)
+  .window(Seq(sum(c).as('sum_a_1)), partitionSpec1, orderSpec1)
+
+val analyzed = query.analyze
+val optimized = Optimize.execute(analyzed)
+
+val correctAnswer = testRelation
+  .window(Seq(sum(c).as('sum_a_1)), partitionSpec1, orderSpec1)
+  .window(Seq(sum(c).as('sum_a_2)), partitionSpec2, orderSpec2)
+  .select('a, 'b, 'c, 'd, 'sum_a_2, 'sum_a_1)
+
+comparePlans(optimized, correctAnswer.analyze)
+  }
+
+  test("don't flip two adjacent windows with incompatible partitions") {
+val query = testRelation
+  .window(Seq(sum(c).as('sum_a_2)), partitionSpec3, Seq.empty)
+  .window(Seq(sum(c).as('sum_a_1)), partitionSpec1, Seq.empty)
+
+val analyzed = query.analyze
+val optimized = Optimize.execute(analyzed)
+
+val correctAnswer = testRelation
+  .window(Seq(sum(c).as('sum_a_2)), partitionSpec3, Seq.empty)
+  .window(Seq(sum(c).as('sum_a_1)), partitionSpec1, Seq.empty)
+
+comparePlans(optimized, correctAnswer.analyze)
--- End diff --

`comparePlans(optimized, analyzed)`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


[GitHub] spark pull request #17899: [SPARK-20636] Add new optimization rule to flip a...

2017-05-16 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/17899#discussion_r116718934
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala
 ---
@@ -423,4 +423,25 @@ class DataFrameWindowFunctionsSuite extends QueryTest 
with SharedSQLContext {
   df.select(selectList: _*).where($"value" < 2),
   Seq(Row(3, "1", null, 3.0, 4.0, 3.0), Row(5, "1", false, 4.0, 5.0, 
5.0)))
   }
+
+  test("window functions in multiple selects") {
--- End diff --

Why this test? It does not really add anything new.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17899: [SPARK-20636] Add new optimization rule to flip a...

2017-05-16 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/17899#discussion_r116718847
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -609,6 +610,19 @@ object CollapseWindow extends Rule[LogicalPlan] {
 }
 
 /**
+ * Transpose Adjacent Window Expressions.
+ * - If the partition spec of the parent Window expression is a subset of 
the partition spec
+ *   of the child window expression, transpose them.
+ */
+object TransposeWindow extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+case w1 @ Window(we1, ps1, os1, w2 @ Window(we2, ps2, os2, grandChild))
+if ps1.length < ps2.length && ps2.containsSlice(ps1) =>
--- End diff --

We might be able to get a little more mileage out of the rule by using 
`semanticEquals` for comparing the partition expressions, e.g.:
```scala
def sliceSemanticEquals(ps1: Seq[Expression], ps2: Seq[Expression]): Boolean =
  ps1.zip(ps2).forall {
    case (l, r) => l.semanticEquals(r)
  }
...
sliceSemanticEquals(ps1, ps2)
```

You could even get more leverage if you do not consider the order of the 
partition spec.
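
For illustration, a hedged sketch of what an order-insensitive, semantic comparison of the partition specs could look like (the object and method names are made up):
```scala
import org.apache.spark.sql.catalyst.expressions.Expression

object PartitionSpecCompat {
  // True if every expression in ps1 has a semantically equal counterpart in
  // ps2, regardless of the order in which the partition expressions appear.
  def isSemanticSubset(ps1: Seq[Expression], ps2: Seq[Expression]): Boolean =
    ps1.forall(p1 => ps2.exists(p2 => p1.semanticEquals(p2)))
}
```
Compared to `containsSlice`, this would also match a parent spec like `a, b` against a child spec ordered `b, a, c`.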


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17899: [SPARK-20636] Add new optimization rule to flip a...

2017-05-16 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/17899#discussion_r116696350
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -609,6 +610,19 @@ object CollapseWindow extends Rule[LogicalPlan] {
 }
 
 /**
+ * Transpose Adjacent Window Expressions.
+ * - If the partition spec of the parent Window expression is a subset of 
the partition spec
+ *   of the child window expression, transpose them.
+ */
+object TransposeWindow extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+case w1 @ Window(we1, ps1, os1, w2 @ Window(we2, ps2, os2, grandChild))
+if ps1.length < ps2.length && ps2.containsSlice(ps1) =>
--- End diff --

You need to check that the windows are independent, e.g.: 
`w1.references.intersect(w2.windowOutputSet).isEmpty`
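
A rough sketch of how that independence check could be folded into the rule's guard (the structure follows the quoted diff; `compatiblePartitions` is a stand-in for whatever partition-spec comparison the rule settles on):
```scala
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project, Window}
import org.apache.spark.sql.catalyst.rules.Rule

object TransposeWindowSketch extends Rule[LogicalPlan] {
  // Placeholder for the partition-spec comparison (here: the condition from the diff).
  private def compatiblePartitions(ps1: Seq[Expression], ps2: Seq[Expression]): Boolean =
    ps1.length < ps2.length && ps2.containsSlice(ps1)

  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    case w1 @ Window(we1, ps1, os1, w2 @ Window(we2, ps2, os2, grandChild))
        if compatiblePartitions(ps1, ps2) &&
          // the parent window must not consume anything produced by the child window
          w1.references.intersect(w2.windowOutputSet).isEmpty =>
      Project(w1.output, Window(we2, ps2, os2, Window(we1, ps1, os1, grandChild)))
  }
}
```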


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17899: [SPARK-20636] Add new optimization rule to flip a...

2017-05-16 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/17899#discussion_r116693306
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -609,6 +610,19 @@ object CollapseWindow extends Rule[LogicalPlan] {
 }
 
 /**
+ * Transpose Adjacent Window Expressions.
+ * - If the partition spec of the parent Window expression is a subset of 
the partition spec
+ *   of the child window expression, transpose them.
+ */
+object TransposeWindow extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+case w1 @ Window(we1, ps1, os1, w2 @ Window(we2, ps2, os2, grandChild))
+if ps1.length < ps2.length && ps2.containsSlice(ps1) =>
+  Project(w1.output, Window(we2, ps2, os2, Window(we1, ps1, os1, 
grandChild)))
--- End diff --

This probably warrants a follow-up that tries to move projections that are 
wedged in between two window clauses. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17899: [SPARK-20636] Add new optimization rule to flip adjacent...

2017-05-16 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/17899
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

2017-05-16 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/17993
  
it is weird that jenkins is not kicking off


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17993: [SPARK-20758][SQL] Add Constant propagation optimization

2017-05-16 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/17993
  
ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17935: [SPARK-20690][SQL] Analyzer shouldn't add missing...

2017-05-15 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/17935#discussion_r116561216
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala 
---
@@ -868,6 +868,29 @@ class SubquerySuite extends QueryTest with 
SharedSQLContext {
   Row(3, 3.0, 2, 3.0) :: Row(3, 3.0, 2, 3.0) :: Nil)
   }
 
+  test("SPARK-20690: Do not add missing attributes through subqueries") {
+withTempView("onerow") {
+  Seq(1).toDF("c1").createOrReplaceTempView("onerow")
+
+  val e = intercept[AnalysisException] {
+sql(
+  """
+| select 1
+| from   (select 1 from onerow t1 LIMIT 1)
--- End diff --

Yeah, this seems confusing. Subqueries should have an alias. Let's try 
to add that.
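
For illustration, this is what the query from the quoted test would look like once the grammar forces an alias (the alias name `t` is arbitrary; assuming a SparkSession named `spark` and the `onerow` view created by the test):
```scala
// Same query as in the diff, but the parenthesized subquery now carries an alias.
spark.sql(
  """
    | select 1
    | from   (select 1 from onerow t1 LIMIT 1) t
  """.stripMargin)
```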


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17964: [SPARK-20725][SQL] partial aggregate should behave corre...

2017-05-13 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/17964
  
@cloud-fan can you backport this to 2.1?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17964: [SPARK-20725][SQL] partial aggregate should behave corre...

2017-05-13 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/17964
  
LGTM - merging to master/2.2/2.1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17964: [SPARK-20725][SQL] partial aggregate should behave corre...

2017-05-13 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/17964
  
LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17960: [SPARK-20719] [SQL] Support LIMIT ALL

2017-05-12 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/17960
  
LGTM - merging to master. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17964: [SPARK-20725][SQL] partial aggregate should behav...

2017-05-12 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/17964#discussion_r116306324
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala 
---
@@ -429,17 +429,13 @@ object QueryPlan {
* with its referenced ordinal from input attributes. It's similar to 
`BindReferences` but we
* do not use `BindReferences` here as the plan may take the expression 
as a parameter with type
* `Attribute`, and replace it with `BoundReference` will cause error.
+   * Note that, we may have missing attributes, e.g. in the final 
aggregate of 2-phase aggregation,
+   * we should normalize missing attributes too, with expr id -1.
*/
   def normalizeExprId[T <: Expression](e: T, input: AttributeSeq): T = {
 e.transformUp {
   case s: SubqueryExpression => s.canonicalize(input)
-  case ar: AttributeReference =>
-val ordinal = input.indexOf(ar.exprId)
-if (ordinal == -1) {
-  ar
-} else {
-  ar.withExprId(ExprId(ordinal))
-}
+  case ar: AttributeReference => 
ar.withExprId(ExprId(input.indexOf(ar.exprId)))
--- End diff --

Ok, so this works because of the way we plan aggregates and I am totally 
fine with this.

I am slightly worried about non-complete aggregate expressions that cannot 
be resolved and wreak havoc further down the line because `sameResult` falsely 
evaluates to `true`. Can we special-case non-complete aggregate expressions?

From an architectural point of view it might be better to add this as a 
normalize function to Expression.
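
Purely to illustrate that last architectural point, a hedged sketch of what a normalize helper hung off the expression side could look like (the object and method names are made up, and the subquery case is omitted):
```scala
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, AttributeSeq, ExprId, Expression}

object NormalizeSketch {
  // Rewrite attribute exprIds to their ordinal in `input`; attributes that are
  // not part of the input (the "missing" ones discussed above) end up with
  // ExprId(-1), mirroring the quoted diff.
  def normalizeAgainst(e: Expression, input: AttributeSeq): Expression =
    e.transformUp {
      case ar: AttributeReference => ar.withExprId(ExprId(input.indexOf(ar.exprId)))
    }
}
```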


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17960: [SPARK-20719] [SQL] Support LIMIT ALL

2017-05-12 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/17960#discussion_r116300475
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ---
@@ -279,7 +279,12 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] 
with Logging {
 
 // LIMIT
 withWindow.optional(limit) {
-  Limit(typedVisit(limit), withWindow)
+  if (ALL != null) {
+// LIMIT ALL is the same as omitting the LIMIT clause
--- End diff --

Can this be true? I don't think you need to check `ALL` if `limit != null`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17953: [SPARK-20680][SQL] Spark-sql do not support for v...

2017-05-11 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/17953#discussion_r116138305
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ---
@@ -1504,6 +1504,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] 
with Logging {
   case ("decimal", precision :: Nil) => 
DecimalType(precision.getText.toInt, 0)
   case ("decimal", precision :: scale :: Nil) =>
 DecimalType(precision.getText.toInt, scale.getText.toInt)
+  case ("void", Nil) => NullType
--- End diff --

Apparently Hive can have null-typed columns. So this should be the location 
where you'd want to change this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17953: [SPARK-20680][SQL] Spark-sql do not support for void col...

2017-05-11 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/17953
  
@LantaoJin Can you add a description and a test case for this? You can take 
a look at the OrcSourceSuite to get an idea of how to work with Hive.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17953: [SPARK-20680][SQL] Spark-sql do not support for void col...

2017-05-11 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/17953
  
ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17939: [SPARK-19447] Remove remaining references to generated r...

2017-05-10 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/17939
  
LGTM - merging to master/2.2. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17920: [SPARK-17685][SQL] Make SortMergeJoinExec's currentVars ...

2017-05-09 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/17920
  
LGTM - merging to master/2.2. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17921: [SPARK-19876][BUILD] Move Trigger.java to java source hi...

2017-05-09 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/17921
  
LGTM - merging to master/2.2


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17899: [SPARK-20636] Add new optimization rule to flip a...

2017-05-08 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/17899#discussion_r115370303
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -609,6 +610,19 @@ object CollapseWindow extends Rule[LogicalPlan] {
 }
 
 /**
+ * Flip Adjacent Window Expressions.
+ * - If the partition spec of the parent Window expression is a subset of 
the partition spec
+ *   of the child window expression, flip them.
+ */
+object FlipWindow extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+case w1 @ Window(we1, ps1, os1, w2 @ Window(we2, ps2, os2, grandChild))
+if ps2.containsSlice(ps1) =>
--- End diff --

You are also changing the order of the columns. You will need to add a 
projection on top to be sure.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17666: [SPARK-20311][SQL] Support aliases for table valu...

2017-05-08 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/17666#discussion_r115361494
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala
 ---
@@ -57,19 +57,21 @@ object ResolveTableValuedFunctions extends 
Rule[LogicalPlan] {
* A TVF maps argument lists to resolver functions that accept those 
arguments. Using a map
* here allows for function overloading.
*/
-  private type TVF = Map[ArgumentList, Seq[Any] => LogicalPlan]
+  private type TVF = Map[ArgumentList, (UnresolvedTableValuedFunction, 
Seq[Any]) => LogicalPlan]
 
   /**
* TVF builder.
*/
-  private def tvf(args: (String, DataType)*)(pf: PartialFunction[Seq[Any], 
LogicalPlan])
-  : (ArgumentList, Seq[Any] => LogicalPlan) = {
+  private def tvf(args: (String, DataType)*)(
+  pf: PartialFunction[(UnresolvedTableValuedFunction, Seq[Any]), 
LogicalPlan])
--- End diff --

I also think that we should separate the aliasing from constructing the 
table-valued function.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17666: [SPARK-20311][SQL] Support aliases for table valu...

2017-05-08 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/17666#discussion_r115361167
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala
 ---
@@ -57,19 +57,21 @@ object ResolveTableValuedFunctions extends 
Rule[LogicalPlan] {
* A TVF maps argument lists to resolver functions that accept those 
arguments. Using a map
* here allows for function overloading.
*/
-  private type TVF = Map[ArgumentList, Seq[Any] => LogicalPlan]
+  private type TVF = Map[ArgumentList, (UnresolvedTableValuedFunction, 
Seq[Any]) => LogicalPlan]
 
   /**
* TVF builder.
*/
-  private def tvf(args: (String, DataType)*)(pf: PartialFunction[Seq[Any], 
LogicalPlan])
-  : (ArgumentList, Seq[Any] => LogicalPlan) = {
+  private def tvf(args: (String, DataType)*)(
+  pf: PartialFunction[(UnresolvedTableValuedFunction, Seq[Any]), 
LogicalPlan])
--- End diff --

Hmmm... this signature is kind of complex. Can we try to use some kind of 
class/case class that encapsulates this?
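
One hedged reading of that suggestion, as a shape only (all names here are made up; this is not what the PR does):
```scala
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

object TvfBuilderSketch {
  // Bundle what a resolver needs into one named value, so the builder's
  // partial function no longer takes a tuple of (function, arguments).
  case class TvfInvocation(outputNames: Seq[String], arguments: Seq[Any])

  type TvfResolver = PartialFunction[TvfInvocation, LogicalPlan]
}
```
Keeping the output names on the invocation rather than on the resolver would also line up with the comment above about separating the aliasing from constructing the table-valued function.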


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17666: [SPARK-20311][SQL] Support aliases for table valu...

2017-05-08 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/17666#discussion_r115358211
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala
 ---
@@ -441,4 +440,15 @@ class AnalysisSuite extends AnalysisTest with 
ShouldMatchers {
 
 checkAnalysis(SubqueryAlias("tbl", testRelation).as("tbl2"), 
testRelation)
   }
+
+  test("SPARK-20311 range(N) as alias") {
+def rangeWithAliases(outputNames: Seq[String]): LogicalPlan = {
+  SubqueryAlias("t", UnresolvedTableValuedFunction("range", Literal(7) 
:: Nil, outputNames))
--- End diff --

Should we also test different cases?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17666: [SPARK-20311][SQL] Support aliases for table valu...

2017-05-08 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/17666#discussion_r115357687
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
 ---
@@ -498,12 +498,16 @@ case class Sort(
 
 /** Factory for constructing new `Range` nodes. */
 object Range {
-  def apply(start: Long, end: Long, step: Long, numSlices: Option[Int]): 
Range = {
-val output = StructType(StructField("id", LongType, nullable = false) 
:: Nil).toAttributes
+
+  def apply(start: Long, end: Long, step: Long, numSlices: Option[Int], 
outputName: Option[String])
--- End diff --

Why make this an option? It is not optional, right? The next function should 
either call `Range(start, end, step, Some(numSlices), "id")` or this function 
should have a default parameter.
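
As a signature-only sketch of the default-parameter alternative (the object name is made up and the body is deliberately left unimplemented):
```scala
import org.apache.spark.sql.catalyst.plans.logical.Range

object RangeFactorySketch {
  // Callers that don't care about the column name just get the default "id";
  // nobody has to pass Some("id") or None explicitly.
  def apply(
      start: Long,
      end: Long,
      step: Long,
      numSlices: Option[Int],
      outputName: String = "id"): Range = ???
}
```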


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...

2017-05-04 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/17736
  
For some background: in 1.6 we used the Catalyst SqlParser to parse the 
expression in `Dataframe.filter()`, and we used the Hive (ANTLR based) parser 
for parsing SQL commands. In Spark 2.0 we moved all of this to a single 
parser. When porting the parser, I followed the rules in the Hive parser (incl. 
the unescaping logic), and this fell through the cracks.

Java/Scala normal strings make things mind-meltingly confusing. I think it 
is fair that we provide an option to disable the parser's unescaping as a way 
to get out of this. This might not be the best solution if you use regexes in 
both pure SQL and in Scala at the same time, but it at least is an improvement.
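
To make the difference concrete, a small hedged example (assuming a `SparkSession` named `spark` with `spark.implicits._` in scope, and the parser's unescaping enabled as described above):
```scala
import spark.implicits._

val df = spark.range(10).selectExpr("cast(id as string) as s")
df.createOrReplaceTempView("t")

// DataFrame API: the Scala literal "\\d+" is the regex \d+ and is used as-is.
df.filter($"s".rlike("\\d+")).show()

// SQL text: the parser unescapes the literal first, so the backslash has to be
// doubled once more to end up with the same regex \d+.
spark.sql("SELECT * FROM t WHERE s RLIKE '\\\\d+'").show()
```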



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

2017-05-03 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/17770
  
I am not a giant fan of the `resolveOperators*` approach: it is yet another 
code path that does something similar to the `transform*` code path, it 
introduces some mutable state, and I have been bitten by it before. I 
personally think that a barrier node (that is a no-op for any rule) is a far 
more elegant solution.
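
For what it's worth, a rough sketch of the barrier idea (hedged, and not tied to any concrete implementation): a node that holds an already-analyzed subtree without exposing it as a child, so rules never descend into it.
```scala
import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan}

// Because this is a leaf node, tree transformations (and therefore analyzer
// rules) never recurse into the wrapped plan; the barrier is stripped again
// once analysis is done.
case class AnalysisBarrierSketch(analyzed: LogicalPlan) extends LeafNode {
  override def output: Seq[Attribute] = analyzed.output
}
```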


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17836: [SPARK-20566][SQL] ColumnVector should support `appendFl...

2017-05-02 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/17836
  
cc @michal-databricks 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17838: [SPARK-20567] Lazily bind in GenerateExec

2017-05-02 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/17838
  
LGTM - merging to master/2.2


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17823: [SPARK-20548] Disable ReplSuite.newProductSeqEncoder wit...

2017-05-01 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/17823
  
LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17784: [SPARK-20492][SQL] Do not print empty parentheses for in...

2017-04-30 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/17784
  
Yes, it can. Merging to master/2.2. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


