[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14116#discussion_r70209333

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/systemcatalog/InformationSchema.scala ---
    @@ -0,0 +1,337 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.systemcatalog
    +
    +import java.sql.{Date, Timestamp}
    +
    +import scala.collection.mutable.ArrayBuffer
    +
    +import org.apache.commons.lang3.StringUtils
    +
    +import org.apache.spark.rdd.RDD
    +import org.apache.spark.sql._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions.Alias
    +import org.apache.spark.sql.catalyst.plans.logical.Project
    +import org.apache.spark.sql.execution.datasources._
    +import org.apache.spark.sql.sources._
    +import org.apache.spark.sql.types._
    +
    +/**
    + * INFORMATION_SCHEMA is a database consisting of views which provide information about all of
    + * the tables, views, and columns in a database.
    + */
    +object InformationSchema {
    +  var INFORMATION_SCHEMA = "information_schema"

    --- End diff --

    Oh, thank you for the review! Right.
---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14116#discussion_r70209363

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -401,7 +401,9 @@ class SessionCatalog(
         val db = formatDatabaseName(name.database.getOrElse(currentDb))
         val table = formatTableName(name.table)
         val relation =
    -      if (name.database.isDefined || !tempTables.contains(table)) {
    +      if (db == "information_schema") {
    +        tempTables(s"$db.$table")

    --- End diff --

    Indeed. My bad.
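For context, the special-case lookup under discussion can be sketched as follows. This is a simplified, hypothetical model (a plain `String` stands in for `LogicalPlan`, and the helper names are illustrative, not Spark's actual API): temporary views backing `information_schema` are registered under a `"db.table"` key, so the catalog must check the database name before falling back to the plain temp-table lookup.

```scala
import scala.collection.mutable

object LookupSketch {
  val INFORMATION_SCHEMA = "information_schema"

  // name -> plan; String stands in for LogicalPlan in this sketch
  val tempTables = mutable.HashMap[String, String]()

  def lookup(database: Option[String], table: String, currentDb: String): String = {
    val db = database.getOrElse(currentDb).toLowerCase
    if (db == INFORMATION_SCHEMA) {
      tempTables(s"$db.$table")  // keyed as "information_schema.tables", etc.
    } else if (database.isEmpty && tempTables.contains(table)) {
      tempTables(table)          // ordinary temporary view
    } else {
      s"catalog:$db.$table"      // defer to the external catalog
    }
  }
}
```

The hard-coded `"information_schema"` literal in the real diff is what prompts the later suggestion to hoist the name into a shared constant.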
[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14116#discussion_r70209455

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/systemcatalog/InformationSchema.scala ---
    @@ -0,0 +1,337 @@
    [...]
    +object InformationSchema {
    +  var INFORMATION_SCHEMA = "information_schema"
    +  /**
    +   * Register INFORMATION_SCHEMA database.
    +   */
    +  def registerInformationSchema(sparkSession: SparkSession) {
    +    sparkSession.sql("CREATE DATABASE IF NOT EXISTS information_schema")

    --- End diff --

    Yep.
[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14116#discussion_r70209478

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/systemcatalog/InformationSchema.scala ---
    @@ -0,0 +1,337 @@
    [...]
    +object InformationSchema {
    +  var INFORMATION_SCHEMA = "information_schema"
    +  /**
    +   * Register INFORMATION_SCHEMA database.
    +   */
    +  def registerInformationSchema(sparkSession: SparkSession) {

    --- End diff --

    Right.
[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14116#discussion_r70209567

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -401,7 +401,9 @@ class SessionCatalog(
         val db = formatDatabaseName(name.database.getOrElse(currentDb))
         val table = formatTableName(name.table)
         val relation =
    -      if (name.database.isDefined || !tempTables.contains(table)) {
    +      if (db == "information_schema") {
    +        tempTables(s"$db.$table")

    --- End diff --

    Ah, there is a reason. Catalyst cannot see `InformationSchema`; `InformationSchema` is designed to live outside of catalyst.
[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14116#discussion_r70209581

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -425,7 +427,9 @@ class SessionCatalog(
       def tableExists(name: TableIdentifier): Boolean = synchronized {
         val db = formatDatabaseName(name.database.getOrElse(currentDb))
         val table = formatTableName(name.table)
    -    if (name.database.isDefined || !tempTables.contains(table)) {
    +    if (db == "information_schema") {

    --- End diff --

    Here, too.
[GitHub] spark issue #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHEMA
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14116

    **[Test build #62081 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62081/consoleFull)** for PR 14116 at commit [`34fe4ed`](https://github.com/apache/spark/commit/34fe4ed774e726000acb7af8c4e386619027dc17).
[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...
Github user lw-lin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14116#discussion_r70210589

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -401,7 +401,9 @@ class SessionCatalog(
         val db = formatDatabaseName(name.database.getOrElse(currentDb))
         val table = formatTableName(name.table)
         val relation =
    -      if (name.database.isDefined || !tempTables.contains(table)) {
    +      if (db == "information_schema") {
    +        tempTables(s"$db.$table")

    --- End diff --

    Then I guess the constant `InformationSchema.INFORMATION_SCHEMA` should live in `sql/catalyst` rather than in `sql/core`?
[GitHub] spark issue #14131: [SPARK-16318][SQL] Implement all remaining xpath functio...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14131

    **[Test build #62076 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62076/consoleFull)** for PR 14131 at commit [`4d6f654`](https://github.com/apache/spark/commit/4d6f6544be4373a32150fd6d59ba539d3fcb6aab).

    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.
[GitHub] spark issue #14131: [SPARK-16318][SQL] Implement all remaining xpath functio...
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14131

    Test FAILed.
    Refer to this link for build results (access rights to CI server needed):
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62076/
    Test FAILed.
[GitHub] spark issue #14131: [SPARK-16318][SQL] Implement all remaining xpath functio...
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14131

    Merged build finished. Test FAILed.
[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...
Github user lw-lin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14116#discussion_r70212106

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/systemcatalog/InformationSchema.scala ---
    @@ -0,0 +1,336 @@
    [...]
    +object InformationSchema {
    +  val INFORMATION_SCHEMA = "information_schema"
    +  /**
    +   * Register INFORMATION_SCHEMA database.
    +   */
    +  def registerInformationSchema(sparkSession: SparkSession): Unit = {
    +    sparkSession.sql(s"CREATE DATABASE IF NOT EXISTS $INFORMATION_SCHEMA")
    +    registerView(sparkSession, new DatabasesRelationProvider, Seq("schemata", "databases"))
    +    registerView(sparkSession, new TablesRelationProvider, Seq("tables"))
    +    registerView(sparkSession, new ViewsRelationProvider, Seq("views"))
    +    registerView(sparkSession, new ColumnsRelationProvider, Seq("columns"))
    +    registerView(sparkSession, new SessionVariablesRelationProvider, Seq("session_variables"))
    +  }
    +
    +  /**
    +   * Register an INFORMATION_SCHEMA relation provider as a temporary view of Spark Catalog.
    +   */
    +  private def registerView(
    +      sparkSession: SparkSession,
    +      relationProvider: SchemaRelationProvider,
    +      names: Seq[String]) {
    +    val plan =
    +      LogicalRelation(relationProvider.createRelation(sparkSession.sqlContext, null, null)).analyze
    +    val projectList = plan.output.zip(plan.schema).map {
    +      case (attr, col) => Alias(attr, col.name)()
    +    }
    +    sparkSession.sessionState.executePlan(Project(projectList, plan))
    +    for (name <- names) {
    +      // TODO(dongjoon): This is a hack to give a database concept for Spark temporary views.
    +      // We should generalize this later.
    +      sparkSession.sessionState.catalog.createTempView(s"$INFORMATION_SCHEMA.$name",
    +        plan, overrideIfExists = true)
    +    }
    +  }
    +
    +  /**
    +   * Compile filter array into single string condition.
    +   */
    +  private[systemcatalog] def getConditionExpressionString(filters: Array[Filter]): String = {
    +    val str = filters.flatMap(InformationSchema.compileFilter).map(p => s"($p)").mkString(" AND ")
    +    if (str.length == 0) "TRUE" else str
    +  }
    +
    +  /**
    +   * Convert filter into string expression.
    +   */
    +  private[systemcatalog] def compileFilter(f: Filter): Option[String] = {

    --- End diff --

    This whole function does have great merit, but I feel @liancheng had implemented something similar before? @liancheng, could you confirm? Thanks!
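The filter-compilation approach being reviewed can be sketched with a stand-in `Filter` ADT. This is a hedged, simplified model: the hypothetical case classes below mirror only a few of the `org.apache.spark.sql.sources.Filter` subclasses that the real code pattern-matches on.

```scala
// Stand-in filter ADT for illustration; not Spark's actual Filter hierarchy.
sealed trait Filter
case class EqualTo(attr: String, value: Any) extends Filter
case class GreaterThan(attr: String, value: Any) extends Filter
case class IsNull(attr: String) extends Filter

object FilterSketch {
  private def quote(value: Any): String = value match {
    case s: String => s"'$s'"
    case v         => v.toString
  }

  // Convert a single filter into a SQL condition string, if supported.
  def compileFilter(f: Filter): Option[String] = f match {
    case EqualTo(a, v)     => Some(s"$a = ${quote(v)}")
    case GreaterThan(a, v) => Some(s"$a > ${quote(v)}")
    case IsNull(a)         => Some(s"$a IS NULL")
  }

  // Mirrors getConditionExpressionString: AND the compiled pieces,
  // defaulting to TRUE when nothing could be compiled.
  def getConditionExpressionString(filters: Array[Filter]): String = {
    val str = filters.flatMap(compileFilter).map(p => s"($p)").mkString(" AND ")
    if (str.isEmpty) "TRUE" else str
  }
}
```

Returning `Option[String]` lets unsupported filters be silently skipped, which is safe because the remaining conjuncts only narrow the result conservatively.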
[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14034

    **[Test build #62078 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62078/consoleFull)** for PR 14034 at commit [`2e6f8d8`](https://github.com/apache/spark/commit/2e6f8d8c8b5007302415b7fd984a38fc51be44bf).

    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.
[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034

    Test FAILed.
    Refer to this link for build results (access rights to CI server needed):
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62078/
    Test FAILed.
[GitHub] spark issue #14130: [SPARK-16477] Bump master version to 2.1.0-SNAPSHOT
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14130

    **[Test build #62075 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62075/consoleFull)** for PR 14130 at commit [`2915bf1`](https://github.com/apache/spark/commit/2915bf1e79a28a1df0fbc895068fcf8bee2095b0).

    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.
[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14034

    Merged build finished. Test FAILed.
[GitHub] spark issue #14130: [SPARK-16477] Bump master version to 2.1.0-SNAPSHOT
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14130

    Merged build finished. Test PASSed.
[GitHub] spark issue #14130: [SPARK-16477] Bump master version to 2.1.0-SNAPSHOT
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14130

    Test PASSed.
    Refer to this link for build results (access rights to CI server needed):
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62075/
    Test PASSed.
[GitHub] spark issue #14130: [SPARK-16477] Bump master version to 2.1.0-SNAPSHOT
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14130

    Hm, I tried this recently and it failed MiMa. I think this has to bump the base MiMa version too, but then it will fail because 2.0.0 isn't released. I thought we'd have to wait.
[GitHub] spark issue #14114: [SPARK-16458][SQL] SessionCatalog should support `listCo...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14114

    **[Test build #62077 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62077/consoleFull)** for PR 14114 at commit [`cac342e`](https://github.com/apache/spark/commit/cac342e7de11d8ecdbec7813591cce1397595a36).

    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.
[GitHub] spark issue #14114: [SPARK-16458][SQL] SessionCatalog should support `listCo...
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14114

    Test PASSed.
    Refer to this link for build results (access rights to CI server needed):
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62077/
    Test PASSed.
[GitHub] spark issue #14114: [SPARK-16458][SQL] SessionCatalog should support `listCo...
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14114

    Merged build finished. Test PASSed.
[GitHub] spark issue #14114: [SPARK-16458][SQL] SessionCatalog should support `listCo...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14114

    **[Test build #62079 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62079/consoleFull)** for PR 14114 at commit [`ac5f5cb`](https://github.com/apache/spark/commit/ac5f5cbebfbe48a307e21fc094dba4aa8fa86ddd).

    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.
[GitHub] spark issue #14114: [SPARK-16458][SQL] SessionCatalog should support `listCo...
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14114

    Test PASSed.
    Refer to this link for build results (access rights to CI server needed):
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62079/
    Test PASSed.
[GitHub] spark issue #14114: [SPARK-16458][SQL] SessionCatalog should support `listCo...
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14114

    Merged build finished. Test PASSed.
[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14116#discussion_r70216583

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -401,7 +401,9 @@ class SessionCatalog(
         val db = formatDatabaseName(name.database.getOrElse(currentDb))
         val table = formatTableName(name.table)
         val relation =
    -      if (name.database.isDefined || !tempTables.contains(table)) {
    +      if (db == "information_schema") {
    +        tempTables(s"$db.$table")

    --- End diff --

    That's a good idea! I made a `SessionCatalog` object in the following PR to store `DEFAULT`: https://github.com/apache/spark/pull/14115/files#diff-b3f9800839b9b9a1df9da9cbfc01adf8R38 We can move `INFORMATION_SCHEMA` there! Thanks.
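A minimal sketch of that suggestion, assuming the well-known database names are hoisted into a companion object in `sql/catalyst` (the object layout and helper name below are illustrative, not the merged code):

```scala
// Sketch: shared well-known database names living in sql/catalyst, so that
// both SessionCatalog and the sql/core InformationSchema code can reference
// a single constant instead of repeating the "information_schema" literal.
object SessionCatalog {
  val DEFAULT_DATABASE = "default"
  val INFORMATION_SCHEMA = "information_schema"

  // Convenience check for the lookupRelation / tableExists special cases.
  def isInformationSchema(db: String): Boolean =
    db.equalsIgnoreCase(INFORMATION_SCHEMA)
}
```

Since `sql/core` already depends on `sql/catalyst`, this direction avoids the dependency problem noted earlier (catalyst cannot see `InformationSchema`).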
[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14116#discussion_r70216745 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/systemcatalog/InformationSchema.scala --- @@ -0,0 +1,336 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.systemcatalog + +import java.sql.{Date, Timestamp} + +import scala.collection.mutable.ArrayBuffer + +import org.apache.commons.lang3.StringUtils + +import org.apache.spark.rdd.RDD +import org.apache.spark.sql._ +import org.apache.spark.sql.catalyst.dsl.plans._ +import org.apache.spark.sql.catalyst.expressions.Alias +import org.apache.spark.sql.catalyst.plans.logical.Project +import org.apache.spark.sql.execution.datasources._ +import org.apache.spark.sql.sources._ +import org.apache.spark.sql.types._ + +/** + * INFORMATION_SCHEMA is a database consisting views which provide information about all of the + * tables, views, columns in a database. + */ +object InformationSchema { + val INFORMATION_SCHEMA = "information_schema" + /** + * Register INFORMATION_SCHEMA database. 
+ */ + def registerInformationSchema(sparkSession: SparkSession): Unit = { +sparkSession.sql(s"CREATE DATABASE IF NOT EXISTS $INFORMATION_SCHEMA") +registerView(sparkSession, new DatabasesRelationProvider, Seq("schemata", "databases")) +registerView(sparkSession, new TablesRelationProvider, Seq("tables")) +registerView(sparkSession, new ViewsRelationProvider, Seq("views")) +registerView(sparkSession, new ColumnsRelationProvider, Seq("columns")) +registerView(sparkSession, new SessionVariablesRelationProvider, Seq("session_variables")) + } + + /** + * Register a INFORMATION_SCHEMA relation provider as a temporary view of Spark Catalog. + */ + private def registerView( + sparkSession: SparkSession, + relationProvider: SchemaRelationProvider, + names: Seq[String]) { +val plan = + LogicalRelation(relationProvider.createRelation(sparkSession.sqlContext, null, null)).analyze +val projectList = plan.output.zip(plan.schema).map { + case (attr, col) => Alias(attr, col.name)() +} +sparkSession.sessionState.executePlan(Project(projectList, plan)) +for (name <- names) { + // TODO(dongjoon): This is a hack to give a database concept for Spark temporary views. + // We should generalize this later. + sparkSession.sessionState.catalog.createTempView(s"$INFORMATION_SCHEMA.$name", +plan, overrideIfExists = true) +} + } + + /** + * Compile filter array into single string condition. + */ + private[systemcatalog] def getConditionExpressionString(filters: Array[Filter]): String = { +val str = filters.flatMap(InformationSchema.compileFilter).map(p => s"($p)").mkString(" AND ") +if (str.length == 0) "TRUE" else str + } + + /** + * Convert filter into string expression. + */ + private[systemcatalog] def compileFilter(f: Filter): Option[String] = { --- End diff -- This one comes from the JDBC module. I intentionally copied the code rather than calling it.
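To make the registration step above concrete, here is a hedged sketch of how the proposed views would be consumed. It assumes the PR's `registerInformationSchema` is available and has run against a `SparkSession` named `spark`; the view names come from the diff, and the table name in the last query is illustrative — none of this is a merged API.

```scala
// Hypothetical usage of the proposed INFORMATION_SCHEMA views, assuming the
// registration shown in the diff above is on the classpath.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.systemcatalog.InformationSchema

val spark = SparkSession.builder().appName("info-schema-sketch").getOrCreate()
InformationSchema.registerInformationSchema(spark)

// Per the diff, "schemata"/"databases", "tables", "views", "columns", and
// "session_variables" are registered as temporary views under information_schema.
spark.sql("SELECT * FROM information_schema.tables").show()
spark.sql("SELECT * FROM information_schema.columns WHERE table_name = 'src'").show()
```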
[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14034 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62080/ Test PASSed.
[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14034 **[Test build #62080 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62080/consoleFull)** for PR 14034 at commit [`d66870b`](https://github.com/apache/spark/commit/d66870bb274a206f16d33f56214246b17953a90e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14034 Merged build finished. Test PASSed.
[GitHub] spark issue #14048: [SPARK-16370][SQL] Union queries should not be executed ...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/14048 A more fruitful approach might be to introduce a separate operator for multi-insert. Let's leave that for another day :)...
[GitHub] spark pull request #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When L...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14034
[GitHub] spark issue #14034: [SPARK-16355] [SPARK-16354] [SQL] Fix Bugs When LIMIT/TA...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14034 thanks, merging to master and 2.0!
[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/14116#discussion_r70218352 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/systemcatalog/InformationSchema.scala --- @@ -0,0 +1,336 @@ (same hunk as quoted above, ending at) + private[systemcatalog] def compileFilter(f: Filter): Option[String] = { --- End diff -- Oh I see. Then if we won't dedup this for now, let's leave a comment saying that if one piece of code changes, please don't forget to change the other. What do you think?
[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/14132 [SPARK-16475][SQL] Broadcast Hint for SQL Queries ## What changes were proposed in this pull request? A broadcast hint is a way for users to manually annotate a query and suggest a join method to the query optimizer. It is very useful when the query optimizer cannot make an optimal decision with respect to join methods due to conservativeness or the lack of proper statistics. The DataFrame API has had a broadcast hint since Spark 1.5. However, we do not have equivalent functionality in SQL queries. We propose adding a Hive-style broadcast hint to Spark SQL. For more information, please see the [design document](https://issues.apache.org/jira/secure/attachment/12817061/BroadcastHintinSparkSQL.pdf). - [x] SUPPORT `MAPJOIN` SYNTAX - [ ] SUPPORT `BROADCAST JOIN` SYNTAX ## How was this patch tested? Pass the Jenkins tests with new test cases. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dongjoon-hyun/spark SPARK-16475 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14132.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14132 commit 6bd704e5d281bd8c7a1bddaee300688d016bc597 Author: Dongjoon Hyun Date: 2016-07-11T08:04:51Z [SPARK-16475][SQL] Broadcast Hint for SQL Queries
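For context, here is a hedged sketch of the two forms side by side: the DataFrame-side broadcast hint that already exists, and the SQL syntax this PR proposes. The SQL form is a proposal, not a merged feature, and the table/column names below are illustrative.

```scala
// Sketch: existing DataFrame broadcast hint vs. the SQL hint proposed by this PR.
// Assumes tables "large" and "small" exist in the current catalog.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

val spark = SparkSession.builder().appName("hint-sketch").getOrCreate()
val large = spark.table("large")
val small = spark.table("small")

// Available since Spark 1.5: mark the small side for broadcast in the DataFrame API.
val joined = large.join(broadcast(small), Seq("key"))

// Proposed by this PR: a Hive-style MAPJOIN hint embedded in the SQL text.
spark.sql("SELECT /*+ MAPJOIN(small) */ * FROM large JOIN small ON large.key = small.key")
```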
[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14132 **[Test build #62082 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62082/consoleFull)** for PR 14132 at commit [`6bd704e`](https://github.com/apache/spark/commit/6bd704e5d281bd8c7a1bddaee300688d016bc597).
[GitHub] spark pull request #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHE...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14116#discussion_r70220191 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/systemcatalog/InformationSchema.scala --- @@ -0,0 +1,336 @@ (same hunk as quoted above, ending at) + private[systemcatalog] def compileFilter(f: Filter): Option[String] = { --- End diff -- Definitely, it would be good. This PR is still waiting on some dependency PRs. While discussing, we can refactor `compileFilter` into some `Utils` functions, too. @rxin, any advice on this?
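The filter-to-string compilation being discussed follows the JDBC data source's pattern. A minimal sketch, covering only a subset of the `Filter` cases and assuming Spark's `sources` package on the classpath, might look like:

```scala
import org.apache.spark.sql.sources._

// Sketch of the copied JDBC-style filter compiler: each supported Filter maps
// to a SQL predicate string; unsupported filters compile to None and are
// simply dropped rather than causing a failure.
def compileFilter(f: Filter): Option[String] = f match {
  case EqualTo(attr, value) => Some(s"$attr = '$value'")
  case IsNull(attr)         => Some(s"$attr IS NULL")
  case IsNotNull(attr)      => Some(s"$attr IS NOT NULL")
  case And(left, right) =>
    for (l <- compileFilter(left); r <- compileFilter(right)) yield s"($l) AND ($r)"
  case _ => None
}

// Mirrors getConditionExpressionString in the diff: AND the compiled pieces
// together, falling back to TRUE when nothing compiles.
def conditionString(filters: Array[Filter]): String = {
  val s = filters.flatMap(compileFilter).map(p => s"($p)").mkString(" AND ")
  if (s.isEmpty) "TRUE" else s
}
```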
[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14132 cc @rxin and @hvanhovell.
[GitHub] spark issue #13950: [SPARK-15487] [Web UI] Spark Master UI to reverse proxy ...
Github user gurvindersingh commented on the issue: https://github.com/apache/spark/pull/13950 @tgravescs Yeah, the proposal is only for standalone mode, where the worker & application UI are now accessed through the master UI. Looking at the authn/z settings for standalone, I don't see this patch interfering with any of those. The addFilters() function sets the ui.filters for the '/*' path, so it will apply in this case too. Beyond this, let me know if I missed anything.
[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14132 If the direction is right, I can move on to adding `BROADCAST JOIN` syntax.
[GitHub] spark issue #14127: [SPARK-15467][build] update janino version to 3.0.0
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14127 Sounds like a small set of changes; not sure why it was a major version change. _shrug_
[GitHub] spark pull request #14104: [SPARK-16438] Add Asynchronous Actions documentat...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/14104#discussion_r7076 --- Diff: docs/programming-guide.md --- @@ -1099,6 +1099,9 @@ for details. +Spark provide asynchronous actions to execute two or more actions concurrently, these actions are execute asynchronously without blocking calling thread. Refer to the RDD API doc for asynchronous actions ([Scala](api/scala/index.html#org.apache.spark.rdd.AsyncRDDActions),[Java](api/java/org/apache/spark/rdd/AsyncRDDActions.html)) --- End diff -- I still don't think this is accurate. Spark can execute actions concurrently without this API. This merely makes the call non-blocking for the caller. It really adds very little beyond calling the normal API and wrapping in a Future (though it's more useful for the Java API where that's harder). Hence why I thought it was unofficially deprecated.
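The point in the review comment can be illustrated: from Scala, `AsyncRDDActions` mostly saves the caller a `Future` wrapper. A sketch, assuming an active `SparkContext` named `sc`:

```scala
// Sketch comparing the dedicated async API with a plain Future wrapper.
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import org.apache.spark.FutureAction

val rdd = sc.parallelize(1 to 100)

// The dedicated async API (via the implicit conversion to AsyncRDDActions):
val asyncCount: FutureAction[Long] = rdd.countAsync()

// Roughly equivalent from Scala: wrap the blocking action in a Future yourself,
// which is why the comment above argues the async API adds little in Scala
// (it remains handier from Java, where this wrapping is more awkward).
val wrappedCount: Future[Long] = Future { rdd.count() }
```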
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r70222526 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -945,8 +955,12 @@ SIMPLE_COMMENT : '--' ~[\r\n]* '\r'? '\n'? -> channel(HIDDEN) ; +BRACKETED_EMPTY_COMMENT --- End diff -- Do we need this because the BRACKETED_COMMENT rule is now expecting at least one character?
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r70222831 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -945,8 +955,12 @@ SIMPLE_COMMENT : '--' ~[\r\n]* '\r'? '\n'? -> channel(HIDDEN) ; +BRACKETED_EMPTY_COMMENT --- End diff -- Yes. Could you give me some workaround advice?
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r70223258 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -945,8 +955,12 @@ SIMPLE_COMMENT : '--' ~[\r\n]* '\r'? '\n'? -> channel(HIDDEN) ; +BRACKETED_EMPTY_COMMENT +: '/**/' -> channel(HIDDEN) +; + BRACKETED_COMMENT -: '/*' .*? '*/' -> channel(HIDDEN) +: '/*' ~[+] .*? '*/' -> channel(HIDDEN) --- End diff -- It might be easier to introduce a HINT_PREFIX rule (`'/*+'`) and place this before the BRACKETED_COMMENT rule.
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r70223712 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -347,6 +347,14 @@ querySpecification windows?) ; +hint +: '/*+' mapJoinHint '*/' +; + +mapJoinHint +: MAPJOIN '(' broadcastedTables+=tableIdentifier (',' broadcastedTables+=tableIdentifier)* ')' --- End diff -- Is the MAPJOIN optimization the only one we are going to support? This is fine if we aren't. We might want to make this more general (and move the logic into the planner) if we are.
[GitHub] spark pull request #14133: [SPARK-15889] [STREAMING] Follow-up fix to errone...
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/14133 [SPARK-15889] [STREAMING] Follow-up fix to erroneous condition in StreamTest ## What changes were proposed in this pull request? A second form of AssertQuery now actually invokes the condition; avoids a build warning too ## How was this patch tested? Jenkins; running StreamTest You can merge this pull request into a Git repository by running: $ git pull https://github.com/srowen/spark SPARK-15889.2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14133.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14133 commit 719012fd16e574fad8c25ba6b9ecb23df07649de Author: Sean Owen Date: 2016-07-11T09:10:09Z Fix condition; in StreamTest to actually invoke condition
[GitHub] spark issue #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHEMA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14116 **[Test build #62081 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62081/consoleFull)** for PR 14116 at commit [`34fe4ed`](https://github.com/apache/spark/commit/34fe4ed774e726000acb7af8c4e386619027dc17). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHEMA
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14116 Merged build finished. Test PASSed.
[GitHub] spark issue #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHEMA
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14116 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62081/ Test PASSed.
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r70225259 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -339,8 +339,24 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { case SqlBaseParser.SELECT => // Regular select +// Broadcast hints +var withBroadcastedTable = relation --- End diff -- A few thoughts here: - Should we introduce a separate LogicalPlan (like `With`) and move this logic into an Analyzer rule? The reason that I am proposing this is because we are transforming the `LogicalPlan` here. It is simpler to do it here though. - Lets move the logic here into a separate method. Name it `withHints`, model it after the other `with*` functions, and use `optionalMap(hint)(withHint)`.
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r70226031 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -945,8 +955,12 @@ SIMPLE_COMMENT : '--' ~[\r\n]* '\r'? '\n'? -> channel(HIDDEN) ; +BRACKETED_EMPTY_COMMENT +: '/**/' -> channel(HIDDEN) +; + BRACKETED_COMMENT -: '/*' .*? '*/' -> channel(HIDDEN) +: '/*' ~[+] .*? '*/' -> channel(HIDDEN) --- End diff -- Oh, I see.
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r70226224 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -347,6 +347,14 @@ querySpecification windows?) ; +hint +: '/*+' mapJoinHint '*/' +; + +mapJoinHint +: MAPJOIN '(' broadcastedTables+=tableIdentifier (',' broadcastedTables+=tableIdentifier)* ')' --- End diff -- Yep. The MAPJOIN optimization will be handled in the Optimizer. This PR just accepts the basic syntax.
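For context, the SQL surface this grammar rule accepts looks roughly like the following. This is an illustrative sketch only - the table names are made up, and as noted above the hint is merely parsed at this stage; the actual broadcast decision happens later:

```scala
// Assuming a SparkSession named `spark` and registered temp views `big` and `small`.
// The /*+ MAPJOIN(...) */ hint asks the planner to broadcast the named table(s).
val hinted = spark.sql(
  """SELECT /*+ MAPJOIN(small) */ big.id, small.name
    |FROM big JOIN small ON big.id = small.id""".stripMargin)
```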
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r70226307 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -347,6 +347,14 @@ querySpecification windows?) ; +hint +: '/*+' mapJoinHint '*/' +; + +mapJoinHint +: MAPJOIN '(' broadcastedTables+=tableIdentifier (',' broadcastedTables+=tableIdentifier)* ')' --- End diff -- Currently, that is the scope of this PR. We can generalize it if needed.
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r70226914 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -339,8 +339,24 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { case SqlBaseParser.SELECT => // Regular select +// Broadcast hints +var withBroadcastedTable = relation +if (ctx.hint != null) { + val broadcastedTables = + hint.mapJoinHint.broadcastedTables.asScala.map(visitTableIdentifier) + for (table <- broadcastedTables) { +var stop = false +withBroadcastedTable = withBroadcastedTable.transformDown { + case r @ BroadcastHint(UnresolvedRelation(_, _)) => r + case r @ UnresolvedRelation(t, _) if !stop && t == table => --- End diff -- What happens if we use the same table multiple times? Does only the first one get a broadcast hint? What does Hive do? What will happen if we do something like this: ```SQL SELECT /*+ MAPJOIN(tbl_b) */ * FROM tbl_a A LEFT JOIN tbl_b B ON B.id = A.id LEFT JOIN (SELECT XA.id FROM tbl_c XA LEFT ANTI JOIN tbl_b XB ON XB.id = XA.id) C ON C.id = A.id ``` Will both of the `tbl_b` instances be broadcasted?
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r70227049 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/BroadcastJoinSuite.scala --- @@ -153,4 +153,38 @@ class BroadcastJoinSuite extends QueryTest with SQLTestUtils { cases.foreach(assertBroadcastJoin) } } + + test("select hint") { --- End diff -- Move this to `org.apache.spark.sql.catalyst.parser.PlanParserSuite`
[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/14132 This looks pretty good!
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r70227159 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -339,8 +339,24 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { case SqlBaseParser.SELECT => // Regular select +// Broadcast hints +var withBroadcastedTable = relation --- End diff -- For the first one, that could be possible. I did it here because it's simple, and I want to remove unmatched MAPJOIN hints at an early stage (you can see that in the test cases). We will do the more complex, real optimization in the Optimizer layer. For the second one, sure, that would be better. I'll fix it like that.
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r70227231 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -347,6 +347,14 @@ querySpecification windows?) ; +hint +: '/*+' mapJoinHint '*/' +; + +mapJoinHint +: MAPJOIN '(' broadcastedTables+=tableIdentifier (',' broadcastedTables+=tableIdentifier)* ')' --- End diff -- Ok - let's keep it this way then.
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r70227379 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -339,8 +339,24 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging { case SqlBaseParser.SELECT => // Regular select +// Broadcast hints +var withBroadcastedTable = relation +if (ctx.hint != null) { + val broadcastedTables = + hint.mapJoinHint.broadcastedTables.asScala.map(visitTableIdentifier) + for (table <- broadcastedTables) { +var stop = false +withBroadcastedTable = withBroadcastedTable.transformDown { + case r @ BroadcastHint(UnresolvedRelation(_, _)) => r + case r @ UnresolvedRelation(t, _) if !stop && t == table => --- End diff -- Wow. Thank you for the case. I missed that. Let me check the behavior. This could be a good test-case candidate.
[GitHub] spark pull request #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Qu...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14132#discussion_r70227441 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/BroadcastJoinSuite.scala --- @@ -153,4 +153,38 @@ class BroadcastJoinSuite extends QueryTest with SQLTestUtils { cases.foreach(assertBroadcastJoin) } } + + test("select hint") { --- End diff -- No problem!
[GitHub] spark issue #14104: [SPARK-16438] Add Asynchronous Actions documentation
Github user phalodi commented on the issue: https://github.com/apache/spark/pull/14104 @srowen I know it's nothing major - it just calls normal functions in a future - but a naive user learning Scala and Spark for the first time doesn't know what futures are, so at least we should add a reference to them. If you suggest it, I'll simply add a reference link without any statement about blocking and non-blocking, so at least users know an API like this exists.
[GitHub] spark issue #14104: [SPARK-16438] Add Asynchronous Actions documentation
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14104 Although I don't think it's essential to mention this API (it's in the API doc), it wouldn't hurt to point to it here. The mention should be accurate, though. I think it's correct to say that the Spark RDD API also exposes asynchronous versions of some actions, like foreach, which immediately return a handle to the caller that can be used to wait for completion.
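A small sketch of the asynchronous-action API being discussed - `countAsync` returns a `FutureAction` (a `scala.concurrent.Future`) immediately instead of blocking. This assumes a local Spark session and is illustrative only:

```scala
import scala.concurrent.Await
import scala.concurrent.duration._

import org.apache.spark.sql.SparkSession

object AsyncActionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("async").getOrCreate()
    val rdd = spark.sparkContext.parallelize(1 to 1000)

    // Returns a handle immediately instead of blocking the caller
    val futureCount = rdd.countAsync()

    // ... the caller can do other work here ...

    // Wait for completion only when the result is actually needed
    val n = Await.result(futureCount, 1.minute)
    println(s"count = $n")
    spark.stop()
  }
}
```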
[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14132 Thank you for the quick review! I'll let you know after updating.
[GitHub] spark issue #14115: [SPARK-16459][SQL] Prevent dropping current database
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14115 Hi, @hvanhovell . Could you review this PR, too?
[GitHub] spark issue #14104: [SPARK-16438] Add Asynchronous Actions documentation
Github user phalodi commented on the issue: https://github.com/apache/spark/pull/14104 @srowen so do you think we should not add this API reference to the document?
[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14132 **[Test build #62082 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62082/consoleFull)** for PR 14132 at commit [`6bd704e`](https://github.com/apache/spark/commit/6bd704e5d281bd8c7a1bddaee300688d016bc597). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14104: [SPARK-16438] Add Asynchronous Actions documentation
Github user phalodi commented on the issue: https://github.com/apache/spark/pull/14104 @srowen sure, I will make the appropriate changes and push again
[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14132 Merged build finished. Test FAILed.
[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14132 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62082/ Test FAILed.
[GitHub] spark issue #14131: [SPARK-16318][SQL] Implement all remaining xpath functio...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14131 retest this please
[GitHub] spark issue #14131: [SPARK-16318][SQL] Implement all remaining xpath functio...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14131 looks like we also need to backport the improvement to `checkEvaluation`
[GitHub] spark pull request #13494: [SPARK-15752] [SQL] Optimize metadata only query ...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13494#discussion_r70230056 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala --- @@ -0,0 +1,143 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.catalog.{CatalogRelation, SessionCatalog} +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.expressions.aggregate._ +import org.apache.spark.sql.catalyst.plans.logical._ +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation} +import org.apache.spark.sql.internal.SQLConf + +/** + * This rule optimizes the execution of queries that can be answered by looking only at + * partition-level metadata. This applies when all the columns scanned are partition columns, and + * the query has an aggregate operator that satisfies the following conditions: + * 1. aggregate expression is partition columns. + * e.g. SELECT col FROM tbl GROUP BY col. + * 2. 
aggregate function on partition columns with DISTINCT. + e.g. SELECT col1, count(DISTINCT col2) FROM tbl GROUP BY col1. + 3. aggregate function on partition columns which have same result w or w/o DISTINCT keyword. + e.g. SELECT col1, Max(col2) FROM tbl GROUP BY col1. + */ +case class OptimizeMetadataOnlyQuery( +catalog: SessionCatalog, +conf: SQLConf) extends Rule[LogicalPlan] { + + def apply(plan: LogicalPlan): LogicalPlan = { +if (!conf.optimizerMetadataOnly) { + return plan +} + +plan.transform { + case a @ Aggregate(_, aggExprs, child @ PartitionedRelation(partAttrs, relation)) => +// We only apply this optimization when only partitioned attributes are scanned. +if (a.references.subsetOf(partAttrs)) { + val aggFunctions = aggExprs.flatMap(_.collect { +case agg: AggregateExpression => agg + }) + val isAllDistinctAgg = aggFunctions.forall { agg => +agg.isDistinct || (agg.aggregateFunction match { + // `Max` and `Min` are always distinct aggregate functions no matter they have + // DISTINCT keyword or not, as the result will be same. + case _: Max => true --- End diff -- First/Last?
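As a usage sketch, a query this rule could answer from partition metadata alone might look like the following. The config key is inferred from the diff's `conf.optimizerMetadataOnly`, and the table layout is an assumption for illustration:

```scala
// Assuming a SparkSession `spark` and a table `events` partitioned by column `dt`.
spark.sql("SET spark.sql.optimizer.metadataOnly=true")

// All referenced columns are partition columns, and MAX gives the same result
// with or without DISTINCT, so this can be answered from partition metadata
// without scanning any data files.
val latest = spark.sql("SELECT MAX(dt) FROM events")
```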
[GitHub] spark pull request #13704: [SPARK-15985][SQL] Reduce runtime overhead of a p...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13704#discussion_r70230473 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2018,6 +2018,8 @@ class Analyzer( fail(child, DateType, walkedTypePath) case (StringType, to: NumericType) => fail(child, to, walkedTypePath) + case (from: ArrayType, to: ArrayType) if !from.containsNull => --- End diff -- Got it. I will add unit tests later. They require a combination of `Cast` and `SimplifyCasts`.
[GitHub] spark issue #14132: [SPARK-16475][SQL][WIP] Broadcast Hint for SQL Queries
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14132 Oops. Five errors, too. I'll fix these tomorrow.
[GitHub] spark issue #14104: [SPARK-16438] Add Asynchronous Actions documentation
Github user phalodi commented on the issue: https://github.com/apache/spark/pull/14104 @srowen now it's perfect, I think - at last, from a table down to one line. But yeah, you are right, it's better to just mention that Spark provides it and not duplicate the table of actions.
[GitHub] spark pull request #14115: [SPARK-16459][SQL] Prevent dropping current datab...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/14115#discussion_r70231360 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -878,7 +885,8 @@ class SessionCatalog( * This is mainly used for tests. */ private[sql] def reset(): Unit = synchronized { -val default = "default" +val default = DEFAULT_DATABASE --- End diff -- Shouldn't we just replace the `val default` by `DEFAULT_DATABASE`?
[GitHub] spark pull request #13704: [SPARK-15985][SQL] Reduce runtime overhead of a p...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13704#discussion_r70231367 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1441,6 +1441,26 @@ object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper { object SimplifyCasts extends Rule[LogicalPlan] { def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions { case Cast(e, dataType) if e.dataType == dataType => e +case Cast(e, dataType) => + (e.dataType, dataType) match { +case (fromDt: ArrayType, toDt: ArrayType) => --- End diff -- how about ``` case c @ Cast(e, dataType) => (e.dataType, dataType) match { case (ArrayType(from, false), ArrayType(to, true)) if from == to => e case (MapType(fromKey, fromValue, false), MapType(toKey, toValue, true)) if fromKey == toKey && fromValue == toValue => e case _ => c } ```
[GitHub] spark pull request #14115: [SPARK-16459][SQL] Prevent dropping current datab...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14115#discussion_r70232467 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -878,7 +885,8 @@ class SessionCatalog( * This is mainly used for tests. */ private[sql] def reset(): Unit = synchronized { -val default = "default" +val default = DEFAULT_DATABASE --- End diff -- Oh, sure!
[GitHub] spark issue #14115: [SPARK-16459][SQL] Prevent dropping current database
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14115 Thank you for the review, @hvanhovell . I replaced the local variable.
[GitHub] spark pull request #14134: [Spark-16479] Add Example for asynchronous action
GitHub user phalodi opened a pull request: https://github.com/apache/spark/pull/14134 [Spark-16479] Add Example for asynchronous action ## What changes were proposed in this pull request? Add examples for asynchronous actions ## How was this patch tested? Ran all test cases You can merge this pull request into a Git repository by running: $ git pull https://github.com/phalodi/spark SPARK-16479 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14134.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14134 commit aba737f9be520ec33d0841d718f8250fbfeb09e1 Author: sandy Date: 2016-07-08T09:15:12Z Add Asynchronous Actions documentation commit 4afcec803218376200559730e8f1c9873641c140 Author: sandy Date: 2016-07-10T19:04:03Z fix statement to more clear about blocking statement commit 81d1f716e279dab5f5edfdc5eb3624f28d0387f1 Author: sandy Date: 2016-07-11T05:56:11Z remove some statement related to table commit db09b2c3bfb441cf20dfebd5a7def79252c49cec Author: sandy Date: 2016-07-11T10:00:39Z make changes in programming guide doc for asynchronous actions commit 1a54ff52564639a750ffa81e9159f2ae3a6248ee Author: sandy Date: 2016-07-11T10:18:47Z add asynchronous actions example
[GitHub] spark pull request #14134: [Spark-16479] Add Example for asynchronous action
Github user phalodi closed the pull request at: https://github.com/apache/spark/pull/14134
[GitHub] spark pull request #13704: [SPARK-15985][SQL] Reduce runtime overhead of a p...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13704#discussion_r70234587

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1441,6 +1441,26 @@ object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper {
 object SimplifyCasts extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
     case Cast(e, dataType) if e.dataType == dataType => e
+    case Cast(e, dataType) =>
+      (e.dataType, dataType) match {
+        case (fromDt: ArrayType, toDt: ArrayType) =>
--- End diff --

thanks, I like this simple one
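The `SimplifyCasts` rule quoted above drops a cast whose target type already matches the child expression's type. A toy, self-contained sketch of that idea, with illustrative class names rather than Spark's actual Catalyst types:

```scala
// Minimal stand-in for an expression tree; NOT Spark's real classes.
sealed trait DataType
case object IntType extends DataType
case object StringType extends DataType

sealed trait Expr { def dataType: DataType }
case class Literal(value: Any, dataType: DataType) extends Expr
case class Cast(child: Expr, dataType: DataType) extends Expr

// A cast to the expression's own type is a no-op and can be removed.
def simplifyCasts(e: Expr): Expr = e match {
  case Cast(child, dt) if child.dataType == dt => simplifyCasts(child) // redundant cast
  case Cast(child, dt)                         => Cast(simplifyCasts(child), dt)
  case other                                   => other
}

// simplifyCasts(Cast(Literal(1, IntType), IntType)) == Literal(1, IntType)
```

The quoted diff extends the same rule to peel redundant casts between structurally compatible `ArrayType`s as well; this sketch only shows the base case.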
[GitHub] spark issue #14115: [SPARK-16459][SQL] Prevent dropping current database
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/14115 LGTM - pending jenkins.
[GitHub] spark issue #14129: [SPARK-16280][SQL][WIP] Implement histogram_numeric SQL ...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/14129 @tilumi could you show some benchmarks for this? I think that this will have some performance problems. Using an array in a DeclarativeAggregate is not the most efficient.
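The kind of micro-benchmark being asked for can be gathered with a small timing harness; a generic sketch (all names illustrative, and this is not Spark's own `Benchmark` utility):

```scala
// Tiny micro-benchmark harness: warm up the JIT, then report the best
// wall-clock time over a few runs of the body.
object BenchSketch {
  def time[A](label: String, warmup: Int = 3, runs: Int = 5)(body: => A): A = {
    (1 to warmup).foreach(_ => body)                 // warm-up iterations
    val samplesMs = (1 to runs).map { _ =>
      val t0 = System.nanoTime()
      body
      (System.nanoTime() - t0) / 1e6                 // elapsed millis
    }
    println(f"$label: best ${samplesMs.min}%.2f ms over $runs runs")
    body
  }

  def main(args: Array[String]): Unit = {
    val data = Array.fill(1000000)(scala.util.Random.nextDouble())
    // Stand-in workload; the real comparison would time the aggregate
    // with and without the array-backed buffer.
    time("sum")(data.sum)
  }
}
```

Reporting the minimum rather than the mean is a common choice for JVM micro-benchmarks, since it filters out GC and JIT noise.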
[GitHub] spark pull request #14135: [Spark-16479] Add Example for asynchronous action
GitHub user phalodi opened a pull request: https://github.com/apache/spark/pull/14135 [Spark-16479] Add Example for asynchronous action

## What changes were proposed in this pull request?

Add an example for asynchronous actions

## How was this patch tested?

Run test cases

You can merge this pull request into a Git repository by running: $ git pull https://github.com/phalodi/spark SPARK-16479 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14135.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14135

commit 1215cd9ade669ff7e78a8380ea42faf5bfa16387 Author: sandy Date: 2016-07-11T10:35:43Z add example for async action
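The example the PR adds is not shown in this message; a minimal sketch of what RDD asynchronous actions look like (assuming a local SparkContext, not necessarily the PR's exact code):

```scala
import scala.concurrent.{Await, ExecutionContext}
import scala.concurrent.duration.Duration
import scala.util.{Failure, Success}

import org.apache.spark.{SparkConf, SparkContext}

object AsyncActionsSketch {
  def main(args: Array[String]): Unit = {
    implicit val ec: ExecutionContext = ExecutionContext.global
    val sc = new SparkContext(new SparkConf().setAppName("async").setMaster("local[2]"))
    val rdd = sc.parallelize(1 to 1000)

    // countAsync returns a FutureAction immediately instead of blocking
    // the driver thread the way count() does.
    val futureCount = rdd.countAsync()
    futureCount.onComplete {
      case Success(n)  => println(s"count = $n")
      case Failure(ex) => println(s"count failed: $ex")
    }

    // takeAsync likewise runs in the background; here we block explicitly
    // only when we actually need the result.
    val futureFirst = rdd.takeAsync(5)
    val first = Await.result(futureFirst, Duration.Inf)
    println(s"first five: ${first.mkString(", ")}")

    sc.stop()
  }
}
```

`FutureAction` extends `scala.concurrent.Future`, so the usual combinators (`onComplete`, `map`, `Await.result`) apply.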
[GitHub] spark issue #14135: [Spark-16479] Add Example for asynchronous action
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14135 Can one of the admins verify this patch?
[GitHub] spark pull request #13494: [SPARK-15752] [SQL] Optimize metadata only query ...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13494#discussion_r70236562 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala --- @@ -0,0 +1,143 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.catalog.{CatalogRelation, SessionCatalog} +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.expressions.aggregate._ +import org.apache.spark.sql.catalyst.plans.logical._ +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation} +import org.apache.spark.sql.internal.SQLConf + +/** + * This rule optimizes the execution of queries that can be answered by looking only at + * partition-level metadata. This applies when all the columns scanned are partition columns, and + * the query has an aggregate operator that satisfies the following conditions: + * 1. aggregate expression is partition columns. + * e.g. SELECT col FROM tbl GROUP BY col. + * 2. 
aggregate function on partition columns with DISTINCT. + * e.g. SELECT col1, count(DISTINCT col2) FROM tbl GROUP BY col1. + * 3. aggregate function on partition columns which have same result w or w/o DISTINCT keyword. + * e.g. SELECT col1, Max(col2) FROM tbl GROUP BY col1. + */ +case class OptimizeMetadataOnlyQuery( +catalog: SessionCatalog, +conf: SQLConf) extends Rule[LogicalPlan] { + + def apply(plan: LogicalPlan): LogicalPlan = { +if (!conf.optimizerMetadataOnly) { + return plan +} + +plan.transform { + case a @ Aggregate(_, aggExprs, child @ PartitionedRelation(partAttrs, relation)) => +// We only apply this optimization when only partitioned attributes are scanned. +if (a.references.subsetOf(partAttrs)) { + val aggFunctions = aggExprs.flatMap(_.collect { +case agg: AggregateExpression => agg + }) + val isAllDistinctAgg = aggFunctions.forall { agg => +agg.isDistinct || (agg.aggregateFunction match { + // `Max` and `Min` are always distinct aggregate functions no matter they have + // DISTINCT keyword or not, as the result will be same. + case _: Max => true + case _: Min => true + case _ => false +}) + } + if (isAllDistinctAgg) { + a.withNewChildren(Seq(replaceTableScanWithPartitionMetadata(child, relation))) + } else { +a + } +} else { + a +} +} + } + + /** + * Transform the given plan, find its table scan nodes that matches the given relation, and then + * replace the table scan node with its corresponding partition values. 
+ */ + private def replaceTableScanWithPartitionMetadata( + child: LogicalPlan, + relation: LogicalPlan): LogicalPlan = { +child transform { + case plan if plan eq relation => +relation match { + case l @ LogicalRelation(fsRelation: HadoopFsRelation, _, _) => +val partColumns = fsRelation.partitionSchema.map(_.name.toLowerCase).toSet +val partAttrs = l.output.filter(a => partColumns.contains(a.name.toLowerCase)) +val partitionData = fsRelation.location.listFiles(filters = Nil) +LocalRelation(partAttrs, partitionData.map(_.values)) + + case relation: CatalogRelation => +val partColumns = relation.catalogTable.partitionColumnNames.map(_.toLowerCase).toSet +val partAttrs = relation.output.filter(a => partColumns.contains(a.name.toLowerCase)) +val partiti
[GitHub] spark pull request #13494: [SPARK-15752] [SQL] Optimize metadata only query ...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13494#discussion_r70236613 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala --- @@ -0,0 +1,143 @@
[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13704 **[Test build #62087 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62087/consoleFull)** for PR 13704 at commit [`1bbe859`](https://github.com/apache/spark/commit/1bbe859804d999e30b8ed7f51b13121e30118d5a).
[GitHub] spark issue #14131: [SPARK-16318][SQL] Implement all remaining xpath functio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14131 **[Test build #62085 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62085/consoleFull)** for PR 14131 at commit [`4d6f654`](https://github.com/apache/spark/commit/4d6f6544be4373a32150fd6d59ba539d3fcb6aab).
[GitHub] spark issue #14129: [SPARK-16280][SQL][WIP] Implement histogram_numeric SQL ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14129 **[Test build #3178 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3178/consoleFull)** for PR 14129 at commit [`08065d8`](https://github.com/apache/spark/commit/08065d87f9a04f7e03e319950e8caa5c960defb0).
[GitHub] spark issue #14133: [SPARK-15889] [STREAMING] Follow-up fix to erroneous con...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14133 **[Test build #62083 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62083/consoleFull)** for PR 14133 at commit [`719012f`](https://github.com/apache/spark/commit/719012fd16e574fad8c25ba6b9ecb23df07649de).
[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13704 **[Test build #62084 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62084/consoleFull)** for PR 13704 at commit [`c31729f`](https://github.com/apache/spark/commit/c31729f361b5774f36834daeddb338f49377130e).
[GitHub] spark issue #14115: [SPARK-16459][SQL] Prevent dropping current database
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14115 **[Test build #3177 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3177/consoleFull)** for PR 14115 at commit [`19a3160`](https://github.com/apache/spark/commit/19a31601c248d239aafcdcba1123c7a7c585c924).
[GitHub] spark issue #14115: [SPARK-16459][SQL] Prevent dropping current database
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14115 **[Test build #62086 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62086/consoleFull)** for PR 14115 at commit [`19a3160`](https://github.com/apache/spark/commit/19a31601c248d239aafcdcba1123c7a7c585c924).
[GitHub] spark pull request #13494: [SPARK-15752] [SQL] Optimize metadata only query ...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13494#discussion_r70238437 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala --- @@ -0,0 +1,143 @@
+          case l @ LogicalRelation(fsRelation: HadoopFsRelation, _, _) =>
+            val partColumns = fsRelation.partitionSchema.map(_.name.toLowerCase).toSet
+            val partAttrs = l.output.filter(a => partColumns.contains(a.name.toLowerCase))
--- End diff --
There is no need to determine the `partAttrs` again; we can just pass them as an argument.
[GitHub] spark pull request #14136: [SPARK-16282][SQL] Implement percentile SQL funct...
GitHub user jiangxb1987 opened a pull request: https://github.com/apache/spark/pull/14136 [SPARK-16282][SQL] Implement percentile SQL function.

## What changes were proposed in this pull request?

Implement the percentile SQL function. It computes the exact percentile(s) of expr at pc, with pc in the range [0, 1].

## How was this patch tested?

Added new test cases in DataFrameAggregateSuite.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/jiangxb1987/spark percentile Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14136.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14136

commit 1ae3df7463f36ac15c4cb3138bec3bade7eae600 Author: jiangxb1987 Date: 2016-07-11T11:11:26Z [SPARK-16282][SQL] Implement percentile SQL function.
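The PR's implementation is not shown here; as a standalone illustration of the exact-percentile semantics described (Hive-style linear interpolation between adjacent order statistics), a sketch:

```scala
// Exact percentile of a small in-memory dataset: sort, compute the
// fractional rank pc * (n - 1), and interpolate linearly between the
// two neighbouring values. Illustrative only, not the PR's code.
def percentile(values: Seq[Double], pc: Double): Double = {
  require(pc >= 0.0 && pc <= 1.0, "pc must be in [0, 1]")
  require(values.nonEmpty, "values must be non-empty")
  val sorted = values.sorted
  val pos = pc * (sorted.length - 1)        // fractional rank
  val lower = math.floor(pos).toInt
  val upper = math.ceil(pos).toInt
  val frac = pos - lower
  sorted(lower) * (1 - frac) + sorted(upper) * frac
}

// percentile(Seq(1.0, 2.0, 3.0, 4.0), 0.5) == 2.5
```

A distributed implementation cannot simply sort locally like this; holding and merging the values per group is exactly what the aggregate buffer in the PR has to manage.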
[GitHub] spark issue #14136: [SPARK-16282][SQL] Implement percentile SQL function.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14136 **[Test build #62088 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62088/consoleFull)** for PR 14136 at commit [`1ae3df7`](https://github.com/apache/spark/commit/1ae3df7463f36ac15c4cb3138bec3bade7eae600).
[GitHub] spark issue #14136: [SPARK-16282][SQL] Implement percentile SQL function.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14136 Merged build finished. Test FAILed.