[GitHub] spark pull request #14207: [SPARK-16552] [SQL] Store the Inferred Schemas in...

2016-07-19 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/14207#discussion_r71474127
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala
 ---
@@ -316,27 +340,25 @@ object CreateDataSourceTableUtils extends Logging {
     tableProperties.put(DATASOURCE_PROVIDER, provider)
 
     // Saves optional user specified schema.  Serialized JSON schema string may be too long to be
--- End diff --

I think this comment is not correct anymore?
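For context, the truncated comment above refers to splitting a serialized JSON schema across several size-capped table properties, since some metastores limit property value length. A minimal, self-contained sketch of that scheme (the property key names "schema.numParts" and "schema.part.N" are illustrative stand-ins, not Spark's actual constants):

```scala
// Sketch: store a long JSON string across size-capped table properties and
// read it back. Key names are illustrative, not Spark's real constants.
def putSchema(json: String, threshold: Int): Map[String, String] = {
  val parts = json.grouped(threshold).toSeq  // chunks of at most `threshold` chars
  Map("schema.numParts" -> parts.length.toString) ++
    parts.zipWithIndex.map { case (part, i) => s"schema.part.$i" -> part }
}

def getSchema(props: Map[String, String]): String = {
  val numParts = props("schema.numParts").toInt
  (0 until numParts).map(i => props(s"schema.part.$i")).mkString
}
```

Round-tripping recovers the original string regardless of where the chunk boundaries fall.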


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14278: [SPARK-16632][SQL] Use Spark requested schema to guide v...

2016-07-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14278
  
**[Test build #62583 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62583/consoleFull)** for PR 14278 at commit [`2ade381`](https://github.com/apache/spark/commit/2ade381403080d1390a34b44366ade05f42f6d4f).





[GitHub] spark issue #14086: [SPARK-16463][SQL] Support `truncate` option in Overwrit...

2016-07-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14086
  
**[Test build #62582 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62582/consoleFull)** for PR 14086 at commit [`98c81c7`](https://github.com/apache/spark/commit/98c81c7bc14f8514a96a0e63f89cd98da25d43f0).





[GitHub] spark pull request #14278: [SPARK-16632][SQL] Use Spark requested schema to ...

2016-07-19 Thread liancheng
GitHub user liancheng opened a pull request:

https://github.com/apache/spark/pull/14278

[SPARK-16632][SQL] Use Spark requested schema to guide vectorized Parquet 
reader initialization

## What changes were proposed in this pull request?

In `SpecificParquetRecordReaderBase`, which is used by the vectorized 
Parquet reader, we convert the Parquet requested schema into a Spark schema to 
guide column reader initialization. However, the Parquet requested schema is 
tailored from the schema of the physical file being scanned, and may have 
inaccurate type information due to bugs of other systems (e.g. HIVE-14294).

On the other hand, we already set the real Spark requested schema into 
Hadoop configuration in [`ParquetFileFormat`][1]. This PR simply reads out this 
schema to replace the converted one.

## How was this patch tested?

New test case added in `ParquetQuerySuite`.

[1]: https://github.com/apache/spark/blob/v2.0.0-rc5/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L292-L294
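As a toy illustration of the fix: prefer the Spark-requested schema when it is available, and fall back to the schema converted from the Parquet file only otherwise. The `Field` type and function name below are illustrative stand-ins; the real reader works with Spark `StructType` values read from the Hadoop configuration entry set by `ParquetFileFormat`.

```scala
// Toy model of the schema choice described above. All names are
// illustrative, not the actual Spark API.
case class Field(name: String, dataType: String)

def schemaForColumnReaders(
    convertedFromParquet: Seq[Field],   // derived from the file footer
    sparkRequested: Option[Seq[Field]]  // read from the Hadoop configuration
): Seq[Field] =
  // Trust what Spark asked for; the file-derived types may be inaccurate
  // (e.g. metadata written incorrectly by another system, as in HIVE-14294).
  sparkRequested.getOrElse(convertedFromParquet)
```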

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/liancheng/spark spark-16632-simpler-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14278.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14278


commit 2ade381403080d1390a34b44366ade05f42f6d4f
Author: Cheng Lian 
Date:   2016-07-20T06:31:10Z

Fixes SPARK-16632







[GitHub] spark issue #14086: [SPARK-16463][SQL] Support `truncate` option in Overwrit...

2016-07-19 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14086
  
The PR description and the code are updated.





[GitHub] spark pull request #14086: [SPARK-16463][SQL] Support `truncate` option in O...

2016-07-19 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14086#discussion_r71472687
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -419,8 +422,13 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       }
 
       if (mode == SaveMode.Overwrite && tableExists) {
-        JdbcUtils.dropTable(conn, table)
-        tableExists = false
+        if (extraOptions.getOrElse("truncate", "false").toBoolean &&
+            JdbcUtils.isCascadingTruncateTable(url) == Some(false)) {
+          JdbcUtils.truncateTable(conn, table)
--- End diff --

The current exception message is "Column `seq` not found".





[GitHub] spark pull request #14207: [SPARK-16552] [SQL] Store the Inferred Schemas in...

2016-07-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14207#discussion_r71472473
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala
 ---
@@ -95,17 +95,41 @@ case class CreateDataSourceTableCommand(
     }
 
     // Create the relation to validate the arguments before writing the metadata to the metastore.
-    DataSource(
-      sparkSession = sparkSession,
-      userSpecifiedSchema = userSpecifiedSchema,
-      className = provider,
-      bucketSpec = None,
-      options = optionsWithPath).resolveRelation(checkPathExist = false)
+    val dataSource: BaseRelation =
+      DataSource(
+        sparkSession = sparkSession,
+        userSpecifiedSchema = userSpecifiedSchema,
+        className = provider,
+        bucketSpec = None,
+        options = optionsWithPath).resolveRelation(checkPathExist = false)
+
+    val partitionColumns =
--- End diff --

Sure, will do it. Thanks!





[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14132
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14132
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62577/
Test PASSed.





[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14132
  
**[Test build #62577 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62577/consoleFull)** for PR 14132 at commit [`e8b7bf0`](https://github.com/apache/spark/commit/e8b7bf0f3d88986048cd586ccc13209ee1611cd7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #14086: [SPARK-16463][SQL] Support `truncate` option in O...

2016-07-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14086#discussion_r71472087
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -419,8 +422,13 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       }
 
       if (mode == SaveMode.Overwrite && tableExists) {
-        JdbcUtils.dropTable(conn, table)
-        tableExists = false
+        if (extraOptions.getOrElse("truncate", "false").toBoolean &&
+            JdbcUtils.isCascadingTruncateTable(url) == Some(false)) {
+          JdbcUtils.truncateTable(conn, table)
--- End diff --

Nope, dropping index does not make sense here. 





[GitHub] spark pull request #14086: [SPARK-16463][SQL] Support `truncate` option in O...

2016-07-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14086#discussion_r71471996
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -419,8 +422,13 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       }
 
       if (mode == SaveMode.Overwrite && tableExists) {
-        JdbcUtils.dropTable(conn, table)
-        tableExists = false
+        if (extraOptions.getOrElse("truncate", "false").toBoolean &&
+            JdbcUtils.isCascadingTruncateTable(url) == Some(false)) {
+          JdbcUtils.truncateTable(conn, table)
--- End diff --

- Drop, Create, and Insert: Create and Insert could fail, but we still drop the table.
- Truncate and Insert: Insert could fail, but we always truncate the table.

I think it is OK to raise an exception here, but check whether the exception message is meaningful or not.
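The failure modes above can be modeled with a small decision function mirroring the condition in the diff under discussion. The option and dialect inputs correspond to `extraOptions` and `JdbcUtils.isCascadingTruncateTable`; everything else is a hypothetical stand-in, and no database is needed:

```scala
// Model of the Overwrite decision: truncate only when the "truncate" writer
// option is set AND the dialect is known not to cascade the TRUNCATE.
sealed trait OverwriteAction
case object TruncateAndInsert extends OverwriteAction   // keeps table metadata (indexes, grants)
case object DropCreateAndInsert extends OverwriteAction // loses the table if a later step fails

def chooseAction(
    truncateOption: Boolean,           // the "truncate" writer option, default false
    cascadingTruncate: Option[Boolean] // does TRUNCATE cascade? None = unknown dialect
): OverwriteAction =
  if (truncateOption && cascadingTruncate == Some(false)) TruncateAndInsert
  else DropCreateAndInsert
```

Only the safe case (truncate requested and known non-cascading) avoids the drop; an unknown dialect falls back to drop-and-recreate.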





[GitHub] spark pull request #14207: [SPARK-16552] [SQL] Store the Inferred Schemas in...

2016-07-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14207#discussion_r71471942
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala 
---
@@ -252,6 +252,165 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
     }
   }
 
+  test("Create partitioned data source table with partitioning columns but no schema") {
+    import testImplicits._
+
+    withTempPath { dir =>
+      val pathToPartitionedTable = new File(dir, "partitioned")
+      val df = sparkContext.parallelize(1 to 10).map(i => (i, i.toString)).toDF("num", "str")
+      df.write.format("parquet").partitionBy("num").save(pathToPartitionedTable.getCanonicalPath)
+      val tabName = "tab1"
+      withTable(tabName) {
+        spark.sql(
+          s"""
+             |CREATE TABLE $tabName
+             |USING parquet
+             |OPTIONS (
+             |  path '$pathToPartitionedTable'
+             |)
+             |PARTITIONED BY (inexistentColumns)
+           """.stripMargin)
+        val tableMetadata = spark.sessionState.catalog.getTableMetadata(TableIdentifier(tabName))
--- End diff --

Sure, will do





[GitHub] spark pull request #14240: [SPARK-16594] [SQL] Remove Physical Plan Differen...

2016-07-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14240#discussion_r71471878
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/sources/PrunedScanSuite.scala ---
@@ -114,16 +114,15 @@ class PrunedScanSuite extends DataSourceTest with SharedSQLContext {
   testPruning("SELECT * FROM oneToTenPruned", "a", "b")
   testPruning("SELECT a, b FROM oneToTenPruned", "a", "b")
   testPruning("SELECT b, a FROM oneToTenPruned", "b", "a")
-  testPruning("SELECT b, b FROM oneToTenPruned", "b")
+  testPruning("SELECT b, b FROM oneToTenPruned", "b", "b")
+  testPruning("SELECT b as alias_b, b FROM oneToTenPruned", "b")
--- End diff --

Yeah!





[GitHub] spark pull request #14240: [SPARK-16594] [SQL] Remove Physical Plan Differen...

2016-07-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14240#discussion_r71471414
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/sources/PrunedScanSuite.scala ---
@@ -114,16 +114,15 @@ class PrunedScanSuite extends DataSourceTest with SharedSQLContext {
   testPruning("SELECT * FROM oneToTenPruned", "a", "b")
   testPruning("SELECT a, b FROM oneToTenPruned", "a", "b")
   testPruning("SELECT b, a FROM oneToTenPruned", "b", "a")
-  testPruning("SELECT b, b FROM oneToTenPruned", "b")
+  testPruning("SELECT b, b FROM oneToTenPruned", "b", "b")
+  testPruning("SELECT b as alias_b, b FROM oneToTenPruned", "b")
--- End diff --

so `SELECT b, b FROM oneToTenPruned` will return 2 columns and `SELECT b as alias_b, b FROM oneToTenPruned` only returns one column?





[GitHub] spark pull request #14086: [SPARK-16463][SQL] Support `truncate` option in O...

2016-07-19 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14086#discussion_r71471142
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -419,8 +422,13 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       }
 
       if (mode == SaveMode.Overwrite && tableExists) {
-        JdbcUtils.dropTable(conn, table)
-        tableExists = false
+        if (extraOptions.getOrElse("truncate", "false").toBoolean &&
+            JdbcUtils.isCascadingTruncateTable(url) == Some(false)) {
+          JdbcUtils.truncateTable(conn, table)
--- End diff --

Sure. I'll update the document and PR description more clearly.

Thank you for guidance, @rxin and @gatorsmile .





[GitHub] spark pull request #14207: [SPARK-16552] [SQL] Store the Inferred Schemas in...

2016-07-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14207#discussion_r71471136
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala
 ---
@@ -95,17 +95,41 @@ case class CreateDataSourceTableCommand(
     }
 
     // Create the relation to validate the arguments before writing the metadata to the metastore.
-    DataSource(
-      sparkSession = sparkSession,
-      userSpecifiedSchema = userSpecifiedSchema,
-      className = provider,
-      bucketSpec = None,
-      options = optionsWithPath).resolveRelation(checkPathExist = false)
+    val dataSource: BaseRelation =
+      DataSource(
+        sparkSession = sparkSession,
+        userSpecifiedSchema = userSpecifiedSchema,
+        className = provider,
+        bucketSpec = None,
+        options = optionsWithPath).resolveRelation(checkPathExist = false)
+
+    val partitionColumns =
--- End diff --

IIUC, the logic should be: if the schema is specified, use the given partition columns; otherwise, infer them. Maybe it's clearer to write:
```
val partitionColumns = if (userSpecifiedSchema.isEmpty) {
  if (userSpecifiedPartitionColumns.length > 0) {
...
  }
  dataSource match {
...
  }
} else {
  userSpecifiedPartitionColumns
}
```
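A self-contained sketch of that control flow (hypothetical helper shape; the elided `...` branches in the suggestion are summarized here, and none of this is Spark's actual code):

```scala
// Hypothetical model of the suggested structure: trust user-given partition
// columns only when a schema was also given; otherwise infer them. The
// by-name `inferred` parameter stands in for the `dataSource match { ... }`
// inference branch and is only evaluated when needed.
def resolvePartitionColumns(
    userSpecifiedSchema: Option[String],
    userSpecifiedPartitionColumns: Seq[String],
    inferred: => Seq[String]): Seq[String] =
  if (userSpecifiedSchema.isEmpty) {
    // without a schema the user-given columns cannot be validated;
    // the real code would warn or fail here before falling back to inference
    inferred
  } else {
    userSpecifiedPartitionColumns
  }
```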





[GitHub] spark issue #14098: [WIP][SPARK-16380][SQL][Example]:Update SQL examples and...

2016-07-19 Thread wangmiao1981
Github user wangmiao1981 commented on the issue:

https://github.com/apache/spark/pull/14098
  
@liancheng Sorry for the late reply. I was on vacation for the last few days.

I have addressed most of your comments. Only the .md file is not updated yet.

By the way, I am trying to make the Hive example work, but I still cannot get it to work. Any suggestions? I found that the PySpark SQL example differs from the corresponding Scala Hive example.

Thanks!

Miao





[GitHub] spark pull request #14086: [SPARK-16463][SQL] Support `truncate` option in O...

2016-07-19 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14086#discussion_r71471006
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -419,8 +422,13 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       }
 
       if (mode == SaveMode.Overwrite && tableExists) {
-        JdbcUtils.dropTable(conn, table)
-        tableExists = false
+        if (extraOptions.getOrElse("truncate", "false").toBoolean &&
+            JdbcUtils.isCascadingTruncateTable(url) == Some(false)) {
+          JdbcUtils.truncateTable(conn, table)
--- End diff --

For my understanding, let me ask one question.

Literally, we should not do everything that we do with DROP, e.g., we should not drop the INDEX, right?





[GitHub] spark pull request #14207: [SPARK-16552] [SQL] Store the Inferred Schemas in...

2016-07-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14207#discussion_r71470908
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -522,31 +522,31 @@ object DDLUtils {
     table.partitionColumns.nonEmpty || table.properties.contains(DATASOURCE_SCHEMA_NUMPARTCOLS)
   }
 
-  // A persisted data source table may not store its schema in the catalog. In this case, its schema
-  // will be inferred at runtime when the table is referenced.
-  def getSchemaFromTableProperties(metadata: CatalogTable): Option[StructType] = {
+  // A persisted data source table always store its schema in the catalog.
+  def getSchemaFromTableProperties(metadata: CatalogTable): StructType = {
     require(isDatasourceTable(metadata))
+    val msgSchemaCorrupted = "Could not read schema from the metastore because it is corrupted."
     val props = metadata.properties
     if (props.isDefinedAt(DATASOURCE_SCHEMA)) {
--- End diff --

Sure, let me change it. Thanks!





[GitHub] spark pull request #14207: [SPARK-16552] [SQL] Store the Inferred Schemas in...

2016-07-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14207#discussion_r71470907
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -522,31 +522,31 @@ object DDLUtils {
     table.partitionColumns.nonEmpty || table.properties.contains(DATASOURCE_SCHEMA_NUMPARTCOLS)
   }
 
-  // A persisted data source table may not store its schema in the catalog. In this case, its schema
-  // will be inferred at runtime when the table is referenced.
-  def getSchemaFromTableProperties(metadata: CatalogTable): Option[StructType] = {
+  // A persisted data source table always store its schema in the catalog.
+  def getSchemaFromTableProperties(metadata: CatalogTable): StructType = {
     require(isDatasourceTable(metadata))
+    val msgSchemaCorrupted = "Could not read schema from the metastore because it is corrupted."
     val props = metadata.properties
     if (props.isDefinedAt(DATASOURCE_SCHEMA)) {
--- End diff --

Sure, let me change it. Thanks!





[GitHub] spark pull request #14086: [SPARK-16463][SQL] Support `truncate` option in O...

2016-07-19 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14086#discussion_r71470873
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -419,8 +422,13 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       }
 
       if (mode == SaveMode.Overwrite && tableExists) {
-        JdbcUtils.dropTable(conn, table)
-        tableExists = false
+        if (extraOptions.getOrElse("truncate", "false").toBoolean &&
+            JdbcUtils.isCascadingTruncateTable(url) == Some(false)) {
+          JdbcUtils.truncateTable(conn, table)
--- End diff --

Currently, it raises an exception if one of the column names is different.
For a different column type with the same column name, it works like a `SaveMode.Append` operation.
I was weighing the trade-off between DROP and TRUNCATE.

Let me think about the decision point.





[GitHub] spark pull request #14086: [SPARK-16463][SQL] Support `truncate` option in O...

2016-07-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14086#discussion_r71470834
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -419,8 +422,13 @@ final class DataFrameWriter[T] private[sql](ds: 
Dataset[T]) {
   }
 
   if (mode == SaveMode.Overwrite && tableExists) {
-JdbcUtils.dropTable(conn, table)
-tableExists = false
+if (extraOptions.getOrElse("truncate", "false").toBoolean &&
+JdbcUtils.isCascadingTruncateTable(url) == Some(false)) {
+  JdbcUtils.truncateTable(conn, table)
--- End diff --

I see. Then, the current implementation looks good to me. 

@dongjoon-hyun Could you summarize the previous discussion and design 
decision we made? Document them in the PR description. Thanks!





[GitHub] spark pull request #14207: [SPARK-16552] [SQL] Store the Inferred Schemas in...

2016-07-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14207#discussion_r71470616
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala 
---
@@ -252,6 +252,165 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
     }
   }
 
+  test("Create partitioned data source table with partitioning columns but no schema") {
+    import testImplicits._
+
+    withTempPath { dir =>
+      val pathToPartitionedTable = new File(dir, "partitioned")
+      val df = sparkContext.parallelize(1 to 10).map(i => (i, i.toString)).toDF("num", "str")
+      df.write.format("parquet").partitionBy("num").save(pathToPartitionedTable.getCanonicalPath)
+      val tabName = "tab1"
+      withTable(tabName) {
+        spark.sql(
+          s"""
+             |CREATE TABLE $tabName
+             |USING parquet
+             |OPTIONS (
+             |  path '$pathToPartitionedTable'
+             |)
+             |PARTITIONED BY (inexistentColumns)
+           """.stripMargin)
+        val tableMetadata = spark.sessionState.catalog.getTableMetadata(TableIdentifier(tabName))
--- End diff --

We can abstract the common logic into some methods to remove a bit of the duplicated code.





[GitHub] spark pull request #14086: [SPARK-16463][SQL] Support `truncate` option in O...

2016-07-19 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14086#discussion_r71470476
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -419,8 +422,13 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       }
 
       if (mode == SaveMode.Overwrite && tableExists) {
-        JdbcUtils.dropTable(conn, table)
-        tableExists = false
+        if (extraOptions.getOrElse("truncate", "false").toBoolean &&
+            JdbcUtils.isCascadingTruncateTable(url) == Some(false)) {
+          JdbcUtils.truncateTable(conn, table)
--- End diff --

First of all, it will raise an exception if one of the column names is different.
For a different column type with the same column name, it will work like a `SaveMode.Append` operation.





[GitHub] spark pull request #14086: [SPARK-16463][SQL] Support `truncate` option in O...

2016-07-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/14086#discussion_r71470367
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -419,8 +422,13 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       }
 
       if (mode == SaveMode.Overwrite && tableExists) {
-        JdbcUtils.dropTable(conn, table)
-        tableExists = false
+        if (extraOptions.getOrElse("truncate", "false").toBoolean &&
+            JdbcUtils.isCascadingTruncateTable(url) == Some(false)) {
+          JdbcUtils.truncateTable(conn, table)
--- End diff --

We should do whatever we do with drop.





[GitHub] spark pull request #14207: [SPARK-16552] [SQL] Store the Inferred Schemas in...

2016-07-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14207#discussion_r71470370
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -522,31 +522,31 @@ object DDLUtils {
     table.partitionColumns.nonEmpty || table.properties.contains(DATASOURCE_SCHEMA_NUMPARTCOLS)
   }
 
-  // A persisted data source table may not store its schema in the catalog. In this case, its schema
-  // will be inferred at runtime when the table is referenced.
-  def getSchemaFromTableProperties(metadata: CatalogTable): Option[StructType] = {
+  // A persisted data source table always store its schema in the catalog.
+  def getSchemaFromTableProperties(metadata: CatalogTable): StructType = {
     require(isDatasourceTable(metadata))
+    val msgSchemaCorrupted = "Could not read schema from the metastore because it is corrupted."
     val props = metadata.properties
     if (props.isDefinedAt(DATASOURCE_SCHEMA)) {
--- End diff --

how about
```
props.get(DATASOURCE_SCHEMA).map { schema =>
  // 
  DataType.fromJson(schema).asInstanceOf[StructType]
}.getOrElse {
  props.get(DATASOURCE_SCHEMA_NUMPARTS).map {

  }.getOrElse(throw ...)
}
```
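Filled in, the suggested structure might look like the sketch below. The part-reassembly branch is my reading of the surrounding code (the schema is split across numbered part properties when it is too long for a single metastore property value); treat the `DATASOURCE_SCHEMA_PART_PREFIX` key format and the exception type as assumptions:

```scala
// Sketch only: assumes the constants and msgSchemaCorrupted from the
// surrounding DDLUtils code are in scope.
props.get(DATASOURCE_SCHEMA).map { schema =>
  // The whole schema fits in one property: parse the JSON string directly.
  DataType.fromJson(schema).asInstanceOf[StructType]
}.getOrElse {
  props.get(DATASOURCE_SCHEMA_NUMPARTS).map { numParts =>
    // The schema was split into numParts chunks; reassemble before parsing.
    val parts = (0 until numParts.toInt).map { index =>
      props.getOrElse(s"$DATASOURCE_SCHEMA_PART_PREFIX$index",
        throw new AnalysisException(msgSchemaCorrupted))
    }
    DataType.fromJson(parts.mkString).asInstanceOf[StructType]
  }.getOrElse(throw new AnalysisException(msgSchemaCorrupted))
}
```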





[GitHub] spark pull request #14086: [SPARK-16463][SQL] Support `truncate` option in O...

2016-07-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14086#discussion_r71469619
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -419,8 +422,13 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       }
 
       if (mode == SaveMode.Overwrite && tableExists) {
-        JdbcUtils.dropTable(conn, table)
-        tableExists = false
+        if (extraOptions.getOrElse("truncate", "false").toBoolean &&
+            JdbcUtils.isCascadingTruncateTable(url) == Some(false)) {
+          JdbcUtils.truncateTable(conn, table)
--- End diff --

: ) Sure. Then, the next design question to @rxin and @srowen 

Should we still truncate the table if its schema does not match the schema 
of the new data? 





[GitHub] spark issue #14045: [SPARK-16362][SQL] Support ArrayType and StructType in v...

2016-07-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14045
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62576/
Test PASSed.





[GitHub] spark issue #14045: [SPARK-16362][SQL] Support ArrayType and StructType in v...

2016-07-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14045
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14272: [SPARK-16632][sql] Respect Hive schema when merging parq...

2016-07-19 Thread yhuai
Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/14272
  
Yea, I think the fix is pretty safe. After discussing with @liancheng, it 
seems the more general fix is just to use the requested Catalyst schema to 
initialize the vectorized reader.





[GitHub] spark issue #14045: [SPARK-16362][SQL] Support ArrayType and StructType in v...

2016-07-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14045
  
**[Test build #62576 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62576/consoleFull)**
 for PR 14045 at commit 
[`cc35cab`](https://github.com/apache/spark/commit/cc35cabac105b3778c26afc22ac4f4ca1b295585).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14272: [SPARK-16632][sql] Respect Hive schema when merging parq...

2016-07-19 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/14272
  
Discussed with @yhuai, I'm also merging this to branch-2.0.

@vanzin Thanks for fixing this!





[GitHub] spark pull request #14086: [SPARK-16463][SQL] Support `truncate` option in O...

2016-07-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/14086#discussion_r71468781
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -419,8 +422,13 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       }
 
       if (mode == SaveMode.Overwrite && tableExists) {
-        JdbcUtils.dropTable(conn, table)
-        tableExists = false
+        if (extraOptions.getOrElse("truncate", "false").toBoolean &&
+            JdbcUtils.isCascadingTruncateTable(url) == Some(false)) {
+          JdbcUtils.truncateTable(conn, table)
--- End diff --

I'd say no, because the user has explicitly specified truncate. They can 
turn it off themselves.






[GitHub] spark pull request #14086: [SPARK-16463][SQL] Support `truncate` option in O...

2016-07-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14086#discussion_r71468419
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -419,8 +422,13 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       }
 
       if (mode == SaveMode.Overwrite && tableExists) {
-        JdbcUtils.dropTable(conn, table)
-        tableExists = false
+        if (extraOptions.getOrElse("truncate", "false").toBoolean &&
+            JdbcUtils.isCascadingTruncateTable(url) == Some(false)) {
+          JdbcUtils.truncateTable(conn, table)
--- End diff --

If `truncateTable` fails due to a non-fatal exception, should we fall back 
to the previous way (i.e., drop and re-create)? This is a design decision. CC 
@srowen @rxin 





[GitHub] spark pull request #14264: [SPARK-11976][SPARKR] Support "." character in Da...

2016-07-19 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14264#discussion_r71467294
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
@@ -201,6 +201,8 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging {
       attribute: Attribute): Option[(Attribute, List[String])] = {
     if (!attribute.isGenerated && resolver(attribute.name, nameParts.head)) {
       Option((attribute.withName(nameParts.head), nameParts.tail.toList))
+    } else if (!attribute.isGenerated && resolver(attribute.name, nameParts.mkString("."))) {
+      Option((attribute.withName(nameParts.mkString(".")), Nil))
--- End diff --

Hi, I'm just curious. Is this okay for the other Spark modules?
> Different from resolveAsTableColumn, this assumes `name` does NOT start 
with a qualifier.
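To illustrate the case this change targets, a sketch of a DataFrame whose column name itself contains a dot (the data is made up):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("dotted-name-sketch").getOrCreate()
import spark.implicits._

// "a.b" here is a single column whose name contains a dot, not a nested
// field b inside a struct a. The resolver change above lets a multi-part
// name fall back to matching the literal dotted column name.
val df = Seq((1, "x"), (2, "y")).toDF("a.b", "c")
df.select(df.col("`a.b`")).show()
```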





[GitHub] spark issue #14207: [SPARK-16552] [SQL] Store the Inferred Schemas into Exte...

2016-07-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14207
  
**[Test build #62581 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62581/consoleFull)**
 for PR 14207 at commit 
[`1ee1743`](https://github.com/apache/spark/commit/1ee1743906b41ffcc182cb8c74b4134bce8a3006).





[GitHub] spark issue #14207: [SPARK-16552] [SQL] Store the Inferred Schemas into Exte...

2016-07-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14207
  
**[Test build #62580 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62580/consoleFull)**
 for PR 14207 at commit 
[`727ecf8`](https://github.com/apache/spark/commit/727ecf87463d6fe02cd29e0bbf3f488c043b1962).





[GitHub] spark issue #14112: [SPARK-16240][ML] Model loading backward compatibility f...

2016-07-19 Thread GayathriMurali
Github user GayathriMurali commented on the issue:

https://github.com/apache/spark/pull/14112
  
@jkbradley Can you please help review this? 





[GitHub] spark issue #14277: [SPARK-16640][SQL] Add codegen for Elt function

2016-07-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14277
  
**[Test build #62579 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62579/consoleFull)**
 for PR 14277 at commit 
[`c517add`](https://github.com/apache/spark/commit/c517addc2a00fca7578b4fcb1f47a7ef6f337e5c).





[GitHub] spark issue #14045: [SPARK-16362][SQL] Support ArrayType and StructType in v...

2016-07-19 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14045
  
ping @liancheng @yhuai @rxin Can you review this? I think we should 
support complex types in vectorization to extend the coverage of the 
performance improvements. Thanks!





[GitHub] spark issue #14277: [SPARK-16640][SQL] Add codegen for Elt function

2016-07-19 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14277
  
cc @cloud-fan 





[GitHub] spark pull request #14277: [SPARK-16640][SQL] Add codegen for Elt function

2016-07-19 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/14277

[SPARK-16640][SQL] Add codegen for Elt function

## What changes were proposed in this pull request?

The Elt function doesn't support codegen execution yet. We should add that 
support.

## How was this patch tested?

Jenkins tests.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 elt-codegen

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14277.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14277


commit c517addc2a00fca7578b4fcb1f47a7ef6f337e5c
Author: Liang-Chi Hsieh 
Date:   2016-07-20T05:06:27Z

Add codegen for Elt function.







[GitHub] spark issue #12054: [SPARK-14262] correct app's state after master leader ch...

2016-07-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12054
  
Can one of the admins verify this patch?





[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14132
  
**[Test build #62578 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62578/consoleFull)**
 for PR 14132 at commit 
[`9021975`](https://github.com/apache/spark/commit/9021975a2243153edbfa1d4f760f8fcade760513).





[GitHub] spark pull request #14272: [SPARK-16632][sql] Respect Hive schema when mergi...

2016-07-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14272





[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71465508
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,35 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
--- End diff --

Also, in the PR description, too.





[GitHub] spark issue #14272: [SPARK-16632][sql] Respect Hive schema when merging parq...

2016-07-19 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/14272
  
I'd like to add that, AFAIK, byte and short are the only problematic types 
we didn't handle before this PR. Other Hive-to-Parquet schema conversion 
quirks, such as string (translated into `binary` without the `UTF8` 
annotation) and timestamp (translated into the deprecated `int96`), are 
already worked around in Spark.





[GitHub] spark issue #14272: [SPARK-16632][sql] Respect Hive schema when merging parq...

2016-07-19 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/14272
  
I'm merging this to master.

@yhuai Do we want this in branch-2.0?





[GitHub] spark issue #14272: [SPARK-16632][sql] Respect Hive schema when merging parq...

2016-07-19 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/14272
  
This LGTM. It's a little bit hacky, since technically the fields in the 
requested schema passed to the Parquet record reader may have different 
original types (`INT_8` and `INT_16`) from the actual ones defined in the 
physical file, but fortunately the Parquet record reader doesn't check 
original types.





[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71464803
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1774,6 +1775,35 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the given name parameters.
--- End diff --

Oh, I missed your comment here. It's too far from the bottom now. :)
I'll add the prerequisites and dependency assumptions here now.





[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...

2016-07-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14204
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62575/
Test FAILed.





[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...

2016-07-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14204
  
**[Test build #62575 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62575/consoleFull)**
 for PR 14204 at commit 
[`4d1d47f`](https://github.com/apache/spark/commit/4d1d47fd9c8c4909d182e963c33c064c5bafb3e2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...

2016-07-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14204
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14045: [SPARK-16362][SQL] Support ArrayType and StructType in v...

2016-07-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14045
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62573/
Test PASSed.





[GitHub] spark issue #14045: [SPARK-16362][SQL] Support ArrayType and StructType in v...

2016-07-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14045
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14045: [SPARK-16362][SQL] Support ArrayType and StructType in v...

2016-07-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14045
  
**[Test build #62573 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62573/consoleFull)**
 for PR 14045 at commit 
[`545a57a`](https://github.com/apache/spark/commit/545a57a718484e61cf77653e810ed368e9381266).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71464346
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -356,8 +372,14 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
       }
     }
 
+    val broadcastHint = project match {
+      case p @ Project(projectList, Hint("BROADCAST", tables, child)) =>
+        if (tables.nonEmpty) s"/*+ MAPJOIN(${tables.mkString(", ")}) */" else ""
+      case _ => ""
+    }
     build(
       "SELECT",
+      broadcastHint,
--- End diff --

"SELECT" occurs in the followings. But, I didn't added meaning-logic based 
on the testcases.
- aggregateToSQL
- generateToSQL  => This has only "SELECT 1"
- groupingSetToSQL
- projectToSQL
- windowToSQL

I think the test coverage is enough, since it exercises the above cases. If 
you suggest more test cases, I welcome them. I like robustness, both for this PR and 
for the future.





[GitHub] spark pull request #14276: [WIP][SPARK-16638][ML][Optimizer] fix L2 reg comp...

2016-07-19 Thread WeichenXu123
Github user WeichenXu123 closed the pull request at:

https://github.com/apache/spark/pull/14276





[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14132
  
**[Test build #62577 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62577/consoleFull)**
 for PR 14132 at commit 
[`e8b7bf0`](https://github.com/apache/spark/commit/e8b7bf0f3d88986048cd586ccc13209ee1611cd7).





[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71463733
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -356,8 +372,14 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+val broadcastHint = project match {
+  case p @ Project(projectList, Hint("BROADCAST", tables, child)) =>
+if (tables.nonEmpty) s"/*+ MAPJOIN(${tables.mkString(", ")}) */" 
else ""
+  case _ => ""
+}
 build(
   "SELECT",
+  broadcastHint,
--- End diff --

It's the result of the window test query. For that window query, there were 
nested Projects.
```
test("broadcast hint with window") {
checkSQL(
  """
|SELECT /*+ MAPJOIN(parquet_t1) */
|   x.key, MAX(y.key) OVER (PARTITION BY x.key % 5 ORDER BY 
x.key)
|FROM parquet_t1 x JOIN parquet_t1 y ON x.key = y.key
  """.stripMargin,
  "broadcast_hint_window")
  }
```

I had the same question about why some "SELECT"s don't appear. 

After https://issues.apache.org/jira/browse/SPARK-16576 , I think this kind 
of weirdness will be reduced.





[GitHub] spark issue #14207: [SPARK-16552] [SQL] Store the Inferred Schemas into Exte...

2016-07-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14207
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62574/
Test PASSed.





[GitHub] spark issue #14207: [SPARK-16552] [SQL] Store the Inferred Schemas into Exte...

2016-07-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14207
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14207: [SPARK-16552] [SQL] Store the Inferred Schemas into Exte...

2016-07-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14207
  
**[Test build #62574 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62574/consoleFull)**
 for PR 14207 at commit 
[`e930819`](https://github.com/apache/spark/commit/e93081918b170d3fbd08d992ef251c83af9e433d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71463337
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +449,44 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+/**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside 
the plan, and the
+ * information of BroadcastHint resides its current position inside 
the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information 
of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the 
nearest Project node.
+ */
+object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan 
transformUp {
+// Capture the broadcasted information and store it in Hint.
+case BroadcastHint(child @ SubqueryAlias(_, Project(_, 
SQLTable(database, table, _, _ =>
+  Hint("BROADCAST", Seq(table), child)
+
+// Nearest Project is found.
+case p @ Project(_, Hint(_, _, _)) => p
+
+// Merge BROADCAST hints up to the nearest Project.
+case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) 
=>
+  h.copy(parameters = params1 ++ params2)
+case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ 
Hint("BROADCAST", p2, right), _, _) =>
+  h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right 
= right))
+
+// Bubble up BROADCAST hints to the nearest Project.
+case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
+  h.copy(child = j.copy(left = hintChild))
+case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
+  h.copy(child = j.copy(right = hintChild))
+
+// Other UnaryNodes are bypassed.
+case u: UnaryNode
+  if u.child.isInstanceOf[Hint] && 
u.child.asInstanceOf[Hint].name.equals("BROADCAST") =>
--- End diff --

Sure.





[GitHub] spark issue #14174: [SPARK-16524][SQL] Add RowBatch and RowBasedHashMapGener...

2016-07-19 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14174
  
@ooq Thanks!





[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

2016-07-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13704
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62570/
Test PASSed.





[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

2016-07-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13704
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13704: [SPARK-15985][SQL] Eliminate redundant cast from an arra...

2016-07-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13704
  
**[Test build #62570 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62570/consoleFull)**
 for PR 13704 at commit 
[`e4cd571`](https://github.com/apache/spark/commit/e4cd571bc07a2b8c45580d9ba60f66d5b40b7422).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14240: [SPARK-16594] [SQL] Remove Physical Plan Differences whe...

2016-07-19 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14240
  
@cloud-fan It is not a bug. I prefer to make them consistent. I did a few 
performance tests and found it makes sense to return only one column, apply the 
Filter, and then let the Project generate the two duplicate columns. This should 
be faster when the Filter can remove most of the rows. 

However, this optimization condition `projectSet.size == projects.size` only 
matters in this rare case: `SELECT b, b FROM oneToTenPruned`. It does not 
make sense to write such duplicate columns without specifying an alias; when an 
alias is used, we will always return one column. This PR removes this condition 
instead of adding it into the `Data Source Table Scan`.

Let me know what is your opinion. Thanks!







[GitHub] spark issue #14045: [SPARK-16362][SQL] Support ArrayType and StructType in v...

2016-07-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14045
  
**[Test build #62576 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62576/consoleFull)**
 for PR 14045 at commit 
[`cc35cab`](https://github.com/apache/spark/commit/cc35cabac105b3778c26afc22ac4f4ca1b295585).





[GitHub] spark issue #14174: [SPARK-16524][SQL] Add RowBatch and RowBasedHashMapGener...

2016-07-19 Thread ooq
Github user ooq commented on the issue:

https://github.com/apache/spark/pull/14174
  
Hi @viirya, you can find the benchmark numbers in this PR: 
https://github.com/apache/spark/pull/14266





[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...

2016-07-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14155#discussion_r71461308
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -153,15 +157,113 @@ private[spark] class HiveExternalCatalog(client: 
HiveClient, hadoopConf: Configu
 requireDbExists(db)
 requireDbMatches(db, tableDefinition)
 
-if (
+if (tableDefinition.provider == Some("hive") ||
+tableDefinition.tableType == CatalogTableType.VIEW) {
+  client.createTable(tableDefinition, ignoreIfExists)
+} else {
+  import CreateDataSourceTableUtils._
+
+  val provider = tableDefinition.provider.get
+  val userSpecifiedSchema = tableDefinition.userSpecifiedSchema
+  val partitionColumns = tableDefinition.partitionColumnNames
+
+  val tableProperties = new mutable.HashMap[String, String]
+  tableProperties.put(DATASOURCE_PROVIDER, provider)
+
+  // Saves optional user specified schema.  Serialized JSON schema 
string may be too long to be
+  // stored into a single metastore SerDe property.  In this case, we 
split the JSON string and
+  // store each part as a separate SerDe property.
+  userSpecifiedSchema.foreach { schema =>
+val schemaJsonString = schema.json
+// Split the JSON string.
+val parts = schemaJsonString.grouped(4000).toSeq
--- End diff --

It is related to the limit of VARCHAR:
```
Caused by: ERROR 22001: A truncation error was encountered trying to shrink 
VARCHAR '{"type":"struct","fields":[{"name":"contributors","type":"st&' to 
length 4000.
```
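The split-and-reassemble scheme discussed above can be sketched as follows. This is an illustrative Java sketch only, not Spark's actual implementation; the property names (`schema.numParts`, `schema.part.N`) are hypothetical stand-ins for whatever keys the metastore code uses, and the 4000-character chunk size matches the VARCHAR limit from the error message.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SchemaPartsDemo {
    // Chop a long JSON schema string into fixed-size parts stored as
    // numbered properties, so no single property exceeds the VARCHAR limit.
    static Map<String, String> split(String json, int partSize) {
        Map<String, String> props = new LinkedHashMap<>();
        int numParts = Math.max(1, (json.length() + partSize - 1) / partSize);
        props.put("schema.numParts", Integer.toString(numParts));
        for (int i = 0; i < numParts; i++) {
            int end = Math.min(json.length(), (i + 1) * partSize);
            props.put("schema.part." + i, json.substring(i * partSize, end));
        }
        return props;
    }

    // Reassemble the original string by concatenating the parts in order.
    static String join(Map<String, String> props) {
        int numParts = Integer.parseInt(props.get("schema.numParts"));
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < numParts; i++) {
            sb.append(props.get("schema.part." + i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String json = "x".repeat(9001);          // longer than two 4000-char parts
        Map<String, String> props = split(json, 4000);
        System.out.println(props.size());        // numParts key + 3 parts = 4
        System.out.println(join(props).equals(json));  // true
    }
}
```

In Scala this is essentially what `schemaJsonString.grouped(4000).toSeq` does in the diff above, with each chunk written to its own SerDe property.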





[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...

2016-07-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14155#discussion_r71461226
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -153,15 +157,113 @@ private[spark] class HiveExternalCatalog(client: 
HiveClient, hadoopConf: Configu
 requireDbExists(db)
 requireDbMatches(db, tableDefinition)
 
-if (
+if (tableDefinition.provider == Some("hive") ||
+tableDefinition.tableType == CatalogTableType.VIEW) {
+  client.createTable(tableDefinition, ignoreIfExists)
+} else {
+  import CreateDataSourceTableUtils._
+
+  val provider = tableDefinition.provider.get
+  val userSpecifiedSchema = tableDefinition.userSpecifiedSchema
+  val partitionColumns = tableDefinition.partitionColumnNames
+
+  val tableProperties = new mutable.HashMap[String, String]
+  tableProperties.put(DATASOURCE_PROVIDER, provider)
+
+  // Saves optional user specified schema.  Serialized JSON schema 
string may be too long to be
+  // stored into a single metastore SerDe property.  In this case, we 
split the JSON string and
+  // store each part as a separate SerDe property.
+  userSpecifiedSchema.foreach { schema =>
+val schemaJsonString = schema.json
+// Split the JSON string.
+val parts = schemaJsonString.grouped(4000).toSeq
--- End diff --

Found the original PR for this config: 
https://github.com/apache/spark/pull/4795





[GitHub] spark pull request #14207: [SPARK-16552] [SQL] Store the Inferred Schemas in...

2016-07-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14207#discussion_r71460325
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -518,6 +510,19 @@ case class DescribeTableCommand(table: 
TableIdentifier, isExtended: Boolean, isF
 }
   }
 
+  private def describeSchema(
+  tableDesc: CatalogTable,
+  buffer: ArrayBuffer[Row]): Unit = {
+if (DDLUtils.isDatasourceTable(tableDesc)) {
+  DDLUtils.getSchemaFromTableProperties(tableDesc) match {
--- End diff --

Sure, will do. 





[GitHub] spark pull request #14207: [SPARK-16552] [SQL] Store the Inferred Schemas in...

2016-07-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14207#discussion_r71460333
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala 
---
@@ -252,6 +252,115 @@ class DDLSuite extends QueryTest with 
SharedSQLContext with BeforeAndAfterEach {
 }
   }
 
+  test("Create data source table with partitioning columns but no schema") 
{
+import testImplicits._
+
+val tabName = "tab1"
+withTempPath { dir =>
+  val pathToPartitionedTable = new File(dir, "partitioned")
+  val pathToNonPartitionedTable = new File(dir, "nonPartitioned")
+  val df = sparkContext.parallelize(1 to 10).map(i => (i, 
i.toString)).toDF("num", "str")
+  
df.write.format("parquet").save(pathToNonPartitionedTable.getCanonicalPath)
+  
df.write.format("parquet").partitionBy("num").save(pathToPartitionedTable.getCanonicalPath)
+
+  Seq(pathToPartitionedTable, pathToNonPartitionedTable).foreach { 
path =>
+withTable(tabName) {
+  spark.sql(
+s"""
+   |CREATE TABLE $tabName
+   |USING parquet
+   |OPTIONS (
+   |  path '$path'
+   |)
+   |PARTITIONED BY (inexistentColumns)
+ """.stripMargin)
+  val catalog = spark.sessionState.catalog
+  val tableMetadata = 
catalog.getTableMetadata(TableIdentifier(tabName))
+
+  val tableSchema = 
DDLUtils.getSchemaFromTableProperties(tableMetadata)
+  assert(tableSchema.nonEmpty, "the schema of data source tables 
are always recorded")
+  val partCols = 
DDLUtils.getPartitionColumnsFromTableProperties(tableMetadata)
+
+  if (tableMetadata.storage.serdeProperties.get("path") ==
--- End diff --

Ok, no problem





[GitHub] spark issue #14259: [SPARK-16622][SQL] Fix NullPointerException when the ret...

2016-07-19 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14259
  
Using the added test case as an example, the generated Java code that accesses the 
second element of a Tuple2 `(false, null)` throws `NullPointerException`:

int value = isNull1? -1 : (Integer) obj._2();

Assigning a null to an `int` (via auto-unboxing) causes the `NullPointerException`, 
but `isNull1` only checks whether `obj` itself is null.
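That failure mode can be reproduced in a self-contained sketch. This is plain Java, not the actual generated code: an `Object[]` stands in for the Scala Tuple2, and `isNull1` mirrors the generated null check, which guards the tuple itself rather than its element.

```java
public class UnboxNpeDemo {
    public static void main(String[] args) {
        Object[] obj = { false, null };     // like Tuple2(false, null)
        boolean isNull1 = (obj == null);    // false: the tuple itself is not null
        try {
            // Auto-unboxing the null Integer into a primitive int throws NPE,
            // because the null check never looked at the element.
            int value = isNull1 ? -1 : (Integer) obj[1];
            System.out.println(value);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException");  // this branch runs
        }
    }
}
```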











[GitHub] spark issue #14276: [SPARK-16638][ML][Optimizer] fix L2 reg computation in l...

2016-07-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14276
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62571/
Test FAILed.





[GitHub] spark issue #14276: [SPARK-16638][ML][Optimizer] fix L2 reg computation in l...

2016-07-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14276
  
**[Test build #62571 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62571/consoleFull)**
 for PR 14276 at commit 
[`9d4f7a8`](https://github.com/apache/spark/commit/9d4f7a8cf20bcd1f6ede46097406f235f3581b3b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14276: [SPARK-16638][ML][Optimizer] fix L2 reg computation in l...

2016-07-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14276
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14276: [SPARK-16638][ML][Optimizer] fix L2 reg computation in l...

2016-07-19 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/14276
  
cc @srowen Thanks!





[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...

2016-07-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14204
  
**[Test build #62575 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62575/consoleFull)**
 for PR 14204 at commit 
[`4d1d47f`](https://github.com/apache/spark/commit/4d1d47fd9c8c4909d182e963c33c064c5bafb3e2).





[GitHub] spark pull request #14207: [SPARK-16552] [SQL] Store the Inferred Schemas in...

2016-07-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14207#discussion_r71458430
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -518,6 +510,19 @@ case class DescribeTableCommand(table: 
TableIdentifier, isExtended: Boolean, isF
 }
   }
 
+  private def describeSchema(
+  tableDesc: CatalogTable,
+  buffer: ArrayBuffer[Row]): Unit = {
+if (DDLUtils.isDatasourceTable(tableDesc)) {
+  DDLUtils.getSchemaFromTableProperties(tableDesc) match {
--- End diff --

Can we make `DDLUtils.getSchemaFromTableProperties` always return a schema 
and throw an exception if it's corrupted? I think that's more consistent with the 
previous behaviour, i.e. throwing an exception if the expected schema properties 
don't exist.
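The API shape suggested here can be sketched in a few lines. This is a hedged illustration only, not Spark's actual `DDLUtils` code: the method and property names below (`getSchemaFromTableProperties`, `schema.json`) are hypothetical, and the point is simply fail-fast lookup instead of an `Option` that callers must pattern-match.

```java
import java.util.Map;

public class SchemaLookupDemo {
    // Return the stored schema, or fail fast when the table properties are
    // missing or corrupted, instead of returning an optional value.
    static String getSchemaFromTableProperties(Map<String, String> props) {
        String json = props.get("schema.json");  // hypothetical property key
        if (json == null) {
            throw new IllegalStateException(
                "Could not read schema from table properties: key missing or corrupted");
        }
        return json;
    }

    public static void main(String[] args) {
        // Present: returned directly.
        System.out.println(getSchemaFromTableProperties(
            Map.of("schema.json", "{\"type\":\"struct\"}")));
        // Absent: surfaces as an exception at the call site.
        try {
            getSchemaFromTableProperties(Map.of());
        } catch (IllegalStateException e) {
            System.out.println("threw as expected");
        }
    }
}
```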





[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...

2016-07-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14204
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...

2016-07-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14204
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62572/
Test FAILed.





[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...

2016-07-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14204
  
**[Test build #62572 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62572/consoleFull)**
 for PR 14204 at commit 
[`f2ab3a3`](https://github.com/apache/spark/commit/f2ab3a35fee03b178e92fd1e2a5fa3763746ff96).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14132
  
Yeah, just answered on the JIRA. Thanks!





[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71457670
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +449,44 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
   }
 }
 
+/**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside the plan, and the
+ * information of BroadcastHint resides at its current position inside the plan. In order to
+ * reconstruct the broadcast hint comment, we need to pack the information of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the nearest Project node.
+ */
+object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+    // Capture the broadcasted information and store it in Hint.
+    case BroadcastHint(child @ SubqueryAlias(_, Project(_, SQLTable(database, table, _, _)))) =>
+      Hint("BROADCAST", Seq(table), child)
+
+    // Nearest Project is found.
+    case p @ Project(_, Hint(_, _, _)) => p
+
+    // Merge BROADCAST hints up to the nearest Project.
+    case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) =>
+      h.copy(parameters = params1 ++ params2)
+    case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ Hint("BROADCAST", p2, right), _, _) =>
+      h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right = right))
+
+    // Bubble up BROADCAST hints to the nearest Project.
+    case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
+      h.copy(child = j.copy(left = hintChild))
+    case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
+      h.copy(child = j.copy(right = hintChild))
+
+    // Other UnaryNodes are bypassed.
+    case u: UnaryNode
+      if u.child.isInstanceOf[Hint] && u.child.asInstanceOf[Hint].name.equals("BROADCAST") =>
--- End diff --

uh, yeah! please add two more spaces before `if`





[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71457606
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -356,8 +372,14 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends Logging {
   }
 }
 
+    val broadcastHint = project match {
+      case p @ Project(projectList, Hint("BROADCAST", tables, child)) =>
+        if (tables.nonEmpty) s"/*+ MAPJOIN(${tables.mkString(", ")}) */" else ""
+      case _ => ""
+    }
     build(
       "SELECT",
+      broadcastHint,
--- End diff --

Could you please do more investigation on this? The current solution does not look clean to me.
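
For readers following along, the hint-comment string under discussion can be exercised in isolation. This is a minimal standalone sketch mirroring the pattern in the diff above (the helper name is illustrative, not Spark's):

```scala
// Minimal sketch of the MAPJOIN hint-comment construction from the diff
// above, extracted as a standalone function for illustration.
def mapJoinHint(tables: Seq[String]): String =
  if (tables.nonEmpty) s"/*+ MAPJOIN(${tables.mkString(", ")}) */" else ""

println(mapJoinHint(Seq("t1", "t2")))  // /*+ MAPJOIN(t1, t2) */
println(mapJoinHint(Nil).isEmpty)      // true: no hint, empty string
```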





[GitHub] spark issue #14207: [SPARK-16552] [SQL] Store the Inferred Schemas into Exte...

2016-07-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14207
  
**[Test build #62574 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62574/consoleFull)** for PR 14207 at commit [`e930819`](https://github.com/apache/spark/commit/e93081918b170d3fbd08d992ef251c83af9e433d).





[GitHub] spark pull request #14207: [SPARK-16552] [SQL] Store the Inferred Schemas in...

2016-07-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14207#discussion_r71457330
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -518,6 +510,19 @@ case class DescribeTableCommand(table: TableIdentifier, isExtended: Boolean, isF
 }
   }
 
+  private def describeSchema(
+      tableDesc: CatalogTable,
+      buffer: ArrayBuffer[Row]): Unit = {
+    if (DDLUtils.isDatasourceTable(tableDesc)) {
+      DDLUtils.getSchemaFromTableProperties(tableDesc) match {
--- End diff --

Now, the message is changed to `"# Schema of this table is corrupted"`





[GitHub] spark pull request #14207: [SPARK-16552] [SQL] Store the Inferred Schemas in...

2016-07-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14207#discussion_r71457323
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -252,6 +252,115 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
 }
   }
 
+  test("Create data source table with partitioning columns but no schema") {
+    import testImplicits._
+
+    val tabName = "tab1"
+    withTempPath { dir =>
+      val pathToPartitionedTable = new File(dir, "partitioned")
+      val pathToNonPartitionedTable = new File(dir, "nonPartitioned")
+      val df = sparkContext.parallelize(1 to 10).map(i => (i, i.toString)).toDF("num", "str")
+      df.write.format("parquet").save(pathToNonPartitionedTable.getCanonicalPath)
+      df.write.format("parquet").partitionBy("num").save(pathToPartitionedTable.getCanonicalPath)
+
+      Seq(pathToPartitionedTable, pathToNonPartitionedTable).foreach { path =>
+        withTable(tabName) {
+          spark.sql(
+            s"""
+               |CREATE TABLE $tabName
+               |USING parquet
+               |OPTIONS (
+               |  path '$path'
+               |)
+               |PARTITIONED BY (inexistentColumns)
+             """.stripMargin)
+          val catalog = spark.sessionState.catalog
+          val tableMetadata = catalog.getTableMetadata(TableIdentifier(tabName))
+
+          val tableSchema = DDLUtils.getSchemaFromTableProperties(tableMetadata)
+          assert(tableSchema.nonEmpty, "the schema of data source tables are always recorded")
+          val partCols = DDLUtils.getPartitionColumnsFromTableProperties(tableMetadata)
+
+          if (tableMetadata.storage.serdeProperties.get("path") ==
--- End diff --

hmmm, can we separate it into 2 cases instead of doing `Seq(...).foreach`?
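One way the suggestion could look (a sketch with illustrative names, not the final code): hoist the shared body of the `foreach` into a helper and register two explicit test cases. A stub `test` harness stands in for ScalaTest here so the sketch is self-contained.

```scala
// Sketch of the reviewer's suggestion: replace Seq(...).foreach with two
// explicit test cases sharing one helper. Names are illustrative; a stub
// `test` function stands in for ScalaTest's registration.
import scala.collection.mutable.ArrayBuffer

val ran = ArrayBuffer[String]()
def test(name: String)(body: => Unit): Unit = { body; ran += name }

def checkNoSchemaTable(path: String, partitioned: Boolean): Unit = {
  // Shared body: CREATE TABLE ... USING parquet OPTIONS (path '$path'),
  // then assert on the schema / partition columns from table properties.
  assert(path.nonEmpty)
}

test("partitioning columns but no schema - partitioned table") {
  checkNoSchemaTable("/tmp/partitioned", partitioned = true)
}
test("partitioning columns but no schema - non-partitioned table") {
  checkNoSchemaTable("/tmp/nonPartitioned", partitioned = false)
}

println(ran.size)  // 2
```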





[GitHub] spark issue #14259: [SPARK-16622][SQL] Fix NullPointerException when the ret...

2016-07-19 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14259
  
> When the returned value is null, NullPointerException will be thrown.

Can you explain a bit more about this? Why can't a method return null?
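
As a side note for readers, one common way a null return value bites in Scala (a general illustration, not the PR's actual `Invoke` code path) is unboxing: returning null for a boxed type is legal, but coercing that null to a primitive throws.

```scala
// General illustration (not the PR's code): a method may legally return
// null for a boxed type, but unboxing that null to a primitive throws
// a NullPointerException at the call site.
def lookup(key: String): Integer = null  // boxed Integer: null is allowed

val boxed: Integer = lookup("missing")   // fine: the reference is just null

val npeThrown: Boolean =
  try { val unboxed: Int = lookup("missing"); false }  // Integer -> Int unboxes
  catch { case _: NullPointerException => true }

println(npeThrown)  // true
```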





[GitHub] spark pull request #14243: [SPARK-10683][SPARK-16510][SPARKR] Move SparkR in...

2016-07-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14243





[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...

2016-07-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14204
  
**[Test build #62572 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62572/consoleFull)** for PR 14204 at commit [`f2ab3a3`](https://github.com/apache/spark/commit/f2ab3a35fee03b178e92fd1e2a5fa3763746ff96).





[GitHub] spark issue #14045: [SPARK-16362][SQL][WIP] Support ArrayType and StructType...

2016-07-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14045
  
**[Test build #62573 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62573/consoleFull)** for PR 14045 at commit [`545a57a`](https://github.com/apache/spark/commit/545a57a718484e61cf77653e810ed368e9381266).





[GitHub] spark issue #14243: [SPARK-10683][SPARK-16510][SPARKR] Move SparkR include j...

2016-07-19 Thread shivaram
Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/14243
  
Thanks @sun-rui - Merging this to master and branch-2.0





[GitHub] spark pull request #14207: [SPARK-16552] [SQL] Store the Inferred Schemas in...

2016-07-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14207#discussion_r71456601
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -518,6 +510,19 @@ case class DescribeTableCommand(table: TableIdentifier, isExtended: Boolean, isF
 }
   }
 
+  private def describeSchema(
+      tableDesc: CatalogTable,
+      buffer: ArrayBuffer[Row]): Unit = {
+    if (DDLUtils.isDatasourceTable(tableDesc)) {
+      DDLUtils.getSchemaFromTableProperties(tableDesc) match {
--- End diff --

For all types of data source tables, we store the schema in the table properties. Thus, we should not return None unless the table properties were modified by users via the `ALTER TABLE` command.

Sorry, forgot to update the message. 
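
For context on where the schema lives, here is a hedged sketch of the split-into-parts scheme for table properties. The property names follow Spark's `spark.sql.sources.schema.*` convention; the 4000-character chunk size is an assumption for illustration, not Spark's actual threshold.

```scala
// Sketch: splitting a long serialized schema string across numbered table
// properties (so no single property value grows too long) and reassembling
// it. The chunk size below is an illustrative assumption.
val partLength = 4000

def schemaToProperties(schemaJson: String): Map[String, String] = {
  val parts = schemaJson.grouped(partLength).toSeq
  Map("spark.sql.sources.schema.numParts" -> parts.size.toString) ++
    parts.zipWithIndex.map { case (part, i) =>
      s"spark.sql.sources.schema.part.$i" -> part
    }
}

def schemaFromProperties(props: Map[String, String]): Option[String] =
  props.get("spark.sql.sources.schema.numParts").map { n =>
    (0 until n.toInt).map(i => props(s"spark.sql.sources.schema.part.$i")).mkString
  }

val roundTrip = schemaFromProperties(schemaToProperties("x" * 9001))
println(roundTrip.get.length)  // 9001: the parts reassemble losslessly
```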





[GitHub] spark issue #14276: [SPARK-16638][ML][Optimizer] fix L2 reg computation in l...

2016-07-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14276
  
**[Test build #62571 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62571/consoleFull)** for PR 14276 at commit [`9d4f7a8`](https://github.com/apache/spark/commit/9d4f7a8cf20bcd1f6ede46097406f235f3581b3b).





[GitHub] spark issue #14267: [SPARK-15705] [SQL] Change the default value of spark.sq...

2016-07-19 Thread tejasapatil
Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/14267
  
Thanks for notifying @rxin. 





  1   2   3   4   5   6   >