spark git commit: [SPARK-15431][SQL][HOTFIX] ignore 'list' command testcase from CliSuite for now
Repository: spark Updated Branches: refs/heads/branch-2.0 b3845fede -> b430aa98c [SPARK-15431][SQL][HOTFIX] ignore 'list' command testcase from CliSuite for now ## What changes were proposed in this pull request? The test cases for `list` command added in `CliSuite` by PR #13212 can not run in some jenkins jobs after being merged. However, some jenkins jobs can pass: https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.6/ https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.4/ https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.2/ https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/ https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.3/ Others failed on this test case. But the failures on those jobs are at slightly different checkpoints among different jobs too. So it seems that CliSuite's output capture is flaky for list commands to check for expected output. There are test cases already in `HiveQuerySuite` and `SparkContextSuite` to cover the cases. So I am ignoring 2 test cases added by PR #13212 . Author: Xin WuCloses #13276 from xwu0226/SPARK-15431-clisuite. (cherry picked from commit 6f95c6c030db0057de213733c2bd3453463bc6f2) Signed-off-by: Yin Huai Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b430aa98 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b430aa98 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b430aa98 Branch: refs/heads/branch-2.0 Commit: b430aa98caa16978cd53dd354423cac45410c284 Parents: b3845fe Author: Xin Wu Authored: Fri May 27 08:54:14 2016 -0700 Committer: Yin Huai Committed: Fri May 27 08:54:54 2016 -0700 -- .../scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b430aa98/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala -- diff --git a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala index 2bf0221..656fe97 100644 --- a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala +++ b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala @@ -239,7 +239,7 @@ class CliSuite extends SparkFunSuite with BeforeAndAfterAll with Logging { "" -> "This is a test for Spark-11624") } - test("list jars") { + ignore("list jars") { val jarFile = Thread.currentThread().getContextClassLoader.getResource("TestUDTF.jar") runCliWithin(2.minute)( s"ADD JAR $jarFile" -> "", @@ -248,7 +248,7 @@ class CliSuite extends SparkFunSuite with BeforeAndAfterAll with Logging { ) } - test("list files") { + ignore("list files") { val dataFilePath = Thread.currentThread().getContextClassLoader .getResource("data/files/small_kv.txt") runCliWithin(2.minute)( - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
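For reference, a minimal ScalaTest sketch of the test-to-ignore switch used in this hotfix (the suite and helper below are illustrative, not the actual CliSuite code): replacing test(...) with ignore(...) keeps the body compiling and visible in reports as ignored, but skips execution, which is how the flaky cases are parked without failing builds.

    import org.scalatest.FunSuite

    class ExampleCliSuite extends FunSuite {
      test("runs normally") {
        assert(1 + 1 == 2)
      }

      // Reported as ignored instead of executed; the body is still type-checked.
      ignore("list jars (temporarily disabled as flaky)") {
        assert(fakeCliOutput().contains("TestUDTF.jar"))
      }

      // Hypothetical stand-in for the CLI output capture that proved flaky.
      private def fakeCliOutput(): String = "TestUDTF.jar"
    }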
[GitHub] spark pull request: [SPARK-15431][SQL][HOTFIX] ignore 'list' comma...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13276#issuecomment-222183974 Merging to master and branch 2.0.
[GitHub] spark pull request: [SPARK-15431][SQL] Support LIST FILE(s)|JAR(s)...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13212#discussion_r64926605 --- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala --- @@ -238,4 +238,23 @@ class CliSuite extends SparkFunSuite with BeforeAndAfterAll with Logging { runCliWithin(2.minute, Seq("-e", "!echo \"This is a test for Spark-11624\";"))( "" -> "This is a test for Spark-11624") } + + test("list jars") { +val jarFile = Thread.currentThread().getContextClassLoader.getResource("TestUDTF.jar") +runCliWithin(2.minute)( + s"ADD JAR $jarFile" -> "", + s"LIST JARS" -> "TestUDTF.jar", + s"List JAR $jarFile" -> "TestUDTF.jar" +) + } + + test("list files") { +val dataFilePath = Thread.currentThread().getContextClassLoader + .getResource("data/files/small_kv.txt") +runCliWithin(2.minute)( + s"ADD FILE $dataFilePath" -> "", + s"LIST FILES" -> "small_kv.txt", + s"LIST FILE $dataFilePath" -> "small_kv.txt" +) + } --- End diff -- Seems it fails all maven builds. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
spark git commit: [SPARK-15431][SQL][HOTFIX] ignore 'list' command testcase from CliSuite for now
Repository: spark Updated Branches: refs/heads/master d5911d117 -> 6f95c6c03 [SPARK-15431][SQL][HOTFIX] ignore 'list' command testcase from CliSuite for now ## What changes were proposed in this pull request? The test cases for `list` command added in `CliSuite` by PR #13212 can not run in some jenkins jobs after being merged. However, some jenkins jobs can pass: https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.6/ https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.4/ https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.2/ https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/ https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.3/ Others failed on this test case. But the failures on those jobs are at slightly different checkpoints among different jobs too. So it seems that CliSuite's output capture is flaky for list commands to check for expected output. There are test cases already in `HiveQuerySuite` and `SparkContextSuite` to cover the cases. So I am ignoring 2 test cases added by PR #13212 . Author: Xin WuCloses #13276 from xwu0226/SPARK-15431-clisuite. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6f95c6c0 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6f95c6c0 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6f95c6c0 Branch: refs/heads/master Commit: 6f95c6c030db0057de213733c2bd3453463bc6f2 Parents: d5911d1 Author: Xin Wu Authored: Fri May 27 08:54:14 2016 -0700 Committer: Yin Huai Committed: Fri May 27 08:54:14 2016 -0700 -- .../scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6f95c6c0/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala -- diff --git a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala index 2bf0221..656fe97 100644 --- a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala +++ b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala @@ -239,7 +239,7 @@ class CliSuite extends SparkFunSuite with BeforeAndAfterAll with Logging { "" -> "This is a test for Spark-11624") } - test("list jars") { + ignore("list jars") { val jarFile = Thread.currentThread().getContextClassLoader.getResource("TestUDTF.jar") runCliWithin(2.minute)( s"ADD JAR $jarFile" -> "", @@ -248,7 +248,7 @@ class CliSuite extends SparkFunSuite with BeforeAndAfterAll with Logging { ) } - test("list files") { + ignore("list files") { val dataFilePath = Thread.currentThread().getContextClassLoader .getResource("data/files/small_kv.txt") runCliWithin(2.minute)( - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15431][SQL][HOTFIX] ignore 'list' comma...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13276#issuecomment-222183379 It causes master builds to fail. I am merging it. Is the bug in the implementation or in our test code?
[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13270#discussion_r64922718 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -212,11 +212,46 @@ class SessionCatalog( * If no such database is specified, create it in the current database. */ def createTable(tableDefinition: CatalogTable, ignoreIfExists: Boolean): Unit = { -val db = formatDatabaseName(tableDefinition.identifier.database.getOrElse(getCurrentDatabase)) -val table = formatTableName(tableDefinition.identifier.table) +val tableId = tableDefinition.identifier +val db = formatDatabaseName(tableId.database.getOrElse(getCurrentDatabase)) +val table = formatTableName(tableId.table) val newTableDefinition = tableDefinition.copy(identifier = TableIdentifier(table, Some(db))) requireDbExists(db) -externalCatalog.createTable(db, newTableDefinition, ignoreIfExists) + +if ( + // If this is an external data source table... + tableDefinition.properties.contains("spark.sql.sources.provider") && + newTableDefinition.tableType == CatalogTableType.EXTERNAL && + // ... that is not persisted as Hive compatible format (external tables in Hive compatible + // format always set `locationUri` to the actual data location and should NOT be hacked as + // following.) + tableDefinition.storage.locationUri.isEmpty +) { + // !! HACK ALERT !! + // + // Due to a restriction of Hive metastore, here we have to set `locationUri` to a temporary + // directory that doesn't exist yet but can definitely be successfully created, and then + // delete it right after creating the external data source table. This location will be + // persisted to Hive metastore as standard Hive table location URI, but Spark SQL doesn't + // really use it. Also, since we only do this workaround for external tables, deleting the + // directory after the fact doesn't do any harm. + // + // Please refer to https://issues.apache.org/jira/browse/SPARK-15269 for more details. + + val tempPath = +new Path(defaultTablePath(tableId).stripSuffix(Path.SEPARATOR) + "-__PLACEHOLDER__") + + try { +externalCatalog.createTable( + db, + newTableDefinition.withNewStorage(locationUri = Some(tempPath.toString)), + ignoreIfExists) + } finally { +FileSystem.get(tempPath.toUri, hadoopConf).delete(tempPath, true) + } +} else { + externalCatalog.createTable(db, newTableDefinition, ignoreIfExists) +} --- End diff -- I think Hive-specific details/hacks should not be exposed in `SessionCatalog`. Let's move it into `HiveExternalCatalog`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15565] [SQL] Add the File Scheme to the...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13348#issuecomment-222065198 I created the jira because of https://issues.apache.org/jira/browse/SPARK-15034?focusedCommentId=15301508=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15301508. Since `System.getProperty("user.dir")` is pointing to a local dir, it seems to make sense to explicitly set the scheme. It may introduce confusion if we add the default fs scheme to the path generated by `System.getProperty("user.dir")`.
[GitHub] spark pull request: [SPARK-15565] [SQL] Add the File Scheme to the...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13348#discussion_r64858995 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -55,7 +56,7 @@ object SQLConf { val WAREHOUSE_PATH = SQLConfigBuilder("spark.sql.warehouse.dir") .doc("The default location for managed databases and tables.") .stringConf -.createWithDefault("${system:user.dir}/spark-warehouse") +.createWithDefault("${system:user.dir}" + File.separator + "spark-warehouse") --- End diff -- Why not just change the default value to `file:/${system:user.dir}/spark-warehouse`? I do not really understand why we have to change `def warehousePath`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15565] [SQL] Add the File Scheme to the...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13348#discussion_r64858379 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -665,7 +666,8 @@ private[sql] class SQLConf extends Serializable with CatalystConf with Logging { def variableSubstituteDepth: Int = getConf(VARIABLE_SUBSTITUTE_DEPTH) def warehousePath: String = { -getConf(WAREHOUSE_PATH).replace("${system:user.dir}", System.getProperty("user.dir")) +getConf(WAREHOUSE_PATH).replace("${system:user.dir}", + new File(System.getProperty("user.dir")).toURI.toURL.toString).replace("//", "/") --- End diff -- When do we need `.replace("//", "/")`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
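As an illustrative aside on the two review questions above (this is not the PR's code): java.io.File.toURI already normalizes separators and prepends the file scheme, so a scheme-qualified default can be built without either the manual "//" -> "/" replacement or File.separator concatenation.

    import java.io.File

    object WarehouseDefaultExample {
      def main(args: Array[String]): Unit = {
        val default = new File(System.getProperty("user.dir"), "spark-warehouse").toURI.toString
        // e.g. "file:/home/user/project/spark-warehouse" on Linux,
        //      "file:/C:/Users/user/project/spark-warehouse" on Windows
        println(default)
      }
    }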
[GitHub] spark pull request: [SPARK-15594][SQL] ALTER TABLE SERDEPROPERTIES...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13343#discussion_r64856240 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -871,6 +879,58 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach { } } + private def testSetSerdePartition(isDatasourceTable: Boolean): Unit = { +val catalog = spark.sessionState.catalog +val tableIdent = TableIdentifier("tab1", Some("dbx")) +val spec = Map("a" -> "1", "b" -> "2") +createDatabase(catalog, "dbx") +createTable(catalog, tableIdent) +createTablePartition(catalog, spec, tableIdent) +createTablePartition(catalog, Map("a" -> "1", "b" -> "3"), tableIdent) +createTablePartition(catalog, Map("a" -> "2", "b" -> "2"), tableIdent) +createTablePartition(catalog, Map("a" -> "2", "b" -> "3"), tableIdent) +if (isDatasourceTable) { + convertToDatasourceTable(catalog, tableIdent) +} +assert(catalog.getPartition(tableIdent, spec).storage.serde.isEmpty) +assert(catalog.getPartition(tableIdent, spec).storage.serdeProperties.isEmpty) +// set table serde and/or properties (should fail on datasource tables) +if (isDatasourceTable) { + val e1 = intercept[AnalysisException] { +sql("ALTER TABLE dbx.tab1 PARTITION (a=1, b=2) SET SERDE 'whatever'") + } + val e2 = intercept[AnalysisException] { +sql("ALTER TABLE dbx.tab1 PARTITION (a=1, b=2) SET SERDE 'org.apache.madoop' " + + "WITH SERDEPROPERTIES ('k' = 'v', 'kay' = 'vee')") + } + assert(e1.getMessage.contains("datasource")) + assert(e2.getMessage.contains("datasource")) +} else { + sql("ALTER TABLE dbx.tab1 PARTITION (a=1, b=2) SET SERDE 'org.apache.jadoop'") + assert(catalog.getPartition(tableIdent, spec).storage.serde == Some("org.apache.jadoop")) + assert(catalog.getPartition(tableIdent, spec).storage.serdeProperties.isEmpty) + sql("ALTER TABLE dbx.tab1 PARTITION (a=1, b=2) SET SERDE 'org.apache.madoop' " + +"WITH SERDEPROPERTIES ('k' = 'v', 'kay' = 'vee')") + assert(catalog.getPartition(tableIdent, spec).storage.serde == Some("org.apache.madoop")) + assert(catalog.getPartition(tableIdent, spec).storage.serdeProperties == +Map("k" -> "v", "kay" -> "vee")) +} +// set serde properties only +sql("ALTER TABLE dbx.tab1 PARTITION (a=1, b=2) " + + "SET SERDEPROPERTIES ('k' = 'vvv', 'kay' = 'vee')") --- End diff -- For data source tables, seems we do not store any partition specs. I am wondering why metastore does not complain? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15565] [SQL] Add the File Scheme to the...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13348#discussion_r64853158 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -55,7 +56,7 @@ object SQLConf { val WAREHOUSE_PATH = SQLConfigBuilder("spark.sql.warehouse.dir") .doc("The default location for managed databases and tables.") .stringConf -.createWithDefault("${system:user.dir}/spark-warehouse") +.createWithDefault("${system:user.dir}" + File.separator + "spark-warehouse") --- End diff -- Although the path points to a local dir, this path is actually a hadoop filesystem path. So, I am wondering if we need to use `File.separator` at here? If we are running on Windows, does a hadoop filesystem path uses `\` as the separator? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
spark git commit: [SPARK-15583][SQL] Disallow altering datasource properties
Repository: spark Updated Branches: refs/heads/branch-2.0 702755f92 -> 8e26b74fc [SPARK-15583][SQL] Disallow altering datasource properties ## What changes were proposed in this pull request? Certain table properties (and SerDe properties) are in the protected namespace `spark.sql.sources.`, which we use internally for datasource tables. The user should not be allowed to (1) Create a Hive table setting these properties (2) Alter these properties in an existing table Previously, we threw an exception if the user tried to alter the properties of an existing datasource table. However, this is overly restrictive for datasource tables and does not do anything for Hive tables. ## How was this patch tested? DDLSuite Author: Andrew OrCloses #13341 from andrewor14/alter-table-props. (cherry picked from commit 3fca635b4ed322208debcd89a539e42cdde6bbd4) Signed-off-by: Yin Huai Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8e26b74f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8e26b74f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8e26b74f Branch: refs/heads/branch-2.0 Commit: 8e26b74fccc8e7b52db1011f6d6e295c6ba0c5aa Parents: 702755f Author: Andrew Or Authored: Thu May 26 20:11:09 2016 -0700 Committer: Yin Huai Committed: Thu May 26 20:11:19 2016 -0700 -- .../command/createDataSourceTables.scala| 17 +++ .../spark/sql/execution/command/ddl.scala | 37 +++-- .../spark/sql/execution/command/tables.scala| 2 + .../spark/sql/execution/command/DDLSuite.scala | 148 --- .../spark/sql/hive/execution/HiveDDLSuite.scala | 2 +- 5 files changed, 139 insertions(+), 67 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/8e26b74f/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala index 6ca66a2..deedb68 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala @@ -255,6 +255,23 @@ case class CreateDataSourceTableAsSelectCommand( object CreateDataSourceTableUtils extends Logging { + + // TODO: Actually replace usages with these variables (SPARK-15584) + + val DATASOURCE_PREFIX = "spark.sql.sources." + val DATASOURCE_PROVIDER = DATASOURCE_PREFIX + "provider" + val DATASOURCE_WRITEJOBUUID = DATASOURCE_PREFIX + "writeJobUUID" + val DATASOURCE_OUTPUTPATH = DATASOURCE_PREFIX + "output.path" + val DATASOURCE_SCHEMA_PREFIX = DATASOURCE_PREFIX + "schema." + val DATASOURCE_SCHEMA_NUMPARTS = DATASOURCE_SCHEMA_PREFIX + "numParts" + val DATASOURCE_SCHEMA_NUMPARTCOLS = DATASOURCE_SCHEMA_PREFIX + "numPartCols" + val DATASOURCE_SCHEMA_NUMBUCKETS = DATASOURCE_SCHEMA_PREFIX + "numBuckets" + val DATASOURCE_SCHEMA_NUMBUCKETCOLS = DATASOURCE_SCHEMA_PREFIX + "numBucketCols" + val DATASOURCE_SCHEMA_PART_PREFIX = DATASOURCE_SCHEMA_PREFIX + "part." + val DATASOURCE_SCHEMA_PARTCOL_PREFIX = DATASOURCE_SCHEMA_PREFIX + "partCol." + val DATASOURCE_SCHEMA_BUCKETCOL_PREFIX = DATASOURCE_SCHEMA_PREFIX + "bucketCol." + val DATASOURCE_SCHEMA_SORTCOL_PREFIX = DATASOURCE_SCHEMA_PREFIX + "sortCol." + /** * Checks if the given name conforms the Hive standard ("[a-zA-z_0-9]+"), * i.e. if this name only contains characters, numbers, and _. 
http://git-wip-us.apache.org/repos/asf/spark/blob/8e26b74f/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala index 7ce7bb9..15eba3b 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala @@ -25,6 +25,7 @@ import org.apache.spark.sql.catalyst.catalog.{CatalogDatabase, CatalogTable} import org.apache.spark.sql.catalyst.catalog.{CatalogTablePartition, CatalogTableType, SessionCatalog} import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference} +import org.apache.spark.sql.execution.command.CreateDataSourceTableUtils.DATASOURCE_PREFIX import org.apache.spark.sql.execution.datasources.BucketSpec
spark git commit: [SPARK-15583][SQL] Disallow altering datasource properties
Repository: spark Updated Branches: refs/heads/master 6ab973ec5 -> 3fca635b4 [SPARK-15583][SQL] Disallow altering datasource properties ## What changes were proposed in this pull request? Certain table properties (and SerDe properties) are in the protected namespace `spark.sql.sources.`, which we use internally for datasource tables. The user should not be allowed to (1) Create a Hive table setting these properties (2) Alter these properties in an existing table Previously, we threw an exception if the user tried to alter the properties of an existing datasource table. However, this is overly restrictive for datasource tables and does not do anything for Hive tables. ## How was this patch tested? DDLSuite Author: Andrew OrCloses #13341 from andrewor14/alter-table-props. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3fca635b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3fca635b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3fca635b Branch: refs/heads/master Commit: 3fca635b4ed322208debcd89a539e42cdde6bbd4 Parents: 6ab973e Author: Andrew Or Authored: Thu May 26 20:11:09 2016 -0700 Committer: Yin Huai Committed: Thu May 26 20:11:09 2016 -0700 -- .../command/createDataSourceTables.scala| 17 +++ .../spark/sql/execution/command/ddl.scala | 37 +++-- .../spark/sql/execution/command/tables.scala| 2 + .../spark/sql/execution/command/DDLSuite.scala | 148 --- .../spark/sql/hive/execution/HiveDDLSuite.scala | 2 +- 5 files changed, 139 insertions(+), 67 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/3fca635b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala index 6ca66a2..deedb68 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala @@ -255,6 +255,23 @@ case class CreateDataSourceTableAsSelectCommand( object CreateDataSourceTableUtils extends Logging { + + // TODO: Actually replace usages with these variables (SPARK-15584) + + val DATASOURCE_PREFIX = "spark.sql.sources." + val DATASOURCE_PROVIDER = DATASOURCE_PREFIX + "provider" + val DATASOURCE_WRITEJOBUUID = DATASOURCE_PREFIX + "writeJobUUID" + val DATASOURCE_OUTPUTPATH = DATASOURCE_PREFIX + "output.path" + val DATASOURCE_SCHEMA_PREFIX = DATASOURCE_PREFIX + "schema." + val DATASOURCE_SCHEMA_NUMPARTS = DATASOURCE_SCHEMA_PREFIX + "numParts" + val DATASOURCE_SCHEMA_NUMPARTCOLS = DATASOURCE_SCHEMA_PREFIX + "numPartCols" + val DATASOURCE_SCHEMA_NUMBUCKETS = DATASOURCE_SCHEMA_PREFIX + "numBuckets" + val DATASOURCE_SCHEMA_NUMBUCKETCOLS = DATASOURCE_SCHEMA_PREFIX + "numBucketCols" + val DATASOURCE_SCHEMA_PART_PREFIX = DATASOURCE_SCHEMA_PREFIX + "part." + val DATASOURCE_SCHEMA_PARTCOL_PREFIX = DATASOURCE_SCHEMA_PREFIX + "partCol." + val DATASOURCE_SCHEMA_BUCKETCOL_PREFIX = DATASOURCE_SCHEMA_PREFIX + "bucketCol." + val DATASOURCE_SCHEMA_SORTCOL_PREFIX = DATASOURCE_SCHEMA_PREFIX + "sortCol." + /** * Checks if the given name conforms the Hive standard ("[a-zA-z_0-9]+"), * i.e. if this name only contains characters, numbers, and _. 
http://git-wip-us.apache.org/repos/asf/spark/blob/3fca635b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala index 7ce7bb9..15eba3b 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala @@ -25,6 +25,7 @@ import org.apache.spark.sql.catalyst.catalog.{CatalogDatabase, CatalogTable} import org.apache.spark.sql.catalyst.catalog.{CatalogTablePartition, CatalogTableType, SessionCatalog} import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference} +import org.apache.spark.sql.execution.command.CreateDataSourceTableUtils.DATASOURCE_PREFIX import org.apache.spark.sql.execution.datasources.BucketSpec import org.apache.spark.sql.types._ @@ -228,15 +229,13 @@ case class AlterTableSetPropertiesCommand( extends
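A rough, simplified sketch of the kind of guard this change introduces (the object, method, and exception below are illustrative, not the exact Spark code): user-supplied table or SerDe properties under the reserved spark.sql.sources. namespace are rejected up front.

    object ReservedProperties {
      val DATASOURCE_PREFIX = "spark.sql.sources."

      // Throws if any user-supplied property falls in the reserved datasource namespace.
      def verify(properties: Map[String, String]): Unit = {
        val reserved = properties.keys.filter(_.startsWith(DATASOURCE_PREFIX))
        if (reserved.nonEmpty) {
          throw new IllegalArgumentException(
            s"Cannot set or alter reserved properties: ${reserved.mkString(", ")}")
        }
      }
    }

    // Example: ReservedProperties.verify(Map("spark.sql.sources.provider" -> "parquet")) throws.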
[GitHub] spark pull request: [SPARK-15583][SQL] Disallow altering datasourc...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13341#issuecomment-222048281 Thanks. Merging to master and branch 2.0.
[GitHub] spark pull request: [SPARK-8428][SPARK-13850] Fix integer overflow...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13336#issuecomment-222045290 sorry. It has been fixed.
[GitHub] spark pull request: [SPARK-8428][SPARK-13850] Fix integer overflow...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13336#issuecomment-222045184 Seems it breaks the 1.6 build?
[GitHub] spark pull request: [SPARK-15594][SQL] ALTER TABLE SERDEPROPERTIES...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13343#discussion_r64849728 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala --- @@ -871,6 +879,58 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach { } } + private def testSetSerdePartition(isDatasourceTable: Boolean): Unit = { +val catalog = spark.sessionState.catalog +val tableIdent = TableIdentifier("tab1", Some("dbx")) +val spec = Map("a" -> "1", "b" -> "2") +createDatabase(catalog, "dbx") +createTable(catalog, tableIdent) +createTablePartition(catalog, spec, tableIdent) +createTablePartition(catalog, Map("a" -> "1", "b" -> "3"), tableIdent) +createTablePartition(catalog, Map("a" -> "2", "b" -> "2"), tableIdent) +createTablePartition(catalog, Map("a" -> "2", "b" -> "3"), tableIdent) +if (isDatasourceTable) { + convertToDatasourceTable(catalog, tableIdent) +} +assert(catalog.getPartition(tableIdent, spec).storage.serde.isEmpty) +assert(catalog.getPartition(tableIdent, spec).storage.serdeProperties.isEmpty) +// set table serde and/or properties (should fail on datasource tables) +if (isDatasourceTable) { + val e1 = intercept[AnalysisException] { +sql("ALTER TABLE dbx.tab1 PARTITION (a=1, b=2) SET SERDE 'whatever'") + } + val e2 = intercept[AnalysisException] { +sql("ALTER TABLE dbx.tab1 PARTITION (a=1, b=2) SET SERDE 'org.apache.madoop' " + + "WITH SERDEPROPERTIES ('k' = 'v', 'kay' = 'vee')") + } + assert(e1.getMessage.contains("datasource")) + assert(e2.getMessage.contains("datasource")) +} else { + sql("ALTER TABLE dbx.tab1 PARTITION (a=1, b=2) SET SERDE 'org.apache.jadoop'") + assert(catalog.getPartition(tableIdent, spec).storage.serde == Some("org.apache.jadoop")) + assert(catalog.getPartition(tableIdent, spec).storage.serdeProperties.isEmpty) + sql("ALTER TABLE dbx.tab1 PARTITION (a=1, b=2) SET SERDE 'org.apache.madoop' " + +"WITH SERDEPROPERTIES ('k' = 'v', 'kay' = 'vee')") + assert(catalog.getPartition(tableIdent, spec).storage.serde == Some("org.apache.madoop")) + assert(catalog.getPartition(tableIdent, spec).storage.serdeProperties == +Map("k" -> "v", "kay" -> "vee")) +} +// set serde properties only +sql("ALTER TABLE dbx.tab1 PARTITION (a=1, b=2) " + + "SET SERDEPROPERTIES ('k' = 'vvv', 'kay' = 'vee')") --- End diff -- Do data source tables allow this? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15583][SQL] Disallow altering datasourc...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13341#issuecomment-222044580 lgtm
spark git commit: [SPARK-15532][SQL] SQLContext/HiveContext's public constructors should use SparkSession.build.getOrCreate
Repository: spark Updated Branches: refs/heads/branch-2.0 4939c2a12 -> d3cd579d7 [SPARK-15532][SQL] SQLContext/HiveContext's public constructors should use SparkSession.build.getOrCreate ## What changes were proposed in this pull request? This PR changes SQLContext/HiveContext's public constructor to use SparkSession.build.getOrCreate and removes isRootContext from SQLContext. ## How was this patch tested? Existing tests. Author: Yin Huai <yh...@databricks.com> Closes #13310 from yhuai/SPARK-15532. (cherry picked from commit 3ac2363d757cc9cebc627974f17ecda3a263efdf) Signed-off-by: Yin Huai <yh...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d3cd579d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d3cd579d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d3cd579d Branch: refs/heads/branch-2.0 Commit: d3cd579d7cdc1d2642186041dfbe296b266e4069 Parents: 4939c2a Author: Yin Huai <yh...@databricks.com> Authored: Thu May 26 16:53:31 2016 -0700 Committer: Yin Huai <yh...@databricks.com> Committed: Thu May 26 16:54:03 2016 -0700 -- .../spark/mllib/classification/LogisticRegression.scala | 2 +- project/MimaExcludes.scala| 2 ++ .../src/main/scala/org/apache/spark/sql/SQLContext.scala | 10 ++ .../main/scala/org/apache/spark/sql/SparkSession.scala| 4 ++-- .../scala/org/apache/spark/sql/hive/HiveContext.scala | 10 -- .../scala/org/apache/spark/sql/hive/test/TestHive.scala | 9 - 6 files changed, 15 insertions(+), 22 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d3cd579d/mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala b/mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala index adbcdd3..4bba2ea 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala @@ -437,7 +437,7 @@ class LogisticRegressionWithLBFGS lr.setMaxIter(optimizer.getNumIterations()) lr.setTol(optimizer.getConvergenceTol()) // Convert our input into a DataFrame -val sqlContext = new SQLContext(input.context) +val sqlContext = SQLContext.getOrCreate(input.context) import sqlContext.implicits._ val df = input.map(_.asML).toDF() // Determine if we should cache the DF http://git-wip-us.apache.org/repos/asf/spark/blob/d3cd579d/project/MimaExcludes.scala -- diff --git a/project/MimaExcludes.scala b/project/MimaExcludes.scala index 08c575a..73debe9 100644 --- a/project/MimaExcludes.scala +++ b/project/MimaExcludes.scala @@ -54,6 +54,8 @@ object MimaExcludes { ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.rdd.RDD.coalesce"), ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.rdd.PartitionCoalescer$LocationIterator"), ProblemFilters.exclude[IncompatibleTemplateDefProblem]("org.apache.spark.rdd.PartitionCoalescer"), +// SPARK-15532 Remove isRootContext flag from SQLContext. 
+ ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.sql.SQLContext.isRootContext"), // SPARK-12600 Remove SQL deprecated methods ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.SQLContext$QueryExecution"), ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.SQLContext$SparkPlanner"), http://git-wip-us.apache.org/repos/asf/spark/blob/d3cd579d/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala b/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala index b17fb8a..66d9aa2 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala @@ -57,9 +57,7 @@ import org.apache.spark.sql.util.ExecutionListenerManager * @groupname Ungrouped Support functions for language integrated queries * @since 1.0.0 */ -class SQLContext private[sql]( -val sparkSession: SparkSession, -val isRootContext: Boolean) +class SQLContext private[sql](val sparkSession: SparkSession) extends Logging with Serializable { self => @@ -69,13 +67,9 @@ class SQLContext private
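For context, an illustrative usage sketch of the entry points involved (the app name and setup are made up): SparkSession.builder().getOrCreate() is the recommended Spark 2.0 entry point, and the deprecated public SQLContext constructor now simply wraps a SparkSession.

    import org.apache.spark.sql.{SparkSession, SQLContext}

    object EntryPointExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("entry-point-example")
          .getOrCreate()

        // Still compiles for legacy code paths, but is deprecated and delegates to a SparkSession.
        val legacy: SQLContext = new SQLContext(spark.sparkContext)

        spark.stop()
      }
    }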
[GitHub] spark pull request: [SPARK-15532] [SQL] SQLContext/HiveContext's p...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13310#issuecomment-222026576 Merging to master and branch 2.0.
[GitHub] spark pull request: [SPARK-15532] [SQL] Add SQLConf.ALLOW_MULTIPLE...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13310#discussion_r64823932 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -69,13 +67,30 @@ class SQLContext private[sql]( // Note: Since Spark 2.0 this class has become a wrapper of SparkSession, where the // real functionality resides. This class remains mainly for backward compatibility. - private[sql] def this(sparkSession: SparkSession) = { -this(sparkSession, true) - } - @deprecated("Use SparkSession.builder instead", "2.0.0") def this(sc: SparkContext) = { -this(new SparkSession(sc)) +this { + val session = new SparkSession(sc) + // If spark.sql.allowMultipleContexts is true, we will throw an exception if a user + // wants to create a new root SQLContext (a SQLContext that is not created by newSession). + val allowMultipleContexts = sc.conf.getBoolean( +SQLConf.ALLOW_MULTIPLE_CONTEXTS.key, +SQLConf.ALLOW_MULTIPLE_CONTEXTS.defaultValue.get) + + // Assert no root SQLContext/SparkSession is running when allowMultipleContexts is false. + { +val defaultSessionExists = SparkSession.getDefaultSession.isDefined +if (!allowMultipleContexts && defaultSessionExists) { + val errMsg = sc.conf.get( +SQLConf.ALLOW_MULTIPLE_CONTEXTS_ERROR_MESSAGE.key, +SQLConf.ALLOW_MULTIPLE_CONTEXTS_ERROR_MESSAGE.defaultValue.get) + throw new SparkException(errMsg) +} else if (!defaultSessionExists) { + SparkSession.setDefaultSession(session) +} + } + session +} --- End diff -- We need to do the same thing for HiveContext. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
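An illustrative sketch of the user-facing behavior being reviewed in this revision of the PR (the conf name comes from the diff above; whether the second construction throws depends on the proposed check being merged):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object MultipleContextsExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setMaster("local[*]")
          .setAppName("multiple-contexts-example")
          .set("spark.sql.allowMultipleContexts", "false")
        val sc = new SparkContext(conf)

        val first = new SQLContext(sc)     // becomes the default session
        // Under the proposed check, a second root context on the same SparkContext would throw:
        // val second = new SQLContext(sc)

        sc.stop()
      }
    }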
[GitHub] spark pull request: [SPARK-15538][SPARK-15539][SQL] Truncate table...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13315#issuecomment-221997409 lgtm
[GitHub] spark pull request: [SPARK-13445][SQL]Improves error message and a...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/1#issuecomment-221968733 lgtm
[GitHub] spark pull request: [SPARK-14543] [SQL] Improve InsertIntoTable co...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/12313#issuecomment-221966605 @rdblue Thank you for this PR. Those improvements sound good. I chatted with others. Here are two questions. 1. `sqlContext.table("source").write.byName.insertInto("destination")` will change the behavior of insertInto and make it different from SQL's insertInto. It looks like it is better to add another command instead of changing the behavior of insertInto in this way. 2. When we use resolution by name, it seems we will complain if the data has fewer fields. What will happen if the data has more fields? How about we first improve the position-based resolution part and then discuss the name-based part separately?
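To make the position-versus-name distinction concrete, an illustrative sketch using only existing APIs (table names are placeholders; the proposed .byName call is deliberately not used here): insertInto matches columns by position, so reordering the source by the destination's column names is the usual way to get name-based behavior today.

    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.functions.col

    object InsertByNameExample {
      def run(sqlContext: SQLContext): Unit = {
        val destColumns = sqlContext.table("destination").columns
        sqlContext.table("source")
          .select(destColumns.map(col): _*)   // align column order with the destination schema
          .write
          .insertInto("destination")          // the positional insert now lines up by name
      }
    }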
[GitHub] spark pull request: [SPARK-15538][SPARK-15539][SQL] Truncate table...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13315#discussion_r64794004 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -289,37 +289,53 @@ case class TruncateTableCommand( val catalog = sparkSession.sessionState.catalog if (!catalog.tableExists(tableName)) { throw new AnalysisException(s"Table '$tableName' in TRUNCATE TABLE does not exist.") -} else if (catalog.isTemporaryTable(tableName)) { +} +if (catalog.isTemporaryTable(tableName)) { throw new AnalysisException( s"Operation not allowed: TRUNCATE TABLE on temporary tables: '$tableName'") +} +val table = catalog.getTableMetadata(tableName) +if (table.tableType == CatalogTableType.EXTERNAL) { + throw new AnalysisException( +s"Operation not allowed: TRUNCATE TABLE on external tables: '$tableName'") +} +if (table.tableType == CatalogTableType.VIEW) { + throw new AnalysisException( +s"Operation not allowed: TRUNCATE TABLE on views: '$tableName'") +} +if (DDLUtils.isDatasourceTable(table) && partitionSpec.isDefined) { + throw new AnalysisException( +s"Operation not allowed: TRUNCATE TABLE ... PARTITION is not supported " + +s"for tables created using the data sources API: '$tableName'") +} +val locations = if (partitionSpec.isDefined) { + catalog.listPartitions(tableName, partitionSpec).map(_.storage.locationUri) } else { - val locations = if (partitionSpec.isDefined) { -catalog.listPartitions(tableName, partitionSpec).map(_.storage.locationUri) + if (table.partitionColumnNames.nonEmpty) { +catalog.listPartitions(tableName).map(_.storage.locationUri) } else { -val table = catalog.getTableMetadata(tableName) -if (table.partitionColumnNames.nonEmpty) { - catalog.listPartitions(tableName).map(_.storage.locationUri) -} else { - Seq(table.storage.locationUri) -} +Seq(table.storage.locationUri) } - val hadoopConf = sparkSession.sessionState.newHadoopConf() - locations.foreach { location => -if (location.isDefined) { - val path = new Path(location.get) - try { -val fs = path.getFileSystem(hadoopConf) -fs.delete(path, true) -fs.mkdirs(path) - } catch { -case NonFatal(e) => - throw new AnalysisException( -s"Failed to truncate table '$tableName' when removing data of the path: $path " + - s"because of ${e.toString}") - } +} +val hadoopConf = sparkSession.sessionState.newHadoopConf() +locations.foreach { location => + if (location.isDefined) { +val path = new Path(location.get) +try { + val fs = path.getFileSystem(hadoopConf) + fs.delete(path, true) + fs.mkdirs(path) +} catch { + case NonFatal(e) => +throw new AnalysisException( + s"Failed to truncate table '$tableName' when removing data of the path: $path " + +s"because of ${e.toString}") } } } +// After deleting the data, invalidate the table to make sure we don't keep around a stale +// file relation in the metastore cache. +sparkSession.sessionState.invalidateTable(tableName.unquotedString) --- End diff -- Seems we also need to drop the cached data in the InMemoryColumnarStore? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15538][SPARK-15539][SQL] Truncate table...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13315#discussion_r64793761 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -289,37 +289,53 @@ case class TruncateTableCommand( val catalog = sparkSession.sessionState.catalog if (!catalog.tableExists(tableName)) { throw new AnalysisException(s"Table '$tableName' in TRUNCATE TABLE does not exist.") -} else if (catalog.isTemporaryTable(tableName)) { +} +if (catalog.isTemporaryTable(tableName)) { throw new AnalysisException( s"Operation not allowed: TRUNCATE TABLE on temporary tables: '$tableName'") +} +val table = catalog.getTableMetadata(tableName) +if (table.tableType == CatalogTableType.EXTERNAL) { + throw new AnalysisException( +s"Operation not allowed: TRUNCATE TABLE on external tables: '$tableName'") +} +if (table.tableType == CatalogTableType.VIEW) { + throw new AnalysisException( +s"Operation not allowed: TRUNCATE TABLE on views: '$tableName'") +} +if (DDLUtils.isDatasourceTable(table) && partitionSpec.isDefined) { + throw new AnalysisException( +s"Operation not allowed: TRUNCATE TABLE ... PARTITION is not supported " + +s"for tables created using the data sources API: '$tableName'") +} +val locations = if (partitionSpec.isDefined) { + catalog.listPartitions(tableName, partitionSpec).map(_.storage.locationUri) } else { - val locations = if (partitionSpec.isDefined) { -catalog.listPartitions(tableName, partitionSpec).map(_.storage.locationUri) + if (table.partitionColumnNames.nonEmpty) { +catalog.listPartitions(tableName).map(_.storage.locationUri) --- End diff -- This is for hive table, right? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15538][SPARK-15539][SQL] Truncate table...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13315#issuecomment-221949749 {{sql("truncate table emp16") }} should delete all partitions, right?
[GitHub] spark pull request: [SPARK-9876][SQL]: Update Parquet to 1.8.1.
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13280#issuecomment-221935404 @rdblue Is there any perf evaluation of this new version that we can refer to?
[GitHub] spark pull request: [SPARK-15532] [SQL] Add SQLConf.ALLOW_MULTIPLE...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13310#discussion_r64783746 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -70,6 +70,27 @@ object SQLConf { .intConf .createWithDefault(10) + val ALLOW_MULTIPLE_CONTEXTS = SQLConfigBuilder("spark.sql.allowMultipleContexts") +.doc("When set to true, creating multiple SQLContexts/HiveContexts is allowed. " + + "When set to false, only one SQLContext/HiveContext is allowed to be created " + + "through the constructor (new SQLContexts/HiveContexts created through newSession " + + "method is allowed). Please note that this conf needs to be set in Spark Conf. Once " + + "a SQLContext/HiveContext has been created, changing the value of this conf will not " + + "have effect.") +.booleanConf +.createWithDefault(false) --- End diff -- done --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15534][SPARK-15535][SQL] Truncate table...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13302#discussion_r64688076 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -288,9 +288,10 @@ case class TruncateTableCommand( override def run(sparkSession: SparkSession): Seq[Row] = { val catalog = sparkSession.sessionState.catalog if (!catalog.tableExists(tableName)) { - logError(s"table '$tableName' in TRUNCATE TABLE does not exist.") + throw new AnalysisException(s"Table '$tableName' in TRUNCATE TABLE does not exist.") --- End diff -- Sorry. Just realized that I had a typo: `For drop table with IF EXISTS keyword` should be `For drop table without IF EXISTS keyword`. It seems that without the IF EXISTS keyword, Hive (I tried 1.2.1) does not throw an exception if the table does not exist.
[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when fi...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13290#discussion_r64687394 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1448,6 +1450,37 @@ class Analyzer( } /** + * Fixes nullability of Attributes in a resolved LogicalPlan by using the nullability of + * corresponding Attributes of its children output Attributes. This step is needed because + * users can use a resolved AttributeReference in the Dataset API and outer joins + * can change the nullability of an AttribtueReference. Without the fix, a nullable column's + * nullable field can be actually set as non-nullable, which cause illegal optimization + * (e.g., NULL propagation) and wrong answers. + * See SPARK-13484 and SPARK-13801 for the concrete queries of this case. + */ + object FixNullability extends Rule[LogicalPlan] { + +def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { + case q: LogicalPlan if q.resolved => +val childrenOutput = q.children.flatMap(c => c.output).groupBy(_.exprId).flatMap { + case (exprId, attributes) => +// If there are multiple Attributes having the same ExpirId, we need to resolve +// the conflict of nullable field. +val nullable = attributes.map(_.nullable).reduce(_ || _) --- End diff -- I feel it is not very possible. Let me think about it more. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
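A tiny worked illustration of the reduce(_ || _) step quoted above (the values are made up): when the same exprId is seen from several children with different nullability, for example after an outer join makes one side's column nullable, the merged attribute must stay nullable if any copy is.

    // Two occurrences of the same exprId as seen from different children:
    val nullableFlags = Seq(false, true)
    val merged = nullableFlags.reduce(_ || _)   // true: the attribute is kept nullable
    assert(merged)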
[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when fi...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13290#discussion_r64686215 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1448,6 +1450,37 @@ class Analyzer( } /** + * Fixes nullability of Attributes in a resolved LogicalPlan by using the nullability of + * corresponding Attributes of its children output Attributes. This step is needed because + * users can use a resolved AttributeReference in the Dataset API and outer joins + * can change the nullability of an AttribtueReference. Without the fix, a nullable column's + * nullable field can be actually set as non-nullable, which cause illegal optimization + * (e.g., NULL propagation) and wrong answers. + * See SPARK-13484 and SPARK-13801 for the concrete queries of this case. + */ + object FixNullability extends Rule[LogicalPlan] { + +def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { + case q: LogicalPlan if q.resolved => +val childrenOutput = q.children.flatMap(c => c.output).groupBy(_.exprId).flatMap { --- End diff -- Yea. This version tries to fix the nullability for the entire query plan tree because I think it is possible to hit this problem in other cases when using Dataset API. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15539][SQL] DROP TABLE throw exception ...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13307#issuecomment-221753653 Seems we should convert those disabled tests to our tests? Or we should put those disabled tests to HiveQuerySuite or HiveWindowFunctionQuerySuite? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15532] [SQL] Add SQLConf.ALLOW_MULTIPLE...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13310#issuecomment-221753174 oh, some ml tests failed. Let me take a look. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15532] [SQL] Add SQLConf.ALLOW_MULTIPLE...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13310#discussion_r64675651 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala --- @@ -103,7 +103,11 @@ class SparkSession private( * A wrapped version of this session in the form of a [[SQLContext]], for backward compatibility. */ @transient - private[sql] val sqlContext: SQLContext = new SQLContext(this) + private[sql] val sqlContext: SQLContext = existingSharedState match { +// If we have an existing SharedState, we should not create a root SQLContext at here. +case Some(sharedState) => new SQLContext(this, false) +case None => new SQLContext(this, true) --- End diff -- I am removing isRootContext flag. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15532] [SQL] Add SQLConf.ALLOW_MULTIPLE...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13310#discussion_r64675631 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/MultiSQLContextsSuite.scala --- @@ -0,0 +1,96 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contributor license agreements. See the NOTICE file distributed with +* this work for additional information regarding copyright ownership. +* The ASF licenses this file to You under the Apache License, Version 2.0 +* (the "License"); you may not use this file except in compliance with +* the License. You may obtain a copy of the License at +* +*http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +package org.apache.spark.sql + +import org.scalatest.BeforeAndAfterAll + +import org.apache.spark._ +import org.apache.spark.sql.internal.SQLConf + +class MultiSQLContextsSuite extends SparkFunSuite with BeforeAndAfterAll { --- End diff -- sure --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15532] [SQL] Add SQLConf.ALLOW_MULTIPLE...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13310#discussion_r64675634 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -70,6 +70,27 @@ object SQLConf { .intConf .createWithDefault(10) + val ALLOW_MULTIPLE_CONTEXTS = SQLConfigBuilder("spark.sql.allowMultipleContexts") +.doc("When set to true, creating multiple SQLContexts/HiveContexts is allowed. " + + "When set to false, only one SQLContext/HiveContext is allowed to be created " + + "through the constructor (new SQLContexts/HiveContexts created through newSession " + + "method is allowed). Please note that this conf needs to be set in Spark Conf. Once " + + "a SQLContext/HiveContext has been created, changing the value of this conf will not " + + "have effect.") +.booleanConf +.createWithDefault(false) + + val ALLOW_MULTIPLE_CONTEXTS_ERROR_MESSAGE = +SQLConfigBuilder("spark.sql.allowMultipleContexts.errorMessage") + .doc("The error message used by the exception when spark.sql.allowMultipleContexts is " + +"set to false.") + .stringConf --- End diff -- done --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15532] [SQL] Add SQLConf.ALLOW_MULTIPLE...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13310#discussion_r64671030 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -70,6 +70,27 @@ object SQLConf { .intConf .createWithDefault(10) + val ALLOW_MULTIPLE_CONTEXTS = SQLConfigBuilder("spark.sql.allowMultipleContexts") +.doc("When set to true, creating multiple SQLContexts/HiveContexts is allowed. " + + "When set to false, only one SQLContext/HiveContext is allowed to be created " + + "through the constructor (new SQLContexts/HiveContexts created through newSession " + + "method is allowed). Please note that this conf needs to be set in Spark Conf. Once " + + "a SQLContext/HiveContext has been created, changing the value of this conf will not " + + "have effect.") +.booleanConf +.createWithDefault(false) + + val ALLOW_MULTIPLE_CONTEXTS_ERROR_MESSAGE = +SQLConfigBuilder("spark.sql.allowMultipleContexts.errorMessage") --- End diff -- This name is not great. I am open to any suggestion. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15532] [SQL] Add SQLConf.ALLOW_MULTIPLE...
GitHub user yhuai opened a pull request: https://github.com/apache/spark/pull/13310 [SPARK-15532] [SQL] Add SQLConf.ALLOW_MULTIPLE_CONTEXTS back and make the error message configurable ## What changes were proposed in this pull request? This PR adds `spark.sql.allowMultipleContexts` back. ## How was this patch tested? MultiSQLContextsSuite You can merge this pull request into a Git repository by running: $ git pull https://github.com/yhuai/spark SPARK-15532 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13310.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13310 commit 313e59f0c74e2344a58b81b41162f8f854ed180b Author: Yin Huai <yh...@databricks.com> Date: 2016-05-25T23:32:03Z [SPARK-15532] [SQL] Add SQLConf.ALLOW_MULTIPLE_CONTEXTS back and make the error message configurable --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
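A hedged usage sketch of the restored conf: the name `spark.sql.allowMultipleContexts` comes from this PR, while the master URL, app name, and the rest are illustrative. Because it is read from the Spark conf, it has to be set before the first SQLContext/HiveContext is created; flipping it afterwards has no effect, per the conf's documentation quoted earlier.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val conf = new SparkConf()
      .setMaster("local[2]")                          // illustrative settings
      .setAppName("allow-multiple-contexts-demo")
      .set("spark.sql.allowMultipleContexts", "true")

    val sc = new SparkContext(conf)
    val ctx1 = new SQLContext(sc)
    val ctx2 = new SQLContext(sc)  // only legal because the flag above is true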
[GitHub] spark pull request: [SPARK-15534][SPARK-15535][SQL] Truncate table...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13302#discussion_r64667194 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -288,9 +288,10 @@ case class TruncateTableCommand( override def run(sparkSession: SparkSession): Seq[Row] = { val catalog = sparkSession.sessionState.catalog if (!catalog.tableExists(tableName)) { - logError(s"table '$tableName' in TRUNCATE TABLE does not exist.") + throw new AnalysisException(s"Table '$tableName' in TRUNCATE TABLE does not exist.") --- End diff -- For drop table with IF EXISTS keyword, it is unfortunate that Hive does not throw an exception when the table does not exist and changing it breaks lots of tests in HiveCompatibilitySuite. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15534][SPARK-15535][SQL] Truncate table...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13302#issuecomment-221700058 lgtm --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13484][SQL] Prevent illegal NULL propag...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/11371#discussion_r64521117 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1443,6 +1445,32 @@ class Analyzer( } /** + * Corrects attribute references in an expression tree of some operators (e.g., filters and + * projects) if these operators have a join as a child and the references point to columns on the + * input relation of the join. This is because some joins change the nullability of input columns + * and this could cause illegal optimization (e.g., NULL propagation) and wrong answers. + * See SPARK-13484 and SPARK-13801 for the concrete queries of this case. + */ + object SolveIllegalReferences extends Rule[LogicalPlan] { + +private def replaceReferences(e: Expression, attrMap: AttributeMap[Attribute]) = e.transform { + case a: AttributeReference => attrMap.get(a).getOrElse(a) +} + +def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators { + case q: LogicalPlan => +q.transform { + case f @ Filter(filterCondition, ExtractJoinOutputAttributes(join, joinOutputMap)) => +f.copy(condition = replaceReferences(filterCondition, joinOutputMap)) --- End diff -- https://github.com/apache/spark/pull/13290/files --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when fi...
GitHub user yhuai opened a pull request: https://github.com/apache/spark/pull/13290 [SQL] Prevent illegal NULL propagation when filtering outer-join results ## What changes were proposed in this pull request? This is another approach for addressing SPARK-13484 (the first approach is https://github.com/apache/spark/pull/11371). You can merge this pull request into a Git repository by running: $ git pull https://github.com/yhuai/spark SPARK-13484 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13290.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13290 commit 2dcecc8a934d6224d7680f5d25b7a98c53779d79 Author: Takeshi YAMAMURO <linguin@gmail.com> Date: 2016-02-25T15:56:29Z Avoid illegal NULL propagation commit 1ebfa80e8d52d959ffc98d191019b9e4242dd4a6 Author: Takeshi YAMAMURO <linguin@gmail.com> Date: 2016-02-26T07:48:11Z Add comments commit ab5d4f1174839abbd7f33cf587f75ae75cc76b68 Author: Takeshi YAMAMURO <linguin@gmail.com> Date: 2016-02-27T15:16:54Z Add a new rule to solve illegal references commit 8850cb3046a302afd012e4508bf8bfbd9f59051d Author: Takeshi YAMAMURO <linguin@gmail.com> Date: 2016-03-02T06:19:59Z Use foreach not map commit b636b34bbe09f4152567b15b1fae30f76f7ffe55 Author: Takeshi YAMAMURO <linguin@gmail.com> Date: 2016-04-15T02:17:01Z Solve illegal references in Projects commit dfd952870d0b157dc8571ddee053a9fe91d372ed Author: Takeshi YAMAMURO <linguin@gmail.com> Date: 2016-04-15T04:28:38Z Add tests in DataFrameJoinSuite commit 78ed4ef7a5d6c4565efd468cc48d6bfd1321f08b Author: Takeshi YAMAMURO <linguin@gmail.com> Date: 2016-04-15T08:44:34Z Fix test codes in ResolveNaturalJoinSuite commit 3cec4cce8126faae615990c56f829368f306b55e Author: Yin Huai <yh...@databricks.com> Date: 2016-05-25T06:16:30Z Try to have a rule to fix nullability --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13484][SQL] Prevent illegal NULL propag...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/11371#discussion_r64506496 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1443,6 +1445,32 @@ class Analyzer( } /** + * Corrects attribute references in an expression tree of some operators (e.g., filters and + * projects) if these operators have a join as a child and the references point to columns on the + * input relation of the join. This is because some joins change the nullability of input columns + * and this could cause illegal optimization (e.g., NULL propagation) and wrong answers. + * See SPARK-13484 and SPARK-13801 for the concrete queries of this case. + */ + object SolveIllegalReferences extends Rule[LogicalPlan] { + +private def replaceReferences(e: Expression, attrMap: AttributeMap[Attribute]) = e.transform { + case a: AttributeReference => attrMap.get(a).getOrElse(a) +} + +def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators { + case q: LogicalPlan => +q.transform { + case f @ Filter(filterCondition, ExtractJoinOutputAttributes(join, joinOutputMap)) => +f.copy(condition = replaceReferences(filterCondition, joinOutputMap)) --- End diff -- How about we use a `q.transformUp` to fix the nullability in a bottom-up way? For every node, we create an `AttributeMap` using the output of its child. Then, we use `transformExpressions` to fix the nullability if necessary. Let me try it out and ping you when I have a version. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
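A rough sketch of the bottom-up idea described in this comment -- a hypothetical rule body, not the code that was eventually merged: walk the plan with transformUp, build an AttributeMap from each node's children output, and rewrite AttributeReferences against it. For brevity it skips the merge of conflicting nullability across duplicate ExprIds, which the FixNullability rule in PR #13290 handles.

    import org.apache.spark.sql.catalyst.expressions.{AttributeMap, AttributeReference}
    import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
    import org.apache.spark.sql.catalyst.rules.Rule

    // Sketch: fix stale nullability by re-resolving references against children output.
    object FixNullabilitySketch extends Rule[LogicalPlan] {
      def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
        case q: LogicalPlan if q.resolved =>
          val childAttrs = AttributeMap(q.children.flatMap(_.output).map(a => (a, a)))
          q.transformExpressions {
            case a: AttributeReference => childAttrs.getOrElse(a, a)
          }
      }
    }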
[GitHub] spark pull request: [SPARK-15498][TESTS] fix slow tests
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13273#issuecomment-221324600 test this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15431][SQL] Support LIST FILE(s)|JAR(s)...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13212#discussion_r64325475 --- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala --- @@ -238,4 +238,23 @@ class CliSuite extends SparkFunSuite with BeforeAndAfterAll with Logging { runCliWithin(2.minute, Seq("-e", "!echo \"This is a test for Spark-11624\";"))( "" -> "This is a test for Spark-11624") } + + test("list jars") { +val jarFile = Thread.currentThread().getContextClassLoader.getResource("TestUDTF.jar") +runCliWithin(2.minute)( + s"ADD JAR $jarFile" -> "", + s"LIST JARS" -> "TestUDTF.jar", + s"List JAR $jarFile" -> "TestUDTF.jar" +) + } + + test("list files") { +val dataFilePath = Thread.currentThread().getContextClassLoader + .getResource("data/files/small_kv.txt") +runCliWithin(2.minute)( + s"ADD FILE $dataFilePath" -> "", + s"LIST FILES" -> "small_kv.txt", + s"LIST FILE $dataFilePath" -> "small_kv.txt" +) + } --- End diff -- Seems it is failing? Can you take a look? https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.2/1102/testReport/junit/org.apache.spark.sql.hive.thriftserver/CliSuite/list_files/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13236#issuecomment-221133646 @rdblue Thanks. That is a good point :) I am closing this PR. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...
Github user yhuai closed the pull request at: https://github.com/apache/spark/pull/13236 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13236#issuecomment-221080895 We support different versions of Hive metastore (for example, Spark depends on Hive 1.2.1 but we can talk to a metastore using Hive 0.12). Also, the Hadoop used by this client can be different from the Hadoop used by Spark. We have an IsolatedClientLoader for class isolation purposes. So, it is possible that we have to reload Hadoop classes (those classes are from a different version of Hadoop than the Hadoop used by Spark). This is the reason that we need an option that allows us to load Hadoop classes instead of always sharing Hadoop classes. Because of this, we cannot pass Configuration to HiveClientImpl. Here, the Hadoop Configuration is only a container for the user's Hadoop settings. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13236#issuecomment-221069266 I can create a Hadoop Configuration inside HiveClientImpl and use it to create the HiveConf. The main issue is that we cannot pass a Hadoop Configuration in a HiveClientImpl because there may be two versions of `Configuration` loaded by two classloaders. What do you think? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
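A hedged sketch of the workaround described here -- a hypothetical helper, not the actual HiveClientImpl code: only plain strings cross the IsolatedClientLoader boundary, and the Configuration/HiveConf instances are created on the isolated side, so whichever Hadoop version that classloader resolves is the one used.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.hive.conf.HiveConf

    // Rebuild the user's Hadoop settings from key/value pairs on the isolated side.
    def newHiveConf(settings: Map[String, String]): HiveConf = {
      val hadoopConf = new Configuration()
      settings.foreach { case (k, v) => hadoopConf.set(k, v) }
      new HiveConf(hadoopConf, classOf[HiveConf])
    }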
[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13236#issuecomment-221050299 @rdblue Thank you for looking at this! The reason that I added the flag to disable sharing Hadoop classes is that the Hadoop versions used by Spark and by the metastore client may not be binary compatible (e.g., Hadoop 2 used by Spark and Hadoop 1 used by the metastore client). HiveClientImpl is a wrapper around Hive's metastore client. As long as we can propagate user-set configurations to its internal HiveConf (to make it talk to the metastore correctly), it is fine. Regarding the classloader, we actually do not use the classloader originally associated with the Hadoop Configuration and we always explicitly set the classloader associated with the HiveConf created inside HiveClientImpl. Regarding Configuration.addDefaultResource, before 2.0, we did not pass Hadoop Configuration to HiveClientImpl (it was called ClientWrapper in Spark 1.6). Since we were not relying on Configuration.addDefaultResource, using a Map should not change anything. btw, I am fine with changing the default value of the flag to true (sharing Hadoop classes) if you think it represents common use cases. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13236#discussion_r64157490 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -73,7 +74,7 @@ import org.apache.spark.util.{CircularBuffer, Utils} private[hive] class HiveClientImpl( override val version: HiveVersion, sparkConf: SparkConf, -hadoopConf: Configuration, +hadoopConf: Map[String, String], --- End diff -- Also, the reason that I am adding a flag to disable Hadoop class sharing is that the versions used by Spark and the HiveClientImpl can be different. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13236#discussion_r64157471 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -73,7 +74,7 @@ import org.apache.spark.util.{CircularBuffer, Utils} private[hive] class HiveClientImpl( override val version: HiveVersion, sparkConf: SparkConf, -hadoopConf: Configuration, +hadoopConf: Map[String, String], --- End diff -- @rdblue FYI. I am changing the type from `Configuration` to `Map[String, String]` because it is possible that Hadoop classes are not shared between Spark side and the `IsolatedClientLoader` side. When Hadoop classes are not shared, if we pass a `Configuration` object to `HiveClientImpl`, the class of this `Configuration` object (this class is loaded by Spark's classloader) is different from the `Configuration` class loaded by `IsolatedClientLoader`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
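To make the classloader point concrete, a small illustrative sketch (the jar path is hypothetical and this is not code from the PR): the same fully qualified class name loaded by two different classloaders yields two distinct Class objects, so an instance created on the Spark side is not type-compatible with the isolated copy.

    import java.net.{URL, URLClassLoader}

    // Hypothetical jar path; a null parent means this loader does not share
    // Spark's copy of the Hadoop classes (roughly what IsolatedClientLoader
    // does when Hadoop classes are not shared).
    val isolatedLoader = new URLClassLoader(
      Array(new URL("file:/path/to/hadoop-common.jar")), null)

    val sparkSide = Class.forName("org.apache.hadoop.conf.Configuration",
      true, getClass.getClassLoader)
    val isolated = Class.forName("org.apache.hadoop.conf.Configuration",
      true, isolatedLoader)

    // Same name, two distinct classes: a Configuration built on the Spark side
    // is not an instance of the isolated class, which is why only a
    // Map[String, String] is handed across the boundary.
    assert(sparkSide != isolated)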
spark git commit: [SPARK-15280] Input/Output] Refactored OrcOutputWriter and moved serialization to a new class.
Repository: spark Updated Branches: refs/heads/master 201a51f36 -> c18fa464f [SPARK-15280] Input/Output] Refactored OrcOutputWriter and moved serialization to a new class. ## What changes were proposed in this pull request? Refactoring: Separated ORC serialization logic from OrcOutputWriter and moved to a new class called OrcSerializer. ## How was this patch tested? Manual tests & existing tests. Author: Ergin SeyfeCloses #13066 from seyfe/orc_serializer. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c18fa464 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c18fa464 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c18fa464 Branch: refs/heads/master Commit: c18fa464f404ed2612f8c4d355cb0544b355975b Parents: 201a51f Author: Ergin Seyfe Authored: Sat May 21 16:08:31 2016 -0700 Committer: Yin Huai Committed: Sat May 21 16:08:31 2016 -0700 -- .../apache/spark/sql/hive/orc/OrcRelation.scala | 84 +++- 1 file changed, 45 insertions(+), 39 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/c18fa464/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala index 6e55137..38f50c1 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala @@ -149,39 +149,70 @@ private[sql] class DefaultSource } } -private[orc] class OrcOutputWriter( -path: String, -bucketId: Option[Int], -dataSchema: StructType, -context: TaskAttemptContext) - extends OutputWriter with HiveInspectors { +private[orc] class OrcSerializer(dataSchema: StructType, conf: Configuration) + extends HiveInspectors { + + def serialize(row: InternalRow): Writable = { +wrapOrcStruct(cachedOrcStruct, structOI, row) +serializer.serialize(cachedOrcStruct, structOI) + } - private val serializer = { + private[this] val serializer = { val table = new Properties() table.setProperty("columns", dataSchema.fieldNames.mkString(",")) table.setProperty("columns.types", dataSchema.map(_.dataType.catalogString).mkString(":")) val serde = new OrcSerde -val configuration = context.getConfiguration -serde.initialize(configuration, table) +serde.initialize(conf, table) serde } - // Object inspector converted from the schema of the relation to be written. - private val structOI = { + // Object inspector converted from the schema of the relation to be serialized. 
+ private[this] val structOI = { val typeInfo = TypeInfoUtils.getTypeInfoFromTypeString(dataSchema.catalogString) OrcStruct.createObjectInspector(typeInfo.asInstanceOf[StructTypeInfo]) .asInstanceOf[SettableStructObjectInspector] } + private[this] val cachedOrcStruct = structOI.create().asInstanceOf[OrcStruct] + + private[this] def wrapOrcStruct( + struct: OrcStruct, + oi: SettableStructObjectInspector, + row: InternalRow): Unit = { +val fieldRefs = oi.getAllStructFieldRefs +var i = 0 +while (i < fieldRefs.size) { + + oi.setStructFieldData( +struct, +fieldRefs.get(i), +wrap( + row.get(i, dataSchema(i).dataType), + fieldRefs.get(i).getFieldObjectInspector, + dataSchema(i).dataType)) + i += 1 +} + } +} + +private[orc] class OrcOutputWriter( +path: String, +bucketId: Option[Int], +dataSchema: StructType, +context: TaskAttemptContext) + extends OutputWriter { + + private[this] val conf = context.getConfiguration + + private[this] val serializer = new OrcSerializer(dataSchema, conf) + // `OrcRecordWriter.close()` creates an empty file if no rows are written at all. We use this // flag to decide whether `OrcRecordWriter.close()` needs to be called. private var recordWriterInstantiated = false private lazy val recordWriter: RecordWriter[NullWritable, Writable] = { recordWriterInstantiated = true - -val conf = context.getConfiguration val uniqueWriteJobId = conf.get("spark.sql.sources.writeJobUUID") val taskAttemptId = context.getTaskAttemptID val partition = taskAttemptId.getTaskID.getId @@ -206,33 +237,8 @@ private[orc] class OrcOutputWriter( override def write(row: Row): Unit = throw new UnsupportedOperationException("call writeInternal") - private def wrapOrcStruct( - struct: OrcStruct, - oi: SettableStructObjectInspector, - row: InternalRow): Unit = { -val fieldRefs =
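A hedged usage sketch of the class extracted in the commit above -- the constructor and serialize signature are taken from the diff, while the schema and row are illustrative. Since OrcSerializer is package-private, this would only compile inside org.apache.spark.sql.hive.orc.

    import org.apache.hadoop.conf.Configuration
    import org.apache.spark.sql.catalyst.InternalRow
    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
    import org.apache.spark.unsafe.types.UTF8String

    val dataSchema = StructType(Seq(
      StructField("id", IntegerType),
      StructField("name", StringType)))

    // One serializer per writer; serialize() reuses a cached OrcStruct internally
    // and hands it to OrcSerde to produce a Writable.
    val serializer = new OrcSerializer(dataSchema, new Configuration())
    val writable = serializer.serialize(InternalRow(1, UTF8String.fromString("spark")))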
spark git commit: [SPARK-15280] Input/Output] Refactored OrcOutputWriter and moved serialization to a new class.
Repository: spark Updated Branches: refs/heads/branch-2.0 4148a9c2c -> 6871deb93 [SPARK-15280] Input/Output] Refactored OrcOutputWriter and moved serialization to a new class. ## What changes were proposed in this pull request? Refactoring: Separated ORC serialization logic from OrcOutputWriter and moved to a new class called OrcSerializer. ## How was this patch tested? Manual tests & existing tests. Author: Ergin SeyfeCloses #13066 from seyfe/orc_serializer. (cherry picked from commit c18fa464f404ed2612f8c4d355cb0544b355975b) Signed-off-by: Yin Huai Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6871deb9 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6871deb9 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6871deb9 Branch: refs/heads/branch-2.0 Commit: 6871deb937fd6d6185b1d2a7a2ea36535ce303ea Parents: 4148a9c Author: Ergin Seyfe Authored: Sat May 21 16:08:31 2016 -0700 Committer: Yin Huai Committed: Sat May 21 16:08:51 2016 -0700 -- .../apache/spark/sql/hive/orc/OrcRelation.scala | 84 +++- 1 file changed, 45 insertions(+), 39 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6871deb9/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala index 6e55137..38f50c1 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala @@ -149,39 +149,70 @@ private[sql] class DefaultSource } } -private[orc] class OrcOutputWriter( -path: String, -bucketId: Option[Int], -dataSchema: StructType, -context: TaskAttemptContext) - extends OutputWriter with HiveInspectors { +private[orc] class OrcSerializer(dataSchema: StructType, conf: Configuration) + extends HiveInspectors { + + def serialize(row: InternalRow): Writable = { +wrapOrcStruct(cachedOrcStruct, structOI, row) +serializer.serialize(cachedOrcStruct, structOI) + } - private val serializer = { + private[this] val serializer = { val table = new Properties() table.setProperty("columns", dataSchema.fieldNames.mkString(",")) table.setProperty("columns.types", dataSchema.map(_.dataType.catalogString).mkString(":")) val serde = new OrcSerde -val configuration = context.getConfiguration -serde.initialize(configuration, table) +serde.initialize(conf, table) serde } - // Object inspector converted from the schema of the relation to be written. - private val structOI = { + // Object inspector converted from the schema of the relation to be serialized. 
+ private[this] val structOI = { val typeInfo = TypeInfoUtils.getTypeInfoFromTypeString(dataSchema.catalogString) OrcStruct.createObjectInspector(typeInfo.asInstanceOf[StructTypeInfo]) .asInstanceOf[SettableStructObjectInspector] } + private[this] val cachedOrcStruct = structOI.create().asInstanceOf[OrcStruct] + + private[this] def wrapOrcStruct( + struct: OrcStruct, + oi: SettableStructObjectInspector, + row: InternalRow): Unit = { +val fieldRefs = oi.getAllStructFieldRefs +var i = 0 +while (i < fieldRefs.size) { + + oi.setStructFieldData( +struct, +fieldRefs.get(i), +wrap( + row.get(i, dataSchema(i).dataType), + fieldRefs.get(i).getFieldObjectInspector, + dataSchema(i).dataType)) + i += 1 +} + } +} + +private[orc] class OrcOutputWriter( +path: String, +bucketId: Option[Int], +dataSchema: StructType, +context: TaskAttemptContext) + extends OutputWriter { + + private[this] val conf = context.getConfiguration + + private[this] val serializer = new OrcSerializer(dataSchema, conf) + // `OrcRecordWriter.close()` creates an empty file if no rows are written at all. We use this // flag to decide whether `OrcRecordWriter.close()` needs to be called. private var recordWriterInstantiated = false private lazy val recordWriter: RecordWriter[NullWritable, Writable] = { recordWriterInstantiated = true - -val conf = context.getConfiguration val uniqueWriteJobId = conf.get("spark.sql.sources.writeJobUUID") val taskAttemptId = context.getTaskAttemptID val partition = taskAttemptId.getTaskID.getId @@ -206,33 +237,8 @@ private[orc] class OrcOutputWriter( override def write(row: Row): Unit = throw new UnsupportedOperationException("call writeInternal") - private def wrapOrcStruct( -
[GitHub] spark pull request: [SPARK-15280] [Input/Output] Refactored OrcOut...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13066#issuecomment-220805144 Merging to master and branch 2.0. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15280] [Input/Output] Refactored OrcOut...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13066#issuecomment-220795260 LGTM. I will merge this to master and branch 2.0 once the description is updated. Also regarding "How was this patch tested?", we can say that manual tests and existing tests. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15280] [Input/Output] Refactored OrcOut...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13066#issuecomment-220795180 @seyfe Can you also update the description (since we are doing a refactoring instead of adding a new public class)? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15280] [Input/Output] Refactored OrcOut...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13066#issuecomment-220750857 test this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13236#issuecomment-220750748 test this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13236#discussion_r64123606 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala --- @@ -129,6 +129,14 @@ private[spark] object HiveUtils extends Logging { .toSequence .createWithDefault(Nil) + val HIVE_METASTORE_SHARE_HADOOPCLASSES = +SQLConfigBuilder("spark.sql.hive.metastore.shareHadoopClasses") + .doc("When set to true, the classloader used to load Hive metastore client will " + +"not explicitly load Hadoop classes. Instead, Hadoop classes loaded by the classloader " + +"that loads Spark will be used.") + .booleanConf + .createWithDefault(false) --- End diff -- @marmbrus I am setting this conf to false by default, which changes the behavior (our master currently always shares Hadoop classes). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15455] For IsolatedClientLoader, we nee...
GitHub user yhuai opened a pull request: https://github.com/apache/spark/pull/13236 [SPARK-15455] For IsolatedClientLoader, we need to provide a conf to disable sharing Hadoop classes ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) You can merge this pull request into a Git repository by running: $ git pull https://github.com/yhuai/spark notsharehadoop Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13236.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13236 commit 06168a65816e7a0b01f031c58b0cffbf7306114d Author: Yin Huai <yh...@databricks.com> Date: 2016-05-20T22:12:06Z Add a conf to control if we want to share Hadoop classes for IsolatedClientLoader. commit 4639c8583f8ff7b45c0b4ceb48df2aefa0288691 Author: Yin Huai <yh...@databricks.com> Date: 2016-05-20T22:54:13Z Do not pass hadoop Configuration to HiveClientImpl --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15279][SQL] Catch conflicting SerDe whe...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13068#issuecomment-220728799 Thanks! LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15438] [SQL] improve explain of whole s...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13204#issuecomment-220706346 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14130] [SQL] Throw exceptions for ALTER...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/12714#discussion_r64079773 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -179,6 +173,11 @@ unsupportedHiveNativeCommands | kw1=ALTER kw2=TABLE tableIdentifier kw3=TOUCH | kw1=ALTER kw2=TABLE tableIdentifier partitionSpec? kw3=COMPACT | kw1=ALTER kw2=TABLE tableIdentifier partitionSpec? kw3=CONCATENATE +| kw1=START kw2=TRANSACTION +| kw1=COMMIT +| kw1=ROLLBACK +| kw1=DFS --- End diff -- Can you explain the reason that we also need to change CLI driver? I guess I am missing some context. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15280] [Input/Output] Refactored OrcOut...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13066#discussion_r64079348 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala --- @@ -167,39 +166,69 @@ private[sql] class DefaultSource } } -private[orc] class OrcOutputWriter( -path: String, -bucketId: Option[Int], -dataSchema: StructType, -context: TaskAttemptContext) - extends OutputWriter with HiveInspectors { +class OrcSerializer(dataSchema: StructType, conf: Configuration) extends HiveInspectors { --- End diff -- Can we add `private[orc]` modifier? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14998][SQL]fix ArrayIndexOutOfBoundsExc...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/12772#issuecomment-220525633 Here, `A\tB\tC\t\t` represents 5 fields and `A\tB\tC\t` represents 4 fields, right? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
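As a stand-alone illustration of the field counts in question (my example, not code from the patch): Java/Scala String.split drops trailing empty strings unless a negative limit is passed, which is one common way a shorter-than-expected field array, and hence an ArrayIndexOutOfBoundsException against a fixed-width schema, can arise.

    val fiveFields = "A\tB\tC\t\t"
    val fourFields = "A\tB\tC\t"

    println(fiveFields.split("\t").length)      // 3 -- trailing empty fields dropped
    println(fiveFields.split("\t", -1).length)  // 5 -- all fields kept
    println(fourFields.split("\t", -1).length)  // 4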
[GitHub] spark pull request: [SPARK-14998][SQL]fix ArrayIndexOutOfBoundsExc...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/12772#issuecomment-220525398 can you add a regression test? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: explain of whole stage codegen
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13204#issuecomment-220525308 I like option 3. We can tell what operators are in a single WholeStageCodeGen operator and we can also tell what the input operators of a WholeStageCodeGen are. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14998][SQL]fix ArrayIndexOutOfBoundsExc...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/12772#issuecomment-220499145 ok to test --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15424][SQL] Revert SPARK-14807 Create a...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13207#issuecomment-220498914 test this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14261][SQL] Memory leak in Spark Thrift...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/12932#issuecomment-220496773 @dosoft Thank you for the PR. What is the memory footprint of a leaked driver? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14261][SQL] Memory leak in Spark Thrift...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/12932#issuecomment-220496577 ok to test --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15421][SQL] Validate DDL property value...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13205#issuecomment-220476667 lgtm --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15337] [SPARK-15338] [SQL] Enable Run-t...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13128#issuecomment-220411852 Spark confs are immutable. For SQL, the metastore-related confs are also immutable, considering the long-running metastore client that we have. I do think it is reasonable to explicitly mark `spark.sql.warehouse.dir` as immutable. We can make it clear in our docs (for example, have a list of immutable SQL confs in the Spark SQL programming guide). If we want to throw exceptions, it will be good to consider both Spark confs and immutable SQL confs together. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/10061#discussion_r63822163 --- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala --- @@ -96,6 +100,7 @@ private[spark] object JsonProtocol { executorMetricsUpdateToJson(metricsUpdate) case blockUpdated: SparkListenerBlockUpdated => throw new MatchError(blockUpdated) // TODO(ekl) implement this + case _ => parse(mapper.writeValueAsString(event)) --- End diff -- > Events are a public API, and they should be carefully crafted, since changing them affects user applications (including event logs). If there is unnecessary information in the event, then it's a bug in the event definition, not here. Yea. I totally agree. However, my concern is that having this line here will make it harder for developers to spot issues during development. Since the serialization works automatically, a self-review of what will be serialized and which methods will be called during serialization is no longer a mandatory step, which makes auditing much harder. Although handling every event explicitly means more work for the developer, when we review a pull request we can clearly see what will be serialized and how each event is serialized. What do you think? btw, if I am missing any context, please let me know :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
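To illustrate the trade-off being discussed, a simplified sketch with a hypothetical event class (not the JsonProtocol code): with an explicit branch the reviewer sees exactly which fields end up in the JSON, while the generic Jackson fallback serializes whatever the event's public accessors happen to expose.

    import com.fasterxml.jackson.databind.ObjectMapper
    import com.fasterxml.jackson.module.scala.DefaultScalaModule
    import org.json4s.JValue
    import org.json4s.JsonDSL._
    import org.json4s.jackson.JsonMethods.parse

    case class MyListenerEvent(reason: String, internalDetail: String) // hypothetical event

    val mapper = new ObjectMapper().registerModule(DefaultScalaModule)

    // Explicit, auditable: only the named fields are written.
    def explicitToJson(e: MyListenerEvent): JValue =
      ("Event" -> e.getClass.getName) ~ ("Reason" -> e.reason)

    // Generic fallback: the shape is whatever Jackson derives from the class.
    def fallbackToJson(e: AnyRef): JValue = parse(mapper.writeValueAsString(e))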
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13156#discussion_r63820108 --- Diff: sql/hivecontext-compatibility/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala --- @@ -58,4 +58,16 @@ class HiveContext private[hive]( sparkSession.sharedState.asInstanceOf[HiveSharedState] } + /** + * Invalidate and refresh all the cached metadata of the given table. For performance reasons, + * Spark SQL or the external data source library it uses might cache certain metadata about a + * table, such as the location of blocks. When those change outside of Spark SQL, users should + * call this function to invalidate the cache. + * + * @since 1.3.0 + */ + def refreshTable(tableName: String): Unit = { --- End diff -- This class exists for compatibility purposes. Let's leave it as is. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15337] [SPARK-15338] [SQL] Enable Run-t...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13128#issuecomment-220203789 @gatorsmile For 2.0, all confs should go into spark conf or sql conf. What is the reason to change this? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15320] [SQL] Spark-SQL Cli Ignores Para...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13111#issuecomment-220203561 @gatorsmile For 2.0, we will use `spark.sql.warehouse.dir` instead of `hive.metastore.warehouse.dir` (support for the latter has been dropped).
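For illustration, a hedged sketch of the 2.0-style setting mentioned above (the path is an example value, and `enableHiveSupport()` assumes Hive classes are on the classpath).

```scala
import org.apache.spark.sql.SparkSession

// In 2.0 the warehouse location is taken from spark.sql.warehouse.dir rather
// than hive.metastore.warehouse.dir.
val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.sql.warehouse.dir", "/tmp/spark-warehouse")
  .enableHiveSupport()
  .getOrCreate()
```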
spark git commit: [SPARK-15192][SQL] null check for SparkSession.createDataFrame
Repository: spark Updated Branches: refs/heads/branch-2.0 36acf8856 -> f5784459e [SPARK-15192][SQL] null check for SparkSession.createDataFrame ## What changes were proposed in this pull request? This PR adds null check in `SparkSession.createDataFrame`, so that we can make sure the passed in rows matches the given schema. ## How was this patch tested? new tests in `DatasetSuite` Author: Wenchen FanCloses #13008 from cloud-fan/row-encoder. (cherry picked from commit ebfe3a1f2c77e6869c3c36ba67afb7fabe6a94f5) Signed-off-by: Yin Huai Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f5784459 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f5784459 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f5784459 Branch: refs/heads/branch-2.0 Commit: f5784459e84e84d6641725bf0d0fb31db742456c Parents: 36acf88 Author: Wenchen Fan Authored: Wed May 18 18:06:38 2016 -0700 Committer: Yin Huai Committed: Wed May 18 18:06:52 2016 -0700 -- .../scala/org/apache/spark/mllib/fpm/FPGrowth.scala| 2 +- .../apache/spark/sql/catalyst/ScalaReflection.scala| 4 ++-- .../spark/sql/catalyst/encoders/RowEncoder.scala | 10 +++--- .../sql/catalyst/expressions/BoundAttribute.scala | 2 +- .../sql/catalyst/expressions/objects/objects.scala | 4 +++- .../main/scala/org/apache/spark/sql/SparkSession.scala | 4 ++-- .../scala/org/apache/spark/sql/api/r/SQLUtils.scala| 5 - .../test/scala/org/apache/spark/sql/DatasetSuite.scala | 13 +++-- .../scala/org/apache/spark/sql/test/SQLTestUtils.scala | 6 +- 9 files changed, 28 insertions(+), 22 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f5784459/mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala b/mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala index 9166faa..28e4966 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala @@ -116,7 +116,7 @@ object FPGrowthModel extends Loader[FPGrowthModel[_]] { StructField("freq", LongType)) val schema = StructType(fields) val rowDataRDD = model.freqItemsets.map { x => -Row(x.items, x.freq) +Row(x.items.toSeq, x.freq) } sqlContext.createDataFrame(rowDataRDD, schema).write.parquet(Loader.dataPath(path)) } http://git-wip-us.apache.org/repos/asf/spark/blob/f5784459/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala index cb9a62d..c0fa220 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala @@ -113,8 +113,8 @@ object ScalaReflection extends ScalaReflection { * Returns true if the value of this data type is same between internal and external. 
*/ def isNativeType(dt: DataType): Boolean = dt match { -case BooleanType | ByteType | ShortType | IntegerType | LongType | - FloatType | DoubleType | BinaryType => true +case NullType | BooleanType | ByteType | ShortType | IntegerType | LongType | + FloatType | DoubleType | BinaryType | CalendarIntervalType => true case _ => false } http://git-wip-us.apache.org/repos/asf/spark/blob/f5784459/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala index a5f39aa..71b39c5 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala @@ -70,8 +70,7 @@ object RowEncoder { private def serializerFor( inputObject: Expression, inputType: DataType): Expression = inputType match { -case NullType | BooleanType | ByteType | ShortType | IntegerType | LongType | - FloatType | DoubleType | BinaryType | CalendarIntervalType => inputObject +case dt if ScalaReflection.isNativeType(dt) => inputObject case p: PythonUserDefinedType =>
spark git commit: [SPARK-15192][SQL] null check for SparkSession.createDataFrame
Repository: spark Updated Branches: refs/heads/master 32be51fba -> ebfe3a1f2 [SPARK-15192][SQL] null check for SparkSession.createDataFrame ## What changes were proposed in this pull request? This PR adds null check in `SparkSession.createDataFrame`, so that we can make sure the passed in rows matches the given schema. ## How was this patch tested? new tests in `DatasetSuite` Author: Wenchen FanCloses #13008 from cloud-fan/row-encoder. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ebfe3a1f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ebfe3a1f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ebfe3a1f Branch: refs/heads/master Commit: ebfe3a1f2c77e6869c3c36ba67afb7fabe6a94f5 Parents: 32be51f Author: Wenchen Fan Authored: Wed May 18 18:06:38 2016 -0700 Committer: Yin Huai Committed: Wed May 18 18:06:38 2016 -0700 -- .../scala/org/apache/spark/mllib/fpm/FPGrowth.scala| 2 +- .../apache/spark/sql/catalyst/ScalaReflection.scala| 4 ++-- .../spark/sql/catalyst/encoders/RowEncoder.scala | 10 +++--- .../sql/catalyst/expressions/BoundAttribute.scala | 2 +- .../sql/catalyst/expressions/objects/objects.scala | 4 +++- .../main/scala/org/apache/spark/sql/SparkSession.scala | 4 ++-- .../scala/org/apache/spark/sql/api/r/SQLUtils.scala| 5 - .../test/scala/org/apache/spark/sql/DatasetSuite.scala | 13 +++-- .../scala/org/apache/spark/sql/test/SQLTestUtils.scala | 6 +- 9 files changed, 28 insertions(+), 22 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ebfe3a1f/mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala b/mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala index 9166faa..28e4966 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala @@ -116,7 +116,7 @@ object FPGrowthModel extends Loader[FPGrowthModel[_]] { StructField("freq", LongType)) val schema = StructType(fields) val rowDataRDD = model.freqItemsets.map { x => -Row(x.items, x.freq) +Row(x.items.toSeq, x.freq) } sqlContext.createDataFrame(rowDataRDD, schema).write.parquet(Loader.dataPath(path)) } http://git-wip-us.apache.org/repos/asf/spark/blob/ebfe3a1f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala index cb9a62d..c0fa220 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala @@ -113,8 +113,8 @@ object ScalaReflection extends ScalaReflection { * Returns true if the value of this data type is same between internal and external. 
*/ def isNativeType(dt: DataType): Boolean = dt match { -case BooleanType | ByteType | ShortType | IntegerType | LongType | - FloatType | DoubleType | BinaryType => true +case NullType | BooleanType | ByteType | ShortType | IntegerType | LongType | + FloatType | DoubleType | BinaryType | CalendarIntervalType => true case _ => false } http://git-wip-us.apache.org/repos/asf/spark/blob/ebfe3a1f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala index a5f39aa..71b39c5 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala @@ -70,8 +70,7 @@ object RowEncoder { private def serializerFor( inputObject: Expression, inputType: DataType): Expression = inputType match { -case NullType | BooleanType | ByteType | ShortType | IntegerType | LongType | - FloatType | DoubleType | BinaryType | CalendarIntervalType => inputObject +case dt if ScalaReflection.isNativeType(dt) => inputObject case p: PythonUserDefinedType => serializerFor(inputObject, p.sqlType) @@ -151,7 +150,7 @@ object RowEncoder { case StructType(fields) => val
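For illustration, a hedged sketch of the behavior the commit above describes (the exact exception type and message may differ): a null in a field declared non-nullable should now be rejected when the row is encoded, instead of being silently accepted.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

val spark = SparkSession.builder().master("local[*]").appName("null-check-sketch").getOrCreate()

val schema = StructType(Seq(
  StructField("id", LongType, nullable = false),
  StructField("name", StringType, nullable = true)))

// The id field is null even though the schema declares it non-nullable.
val rows = spark.sparkContext.parallelize(Seq(Row(null, "a")))

// Expected to fail when the rows are encoded (triggered here by collect())
// rather than producing a DataFrame that violates its own schema.
spark.createDataFrame(rows, schema).collect()
```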
[GitHub] spark pull request: [SPARK-15192][SQL] null check for SparkSession...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13008#issuecomment-220201533 Thanks! Merging to master and branch 2.0.
[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/10061#discussion_r63808009 --- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala --- @@ -96,6 +100,7 @@ private[spark] object JsonProtocol { executorMetricsUpdateToJson(metricsUpdate) case blockUpdated: SparkListenerBlockUpdated => throw new MatchError(blockUpdated) // TODO(ekl) implement this + case _ => parse(mapper.writeValueAsString(event)) --- End diff -- It is very possible that we silently pull in unnecessary information. If we have new event types, we should handle those explicitly instead of relying on this line. I am proposing to revert this line.
[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/10061#discussion_r63804220 --- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala --- @@ -96,6 +100,7 @@ private[spark] object JsonProtocol { executorMetricsUpdateToJson(metricsUpdate) case blockUpdated: SparkListenerBlockUpdated => throw new MatchError(blockUpdated) // TODO(ekl) implement this + case _ => parse(mapper.writeValueAsString(event)) --- End diff -- What is the reason to add this line?
[GitHub] spark pull request: [SPARK-14346][SQL] Lists unsupported Hive feat...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13173#discussion_r63732226 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -633,16 +633,16 @@ case class ShowCreateTableCommand(table: TableIdentifier) extends RunnableComman } private def showCreateHiveTable(metadata: CatalogTable): String = { -def reportUnsupportedError(): Unit = { - throw new UnsupportedOperationException( +def reportUnsupportedError(features: Seq[String]): Unit = { + throw new AnalysisException( s"Failed to execute SHOW CREATE TABLE against table ${metadata.identifier.quotedString}, " + - "because it contains table structure(s) (e.g. skewed columns) that Spark SQL doesn't " + - "support yet." + "which is created by Hive and uses the following feature(s) that are not yet supported " + + "by Spark SQL:\n" + features.map(" - " + _).mkString("\n") --- End diff -- It sounds like we are missing some functionality. Can we update the error message?
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13156#discussion_r63729506 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite.scala --- @@ -622,7 +622,7 @@ class MetastoreDataSourcesSuite extends QueryTest with SQLTestUtils with TestHiv .mode(SaveMode.Append) .saveAsTable("arrayInParquet") - sessionState.refreshTable("arrayInParquet") + sparkSession.catalog.refreshTable("arrayInParquet") --- End diff -- Actually, invalidateTable and refreshTable do have different meanings. The current implementation of `HiveMetastoreCatalog.refreshTable` is `HiveMetastoreCatalog.invalidateTable` (and then we retrieve the new metadata lazily). But that does not mean `refreshTable` and `invalidateTable` have the same semantics. Whether we should remove either `invalidateTable` or `refreshTable` should be discussed in a different thread.
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13156#discussion_r63727680 --- Diff: sql/hivecontext-compatibility/src/test/scala/org/apache/spark/sql/hive/HiveContextCompatibilitySuite.scala --- @@ -99,4 +105,41 @@ class HiveContextCompatibilitySuite extends SparkFunSuite with BeforeAndAfterEac assert(databases3.toSeq == Seq("default")) } + test("check change after refresh") { --- End diff -- Looks like the `RefreshTable` command is actually doing more work. I think we need to make `RefreshTable` and `sparkSession.catalog.refreshTable` have the same behavior. Can you make that change?
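For reference, a hedged sketch (assuming an existing SparkSession `spark` and a hypothetical table name) of the two code paths being compared above; the point of the comment is that both should end up doing the same work.

```scala
// SQL command path (the RefreshTable command discussed above):
spark.sql("REFRESH TABLE my_parquet_table")

// Catalog API path:
spark.catalog.refreshTable("my_parquet_table")
```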
[GitHub] spark pull request: [SPARK-15192][SQL] null check for SparkSession...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13008#discussion_r63726478 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala --- @@ -116,7 +116,7 @@ object FPGrowthModel extends Loader[FPGrowthModel[_]] { StructField("freq", LongType)) val schema = StructType(fields) val rowDataRDD = model.freqItemsets.map { x => -Row(x.items, x.freq) +Row(x.items.toSeq, x.freq) --- End diff -- Do we need to call `toSeq` here?
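A toy sketch of what the question is about (types simplified, not MLlib's real FP-growth classes); the review question is whether the row-encoding path accepts an Array directly for an ArrayType column, or whether it needs to be converted to a Seq first.

```scala
// Simplified stand-in for an FP-growth frequent itemset.
case class Itemset(items: Array[String], freq: Long)

val x = Itemset(Array("a", "b"), 3L)

val rowWithArray = org.apache.spark.sql.Row(x.items, x.freq)        // Array stored in the Row
val rowWithSeq   = org.apache.spark.sql.Row(x.items.toSeq, x.freq)  // Seq stored in the Row
```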
spark git commit: [SPARK-14346] Fix scala-2.10 build
Repository: spark Updated Branches: refs/heads/branch-2.0 2dddec40d -> 1db37417c [SPARK-14346] Fix scala-2.10 build ## What changes were proposed in this pull request? Scala 2.10 build was broken by #13079. I am reverting the change of that line. Author: Yin Huai <yh...@databricks.com> Closes #13157 from yhuai/SPARK-14346-fix-scala2.10. (cherry picked from commit 2a5db9c140b9d60a5ec91018be19bec7b80850ee) Signed-off-by: Yin Huai <yh...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1db37417 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1db37417 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1db37417 Branch: refs/heads/branch-2.0 Commit: 1db37417c25429c0001c19d2f10f4a314fe4585c Parents: 2dddec4 Author: Yin Huai <yh...@databricks.com> Authored: Tue May 17 18:02:31 2016 -0700 Committer: Yin Huai <yh...@databricks.com> Committed: Tue May 17 18:02:45 2016 -0700 -- .../scala/org/apache/spark/sql/catalyst/catalog/interface.scala| 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/1db37417/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala index d4f5cbb..3fdd411 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala @@ -208,6 +208,6 @@ case class SimpleCatalogRelation( } require( -metadata.identifier.database.contains(databaseName), +metadata.identifier.database == Some(databaseName), "provided database does not match the one specified in the table definition") } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-14346] Fix scala-2.10 build
Repository: spark Updated Branches: refs/heads/master 25b315e6c -> 2a5db9c14 [SPARK-14346] Fix scala-2.10 build ## What changes were proposed in this pull request? Scala 2.10 build was broken by #13079. I am reverting the change of that line. Author: Yin Huai <yh...@databricks.com> Closes #13157 from yhuai/SPARK-14346-fix-scala2.10. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2a5db9c1 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2a5db9c1 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2a5db9c1 Branch: refs/heads/master Commit: 2a5db9c140b9d60a5ec91018be19bec7b80850ee Parents: 25b315e Author: Yin Huai <yh...@databricks.com> Authored: Tue May 17 18:02:31 2016 -0700 Committer: Yin Huai <yh...@databricks.com> Committed: Tue May 17 18:02:31 2016 -0700 -- .../scala/org/apache/spark/sql/catalyst/catalog/interface.scala| 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2a5db9c1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala index d4f5cbb..3fdd411 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala @@ -208,6 +208,6 @@ case class SimpleCatalogRelation( } require( -metadata.identifier.database.contains(databaseName), +metadata.identifier.database == Some(databaseName), "provided database does not match the one specified in the table definition") } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
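For reference, a sketch of the equivalence behind this revert: `Option.contains` was only added in Scala 2.11, so builds that still target 2.10 need the `== Some(...)` form; for the case at hand the two checks agree.

```scala
val database: Option[String] = Some("default")
val databaseName = "default"

val viaContains = database.contains(databaseName)  // compiles on Scala 2.11+, not 2.10
val viaSomeEq   = database == Some(databaseName)   // compiles on both 2.10 and 2.11
assert(viaContains == viaSomeEq)
```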
[GitHub] spark pull request: [SPARK-14346] Fix scala-2.10 build
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13157#issuecomment-219897800 I am merging this.
[GitHub] spark pull request: [SPARK-14346] Fix scala-2.10 build
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13157#issuecomment-219885469 Tested locally. The fix is good.
[GitHub] spark pull request: [SPARK-14346] Fix scala-2.10 build
GitHub user yhuai opened a pull request: https://github.com/apache/spark/pull/13157 [SPARK-14346] Fix scala-2.10 build ## What changes were proposed in this pull request? Scala 2.10 build was broken by https://github.com/apache/spark/pull/13079/files. I am reverting the change of that line. You can merge this pull request into a Git repository by running: $ git pull https://github.com/yhuai/spark SPARK-14346-fix-scala2.10 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13157.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13157 commit 5b64ca3f71aaf796d94537bc08670893afb38c88 Author: Yin Huai <yh...@databricks.com> Date: 2016-05-17T23:18:34Z [SPARK-14346] Fix scala-2.10 build
[GitHub] spark pull request: [SPARK-14346][SQL] Native SHOW CREATE TABLE fo...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13079#discussion_r63620803 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala --- @@ -200,6 +207,7 @@ case class SimpleCatalogRelation( } } - require(metadata.identifier.database == Some(databaseName), + require( +metadata.identifier.database.contains(databaseName), --- End diff -- `contains` is not in Scala 2.10. Let me fix the build.
spark git commit: [SPARK-14346][SQL] Native SHOW CREATE TABLE for Hive tables/views
Repository: spark Updated Branches: refs/heads/branch-2.0 7b62b7c11 -> 2dddec40d [SPARK-14346][SQL] Native SHOW CREATE TABLE for Hive tables/views ## What changes were proposed in this pull request? This is a follow-up of #12781. It adds native `SHOW CREATE TABLE` support for Hive tables and views. A new field `hasUnsupportedFeatures` is added to `CatalogTable` to indicate whether all table metadata retrieved from the concrete underlying external catalog (i.e. Hive metastore in this case) can be mapped to fields in `CatalogTable`. This flag is useful when the target Hive table contains structures that can't be handled by Spark SQL, e.g., skewed columns and storage handler, etc.. ## How was this patch tested? New test cases are added in `ShowCreateTableSuite` to do round-trip tests. Author: Cheng LianCloses #13079 from liancheng/spark-14346-show-create-table-for-hive-tables. (cherry picked from commit b674e67c22bf663334e537e35787c00533adbb04) Signed-off-by: Yin Huai Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2dddec40 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2dddec40 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2dddec40 Branch: refs/heads/branch-2.0 Commit: 2dddec40d6562d1d16bb242bf7dc730431ee1e3e Parents: 7b62b7c Author: Cheng Lian Authored: Tue May 17 15:56:44 2016 -0700 Committer: Yin Huai Committed: Tue May 17 15:56:57 2016 -0700 -- .../spark/sql/catalyst/catalog/interface.scala | 12 +- .../spark/sql/execution/command/tables.scala| 184 +- .../spark/sql/hive/client/HiveClientImpl.scala | 10 +- .../spark/sql/hive/ShowCreateTableSuite.scala | 185 ++- 4 files changed, 333 insertions(+), 58 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2dddec40/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala index d215655..d4f5cbb 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala @@ -79,6 +79,12 @@ case class CatalogTablePartition( * * Note that Hive's metastore also tracks skewed columns. We should consider adding that in the * future once we have a better understanding of how we want to handle skewed columns. + * + * @param hasUnsupportedFeatures is used to indicate whether all table metadata entries retrieved + *from the concrete underlying external catalog (e.g. Hive metastore) are supported by + *Spark SQL. For example, if the underlying Hive table has skewed columns, this information + *can't be mapped to [[CatalogTable]] since Spark SQL doesn't handle skewed columns for now. + *In this case `hasUnsupportedFeatures` is set to true. By default, it is false. 
*/ case class CatalogTable( identifier: TableIdentifier, @@ -95,7 +101,8 @@ case class CatalogTable( properties: Map[String, String] = Map.empty, viewOriginalText: Option[String] = None, viewText: Option[String] = None, -comment: Option[String] = None) { +comment: Option[String] = None, +hasUnsupportedFeatures: Boolean = false) { // Verify that the provided columns are part of the schema private val colNames = schema.map(_.name).toSet @@ -200,6 +207,7 @@ case class SimpleCatalogRelation( } } - require(metadata.identifier.database == Some(databaseName), + require( +metadata.identifier.database.contains(databaseName), "provided database does not match the one specified in the table definition") } http://git-wip-us.apache.org/repos/asf/spark/blob/2dddec40/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala index bb4f1ff..1fc02d1 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala @@ -626,40 +626,149 @@ case class ShowCreateTableCommand(table: TableIdentifier) extends RunnableComman val stmt = if (DDLUtils.isDatasourceTable(tableMetadata)) { showCreateDataSourceTable(tableMetadata) } else { - throw new UnsupportedOperationException( -
spark git commit: [SPARK-14346][SQL] Native SHOW CREATE TABLE for Hive tables/views
Repository: spark Updated Branches: refs/heads/master 8e8bc9f95 -> b674e67c2 [SPARK-14346][SQL] Native SHOW CREATE TABLE for Hive tables/views ## What changes were proposed in this pull request? This is a follow-up of #12781. It adds native `SHOW CREATE TABLE` support for Hive tables and views. A new field `hasUnsupportedFeatures` is added to `CatalogTable` to indicate whether all table metadata retrieved from the concrete underlying external catalog (i.e. Hive metastore in this case) can be mapped to fields in `CatalogTable`. This flag is useful when the target Hive table contains structures that can't be handled by Spark SQL, e.g., skewed columns and storage handler, etc.. ## How was this patch tested? New test cases are added in `ShowCreateTableSuite` to do round-trip tests. Author: Cheng LianCloses #13079 from liancheng/spark-14346-show-create-table-for-hive-tables. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b674e67c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b674e67c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b674e67c Branch: refs/heads/master Commit: b674e67c22bf663334e537e35787c00533adbb04 Parents: 8e8bc9f Author: Cheng Lian Authored: Tue May 17 15:56:44 2016 -0700 Committer: Yin Huai Committed: Tue May 17 15:56:44 2016 -0700 -- .../spark/sql/catalyst/catalog/interface.scala | 12 +- .../spark/sql/execution/command/tables.scala| 184 +- .../spark/sql/hive/client/HiveClientImpl.scala | 10 +- .../spark/sql/hive/ShowCreateTableSuite.scala | 185 ++- 4 files changed, 333 insertions(+), 58 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b674e67c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala index d215655..d4f5cbb 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala @@ -79,6 +79,12 @@ case class CatalogTablePartition( * * Note that Hive's metastore also tracks skewed columns. We should consider adding that in the * future once we have a better understanding of how we want to handle skewed columns. + * + * @param hasUnsupportedFeatures is used to indicate whether all table metadata entries retrieved + *from the concrete underlying external catalog (e.g. Hive metastore) are supported by + *Spark SQL. For example, if the underlying Hive table has skewed columns, this information + *can't be mapped to [[CatalogTable]] since Spark SQL doesn't handle skewed columns for now. + *In this case `hasUnsupportedFeatures` is set to true. By default, it is false. 
*/ case class CatalogTable( identifier: TableIdentifier, @@ -95,7 +101,8 @@ case class CatalogTable( properties: Map[String, String] = Map.empty, viewOriginalText: Option[String] = None, viewText: Option[String] = None, -comment: Option[String] = None) { +comment: Option[String] = None, +hasUnsupportedFeatures: Boolean = false) { // Verify that the provided columns are part of the schema private val colNames = schema.map(_.name).toSet @@ -200,6 +207,7 @@ case class SimpleCatalogRelation( } } - require(metadata.identifier.database == Some(databaseName), + require( +metadata.identifier.database.contains(databaseName), "provided database does not match the one specified in the table definition") } http://git-wip-us.apache.org/repos/asf/spark/blob/b674e67c/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala index bb4f1ff..1fc02d1 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala @@ -626,40 +626,149 @@ case class ShowCreateTableCommand(table: TableIdentifier) extends RunnableComman val stmt = if (DDLUtils.isDatasourceTable(tableMetadata)) { showCreateDataSourceTable(tableMetadata) } else { - throw new UnsupportedOperationException( -"SHOW CREATE TABLE only supports Spark SQL data source tables.") + showCreateHiveTable(tableMetadata) }
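For illustration, a hedged usage sketch of the feature added by the commit above (assuming a SparkSession `spark` built with Hive support, and a hypothetical table name); tables that rely on features Spark SQL cannot yet represent (for example, skewed columns) raise an error instead of producing an incomplete statement.

```scala
// Create a simple Hive table and ask Spark to reconstruct its DDL.
spark.sql("CREATE TABLE demo_hive_tbl (id INT, name STRING) STORED AS PARQUET")
spark.sql("SHOW CREATE TABLE demo_hive_tbl").show(false)   // prints the reconstructed statement
```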
[GitHub] spark pull request: [SPARK-14346][SQL] Native SHOW CREATE TABLE fo...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13079#issuecomment-219878464 LGTM. I am merging this to master and 2.0. Let's change the error message and exception class in a separate PR.
[GitHub] spark pull request: [SPARK-14346][SQL] Native SHOW CREATE TABLE fo...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13079#discussion_r63618404 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -626,40 +626,142 @@ case class ShowCreateTableCommand(table: TableIdentifier) extends RunnableComman val stmt = if (DDLUtils.isDatasourceTable(tableMetadata)) { showCreateDataSourceTable(tableMetadata) } else { - throw new UnsupportedOperationException( -"SHOW CREATE TABLE only supports Spark SQL data source tables.") + showCreateHiveTable(tableMetadata) } Seq(Row(stmt)) } + private def showCreateHiveTable(metadata: CatalogTable): String = { +def reportUnsupportedError(): Unit = { + throw new UnsupportedOperationException( --- End diff -- Let's be consistent with other places and change this to AnalysisException.
[GitHub] spark pull request: [SPARK-15317][Core]Don't store accumulators fo...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/13153#issuecomment-219877343 Maybe also put the size comparison in the description?
[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13156#discussion_r63613203 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/MultiDatabaseSuite.scala --- @@ -202,7 +202,8 @@ class MultiDatabaseSuite extends QueryTest with SQLTestUtils with TestHiveSingle activateDatabase(db) { sql( -s"""CREATE EXTERNAL TABLE t (id BIGINT) +s""" + |CREATE EXTERNAL TABLE t (id BIGINT) --- End diff -- Seems these formatting changes are not necessary?