[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16787


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r100681309
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -649,6 +648,16 @@ class VersionsSuite extends QueryTest with SQLTestUtils with TestHiveSingleton w
       }
     }
 
+    test(s"$version: create table should success to test HiveClientImpl.toHiveTable compatible") {
+      withTable("t", "t1") {
+        import spark.implicits._
+        Seq("1").toDF("a").write.saveAsTable("t")
+        checkAnswer(spark.table("t"), Row("1") :: Nil)
+
+        spark.sql("create table t1 as select 2 as a")
--- End diff --

-> `CREATE TABLE t1 USING parquet AS SELECT 2 AS a`

Could you change the statement and then update the test case name to 
`s"$version: create managed data source tables"`?
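With the suggested change applied, the test might look roughly like this (a sketch only, based on the snippet above and the reviewer's suggestion, not the final merged code; it assumes the surrounding suite provides `spark`, `version`, `withTable`, and `checkAnswer`):

```scala
// Sketch of the revised VersionsSuite test under the suggested rename.
test(s"$version: create managed data source tables") {
  withTable("t", "t1") {
    import spark.implicits._
    // Managed data source table created via the DataFrame writer API.
    Seq("1").toDF("a").write.saveAsTable("t")
    checkAnswer(spark.table("t"), Row("1") :: Nil)

    // Managed data source table created via SQL, per the review comment.
    spark.sql("CREATE TABLE t1 USING parquet AS SELECT 2 AS a")
    checkAnswer(spark.table("t1"), Row(2) :: Nil)
  }
}
```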





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r100681187
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -649,6 +648,16 @@ class VersionsSuite extends QueryTest with SQLTestUtils with TestHiveSingleton w
       }
     }
 
+    test(s"$version: create table should success to test HiveClientImpl.toHiveTable compatible") {
+      withTable("t", "t1") {
+        import spark.implicits._
+        Seq("1").toDF("a").write.saveAsTable("t")
+        checkAnswer(spark.table("t"), Row("1") :: Nil)
+
+        spark.sql("create table t1 as select 2 as a")
+        checkAnswer(spark.table("t1"), Row(2) :: Nil)
--- End diff --

Tables `t` and `t1` are managed tables. Could you also verify whether the locations are as expected?
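One way such a location check could be sketched (hypothetical; `getTableMetadata` is the catalog API other Spark tests use to read table metadata, field names are approximate for this era of the codebase, and the exact assertion depends on the warehouse configuration):

```scala
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog.CatalogTableType

// Sketch: verify both tables are managed and live under the warehouse dir.
Seq("t", "t1").foreach { name =>
  val metadata = spark.sessionState.catalog.getTableMetadata(TableIdentifier(name))
  assert(metadata.tableType == CatalogTableType.MANAGED)
  // Managed tables default to a path ending in the table name.
  assert(metadata.storage.locationUri.get.toString.endsWith(name))
}
```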





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r100681160
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -570,7 +570,6 @@ class VersionsSuite extends QueryTest with SQLTestUtils with TestHiveSingleton w
       }
     }
--- End diff --

The above test case verifies the location. I think that should be good enough.





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-09 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r100345819
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -845,10 +855,10 @@ private[hive] class HiveClientImpl(
       hiveTable.setFields(schema.asJava)
     }
     hiveTable.setPartCols(partCols.asJava)
-    hiveTable.setOwner(conf.getUser)
+    conf.foreach(c => hiveTable.setOwner(c.getUser))
     hiveTable.setCreateTime((table.createTime / 1000).toInt)
     hiveTable.setLastAccessTime((table.lastAccessTime / 1000).toInt)
-    table.storage.locationUri.foreach { loc => shim.setDataLocation(hiveTable, loc) }
+    table.storage.locationUri.foreach { loc => hiveTable.getTTable.getSd.setLocation(loc) }
--- End diff --

I will check it, thanks!





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-09 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r100343323
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -845,10 +855,10 @@ private[hive] class HiveClientImpl(
       hiveTable.setFields(schema.asJava)
     }
     hiveTable.setPartCols(partCols.asJava)
-    hiveTable.setOwner(conf.getUser)
+    conf.foreach(c => hiveTable.setOwner(c.getUser))
     hiveTable.setCreateTime((table.createTime / 1000).toInt)
     hiveTable.setLastAccessTime((table.lastAccessTime / 1000).toInt)
-    table.storage.locationUri.foreach { loc => shim.setDataLocation(hiveTable, loc) }
+    table.storage.locationUri.foreach { loc => hiveTable.getTTable.getSd.setLocation(loc) }
--- End diff --

Thanks! VersionsSuite already has some create-table cases; if setLocation failed, would those cases fail?





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r100329892
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -845,10 +855,10 @@ private[hive] class HiveClientImpl(
       hiveTable.setFields(schema.asJava)
     }
     hiveTable.setPartCols(partCols.asJava)
-    hiveTable.setOwner(conf.getUser)
+    conf.foreach(c => hiveTable.setOwner(c.getUser))
     hiveTable.setCreateTime((table.createTime / 1000).toInt)
     hiveTable.setLastAccessTime((table.lastAccessTime / 1000).toInt)
-    table.storage.locationUri.foreach { loc => shim.setDataLocation(hiveTable, loc) }
+    table.storage.locationUri.foreach { loc => hiveTable.getTTable.getSd.setLocation(loc) }
--- End diff --

We might need to add an extra test case or checks to [VersionsSuite](https://github.com/apache/spark/blob/master/sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala) to verify the location is correctly set.

I am not 100% sure it works for all the versions we support.





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r100321936
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -891,7 +905,7 @@ private[hive] class HiveClientImpl(
     new HivePartition(ht, tpart)
   }
 
-  private def fromHivePartition(hp: HivePartition): CatalogTablePartition = {
+  def fromHivePartition(hp: HivePartition): CatalogTablePartition = {
--- End diff --

To be consistent with the others, could you also add a function description for this one?





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r100321748
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -815,7 +809,21 @@ private[hive] class HiveClientImpl(
     Option(hc.getComment).map(field.withComment).getOrElse(field)
   }
 
-  private def toHiveTable(table: CatalogTable): HiveTable = {
+  private def toInputFormat(name: String) =
+    Utils.classForName(name).asInstanceOf[Class[_ <: org.apache.hadoop.mapred.InputFormat[_, _]]]
+
+  private def toOutputFormat(name: String) =
+    Utils.classForName(name)
+      .asInstanceOf[Class[_ <: org.apache.hadoop.hive.ql.io.HiveOutputFormat[_, _]]]
+
+  /**
+   * Converts the native table metadata representation format CatalogTable to Hive's Table.
+   * the default value shimForHiveExecution is only used for hive execution, a Shim instance
+   * with a specific metastore version should be passed to this function to interact with metastore
--- End diff --

This description is out of date and needs an update.





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r100319490
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ---
@@ -463,117 +459,6 @@ private[spark] object HiveUtils extends Logging {
     case (other, tpe) if primitiveTypes contains tpe => other.toString
   }
 
-  /** Converts the native StructField to Hive's FieldSchema. */
-  private def toHiveColumn(c: StructField): FieldSchema = {
-    val typeString = if (c.metadata.contains(HiveUtils.hiveTypeString)) {
-      c.metadata.getString(HiveUtils.hiveTypeString)
-    } else {
-      c.dataType.catalogString
-    }
-    new FieldSchema(c.name, typeString, c.getComment.orNull)
-  }
-
-  /** Builds the native StructField from Hive's FieldSchema. */
-  private def fromHiveColumn(hc: FieldSchema): StructField = {
-    val columnType = try {
-      CatalystSqlParser.parseDataType(hc.getType)
-    } catch {
-      case e: ParseException =>
-        throw new SparkException("Cannot recognize hive type string: " + hc.getType, e)
-    }
-
-    val metadata = new MetadataBuilder().putString(HiveUtils.hiveTypeString, hc.getType).build()
-    val field = StructField(
-      name = hc.getName,
-      dataType = columnType,
-      nullable = true,
-      metadata = metadata)
-    Option(hc.getComment).map(field.withComment).getOrElse(field)
-  }
-
-  // TODO: merge this with HiveClientImpl#toHiveTable
-  /** Converts the native table metadata representation format CatalogTable to Hive's Table. */
-  def toHiveTable(catalogTable: CatalogTable): HiveTable = {
-    // We start by constructing an API table as Hive performs several important transformations
-    // internally when converting an API table to a QL table.
-    val tTable = new org.apache.hadoop.hive.metastore.api.Table()
-    tTable.setTableName(catalogTable.identifier.table)
-    tTable.setDbName(catalogTable.database)
-
-    val tableParameters = new java.util.HashMap[String, String]()
-    tTable.setParameters(tableParameters)
-    catalogTable.properties.foreach { case (k, v) => tableParameters.put(k, v) }
-
-    tTable.setTableType(catalogTable.tableType match {
-      case CatalogTableType.EXTERNAL => HiveTableType.EXTERNAL_TABLE.toString
-      case CatalogTableType.MANAGED => HiveTableType.MANAGED_TABLE.toString
-      case CatalogTableType.VIEW => HiveTableType.VIRTUAL_VIEW.toString
-    })
-
-    val sd = new org.apache.hadoop.hive.metastore.api.StorageDescriptor()
--- End diff --

I had the same question; that is why I added [this test case](https://github.com/apache/spark/blob/master/sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala#L574-L650) to ensure it works well for all the versions. It sounds like everything works well.





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r100315562
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ---
@@ -463,117 +459,6 @@ private[spark] object HiveUtils extends Logging {
     case (other, tpe) if primitiveTypes contains tpe => other.toString
   }
 
-  /** Converts the native StructField to Hive's FieldSchema. */
-  private def toHiveColumn(c: StructField): FieldSchema = {
-    val typeString = if (c.metadata.contains(HiveUtils.hiveTypeString)) {
-      c.metadata.getString(HiveUtils.hiveTypeString)
-    } else {
-      c.dataType.catalogString
-    }
-    new FieldSchema(c.name, typeString, c.getComment.orNull)
-  }
-
-  /** Builds the native StructField from Hive's FieldSchema. */
-  private def fromHiveColumn(hc: FieldSchema): StructField = {
-    val columnType = try {
-      CatalystSqlParser.parseDataType(hc.getType)
-    } catch {
-      case e: ParseException =>
-        throw new SparkException("Cannot recognize hive type string: " + hc.getType, e)
-    }
-
-    val metadata = new MetadataBuilder().putString(HiveUtils.hiveTypeString, hc.getType).build()
-    val field = StructField(
-      name = hc.getName,
-      dataType = columnType,
-      nullable = true,
-      metadata = metadata)
-    Option(hc.getComment).map(field.withComment).getOrElse(field)
-  }
-
-  // TODO: merge this with HiveClientImpl#toHiveTable
-  /** Converts the native table metadata representation format CatalogTable to Hive's Table. */
-  def toHiveTable(catalogTable: CatalogTable): HiveTable = {
--- End diff --

Based on my understanding, the default runtime Hive execution does not hit any issue without a Shim, which is why we did not use one there. However, it would be great if we can merge them, as this PR is doing.





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-08 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r100241493
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -776,20 +778,21 @@ private[hive] class HiveClientImpl(
       client.dropDatabase(db, true, false, true)
     }
   }
+}
 
+private[hive] object HiveClientImpl {
+  private lazy val shimForHiveExecution = IsolatedClientLoader.hiveVersion(
--- End diff --

Let me remove it, thanks!





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r100235734
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -815,7 +819,20 @@ private[hive] class HiveClientImpl(
     Option(hc.getComment).map(field.withComment).getOrElse(field)
   }
 
-  private def toHiveTable(table: CatalogTable): HiveTable = {
+  private def toInputFormat(name: String) =
+    Utils.classForName(name).asInstanceOf[Class[_ <: org.apache.hadoop.mapred.InputFormat[_, _]]]
+
+  private def toOutputFormat(name: String) =
+    Utils.classForName(name)
+      .asInstanceOf[Class[_ <: org.apache.hadoop.hive.ql.io.HiveOutputFormat[_, _]]]
+
+  /** Converts the native table metadata representation format CatalogTable to Hive's Table.
--- End diff --

style:
```
/**
 * doc
 */
```





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r100235680
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -776,20 +778,21 @@ private[hive] class HiveClientImpl(
       client.dropDatabase(db, true, false, true)
     }
   }
+}
 
+private[hive] object HiveClientImpl {
+  private lazy val shimForHiveExecution = IsolatedClientLoader.hiveVersion(
--- End diff --

is this still needed?





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-08 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r100070021
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ---
@@ -463,117 +459,6 @@ private[spark] object HiveUtils extends Logging {
     case (other, tpe) if primitiveTypes contains tpe => other.toString
   }
 
-  /** Converts the native StructField to Hive's FieldSchema. */
-  private def toHiveColumn(c: StructField): FieldSchema = {
-    val typeString = if (c.metadata.contains(HiveUtils.hiveTypeString)) {
-      c.metadata.getString(HiveUtils.hiveTypeString)
-    } else {
-      c.dataType.catalogString
-    }
-    new FieldSchema(c.name, typeString, c.getComment.orNull)
-  }
-
-  /** Builds the native StructField from Hive's FieldSchema. */
-  private def fromHiveColumn(hc: FieldSchema): StructField = {
-    val columnType = try {
-      CatalystSqlParser.parseDataType(hc.getType)
-    } catch {
-      case e: ParseException =>
-        throw new SparkException("Cannot recognize hive type string: " + hc.getType, e)
-    }
-
-    val metadata = new MetadataBuilder().putString(HiveUtils.hiveTypeString, hc.getType).build()
-    val field = StructField(
-      name = hc.getName,
-      dataType = columnType,
-      nullable = true,
-      metadata = metadata)
-    Option(hc.getComment).map(field.withComment).getOrElse(field)
-  }
-
-  // TODO: merge this with HiveClientImpl#toHiveTable
-  /** Converts the native table metadata representation format CatalogTable to Hive's Table. */
-  def toHiveTable(catalogTable: CatalogTable): HiveTable = {
--- End diff --

This method has been deleted in favor of HiveClientImpl.toHiveTable, which uses a shim to set the location. In HiveClientImpl, the Hive version may not be the same as the default Hive (1.2.1), so it uses the runtime shim for setDataLocation. The deleted HiveUtils.toHiveTable, by contrast, was only for runtime Hive execution and did not interact with the metastore.





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r100059652
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ---
@@ -463,117 +459,6 @@ private[spark] object HiveUtils extends Logging {
     case (other, tpe) if primitiveTypes contains tpe => other.toString
   }
 
-  /** Converts the native StructField to Hive's FieldSchema. */
-  private def toHiveColumn(c: StructField): FieldSchema = {
-    val typeString = if (c.metadata.contains(HiveUtils.hiveTypeString)) {
-      c.metadata.getString(HiveUtils.hiveTypeString)
-    } else {
-      c.dataType.catalogString
-    }
-    new FieldSchema(c.name, typeString, c.getComment.orNull)
-  }
-
-  /** Builds the native StructField from Hive's FieldSchema. */
-  private def fromHiveColumn(hc: FieldSchema): StructField = {
-    val columnType = try {
-      CatalystSqlParser.parseDataType(hc.getType)
-    } catch {
-      case e: ParseException =>
-        throw new SparkException("Cannot recognize hive type string: " + hc.getType, e)
-    }
-
-    val metadata = new MetadataBuilder().putString(HiveUtils.hiveTypeString, hc.getType).build()
-    val field = StructField(
-      name = hc.getName,
-      dataType = columnType,
-      nullable = true,
-      metadata = metadata)
-    Option(hc.getComment).map(field.withComment).getOrElse(field)
-  }
-
-  // TODO: merge this with HiveClientImpl#toHiveTable
-  /** Converts the native table metadata representation format CatalogTable to Hive's Table. */
-  def toHiveTable(catalogTable: CatalogTable): HiveTable = {
-    // We start by constructing an API table as Hive performs several important transformations
-    // internally when converting an API table to a QL table.
-    val tTable = new org.apache.hadoop.hive.metastore.api.Table()
-    tTable.setTableName(catalogTable.identifier.table)
-    tTable.setDbName(catalogTable.database)
-
-    val tableParameters = new java.util.HashMap[String, String]()
-    tTable.setParameters(tableParameters)
-    catalogTable.properties.foreach { case (k, v) => tableParameters.put(k, v) }
-
-    tTable.setTableType(catalogTable.tableType match {
-      case CatalogTableType.EXTERNAL => HiveTableType.EXTERNAL_TABLE.toString
-      case CatalogTableType.MANAGED => HiveTableType.MANAGED_TABLE.toString
-      case CatalogTableType.VIEW => HiveTableType.VIRTUAL_VIEW.toString
-    })
-
-    val sd = new org.apache.hadoop.hive.metastore.api.StorageDescriptor()
--- End diff --

cc @gatorsmile 





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r100059580
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ---
@@ -463,117 +459,6 @@ private[spark] object HiveUtils extends Logging {
     case (other, tpe) if primitiveTypes contains tpe => other.toString
   }
 
-  /** Converts the native StructField to Hive's FieldSchema. */
-  private def toHiveColumn(c: StructField): FieldSchema = {
-    val typeString = if (c.metadata.contains(HiveUtils.hiveTypeString)) {
-      c.metadata.getString(HiveUtils.hiveTypeString)
-    } else {
-      c.dataType.catalogString
-    }
-    new FieldSchema(c.name, typeString, c.getComment.orNull)
-  }
-
-  /** Builds the native StructField from Hive's FieldSchema. */
-  private def fromHiveColumn(hc: FieldSchema): StructField = {
-    val columnType = try {
-      CatalystSqlParser.parseDataType(hc.getType)
-    } catch {
-      case e: ParseException =>
-        throw new SparkException("Cannot recognize hive type string: " + hc.getType, e)
-    }
-
-    val metadata = new MetadataBuilder().putString(HiveUtils.hiveTypeString, hc.getType).build()
-    val field = StructField(
-      name = hc.getName,
-      dataType = columnType,
-      nullable = true,
-      metadata = metadata)
-    Option(hc.getComment).map(field.withComment).getOrElse(field)
-  }
-
-  // TODO: merge this with HiveClientImpl#toHiveTable
-  /** Converts the native table metadata representation format CatalogTable to Hive's Table. */
-  def toHiveTable(catalogTable: CatalogTable): HiveTable = {
-    // We start by constructing an API table as Hive performs several important transformations
-    // internally when converting an API table to a QL table.
-    val tTable = new org.apache.hadoop.hive.metastore.api.Table()
-    tTable.setTableName(catalogTable.identifier.table)
-    tTable.setDbName(catalogTable.database)
-
-    val tableParameters = new java.util.HashMap[String, String]()
-    tTable.setParameters(tableParameters)
-    catalogTable.properties.foreach { case (k, v) => tableParameters.put(k, v) }
-
-    tTable.setTableType(catalogTable.tableType match {
-      case CatalogTableType.EXTERNAL => HiveTableType.EXTERNAL_TABLE.toString
-      case CatalogTableType.MANAGED => HiveTableType.MANAGED_TABLE.toString
-      case CatalogTableType.VIEW => HiveTableType.VIRTUAL_VIEW.toString
-    })
-
-    val sd = new org.apache.hadoop.hive.metastore.api.StorageDescriptor()
--- End diff --

Here we set the data location via `StorageDescriptor`; is this way safe for all Hive versions?
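For context, the two approaches being compared can be sketched side by side (an illustrative fragment, not the merged code; `hiveTable`, `shim`, and `loc` come from the diff above, and the shim indirection exists because the underlying Hive table API changed across versions):

```scala
// Version-specific path: each Shim implementation calls the Hive API
// appropriate for its metastore version.
shim.setDataLocation(hiveTable, loc)

// Version-agnostic path used in this PR: write the location directly into
// the Thrift-level StorageDescriptor, bypassing the QL Table API.
hiveTable.getTTable.getSd.setLocation(loc)
```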





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r100058971
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ---
@@ -463,117 +459,6 @@ private[spark] object HiveUtils extends Logging {
     case (other, tpe) if primitiveTypes contains tpe => other.toString
   }
 
-  /** Converts the native StructField to Hive's FieldSchema. */
-  private def toHiveColumn(c: StructField): FieldSchema = {
-    val typeString = if (c.metadata.contains(HiveUtils.hiveTypeString)) {
-      c.metadata.getString(HiveUtils.hiveTypeString)
-    } else {
-      c.dataType.catalogString
-    }
-    new FieldSchema(c.name, typeString, c.getComment.orNull)
-  }
-
-  /** Builds the native StructField from Hive's FieldSchema. */
-  private def fromHiveColumn(hc: FieldSchema): StructField = {
-    val columnType = try {
-      CatalystSqlParser.parseDataType(hc.getType)
-    } catch {
-      case e: ParseException =>
-        throw new SparkException("Cannot recognize hive type string: " + hc.getType, e)
-    }
-
-    val metadata = new MetadataBuilder().putString(HiveUtils.hiveTypeString, hc.getType).build()
-    val field = StructField(
-      name = hc.getName,
-      dataType = columnType,
-      nullable = true,
-      metadata = metadata)
-    Option(hc.getComment).map(field.withComment).getOrElse(field)
-  }
-
-  // TODO: merge this with HiveClientImpl#toHiveTable
-  /** Converts the native table metadata representation format CatalogTable to Hive's Table. */
-  def toHiveTable(catalogTable: CatalogTable): HiveTable = {
--- End diff --

Why doesn't this method need a `shim`? I took a look at `HiveClientImpl.toHiveTable`; it only uses `shim` to set the data location.





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r100057282
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ---
@@ -583,11 +468,12 @@ private[spark] object HiveUtils extends Logging {
     if (DDLUtils.isDatasourceTable(table) || table.schema.nonEmpty) {
       table
     } else {
-      val hiveTable = toHiveTable(table)
+      val hiveTable = HiveClientImpl.toHiveTable(table)
       // Note: Hive separates partition columns and the schema, but for us the
       // partition columns are part of the schema
-      val partCols = hiveTable.getPartCols.asScala.map(fromHiveColumn)
-      val schema = StructType(hiveTable.getCols.asScala.map(fromHiveColumn) ++ partCols)
+      val partCols = hiveTable.getPartCols.asScala.map(HiveClientImpl.fromHiveColumn)
+      val schema = StructType(hiveTable.getCols.asScala.map(HiveClientImpl.fromHiveColumn) ++ partCols)
       table.copy(schema = schema)
--- End diff --

nit:
```
val partCols = ...
val dataCols = ...
table.copy(schema = StructType(dataCols ++ partCols))
```
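For illustration, a minimal stand-in (not Spark's actual classes) showing the shape of the suggested refactor: name the two column groups, then build the schema in a single expression.

```scala
// Stand-ins for Spark's StructField/StructType, just to show the refactor shape.
case class StructField(name: String)
case class StructType(fields: Seq[StructField])

// Named intermediates, as in the nit: partition columns go after data columns.
val dataCols = Seq(StructField("a"), StructField("b"))
val partCols = Seq(StructField("p"))
val schema = StructType(dataCols ++ partCols)
```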





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r99470385
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -365,8 +365,8 @@ private[hive] class HiveClientImpl(
 Option(client.getTable(dbName, tableName, false)).map { h =>
   // Note: Hive separates partition columns and the schema, but for us 
the
   // partition columns are part of the schema
-  val partCols = h.getPartCols.asScala.map(fromHiveColumn)
-  val schema = StructType(h.getCols.asScala.map(fromHiveColumn) ++ 
partCols)
+  val partCols = 
h.getPartCols.asScala.map(HiveClientImpl.fromHiveColumn)
--- End diff --

Add `import HiveClientImpl._` at the beginning of `class HiveClientImpl`. Then
you can avoid prefixing calls with `HiveClientImpl.` inside `class
HiveClientImpl` itself.
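A toy sketch of this pattern (names illustrative, not the real Spark code): importing a companion object's members lets call sites inside the class stay unqualified.

```scala
// Companion object holds the shared helpers (stand-in for the real
// fromHiveColumn(hc: FieldSchema): StructField).
object HiveClientImpl {
  def fromHiveColumn(raw: String): String = raw.trim
}

class HiveClientImpl {
  import HiveClientImpl._ // `fromHiveColumn` needs no prefix below

  def columnNames(cols: Seq[String]): Seq[String] = cols.map(fromHiveColumn)
}
```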





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-04 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r99462370
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/MetastoreRelation.scala ---
@@ -231,18 +180,17 @@ private[hive] case class MetastoreRelation(
   val partitionKeys = catalogTable.partitionSchema.map(_.toAttribute)
 
   /** Non-partitionKey attributes */
-  // TODO: just make this hold the schema itself, not just non-partition 
columns
-  val attributes = catalogTable.schema
+  val dataColKeys = catalogTable.schema
--- End diff --

There is another change here, @gatorsmile.





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-03 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r99462090
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -815,7 +829,8 @@ private[hive] class HiveClientImpl(
 Option(hc.getComment).map(field.withComment).getOrElse(field)
   }
 
-  private def toHiveTable(table: CatalogTable): HiveTable = {
+  def toHiveTable(table: CatalogTable, conf: Option[HiveConf] = None, 
shim: Shim = shimDefault)
--- End diff --

ok! thanks~





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r99462031
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -815,7 +829,8 @@ private[hive] class HiveClientImpl(
 Option(hc.getComment).map(field.withComment).getOrElse(field)
   }
 
-  private def toHiveTable(table: CatalogTable): HiveTable = {
+  def toHiveTable(table: CatalogTable, conf: Option[HiveConf] = None, 
shim: Shim = shimDefault)
+: HiveTable = {
--- End diff --

Nit: Style issue:

```Scala
  def toHiveTable(
  table: CatalogTable,
  conf: Option[HiveConf] = None,
  shim: Shim = shimForHiveExecution): HiveTable = {
```





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r99461979
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -815,7 +829,8 @@ private[hive] class HiveClientImpl(
 Option(hc.getComment).map(field.withComment).getOrElse(field)
   }
 
-  private def toHiveTable(table: CatalogTable): HiveTable = {
+  def toHiveTable(table: CatalogTable, conf: Option[HiveConf] = None, 
shim: Shim = shimDefault)
--- End diff --

How about adding a comment to emphasize that this shim is for Hive execution?





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r99461967
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -776,7 +778,18 @@ private[hive] class HiveClientImpl(
 client.dropDatabase(db, true, false, true)
   }
   }
+}
 
+private[hive] object HiveClientImpl {
+  private lazy val shimDefault = IsolatedClientLoader.hiveVersion(
--- End diff --

How about renaming it to `shimForHiveExecution`?
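A hypothetical sketch of the suggested rename (`Shim` and `Shim_v1_2` are stand-ins for the real shim classes): the name and a doc comment make it clear this shim targets Spark's built-in Hive execution version only.

```scala
// Stand-in shim hierarchy, illustrative only.
sealed trait Shim
case object Shim_v1_2 extends Shim

object HiveClientHelper {
  /** Shim for Hive execution (built-in hive-exec), NOT the metastore version. */
  lazy val shimForHiveExecution: Shim = Shim_v1_2
}
```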





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-03 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r99461612
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -778,6 +780,22 @@ private[hive] class HiveClientImpl(
   }
 
 
+
+}
+
+private[hive] object HiveClientImpl {
+  private lazy val shimDefault = IsolatedClientLoader.hiveVersion(
+HiveUtils.hiveExecutionVersion) match {
+  case hive.v12 => new Shim_v0_12()
+  case hive.v13 => new Shim_v0_13()
+  case hive.v14 => new Shim_v0_14()
+  case hive.v1_0 => new Shim_v1_0()
+  case hive.v1_1 => new Shim_v1_1()
+  case hive.v1_2 => new Shim_v1_2()
+  }
+
+  private lazy val hiveConf = new HiveConf(classOf[SessionState])
--- End diff --

yes, I think we should keep it.





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r99461447
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -778,6 +780,22 @@ private[hive] class HiveClientImpl(
   }
 
 
+
+}
+
+private[hive] object HiveClientImpl {
+  private lazy val shimDefault = IsolatedClientLoader.hiveVersion(
+HiveUtils.hiveExecutionVersion) match {
+  case hive.v12 => new Shim_v0_12()
+  case hive.v13 => new Shim_v0_13()
+  case hive.v14 => new Shim_v0_14()
+  case hive.v1_0 => new Shim_v1_0()
+  case hive.v1_1 => new Shim_v1_1()
+  case hive.v1_2 => new Shim_v1_2()
+  }
+
+  private lazy val hiveConf = new HiveConf(classOf[SessionState])
--- End diff --

We still need `Object HiveClientImpl` for containing these utility 
functions, right?





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-03 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r99461260
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -778,6 +780,22 @@ private[hive] class HiveClientImpl(
   }
 
 
+
+}
+
+private[hive] object HiveClientImpl {
+  private lazy val shimDefault = IsolatedClientLoader.hiveVersion(
+HiveUtils.hiveExecutionVersion) match {
+  case hive.v12 => new Shim_v0_12()
+  case hive.v13 => new Shim_v0_13()
+  case hive.v14 => new Shim_v0_14()
+  case hive.v1_0 => new Shim_v1_0()
+  case hive.v1_1 => new Shim_v1_1()
+  case hive.v1_2 => new Shim_v1_2()
+  }
+
+  private lazy val hiveConf = new HiveConf(classOf[SessionState])
--- End diff --

Yes, we just shouldn't create a new `HiveConf` in object `HiveClientImpl`.





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r99461201
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -778,6 +780,22 @@ private[hive] class HiveClientImpl(
   }
 
 
+
+}
+
+private[hive] object HiveClientImpl {
+  private lazy val shimDefault = IsolatedClientLoader.hiveVersion(
+HiveUtils.hiveExecutionVersion) match {
+  case hive.v12 => new Shim_v0_12()
+  case hive.v13 => new Shim_v0_13()
+  case hive.v14 => new Shim_v0_14()
+  case hive.v1_0 => new Shim_v1_0()
+  case hive.v1_1 => new Shim_v1_1()
+  case hive.v1_2 => new Shim_v1_2()
+  }
+
+  private lazy val hiveConf = new HiveConf(classOf[SessionState])
--- End diff --

We still need to pass the conf in `class HiveClientImpl` for interacting with
the metastore, because we need to set it.

I think we do not need it for Hive execution, unless we hit an error.





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-03 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r99461116
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -778,6 +780,22 @@ private[hive] class HiveClientImpl(
   }
 
 
+
+}
+
+private[hive] object HiveClientImpl {
+  private lazy val shimDefault = IsolatedClientLoader.hiveVersion(
+HiveUtils.hiveExecutionVersion) match {
+  case hive.v12 => new Shim_v0_12()
+  case hive.v13 => new Shim_v0_13()
+  case hive.v14 => new Shim_v0_14()
+  case hive.v1_0 => new Shim_v1_0()
+  case hive.v1_1 => new Shim_v1_1()
+  case hive.v1_2 => new Shim_v1_2()
+  }
+
+  private lazy val hiveConf = new HiveConf(classOf[SessionState])
--- End diff --

Good point that we should not create a new `HiveConf` here, thanks!
What about the object `HiveClientImpl`?





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r99460986
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -778,6 +780,22 @@ private[hive] class HiveClientImpl(
   }
 
 
+
+}
+
+private[hive] object HiveClientImpl {
+  private lazy val shimDefault = IsolatedClientLoader.hiveVersion(
+HiveUtils.hiveExecutionVersion) match {
+  case hive.v12 => new Shim_v0_12()
+  case hive.v13 => new Shim_v0_13()
+  case hive.v14 => new Shim_v0_14()
+  case hive.v1_0 => new Shim_v1_0()
+  case hive.v1_1 => new Shim_v1_1()
+  case hive.v1_2 => new Shim_v1_2()
+  }
+
+  private lazy val hiveConf = new HiveConf(classOf[SessionState])
--- End diff --

Something like this?

```Scala
def toHiveTable(table: CatalogTable, conf: Option[HiveConf] = None, shim: 
Shim = shimDefault)
...
conf.foreach(c => hiveTable.setOwner(c.getUser))
```
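A stand-in sketch of this `Option[HiveConf]` idea (`FakeHiveConf`/`FakeHiveTable` are illustrative, not Hive's real classes): the owner is set only when a conf is supplied (the metastore path), while the execution-only path passes `None`.

```scala
case class FakeHiveConf(user: String)          // stand-in for HiveConf
class FakeHiveTable { var owner: String = "" } // stand-in for Hive's Table

def toHiveTable(conf: Option[FakeHiveConf]): FakeHiveTable = {
  val t = new FakeHiveTable
  conf.foreach(c => t.owner = c.user) // skipped entirely when conf is None
  t
}
```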





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-03 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r99460850
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -778,6 +780,22 @@ private[hive] class HiveClientImpl(
   }
 
 
+
+}
+
+private[hive] object HiveClientImpl {
+  private lazy val shimDefault = IsolatedClientLoader.hiveVersion(
+HiveUtils.hiveExecutionVersion) match {
+  case hive.v12 => new Shim_v0_12()
+  case hive.v13 => new Shim_v0_13()
+  case hive.v14 => new Shim_v0_14()
+  case hive.v1_0 => new Shim_v1_0()
+  case hive.v1_1 => new Shim_v1_1()
+  case hive.v1_2 => new Shim_v1_2()
+  }
+
+  private lazy val hiveConf = new HiveConf(classOf[SessionState])
--- End diff --

If we merge `toHiveTable` in `HiveClientImpl` and `MetastoreRelation`, we
can't separate `hiveTable.setOwner(conf.getUser)` from the `toHiveTable` function.





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-03 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r99460822
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -778,6 +780,22 @@ private[hive] class HiveClientImpl(
   }
 
 
+
+}
+
+private[hive] object HiveClientImpl {
+  private lazy val shimDefault = IsolatedClientLoader.hiveVersion(
+HiveUtils.hiveExecutionVersion) match {
+  case hive.v12 => new Shim_v0_12()
+  case hive.v13 => new Shim_v0_13()
+  case hive.v14 => new Shim_v0_14()
+  case hive.v1_0 => new Shim_v1_0()
+  case hive.v1_1 => new Shim_v1_1()
+  case hive.v1_2 => new Shim_v1_2()
+  }
+
+  private lazy val hiveConf = new HiveConf(classOf[SessionState])
--- End diff --

Thanks a lot!
But if we don't create an object, we would have to create a `HiveClientImpl`
instance (created by `HiveUtils`) to use its helper functions, and `HiveUtils`
would call `IsolatedClientLoader`; I don't think that is a proper way to use
the helper functions.





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r99460549
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -778,6 +780,22 @@ private[hive] class HiveClientImpl(
   }
 
 
+
+}
+
+private[hive] object HiveClientImpl {
+  private lazy val shimDefault = IsolatedClientLoader.hiveVersion(
+HiveUtils.hiveExecutionVersion) match {
+  case hive.v12 => new Shim_v0_12()
+  case hive.v13 => new Shim_v0_13()
+  case hive.v14 => new Shim_v0_14()
+  case hive.v1_0 => new Shim_v1_0()
+  case hive.v1_1 => new Shim_v1_1()
+  case hive.v1_2 => new Shim_v1_2()
+  }
+
+  private lazy val hiveConf = new HiveConf(classOf[SessionState])
--- End diff --

My major concern is that the default value of `hiveConf` is not right. We
might abuse it in the future.





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r99460541
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -778,6 +780,22 @@ private[hive] class HiveClientImpl(
   }
 
 
+
+}
+
+private[hive] object HiveClientImpl {
+  private lazy val shimDefault = IsolatedClientLoader.hiveVersion(
+HiveUtils.hiveExecutionVersion) match {
+  case hive.v12 => new Shim_v0_12()
+  case hive.v13 => new Shim_v0_13()
+  case hive.v14 => new Shim_v0_14()
+  case hive.v1_0 => new Shim_v1_0()
+  case hive.v1_1 => new Shim_v1_1()
+  case hive.v1_2 => new Shim_v1_2()
+  }
+
+  private lazy val hiveConf = new HiveConf(classOf[SessionState])
--- End diff --

We do not need to execute the following line if `hiveConf` is not set:

`hiveTable.setOwner(conf.getUser)` 







[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r99460528
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -778,6 +780,22 @@ private[hive] class HiveClientImpl(
   }
 
 
+
+}
+
+private[hive] object HiveClientImpl {
+  private lazy val shimDefault = IsolatedClientLoader.hiveVersion(
+HiveUtils.hiveExecutionVersion) match {
+  case hive.v12 => new Shim_v0_12()
+  case hive.v13 => new Shim_v0_13()
+  case hive.v14 => new Shim_v0_14()
+  case hive.v1_0 => new Shim_v1_0()
+  case hive.v1_1 => new Shim_v1_1()
+  case hive.v1_2 => new Shim_v1_2()
+  }
+
+  private lazy val hiveConf = new HiveConf(classOf[SessionState])
--- End diff --

Actually, we do not need it for execution, because `user` is not used.





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-03 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r99460316
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -778,6 +780,22 @@ private[hive] class HiveClientImpl(
   }
 
 
+
+}
+
+private[hive] object HiveClientImpl {
+  private lazy val shimDefault = IsolatedClientLoader.hiveVersion(
--- End diff --

These helper functions in object `HiveClientImpl` are not used to interact
with the Hive metastore; they are just used for runtime execution, for which
Spark uses the built-in `hive-exec-1.2.1.jar`.





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r99460223
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -778,6 +780,22 @@ private[hive] class HiveClientImpl(
   }
 
 
+
+}
+
+private[hive] object HiveClientImpl {
+  private lazy val shimDefault = IsolatedClientLoader.hiveVersion(
+HiveUtils.hiveExecutionVersion) match {
+  case hive.v12 => new Shim_v0_12()
+  case hive.v13 => new Shim_v0_13()
+  case hive.v14 => new Shim_v0_14()
+  case hive.v1_0 => new Shim_v1_0()
+  case hive.v1_1 => new Shim_v1_1()
+  case hive.v1_2 => new Shim_v1_2()
+  }
+
+  private lazy val hiveConf = new HiveConf(classOf[SessionState])
--- End diff --

I think the same here. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r99460081
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -778,6 +780,22 @@ private[hive] class HiveClientImpl(
   }
 
 
+
+}
+
+private[hive] object HiveClientImpl {
+  private lazy val shimDefault = IsolatedClientLoader.hiveVersion(
--- End diff --

I think we should not have this. The caller needs to pass the version.





[GitHub] spark pull request #16787: [SPARK-19448][SQL]optimize some duplication funct...

2017-02-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16787#discussion_r99460070
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -365,8 +365,8 @@ private[hive] class HiveClientImpl(
 Option(client.getTable(dbName, tableName, false)).map { h =>
   // Note: Hive separates partition columns and the schema, but for us 
the
   // partition columns are part of the schema
-  val partCols = h.getPartCols.asScala.map(fromHiveColumn)
-  val schema = StructType(h.getCols.asScala.map(fromHiveColumn) ++ 
partCols)
+  val partCols = 
h.getPartCols.asScala.map(HiveClientImpl.fromHiveColumn)
+  val schema = 
StructType(h.getCols.asScala.map(HiveClientImpl.fromHiveColumn) ++ partCols)
--- End diff --

This will use `HiveUtils.hiveExecutionVersion`. However, it should use the
version specified by `spark.sql.hive.metastore.version`.

