[GitHub] spark pull request #14363: [SPARK-16731][SQL] use StructType in CatalogTable...

2016-07-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14363


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14363: [SPARK-16731][SQL] use StructType in CatalogTable...

2016-07-31 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14363#discussion_r72910844
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
 ---
@@ -78,28 +78,6 @@ object CatalogStorageFormat {
 }
 
 /**
- * A column in a table.
- */
-case class CatalogColumn(
-name: String,
-// TODO: make this type-safe; this is left as a string due to issues 
in converting Hive
-// varchars to and from SparkSQL strings.
--- End diff --

I am wondering if it is related to the parser that converts hive's type 
string to spark's struct type? Can you try it out and see there is any issue?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14363: [SPARK-16731][SQL] use StructType in CatalogTable...

2016-07-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14363#discussion_r72735867
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -721,16 +723,16 @@ private[hive] class HiveClientImpl(
 Utils.classForName(name)
   .asInstanceOf[Class[_ <: 
org.apache.hadoop.hive.ql.io.HiveOutputFormat[_, _]]]
 
-  private def toHiveColumn(c: CatalogColumn): FieldSchema = {
-new FieldSchema(c.name, c.dataType, c.comment.orNull)
+  private def toHiveColumn(c: StructField): FieldSchema = {
+new FieldSchema(c.name, c.dataType.catalogString, 
c.getComment().orNull)
   }
 
-  private def fromHiveColumn(hc: FieldSchema): CatalogColumn = {
-new CatalogColumn(
+  private def fromHiveColumn(hc: FieldSchema): StructField = {
+val f = StructField(
   name = hc.getName,
-  dataType = hc.getType,
-  nullable = true,
-  comment = Option(hc.getComment))
+  dataType = CatalystSqlParser.parseDataType(hc.getType),
--- End diff --

So the behaviour change is: previously if a hive table contains type string 
that we can't parse, we are still able to describe it, but throw an exception 
if we try to read it. After this PR, we will throw an exception when we try to 
read its table meta from hive meta store.

I think it's ok to break it, but need better error message. what do you 
think? cc @yhuai @liancheng 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14363: [SPARK-16731][SQL] use StructType in CatalogTable...

2016-07-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14363#discussion_r72735500
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
 ---
@@ -78,28 +78,6 @@ object CatalogStorageFormat {
 }
 
 /**
- * A column in a table.
- */
-case class CatalogColumn(
-name: String,
-// TODO: make this type-safe; this is left as a string due to issues 
in converting Hive
-// varchars to and from SparkSQL strings.
--- End diff --

I don't know either...  cc @rxin @andrewor14 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14363: [SPARK-16731][SQL] use StructType in CatalogTable...

2016-07-28 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14363#discussion_r72695248
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
 ---
@@ -78,28 +78,6 @@ object CatalogStorageFormat {
 }
 
 /**
- * A column in a table.
- */
-case class CatalogColumn(
-name: String,
-// TODO: make this type-safe; this is left as a string due to issues 
in converting Hive
-// varchars to and from SparkSQL strings.
--- End diff --

Do you know what is the issue when we converting varchars to and from 
SparkSQL strings? Sorry, I am unable to find the answer. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14363: [SPARK-16731][SQL] use StructType in CatalogTable...

2016-07-28 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14363#discussion_r72694426
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -721,16 +723,16 @@ private[hive] class HiveClientImpl(
 Utils.classForName(name)
   .asInstanceOf[Class[_ <: 
org.apache.hadoop.hive.ql.io.HiveOutputFormat[_, _]]]
 
-  private def toHiveColumn(c: CatalogColumn): FieldSchema = {
-new FieldSchema(c.name, c.dataType, c.comment.orNull)
+  private def toHiveColumn(c: StructField): FieldSchema = {
+new FieldSchema(c.name, c.dataType.catalogString, 
c.getComment().orNull)
   }
 
-  private def fromHiveColumn(hc: FieldSchema): CatalogColumn = {
-new CatalogColumn(
+  private def fromHiveColumn(hc: FieldSchema): StructField = {
+val f = StructField(
   name = hc.getName,
-  dataType = hc.getType,
-  nullable = true,
-  comment = Option(hc.getComment))
+  dataType = CatalystSqlParser.parseDataType(hc.getType),
--- End diff --

This is the change we have to make if we convert `CatalogColumn` to 
`StructField`. It sounds like `hc.getType` could return null? or Hive could 
return some data types we might not recognize. We could hit the exception from 
Parser, right?

That means, the caller of `fromHiveColumn` will also get the exception. 
`getTableOption` is the caller. I am just wondering if we do not want to see 
this kind of exception when doing `getTableOption`. Or maybe issue a nicer 
error message here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14363: [SPARK-16731][SQL] use StructType in CatalogTable...

2016-07-26 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/14363

[SPARK-16731][SQL] use StructType in CatalogTable and remove CatalogColumn

## What changes were proposed in this pull request?

`StructField` has very similar semantic with `CatalogColumn`, except that 
`CatalogColumn` use string to express data type. I think it's reasonable to use 
`StructType` as the `CatalogTable.schema` and remove `CatalogColumn`.


## How was this patch tested?

existing tests.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark column

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14363.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14363






---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org