[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...

2018-09-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19773#discussion_r218562133
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -350,16 +366,11 @@ case class AlterTableChangeColumnCommand(
 s"${schema.fieldNames.mkString("[`", "`, `", "`]")}"))
   }
 
-  // Add the comment to a column, if comment is empty, return the original 
column.
-  private def addComment(column: StructField, comment: Option[String]): 
StructField = {
-comment.map(column.withComment(_)).getOrElse(column)
-  }
-
--- End diff --

Thank you for pinging me, @maropu . 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...

2018-09-18 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19773#discussion_r218358670
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -350,16 +366,11 @@ case class AlterTableChangeColumnCommand(
 s"${schema.fieldNames.mkString("[`", "`, `", "`]")}"))
   }
 
-  // Add the comment to a column, if comment is empty, return the original 
column.
-  private def addComment(column: StructField, comment: Option[String]): 
StructField = {
-comment.map(column.withComment(_)).getOrElse(column)
-  }
-
--- End diff --

Thanks for the check!. I think we don't always need to comply with the Hive 
behaivour and an understandable behaivour for users is the best.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...

2018-09-18 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19773#discussion_r218356576
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -350,16 +366,11 @@ case class AlterTableChangeColumnCommand(
 s"${schema.fieldNames.mkString("[`", "`, `", "`]")}"))
   }
 
-  // Add the comment to a column, if comment is empty, return the original 
column.
-  private def addComment(column: StructField, comment: Option[String]): 
StructField = {
-comment.map(column.withComment(_)).getOrElse(column)
-  }
-
--- End diff --

cc: @gatorsmile @dongjoon-hyun 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...

2018-09-18 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/19773#discussion_r218324420
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -350,16 +366,11 @@ case class AlterTableChangeColumnCommand(
 s"${schema.fieldNames.mkString("[`", "`, `", "`]")}"))
   }
 
-  // Add the comment to a column, if comment is empty, return the original 
column.
-  private def addComment(column: StructField, comment: Option[String]): 
StructField = {
-comment.map(column.withComment(_)).getOrElse(column)
-  }
-
--- End diff --

Thanks for your advise!
I look into this in these days. With currently implement, all behavior 
comply with Hive(Support type change/Work well in non binary format/Exception 
in binary format like orc and parquet). Is it ok to add a config for constraint 
this?

The work of adding logic to cast input data into changed type in catalog 
may need modifying 4 parts logic including vectorized reader and row reader in 
parquet and orc. If we don't agree the currently behavior, I'll keep following 
these.

Item | Behavior
 | -
Parquet Row Reader | ClassCastException in SpecificInternalRow.set${Type}
Parquet Vectorized Reader | SchemaColumnConvertNotSupportedException in 
VectorizedColumnReader.read${Type}Batch
Orc Row Reader | ClassCastException in OrcDeserializer.newWriter
Orc Vectorized Reader | NullPointerException in OrcColumnVector get value 
by type method


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...

2018-09-12 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19773#discussion_r217226856
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -350,16 +366,11 @@ case class AlterTableChangeColumnCommand(
 s"${schema.fieldNames.mkString("[`", "`, `", "`]")}"))
   }
 
-  // Add the comment to a column, if comment is empty, return the original 
column.
-  private def addComment(column: StructField, comment: Option[String]): 
StructField = {
-comment.map(column.withComment(_)).getOrElse(column)
-  }
-
--- End diff --

In my opinion, in this pr, we need an additional logic to cast input data 
into a changed type in catalog when reading


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...

2018-09-11 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/19773#discussion_r216600156
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -350,16 +366,11 @@ case class AlterTableChangeColumnCommand(
 s"${schema.fieldNames.mkString("[`", "`, `", "`]")}"))
   }
 
-  // Add the comment to a column, if comment is empty, return the original 
column.
-  private def addComment(column: StructField, comment: Option[String]): 
StructField = {
-comment.map(column.withComment(_)).getOrElse(column)
-  }
-
--- End diff --

```
Although the query above doesn't work well, why do users change column 
types?
```
As the scenario described above, user firstly use int but during some time 
found here we need a Long, he can rewrite the new data as Long and load data to 
new partitions. And if we not support the type change, user should do the table 
recreate job for this type change work.

Yep, if not the binary file, the query works OK.
```
Logging initialized using configuration in 
jar:file:/Users/XuanYuan/Source/hive/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
hive> CREATE TABLE t(a INT, b STRING, c INT);
OK
Time taken: 2.576 seconds
hive> INSERT INTO t VALUES (1, 'a', 3);;
Query ID = XuanYuan_20180911164348_32238a6c-b0a4-4cfd-aa3d-00a7628031cf
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2018-09-11 16:43:51,684 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local162423_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: 
file:/Users/XuanYuan/Source/hive/apache-hive-1.2.2-bin/warehouse/t/.hive-staging_hive_2018-09-11_16-43-48_117_2262603440504094412-1/-ext-1
Loading data to table default.t
Table default.t stats: [numFiles=1, numRows=1, totalSize=6, rawDataSize=5]
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 4.025 seconds
hive> select * from t;;
OK
1   a   3
Time taken: 0.164 seconds, Fetched: 1 row(s)
hive> ALTER TABLE t CHANGE a a STRING;
OK
Time taken: 0.177 seconds
hive> select * from t;
OK
1   a   3
Time taken: 0.12 seconds, Fetched: 1 row(s)
hive> quit;
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...

2018-09-09 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19773#discussion_r216175940
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -350,16 +366,11 @@ case class AlterTableChangeColumnCommand(
 s"${schema.fieldNames.mkString("[`", "`, `", "`]")}"))
   }
 
-  // Add the comment to a column, if comment is empty, return the original 
column.
-  private def addComment(column: StructField, comment: Option[String]): 
StructField = {
-comment.map(column.withComment(_)).getOrElse(column)
-  }
-
--- End diff --

ah ok. Although the query above doesn't work well, why do users change 
column types?
Basically, the query should work? e.g., in postgresql
```
postgres=# create table t(a int, b varchar);
CREATE TABLE
postgres=# insert into t values(1, 'a');
INSERT 0 1
postgres=# alter table t alter column a TYPE varchar;
ALTER TABLE
postgres=# select * from t;
 a | b 
---+---
 1 | a
(1 row)

postgres=# \d t
Table "public.t"
 Column |   Type| Modifiers 
+---+---
 a  | character varying | 
 b  | character varying | 
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...

2018-09-08 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/19773#discussion_r216132779
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -350,16 +366,11 @@ case class AlterTableChangeColumnCommand(
 s"${schema.fieldNames.mkString("[`", "`, `", "`]")}"))
   }
 
-  // Add the comment to a column, if comment is empty, return the original 
column.
-  private def addComment(column: StructField, comment: Option[String]): 
StructField = {
-comment.map(column.withComment(_)).getOrElse(column)
-  }
-
--- End diff --

Its also can't work in Hive, I test with Hive 1.2.2 and Hadoop 2.7.3.
```
Logging initialized using configuration in 
jar:file:/Users/XuanYuan/Source/hive/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
hive> CREATE TABLE t(a INT, b STRING, c INT) stored as parquet;
OK
Time taken: 1.604 seconds
hive> INSERT INTO t VALUES (1, 'a', 3);
Query ID = XuanYuan_20180908230549_3c8732ff-07e0-4a7a-95b4-260aed04a762
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
2018-09-08 23:06:08,732 Stage-1 map = 0%,  reduce = 0%
2018-09-08 23:06:09,743 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local712899233_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: 
file:/Users/XuanYuan/Source/hive/apache-hive-1.2.2-bin/warehouse/t/.hive-staging_hive_2018-09-08_23-05-49_782_100109481692677607-1/-ext-1
Loading data to table default.t
Table default.t stats: [numFiles=1, numRows=1, totalSize=343, rawDataSize=3]
MapReduce Jobs Launched:
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 20.294 seconds
hive> select * from t;
OK
1   a   3
Time taken: 0.154 seconds, Fetched: 1 row(s)
hive> ALTER TABLE t CHANGE a a STRING;
OK
Time taken: 0.18 seconds
hive> select * from t;
OK
Failed with exception 
java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.UnsupportedOperationException: Cannot inspect 
org.apache.hadoop.io.IntWritable
Time taken: 0.116 seconds
hive>
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...

2018-09-08 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19773#discussion_r216128008
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -350,16 +366,11 @@ case class AlterTableChangeColumnCommand(
 s"${schema.fieldNames.mkString("[`", "`, `", "`]")}"))
   }
 
-  // Add the comment to a column, if comment is empty, return the original 
column.
-  private def addComment(column: StructField, comment: Option[String]): 
StructField = {
-comment.map(column.withComment(_)).getOrElse(column)
-  }
-
--- End diff --

I just want to know if the qury above can work in Hive?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...

2018-09-08 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/19773#discussion_r216127904
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -350,16 +366,11 @@ case class AlterTableChangeColumnCommand(
 s"${schema.fieldNames.mkString("[`", "`, `", "`]")}"))
   }
 
-  // Add the comment to a column, if comment is empty, return the original 
column.
-  private def addComment(column: StructField, comment: Option[String]): 
StructField = {
-comment.map(column.withComment(_)).getOrElse(column)
-  }
-
--- End diff --

So maybe we should also add a conf like 
`MetastoreConf.ConfVars.DISALLOW_INCOMPATIBLE_COL_TYPE_CHANGES` in hive to wrap 
this behavior? WDYT. Thanks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...

2018-09-08 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/19773#discussion_r216127880
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -350,16 +366,11 @@ case class AlterTableChangeColumnCommand(
 s"${schema.fieldNames.mkString("[`", "`, `", "`]")}"))
   }
 
-  // Add the comment to a column, if comment is empty, return the original 
column.
-  private def addComment(column: StructField, comment: Option[String]): 
StructField = {
-comment.map(column.withComment(_)).getOrElse(column)
-  }
-
--- End diff --

Thanks for the checking, my mistake of not describe the intention to do 
this feature. We want support type change just want to add the ability of 
changing the metadata of column type. The scenario we meet is our user want a 
type change(like int is not enough, need a long type), they has done the type 
changing in their data file, but we should hack to change the metastore or 
create the whole table again.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...

2018-09-08 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19773#discussion_r216123860
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -350,16 +366,11 @@ case class AlterTableChangeColumnCommand(
 s"${schema.fieldNames.mkString("[`", "`, `", "`]")}"))
   }
 
-  // Add the comment to a column, if comment is empty, return the original 
column.
-  private def addComment(column: StructField, comment: Option[String]): 
StructField = {
-comment.map(column.withComment(_)).getOrElse(column)
-  }
-
--- End diff --

Ah, ok. Thanks for the check.  btw, have you checked if this could work 
correctly?
```

sql("""CREATE TABLE t(a INT, b STRING, c INT) using parquet""")
sql("""INSERT INTO t VALUES (1, 'a', 3)""")
sql("""ALTER TABLE t CHANGE a a STRING""")
spark.table("t").show
org.apache.spark.sql.execution.QueryExecutionException: Parquet column 
cannot be converted in file 
file:///Users/maropu/Repositories/spark/spark-master/spark-warehouse/t/part-0-93ddfd05-690a-480c-8cc5-fd0981206fc3-c000.snappy.parquet.
 Column: [a], Expected: string, Found: INT32
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:192)
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.scan_nextBatch_0$(Unknown
 Source)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
 Source)
at org.apache.spark.s
...
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...

2018-09-08 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/19773#discussion_r216122599
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -350,16 +366,11 @@ case class AlterTableChangeColumnCommand(
 s"${schema.fieldNames.mkString("[`", "`, `", "`]")}"))
   }
 
-  // Add the comment to a column, if comment is empty, return the original 
column.
-  private def addComment(column: StructField, comment: Option[String]): 
StructField = {
-comment.map(column.withComment(_)).getOrElse(column)
-  }
-
--- End diff --

Thanks for your question, actually that's also what I'm consider during do 
the compatible check. Hive do this column type change work in 
[HiveAlterHandler](https://github.com/apache/hive/blob/3287a097e31063cc805ca55c2ca7defffe761b6f/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveAlterHandler.java#L175
) and the detailed compatible check is in 
[ColumnType](https://github.com/apache/hive/blob/3287a097e31063cc805ca55c2ca7defffe761b6f/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ColumnType.java#L206).
 You can see in the ColumnType checking work, it actually use the `canCast` 
semantic to judge compatible.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...

2018-09-07 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19773#discussion_r216116477
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -350,16 +366,11 @@ case class AlterTableChangeColumnCommand(
 s"${schema.fieldNames.mkString("[`", "`, `", "`]")}"))
   }
 
-  // Add the comment to a column, if comment is empty, return the original 
column.
-  private def addComment(column: StructField, comment: Option[String]): 
StructField = {
-comment.map(column.withComment(_)).getOrElse(column)
-  }
-
--- End diff --

Probably, we need to comply with the Hive behaivour. Is the current fix (by 
casting) the same with Hive?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...

2018-09-07 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/19773#discussion_r215860035
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -350,16 +366,11 @@ case class AlterTableChangeColumnCommand(
 s"${schema.fieldNames.mkString("[`", "`, `", "`]")}"))
   }
 
-  // Add the comment to a column, if comment is empty, return the original 
column.
-  private def addComment(column: StructField, comment: Option[String]): 
StructField = {
-comment.map(column.withComment(_)).getOrElse(column)
-  }
-
--- End diff --

Thanks for advise, I should also check the type compatible, add in ef65c4d.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...

2018-09-07 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/19773#discussion_r215859851
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala 
---
@@ -1697,6 +1697,16 @@ abstract class DDLSuite extends QueryTest with 
SQLTestUtils {
 sql("ALTER TABLE dbx.tab1 CHANGE COLUMN col1 col1 INT COMMENT 'this is 
col1'")
 assert(getMetadata("col1").getString("key") == "value")
 assert(getMetadata("col1").getString("comment") == "this is col1")
+
+// Ensure that changing column type takes effect
+sql("ALTER TABLE dbx.tab1 CHANGE COLUMN col1 col1 STRING")
+val column = 
catalog.getTableMetadata(tableIdent).schema.fields.find(_.name == "col1")
+assert(column.get.dataType == StringType)
+
+// Ensure that changing partition column type throw exception
+intercept[AnalysisException] {
+  sql("ALTER TABLE dbx.tab1 CHANGE COLUMN a a STRING")
+}
--- End diff --

Thanks, done in ef65c4d. Also add check for type compatible check.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...

2018-09-07 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/19773#discussion_r215859764
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -318,18 +318,34 @@ case class AlterTableChangeColumnCommand(
 
 // Find the origin column from dataSchema by column name.
 val originColumn = findColumnByName(table.dataSchema, columnName, 
resolver)
-// Throw an AnalysisException if the column name/dataType is changed.
+// Throw an AnalysisException if the column name is changed.
 if (!columnEqual(originColumn, newColumn, resolver)) {
   throw new AnalysisException(
 "ALTER TABLE CHANGE COLUMN is not supported for changing column " +
   s"'${originColumn.name}' with type '${originColumn.dataType}' to 
" +
   s"'${newColumn.name}' with type '${newColumn.dataType}'")
--- End diff --

After add the type check, maybe we also need the type message in error 
message.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...

2018-09-07 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/19773#discussion_r215859681
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -318,18 +318,34 @@ case class AlterTableChangeColumnCommand(
 
 // Find the origin column from dataSchema by column name.
 val originColumn = findColumnByName(table.dataSchema, columnName, 
resolver)
-// Throw an AnalysisException if the column name/dataType is changed.
+// Throw an AnalysisException if the column name is changed.
 if (!columnEqual(originColumn, newColumn, resolver)) {
--- End diff --

Thanks, not enough yet, add type compatible check in ef65c4d.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...

2018-09-05 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19773#discussion_r215283130
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -350,16 +366,11 @@ case class AlterTableChangeColumnCommand(
 s"${schema.fieldNames.mkString("[`", "`, `", "`]")}"))
   }
 
-  // Add the comment to a column, if comment is empty, return the original 
column.
-  private def addComment(column: StructField, comment: Option[String]): 
StructField = {
-comment.map(column.withComment(_)).getOrElse(column)
-  }
-
--- End diff --

What happens if we need data conversion (e.g., from ing to double?) in 
binary formats (parquet and orc)? Also, What happens if we get incompatible 
type changes?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...

2018-09-05 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19773#discussion_r215279164
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -318,18 +318,34 @@ case class AlterTableChangeColumnCommand(
 
 // Find the origin column from dataSchema by column name.
 val originColumn = findColumnByName(table.dataSchema, columnName, 
resolver)
-// Throw an AnalysisException if the column name/dataType is changed.
+// Throw an AnalysisException if the column name is changed.
 if (!columnEqual(originColumn, newColumn, resolver)) {
--- End diff --

Its ok to check names only?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...

2018-09-05 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19773#discussion_r215279507
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -318,18 +318,34 @@ case class AlterTableChangeColumnCommand(
 
 // Find the origin column from dataSchema by column name.
 val originColumn = findColumnByName(table.dataSchema, columnName, 
resolver)
-// Throw an AnalysisException if the column name/dataType is changed.
+// Throw an AnalysisException if the column name is changed.
 if (!columnEqual(originColumn, newColumn, resolver)) {
   throw new AnalysisException(
 "ALTER TABLE CHANGE COLUMN is not supported for changing column " +
   s"'${originColumn.name}' with type '${originColumn.dataType}' to 
" +
   s"'${newColumn.name}' with type '${newColumn.dataType}'")
--- End diff --

Can you update this error message?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...

2018-09-05 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19773#discussion_r215280045
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala 
---
@@ -1697,6 +1697,16 @@ abstract class DDLSuite extends QueryTest with 
SQLTestUtils {
 sql("ALTER TABLE dbx.tab1 CHANGE COLUMN col1 col1 INT COMMENT 'this is 
col1'")
 assert(getMetadata("col1").getString("key") == "value")
 assert(getMetadata("col1").getString("comment") == "this is col1")
+
+// Ensure that changing column type takes effect
+sql("ALTER TABLE dbx.tab1 CHANGE COLUMN col1 col1 STRING")
+val column = 
catalog.getTableMetadata(tableIdent).schema.fields.find(_.name == "col1")
+assert(column.get.dataType == StringType)
+
+// Ensure that changing partition column type throw exception
+intercept[AnalysisException] {
+  sql("ALTER TABLE dbx.tab1 CHANGE COLUMN a a STRING")
+}
--- End diff --

Please compare the error message.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...

2018-07-24 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/19773#discussion_r204805474
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -318,18 +318,34 @@ case class AlterTableChangeColumnCommand(
 
 // Find the origin column from dataSchema by column name.
 val originColumn = findColumnByName(table.dataSchema, columnName, 
resolver)
-// Throw an AnalysisException if the column name/dataType is changed.
+// Throw an AnalysisException if the column name is changed.
 if (!columnEqual(originColumn, newColumn, resolver)) {
   throw new AnalysisException(
 "ALTER TABLE CHANGE COLUMN is not supported for changing column " +
   s"'${originColumn.name}' with type '${originColumn.dataType}' to 
" +
   s"'${newColumn.name}' with type '${newColumn.dataType}'")
 }
 
+val typeChanged = originColumn.dataType != newColumn.dataType
+val partitionColumnChanged = 
table.partitionColumnNames.contains(originColumn.name)
+
+// Throw an AnalysisException if the type of partition column is 
changed.
+if (typeChanged && partitionColumnChanged) {
--- End diff --

Just adding a check here when user changing the type of partition columns.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...

2017-11-24 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/19773#discussion_r152957444
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -318,16 +318,26 @@ case class AlterTableChangeColumnCommand(
   s"'${newColumn.name}' with type '${newColumn.dataType}'")
 }
 
-val newSchema = table.schema.fields.map { field =>
+val typeChanged = originColumn.dataType != newColumn.dataType
+val newDataSchema = table.dataSchema.fields.map { field =>
   if (field.name == originColumn.name) {
-// Create a new column from the origin column with the new comment.
-addComment(field, newColumn.getComment)
+// Add the comment to a column, if comment is empty, return the 
original column.
+val newField = 
newColumn.getComment.map(field.withComment(_)).getOrElse(field)
+if (typeChanged) {
+  newField.copy(dataType = newColumn.dataType)
+} else {
+  newField
+}
   } else {
 field
   }
 }
-val newTable = table.copy(schema = StructType(newSchema))
-catalog.alterTable(newTable)
+val newTable = table.copy(schema = StructType(newDataSchema ++ 
table.partitionSchema))
+if (typeChanged) {
+  catalog.alterTableDataSchema(tableName, StructType(newDataSchema))
--- End diff --

I add the checking logic in next commit and fix bug for changing comment of 
partition column.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...

2017-11-23 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/19773#discussion_r152753785
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -318,16 +318,26 @@ case class AlterTableChangeColumnCommand(
   s"'${newColumn.name}' with type '${newColumn.dataType}'")
 }
 
-val newSchema = table.schema.fields.map { field =>
+val typeChanged = originColumn.dataType != newColumn.dataType
+val newDataSchema = table.dataSchema.fields.map { field =>
   if (field.name == originColumn.name) {
-// Create a new column from the origin column with the new comment.
-addComment(field, newColumn.getComment)
+// Add the comment to a column, if comment is empty, return the 
original column.
+val newField = 
newColumn.getComment.map(field.withComment(_)).getOrElse(field)
+if (typeChanged) {
+  newField.copy(dataType = newColumn.dataType)
+} else {
+  newField
+}
   } else {
 field
   }
 }
-val newTable = table.copy(schema = StructType(newSchema))
-catalog.alterTable(newTable)
+val newTable = table.copy(schema = StructType(newDataSchema ++ 
table.partitionSchema))
+if (typeChanged) {
+  catalog.alterTableDataSchema(tableName, StructType(newDataSchema))
--- End diff --

[HIVE-3672](https://issues.apache.org/jira/browse/HIVE-3672) Hive support 
this by adding new command of `ALTER TABLE  PARTITION COLUMN 
( )`.
So here maybe I should throw an AnalysisException while user change the 
type of partition column?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...

2017-11-21 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19773#discussion_r152480007
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -318,16 +318,26 @@ case class AlterTableChangeColumnCommand(
   s"'${newColumn.name}' with type '${newColumn.dataType}'")
 }
 
-val newSchema = table.schema.fields.map { field =>
+val typeChanged = originColumn.dataType != newColumn.dataType
+val newDataSchema = table.dataSchema.fields.map { field =>
   if (field.name == originColumn.name) {
-// Create a new column from the origin column with the new comment.
-addComment(field, newColumn.getComment)
+// Add the comment to a column, if comment is empty, return the 
original column.
+val newField = 
newColumn.getComment.map(field.withComment(_)).getOrElse(field)
+if (typeChanged) {
+  newField.copy(dataType = newColumn.dataType)
+} else {
+  newField
+}
   } else {
 field
   }
 }
-val newTable = table.copy(schema = StructType(newSchema))
-catalog.alterTable(newTable)
+val newTable = table.copy(schema = StructType(newDataSchema ++ 
table.partitionSchema))
+if (typeChanged) {
+  catalog.alterTableDataSchema(tableName, StructType(newDataSchema))
--- End diff --

What is the Hive's behavior if users change the column type of partition 
schema?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...

2017-11-20 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/19773#discussion_r151990044
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -318,16 +318,28 @@ case class AlterTableChangeColumnCommand(
   s"'${newColumn.name}' with type '${newColumn.dataType}'")
 }
 
+val changeSchema = originColumn.dataType != newColumn.dataType
 val newSchema = table.schema.fields.map { field =>
   if (field.name == originColumn.name) {
-// Create a new column from the origin column with the new comment.
-addComment(field, newColumn.getComment)
+var newField = field
--- End diff --

More clear for getting rid of var, pls check next patch. If we implement 
rename or others meta change feature here, may still need some code rework.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...

2017-11-17 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/19773#discussion_r151739773
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -318,16 +318,28 @@ case class AlterTableChangeColumnCommand(
   s"'${newColumn.name}' with type '${newColumn.dataType}'")
 }
 
+val changeSchema = originColumn.dataType != newColumn.dataType
--- End diff --

What do you think about renaming the val to `typeChanged`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...

2017-11-17 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/19773#discussion_r151740225
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala 
---
@@ -1459,6 +1459,11 @@ abstract class DDLSuite extends QueryTest with 
SQLTestUtils {
 // Ensure that change column will preserve other metadata fields.
 sql("ALTER TABLE dbx.tab1 CHANGE COLUMN col1 col1 INT COMMENT 'this is 
col1'")
 assert(getMetadata("col1").getString("key") == "value")
+
+// Ensure that change column type take effect
--- End diff --

s/change/changing + s/take/takes


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19773: [SPARK-22546][SQL] Supporting for changing column...

2017-11-17 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/19773#discussion_r151739604
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -318,16 +318,28 @@ case class AlterTableChangeColumnCommand(
   s"'${newColumn.name}' with type '${newColumn.dataType}'")
 }
 
+val changeSchema = originColumn.dataType != newColumn.dataType
 val newSchema = table.schema.fields.map { field =>
   if (field.name == originColumn.name) {
-// Create a new column from the origin column with the new comment.
-addComment(field, newColumn.getComment)
+var newField = field
--- End diff --

I'd recommend getting rid of this `var` and re-writting the code as follows:

```
val newField = newColumn.getComment.map(...).getOrElse(field)
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org