[GitHub] spark pull request #15797: [SPARK-17990][SPARK-18302][SQL] correct several p...

2017-02-02 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15797#discussion_r99075673
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -810,13 +825,43 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
       newSpecs: Seq[TablePartitionSpec]): Unit = withClient {
     client.renamePartitions(
       db, table, specs.map(lowerCasePartitionSpec), newSpecs.map(lowerCasePartitionSpec))
+
+    val tableMeta = getTable(db, table)
+    val partitionColumnNames = tableMeta.partitionColumnNames
+    // Hive metastore is not case preserving and keeps partition columns with lower cased names.
+    // When Hive renames partitions of managed tables, it will create the partition location with
+    // a default path generated from the new spec with lower cased partition column names. This is
+    // unexpected and we need to rename them manually and alter the partition location.
+    val hasUpperCasePartitionColumn = partitionColumnNames.exists(col => col.toLowerCase != col)
+    if (tableMeta.tableType == MANAGED && hasUpperCasePartitionColumn) {
+      val tablePath = new Path(tableMeta.location)
+      val newParts = newSpecs.map { spec =>
+        val partition = client.getPartition(db, table, lowerCasePartitionSpec(spec))
+        val wrongPath = new Path(partition.location)
+        val rightPath = ExternalCatalogUtils.generatePartitionPath(
+          spec, partitionColumnNames, tablePath)
+        try {
+          tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
--- End diff --

good catch! let me think about how to fix it.





[GitHub] spark pull request #15797: [SPARK-17990][SPARK-18302][SQL] correct several p...

2017-01-28 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15797#discussion_r98327471
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -810,13 +825,43 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
       newSpecs: Seq[TablePartitionSpec]): Unit = withClient {
     client.renamePartitions(
       db, table, specs.map(lowerCasePartitionSpec), newSpecs.map(lowerCasePartitionSpec))
+
+    val tableMeta = getTable(db, table)
+    val partitionColumnNames = tableMeta.partitionColumnNames
+    // Hive metastore is not case preserving and keeps partition columns with lower cased names.
+    // When Hive renames partitions of managed tables, it will create the partition location with
+    // a default path generated from the new spec with lower cased partition column names. This is
+    // unexpected and we need to rename them manually and alter the partition location.
+    val hasUpperCasePartitionColumn = partitionColumnNames.exists(col => col.toLowerCase != col)
+    if (tableMeta.tableType == MANAGED && hasUpperCasePartitionColumn) {
+      val tablePath = new Path(tableMeta.location)
+      val newParts = newSpecs.map { spec =>
+        val partition = client.getPartition(db, table, lowerCasePartitionSpec(spec))
+        val wrongPath = new Path(partition.location)
+        val rightPath = ExternalCatalogUtils.generatePartitionPath(
+          spec, partitionColumnNames, tablePath)
+        try {
+          tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
--- End diff --

Found an issue here... When we call `rename`, not all file systems behave the same. For example, on macOS, renaming `.../tbl/partcol1=5/partcol2=6` -> `.../tbl/partCol1=5/partCol2=6` results in `.../tbl/partcol1=5/partCol2=6`, because the file system is case insensitive. However, the file system used on Jenkins does not have this issue.
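As a rough illustration, here is a sketch of one possible workaround: route the rename through a temporary sibling path so that a case-insensitive file system never sees a rename whose source and destination differ only in case. The helper name `renameOnCaseInsensitiveFs` and the `.spark-rename-tmp` suffix are hypothetical, and this only handles the final path segment; a full fix would also need to walk the intermediate directories (`partcol1=5` vs `partCol1=5`).

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical sketch: if source and destination differ only in case, a
// case-insensitive file system treats them as the same path, so we hop
// through a temporary name that differs by more than case.
def renameOnCaseInsensitiveFs(fs: FileSystem, wrongPath: Path, rightPath: Path): Boolean = {
  if (wrongPath.toString.equalsIgnoreCase(rightPath.toString)) {
    val tempPath = new Path(rightPath.getParent, rightPath.getName + ".spark-rename-tmp")
    fs.rename(wrongPath, tempPath) && fs.rename(tempPath, rightPath)
  } else {
    fs.rename(wrongPath, rightPath)
  }
}
```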





[GitHub] spark pull request #15797: [SPARK-17990][SPARK-18302][SQL] correct several p...

2016-11-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15797





[GitHub] spark pull request #15797: [SPARK-17990][SPARK-18302][SQL] correct several p...

2016-11-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15797#discussion_r87356512
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala ---
@@ -231,7 +231,7 @@ class InMemoryCatalog(
         assert(tableMeta.storage.locationUri.isDefined,
           "Managed table should always have table location, as we will assign a default location " +
             "to it if it doesn't have one.")
-        val dir = new Path(tableMeta.storage.locationUri.get)
+        val dir = new Path(tableMeta.location)
         try {
           val fs = dir.getFileSystem(hadoopConfig)
           fs.delete(dir, true)
--- End diff --

It's minor and not related to this PR; I'd rather not touch it.





[GitHub] spark pull request #15797: [SPARK-17990][SPARK-18302][SQL] correct several p...

2016-11-09 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15797#discussion_r87336535
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -810,13 +825,44 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
       newSpecs: Seq[TablePartitionSpec]): Unit = withClient {
     client.renamePartitions(
       db, table, specs.map(lowerCasePartitionSpec), newSpecs.map(lowerCasePartitionSpec))
+
+    val tableMeta = getTable(db, table)
+    val partitionColumnNames = tableMeta.partitionColumnNames
+    // Hive metastore is not case preserving and keeps partition columns with lower cased names.
+    // When Hive renames partitions of managed tables, it will create the partition location with
+    // a default path generated from the new spec with lower cased partition column names. This is
+    // unexpected and we need to rename them manually and alter the partition location.
+    val hasUpperCasePartitionColumn = partitionColumnNames.exists(col => col.toLowerCase != col)
+    if (tableMeta.tableType == MANAGED && hasUpperCasePartitionColumn) {
+      val tablePath = new Path(tableMeta.location)
+      val fs = tablePath.getFileSystem(hadoopConf)
--- End diff --

[In the other functions](https://github.com/cloud-fan/spark/blob/56f3073162f0b53e4a3de686bba31c8eb6048123/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala#L367-L371), the call to `getFileSystem` is also part of the try-catch block. Do we need to make it consistent with the others?
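i.e., something with this shape (a sketch of the suggested restructuring, mirroring the linked `InMemoryCatalog` code):

```scala
try {
  // With getFileSystem inside the try, a bad URI or filesystem
  // initialization failure surfaces as the same SparkException as a
  // failed rename, consistent with InMemoryCatalog.
  val fs = tablePath.getFileSystem(hadoopConf)
  fs.rename(wrongPath, rightPath)
} catch {
  case e: IOException =>
    throw new SparkException(s"Unable to rename partition path $wrongPath", e)
}
```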





[GitHub] spark pull request #15797: [SPARK-17990][SPARK-18302][SQL] correct several p...

2016-11-08 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15797#discussion_r87141007
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -810,13 +825,44 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
       newSpecs: Seq[TablePartitionSpec]): Unit = withClient {
     client.renamePartitions(
       db, table, specs.map(lowerCasePartitionSpec), newSpecs.map(lowerCasePartitionSpec))
+
+    val tableMeta = getTable(db, table)
+    val partitionColumnNames = tableMeta.partitionColumnNames
+    // Hive metastore is not case preserving and keeps partition columns with lower cased names.
+    // When Hive renames partitions of managed tables, it will create the partition location with
+    // a default path generated from the new spec with lower cased partition column names. This is
+    // unexpected and we need to rename them manually and alter the partition location.
+    val hasUpperCasePartitionColumn = partitionColumnNames.exists(col => col.toLowerCase != col)
+    if (tableMeta.tableType == MANAGED && hasUpperCasePartitionColumn) {
+      val tablePath = new Path(tableMeta.location)
+      val fs = tablePath.getFileSystem(hadoopConf)
+      val newParts = newSpecs.map { spec =>
+        val partition = client.getPartition(db, table, lowerCasePartitionSpec(spec))
+        val wrongPath = new Path(partition.storage.locationUri.get)
+        val rightPath = ExternalCatalogUtils.generatePartitionPath(
+          spec, partitionColumnNames, tablePath)
+        try {
+          fs.rename(wrongPath, rightPath)
+        } catch {
+          case e: IOException =>
+            throw new SparkException(s"Unable to rename partition path $wrongPath", e)
--- End diff --

Maybe we also need to add `rightPath` here
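For instance (the exact wording is just illustrative):

```scala
throw new SparkException(
  s"Unable to rename partition path from $wrongPath to $rightPath", e)
```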





[GitHub] spark pull request #15797: [SPARK-17990][SPARK-18302][SQL] correct several p...

2016-11-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15797#discussion_r86946976
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala ---
@@ -591,25 +673,25 @@ abstract class ExternalCatalogSuite extends SparkFunSuite with BeforeAndAfterEac
     val catalog = newBasicCatalog()
     val db = catalog.getDatabase("db1")
     val table = CatalogTable(
-      identifier = TableIdentifier("my_table", Some("db1")),
--- End diff --

Actually this test was written by me, and I picked `myTable` at first, but changed it to `my_table` because of this bug.





[GitHub] spark pull request #15797: [SPARK-17990][SPARK-18302][SQL] correct several p...

2016-11-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15797#discussion_r86936905
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
@@ -171,6 +171,10 @@ case class CatalogTable(
         throw new AnalysisException(s"table $identifier did not specify database")
       }
 
+  def location: String = storage.locationUri.getOrElse {
--- End diff --

Like the other functions in this file, could you add a function description too?
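For example, something in the style of the neighboring members (the doc wording and the error message are just an illustration):

```scala
/** Return the table location; throws an AnalysisException if `locationUri` is not set. */
def location: String = storage.locationUri.getOrElse {
  throw new AnalysisException(s"table $identifier did not specify locationUri")
}
```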





[GitHub] spark pull request #15797: [SPARK-17990][SPARK-18302][SQL] correct several p...

2016-11-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15797#discussion_r86936775
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala ---
@@ -591,25 +673,25 @@ abstract class ExternalCatalogSuite extends SparkFunSuite with BeforeAndAfterEac
     val catalog = newBasicCatalog()
     val db = catalog.getDatabase("db1")
     val table = CatalogTable(
-      identifier = TableIdentifier("my_table", Some("db1")),
--- End diff --

Could we keep the original names unchanged? Table/database names now allow `alphanumeric characters and underscores`, so I think they were named with underscores on purpose?





[GitHub] spark pull request #15797: [SPARK-17990][SPARK-18302][SQL] correct several p...

2016-11-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15797#discussion_r86935972
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
@@ -451,51 +444,6 @@ object PartitioningUtils {
     }
   }
 
-  ////////////////////////////////////////////////////////////////////////////////////////////
-  // The following string escaping code is mainly copied from Hive (o.a.h.h.common.FileUtils).
-  ////////////////////////////////////////////////////////////////////////////////////////////
-
-  val charToEscape = {
-    val bitSet = new java.util.BitSet(128)
-
-    /**
-     * ASCII 01-1F are HTTP control characters that need to be escaped.
-     * \u000A and \u000D are \n and \r, respectively.
-     */
-    val clist = Array(
-      '\u0001', '\u0002', '\u0003', '\u0004', '\u0005', '\u0006', '\u0007', '\u0008', '\u0009',
-      '\n', '\u000B', '\u000C', '\r', '\u000E', '\u000F', '\u0010', '\u0011', '\u0012', '\u0013',
-      '\u0014', '\u0015', '\u0016', '\u0017', '\u0018', '\u0019', '\u001A', '\u001B', '\u001C',
-      '\u001D', '\u001E', '\u001F', '"', '#', '%', '\'', '*', '/', ':', '=', '?', '\\', '\u007F',
-      '{', '[', ']', '^')
-
-    clist.foreach(bitSet.set(_))
-
-    if (Shell.WINDOWS) {
-      Array(' ', '<', '>', '|').foreach(bitSet.set(_))
-    }
-
-    bitSet
-  }
-
-  def needsEscaping(c: Char): Boolean = {
-    c >= 0 && c < charToEscape.size() && charToEscape.get(c)
-  }
-
-  def escapePathName(path: String): String = {
-    val builder = new StringBuilder()
-    path.foreach { c =>
-      if (needsEscaping(c)) {
-        builder.append('%')
-        builder.append(f"${c.asInstanceOf[Int]}%02X")
-      } else {
-        builder.append(c)
-      }
-    }
-
-    builder.toString()
-  }
-
   def unescapePathName(path: String): String = {
--- End diff --

If `escapePathName` is moved, I think we also need to move `unescapePathName` to the same file?
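One reason to keep the pair together is that they are meant to be inverses; a hypothetical sanity check (not code from this PR, and assuming both helpers end up on `ExternalCatalogUtils`) would exercise both:

```scala
// Escaping then unescaping should round-trip any partition value.
val raw = "a/b:c=d"
assert(ExternalCatalogUtils.unescapePathName(ExternalCatalogUtils.escapePathName(raw)) == raw)
```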





[GitHub] spark pull request #15797: [SPARK-17990][SPARK-18302][SQL] correct several p...

2016-11-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15797#discussion_r86908704
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala ---
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.catalog
+
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.util.Shell
+
+import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec
+
+object ExternalCatalogUtils {
--- End diff --

@ericl `PartitioningUtils` is not accessible from the catalyst module, so I created this.
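For context, the helper this object provides is used as `ExternalCatalogUtils.generatePartitionPath(spec, partitionColumnNames, tablePath)` in the diffs above. A minimal sketch of what such a helper could look like, assuming an escaping helper in the same object (the details are illustrative, not the exact code of the PR):

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec

object ExternalCatalogUtilsSketch {
  // Build tablePath/col1=v1/col2=v2/..., escaping names and values so that
  // special characters (e.g. '/') cannot break the directory layout.
  def generatePartitionPath(
      spec: TablePartitionSpec,
      partitionColumnNames: Seq[String],
      tablePath: Path): Path = {
    val partitionPathString = partitionColumnNames.map { col =>
      escapePathName(col) + "=" + escapePathName(spec(col))
    }.mkString("/")
    new Path(tablePath, partitionPathString)
  }

  // Simplified escaping stub for this sketch; the real helper follows Hive's
  // FileUtils escaping rules shown in the removed PartitioningUtils code above.
  private def escapePathName(s: String): String =
    s.flatMap(c => if (c == '/' || c == '=' || c == '%') f"%%${c.toInt}%02X" else c.toString)
}
```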





[GitHub] spark pull request #15797: [SPARK-17990][SPARK-18302][SQL] correct several p...

2016-11-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15797#discussion_r86906444
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -810,13 +840,37 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
       newSpecs: Seq[TablePartitionSpec]): Unit = withClient {
     client.renamePartitions(
       db, table, specs.map(lowerCasePartitionSpec), newSpecs.map(lowerCasePartitionSpec))
+
+    val tableMeta = getTable(db, table)
+    val partitionColumnNames = tableMeta.partitionColumnNames
+    // Hive metastore is not case preserving and keeps partition columns with lower cased names.
+    // When Hive renames partitions of managed tables, it will update the partition location with
+    // a default value generated from the new spec and lower cased partition column names. This is
+    // unexpected and we need to call `alterPartitions` to fix the partition location.
+    val hasUpperCasePartitionColumn = partitionColumnNames.exists(col => col.toLowerCase != col)
+    if (tableMeta.tableType == MANAGED && hasUpperCasePartitionColumn) {
+      val tableLocation = tableMeta.storage.locationUri.get
+      val newParts = newSpecs.map { spec =>
+        val partition = client.getPartition(db, table, lowerCasePartitionSpec(spec))
+        val partitionPath = new Path(
+          tableLocation,
+          partitionColumnNames.map(col => col + "=" + spec(col)).mkString("/"))
--- End diff --

RENAME PARTITION can only change the partition spec, not the partition path. The SQL syntax is `ALTER TABLE table_name PARTITION partition_spec RENAME TO PARTITION partition_spec;`, so users can't specify a partition location.
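For example, in a `spark-shell` session where `spark` is the session (a hypothetical table `t` partitioned by `partCol`):

```scala
// The statement names only the old and new specs; there is no LOCATION
// clause, so the new partition path is always derived from the new spec.
spark.sql("ALTER TABLE t PARTITION (partCol = 5) RENAME TO PARTITION (partCol = 6)")
```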





[GitHub] spark pull request #15797: [SPARK-17990][SPARK-18302][SQL] correct several p...

2016-11-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15797#discussion_r86906239
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -810,13 +840,37 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
       newSpecs: Seq[TablePartitionSpec]): Unit = withClient {
     client.renamePartitions(
       db, table, specs.map(lowerCasePartitionSpec), newSpecs.map(lowerCasePartitionSpec))
+
+    val tableMeta = getTable(db, table)
+    val partitionColumnNames = tableMeta.partitionColumnNames
+    // Hive metastore is not case preserving and keeps partition columns with lower cased names.
+    // When Hive renames partitions of managed tables, it will update the partition location with
+    // a default value generated from the new spec and lower cased partition column names. This is
+    // unexpected and we need to call `alterPartitions` to fix the partition location.
+    val hasUpperCasePartitionColumn = partitionColumnNames.exists(col => col.toLowerCase != col)
+    if (tableMeta.tableType == MANAGED && hasUpperCasePartitionColumn) {
--- End diff --

No, Hive only updates the partition location when renaming partitions of managed tables.





[GitHub] spark pull request #15797: [SPARK-17990][SPARK-18302][SQL] correct several p...

2016-11-07 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/15797#discussion_r86866556
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala ---
@@ -355,25 +355,30 @@ class InMemoryCatalog(
       }
     }
 
-    val tableDir = new Path(catalog(db).db.locationUri, table)
-    val partitionColumnNames = getTable(db, table).partitionColumnNames
+    val tableMeta = getTable(db, table)
+    val partitionColumnNames = tableMeta.partitionColumnNames
+    val tableLocation = new Path(tableMeta.storage.locationUri.get)
     // TODO: we should follow hive to roll back if one partition path failed to create.
     parts.foreach { p =>
-      // If location is set, the partition is using an external partition location and we don't
-      // need to handle its directory.
-      if (p.storage.locationUri.isEmpty) {
-        val partitionPath = partitionColumnNames.flatMap { col =>
-          p.spec.get(col).map(col + "=" + _)
-        }.mkString("/")
-        try {
-          val fs = tableDir.getFileSystem(hadoopConfig)
-          fs.mkdirs(new Path(tableDir, partitionPath))
-        } catch {
-          case e: IOException =>
-            throw new SparkException(s"Unable to create partition path $partitionPath", e)
+      val partitionLocation = p.storage.locationUri.getOrElse {
+        val partitionPath = partitionColumnNames.map(col => col + "=" + p.spec(col)).mkString("/")
--- End diff --

Do we need to escape these, as in `PartitioningUtils.escapePathName`?
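For example (hypothetical values, just to illustrate why escaping matters here):

```scala
import org.apache.spark.sql.execution.datasources.PartitioningUtils

// Without escaping, '/' in a partition value adds an extra directory level,
// so the resulting path can no longer be parsed back into the spec.
val value = "a/b"
val unescaped = "col=" + value                                  // "col=a/b"   -> two path segments
val escaped = "col=" + PartitioningUtils.escapePathName(value)  // "col=a%2Fb" -> one path segment
```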





[GitHub] spark pull request #15797: [SPARK-17990][SPARK-18302][SQL] correct several p...

2016-11-07 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/15797#discussion_r86872012
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -810,13 +840,37 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
       newSpecs: Seq[TablePartitionSpec]): Unit = withClient {
     client.renamePartitions(
       db, table, specs.map(lowerCasePartitionSpec), newSpecs.map(lowerCasePartitionSpec))
+
+    val tableMeta = getTable(db, table)
+    val partitionColumnNames = tableMeta.partitionColumnNames
+    // Hive metastore is not case preserving and keeps partition columns with lower cased names.
+    // When Hive renames partitions of managed tables, it will update the partition location with
+    // a default value generated from the new spec and lower cased partition column names. This is
+    // unexpected and we need to call `alterPartitions` to fix the partition location.
+    val hasUpperCasePartitionColumn = partitionColumnNames.exists(col => col.toLowerCase != col)
+    if (tableMeta.tableType == MANAGED && hasUpperCasePartitionColumn) {
+      val tableLocation = tableMeta.storage.locationUri.get
+      val newParts = newSpecs.map { spec =>
+        val partition = client.getPartition(db, table, lowerCasePartitionSpec(spec))
+        val partitionPath = new Path(
+          tableLocation,
+          partitionColumnNames.map(col => col + "=" + spec(col)).mkString("/"))
--- End diff --

What if it had a custom location?





[GitHub] spark pull request #15797: [SPARK-17990][SPARK-18302][SQL] correct several p...

2016-11-07 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/15797#discussion_r86868525
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala ---
@@ -423,33 +424,35 @@ class InMemoryCatalog(
     requirePartitionsExist(db, table, specs)
     requirePartitionsNotExist(db, table, newSpecs)
 
-    val tableDir = new Path(catalog(db).db.locationUri, table)
-    val partitionColumnNames = getTable(db, table).partitionColumnNames
+    val tableMeta = getTable(db, table)
+    val partitionColumnNames = tableMeta.partitionColumnNames
+    val tableLocation = new Path(tableMeta.storage.locationUri.get)
+    val shouldUpdatePartitionLocation = getTable(db, table).tableType == CatalogTableType.MANAGED
+    val existingParts = catalog(db).tables(table).partitions
     // TODO: we should follow hive to roll back if one partition path failed to rename.
     specs.zip(newSpecs).foreach { case (oldSpec, newSpec) =>
-      val newPart = getPartition(db, table, oldSpec).copy(spec = newSpec)
-      val existingParts = catalog(db).tables(table).partitions
-
-      // If location is set, the partition is using an external partition location and we don't
-      // need to handle its directory.
-      if (newPart.storage.locationUri.isEmpty) {
-        val oldPath = partitionColumnNames.flatMap { col =>
-          oldSpec.get(col).map(col + "=" + _)
-        }.mkString("/")
-        val newPath = partitionColumnNames.flatMap { col =>
-          newSpec.get(col).map(col + "=" + _)
-        }.mkString("/")
+      val oldPartition = getPartition(db, table, oldSpec)
+      val newPartition = if (shouldUpdatePartitionLocation) {
+        val oldPartPath = new Path(oldPartition.storage.locationUri.get)
+        val newPartPath = new Path(
+          tableLocation,
+          partitionColumnNames.map(col => col + "=" + newSpec(col)).mkString("/"))
--- End diff --

Should we pull this into PartitioningUtils?





[GitHub] spark pull request #15797: [SPARK-17990][SPARK-18302][SQL] correct several p...

2016-11-07 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/15797#discussion_r86871542
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -810,13 +840,37 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
       newSpecs: Seq[TablePartitionSpec]): Unit = withClient {
     client.renamePartitions(
       db, table, specs.map(lowerCasePartitionSpec), newSpecs.map(lowerCasePartitionSpec))
+
+    val tableMeta = getTable(db, table)
+    val partitionColumnNames = tableMeta.partitionColumnNames
+    // Hive metastore is not case preserving and keeps partition columns with lower cased names.
+    // When Hive renames partitions of managed tables, it will update the partition location with
+    // a default value generated from the new spec and lower cased partition column names. This is
+    // unexpected and we need to call `alterPartitions` to fix the partition location.
+    val hasUpperCasePartitionColumn = partitionColumnNames.exists(col => col.toLowerCase != col)
+    if (tableMeta.tableType == MANAGED && hasUpperCasePartitionColumn) {
--- End diff --

Would this apply to external tables as well?





[GitHub] spark pull request #15797: [SPARK-17990][SPARK-18302][SQL] correct several p...

2016-11-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15797#discussion_r86806933
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -226,106 +208,139 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
       } else {
         tableDefinition.storage.locationUri
       }
-      // Ideally we should also put `locationUri` in table properties like provider, schema, etc.
-      // However, in older versions of Spark we already store the table location in storage
-      // properties with key "path". Here we keep this behaviour for backward compatibility.
-      val storagePropsWithLocation = tableDefinition.storage.properties ++
-        tableLocation.map("path" -> _)
-
-      // converts the table metadata to Spark SQL specific format, i.e. set data schema, names and
-      // bucket specification to empty. Note that partition columns are retained, so that we can
-      // call partition-related Hive API later.
-      def newSparkSQLSpecificMetastoreTable(): CatalogTable = {
-        tableDefinition.copy(
-          // Hive only allows directory paths as location URIs while Spark SQL data source tables
-          // also allow file paths. For non-hive-compatible formats, we should not set the location
-          // URI, to avoid the Hive metastore throwing an exception.
-          storage = tableDefinition.storage.copy(
-            locationUri = None,
-            properties = storagePropsWithLocation),
-          schema = tableDefinition.partitionSchema,
-          bucketSpec = None,
-          properties = tableDefinition.properties ++ tableProperties)
+
+      if (tableDefinition.provider.get == DDLUtils.HIVE_PROVIDER) {
+        val tableWithDataSourceProps = tableDefinition.copy(
+          // We can't leave `locationUri` empty and count on the Hive metastore to set a default
+          // table location, because the Hive metastore is not case preserving and the table name
+          // is always lower cased when it appears in the default table path, which is not expected.
+          storage = tableDefinition.storage.copy(locationUri = tableLocation),
+          // Here we follow data source tables and put table metadata like provider, schema, etc.
+          // in table properties, so that we can work around the Hive metastore issue about not
+          // being case preserving and make Hive serde tables support mixed-case column names.
+          properties = tableDefinition.properties ++ tableMetaToTableProps(tableDefinition))
+        client.createTable(tableWithDataSourceProps, ignoreIfExists)
+      } else {
+        createDataSourceTable(
+          tableDefinition.withNewStorage(locationUri = tableLocation),
+          ignoreIfExists)
       }
+    }
+  }
 
-      // converts the table metadata to Hive compatible format, i.e. set the serde information.
-      def newHiveCompatibleMetastoreTable(serde: HiveSerDe): CatalogTable = {
-        val location = if (tableDefinition.tableType == EXTERNAL) {
-          // When we hit this branch, we are saving an external data source table with Hive
-          // compatible format, which means the data source is file-based and must have a `path`.
-          require(tableDefinition.storage.locationUri.isDefined,
-            "External file-based data source table must have a `path` entry in storage properties.")
-          Some(new Path(tableDefinition.storage.locationUri.get).toUri.toString)
-        } else {
-          None
-        }
+  private def createDataSourceTable(
--- End diff --

This method just copies the code from the `if` branch above: https://github.com/apache/spark/pull/15797/files#diff-159191585e10542f013cb3a714f26075R199

