[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-12-15 Thread adamjk
Github user adamjk commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-164703283
  
Is this being backported to 1.5.x?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-22 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-150337486
  
LGTM, so I'm going to merge this into master. Should this be backported to 
1.5.x or any earlier releases?





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/8026





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-149919454
  
Merged build started.





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-149919423
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-21 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-149911447
  
Thank you so much, @JoshRosen, for the detailed review. However, it seems a bug still exists; I'd like to fix it myself soon.





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-149922732
  
**[Test build #44066 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44066/consoleFull)**
 for PR 8026 at commit 
[`bdee89e`](https://github.com/apache/spark/commit/bdee89ea394b0477103b88c971df971812e98f82).





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-149959477
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-149959310
  
**[Test build #44066 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44066/consoleFull)**
 for PR 8026 at commit 
[`bdee89e`](https://github.com/apache/spark/commit/bdee89ea394b0477103b88c971df971812e98f82).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-149959480
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44066/
Test FAILed.





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-21 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-150021131
  
@chenghao-intel, it looks like this most recent test failure is legitimate:

```
assertion failed: Actual partitioning column names did not match user-specified partitioning schema; expect StructType(StructField(part,IntegerType,true)), but got StructType()}
```





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-21 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-150068746
  
Yes, that's right. SPARK-7749 provides an example: an empty partitioned table backed by the Hive metastore, for which we will not detect any partition column values.

I simply removed the assertion from the code, as it's not valid in this case.
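To make the failure above concrete, here is a minimal Scala sketch (all names hypothetical, not Spark's code) of an equality check between the user-specified partition columns and the columns discovered from leaf directories. For an empty partitioned table, as in the SPARK-7749 case, there are no `part=...` directories, so the discovered list is empty, which matches the "but got StructType()" in the log:

```scala
object EmptyPartitionedTableCheck {
  // Hypothetical: partition column names taken from the first leaf directory,
  // e.g. "part=1" or "a=1/b=2". An empty table has no leaf directories at all.
  def discoveredColumns(leafDirs: Seq[String]): List[String] =
    leafDirs.headOption.toList.flatMap { dir =>
      dir.split("/").toList.filter(_.contains("=")).map(_.takeWhile(_ != '='))
    }

  // In the spirit of the removed assertion: discovered names must equal the user's.
  def namesMatch(userSpecified: Seq[String], leafDirs: Seq[String]): Boolean =
    discoveredColumns(leafDirs) == userSpecified

  def main(args: Array[String]): Unit = {
    println(namesMatch(Seq("part"), Seq("part=1", "part=2"))) // true
    println(namesMatch(Seq("part"), Seq.empty[String]))       // false
  }
}
```

Any check of this shape rejects a table that simply has no data yet, which is why dropping it is reasonable here.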





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-150068875
  
Merged build started.





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-150068962
  
**[Test build #44113 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44113/consoleFull)**
 for PR 8026 at commit 
[`3383473`](https://github.com/apache/spark/commit/3383473e8a56eba9f7c92106dca9e171f88e0534).





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-150068858
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-150091343
  
**[Test build #44113 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44113/consoleFull)**
 for PR 8026 at commit 
[`3383473`](https://github.com/apache/spark/commit/3383473e8a56eba9f7c92106dca9e171f88e0534).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/8026#discussion_r42533868
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
@@ -134,17 +137,38 @@ private[sql] object PartitioningUtils {
     var finished = path.getParent == null
     var chopped = path
 
+    var idx = 0 // the partition index from the right side of the path
     while (!finished) {
+      val folderName = chopped.getName
       // Sometimes (e.g., when speculative task is enabled), temporary directories may be left
       // uncleaned.  Here we simply ignore them.
-      if (chopped.getName.toLowerCase == "_temporary") {
+      if (folderName.toLowerCase == "_temporary") {
         return None
       }
 
-      val maybeColumn = parsePartitionColumn(chopped.getName, defaultPartitionName, typeInference)
-      maybeColumn.foreach(columns += _)
+      folderName.split("=") match {
+        case Array(columnName, rawColumnValue) =>
+          val field = userDefinedPartitionColumns.map(struct => struct(struct.length - idx - 1))
+          assert(columnName.nonEmpty, s"Empty partition column name in '$folderName'")
+          assert(field.isEmpty || (field.get.name  == columnName))
+          assert(rawColumnValue.nonEmpty, s"Empty partition column value in '$folderName'")
+
+          val literal = inferPartitionColumnValue(
+            field.map(_.dataType), rawColumnValue, defaultPartitionName, typeInference)
+          columns += (columnName -> literal)
+
+        case Array(value) if folderName.startsWith("=") =>
+          throw new AssertionError(s"Empty partition column name in '$folderName'")
--- End diff --

Is AssertionError the right exception to be throwing here? I'd think that 
IllegalArgumentException might be more appropriate.
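For comparison, a minimal sketch of folder-name validation that raises `IllegalArgumentException`, as suggested above. It uses Scala's `require`, which throws `IllegalArgumentException` and conventionally signals bad input such as a malformed path; `assert` throws `AssertionError`, is meant for internal invariants, and in Scala 2 can be elided at compile time (for example with `-Xdisable-assertions`). The object and method names are hypothetical; the messages echo the ones in the diff and the test suite.

```scala
object FolderNameCheck {
  // Hypothetical helper, not Spark's code: parses one "column=value" folder name.
  // split with limit -1 keeps trailing empty strings, so "a=" yields Array("a", "").
  def parseFolder(folderName: String): (String, String) =
    folderName.split("=", -1) match {
      case Array(name, value) =>
        // require throws IllegalArgumentException when the condition is false
        require(name.nonEmpty, s"Empty partition column name in '$folderName'")
        require(value.nonEmpty, s"Empty partition column value in '$folderName'")
        (name, value)
      case _ =>
        // no '=' at all, or more than one (e.g. "a=b=c")
        throw new IllegalArgumentException(s"Not a partition format in '$folderName'")
    }

  def main(args: Array[String]): Unit = {
    println(parseFolder("a=1")) // (a,1)
    for (bad <- Seq("=10", "a=", "a=b=c")) {
      val err = scala.util.Try(parseFolder(bad)).failed.get
      println(s"$bad -> ${err.getClass.getSimpleName}") // IllegalArgumentException
    }
  }
}
```

With this shape, `=10`, `a=` and `a=b=c` all fail with the same exception type, so the corresponding test cases can check a single exception class.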





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/8026#discussion_r42509765
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala ---
@@ -101,11 +118,13 @@ class ParquetPartitionDiscoverySuite extends QueryTest with ParquetTest with Sha
 
     checkThrows[AssertionError]("file://path/=10", "Empty partition column name")
     checkThrows[AssertionError]("file://path/a=", "Empty partition column value")
+    checkThrows[AssertionError]("file://path/a=b=c", "Not a partition format in")
--- End diff --

You're right, it's not related to this PR; it's just a trivial check that gives a more informative message for an invalid partition in the path.





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-149598558
  
@JoshRosen I've also updated the unit test by adding an `Append` operation; without this PR, it throws the exception I described in the JIRA (https://issues.apache.org/jira/browse/SPARK-9735).

The root reason the previous unit test could pass at all is likely related to #8035: it always gets the latest schema from the user-specified one without calling `relation.refresh()`, whereas `relation.refresh()` is called indirectly in `Append` mode.





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-149628393
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/8026#discussion_r42509891
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
@@ -236,15 +241,22 @@ private[sql] object PartitioningUtils {
   }
 
   /**
-   * Converts a string to a [[Literal]] with automatic type inference.  Currently only supports
-   * [[IntegerType]], [[LongType]], [[DoubleType]], [[DecimalType.SYSTEM_DEFAULT]], and
-   * [[StringType]].
+   * Converts a string to a [[Literal]] with automatic type inference if no field type specified.
+   * Auto inference only supports [[IntegerType]], [[LongType]], [[DoubleType]],
+   * [[DecimalType.SYSTEM_DEFAULT]], and [[StringType]].
    */
   private[sql] def inferPartitionColumnValue(
+      expectedDT: Option[DataType],
--- End diff --

I agree that casting to a non-string type and then converting back to a string may lose precision, but what about disabling inference when calling `inferPartitionColumnValue` if the user has provided a schema? In that case, it should just return string literals, which you can then cast without any loss of precision.
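A minimal sketch of that proposal, using hypothetical stand-in types (`PType`, `Lit`) rather than Spark's `DataType` and `Literal`: when a user schema supplies the expected type, inference is skipped and the raw directory string is cast directly; otherwise a simplified Int-then-Double-then-String inference runs. The inference order here is an assumption for illustration, not Spark's exact rule set.

```scala
import scala.util.Try

object UserSchemaPartitionValue {
  // Hypothetical stand-ins for Spark's data types and literals.
  sealed trait PType
  case object PInt extends PType
  case object PDouble extends PType
  case object PString extends PType

  sealed trait Lit
  final case class IntLit(v: Int) extends Lit
  final case class DoubleLit(v: Double) extends Lit
  final case class StrLit(v: String) extends Lit

  // Simplified auto inference: Int, then Double, else keep the string.
  def infer(raw: String): Lit =
    Try[Lit](IntLit(raw.toInt))
      .orElse(Try[Lit](DoubleLit(raw.toDouble)))
      .getOrElse(StrLit(raw))

  // Cast the raw directory string straight to the user-specified type.
  // Throws NumberFormatException if the value does not parse as that type.
  def castRaw(raw: String, to: PType): Lit = to match {
    case PInt    => IntLit(raw.toInt)
    case PDouble => DoubleLit(raw.toDouble)
    case PString => StrLit(raw)
  }

  // With a user schema, inference is disabled; without one, infer as before.
  def partitionValue(raw: String, expected: Option[PType]): Lit =
    expected match {
      case Some(t) => castRaw(raw, t)
      case None    => infer(raw)
    }

  def main(args: Array[String]): Unit = {
    println(partitionValue("1.000", Some(PString))) // StrLit(1.000)
    println(partitionValue("1.000", None))          // DoubleLit(1.0)
  }
}
```

Because the cast always starts from the original string, a `StringType` column keeps `1.000` verbatim, while numeric columns still parse as before.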







[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-149628397
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43984/
Test PASSed.





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-149627957
  
**[Test build #43984 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43984/consoleFull)**
 for PR 8026 at commit 
[`7f2da8c`](https://github.com/apache/spark/commit/7f2da8c4868ed0c3fcdc9ab7748b421a5ebc6f89).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-149591364
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-149591384
  
Merged build started.





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-149591577
  
**[Test build #43984 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43984/consoleFull)**
 for PR 8026 at commit 
[`7f2da8c`](https://github.com/apache/spark/commit/7f2da8c4868ed0c3fcdc9ab7748b421a5ebc6f89).





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/8026#discussion_r42506907
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
@@ -236,15 +241,22 @@ private[sql] object PartitioningUtils {
   }
 
   /**
-   * Converts a string to a [[Literal]] with automatic type inference.  Currently only supports
-   * [[IntegerType]], [[LongType]], [[DoubleType]], [[DecimalType.SYSTEM_DEFAULT]], and
-   * [[StringType]].
+   * Converts a string to a [[Literal]] with automatic type inference if no field type specified.
+   * Auto inference only supports [[IntegerType]], [[LongType]], [[DoubleType]],
+   * [[DecimalType.SYSTEM_DEFAULT]], and [[StringType]].
    */
   private[sql] def inferPartitionColumnValue(
+      expectedDT: Option[DataType],
--- End diff --

We need to pass the expected data type down and then get the associated literal-based partition column value. @liancheng's suggestion is roughly to get the literal (maybe string-based) first and then do the casting outside; however, this may lose some data precision during the re-casting.

For example, given the path `/part1=1.000`, auto inference yields a Double, which is cast to the string `1.0` if the user expects StringType.

However, if we get it as StringType directly, the value is `1.000`, which is what it's supposed to be.
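The precision loss described above can be reproduced with plain Scala string conversions; this is a hypothetical illustration of the round trip, not Spark's casting code. The `007` case is an extra example, assumed here to show the same effect for integer inference:

```scala
object PrecisionLossDemo {
  // Inference yields a Double; casting back to a string re-renders it.
  def viaDouble(raw: String): String = raw.toDouble.toString

  // The same round trip through Int drops leading zeros.
  def viaInt(raw: String): String = raw.toInt.toString

  // Reading the directory value as a string keeps it verbatim.
  def asString(raw: String): String = raw

  def main(args: Array[String]): Unit = {
    println(viaDouble("1.000")) // 1.0
    println(viaInt("007"))      // 7
    println(asString("1.000"))  // 1.000
  }
}
```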





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/8026#discussion_r42575628
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
@@ -134,17 +137,38 @@ private[sql] object PartitioningUtils {
     var finished = path.getParent == null
     var chopped = path
 
+    var idx = 0 // the partition index from the right side of the path
     while (!finished) {
+      val folderName = chopped.getName
       // Sometimes (e.g., when speculative task is enabled), temporary directories may be left
       // uncleaned.  Here we simply ignore them.
-      if (chopped.getName.toLowerCase == "_temporary") {
+      if (folderName.toLowerCase == "_temporary") {
         return None
       }
 
-      val maybeColumn = parsePartitionColumn(chopped.getName, defaultPartitionName, typeInference)
-      maybeColumn.foreach(columns += _)
+      folderName.split("=") match {
+        case Array(columnName, rawColumnValue) =>
+          val field = userDefinedPartitionColumns.map(struct => struct(struct.length - idx - 1))
+          assert(columnName.nonEmpty, s"Empty partition column name in '$folderName'")
+          assert(field.isEmpty || (field.get.name  == columnName))
+          assert(rawColumnValue.nonEmpty, s"Empty partition column value in '$folderName'")
+
+          val literal = inferPartitionColumnValue(
+            field.map(_.dataType), rawColumnValue, defaultPartitionName, typeInference)
+          columns += (columnName -> literal)
+
+        case Array(value) if folderName.startsWith("=") =>
+          throw new AssertionError(s"Empty partition column name in '$folderName'")
--- End diff --

I agree we should move the partition path validation into a separate PR, since we can definitely do more checking there and also produce prettier error messages.
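As a rough illustration of the kind of path-level check being deferred, here is a hypothetical Scala sketch (names and messages are assumptions, not Spark's) that walks a path from its leaf directory upwards and matches each `name=value` folder against the user-specified columns, using the same right-to-left index as the diff's `struct(struct.length - idx - 1)`:

```scala
object RightToLeftPartitionMatching {
  // Hypothetical sketch: folders are visited from the leaf upwards, so idx = 0 is
  // the rightmost directory and maps to the last user-specified column.
  def matchColumns(path: String, userColumns: IndexedSeq[String]): List[(String, String)] = {
    val leafFirst = path.split("/").toList.filter(_.contains("=")).reverse
    require(leafFirst.length <= userColumns.length,
      s"'$path' has more partition directories than user-specified columns")
    leafFirst.zipWithIndex.map { case (folder, idx) =>
      val eq = folder.indexOf('=')
      val (name, value) = (folder.substring(0, eq), folder.substring(eq + 1))
      val expected = userColumns(userColumns.length - idx - 1)
      require(name == expected, s"Expected partition column '$expected' but found '$name' in '$path'")
      name -> value
    }.reverse
  }

  def main(args: Array[String]): Unit =
    println(matchColumns("/tmp/table/a=1/b=2", Vector("a", "b"))) // List((a,1), (b,2))
}
```

Mismatched or extra directories fail with `IllegalArgumentException` and a message naming the offending path, which is one way to get the more informative errors mentioned above.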





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-149776551
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-149777383
  
**[Test build #44038 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44038/consoleFull)**
 for PR 8026 at commit 
[`2cc93da`](https://github.com/apache/spark/commit/2cc93dac02b5a91f6c3dee0a0fdfb6a019c00921).





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/8026#discussion_r42575993
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
@@ -236,15 +241,22 @@ private[sql] object PartitioningUtils {
   }
 
   /**
-   * Converts a string to a [[Literal]] with automatic type inference.  Currently only supports
-   * [[IntegerType]], [[LongType]], [[DoubleType]], [[DecimalType.SYSTEM_DEFAULT]], and
-   * [[StringType]].
+   * Converts a string to a [[Literal]] with automatic type inference if no field type specified.
+   * Auto inference only supports [[IntegerType]], [[LongType]], [[DoubleType]],
+   * [[DecimalType.SYSTEM_DEFAULT]], and [[StringType]].
    */
   private[sql] def inferPartitionColumnValue(
+      expectedDT: Option[DataType],
--- End diff --

Sounds good to me; I will update the code.


[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-149776594
  
Merged build started.


[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-149779455
  
 Merged build triggered.


[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-149779487
  
Merged build started.


[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-149781507
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44038/
Test FAILed.


[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-149781506
  
Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-149781486
  
**[Test build #44038 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44038/consoleFull)** for PR 8026 at commit [`2cc93da`](https://github.com/apache/spark/commit/2cc93dac02b5a91f6c3dee0a0fdfb6a019c00921).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-149781986
  
@JoshRosen I've updated the code; it should be more straightforward and cleaner now.


[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-149783351
  
**[Test build #44040 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44040/consoleFull)** for PR 8026 at commit [`9f08f76`](https://github.com/apache/spark/commit/9f08f761ff404464b4bdfc83352e4bcad139e36c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-149783376
  
Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-149783377
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44040/
Test FAILed.


[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/8026#discussion_r42583944
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -544,11 +544,35 @@ abstract class HadoopFsRelation private[sql](maybePartitionSpec: Option[Partitio
   }
 
   private def discoverPartitions(): PartitionSpec = {
-    val typeInference = sqlContext.conf.partitionColumnTypeInferenceEnabled()
     // We use leaf dirs containing data files to discover the schema.
     val leafDirs = fileStatusCache.leafDirToChildrenFiles.keys.toSeq
-    PartitioningUtils.parsePartitions(leafDirs, PartitioningUtils.DEFAULT_PARTITION_NAME,
-      typeInference)
+    userDefinedPartitionColumns match {
+      case Some(schema) =>
+        val spec = PartitioningUtils.parsePartitions(
+          leafDirs, PartitioningUtils.DEFAULT_PARTITION_NAME, false)
+
+        // Without auto inference, all of value in the `row` should be null or in StringType,
+        // we need to cast into the data type that user specified.
+        def castPartitionValueWithGivenSchema(row: InternalRow, schema: StructType)
+          : InternalRow = {
+          InternalRow((0 until row.numFields) map { i =>
+            Cast(Literal.create(row.getString(i), StringType), schema.fields(i).dataType).eval()
+          }: _*)
+        }
+
+        assert(schema.length == spec.partitionColumns.length &&
+          schema.fieldNames.sameElements(spec.partitionColumns.fieldNames),
+          s"Auto infer partition column is not match with user specified, " +
--- End diff --

The wording of this error message might be slightly confusing to users 
since this branch is explicitly _disabling_ inference. I think that it might be 
slightly clearer to say something like "Actual partitioning column names did 
not match user-specified partitioning schema; expected ... but got ...", since 
as far as I know the inference is really only done for the types of the 
columns, not their names.
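
Here is a minimal sketch of the check with the message reworded along those lines. It assumes the partition column names are available as plain `Seq[String]` values. The object and method names are hypothetical and not part of the patch. The sketch also uses `require`, which throws `IllegalArgumentException`, instead of the diff's `assert`. That swap is only a suggestion.

```scala
object PartitionColumnCheck {
  // Hypothetical helper: compares the user-specified partition column names with
  // the names parsed from the directory layout, position by position.
  def checkPartitionColumnNames(expected: Seq[String], actual: Seq[String]): Unit =
    require(
      expected == actual,
      "Actual partitioning column names did not match user-specified partitioning schema; " +
        s"expected ${expected.mkString("[", ", ", "]")} but got ${actual.mkString("[", ", ", "]")}")
}
```

Comparing the two `Seq`s with `==` covers both the length check and the element-wise `sameElements` check in the diff.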


[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/8026#discussion_r42584058
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -544,11 +544,35 @@ abstract class HadoopFsRelation private[sql](maybePartitionSpec: Option[Partitio
   }
 
   private def discoverPartitions(): PartitionSpec = {
-    val typeInference = sqlContext.conf.partitionColumnTypeInferenceEnabled()
     // We use leaf dirs containing data files to discover the schema.
     val leafDirs = fileStatusCache.leafDirToChildrenFiles.keys.toSeq
-    PartitioningUtils.parsePartitions(leafDirs, PartitioningUtils.DEFAULT_PARTITION_NAME,
-      typeInference)
+    userDefinedPartitionColumns match {
+      case Some(schema) =>
+        val spec = PartitioningUtils.parsePartitions(
+          leafDirs, PartitioningUtils.DEFAULT_PARTITION_NAME, false)
+
+        // Without auto inference, all of value in the `row` should be null or in StringType,
+        // we need to cast into the data type that user specified.
+        def castPartitionValueWithGivenSchema(row: InternalRow, schema: StructType)
+          : InternalRow = {
+          InternalRow((0 until row.numFields) map { i =>
+            Cast(Literal.create(row.getString(i), StringType), schema.fields(i).dataType).eval()
+          }: _*)
+        }
+
+        assert(schema.length == spec.partitionColumns.length &&
+          schema.fieldNames.sameElements(spec.partitionColumns.fieldNames),
+          s"Auto infer partition column is not match with user specified, " +
+            s"expect $schema, but got ${spec.partitionColumns}}")
+
+        PartitionSpec(schema, spec.partitions.map { part =>
+          part.copy(values = castPartitionValueWithGivenSchema(part.values, schema))
+        })
+      case None =>
+        val typeInference = sqlContext.conf.partitionColumnTypeInferenceEnabled()
--- End diff --

You could just inline this call on line 574 and save one variable 
declaration.


[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/8026#discussion_r42584024
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -544,11 +544,35 @@ abstract class HadoopFsRelation private[sql](maybePartitionSpec: Option[Partitio
   }
 
   private def discoverPartitions(): PartitionSpec = {
-    val typeInference = sqlContext.conf.partitionColumnTypeInferenceEnabled()
     // We use leaf dirs containing data files to discover the schema.
     val leafDirs = fileStatusCache.leafDirToChildrenFiles.keys.toSeq
-    PartitioningUtils.parsePartitions(leafDirs, PartitioningUtils.DEFAULT_PARTITION_NAME,
-      typeInference)
+    userDefinedPartitionColumns match {
+      case Some(schema) =>
+        val spec = PartitioningUtils.parsePartitions(
+          leafDirs, PartitioningUtils.DEFAULT_PARTITION_NAME, false)
+
+        // Without auto inference, all of value in the `row` should be null or in StringType,
+        // we need to cast into the data type that user specified.
+        def castPartitionValueWithGivenSchema(row: InternalRow, schema: StructType)
--- End diff --

Also, maybe rename this to something like `castPartitionValuesToUserSchema`?


[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/8026#discussion_r42584044
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -544,11 +544,35 @@ abstract class HadoopFsRelation private[sql](maybePartitionSpec: Option[Partitio
   }
 
   private def discoverPartitions(): PartitionSpec = {
-    val typeInference = sqlContext.conf.partitionColumnTypeInferenceEnabled()
     // We use leaf dirs containing data files to discover the schema.
     val leafDirs = fileStatusCache.leafDirToChildrenFiles.keys.toSeq
-    PartitioningUtils.parsePartitions(leafDirs, PartitioningUtils.DEFAULT_PARTITION_NAME,
-      typeInference)
+    userDefinedPartitionColumns match {
+      case Some(schema) =>
+        val spec = PartitioningUtils.parsePartitions(
+          leafDirs, PartitioningUtils.DEFAULT_PARTITION_NAME, false)
+
+        // Without auto inference, all of value in the `row` should be null or in StringType,
+        // we need to cast into the data type that user specified.
+        def castPartitionValueWithGivenSchema(row: InternalRow, schema: StructType)
+          : InternalRow = {
+          InternalRow((0 until row.numFields) map { i =>
+            Cast(Literal.create(row.getString(i), StringType), schema.fields(i).dataType).eval()
+          }: _*)
+        }
+
+        assert(schema.length == spec.partitionColumns.length &&
+          schema.fieldNames.sameElements(spec.partitionColumns.fieldNames),
+          s"Auto infer partition column is not match with user specified, " +
+            s"expect $schema, but got ${spec.partitionColumns}}")
+
+        PartitionSpec(schema, spec.partitions.map { part =>
+          part.copy(values = castPartitionValueWithGivenSchema(part.values, schema))
+        })
+      case None =>
--- End diff --

To be super-explicit, maybe put a ` // user did not provide a partitioning 
schema` comment at the end of this line?


[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-149781430
  
**[Test build #44040 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44040/consoleFull)** for PR 8026 at commit [`9f08f76`](https://github.com/apache/spark/commit/9f08f761ff404464b4bdfc83352e4bcad139e36c).


[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/8026#discussion_r42583431
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -544,11 +544,35 @@ abstract class HadoopFsRelation private[sql](maybePartitionSpec: Option[Partitio
   }
 
   private def discoverPartitions(): PartitionSpec = {
-    val typeInference = sqlContext.conf.partitionColumnTypeInferenceEnabled()
     // We use leaf dirs containing data files to discover the schema.
     val leafDirs = fileStatusCache.leafDirToChildrenFiles.keys.toSeq
-    PartitioningUtils.parsePartitions(leafDirs, PartitioningUtils.DEFAULT_PARTITION_NAME,
-      typeInference)
+    userDefinedPartitionColumns match {
+      case Some(schema) =>
+        val spec = PartitioningUtils.parsePartitions(
+          leafDirs, PartitioningUtils.DEFAULT_PARTITION_NAME, false)
+
+        // Without auto inference, all of value in the `row` should be null or in StringType,
+        // we need to cast into the data type that user specified.
+        def castPartitionValueWithGivenSchema(row: InternalRow, schema: StructType)
+          : InternalRow = {
--- End diff --

In order to avoid the weird wrapping here, I think you might be able to 
just leave off the `: InternalRow` here, unless you somehow need it to appease 
MiMa.
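
As a small illustration with hypothetical names (unrelated to Spark's classes), a method-local `def` can omit its result type and let the compiler infer it. That keeps the signature on one line. A local `def` cannot be reached from outside its enclosing method, so it is not part of any published API. The annotation there should therefore be mainly a readability choice rather than something binary-compatibility tooling such as MiMa needs.

```scala
object LocalDefReturnType {
  def render(values: Seq[Int]): String = {
    // No `: Seq[String]` annotation: the result type is inferred from the body.
    def toLabels(xs: Seq[Int]) = xs.map(x => s"v=$x")
    toLabels(values).mkString(",")
  }
}
```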


[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/8026#discussion_r42583415
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -544,11 +544,35 @@ abstract class HadoopFsRelation private[sql](maybePartitionSpec: Option[Partitio
   }
 
   private def discoverPartitions(): PartitionSpec = {
-    val typeInference = sqlContext.conf.partitionColumnTypeInferenceEnabled()
     // We use leaf dirs containing data files to discover the schema.
     val leafDirs = fileStatusCache.leafDirToChildrenFiles.keys.toSeq
-    PartitioningUtils.parsePartitions(leafDirs, PartitioningUtils.DEFAULT_PARTITION_NAME,
-      typeInference)
+    userDefinedPartitionColumns match {
+      case Some(schema) =>
+        val spec = PartitioningUtils.parsePartitions(
+          leafDirs, PartitioningUtils.DEFAULT_PARTITION_NAME, false)
--- End diff --

Super-minor nit: could you explicitly name the boolean parameter here at 
the call-site, e.g. `inferSchema = false`? This is one of IntelliJ's automatic 
style recommendations and I'm a fan of it because it makes the code a bit 
easier to read. I might also just change this myself on merge.
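
Here is a self-contained sketch of the suggested call-site style. The `parsePartitions` below is a hypothetical stand-in, not Spark's API. Its Boolean parameter is assumed to be called `typeInference`, the name of the variable the old code passed. A named argument must use the callee's real parameter name, so check the actual signature before applying this.

```scala
object NamedArgs {
  // Hypothetical stand-in for a parser that takes a Boolean flag; not Spark's API.
  def parsePartitions(paths: Seq[String], defaultPartitionName: String, typeInference: Boolean): String =
    s"${paths.size} path(s), default=$defaultPartitionName, typeInference=$typeInference"

  // Positional: a reader has to look up what `false` means.
  val positional = parsePartitions(Seq("/t/a=1"), "default", false)

  // Named: the intent is visible at the call site; the behavior is identical.
  val named = parsePartitions(Seq("/t/a=1"), "default", typeInference = false)
}
```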


[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/8026#discussion_r42583857
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -544,11 +544,35 @@ abstract class HadoopFsRelation private[sql](maybePartitionSpec: Option[Partitio
   }
 
   private def discoverPartitions(): PartitionSpec = {
-    val typeInference = sqlContext.conf.partitionColumnTypeInferenceEnabled()
     // We use leaf dirs containing data files to discover the schema.
     val leafDirs = fileStatusCache.leafDirToChildrenFiles.keys.toSeq
-    PartitioningUtils.parsePartitions(leafDirs, PartitioningUtils.DEFAULT_PARTITION_NAME,
-      typeInference)
+    userDefinedPartitionColumns match {
+      case Some(schema) =>
+        val spec = PartitioningUtils.parsePartitions(
+          leafDirs, PartitioningUtils.DEFAULT_PARTITION_NAME, false)
+
+        // Without auto inference, all of value in the `row` should be null or in StringType,
+        // we need to cast into the data type that user specified.
+        def castPartitionValueWithGivenSchema(row: InternalRow, schema: StructType)
+          : InternalRow = {
+          InternalRow((0 until row.numFields) map { i =>
--- End diff --

Nit: `.map` instead of using infix notation.


[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/8026#discussion_r42583983
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -544,11 +544,35 @@ abstract class HadoopFsRelation private[sql](maybePartitionSpec: Option[Partitio
   }
 
   private def discoverPartitions(): PartitionSpec = {
-    val typeInference = sqlContext.conf.partitionColumnTypeInferenceEnabled()
     // We use leaf dirs containing data files to discover the schema.
     val leafDirs = fileStatusCache.leafDirToChildrenFiles.keys.toSeq
-    PartitioningUtils.parsePartitions(leafDirs, PartitioningUtils.DEFAULT_PARTITION_NAME,
-      typeInference)
+    userDefinedPartitionColumns match {
+      case Some(schema) =>
+        val spec = PartitioningUtils.parsePartitions(
+          leafDirs, PartitioningUtils.DEFAULT_PARTITION_NAME, false)
+
+        // Without auto inference, all of value in the `row` should be null or in StringType,
+        // we need to cast into the data type that user specified.
+        def castPartitionValueWithGivenSchema(row: InternalRow, schema: StructType)
--- End diff --

Actually, do you need the `schema` field here, since it's always going to 
be the same? Maybe you could drop the `schema` parameter and retrieve it via 
the closure, which would simplify this line.
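
Below is a simplified model of the helper with that suggestion applied. The user schema is read from the enclosing scope rather than passed in, and the helper uses `.map` rather than infix `map`. The names `castPartitionValuesToUserSchema` and `userProvidedSchema` are the ones proposed in this review. The small `DataType`/`StructField` types are hypothetical stand-ins for Spark's. The conversion uses plain `toInt`/`toLong`, which throw on malformed input; the sketch does not attempt to reproduce Spark's `Cast` semantics.

```scala
object CastToUserSchema {
  // Hypothetical stand-ins for Spark's DataType/StructField; not Spark's API.
  sealed trait DataType
  case object IntegerType extends DataType
  case object LongType extends DataType
  case object StringType extends DataType
  final case class StructField(name: String, dataType: DataType)

  // Each row holds raw partition strings; null marks a missing value.
  def castPartitions(
      rows: Seq[Seq[String]],
      userProvidedSchema: Seq[StructField]): Seq[Seq[Any]] = {
    // Captures `userProvidedSchema` from the enclosing scope instead of taking it
    // as a second parameter, so the signature stays short and nothing is shadowed.
    def castPartitionValuesToUserSchema(row: Seq[String]) =
      row.indices.map { i =>
        val raw = row(i)
        if (raw == null) null
        else userProvidedSchema(i).dataType match {
          case IntegerType => raw.toInt
          case LongType    => raw.toLong
          case StringType  => raw
        }
      }

    rows.map(castPartitionValuesToUserSchema)
  }
}
```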


[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/8026#discussion_r42583969
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -544,11 +544,35 @@ abstract class HadoopFsRelation private[sql](maybePartitionSpec: Option[Partitio
   }
 
   private def discoverPartitions(): PartitionSpec = {
-    val typeInference = sqlContext.conf.partitionColumnTypeInferenceEnabled()
     // We use leaf dirs containing data files to discover the schema.
     val leafDirs = fileStatusCache.leafDirToChildrenFiles.keys.toSeq
-    PartitioningUtils.parsePartitions(leafDirs, PartitioningUtils.DEFAULT_PARTITION_NAME,
-      typeInference)
+    userDefinedPartitionColumns match {
+      case Some(schema) =>
--- End diff --

Maybe rename `schema` here to `userProvidedSchema` to be more explicit and 
avoid shadowing the `schema` variable defined in 
`castPartitionValueWithGivenSchema`?


[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-149785524
  
@chenghao-intel, thanks a bunch for updating this; the current version of 
this patch is a lot easier to understand and I'm happy with how clean the code 
turned out. I left only minor style / clarity comments, which I don't mind 
addressing myself on merge if you're too busy. If you don't mind, though, one 
more round of quick updates to address my comments would be appreciated.

Anyhow, the technical changes here LGTM.


[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-20 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/8026#discussion_r42584080
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/hadoopFsRelationSuites.scala ---
@@ -510,21 +510,39 @@ abstract class HadoopFsRelationTest extends QueryTest with SQLTestUtils with Tes
     }
   }
 
-  // HadoopFsRelation.discoverPartitions() called by refresh(), which will ignore
-  // the given partition data type.
-  ignore("Partition column type casting") {
+  test("Partition column type casting") {
--- End diff --

Do you mind adding a comment beneath this line which reads `// regression 
test for SPARK-9735` so that readers can quickly figure out what this is 
supposed to be testing?


[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-19 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-149390928
  
@chenghao-intel, just to clarify: I noticed that your final approach 
involved pushing an expected data type down into the method named 
`inferPartitionColumnValue`. I'm curious why you chose this approach as opposed 
to re-using the code that @liancheng 
[pointed to](https://github.com/apache/spark/pull/8026#issuecomment-128643963) upthread.


[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-19 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/8026#discussion_r42451143
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
@@ -236,15 +241,22 @@ private[sql] object PartitioningUtils {
   }
 
   /**
-   * Converts a string to a [[Literal]] with automatic type inference.  Currently only supports
-   * [[IntegerType]], [[LongType]], [[DoubleType]], [[DecimalType.SYSTEM_DEFAULT]], and
-   * [[StringType]].
+   * Converts a string to a [[Literal]] with automatic type inference if no field type specified.
+   * Auto inference only supports [[IntegerType]], [[LongType]], [[DoubleType]],
+   * [[DecimalType.SYSTEM_DEFAULT]], and [[StringType]].
    */
   private[sql] def inferPartitionColumnValue(
+      expectedDT: Option[DataType],
--- End diff --

In the master branch, if `typeInference == false`, the data type 
of the partition key defaults to `StringType`; otherwise, it will probably 
be `IntegerType`, `LongType`, etc., depending on the actual value of the 
partition key.

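The two modes described above can be made concrete with a simplified, Spark-free sketch. `SimpleType`, `Lit` and `PartitionValueSketch.inferValue` are hypothetical stand-ins, not Spark's API. Three assumptions apply: the Int, then Long, then Double fallback order; the use of `__HIVE_DEFAULT_PARTITION__` as the default partition name; and the omission of decimals and path unescaping.

```scala
import scala.util.Try

// Hypothetical, Spark-free stand-ins for the partition column types.
sealed trait SimpleType
case object IntT extends SimpleType
case object LongT extends SimpleType
case object DoubleT extends SimpleType
case object StringT extends SimpleType
case object NullT extends SimpleType

// Hypothetical stand-in for a typed literal: a value plus its type.
final case class Lit(value: Any, dataType: SimpleType)

object PartitionValueSketch {
  // Assumed default partition name (the Hive convention for null partition values).
  val DefaultPartitionName = "__HIVE_DEFAULT_PARTITION__"

  def inferValue(raw: String, typeInference: Boolean): Lit =
    if (raw == DefaultPartitionName) Lit(null, NullT)
    else if (!typeInference) Lit(raw, StringT) // inference disabled: always a string
    else
      // Try the narrowest numeric type first, then widen; fall back to a string.
      Try(Lit(raw.toInt, IntT))
        .orElse(Try(Lit(raw.toLong, LongT)))
        .orElse(Try(Lit(raw.toDouble, DoubleT)))
        .getOrElse(Lit(raw, StringT))
}
```

With inference enabled, `"10"` becomes an integer and `"3000000000"` a long. With inference disabled, both stay strings.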


[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-19 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/8026#discussion_r42451397
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/sources/hadoopFsRelationSuites.scala
 ---
@@ -447,7 +447,7 @@ abstract class HadoopFsRelationTest extends QueryTest with SQLTestUtils {
 
   // HadoopFsRelation.discoverPartitions() called by refresh(), which will ignore
   // the given partition data type.
--- End diff --

Yes, that's true; this is no longer valid.



[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-19 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/8026#discussion_r42446094
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/sources/hadoopFsRelationSuites.scala
 ---
@@ -447,7 +447,7 @@ abstract class HadoopFsRelationTest extends QueryTest with SQLTestUtils {
 
   // HadoopFsRelation.discoverPartitions() called by refresh(), which will ignore
   // the given partition data type.
--- End diff --

+1.



[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-19 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-149406766
  
I just tried testing a build where I _only_ re-enabled the ignored test and 
changed nothing else. In this case, the test still passed. This makes me wonder 
whether the "Partition column type casting" test is an adequate regression test 
for this issue.

Can you write a new test for this that fails without this patch?



[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-19 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/8026#discussion_r42451442
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/sources/hadoopFsRelationSuites.scala
 ---
@@ -458,6 +458,8 @@ abstract class HadoopFsRelationTest extends QueryTest with SQLTestUtils {
 .partitionBy("ps", "p2")
 .saveAsTable("t")
 
+  val a = input.collect()
--- End diff --

Oh, yeah, will remove it.



[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-19 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/8026#discussion_r42447317
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/sources/hadoopFsRelationSuites.scala
 ---
@@ -458,6 +458,8 @@ abstract class HadoopFsRelationTest extends QueryTest with SQLTestUtils {
 .partitionBy("ps", "p2")
 .saveAsTable("t")
 
+  val a = input.collect()
--- End diff --

Why these two lines? Leftovers from debugging?



[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-19 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/8026#discussion_r42447350
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala
 ---
@@ -101,11 +118,13 @@ class ParquetPartitionDiscoverySuite extends QueryTest with ParquetTest with Sha
 
 checkThrows[AssertionError]("file://path/=10", "Empty partition column name")
 checkThrows[AssertionError]("file://path/a=", "Empty partition column value")
+checkThrows[AssertionError]("file://path/a=b=c", "Not a partition format in")
--- End diff --

Is this fixing a separate issue from the refresh issue? If so, can we do it 
separately?

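The three `checkThrows` cases in this hunk describe one validation rule for a single `column=value` path segment. The sketch below illustrates that rule in isolation. `PartitionSegmentSketch.parseSegment` is hypothetical and is not the patch's code. It rejects any segment that does not contain exactly one `=`, whereas real partition discovery may handle directories without `=` differently.

```scala
object PartitionSegmentSketch {
  // Hypothetical parser for one "column=value" directory name. It uses `assert`
  // so that malformed input raises java.lang.AssertionError, the exception type
  // the quoted checkThrows[AssertionError] calls expect.
  def parseSegment(segment: String): (String, String) = {
    // limit = -1 keeps trailing empty strings, so "a=" splits into ("a", "").
    val parts = segment.split("=", -1)
    assert(parts.length == 2, s"Not a partition format in: $segment")
    val name = parts(0)
    val value = parts(1)
    assert(name.nonEmpty, s"Empty partition column name in '$segment'")
    assert(value.nonEmpty, s"Empty partition column value in '$segment'")
    (name, value)
  }
}
```

In this sketch, `"a=b=c"` splits into three parts and fails the first assertion. `"=10"` and `"a="` fail the name check and the value check, respectively.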


[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-19 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/8026#discussion_r42447228
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
 ---
@@ -236,15 +241,22 @@ private[sql] object PartitioningUtils {
   }
 
   /**
-   * Converts a string to a [[Literal]] with automatic type inference.  Currently only supports
-   * [[IntegerType]], [[LongType]], [[DoubleType]], [[DecimalType.SYSTEM_DEFAULT]], and
-   * [[StringType]].
+   * Converts a string to a [[Literal]] with automatic type inference if no field type specified.
+   * Auto inference only supports [[IntegerType]], [[LongType]], [[DoubleType]],
+   * [[DecimalType.SYSTEM_DEFAULT]], and [[StringType]].
*/
   private[sql] def inferPartitionColumnValue(
+  expectedDT: Option[DataType],
--- End diff --

Per my other comment upthread, this is a bit confusing to me: the method 
is named "infer", but it has a mode where it won't perform inference (controlled 
by a boolean flag), and it now has a new parameter that _also_ bypasses 
inference _and_ performs a cast.



[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-19 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-149413572
  
Thank you @JoshRosen, I will pick this PR back up, since I have almost 
forgotten some of the details. But I'm certain the ignored test cases used to 
fail without this PR; perhaps someone else has since fixed the problem elsewhere.



[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-19 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/8026#discussion_r42445776
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
 ---
@@ -236,15 +241,22 @@ private[sql] object PartitioningUtils {
   }
 
   /**
-   * Converts a string to a [[Literal]] with automatic type inference.  Currently only supports
-   * [[IntegerType]], [[LongType]], [[DoubleType]], [[DecimalType.SYSTEM_DEFAULT]], and
-   * [[StringType]].
+   * Converts a string to a [[Literal]] with automatic type inference if no field type specified.
+   * Auto inference only supports [[IntegerType]], [[LongType]], [[DoubleType]],
+   * [[DecimalType.SYSTEM_DEFAULT]], and [[StringType]].
*/
   private[sql] def inferPartitionColumnValue(
+  expectedDT: Option[DataType],
   raw: String,
   defaultPartitionName: String,
-  typeInference: Boolean): Literal = {
-if (typeInference) {
+  typeInference: Boolean): Literal = expectedDT match {
+case Some(dt) if raw == defaultPartitionName =>
+  Literal.create(null, dt)
+case Some(dt) if dt == StringType =>
+  Literal.create(unescapePathName(raw), StringType)
+case Some(dt) =>
+  Literal.create(Cast(Literal.create(unescapePathName(raw), StringType), dt).eval(null), dt)
--- End diff --

Instead of `eval(null)`, I think this could simply be `eval()`.
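The suggestion above relies on `eval` taking an input row whose default is `null`, so that `eval()` and `eval(null)` are the same call. The toy below models that, together with the `expectedDT` dispatch from the quoted hunk, in a Spark-free way. `PType`, `TypedLit`, `CastToType` and `ExpectedTypeSketch` are hypothetical names, not Spark classes. The quoted diff is cut off before the `None` case, so the fallback to inference shown here is an assumption, reduced to a single integer attempt.

```scala
import scala.util.Try

// Hypothetical, Spark-free stand-ins for the data types and literal.
sealed trait PType
case object PInt extends PType
case object PDouble extends PType
case object PString extends PType

final case class TypedLit(value: Any, dataType: PType)

// Toy cast expression. Its input row defaults to null, so `eval()` and
// `eval(null)` are the same call. Unlike a real cast, it simply throws on
// unparseable input.
final case class CastToType(raw: String, target: PType) {
  def eval(input: AnyRef = null): Any = target match {
    case PInt    => raw.toInt
    case PDouble => raw.toDouble
    case PString => raw
  }
}

object ExpectedTypeSketch {
  // Assumed default partition name (the Hive convention for null partition values).
  val DefaultPartitionName = "__HIVE_DEFAULT_PARTITION__"

  def partitionValue(expected: Option[PType], raw: String): TypedLit = expected match {
    // A user-specified type: the default partition becomes a typed null.
    case Some(dt) if raw == DefaultPartitionName => TypedLit(null, dt)
    // String columns need no cast.
    case Some(PString) => TypedLit(raw, PString)
    // Any other specified type: cast the raw string, with no inference.
    case Some(dt) => TypedLit(CastToType(raw, dt).eval(), dt)
    // No specified type: assumed fallback to (greatly simplified) inference.
    case None => Try(TypedLit(raw.toInt, PInt)).getOrElse(TypedLit(raw, PString))
  }
}
```

The case order matters here. The default-partition guard must come before the `PString` case, so that a string column's default partition still becomes a typed null rather than the literal directory name.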



[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-10-13 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/8026#discussion_r41919664
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/sources/hadoopFsRelationSuites.scala
 ---
@@ -447,7 +447,7 @@ abstract class HadoopFsRelationTest extends QueryTest with SQLTestUtils {
 
   // HadoopFsRelation.discoverPartitions() called by refresh(), which will ignore
   // the given partition data type.
--- End diff --

Remove the comment?



[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-08-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-132455893
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41202/
Test FAILed.



[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-08-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-132455892
  
Build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-08-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-132455872
  
[Test build #41202 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41202/console) for PR 8026 at commit [`f68d827`](https://github.com/apache/spark/commit/f68d82714a3e8eb2033d9ad04ef136c9132b38e7).
 * This patch **fails Spark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-08-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-132480407
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-08-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-132480410
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41214/
Test PASSed.



[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-08-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-132480287
  
[Test build #41214 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41214/console) for PR 8026 at commit [`cda059f`](https://github.com/apache/spark/commit/cda059fd5ff8a08864f79171d1ba2e0becf73134).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-08-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-132431832
  
[Test build #41202 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41202/consoleFull) for PR 8026 at commit [`f68d827`](https://github.com/apache/spark/commit/f68d82714a3e8eb2033d9ad04ef136c9132b38e7).



[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-13249
  
Merged build started.



[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-132445384
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41211/
Test FAILed.



[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-132445382
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-08-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-132445340
  
[Test build #41211 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41211/consoleFull) for PR 8026 at commit [`cda059f`](https://github.com/apache/spark/commit/cda059fd5ff8a08864f79171d1ba2e0becf73134).



[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-08-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-132445375
  
[Test build #41211 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41211/console) for PR 8026 at commit [`cda059f`](https://github.com/apache/spark/commit/cda059fd5ff8a08864f79171d1ba2e0becf73134).
 * This patch **fails RAT tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-13240
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-132446424
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-132446425
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41209/
Test FAILed.



[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-08-18 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-132446390
  
retest this please



[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-132430775
  
Build started.



[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-132430765
  
 Build triggered.



[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-132444085
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-132444093
  
Merged build started.



[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-132447550
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-132447554
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41212/
Test FAILed.



[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-132447476
  
 Merged build triggered.



[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-132447507
  
Merged build started.



[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-132450511
  
Merged build started.





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-132450499
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-08-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-132451174
  
[Test build #41214 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41214/consoleFull) for PR 8026 at commit [`cda059f`](https://github.com/apache/spark/commit/cda059fd5ff8a08864f79171d1ba2e0becf73134).





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-08-18 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-132450163
  
retest this please





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-08-07 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-128615201
  
cc @liancheng 





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-08-07 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-128643963
  
A summary of my offline discussion with @chenghao-intel:

The real problem here is that the partition column types of the newly
refreshed partition spec don't match those in the user-specified spec. The
current fix simply disables refreshing the partition spec, which is not
desirable. My suggestion is to factor out the [partition values casting part][1]
of the `partitionSpec` method and reuse it in `refresh()` to cast the partition
values to the user-specified data types, while reusing the `partitionColumns`
of the user-specified partition spec as-is.

[1]: https://github.com/apache/spark/blob/ebfd91c542aaead343cb154277fcf9114382fee7/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala#L460-L473
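A minimal, Spark-free sketch of the casting step suggested in this comment, assuming a hypothetical `ColumnType` hierarchy (`IntT`, `LongT`, `DoubleT`, `StringT`) and hypothetical helpers `castValue` and `castPartitionValues`; none of these are Spark APIs, and the real code in `interfaces.scala` may differ. The idea it illustrates: keep the user's partition columns, and convert each freshly discovered value to the type the user declared for that column, so a value inferred as an integer comes back as a string when the user-specified schema says string.

```scala
// Hypothetical, Spark-free model of the "cast partition values" step.
// ColumnType, IntT, LongT, DoubleT and StringT are illustrative stand-ins,
// not Spark's org.apache.spark.sql.types classes.
object PartitionCasting {
  sealed trait ColumnType
  case object IntT extends ColumnType
  case object LongT extends ColumnType
  case object DoubleT extends ColumnType
  case object StringT extends ColumnType

  // Convert one inferred partition value to the user-specified column type.
  def castValue(value: Any, target: ColumnType): Any = (value, target) match {
    case (null, _)            => null          // nulls stay null
    case (v, StringT)         => v.toString    // any value can become a string
    case (v: Int, IntT)       => v
    case (v: Int, LongT)      => v.toLong      // widen Int -> Long
    case (v: Int, DoubleT)    => v.toDouble
    case (v: Long, LongT)     => v
    case (v: Long, DoubleT)   => v.toDouble
    case (v: Double, DoubleT) => v
    case (v: String, IntT)    => v.toInt       // may throw NumberFormatException
    case (v: String, LongT)   => v.toLong
    case (v: String, DoubleT) => v.toDouble
    case (v, t) =>
      throw new IllegalArgumentException(s"cannot cast $v to $t")
  }

  // Cast a whole partition row, column by column, to the user-specified types.
  // The user's partition columns are reused unchanged; only the values are converted.
  def castPartitionValues(values: Seq[Any], userTypes: Seq[ColumnType]): Seq[Any] = {
    require(values.length == userTypes.length, "column count mismatch")
    values.zip(userTypes).map { case (v, t) => castValue(v, t) }
  }
}
```

For example, `castPartitionValues(Seq(1, "hello"), Seq(StringT, StringT))` yields `Seq("1", "hello")`: the integer that discovery produced is turned into the string the user asked for, instead of reaching a string converter as a raw `Integer`.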





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-08-07 Thread chenghao-intel
GitHub user chenghao-intel opened a pull request:

https://github.com/apache/spark/pull/8026

[SPARK-9735][SQL]Respect the user specified schema rather than the inferred partition
schema for HadoopFsRelation

This enables the unit test `hadoopFsRelationSuite.Partition column type
casting`, which previously threw an exception like:
···
11.521 ERROR org.apache.spark.executor.Executor: Exception in task 2.0 in stage 2.0 (TID 130)
java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.spark.unsafe.types.UTF8String
	at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:45)
	at org.apache.spark.sql.catalyst.expressions.SpecificMutableRow.getUTF8String(SpecificMutableRow.scala:195)
	at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toScalaImpl(CatalystTypeConverters.scala:297)
	at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toScalaImpl(CatalystTypeConverters.scala:289)
	at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toScala(CatalystTypeConverters.scala:110)
	at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toScala(CatalystTypeConverters.scala:278)
	at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toScala(CatalystTypeConverters.scala:245)
	at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToScalaConverter$2.apply(CatalystTypeConverters.scala:406)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$3$$anonfun$apply$2.apply(SparkPlan.scala:194)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$3$$anonfun$apply$2.apply(SparkPlan.scala:194)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
	at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
	at scala.collection.AbstractIterator.to(Iterator.scala:1157)
	at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:905)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:905)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1836)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1836)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:88)
···
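The `ClassCastException` above is consistent with a partition value being inferred as `java.lang.Integer` while the user-specified schema declares that column as a string. The following hypothetical sketch (not Spark's implementation; `PartitionInference`, `parsePath` and `inferValue` are illustrative names) shows how inferring types from `key=value` directory names can turn a value such as `1` into an `Int`. The inference order Int, then Long, then Double, otherwise String is a simplifying assumption.

```scala
import scala.util.Try

// Hypothetical sketch of partition discovery type inference (not Spark's code).
object PartitionInference {
  // Split a path such as "/data/a=1/b=hello" into ordered (column, raw value) pairs,
  // keeping only the segments that look like key=value.
  def parsePath(path: String): Seq[(String, String)] =
    path.split("/").toSeq.filter(_.contains("=")).map { segment =>
      val idx = segment.indexOf('=')
      (segment.substring(0, idx), segment.substring(idx + 1))
    }

  // Assumed inference order: Int, then Long, then Double, otherwise keep the String.
  // The declared return type Any keeps each branch's runtime type (no numeric widening).
  def inferValue(raw: String): Any =
    if (Try(raw.toInt).isSuccess) raw.toInt
    else if (Try(raw.toLong).isSuccess) raw.toLong
    else if (Try(raw.toDouble).isSuccess) raw.toDouble
    else raw
}
```

With this scheme `inferValue("1")` returns an `Int` regardless of what the user declared for that column, which is why the inferred values have to be cast to the user-specified types before rows are built.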

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chenghao-intel/spark partition_discovery

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8026.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8026


commit 637e26fec1c00cad457f5ae92200b5f6700f1e36
Author: Cheng Hao hao.ch...@intel.com
Date:   2015-08-07T06:45:21Z

make lower priority of infer partition schema for HadoopFsRelation







[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-08-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-128619293
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-08-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-128616385
  
Merged build started.





[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...

2015-08-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8026#issuecomment-128616376
  
 Merged build triggered.

