[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-11-11 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/8840#discussion_r44612323
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
 ---
@@ -124,34 +132,40 @@ private[sql] object PartitioningUtils {
*   Literal.create("hello", StringType),
*   Literal.create(3.14, FloatType)))
* }}}
+   * and the base path:
+   * {{{
+   *   /path/to/partition
+   * }}}
*/
   private[sql] def parsePartition(
   path: Path,
   defaultPartitionName: String,
-  typeInference: Boolean): Option[PartitionValues] = {
+  typeInference: Boolean): (Option[PartitionValues], Option[Path]) = {
 val columns = ArrayBuffer.empty[(String, Literal)]
 // Old Hadoop versions don't have `Path.isRoot`
 var finished = path.getParent == null
 var chopped = path
+var basePath = path
 
 while (!finished) {
   // Sometimes (e.g., when speculative task is enabled), temporary 
directories may be left
   // uncleaned.  Here we simply ignore them.
   if (chopped.getName.toLowerCase == "_temporary") {
-return None
+return (None, None)
   }
 
   val maybeColumn = parsePartitionColumn(chopped.getName, 
defaultPartitionName, typeInference)
   maybeColumn.foreach(columns += _)
+  basePath = chopped
   chopped = chopped.getParent
-  finished = maybeColumn.isEmpty || chopped.getParent == null
+  finished = (maybeColumn.isEmpty && !columns.isEmpty) || 
chopped.getParent == null
--- End diff --

oh i see. It is for something like `table/a=1/_temporary/something`, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-11-11 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/8840#discussion_r44611718
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
 ---
@@ -124,34 +132,40 @@ private[sql] object PartitioningUtils {
*   Literal.create("hello", StringType),
*   Literal.create(3.14, FloatType)))
* }}}
+   * and the base path:
+   * {{{
+   *   /path/to/partition
+   * }}}
*/
   private[sql] def parsePartition(
   path: Path,
   defaultPartitionName: String,
-  typeInference: Boolean): Option[PartitionValues] = {
+  typeInference: Boolean): (Option[PartitionValues], Option[Path]) = {
 val columns = ArrayBuffer.empty[(String, Literal)]
 // Old Hadoop versions don't have `Path.isRoot`
 var finished = path.getParent == null
 var chopped = path
+var basePath = path
 
 while (!finished) {
   // Sometimes (e.g., when speculative task is enabled), temporary 
directories may be left
   // uncleaned.  Here we simply ignore them.
   if (chopped.getName.toLowerCase == "_temporary") {
-return None
+return (None, None)
   }
 
   val maybeColumn = parsePartitionColumn(chopped.getName, 
defaultPartitionName, typeInference)
   maybeColumn.foreach(columns += _)
+  basePath = chopped
   chopped = chopped.getParent
-  finished = maybeColumn.isEmpty || chopped.getParent == null
+  finished = (maybeColumn.isEmpty && !columns.isEmpty) || 
chopped.getParent == null
--- End diff --

Why do we need `!columns.isEmpty`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-11-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-153283225
  
**[Test build #44906 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44906/consoleFull)**
 for PR 8840 at commit 
[`3db4226`](https://github.com/apache/spark/commit/3db42268b7948102278427f73d48e3ebb3196924).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-11-03 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-153302126
  
**[Test build #44906 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44906/consoleFull)**
 for PR 8840 at commit 
[`3db4226`](https://github.com/apache/spark/commit/3db42268b7948102278427f73d48e3ebb3196924).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-153302250
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-153302252
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44906/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-11-03 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-153393850
  
Should this one be backported to branch-1.5? 

cc @liancheng @yhuai 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-11-03 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/8840#discussion_r43775159
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
 ---
@@ -77,9 +77,11 @@ private[sql] object PartitioningUtils {
   defaultPartitionName: String,
   typeInference: Boolean): PartitionSpec = {
 // First, we need to parse every partition's path and see if we can 
find partition values.
-val pathsWithPartitionValues = paths.flatMap { path =>
-  parsePartition(path, defaultPartitionName, typeInference).map(path 
-> _)
-}
+val (partitionValues, optBasePaths) = paths.map { path =>
+  parsePartition(path, defaultPartitionName, typeInference)
+}.unzip
+
+val pathsWithPartitionValues = paths.zip(partitionValues).flatMap(x => 
x._2.map(x._1 -> _))
--- End diff --

Seems it is not very obvious what we are doing at here. Maybe a comment can 
help.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-11-03 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-153419110
  
@rxin I think it is find to not backport it. Without this, a user will get 
the DataFrame and when he/she queries it, it will fail (and the error message 
does not say what's the cause). Since, it does not really let users make any 
real progress, I think we do not really need to backport it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-11-03 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/8840#discussion_r43775309
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala
 ---
@@ -58,14 +58,46 @@ class ParquetPartitionDiscoverySuite extends QueryTest 
with ParquetTest with Sha
 check(defaultPartitionName, Literal.create(null, NullType))
   }
 
+  test("parse invalid partitioned directories") {
+// Invalid
+var paths = Seq(
+  "hdfs://host:9000/invalidPath",
+  "hdfs://host:9000/path/a=10/b=20",
+  "hdfs://host:9000/path/a=10.5/b=hello")
+
+var exception = intercept[AssertionError] {
+  parsePartitions(paths.map(new Path(_)), defaultPartitionName, true)
+}
+assert(exception.getMessage().contains("Conflicting directory 
structures detected"))
+
+// Valid
+paths = Seq(
+  "hdfs://host:9000/path/_temporary",
+  "hdfs://host:9000/path/a=10/b=20",
+  "hdfs://host:9000/path/_temporary/path")
+
+parsePartitions(paths.map(new Path(_)), defaultPartitionName, true)
+
+// Invalid
+paths = Seq(
+  "hdfs://host:9000/path/_temporary",
+  "hdfs://host:9000/path/a=10/b=20",
+  "hdfs://host:9000/path/path1")
+
+exception = intercept[AssertionError] {
+  parsePartitions(paths.map(new Path(_)), defaultPartitionName, true)
+}
+assert(exception.getMessage().contains("Conflicting directory 
structures detected"))
+  }
--- End diff --

Which case is for the one I mentioned in the jira?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-11-03 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/8840#discussion_r43775098
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
 ---
@@ -87,6 +89,12 @@ private[sql] object PartitioningUtils {
 } else {
   // This dataset is partitioned. We need to check whether all 
partitions have the same
   // partition columns and resolve potential type conflicts.
+  val basePaths = optBasePaths.flatMap(x => x)
+  assert(
+basePaths.distinct.size == 1,
+"Conflicting directory structures detected. Suspicious paths:\b" +
+  basePaths.mkString("\n\t", "\n\t", "\n\n"))
--- End diff --

Can you give a case that we will hit this branch? What will `basePaths` be 
at here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-11-03 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-153419572
  
OK great. @viirya can you submit a followup pr to address @yhuai's feedback.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-11-03 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-153592786
  
@rxin @yhuai OK. Submitted a following pr at #9459.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-11-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-153259988
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-11-02 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-153261282
  
LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-11-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-153259912
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-11-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-153260381
  
**[Test build #44888 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44888/consoleFull)**
 for PR 8840 at commit 
[`3db4226`](https://github.com/apache/spark/commit/3db42268b7948102278427f73d48e3ebb3196924).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-11-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-153280347
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44888/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-11-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-153280346
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-11-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-153280321
  
**[Test build #44888 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44888/consoleFull)**
 for PR 8840 at commit 
[`3db4226`](https://github.com/apache/spark/commit/3db42268b7948102278427f73d48e3ebb3196924).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-11-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-153282428
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-11-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-153282411
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-11-02 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/8840#discussion_r43719832
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
 ---
@@ -184,7 +204,8 @@ private[sql] object PartitioningUtils {
* }}}
*/
   private[sql] def resolvePartitions(
-  pathsWithPartitionValues: Seq[(Path, PartitionValues)]): 
Seq[PartitionValues] = {
+  pathsWithPartitionValues: Seq[(Path, PartitionValues)],
+  basePaths: Seq[Path]): Seq[PartitionValues] = {
--- End diff --

Removed. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-11-02 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-153282032
  
retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-11-02 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/8840#discussion_r43699130
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
 ---
@@ -77,8 +77,16 @@ private[sql] object PartitioningUtils {
   defaultPartitionName: String,
   typeInference: Boolean): PartitionSpec = {
 // First, we need to parse every partition's path and see if we can 
find partition values.
-val pathsWithPartitionValues = paths.flatMap { path =>
-  parsePartition(path, defaultPartitionName, typeInference).map(path 
-> _)
+val partitionValuesWithBasePaths = paths.map { path =>
+  path -> parsePartition(path, defaultPartitionName, typeInference)
+}
+
+val pathsWithPartitionValues = partitionValuesWithBasePaths.flatMap { 
pathAndPart =>
+  pathAndPart._2._1.map(part => pathAndPart._1 -> part)
+}
+
+val basePaths = partitionValuesWithBasePaths.flatMap { pathAndPart =>
+  pathAndPart._2._2
--- End diff --

Can we writhe these like this?
```
val (basePaths, pathsWithPartitionValues) = paths.flatMap {
}.unzip
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-11-02 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/8840#discussion_r43699404
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
 ---
@@ -184,7 +204,8 @@ private[sql] object PartitioningUtils {
* }}}
*/
   private[sql] def resolvePartitions(
-  pathsWithPartitionValues: Seq[(Path, PartitionValues)]): 
Seq[PartitionValues] = {
+  pathsWithPartitionValues: Seq[(Path, PartitionValues)],
+  basePaths: Seq[Path]): Seq[PartitionValues] = {
--- End diff --

Where is `basePaths` used?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-26 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-151132730
  
@chenghao-intel @JoshRosen any comments? Is this patch ready? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-150900314
  
**[Test build #44316 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44316/consoleFull)**
 for PR 8840 at commit 
[`cdf6dc4`](https://github.com/apache/spark/commit/cdf6dc424abba99a7fd091fca5ce2af56255f69a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-25 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/8840#discussion_r42942564
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
 ---
@@ -128,30 +136,32 @@ private[sql] object PartitioningUtils {
   private[sql] def parsePartition(
   path: Path,
   defaultPartitionName: String,
-  typeInference: Boolean): Option[PartitionValues] = {
+  typeInference: Boolean): (Option[PartitionValues], Option[Path]) = {
--- End diff --

A base path is not always associated with a `PartitionValues`. If there is 
no partition, we can still have a base path.

That is why I don't make `case class PartitionValues(columnNames: 
Seq[String], literals: Seq[Literal])` to something like `case class 
PartitionValues(path: String, columnNames: Seq[String], literals: 
Seq[Literal])`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-150900104
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-150900105
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-150911391
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44316/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-150911390
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-150911363
  
**[Test build #44316 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44316/consoleFull)**
 for PR 8840 at commit 
[`cdf6dc4`](https://github.com/apache/spark/commit/cdf6dc424abba99a7fd091fca5ce2af56255f69a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-150814991
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44295/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-150814989
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-150814759
  
**[Test build #44295 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44295/consoleFull)**
 for PR 8840 at commit 
[`779fbd2`](https://github.com/apache/spark/commit/779fbd264092e8b4ff0ab5472d34db2d01a971f3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-150802465
  
**[Test build #44295 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44295/consoleFull)**
 for PR 8840 at commit 
[`779fbd2`](https://github.com/apache/spark/commit/779fbd264092e8b4ff0ab5472d34db2d01a971f3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-150799912
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-150799847
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-21 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/8840#discussion_r42694917
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
 ---
@@ -128,30 +136,32 @@ private[sql] object PartitioningUtils {
   private[sql] def parsePartition(
   path: Path,
   defaultPartitionName: String,
-  typeInference: Boolean): Option[PartitionValues] = {
+  typeInference: Boolean): (Option[PartitionValues], Option[Path]) = {
--- End diff --

But it is possible that we only have Path without corresponding 
PartitionValues.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-149656467
  
**[Test build #44001 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44001/consoleFull)**
 for PR 8840 at commit 
[`2d13156`](https://github.com/apache/spark/commit/2d13156c32002f65acc1bfccf09e08ef0063aa73).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-149654462
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-20 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-149654228
  
Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-20 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-149655120
  
@chenghao-intel, it looks like this PR's error message improvements are 
similar to the ones that you added as part of #8026. @viirya, could you take a 
look at @chenghao-intel's PR to see which approach you like better?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-149692945
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44001/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-149692943
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-149692790
  
**[Test build #44001 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44001/consoleFull)**
 for PR 8840 at commit 
[`2d13156`](https://github.com/apache/spark/commit/2d13156c32002f65acc1bfccf09e08ef0063aa73).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-20 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-149753146
  
@JoshRosen ok, I will take a look at it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-20 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/8840#discussion_r42577464
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
 ---
@@ -207,6 +222,12 @@ private[sql] object PartitioningUtils {
 }
   }
 
+  private[sql] def throwErrorForInvalidPartition(
--- End diff --

This function seems used only once, can we inline it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-20 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/8840#discussion_r42577386
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
 ---
@@ -128,30 +136,32 @@ private[sql] object PartitioningUtils {
   private[sql] def parsePartition(
   path: Path,
   defaultPartitionName: String,
-  typeInference: Boolean): Option[PartitionValues] = {
+  typeInference: Boolean): (Option[PartitionValues], Option[Path]) = {
--- End diff --

I am wondering if it's possible to make the return type as 
`Option[(PartitionValues, Path)]`, and can we simply ignore the path whose 
Column / Value is empty?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-20 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/8840#discussion_r42577425
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
 ---
@@ -128,30 +136,32 @@ private[sql] object PartitioningUtils {
   private[sql] def parsePartition(
   path: Path,
   defaultPartitionName: String,
-  typeInference: Boolean): Option[PartitionValues] = {
+  typeInference: Boolean): (Option[PartitionValues], Option[Path]) = {
--- End diff --

BTW, you need to update this function description also for its return type.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-20 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/8840#discussion_r42576934
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
 ---
@@ -193,6 +204,10 @@ private[sql] object PartitioningUtils {
 distinctPartColNames.size == 1,
 listConflictingPartitionColumns(pathsWithPartitionValues))
 
+  assert(
--- End diff --

Move this `assert` out of the `resolvePartitions`, since it's not so tight 
with the original purpose of this function, and we can avoid the code change


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-10-20 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/8840#discussion_r42577632
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
 ---
@@ -128,30 +136,32 @@ private[sql] object PartitioningUtils {
   private[sql] def parsePartition(
   path: Path,
   defaultPartitionName: String,
-  typeInference: Boolean): Option[PartitionValues] = {
+  typeInference: Boolean): (Option[PartitionValues], Option[Path]) = {
--- End diff --

Or change the `case class PartitionValues(columnNames: Seq[String], 
literals: Seq[Literal])` as `case class PartitionValues(path: String, 
columnNames: Seq[String], literals: Seq[Literal])`?

Then the code probably much simple and readable.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-09-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-141700875
  
  [Test build #42708 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42708/console)
 for   PR 8840 at commit 
[`2d13156`](https://github.com/apache/spark/commit/2d13156c32002f65acc1bfccf09e08ef0063aa73).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-09-19 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-141709318
  
retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-141700888
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42708/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-141700887
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-09-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-141709793
  
  [Test build #42718 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42718/consoleFull)
 for   PR 8840 at commit 
[`2d13156`](https://github.com/apache/spark/commit/2d13156c32002f65acc1bfccf09e08ef0063aa73).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-09-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-141720466
  
  [Test build #42718 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42718/console)
 for   PR 8840 at commit 
[`2d13156`](https://github.com/apache/spark/commit/2d13156c32002f65acc1bfccf09e08ef0063aa73).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-141720580
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42718/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-141720579
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-09-19 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/8840

[SPARK-10304][SQL] Partition discovery should throw an exception if the dir 
structure is invalid

JIRA: https://issues.apache.org/jira/browse/SPARK-10304

This patch detects if the structure of partition directories is not valid.

The test cases are from #8547. Thanks @zhzhan.

cc @liancheng

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 detect_invalid_part_dir

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8840.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8840


commit 2d13156c32002f65acc1bfccf09e08ef0063aa73
Author: Liang-Chi Hsieh 
Date:   2015-09-19T13:01:01Z

Detect invalid partition directory.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-141691809
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-141691798
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-09-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-141692703
  
  [Test build #42708 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42708/consoleFull)
 for   PR 8840 at commit 
[`2d13156`](https://github.com/apache/spark/commit/2d13156c32002f65acc1bfccf09e08ef0063aa73).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-141709394
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10304][SQL] Partition discovery should ...

2015-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8840#issuecomment-141709421
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org