[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user adamjk commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-164703283 Is this being backported to 1.5.x? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-150337486 LGTM, so I'm going to merge this into master. Should this be backported to 1.5.x or any earlier releases?
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/8026
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-149919454 Merged build started.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-149919423 Merged build triggered.
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-149911447 Thank you so much @JoshRosen for the detailed review, but it seems a bug still exists; I'd like to fix it myself soon.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-149922732 **[Test build #44066 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44066/consoleFull)** for PR 8026 at commit [`bdee89e`](https://github.com/apache/spark/commit/bdee89ea394b0477103b88c971df971812e98f82).
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-149959477 Merged build finished. Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-149959310 **[Test build #44066 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44066/consoleFull)** for PR 8026 at commit [`bdee89e`](https://github.com/apache/spark/commit/bdee89ea394b0477103b88c971df971812e98f82). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-149959480 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44066/ Test FAILed.
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-150021131 @chenghao-intel, it looks like this most recent test failure is legitimate:
```
assertion failed: Actual partitioning column names did not match user-specified partitioning schema; expect StructType(StructField(part,IntegerType,true)), but got StructType()}
```
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-150068746 Yes, that's true. SPARK-7749 provides an example: an empty partitioned table backed by the Hive metastore, for which we will not detect any partition column values. I simply removed the assertion from the code, as it's not valid in this case.
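The failure quoted above comes from comparing the user-specified partition schema with whatever partition directories discovery actually found. A minimal sketch of why an empty partitioned table trips that comparison, assuming each schema is reduced to a hypothetical ordered list of column names (not Spark's `StructType`); the PR itself simply dropped the assertion, and the relaxed predicate here is only an illustration of the case that broke it.

```scala
// Hypothetical model: partition schemas are reduced to their column names, in path order.
object PartitionSchemaCheck {
  /** Strict form, like the removed assertion: discovered names must equal the user's. */
  def strictMatch(userSpecified: Seq[String], discovered: Seq[String]): Boolean =
    discovered == userSpecified

  /** Relaxed form: an empty discovery result (no partition directories yet, as with an
    * empty metastore-backed partitioned table) is accepted instead of failing. */
  def relaxedMatch(userSpecified: Seq[String], discovered: Seq[String]): Boolean =
    discovered.isEmpty || discovered == userSpecified
}
```

With a user schema of `part` and nothing discovered, `strictMatch` returns false (the reported failure), while `relaxedMatch` returns true and still rejects a genuinely mismatched name.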
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-150068875 Merged build started.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-150068962 **[Test build #44113 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44113/consoleFull)** for PR 8026 at commit [`3383473`](https://github.com/apache/spark/commit/3383473e8a56eba9f7c92106dca9e171f88e0534).
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-150068858 Merged build triggered.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-150091343 **[Test build #44113 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44113/consoleFull)** for PR 8026 at commit [`3383473`](https://github.com/apache/spark/commit/3383473e8a56eba9f7c92106dca9e171f88e0534). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/8026#discussion_r42533868
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
@@ -134,17 +137,38 @@ private[sql] object PartitioningUtils {
     var finished = path.getParent == null
     var chopped = path
+    var idx = 0 // the partition index from the right side of the path
     while (!finished) {
+      val folderName = chopped.getName
       // Sometimes (e.g., when speculative task is enabled), temporary directories may be left
       // uncleaned. Here we simply ignore them.
-      if (chopped.getName.toLowerCase == "_temporary") {
+      if (folderName.toLowerCase == "_temporary") {
         return None
       }
-      val maybeColumn = parsePartitionColumn(chopped.getName, defaultPartitionName, typeInference)
-      maybeColumn.foreach(columns += _)
+      folderName.split("=") match {
+        case Array(columnName, rawColumnValue) =>
+          val field = userDefinedPartitionColumns.map(struct => struct(struct.length - idx - 1))
+          assert(columnName.nonEmpty, s"Empty partition column name in '$folderName'")
+          assert(field.isEmpty || (field.get.name == columnName))
+          assert(rawColumnValue.nonEmpty, s"Empty partition column value in '$folderName'")
+
+          val literal = inferPartitionColumnValue(
+            field.map(_.dataType), rawColumnValue, defaultPartitionName, typeInference)
+          columns += (columnName -> literal)
+
+        case Array(value) if folderName.startsWith("=") =>
+          throw new AssertionError(s"Empty partition column name in '$folderName'")
--- End diff --
Is AssertionError the right exception to be throwing here? I'd think that IllegalArgumentException might be more appropriate.
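The diff pairs each directory with a user-specified column by counting from the right of the path: `idx` 0 is the leaf directory, matched to `struct(struct.length - idx - 1)`. A minimal sketch of that lookup in plain Scala, with hypothetical names (`PartitionFieldLookup`, `expectedColumn`, `checkColumnName`) and plain column names standing in for `StructField`s. It uses `require`, which throws IllegalArgumentException, the alternative suggested in the comment above, instead of AssertionError.

```scala
// Hypothetical helper: user-specified partition columns are given as names, left to right.
object PartitionFieldLookup {
  /** Column expected for the directory at position `idx`, counted from the right of the
    * path (0 = leaf directory), using the same indexing as the diff. */
  def expectedColumn(userColumns: IndexedSeq[String], idx: Int): String = {
    require(idx >= 0 && idx < userColumns.length,
      s"Directory index $idx is outside the ${userColumns.length} user-specified partition columns")
    userColumns(userColumns.length - idx - 1)
  }

  /** Fails with IllegalArgumentException (via `require`) when the parsed column name
    * does not match the user-specified column at that position. */
  def checkColumnName(userColumns: IndexedSeq[String], idx: Int, columnName: String): Unit = {
    val expected = expectedColumn(userColumns, idx)
    require(expected == columnName,
      s"Partition column '$columnName' does not match user-specified column '$expected'")
  }
}
```

For a path such as `/table/year=2015/month=10` with user columns `year, month`, the leaf `month=10` has `idx` 0 and maps to `month`; its parent maps to `year`.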
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/8026#discussion_r42509765
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala ---
@@ -101,11 +118,13 @@ class ParquetPartitionDiscoverySuite extends QueryTest with ParquetTest with Sha
     checkThrows[AssertionError]("file://path/=10", "Empty partition column name")
     checkThrows[AssertionError]("file://path/a=", "Empty partition column value")
+    checkThrows[AssertionError]("file://path/a=b=c", "Not a partition format in")
--- End diff --
You're right, it's not related to this PR, but it's a very trivial check that gives a more informative message for an invalid partition in the path.
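The test cases in this diff cover three malformed directory names: an empty column name (`=10`), an empty value (`a=`), and more than one `=` (`a=b=c`). A hedged sketch of a single-segment parser that rejects all three with the messages the tests look for; the names `PartitionSegment` and `parse` are hypothetical, it throws IllegalArgumentException via `require` rather than the AssertionError the tests expect, and treating a name with no `=` as a non-partition directory is an assumption.

```scala
object PartitionSegment {
  /** Parses one directory name of the form "column=value".
    * Returns None when the name has no '=' at all (assumed: not a partition directory).
    * Throws IllegalArgumentException (via `require`) for "=10", "a=" and "a=b=c". */
  def parse(folderName: String): Option[(String, String)] =
    if (!folderName.contains("=")) None
    else {
      // A negative limit keeps trailing empty strings, so "a=" yields Array("a", "").
      val parts = folderName.split("=", -1)
      require(parts.length == 2, s"Not a partition format in '$folderName'")
      val (name, value) = (parts(0), parts(1))
      require(name.nonEmpty, s"Empty partition column name in '$folderName'")
      require(value.nonEmpty, s"Empty partition column value in '$folderName'")
      Some((name, value))
    }
}
```

Splitting with limit `-1` matters here: with the default limit, Java's `String.split` drops trailing empty strings, so `a=` would come back as a single element and be misreported.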
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-149598558 @JoshRosen I've also updated the unit test by adding an `Append` operation; without this PR, it throws the exception I described in the JIRA (https://issues.apache.org/jira/browse/SPARK-9735). The root reason the previous unit test could still pass should be solved by #8035: the test always got the latest schema from the user-specified one without calling `relation.refresh()`, whereas `relation.refresh()` is called indirectly in `Append` mode.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-149628393 Merged build finished. Test PASSed.
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/8026#discussion_r42509891
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
@@ -236,15 +241,22 @@ private[sql] object PartitioningUtils {
   }
   /**
-   * Converts a string to a [[Literal]] with automatic type inference. Currently only supports
-   * [[IntegerType]], [[LongType]], [[DoubleType]], [[DecimalType.SYSTEM_DEFAULT]], and
-   * [[StringType]].
+   * Converts a string to a [[Literal]] with automatic type inference if no field type specified.
+   * Auto inference only supports [[IntegerType]], [[LongType]], [[DoubleType]],
+   * [[DecimalType.SYSTEM_DEFAULT]], and [[StringType]].
    */
   private[sql] def inferPartitionColumnValue(
+      expectedDT: Option[DataType],
--- End diff --
I agree that casting to a non-string type and then converting back to a string may lose precision, but what about disabling inference when calling inferPartitionColumnValue if the user has provided a schema? In that case, it should end up just returning the string literals, which you can then cast without a loss of precision.
Sent from my phone
> On Oct 20, 2015, at 8:00 AM, Cheng Hao wrote:
>
> In sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala:
>
> > */
> > private[sql] def inferPartitionColumnValue(
> > + expectedDT: Option[DataType],
> We need to pass the expect the data type down and then get the associated literal-based partition column value; and @liancheng's suggestion kind of like get the literal (maybe string based) first, and then do casting outside, however, this probably lose some data precision during the re-casting.
>
> For example:
> The path looks like, /part1=1.000, and with the auto inference, we will get a Double, and it will be cast to string as 1.0 if what user expect is StringType;
>
> However, this is totally different if we get it as StringType directly, which supposed to be 1.000.
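The suggestion above can be sketched as: when a type is known, keep the raw directory text and convert it once, directly to that type; infer only when no type was given. The type objects below (`PString`, `PInt`, `PLong`, `PDouble`) and the names `PartitionValueParser` and `parse` are hypothetical stand-ins, and the inference order is simplified; this is not Spark's actual `inferPartitionColumnValue`.

```scala
import scala.util.Try

object PartitionValueParser {
  // Hypothetical stand-ins for Spark SQL data types; these are not Spark classes.
  sealed trait PartType
  case object PString extends PartType
  case object PInt extends PartType
  case object PLong extends PartType
  case object PDouble extends PartType

  /** With a user-supplied type, skip inference and convert the raw directory string
    * straight to that type, so no intermediate inferred type can lose information.
    * Without one, use a simplified order: Int, then Long, then Double, else String. */
  def parse(raw: String, expected: Option[PartType]): Any = expected match {
    case Some(PString) => raw
    case Some(PInt)    => raw.toInt
    case Some(PLong)   => raw.toLong
    case Some(PDouble) => raw.toDouble
    case None =>
      Try[Any](raw.toInt)
        .orElse(Try[Any](raw.toLong))
        .orElse(Try[Any](raw.toDouble))
        .getOrElse(raw)
  }
}
```

For the directory `part1=1.000`, `parse("1.000", Some(PString))` keeps `"1.000"`, whereas `parse("1.000", None)` infers a Double.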
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-149628397 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43984/ Test PASSed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-149627957 **[Test build #43984 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43984/consoleFull)** for PR 8026 at commit [`7f2da8c`](https://github.com/apache/spark/commit/7f2da8c4868ed0c3fcdc9ab7748b421a5ebc6f89). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-149591364 Merged build triggered.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-149591384 Merged build started.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-149591577 **[Test build #43984 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43984/consoleFull)** for PR 8026 at commit [`7f2da8c`](https://github.com/apache/spark/commit/7f2da8c4868ed0c3fcdc9ab7748b421a5ebc6f89).
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/8026#discussion_r42506907
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
@@ -236,15 +241,22 @@ private[sql] object PartitioningUtils {
   }
   /**
-   * Converts a string to a [[Literal]] with automatic type inference. Currently only supports
-   * [[IntegerType]], [[LongType]], [[DoubleType]], [[DecimalType.SYSTEM_DEFAULT]], and
-   * [[StringType]].
+   * Converts a string to a [[Literal]] with automatic type inference if no field type specified.
+   * Auto inference only supports [[IntegerType]], [[LongType]], [[DoubleType]],
+   * [[DecimalType.SYSTEM_DEFAULT]], and [[StringType]].
    */
   private[sql] def inferPartitionColumnValue(
+      expectedDT: Option[DataType],
--- End diff --
We need to pass the expected data type down and then get the corresponding literal-based partition column value. @liancheng's suggestion is roughly to get the literal first (perhaps string-based) and then do the casting outside; however, this can lose data precision during the re-casting. For example, given the path /part1=1.000, auto inference yields a Double, which is cast to the string `1.0` if the user expects StringType. That is quite different from getting the value as StringType directly, which should give `1.000`.
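The precision concern can be reproduced with plain Scala conversions standing in for Spark's inference and cast. This is an assumption for illustration only: Spark's own double-to-string cast is not called here, and `PrecisionLossDemo` is a hypothetical name.

```scala
object PrecisionLossDemo {
  // Raw value taken from the directory name in the example path "/part1=1.000".
  val raw: String = "1.000"

  // Infer first (as a Double), then convert to the user's StringType: trailing zeros are lost.
  val inferThenCast: String = raw.toDouble.toString

  // Use the expected StringType directly: the directory text is preserved exactly.
  val direct: String = raw
}
```

Here `inferThenCast` is `"1.0"` while `direct` stays `"1.000"`, matching the example in the comment.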
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/8026#discussion_r42575628
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
@@ -134,17 +137,38 @@ private[sql] object PartitioningUtils {
     var finished = path.getParent == null
     var chopped = path
+    var idx = 0 // the partition index from the right side of the path
     while (!finished) {
+      val folderName = chopped.getName
       // Sometimes (e.g., when speculative task is enabled), temporary directories may be left
       // uncleaned. Here we simply ignore them.
-      if (chopped.getName.toLowerCase == "_temporary") {
+      if (folderName.toLowerCase == "_temporary") {
         return None
       }
-      val maybeColumn = parsePartitionColumn(chopped.getName, defaultPartitionName, typeInference)
-      maybeColumn.foreach(columns += _)
+      folderName.split("=") match {
+        case Array(columnName, rawColumnValue) =>
+          val field = userDefinedPartitionColumns.map(struct => struct(struct.length - idx - 1))
+          assert(columnName.nonEmpty, s"Empty partition column name in '$folderName'")
+          assert(field.isEmpty || (field.get.name == columnName))
+          assert(rawColumnValue.nonEmpty, s"Empty partition column value in '$folderName'")
+
+          val literal = inferPartitionColumnValue(
+            field.map(_.dataType), rawColumnValue, defaultPartitionName, typeInference)
+          columns += (columnName -> literal)
+
+        case Array(value) if folderName.startsWith("=") =>
+          throw new AssertionError(s"Empty partition column name in '$folderName'")
--- End diff --
I agree we should move the partition path validation into a separate PR, since we can definitely do more checking there and produce prettier error messages.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-149776551 Merged build triggered.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-149777383 **[Test build #44038 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44038/consoleFull)** for PR 8026 at commit [`2cc93da`](https://github.com/apache/spark/commit/2cc93dac02b5a91f6c3dee0a0fdfb6a019c00921).
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/8026#discussion_r42575993
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
@@ -236,15 +241,22 @@ private[sql] object PartitioningUtils {
   }
   /**
-   * Converts a string to a [[Literal]] with automatic type inference. Currently only supports
-   * [[IntegerType]], [[LongType]], [[DoubleType]], [[DecimalType.SYSTEM_DEFAULT]], and
-   * [[StringType]].
+   * Converts a string to a [[Literal]] with automatic type inference if no field type specified.
+   * Auto inference only supports [[IntegerType]], [[LongType]], [[DoubleType]],
+   * [[DecimalType.SYSTEM_DEFAULT]], and [[StringType]].
    */
   private[sql] def inferPartitionColumnValue(
+      expectedDT: Option[DataType],
--- End diff --
Sounds good to me, I will update the code.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-149776594 Merged build started.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-149779455 Merged build triggered.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-149779487 Merged build started.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-149781507 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44038/ Test FAILed.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-149781506 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-149781486 **[Test build #44038 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44038/consoleFull)** for PR 8026 at commit [`2cc93da`](https://github.com/apache/spark/commit/2cc93dac02b5a91f6c3dee0a0fdfb6a019c00921). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-149781986 @JoshRosen I've updated the code; it should be more straightforward and cleaner now.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-149783351 **[Test build #44040 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44040/consoleFull)** for PR 8026 at commit [`9f08f76`](https://github.com/apache/spark/commit/9f08f761ff404464b4bdfc83352e4bcad139e36c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-149783376 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-149783377 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44040/ Test FAILed.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/8026#discussion_r42583944 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala --- @@ -544,11 +544,35 @@ abstract class HadoopFsRelation private[sql](maybePartitionSpec: Option[Partitio } private def discoverPartitions(): PartitionSpec = { -val typeInference = sqlContext.conf.partitionColumnTypeInferenceEnabled() // We use leaf dirs containing data files to discover the schema. val leafDirs = fileStatusCache.leafDirToChildrenFiles.keys.toSeq -PartitioningUtils.parsePartitions(leafDirs, PartitioningUtils.DEFAULT_PARTITION_NAME, - typeInference) +userDefinedPartitionColumns match { + case Some(schema) => +val spec = PartitioningUtils.parsePartitions( + leafDirs, PartitioningUtils.DEFAULT_PARTITION_NAME, false) + +// Without auto inference, all of value in the `row` should be null or in StringType, +// we need to cast into the data type that user specified. +def castPartitionValueWithGivenSchema(row: InternalRow, schema: StructType) +: InternalRow = { + InternalRow((0 until row.numFields) map { i => +Cast(Literal.create(row.getString(i), StringType), schema.fields(i).dataType).eval() + }: _*) +} + +assert(schema.length == spec.partitionColumns.length && + schema.fieldNames.sameElements(spec.partitionColumns.fieldNames), + s"Auto infer partition column is not match with user specified, " + --- End diff -- The wording of this error message might be slightly confusing to users since this branch is explicitly _disabling_ inference. I think that it might be slightly clearer to say something like "Actual partitioning column names did not match user-specified partitioning schema; expected ... but got ...", since as far as I know the inference is really only done for the types of the columns, not their names.
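A minimal sketch of the name check with the suggested wording, assuming the partition column names are available as plain `Seq[String]`s; `PartitionSchemaCheck` and `checkPartitionColumnNames` are hypothetical names, not Spark code. Comparing two `Seq`s with `==` checks both length and element order, which covers the `length` plus `sameElements` test in the diff. The sketch uses `require` because Scala's `assert` is marked elidable and can be compiled away, while `require` is always checked. (The quoted diff's message also ends in `}}`, which would print a stray closing brace.)

```scala
object PartitionSchemaCheck {
  // Hypothetical helper: compares the partition column names discovered from
  // directory paths with the names in the user-specified schema.
  // Throws IllegalArgumentException (via require) on a mismatch.
  def checkPartitionColumnNames(expected: Seq[String], actual: Seq[String]): Unit =
    require(
      expected == actual, // same length and same names in the same order
      "Actual partitioning column names did not match user-specified partitioning schema; " +
        s"expected ${expected.mkString("[", ", ", "]")} but got ${actual.mkString("[", ", ", "]")}")
}
```

For example, `checkPartitionColumnNames(Seq("ps", "p2"), Seq("p2", "ps"))` fails because the order differs, and the message names both lists.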
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/8026#discussion_r42584058 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala --- @@ -544,11 +544,35 @@ abstract class HadoopFsRelation private[sql](maybePartitionSpec: Option[Partitio } private def discoverPartitions(): PartitionSpec = { -val typeInference = sqlContext.conf.partitionColumnTypeInferenceEnabled() // We use leaf dirs containing data files to discover the schema. val leafDirs = fileStatusCache.leafDirToChildrenFiles.keys.toSeq -PartitioningUtils.parsePartitions(leafDirs, PartitioningUtils.DEFAULT_PARTITION_NAME, - typeInference) +userDefinedPartitionColumns match { + case Some(schema) => +val spec = PartitioningUtils.parsePartitions( + leafDirs, PartitioningUtils.DEFAULT_PARTITION_NAME, false) + +// Without auto inference, all of value in the `row` should be null or in StringType, +// we need to cast into the data type that user specified. +def castPartitionValueWithGivenSchema(row: InternalRow, schema: StructType) +: InternalRow = { + InternalRow((0 until row.numFields) map { i => +Cast(Literal.create(row.getString(i), StringType), schema.fields(i).dataType).eval() + }: _*) +} + +assert(schema.length == spec.partitionColumns.length && + schema.fieldNames.sameElements(spec.partitionColumns.fieldNames), + s"Auto infer partition column is not match with user specified, " + +s"expect $schema, but got ${spec.partitionColumns}}") + +PartitionSpec(schema, spec.partitions.map { part => + part.copy(values = castPartitionValueWithGivenSchema(part.values, schema)) +}) + case None => +val typeInference = sqlContext.conf.partitionColumnTypeInferenceEnabled() --- End diff -- You could just inline this call on line 574 and save one variable declaration.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/8026#discussion_r42584024 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala --- @@ -544,11 +544,35 @@ abstract class HadoopFsRelation private[sql](maybePartitionSpec: Option[Partitio } private def discoverPartitions(): PartitionSpec = { -val typeInference = sqlContext.conf.partitionColumnTypeInferenceEnabled() // We use leaf dirs containing data files to discover the schema. val leafDirs = fileStatusCache.leafDirToChildrenFiles.keys.toSeq -PartitioningUtils.parsePartitions(leafDirs, PartitioningUtils.DEFAULT_PARTITION_NAME, - typeInference) +userDefinedPartitionColumns match { + case Some(schema) => +val spec = PartitioningUtils.parsePartitions( + leafDirs, PartitioningUtils.DEFAULT_PARTITION_NAME, false) + +// Without auto inference, all of value in the `row` should be null or in StringType, +// we need to cast into the data type that user specified. +def castPartitionValueWithGivenSchema(row: InternalRow, schema: StructType) --- End diff -- Also, maybe rename this something like `castPartitionValuesToUserSchema`?
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/8026#discussion_r42584044 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala --- @@ -544,11 +544,35 @@ abstract class HadoopFsRelation private[sql](maybePartitionSpec: Option[Partitio } private def discoverPartitions(): PartitionSpec = { -val typeInference = sqlContext.conf.partitionColumnTypeInferenceEnabled() // We use leaf dirs containing data files to discover the schema. val leafDirs = fileStatusCache.leafDirToChildrenFiles.keys.toSeq -PartitioningUtils.parsePartitions(leafDirs, PartitioningUtils.DEFAULT_PARTITION_NAME, - typeInference) +userDefinedPartitionColumns match { + case Some(schema) => +val spec = PartitioningUtils.parsePartitions( + leafDirs, PartitioningUtils.DEFAULT_PARTITION_NAME, false) + +// Without auto inference, all of value in the `row` should be null or in StringType, +// we need to cast into the data type that user specified. +def castPartitionValueWithGivenSchema(row: InternalRow, schema: StructType) +: InternalRow = { + InternalRow((0 until row.numFields) map { i => +Cast(Literal.create(row.getString(i), StringType), schema.fields(i).dataType).eval() + }: _*) +} + +assert(schema.length == spec.partitionColumns.length && + schema.fieldNames.sameElements(spec.partitionColumns.fieldNames), + s"Auto infer partition column is not match with user specified, " + +s"expect $schema, but got ${spec.partitionColumns}}") + +PartitionSpec(schema, spec.partitions.map { part => + part.copy(values = castPartitionValueWithGivenSchema(part.values, schema)) +}) + case None => --- End diff -- To be super-explicit, maybe put a ` // user did not provide a partitioning schema` comment at the end of this line?
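Combining the inlining suggestion with the explicit `case None` comment, the `match` could look like the following. This is a sketch with hypothetical stand-ins so it runs without Spark: `parse` plays the role of `PartitioningUtils.parsePartitions`, `inferenceEnabled` plays the role of `sqlContext.conf.partitionColumnTypeInferenceEnabled()`, and `ParsedSpec` only records which inference flag was passed.

```scala
object DiscoverPartitionsSketch {
  // Hypothetical stand-in for a parsed partition spec; it only records
  // whether type inference was requested when parsing.
  final case class ParsedSpec(typeInference: Boolean)

  def discover(
      userProvidedSchema: Option[Seq[String]],
      inferenceEnabled: () => Boolean,
      parse: Boolean => ParsedSpec): ParsedSpec =
    userProvidedSchema match {
      case Some(_) =>
        // Parse without inference; values are cast to the user's types afterwards.
        parse(false)
      case None => // user did not provide a partitioning schema
        parse(inferenceEnabled()) // config lookup inlined instead of a separate val
    }
}
```

The config is only read in the branch that needs it, so the `Some` branch never consults it.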
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-149781430 **[Test build #44040 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44040/consoleFull)** for PR 8026 at commit [`9f08f76`](https://github.com/apache/spark/commit/9f08f761ff404464b4bdfc83352e4bcad139e36c).
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/8026#discussion_r42583431 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala --- @@ -544,11 +544,35 @@ abstract class HadoopFsRelation private[sql](maybePartitionSpec: Option[Partitio } private def discoverPartitions(): PartitionSpec = { -val typeInference = sqlContext.conf.partitionColumnTypeInferenceEnabled() // We use leaf dirs containing data files to discover the schema. val leafDirs = fileStatusCache.leafDirToChildrenFiles.keys.toSeq -PartitioningUtils.parsePartitions(leafDirs, PartitioningUtils.DEFAULT_PARTITION_NAME, - typeInference) +userDefinedPartitionColumns match { + case Some(schema) => +val spec = PartitioningUtils.parsePartitions( + leafDirs, PartitioningUtils.DEFAULT_PARTITION_NAME, false) + +// Without auto inference, all of value in the `row` should be null or in StringType, +// we need to cast into the data type that user specified. +def castPartitionValueWithGivenSchema(row: InternalRow, schema: StructType) +: InternalRow = { --- End diff -- In order to avoid the weird wrapping here, I think you might be able to just leave off the `: InternalRow` here, unless you somehow need it to appease MiMa.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/8026#discussion_r42583415 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala --- @@ -544,11 +544,35 @@ abstract class HadoopFsRelation private[sql](maybePartitionSpec: Option[Partitio } private def discoverPartitions(): PartitionSpec = { -val typeInference = sqlContext.conf.partitionColumnTypeInferenceEnabled() // We use leaf dirs containing data files to discover the schema. val leafDirs = fileStatusCache.leafDirToChildrenFiles.keys.toSeq -PartitioningUtils.parsePartitions(leafDirs, PartitioningUtils.DEFAULT_PARTITION_NAME, - typeInference) +userDefinedPartitionColumns match { + case Some(schema) => +val spec = PartitioningUtils.parsePartitions( + leafDirs, PartitioningUtils.DEFAULT_PARTITION_NAME, false) --- End diff -- Super-minor nit: could you explicitly name the boolean parameter here at the call-site, e.g. `inferSchema = false`? This is one of IntelliJ's automatic style recommendations and I'm a fan of it because it makes the code a bit easier to read. I might also just change this myself on merge.
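A small illustration of the suggested call-site style, using a hypothetical `parsePartitions` with the same shape as the call under review (two positional arguments followed by a `Boolean`). The name written at the call site has to match the declared parameter name, so it should be copied from the real signature; `typeInference` here is an assumption made for the sketch.

```scala
object NamedArgSketch {
  // Hypothetical function with the same shape as the call under review.
  def parsePartitions(paths: Seq[String], defaultName: String, typeInference: Boolean): String =
    s"${paths.size} path(s), default=$defaultName, inference=$typeInference"

  // Positional Boolean: the reader must look up what `false` means.
  val positional: String = parsePartitions(Seq("/t/p=1"), "default", false)

  // Named Boolean: the call site documents itself.
  val named: String = parsePartitions(Seq("/t/p=1"), "default", typeInference = false)
}
```

Both calls pass exactly the same arguments; the named form only changes how the call reads.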
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/8026#discussion_r42583857 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala --- @@ -544,11 +544,35 @@ abstract class HadoopFsRelation private[sql](maybePartitionSpec: Option[Partitio } private def discoverPartitions(): PartitionSpec = { -val typeInference = sqlContext.conf.partitionColumnTypeInferenceEnabled() // We use leaf dirs containing data files to discover the schema. val leafDirs = fileStatusCache.leafDirToChildrenFiles.keys.toSeq -PartitioningUtils.parsePartitions(leafDirs, PartitioningUtils.DEFAULT_PARTITION_NAME, - typeInference) +userDefinedPartitionColumns match { + case Some(schema) => +val spec = PartitioningUtils.parsePartitions( + leafDirs, PartitioningUtils.DEFAULT_PARTITION_NAME, false) + +// Without auto inference, all of value in the `row` should be null or in StringType, +// we need to cast into the data type that user specified. +def castPartitionValueWithGivenSchema(row: InternalRow, schema: StructType) +: InternalRow = { + InternalRow((0 until row.numFields) map { i => --- End diff -- Nit: `.map` instead of using infix notation.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/8026#discussion_r42583983 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala --- @@ -544,11 +544,35 @@ abstract class HadoopFsRelation private[sql](maybePartitionSpec: Option[Partitio } private def discoverPartitions(): PartitionSpec = { -val typeInference = sqlContext.conf.partitionColumnTypeInferenceEnabled() // We use leaf dirs containing data files to discover the schema. val leafDirs = fileStatusCache.leafDirToChildrenFiles.keys.toSeq -PartitioningUtils.parsePartitions(leafDirs, PartitioningUtils.DEFAULT_PARTITION_NAME, - typeInference) +userDefinedPartitionColumns match { + case Some(schema) => +val spec = PartitioningUtils.parsePartitions( + leafDirs, PartitioningUtils.DEFAULT_PARTITION_NAME, false) + +// Without auto inference, all of value in the `row` should be null or in StringType, +// we need to cast into the data type that user specified. +def castPartitionValueWithGivenSchema(row: InternalRow, schema: StructType) --- End diff -- Actually, do you need the `schema` field here, since it's always going to be the same? Maybe you could drop the `schema` parameter and retrieve it via the closure, which would simplify this line.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/8026#discussion_r42583969 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala --- @@ -544,11 +544,35 @@ abstract class HadoopFsRelation private[sql](maybePartitionSpec: Option[Partitio } private def discoverPartitions(): PartitionSpec = { -val typeInference = sqlContext.conf.partitionColumnTypeInferenceEnabled() // We use leaf dirs containing data files to discover the schema. val leafDirs = fileStatusCache.leafDirToChildrenFiles.keys.toSeq -PartitioningUtils.parsePartitions(leafDirs, PartitioningUtils.DEFAULT_PARTITION_NAME, - typeInference) +userDefinedPartitionColumns match { + case Some(schema) => --- End diff -- Maybe rename `schema` here to `userProvidedSchema` to be more explicit and avoid shadowing the `schema` variable defined in `castPartitionValueWithGivenSchema`?
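With the rename and closure suggestions applied, along with the `.map` and return-type nits above, the helper might look like the following. It is a sketch under stated assumptions, not Spark code:

* Each row is modeled as a `Seq[Option[String]]`, with `None` standing for the null/default partition.
* The user schema is modeled as a `Seq` of hypothetical `ColType` values, in place of Spark's `InternalRow`, `Cast`, and `StructType`.
* The column-name check has already passed, so each row and the schema have the same length.

```scala
object CastSketch {
  // Hypothetical column types standing in for Spark's DataType.
  sealed trait ColType
  case object IntCol extends ColType
  case object StringCol extends ColType

  def castAll(rows: Seq[Seq[Option[String]]], userProvidedSchema: Seq[ColType]): Seq[Seq[Any]] = {
    // No `schema` parameter: the user schema is captured from the enclosing
    // scope. No explicit return type, and `.map` instead of infix `map`.
    def castPartitionValuesToUserSchema(row: Seq[Option[String]]) =
      row.indices.map { i =>
        (row(i), userProvidedSchema(i)) match {
          case (None, _)            => null // null/default partition stays null
          case (Some(s), IntCol)    => s.toInt
          case (Some(s), StringCol) => s
        }
      }

    rows.map(castPartitionValuesToUserSchema)
  }
}
```

Because the helper closes over `userProvidedSchema`, there is no inner `schema` name left to shadow the outer one.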
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-149785524 @chenghao-intel, thanks a bunch for updating this; the current version of this patch is a lot easier to understand and I'm happy with how clean the code turned out. I left only minor style / clarity comments, which I don't mind addressing myself on merge if you're too busy. If you don't mind, though, one more round of quick updates to address my comments would be appreciated. Anyhow, the technical changes here LGTM.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/8026#discussion_r42584080 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/hadoopFsRelationSuites.scala --- @@ -510,21 +510,39 @@ abstract class HadoopFsRelationTest extends QueryTest with SQLTestUtils with Tes } } - // HadoopFsRelation.discoverPartitions() called by refresh(), which will ignore - // the given partition data type. - ignore("Partition column type casting") { + test("Partition column type casting") { --- End diff -- Do you mind adding a comment beneath this line which reads `// regression test for SPARK-9735` so that readers can quickly figure out what this is supposed to be testing?
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-149390928 @chenghao-intel, just to clarify: I noticed that your final approach involved pushing an expected data type down into the method named `inferPartitionColumnValue`. I'm curious why you chose this approach as opposed to re-using the code that @liancheng [pointed to](https://github.com/apache/spark/pull/8026#issuecomment-128643963) upthread.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/8026#discussion_r42451143 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala --- @@ -236,15 +241,22 @@ private[sql] object PartitioningUtils { } /** - * Converts a string to a [[Literal]] with automatic type inference. Currently only supports - * [[IntegerType]], [[LongType]], [[DoubleType]], [[DecimalType.SYSTEM_DEFAULT]], and - * [[StringType]]. + * Converts a string to a [[Literal]] with automatic type inference if no field type specified. + * Auto inference only supports [[IntegerType]], [[LongType]], [[DoubleType]], + * [[DecimalType.SYSTEM_DEFAULT]], and [[StringType]]. */ private[sql] def inferPartitionColumnValue( + expectedDT: Option[DataType], --- End diff -- In the master branch, if `typeInference == false`, the data type of the partition key defaults to `StringType`; otherwise, it will probably be `IntegerType`, `LongType`, etc., depending on the actual value of the partition key.
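The semantics described here can be sketched together with the `expectedDT` parameter from the diff. The rules are:

* A user-specified type wins.
* Otherwise, with inference disabled, every value stays a string.
* With inference enabled, the type depends on the value, tried in the order listed in the doc comment.

`PartitionValueSketch`, `PType`, and its cases are hypothetical stand-ins, not Spark's `Literal`/`DataType` API, and the sketch does not model the default-partition name.

```scala
import scala.util.Try

object PartitionValueSketch {
  // Hypothetical stand-ins for a few Spark SQL data types (not Spark classes).
  sealed trait PType
  case object PInt extends PType
  case object PLong extends PType
  case object PDouble extends PType
  case object PDecimal extends PType
  case object PString extends PType

  type Typed = (Any, PType)

  // Returns the converted value together with the type it was given.
  def inferPartitionValue(expectedType: Option[PType], raw: String, typeInference: Boolean): Typed =
    expectedType match {
      // A user-specified type wins; bad input throws NumberFormatException.
      case Some(t) => (convert(raw, t), t)
      // Inference disabled: every partition value is a string.
      case None if !typeInference => (raw, PString)
      // Inference enabled: the type depends on the value itself.
      case None =>
        Try[Typed]((raw.toInt, PInt))
          .orElse(Try[Typed]((raw.toLong, PLong)))
          .orElse(Try[Typed]((raw.toDouble, PDouble)))
          .orElse(Try[Typed]((BigDecimal(raw), PDecimal)))
          .getOrElse((raw, PString))
    }

  private def convert(raw: String, t: PType): Any = t match {
    case PInt     => raw.toInt
    case PLong    => raw.toLong
    case PDouble  => raw.toDouble
    case PDecimal => BigDecimal(raw)
    case PString  => raw
  }
}
```

For example, `"10"` becomes an `Int` with inference on and a `String` with it off. Keeping `"3000000000"` out of `Int` range pushes it to `Long`. In this sketch the decimal branch is effectively unreachable, because `toDouble` accepts every string `BigDecimal` does; it is kept only to mirror the documented order.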
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/8026#discussion_r42451397 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/hadoopFsRelationSuites.scala --- @@ -447,7 +447,7 @@ abstract class HadoopFsRelationTest extends QueryTest with SQLTestUtils { // HadoopFsRelation.discoverPartitions() called by refresh(), which will ignore // the given partition data type. --- End diff -- Yes, true; this is no longer valid.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/8026#discussion_r42446094 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/hadoopFsRelationSuites.scala --- @@ -447,7 +447,7 @@ abstract class HadoopFsRelationTest extends QueryTest with SQLTestUtils { // HadoopFsRelation.discoverPartitions() called by refresh(), which will ignore // the given partition data type. --- End diff -- +1.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-149406766
I just tried testing a build where I _only_ re-enabled the ignored test and changed nothing else. In this case, the test still passed. This makes me wonder whether the "Partition column type casting" test is an adequate regression test for this issue. Can you write a new test for this which fails without this patch?
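Josh's criterion is that a regression test must fail on the unpatched code. The property such a test has to pin down can be modelled without Spark: after a refresh, a partition column the user declared as a string must still come back as a string, not as a value re-inferred from the directory name. The sketch below is a hypothetical model, not a Spark test; `readAsString` stands in for the blind cast behind the `getUTF8String` frame in the stack trace from the PR description, and every name in it is made up.

```scala
import scala.util.Try

object RefreshRegressionSketch {
  // Stand-in for a reader that trusts the declared schema (String) and casts blindly,
  // loosely like the getUTF8String frame in the PR's stack trace.
  def readAsString(value: Any): String = value.asInstanceOf[String]

  // Modelled pre-patch behaviour: refresh() re-infers the type from the directory
  // name, so "p=1" comes back as an Int even though the user declared String.
  def refreshIgnoringSchema(raw: String): Any = Try[Any](raw.toInt).getOrElse(raw)

  // Modelled patched behaviour: the user-specified String type is respected.
  def refreshRespectingSchema(raw: String): Any = raw

  // The regression check: reading partition value "1" as a String must succeed.
  // Returns false (instead of throwing) when the cast fails.
  def checkPartitionValue(refresh: String => Any): Boolean =
    Try(readAsString(refresh("1")) == "1").getOrElse(false)
}
```

A real test would exercise the same property through a partitioned table written with a user-specified schema, followed by `refresh()` and a query, and would be run once against the unpatched code to confirm that it fails there.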
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/8026#discussion_r42451442
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/hadoopFsRelationSuites.scala ---
@@ -458,6 +458,8 @@ abstract class HadoopFsRelationTest extends QueryTest with SQLTestUtils {
         .partitionBy("ps", "p2")
         .saveAsTable("t")
+      val a = input.collect()
--- End diff --
Oh, yeah, will remove it.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/8026#discussion_r42447317
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/hadoopFsRelationSuites.scala ---
@@ -458,6 +458,8 @@ abstract class HadoopFsRelationTest extends QueryTest with SQLTestUtils {
         .partitionBy("ps", "p2")
         .saveAsTable("t")
+      val a = input.collect()
--- End diff --
Why these two lines? Leftovers from debugging?
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/8026#discussion_r42447350
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala ---
@@ -101,11 +118,13 @@ class ParquetPartitionDiscoverySuite extends QueryTest with ParquetTest with Sha
     checkThrows[AssertionError]("file://path/=10", "Empty partition column name")
     checkThrows[AssertionError]("file://path/a=", "Empty partition column value")
+    checkThrows[AssertionError]("file://path/a=b=c", "Not a partition format in")
--- End diff --
Is this fixing a separate issue from the refresh issue? If so, can we do it separately?
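For reference, the behaviour the new `checkThrows` line expects can be sketched as a standalone parser for the last path segment: anything other than exactly one `=` is rejected with an `AssertionError`, as are empty column names and values. This is a hypothetical illustration of the stricter rule, not the code in `PartitioningUtils`; `PartitionSegmentSketch` and its `parse` method are made-up names.

```scala
object PartitionSegmentSketch {
  // Parse the last segment of a partition path, e.g. "file://path/a=10" -> ("a", "10").
  // Scala's assert throws java.lang.AssertionError, matching checkThrows[AssertionError].
  def parse(path: String): (String, String) = {
    val segment = path.substring(path.lastIndexOf('/') + 1)
    // Limit -1 keeps trailing empty strings, so "a=" splits into Array("a", "").
    val parts = segment.split("=", -1)
    assert(parts.length == 2, s"Not a partition format in '$segment'")
    val name = parts(0)
    val value = parts(1)
    assert(name.nonEmpty, s"Empty partition column name in '$segment'")
    assert(value.nonEmpty, s"Empty partition column value in '$segment'")
    (name, value)
  }
}
```

Because the format check runs before the empty-name and empty-value checks, `a=b=c` fails on the format rule rather than being read as column `a` with value `b=c`.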
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/8026#discussion_r42447228
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
@@ -236,15 +241,22 @@ private[sql] object PartitioningUtils {
   }
   /**
-   * Converts a string to a [[Literal]] with automatic type inference. Currently only supports
-   * [[IntegerType]], [[LongType]], [[DoubleType]], [[DecimalType.SYSTEM_DEFAULT]], and
-   * [[StringType]].
+   * Converts a string to a [[Literal]] with automatic type inference if no field type specified.
+   * Auto inference only supports [[IntegerType]], [[LongType]], [[DoubleType]],
+   * [[DecimalType.SYSTEM_DEFAULT]], and [[StringType]].
    */
   private[sql] def inferPartitionColumnValue(
+      expectedDT: Option[DataType],
--- End diff --
Per my other comment upthread, this is a bit confusing to me: this method is named infer, but has a mode where it won't perform inference (controlled by a boolean flag), and now has another new field which _also_ bypasses inference _and_ performs a cast. This is confusing to me.
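One way to address the naming concern, sketched here as a hypothetical alternative rather than anything the PR adopted, is to replace the `(expectedDT, typeInference)` pair with a single sealed type, so the combination in which the Boolean is silently ignored (a type is given, so no inference happens either way) cannot be expressed. Type names are plain strings to keep the sketch standalone; `PartitionValueMode`, `fromLegacy` and `describe` are invented names.

```scala
object PartitionValueModeSketch {
  // Exactly one of three behaviours, instead of an Option plus a Boolean.
  sealed trait PartitionValueMode
  // No user type, inference on: guess a numeric type, else keep a String.
  case object InferType extends PartitionValueMode
  // No user type, inference off: keep the raw directory string.
  case object KeepString extends PartitionValueMode
  // User-specified type: cast the raw string to it (inference never runs).
  final case class CastTo(typeName: String) extends PartitionValueMode

  // Maps the current two-argument form onto the single mode.
  def fromLegacy(expectedType: Option[String], typeInference: Boolean): PartitionValueMode =
    expectedType match {
      case Some(t)               => CastTo(t)
      case None if typeInference => InferType
      case None                  => KeepString
    }

  def describe(mode: PartitionValueMode): String = mode match {
    case InferType  => "infer a type from the raw string"
    case KeepString => "keep the raw string"
    case CastTo(t)  => s"cast the raw string to $t"
  }
}
```

With this shape the method could also be renamed (for example to a hypothetical `parsePartitionValue`), since inference becomes just one of the modes.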
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-149413572
Thank you @JoshRosen, I will pick this PR back up, as I have forgotten some of the details. But the ignored test cases definitely failed without this PR previously; I'm not sure whether someone else fixed that somewhere else.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/8026#discussion_r42445776
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
@@ -236,15 +241,22 @@ private[sql] object PartitioningUtils {
   }
   /**
-   * Converts a string to a [[Literal]] with automatic type inference. Currently only supports
-   * [[IntegerType]], [[LongType]], [[DoubleType]], [[DecimalType.SYSTEM_DEFAULT]], and
-   * [[StringType]].
+   * Converts a string to a [[Literal]] with automatic type inference if no field type specified.
+   * Auto inference only supports [[IntegerType]], [[LongType]], [[DoubleType]],
+   * [[DecimalType.SYSTEM_DEFAULT]], and [[StringType]].
    */
   private[sql] def inferPartitionColumnValue(
+      expectedDT: Option[DataType],
       raw: String,
       defaultPartitionName: String,
-      typeInference: Boolean): Literal = {
-    if (typeInference) {
+      typeInference: Boolean): Literal = expectedDT match {
+    case Some(dt) if raw == defaultPartitionName =>
+      Literal.create(null, dt)
+    case Some(dt) if dt == StringType =>
+      Literal.create(unescapePathName(raw), StringType)
+    case Some(dt) =>
+      Literal.create(Cast(Literal.create(unescapePathName(raw), StringType), dt).eval(null), dt)
--- End diff --
Instead of `eval(null)`, I think this could simply be `eval()`.
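The new `Some(dt)` branches can be modelled outside Spark to see how they interact with the existing inference path. In this minimal sketch, `DataType`, `Literal` and `partitionValue` are simplified stand-ins rather than Spark's classes; decimal inference and `unescapePathName` are left out, and an unparseable string cast to a numeric type becomes `null`, an assumption meant to resemble a non-ANSI `Cast`. The `None` branches are assumed to keep the pre-existing `typeInference` behaviour.

```scala
import scala.util.Try

object PartitionValueSketch {
  // Simplified stand-ins for Spark's DataType and Literal (not Spark's API).
  sealed trait DataType
  case object NullType extends DataType
  case object IntegerType extends DataType
  case object LongType extends DataType
  case object DoubleType extends DataType
  case object StringType extends DataType
  final case class Literal(value: Any, dataType: DataType)

  // Stand-in for the name of the directory that holds null partition values.
  val DefaultPartitionName = "__HIVE_DEFAULT_PARTITION__"

  // String -> dt; in this sketch an unparseable number becomes null.
  private def castFromString(raw: String, dt: DataType): Any = dt match {
    case IntegerType => Try[Any](raw.toInt).getOrElse(null)
    case LongType    => Try[Any](raw.toLong).getOrElse(null)
    case DoubleType  => Try[Any](raw.toDouble).getOrElse(null)
    case StringType  => raw
    case NullType    => null
  }

  // Narrowest of Int, Long, Double, falling back to String (no Decimal here).
  private def infer(raw: String): Literal =
    Try(Literal(raw.toInt, IntegerType))
      .orElse(Try(Literal(raw.toLong, LongType)))
      .orElse(Try(Literal(raw.toDouble, DoubleType)))
      .getOrElse(Literal(raw, StringType))

  def partitionValue(expected: Option[DataType], raw: String, typeInference: Boolean): Literal =
    expected match {
      // A user-specified type always wins; the Boolean is not consulted.
      case Some(dt) if raw == DefaultPartitionName => Literal(null, dt)
      case Some(dt)                                 => Literal(castFromString(raw, dt), dt)
      // No user type: keep the pre-existing behaviour.
      case None if raw == DefaultPartitionName =>
        Literal(null, if (typeInference) NullType else StringType)
      case None if typeInference => infer(raw)
      case None                  => Literal(raw, StringType)
    }
}
```

In this model, `Some(StringType)` with raw `"1"` yields `Literal("1", StringType)`, while `None` with inference on yields an `IntegerType` literal: that difference is the mismatch the PR removes for refreshed partitions.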
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/8026#discussion_r41919664
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/sources/hadoopFsRelationSuites.scala ---
@@ -447,7 +447,7 @@ abstract class HadoopFsRelationTest extends QueryTest with SQLTestUtils {
     // HadoopFsRelation.discoverPartitions() called by refresh(), which will ignore
     // the given partition data type.
--- End diff --
Remove the comment?
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-132455893 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41202/ Test FAILed.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-132455892 Build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-132455872
[Test build #41202 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41202/console) for PR 8026 at commit [`f68d827`](https://github.com/apache/spark/commit/f68d82714a3e8eb2033d9ad04ef136c9132b38e7).
* This patch **fails Spark unit tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-132480407 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-132480410 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41214/ Test PASSed.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-132480287
[Test build #41214 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41214/console) for PR 8026 at commit [`cda059f`](https://github.com/apache/spark/commit/cda059fd5ff8a08864f79171d1ba2e0becf73134).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-132431832 [Test build #41202 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41202/consoleFull) for PR 8026 at commit [`f68d827`](https://github.com/apache/spark/commit/f68d82714a3e8eb2033d9ad04ef136c9132b38e7).
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-13249 Merged build started.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-132445384 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41211/ Test FAILed.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-132445382 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-132445340 [Test build #41211 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41211/consoleFull) for PR 8026 at commit [`cda059f`](https://github.com/apache/spark/commit/cda059fd5ff8a08864f79171d1ba2e0becf73134).
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-132445375
[Test build #41211 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41211/console) for PR 8026 at commit [`cda059f`](https://github.com/apache/spark/commit/cda059fd5ff8a08864f79171d1ba2e0becf73134).
* This patch **fails RAT tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-13240 Merged build triggered.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-132446424 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-132446425 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41209/ Test FAILed.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-132446390 retest this please
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-132430775 Build started.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-132430765 Build triggered.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-132444085 Merged build triggered.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-132444093 Merged build started.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-132447550 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-132447554 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41212/ Test FAILed.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-132447476 Merged build triggered.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-132447507 Merged build started.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-132450511 Merged build started.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-132450499 Merged build triggered.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-132451174 [Test build #41214 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41214/consoleFull) for PR 8026 at commit [`cda059f`](https://github.com/apache/spark/commit/cda059fd5ff8a08864f79171d1ba2e0becf73134).
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-132450163 retest this please
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-128615201 cc @liancheng
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-128643963
A summary of my offline discussion with @chenghao-intel:
The real problem here is that the partition column types of the newly refreshed partition spec don't match those in the user-specified spec. The current fix simply disables refreshing the partition spec, which is not preferable. My suggestion is to factor out the [partition values casting part][1] of the `partitionSpec` method and reuse it in `refresh()` to cast the data types of the partition values, while reusing the `partitionColumns` of the user-specified partition spec as-is.
[1]: https://github.com/apache/spark/blob/ebfd91c542aaead343cb154277fcf9114382fee7/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala#L460-L473
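liancheng's suggestion can be sketched as a small helper that takes a freshly discovered partition spec and re-aligns it with the user-specified partition columns: keep the user's column list, and cast each discovered value to the corresponding user type. The model below is standalone and hypothetical (`Column`, `Partition`, `PartitionSpec` and `castToUserSchema` are simplified stand-ins, with only integer and string column types); it is not the code behind reference [1].

```scala
import scala.util.Try

object RefreshCastSketch {
  // Simplified stand-ins for Spark's partition-spec classes (not Spark's API).
  sealed trait ColumnType
  case object IntColumn extends ColumnType
  case object StringColumn extends ColumnType
  final case class Column(name: String, dataType: ColumnType)
  final case class Partition(values: Vector[Any], path: String)
  final case class PartitionSpec(columns: Vector[Column], partitions: Vector[Partition])

  // Cast one discovered value to the user-specified type; nulls stay null,
  // and in this sketch a value that cannot become an Int also becomes null.
  private def castTo(value: Any, dt: ColumnType): Any = (value, dt) match {
    case (null, _)           => null
    case (v, StringColumn)   => v.toString
    case (i: Int, IntColumn) => i
    case (v, IntColumn)      => Try[Any](v.toString.toInt).getOrElse(null)
  }

  // Keep the user's partition columns and cast every discovered value to them.
  def castToUserSchema(discovered: PartitionSpec, userColumns: Vector[Column]): PartitionSpec = {
    require(
      discovered.columns.map(_.name) == userColumns.map(_.name),
      "discovered partition columns do not match the user-specified ones")
    val casted = discovered.partitions.map { p =>
      p.copy(values = p.values.zip(userColumns).map { case (v, col) => castTo(v, col.dataType) })
    }
    PartitionSpec(userColumns, casted)
  }
}
```

Returning `userColumns` unchanged is the point of the suggestion: the refreshed spec then reports the partition column types the user declared rather than whatever inference produced, so values and schema stay in agreement.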
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
GitHub user chenghao-intel opened a pull request: https://github.com/apache/spark/pull/8026
[SPARK-9735][SQL]Respect the user specified schema than the infer partition schema for HadoopFsRelation
This enables the unit test `hadoopFsRelationSuite.Partition column type casting`, which previously threw an exception like:
···
11.521 ERROR org.apache.spark.executor.Executor: Exception in task 2.0 in stage 2.0 (TID 130)
java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.spark.unsafe.types.UTF8String
    at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:45)
    at org.apache.spark.sql.catalyst.expressions.SpecificMutableRow.getUTF8String(SpecificMutableRow.scala:195)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toScalaImpl(CatalystTypeConverters.scala:297)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toScalaImpl(CatalystTypeConverters.scala:289)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toScala(CatalystTypeConverters.scala:110)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toScala(CatalystTypeConverters.scala:278)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toScala(CatalystTypeConverters.scala:245)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToScalaConverter$2.apply(CatalystTypeConverters.scala:406)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$3$$anonfun$apply$2.apply(SparkPlan.scala:194)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$3$$anonfun$apply$2.apply(SparkPlan.scala:194)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    at scala.collection.AbstractIterator.to(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:905)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:905)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1836)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1836)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
···
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/chenghao-intel/spark partition_discovery
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/8026.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #8026
commit 637e26fec1c00cad457f5ae92200b5f6700f1e36
Author: Cheng Hao hao.ch...@intel.com
Date: 2015-08-07T06:45:21Z
make lower priority of infer partition schema for HadoopFsRelation
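The `ClassCastException` above arises because partition discovery infers a numeric type from a directory name, while the user-specified schema declares the column as a string. The self-contained Scala sketch below reproduces that mismatch in miniature. It is an assumption-laden illustration, not Spark code: `inferValue`, `parseSegment` and `getAsString` are hypothetical helpers, and the Int, then Long, then Double, then String order only loosely mirrors partition type inference.

```scala
import scala.util.Try

// Hypothetical illustration of the failure mode; NOT Spark code.
object PartitionInferenceSketch {
  // Simplified type inference for a directory value such as "10" in "a=10":
  // try Int, then Long, then Double, otherwise keep the raw string.
  def inferValue(raw: String): Any =
    Try[Any](raw.toInt)
      .orElse(Try(raw.toLong))
      .orElse(Try(raw.toDouble))
      .getOrElse(raw)

  // Parse one path segment "key=value" into (key, inferred value).
  def parseSegment(segment: String): (String, Any) = {
    val eq = segment.indexOf('=')
    require(eq > 0, s"not a partition directory: $segment")
    (segment.substring(0, eq), inferValue(segment.substring(eq + 1)))
  }

  // Stand-in for a row getter that trusts the declared column type (String),
  // loosely analogous to getUTF8String in the trace: it casts without checking.
  def getAsString(row: Array[Any], ordinal: Int): String =
    row(ordinal).asInstanceOf[String]
}
```

With a user schema that declares the column as a string, `getAsString(Array[Any](inferValue("10")), 0)` fails with a `ClassCastException` because the stored value is a boxed `Integer`; converting the inferred value to the declared type first (for example `inferValue("10").toString`) lets the same getter succeed, which is the idea behind respecting the user-specified schema.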
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-128619293 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-128616385 Merged build started.
[GitHub] spark pull request: [SPARK-9735][SQL]Respect the user specified sc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8026#issuecomment-128616376 Merged build triggered.