GitHub user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19331#discussion_r140890429
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---
    @@ -470,15 +471,15 @@ object PartitioningUtils {
        * Given a collection of [[Literal]]s, resolves possible type conflicts by up-casting "lower"
        * types.
        */
    -  private def resolveTypeConflicts(literals: Seq[Literal]): Seq[Literal] = {
    +  private def resolveTypeConflicts(literals: Seq[Literal], timeZone: TimeZone): Seq[Literal] = {
         val desiredType = {
           val topType = literals.map(_.dataType).maxBy(upCastingOrder.indexOf(_))
    --- End diff --
    
    thanks for such a quick fix!
    
    but, don't we also need to update [`upCastingOrder`](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala#L430)? It seems this happens to work only because `upCastingOrder.indexOf(TimestampType) == -1`. But I think that doesn't compare the right way against `NullType`, so if you add this to [`ParquetPartitionDiscoverySuite.test("parse partitions")`](https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala#L241), it fails:
    
    ```
        check(Seq(
          s"hdfs://host:9000/path/a=$defaultPartitionName/b=blah",
          s"hdfs://host:9000/path/a=2014-01-01 00%3A00%3A00.0/b=foo"),
          PartitionSpec(
            StructType(Seq(
              StructField("a", TimestampType),
              StructField("b", StringType))),
            Seq(
              Partition(InternalRow(null, "blah"),
                s"hdfs://host:9000/path/a=$defaultPartitionName/b=blah"),
              Partition(InternalRow(Timestamp.valueOf("2014-01-01 00:00:00.0"), "foo"),
                s"hdfs://host:9000/path/a=2014-01-01 00%3A00%3A00.0/b=foo"))))
    ```
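
    To make the comparison problem concrete, here's a minimal standalone sketch -- not the actual Spark code: I'm standing in for the real `DataType`s with plain strings, and assuming `upCastingOrder` is roughly `Seq(NullType, IntegerType, LongType, FloatType, DoubleType, StringType)`:

    ```
    // hypothetical stand-in for upCastingOrder, with strings in place of DataTypes
    val upCastingOrder = Seq("NullType", "IntegerType", "LongType", "FloatType", "DoubleType", "StringType")

    // TimestampType is missing from the order, so indexOf returns -1,
    // which ranks *below* NullType's index of 0 ...
    val literalTypes = Seq("NullType", "TimestampType")
    val topType = literalTypes.maxBy(upCastingOrder.indexOf(_))

    // ... so maxBy picks NullType as the "top" type instead of TimestampType
    println(topType)  // prints NullType
    ```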
    
    (I have to admit, I don't totally understand the ramifications of that failure -- the behavior in the resulting DataFrame seems fine to me, but I figure there is probably some case this would mess up ...)
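
    For reference, the kind of update I'm imagining is something like the following (hypothetical and untested; where the new entries sit relative to `StringType` would need thought, since string is the catch-all "top" type):

    ```
    // hypothetical: teach upCastingOrder about date/timestamp partition values
    private val upCastingOrder: Seq[DataType] =
      Seq(NullType, IntegerType, LongType, FloatType, DoubleType, DateType, TimestampType, StringType)
    ```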

