[GitHub] spark pull request #21004: [SPARK-23896][SQL]Improve PartitioningAwareFileIn...

cloud-fan Wed, 28 Nov 2018 03:18:02 -0800

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21004#discussion_r237040743
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala ---
    @@ -126,35 +126,32 @@ abstract class PartitioningAwareFileIndex(
         val caseInsensitiveOptions = CaseInsensitiveMap(parameters)
         val timeZoneId = caseInsensitiveOptions.get(DateTimeUtils.TIMEZONE_OPTION)
           .getOrElse(sparkSession.sessionState.conf.sessionLocalTimeZone)
    -
    -    userPartitionSchema match {
    +    val inferredPartitionSpec = PartitioningUtils.parsePartitions(
    +      leafDirs,
    +      typeInference = sparkSession.sessionState.conf.partitionColumnTypeInferenceEnabled,
    --- End diff --
    
    Before this patch, there was a subtle difference between the cases with and without a user-provided partition schema:
    1. With a user-provided partition schema, we should not infer data types. We should read the partition values as strings and cast them to the user-provided types.
    2. Without a user-provided partition schema, we should infer the data types (controlled by a config).
    
    So it was wrong to unify these 2 code paths. @gengliangwang can you change it back?
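
To make the difference concrete, here is a minimal, self-contained sketch. It is not Spark's actual code: `DataType`, `PartitionValueSketch`, `resolve`, and `inferThenCast` are hypothetical names invented for illustration, and only string and integer types are modeled. It shows one way the two paths can diverge: a raw partition value such as `01` keeps its leading zero when it is cast directly to a user-declared string type, but loses it if the type is inferred first and the inferred value is cast afterwards.

```scala
import scala.util.Try

// Hypothetical, simplified stand-ins for partition column types (not Spark's classes).
sealed trait DataType
case object StringType extends DataType
case object IntegerType extends DataType

object PartitionValueSketch {
  // Path 2: no user-provided schema, so infer a type from the raw directory value,
  // but only when the (assumed) type-inference flag is enabled.
  def inferType(raw: String, typeInference: Boolean): DataType =
    if (typeInference && Try(raw.toInt).isSuccess) IntegerType else StringType

  // Cast the raw string straight to the requested type.
  def castTo(raw: String, dt: DataType): Any = dt match {
    case StringType  => raw
    case IntegerType => raw.toInt
  }

  // Keeps the two paths separate, as the comment above asks for.
  def resolve(raw: String, userType: Option[DataType], typeInference: Boolean): (DataType, Any) =
    userType match {
      case Some(dt) => (dt, castTo(raw, dt))   // path 1: no inference, cast the raw string
      case None =>
        val dt = inferType(raw, typeInference) // path 2: infer, then convert
        (dt, castTo(raw, dt))
    }

  // A unified path for comparison: always infer first, then cast the inferred value
  // to the user-provided type. The original string form is lost along the way.
  def inferThenCast(raw: String, userType: DataType): Any = {
    val inferred = castTo(raw, inferType(raw, typeInference = true))
    castTo(inferred.toString, userType)
  }
}
```

In this sketch, `resolve("01", Some(StringType), typeInference = true)` keeps the value `"01"`, while `inferThenCast("01", StringType)` yields `"1"` because the value passes through an integer on the way.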



