Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23201#discussion_r240038837
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala ---
    @@ -121,7 +122,26 @@ private[sql] class JsonInferSchema(options: JSONOptions) extends Serializable {
                 DecimalType(bigDecimal.precision, bigDecimal.scale)
             }
             decimalTry.getOrElse(StringType)
    -      case VALUE_STRING => StringType
    +      case VALUE_STRING =>
    +        val stringValue = parser.getText
    --- End diff --
    
    I didn't mean type inference in partition values, but you are probably right that we should follow the same logic for schema inference in datasources and for partition value types.
    
    Just wondering how it works for now: this code https://github.com/apache/spark/blob/5a140b7844936cf2b65f08853b8cfd8c499d4f13/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala#L474-L482 and this https://github.com/apache/spark/blob/f982ca07e80074bdc1e3b742c5e21cf368e4ede2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala#L163 can use different timestamp patterns, or is it supposed to work only with the default settings?
    
    Maybe `inferPartitionColumnValue` should ask the datasource to infer date/timestamp types?
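
    A rough sketch of that idea (a hypothetical API, not an existing Spark interface) could be: the file format exposes its own date/timestamp inference, and `PartitioningUtils` consults it before falling back to the current default parsing:

    ```scala
    import org.apache.spark.sql.types.{DataType, StringType}

    // Hypothetical hook: a datasource exposes its own partition-value type inference,
    // driven by its own options (e.g. timestampFormat/dateFormat).
    trait PartitionValueInference {
      // Some(inferredType) if the datasource recognizes the raw partition value,
      // None to fall back to the default inference in PartitioningUtils.
      def inferPartitionValueType(raw: String): Option[DataType]
    }

    object PartitionInferenceSketch {
      // Sketch of how inferPartitionColumnValue could delegate first and only then
      // fall back to the existing behavior (represented here by StringType).
      def infer(raw: String, dataSource: Option[PartitionValueInference]): DataType =
        dataSource.flatMap(_.inferPartitionValueType(raw)).getOrElse(StringType)
    }
    ```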


---
