yaooqinn commented on a change in pull request #31921:
URL: https://github.com/apache/spark/pull/31921#discussion_r598463339



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala
##########
@@ -130,13 +130,11 @@ class ParquetToSparkSchemaConverter(
       case INT32 =>
         originalType match {
           case INT_8 => ByteType
-          case INT_16 => ShortType
-          case INT_32 | null => IntegerType
+          case INT_16 | UINT_8 => ShortType
+          case INT_32 | UINT_16 | null => IntegerType
           case DATE => DateType
           case DECIMAL => makeDecimalType(Decimal.MAX_INT_DIGITS)
-          case UINT_8 => typeNotSupported()
-          case UINT_16 => typeNotSupported()
-          case UINT_32 => typeNotSupported()
+          case UINT_32 => LongType

Review comment:
       Thanks, @HyukjinKwon,
   Yeah, I have checked that PR too; there is also a suggestion there that we support these types.
   Recently, Wenchen created https://issues.apache.org/jira/browse/SPARK-34786 for reading uint64. The other unsigned types are not supported either, and their mapping is a bit more straightforward than uint64, which needs a decimal type.

   IMO, it is worthwhile for Spark to support more storage-layer features without breaking our own rules, so I raised this PR to collect more opinions.
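
   To make the widening concrete, here is a minimal, self-contained sketch (illustrative only, not the converter code in this PR) of why each unsigned Parquet type only fits into the next wider Spark type. The helper names are made up, and mapping uint64 to something like DecimalType(20, 0) follows from its 20-digit maximum:

```scala
// Illustrative sketch: each unsigned width exceeds the signed range of the
// same width, so values must be widened before Spark can represent them.
import java.math.BigInteger

object UnsignedWidening {
  // UINT_8 spans 0..255, beyond Byte (-128..127), so widen to Short.
  def uint8ToShort(b: Byte): Short = (b & 0xFF).toShort

  // UINT_16 spans 0..65535, beyond Short, so widen to Int.
  def uint16ToInt(s: Short): Int = s & 0xFFFF

  // UINT_32 spans 0..4294967295, beyond Int, so widen to Long.
  def uint32ToLong(i: Int): Long = i & 0xFFFFFFFFL

  // UINT_64 spans 0..2^64-1, beyond Long; its maximum has 20 digits,
  // which is why reading it needs a decimal (e.g. DecimalType(20, 0)).
  def uint64ToBigInteger(l: Long): BigInteger =
    new BigInteger(java.lang.Long.toUnsignedString(l))

  def main(args: Array[String]): Unit = {
    println(uint8ToShort((-1).toByte))    // 255
    println(uint16ToInt((-1).toShort))    // 65535
    println(uint32ToLong(-1))             // 4294967295
    println(uint64ToBigInteger(-1L))      // 18446744073709551615
  }
}
```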



