Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21682#discussion_r199977784
  
    --- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
 ---
    @@ -69,6 +77,14 @@ private[parquet] class ParquetFilters(pushDownDate: 
Boolean, pushDownStartWith:
       private val makeNotEq: PartialFunction[DataType, (String, Any) => 
FilterPredicate] = {
         case BooleanType =>
           (n: String, v: Any) => FilterApi.notEq(booleanColumn(n), 
v.asInstanceOf[java.lang.Boolean])
    +    case ByteType =>
    --- End diff --
    
    Usually, both byte and short are stored as integers in Parquet. Because Parquet bit-packs values, it doesn't matter whether you store them as ints (or even longs): they get packed into the same space.
    
    The important thing is to match the Parquet file's type when pushing a 
filter. Since Spark stores ByteType and ShortType in Parquet as INT32, this is 
correct.
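    The bit-packing point can be illustrated with a small stand-alone sketch. This is plain Java, not Parquet's actual RLE/bit-packing encoder, and `bitWidth`/`packedBytes` are hypothetical helpers: each value is written with just enough bits for the largest value in the batch, so a byte-range value costs the same whether the container is 32 or 64 bits wide.

    ```java
    // Sketch of bit-packed encoding, similar in spirit to Parquet's
    // RLE/bit-packing hybrid: per-value cost depends on the value range,
    // not on the declared physical type (INT32 vs INT64).
    public class BitPackSketch {
        // Minimum number of bits needed to represent maxValue (at least 1).
        static int bitWidth(long maxValue) {
            return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
        }

        // Total packed size in bytes for a batch of non-negative values.
        static int packedBytes(long[] values) {
            long max = 0;
            for (long v : values) max = Math.max(max, v);
            int width = bitWidth(max);
            return (values.length * width + 7) / 8; // round up to whole bytes
        }

        public static void main(String[] args) {
            long[] byteRange = {1, 5, 7, 3}; // values that fit in a byte
            long[] sameAsLongs = {1, 5, 7, 3}; // same values, wider container
            // Both batches need 3 bits per value, so 4 values pack into 2 bytes
            // either way.
            System.out.println(packedBytes(byteRange));   // 2
            System.out.println(packedBytes(sameAsLongs)); // 2
        }
    }
    ```

    The filter-pushdown side of the comment is the flip side of this: the encoding makes the storage cheap, but the predicate handed to Parquet still has to use the file's physical type (INT32 here), which is why the `ByteType` case must build an int-column predicate.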


---
