Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21320#discussion_r203934281

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala ---
    @@ -47,16 +47,25 @@ import org.apache.spark.sql.types._
      *
      * Due to this reason, we no longer rely on [[ReadContext]] to pass requested schema from [[init()]]
      * to [[prepareForRead()]], but use a private `var` for simplicity.
    + *
    + * @param parquetMrCompatibility support reading with parquet-mr or Spark's built-in Parquet reader
      */
    -private[parquet] class ParquetReadSupport(val convertTz: Option[TimeZone])
    +private[parquet] class ParquetReadSupport(val convertTz: Option[TimeZone],
    +    parquetMrCompatibility: Boolean)
       extends ReadSupport[UnsafeRow] with Logging {
       private var catalystRequestedSchema: StructType = _
    
    +  /**
    +   * Construct a [[ParquetReadSupport]] with [[convertTz]] set to [[None]] and
    +   * [[parquetMrCompatibility]] set to [[false]].
    +   *
    +   * We need a zero-arg constructor for SpecificParquetRecordReaderBase. But that is only
    +   * used in the vectorized reader, where we get the convertTz value directly, and the value here
    +   * is ignored. Further, we set [[parquetMrCompatibility]] to [[false]] as this constructor is only
    +   * called by the Spark reader.
    --- End diff --

    re: https://github.com/apache/spark/pull/21320/files/cb858f202e49d69f2044681e37f982dc10676296#r199631341

    Actually, it doesn't look clear to me either. What does the flag indicate? Do you mean the normal parquet-mr reader vs. the vectorized Parquet reader?
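For context on the pattern the doc comment describes: the class keeps a primary constructor carrying the real parameters, plus a zero-arg auxiliary constructor so that frameworks which instantiate it reflectively (as SpecificParquetRecordReaderBase does via the class name) can construct it; the defaulted values are then ignored on that path. A minimal, hypothetical sketch of this pattern (names like `ReadSupportSketch` are illustrative, not Spark's actual types):

```scala
// Simplified sketch of a two-constructor class: the primary constructor takes
// the real parameters, and a zero-arg auxiliary constructor supplies defaults
// for callers that must instantiate the class reflectively.
class ReadSupportSketch(val convertTz: Option[String],
                        val parquetMrCompatibility: Boolean) {
  // Zero-arg constructor for reflective instantiation; on that path the
  // caller obtains the real values elsewhere, so these defaults are ignored.
  def this() = this(None, false)
}

object ReadSupportSketch {
  def main(args: Array[String]): Unit = {
    // Reflective instantiation only works because the zero-arg constructor exists.
    val instance = classOf[ReadSupportSketch].getConstructor().newInstance()
    println(instance.convertTz)              // None
    println(instance.parquetMrCompatibility) // false
  }
}
```

This is why deleting the zero-arg constructor would break the reflective path even though the vectorized reader never reads the defaulted fields.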