Github user mallman commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21320#discussion_r199631341

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala ---
    @@ -47,16 +47,25 @@ import org.apache.spark.sql.types._
      *
      * Due to this reason, we no longer rely on [[ReadContext]] to pass requested schema from [[init()]]
      * to [[prepareForRead()]], but use a private `var` for simplicity.
     + *
     + * @param parquetMrCompatibility support reading with parquet-mr or Spark's built-in Parquet reader
       */
    -private[parquet] class ParquetReadSupport(val convertTz: Option[TimeZone])
    +private[parquet] class ParquetReadSupport(val convertTz: Option[TimeZone],
    +    parquetMrCompatibility: Boolean)
       extends ReadSupport[UnsafeRow] with Logging {
       private var catalystRequestedSchema: StructType = _

    +  /**
    +   * Construct a [[ParquetReadSupport]] with [[convertTz]] set to [[None]] and
    +   * [[parquetMrCompatibility]] set to [[false]].
    +   *
    +   * We need a zero-arg constructor for SpecificParquetRecordReaderBase. But that is only
    +   * used in the vectorized reader, where we get the convertTz value directly, and the value here
    +   * is ignored. Further, we set [[parquetMrCompatibility]] to [[false]] as this constructor is only
    +   * called by the Spark reader.
    --- End diff --

    I don't understand your confusion. I think the comment makes it very clear why we need to set that parameter to false. How can I make it better? Or can you be more specific about what is unclear to you?
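For context, the constructor arrangement the doc comment describes — a primary constructor taking both parameters, plus a zero-arg auxiliary constructor for reflective instantiation — could be sketched as below. This is an illustrative sketch only (`ParquetReadSupportSketch` is a hypothetical stand-in; the real class extends `ReadSupport[UnsafeRow]` and carries more state than shown here):

```scala
import java.util.TimeZone

// Hypothetical sketch mirroring the diff above; not the actual PR code.
class ParquetReadSupportSketch(
    val convertTz: Option[TimeZone],
    val parquetMrCompatibility: Boolean) {

  // Zero-arg auxiliary constructor, needed because
  // SpecificParquetRecordReaderBase instantiates the read support
  // reflectively. On that (vectorized) path convertTz is obtained
  // directly, so the value here is ignored; parquetMrCompatibility is
  // false because only Spark's built-in reader calls this constructor.
  def this() = this(convertTz = None, parquetMrCompatibility = false)
}
```

The auxiliary constructor delegates to the primary one, so the defaults are defined in exactly one place.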