Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22157#discussion_r212227737
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala ---
    @@ -79,9 +79,10 @@ object OrcUtils extends Logging {
         val ignoreCorruptFiles = sparkSession.sessionState.conf.ignoreCorruptFiles
         val conf = sparkSession.sessionState.newHadoopConf()
         // TODO: We need to support merge schema. Please see SPARK-11412.
    -    files.map(_.getPath).flatMap(readSchema(_, conf, ignoreCorruptFiles)).headOption.map { schema =>
    -      logDebug(s"Reading schema from file $files, got Hive schema string: $schema")
    -      CatalystSqlParser.parseDataType(schema.toString).asInstanceOf[StructType]
    +    files.toIterator.map(file => readSchema(file.getPath, conf, ignoreCorruptFiles)).collectFirst {
    +      case Some(schema) =>
    +        logDebug(s"Reading schema from file $files, got Hive schema string: $schema")
    +        CatalystSqlParser.parseDataType(schema.toString).asInstanceOf[StructType]
    --- End diff --
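
    For context, the key behavioral difference in the new chain is laziness: toIterator.map(...).collectFirst stops reading as soon as the first file yields a schema, while the old eager flatMap read every file before headOption picked one. A minimal standalone sketch of that difference in plain Scala (this readSchema is only a toy stand-in for the real OrcUtils.readSchema, and the file names are made up):

        // Toy stand-in: Some(schema) for readable files, None for corrupt ones.
        def readSchema(path: String): Option[String] = {
          println(s"opening $path")  // side effect makes the evaluation order visible
          if (path.endsWith(".orc")) Some("struct<a:int>") else None
        }

        val files = Seq("corrupt.bin", "part-0.orc", "part-1.orc")

        // Eager: readSchema runs for every file before headOption picks the first result.
        val eager = files.flatMap(readSchema).headOption

        // Lazy: the iterator stops at the first Some, so "part-1.orc" is never opened.
        val first = files.toIterator.map(readSchema).collectFirst {
          case Some(schema) => schema
        }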
    
    BTW, I think we have to create a reader for each file when implementing schema merging like Parquet does, right?
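
    A rough sketch of what that could look like in terms of the code above (purely illustrative, not the actual implementation; mergeSchemas is a placeholder, not an existing helper):

        // Purely illustrative: with schema merging there is no short-circuiting at the
        // first readable schema, so every file has to be opened and read.
        // mergeSchemas is a placeholder for whatever merge logic would be used.
        def mergeSchemas(left: StructType, right: StructType): StructType = ???

        val mergedSchema: Option[StructType] =
          files.map(_.getPath)
            .flatMap(readSchema(_, conf, ignoreCorruptFiles))  // one reader per file
            .map(s => CatalystSqlParser.parseDataType(s.toString).asInstanceOf[StructType])
            .reduceLeftOption(mergeSchemas)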


---
