Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20894#discussion_r189062745
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -497,6 +498,11 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
           StructType(schema.filterNot(_.name == parsedOptions.columnNameOfCorruptRecord))
     
         val linesWithoutHeader: RDD[String] = maybeFirstLine.map { firstLine =>
    +      if (!parsedOptions.enforceSchema) {
    +        CSVDataSource.checkHeader(firstLine, new CsvParser(parsedOptions.asParserSettings),
    --- End diff ---
    
    I mean we could, for example, make a Dataset from spark.read.text("tmp/*.csv"), preprocess it, and then convert it via spark.read.csv(dataset). In this case, every file would still contain its own header line, but this check doesn't validate each file's header.
    
    Shall we document this if it's hard to fix?
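    
    A minimal sketch of that scenario (the path and the trim step are hypothetical; it uses spark.read.textFile to get the Dataset[String] that spark.read.csv accepts):
    
    ```scala
    import org.apache.spark.sql.{Dataset, SparkSession}
    
    val spark = SparkSession.builder().appName("csv-header-example").getOrCreate()
    import spark.implicits._
    
    // Each matched file contributes its own header line to the combined Dataset.
    val lines: Dataset[String] = spark.read.textFile("tmp/*.csv")
    
    // Hypothetical preprocessing step.
    val preprocessed: Dataset[String] = lines.map(_.trim)
    
    // Only the first line of the whole Dataset is treated as the header and,
    // with enforceSchema disabled, checked against the schema; the header lines
    // of the remaining files are not validated.
    val df = spark.read
      .option("header", "true")
      .option("enforceSchema", "false")
      .csv(preprocessed)
    ```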

