Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20894#discussion_r189062745

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -497,6 +498,11 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
             StructType(schema.filterNot(_.name == parsedOptions.columnNameOfCorruptRecord))

           val linesWithoutHeader: RDD[String] = maybeFirstLine.map { firstLine =>
    +        if (!parsedOptions.enforceSchema) {
    +          CSVDataSource.checkHeader(firstLine, new CsvParser(parsedOptions.asParserSettings),
    --- End diff --

    I mean we could, for example, make a dataset from spark.read.text("tmp/*.csv"), preprocess it, and then convert it via spark.read.csv(dataset). In that case every file's header ends up in the dataset, but this code path doesn't validate each file's header. Shall we document this if it's hard to fix?
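A minimal sketch of the scenario the comment describes, assuming a local SparkSession; the glob path and the trimming step are hypothetical placeholders for whatever preprocessing a user might do:

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("csv-header-sketch")
  .getOrCreate()
import spark.implicits._

// Read every CSV file as plain text; each file's header line
// becomes an ordinary row of the dataset.
val raw: Dataset[String] = spark.read.textFile("tmp/*.csv")

// Hypothetical preprocessing step, e.g. trimming each line.
val preprocessed: Dataset[String] = raw.map(_.trim)

// Parse the preprocessed lines as CSV. With header=true Spark drops
// only the first line of the combined dataset; the header lines from
// the remaining files survive as data rows, and the per-file header
// check discussed in the diff never runs on this code path.
val df = spark.read
  .option("header", "true")
  .csv(preprocessed)
```

This illustrates why enforceSchema-style header validation on the file-reading path does not cover the `csv(Dataset[String])` entry point: by the time the data reaches the CSV parser, the per-file boundaries are gone.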