beliefer commented on a change in pull request #32958:
URL: https://github.com/apache/spark/pull/32958#discussion_r655934506
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
##########

@@ -1647,4 +1643,300 @@ private[spark] object QueryCompilationErrors {
   def invalidYearMonthIntervalType(startFieldName: String, endFieldName: String): Throwable = {
     new AnalysisException(s"'interval $startFieldName to $endFieldName' is invalid.")
   }
+
+  def queryFromRawFilesIncludeCorruptRecordColumnError(): Throwable = {
+    new AnalysisException(
+      """
+        |Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the
+        |referenced columns only include the internal corrupt record column
+        |(named _corrupt_record by default). For example:
+        |spark.read.schema(schema).csv(file).filter($\"_corrupt_record\".isNotNull).count()
+        |and spark.read.schema(schema).csv(file).select(\"_corrupt_record\").show().
+        |Instead, you can cache or save the parsed results and then send the same query.
+        |For example, val df = spark.read.schema(schema).csv(file).cache() and then
+        |df.filter($\"_corrupt_record\".isNotNull).count().
+      """.stripMargin('#'))

Review comment:

       Thank you. `stripMargin` is good enough.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
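For context on the review exchange above, here is a small standalone sketch (not part of the PR) of how Scala's `stripMargin` treats margin characters: the no-argument form strips leading whitespace up to and including `|`, while `stripMargin('#')` only recognizes `#`, so `|` margins would be left untouched. The object name `StripMarginDemo` is illustrative only.

```scala
// Illustrative sketch of stripMargin behavior (not code from the PR itself).
object StripMarginDemo {
  def main(args: Array[String]): Unit = {
    // Default margin char is '|': leading whitespace plus the '|' is removed.
    val default =
      """|Since Spark 2.3, the queries from raw JSON/CSV files are disallowed
         |when the referenced columns only include the corrupt record column.""".stripMargin
    assert(!default.contains("|"))

    // A custom margin char works the same way, but only for that character.
    val customChar =
      """#First line
        #Second line""".stripMargin('#')
    assert(!customChar.contains("#"))

    // Mismatch: with stripMargin('#'), lines whose margin char is '|' are kept as-is.
    val mismatch =
      """|kept as-is
         |because '#' never matches""".stripMargin('#')
    assert(mismatch.contains("|"))

    println("ok")
  }
}
```

This is why pairing `|` margins with `stripMargin('#')` is suspicious, and why the reviewer settled on the plain `stripMargin` form.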