beliefer commented on a change in pull request #32958:
URL: https://github.com/apache/spark/pull/32958#discussion_r655934506



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
##########
@@ -1647,4 +1643,300 @@ private[spark] object QueryCompilationErrors {
   def invalidYearMonthIntervalType(startFieldName: String, endFieldName: String): Throwable = {
     new AnalysisException(s"'interval $startFieldName to $endFieldName' is invalid.")
   }
+
+  def queryFromRawFilesIncludeCorruptRecordColumnError(): Throwable = {
+    new AnalysisException(
+      """
+        |Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the
+        |referenced columns only include the internal corrupt record column
+        |(named _corrupt_record by default). For example:
+        |spark.read.schema(schema).csv(file).filter($"_corrupt_record".isNotNull).count()
+        |and spark.read.schema(schema).csv(file).select("_corrupt_record").show().
+        |Instead, you can cache or save the parsed results and then send the same query.
+        |For example, val df = spark.read.schema(schema).csv(file).cache() and then
+        |df.filter($"_corrupt_record".isNotNull).count().
+      """.stripMargin)

Review comment:
       Thank you. Plain `stripMargin` is good enough here, since the message lines already use `|` as the margin prefix.
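
       For anyone skimming: the default `stripMargin` strips everything up to and including a leading `|`, while `stripMargin('#')` looks for `#` and would leave the `|` prefixes in the rendered error message. A minimal REPL-style sketch of the difference (the string literals are illustrative, not from the patch):

```scala
// Default stripMargin: removes leading whitespace up to and including '|'
// on each line, so the margin-prefixed lines come out clean.
val stripped = """
  |first line
  |second line
""".stripMargin
// stripped == "\nfirst line\nsecond line\n"

// stripMargin('#') searches for '#' as the margin character, so the '|'
// prefixes below survive into the output verbatim.
val notStripped = """
  |first line
""".stripMargin('#')
// notStripped still contains "  |first line"
```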



