sandip-db commented on code in PR #45487:
URL: https://github.com/apache/spark/pull/45487#discussion_r1556266125


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlParser.scala:
##########
@@ -682,25 +684,25 @@ class XmlTokenizer(
         return false
       }
       val c = cOrEOF.toChar
-      if (c == commentEnd(i)) {
-        if (i >= commentEnd.length - 1) {
-          // Found comment close.
+      if (c == end(i)) {
+        i += 1
+        if (i >= end.length) {

Review Comment:
   Please add a test with two scenarios:
   - CDATA ends at the end of the file,
   - CDATA never ends.
   The latter is invalid XML. The goal is to make sure the parser doesn't crash and still returns the other valid records.
   
   Add the same two tests for comments as well. 
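   A minimal sketch of the delimiter scan those tests would exercise. It uses a hypothetical `skipPast` helper, not the actual `XmlTokenizer` code from the diff. The helper assumes the opener (`<![CDATA[` or `<!--`) has already been consumed and reads until the closer (`]]>` or `-->`). It returns `false` instead of throwing when EOF arrives first, which is the "never ends" case. Comparing a sliding window of the last `end.length` characters, instead of only advancing an index on each match, also handles overlapping input such as `a]]]>`. Record-level recovery in Spark (still emitting the other valid rows) is out of scope here and would need the real parser tests.

```scala
import java.io.{Reader, StringReader}

// Hypothetical helper for illustration only; not Spark's XmlTokenizer API.
object DelimiterScan {
  // Consume characters from `in` until `end` has been fully read.
  // Returns true if the delimiter was found, false if EOF came first.
  def skipPast(in: Reader, end: String): Boolean = {
    // Holds at most end.length of the most recently read characters.
    val window = new StringBuilder
    var c = in.read()
    while (c != -1) {
      window.append(c.toChar)
      if (window.length > end.length) window.deleteCharAt(0)
      if (window.toString == end) return true
      c = in.read()
    }
    false // EOF reached without seeing the closing delimiter
  }

  def main(args: Array[String]): Unit = {
    // CDATA: closer at the very end of the input, and no closer at all.
    println(skipPast(new StringReader("abc]]>"), "]]>"))
    println(skipPast(new StringReader("abc"), "]]>"))
    // Comment: the same two scenarios.
    println(skipPast(new StringReader("x -->"), "-->"))
    println(skipPast(new StringReader("x"), "-->"))
  }
}
```

   The real tests would wrap these inputs in full XML files that also contain valid records, then assert that those records are still returned.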



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
