This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 2e07662e24e2 [SPARK-46952][SQL] XML: Limit size of corrupt record
2e07662e24e2 is described below

commit 2e07662e24e243e7d1760ea063c9e88417bc873f
Author: Sandip Agarwala <131817656+sandip...@users.noreply.github.com>
AuthorDate: Fri Feb 2 15:45:09 2024 +0900

    [SPARK-46952][SQL] XML: Limit size of corrupt record
    
    ### What changes were proposed in this pull request?
    Limit the size of the malformed XML string that gets stored in the corrupt record column.
    
    ### Why are the changes needed?
    A corrupt XML record can be arbitrarily large and may cause an out-of-memory (OOM) error.
    
    ### Does this PR introduce _any_ user-facing change?
    Yes
    
    ### How was this patch tested?
    Unit test
    
    ### Was this patch authored or co-authored using generative AI tooling?
    No
    
    Closes #44994 from sandip-db/xml_limit_corrupt_record_size.
    
    Authored-by: Sandip Agarwala <131817656+sandip...@users.noreply.github.com>
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 .../main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlParser.scala    | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlParser.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlParser.scala
index 2458d1772dab..674d5f63b039 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlParser.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlParser.scala
@@ -145,7 +145,7 @@ class StaxXmlParser(
   def doParseColumn(xml: String,
       parseMode: ParseMode,
       xsdSchema: Option[Schema]): Option[InternalRow] = {
-    val xmlRecord = UTF8String.fromString(xml)
+    lazy val xmlRecord = UTF8String.fromString(xml)
     try {
       xsdSchema.foreach { schema =>
         schema.newValidator().validate(new StreamSource(new StringReader(xml)))


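The one-line change in the diff above turns `xmlRecord` into a `lazy val`. As a rough illustration of what that keyword does in general, the sketch below defers a copy of the input until a failure path actually reads it. It is not Spark code: `LazyRecordSketch`, `toBytes`, `parse`, and the `conversions` counter are hypothetical names, and `toBytes` is only a stand-in for `UTF8String.fromString`.

```scala
import java.nio.charset.StandardCharsets

object LazyRecordSketch {
  // Counts how many times the (hypothetical) conversion actually runs.
  var conversions = 0

  // Stand-in for UTF8String.fromString: copies the input into a byte array.
  def toBytes(xml: String): Array[Byte] = {
    conversions += 1
    xml.getBytes(StandardCharsets.UTF_8)
  }

  // Hypothetical parser: Right(length) on success, Left(raw record) on failure.
  def parse(xml: String): Either[Array[Byte], Int] = {
    // With `lazy val`, toBytes is not called until `record` is first read.
    lazy val record = toBytes(xml)
    try {
      if (!xml.trim.startsWith("<")) {
        throw new IllegalArgumentException("malformed record")
      }
      Right(xml.length)
    } catch {
      // Only the failure path touches `record`, so the copy happens here.
      case _: IllegalArgumentException => Left(record)
    }
  }

  def main(args: Array[String]): Unit = {
    parse("<a>1</a>")
    println(conversions) // prints 0: the well-formed record was never copied
    parse("not xml")
    println(conversions) // prints 1: the copy was made for the malformed record
  }
}
```

In this sketch a well-formed input never pays for the copy; whether the Spark code path gains the same benefit depends on where `xmlRecord` is read in the rest of `doParseColumn`, which this excerpt does not show.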
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
