This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 2e07662e24e2 [SPARK-46952][SQL] XML: Limit size of corrupt record
2e07662e24e2 is described below

commit 2e07662e24e243e7d1760ea063c9e88417bc873f
Author: Sandip Agarwala <131817656+sandip...@users.noreply.github.com>
AuthorDate: Fri Feb 2 15:45:09 2024 +0900

    [SPARK-46952][SQL] XML: Limit size of corrupt record

    ### What changes were proposed in this pull request?
    Limit the size of malformed XML string that gets stored in the corrupt column.

    ### Why are the changes needed?
    A large corrupt XML record can be arbitrarily large and may cause OOM.

    ### Does this PR introduce _any_ user-facing change?
    Yes

    ### How was this patch tested?
    Unit test

    ### Was this patch authored or co-authored using generative AI tooling?
    No

    Closes #44994 from sandip-db/xml_limit_corrupt_record_size.

    Authored-by: Sandip Agarwala <131817656+sandip...@users.noreply.github.com>
    Signed-off-by: Hyukjin Kwon <gurwls...@apache.org>
---
 .../main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlParser.scala    | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlParser.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlParser.scala
index 2458d1772dab..674d5f63b039 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlParser.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/StaxXmlParser.scala
@@ -145,7 +145,7 @@ class StaxXmlParser(
   def doParseColumn(xml: String,
       parseMode: ParseMode,
       xsdSchema: Option[Schema]): Option[InternalRow] = {
-    val xmlRecord = UTF8String.fromString(xml)
+    lazy val xmlRecord = UTF8String.fromString(xml)
     try {
       xsdSchema.foreach { schema =>
         schema.newValidator().validate(new StreamSource(new StringReader(xml)))


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
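
Editorial note: the one-line diff above changes `val xmlRecord` to `lazy val xmlRecord`. In Scala, a `lazy val` initializer runs only on first access, so a value that is needed only on some code paths is never built on the others. The sketch below illustrates that language behavior in isolation. It is not the Spark implementation: `LazyCorruptRecordSketch`, `toBytes`, `parse`, and the `conversions` counter are hypothetical names, `toBytes` merely stands in for a conversion such as `UTF8String.fromString`, and the assumption that the record is read only on the malformed path is ours, since the diff shows only the declaration.

```scala
import java.nio.charset.StandardCharsets

object LazyCorruptRecordSketch {
  // Counts how many times the (hypothetical) conversion actually runs.
  var conversions = 0

  // Stand-in for a conversion such as UTF8String.fromString:
  // copies the whole input string into a new byte array.
  def toBytes(s: String): Array[Byte] = {
    conversions += 1
    s.getBytes(StandardCharsets.UTF_8)
  }

  // Hypothetical parser: Right(length) for input that looks well formed,
  // Left(raw record bytes) for input that does not.
  def parse(xml: String): Either[Array[Byte], Int] = {
    // With `lazy val`, toBytes is not called here; it runs on first access.
    lazy val record = toBytes(xml)
    if (xml.startsWith("<") && xml.endsWith(">")) Right(xml.length)
    else Left(record) // first and only access: the malformed path
  }

  def main(args: Array[String]): Unit = {
    parse("<a>1</a>")
    println(conversions) // prints 0: the well-formed path never touched `record`
    parse("<a>1</a")
    println(conversions) // prints 1: the malformed path forced the conversion
  }
}
```

With a plain `val`, the counter would read 1 after the first call, because the copy would be made eagerly for every record regardless of whether it turned out to be malformed.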