[ https://issues.apache.org/jira/browse/SPARK-47371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon reassigned SPARK-47371:
------------------------------------

    Assignee: Yousof Hosny

> XML: Ignore row tags in CDATA Tokenizer
> ---------------------------------------
>
>                 Key: SPARK-47371
>                 URL: https://issues.apache.org/jira/browse/SPARK-47371
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Yousof Hosny
>            Assignee: Yousof Hosny
>            Priority: Minor
>              Labels: pull-request-available
>
> The current parser does not recognize CDATA sections and therefore reads row
> tags that are enclosed within a CDATA section. None of the following rows
> should be read, but all of them are.
> {code:java}
> // BUG: rowTag in CDATA section
> val xmlString="""<?xml version="1.0" encoding="UTF-8" ?>
> <test><![CDATA[
>    <elem id="1" />
>    <elem id="2" > </elem>
>    <elem> <id>3</id> </elem>
> ]]>
> </test>
> """
> {code}
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
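
The expected behavior described in the issue can be illustrated with a minimal sketch. This is not Spark's tokenizer and not the fix proposed for SPARK-47371; the class name CdataAwareRowScanner and the method findRowStarts are hypothetical, chosen only for illustration. The sketch assumes the input fits in a single string, treats everything between "<![CDATA[" and "]]>" as character data, and reports only the row start tags found outside such sections. It does not handle XML comments or processing instructions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spark: scans for row start tags while skipping CDATA.
final class CdataAwareRowScanner {
    private static final String CDATA_START = "<![CDATA[";
    private static final String CDATA_END = "]]>";

    /** Returns the offsets of every "<rowTag" start tag that lies outside a CDATA section. */
    static List<Integer> findRowStarts(String xml, String rowTag) {
        List<Integer> starts = new ArrayList<>();
        String open = "<" + rowTag;
        int i = 0;
        while (i < xml.length()) {
            if (xml.startsWith(CDATA_START, i)) {
                // Jump past the whole CDATA section; its content is plain character data.
                int end = xml.indexOf(CDATA_END, i + CDATA_START.length());
                if (end < 0) {
                    break; // unterminated CDATA: nothing after it can be a row tag
                }
                i = end + CDATA_END.length();
            } else if (xml.startsWith(open, i) && isTagNameEnd(xml, i + open.length())) {
                starts.add(i);
                i += open.length();
            } else {
                i++;
            }
        }
        return starts;
    }

    // The tag name must end here, so that "<elem" does not match "<elements>".
    private static boolean isTagNameEnd(String xml, int pos) {
        if (pos >= xml.length()) {
            return false;
        }
        char c = xml.charAt(pos);
        return c == '>' || c == '/' || Character.isWhitespace(c);
    }
}
```

Applied to the document from the issue with rowTag "elem", findRowStarts returns an empty list, because all three elem tags sit inside the CDATA section. A scanner without the CDATA branch would report all three, which matches the behavior the issue describes.\n";
if (!CdataAwareRowScanner.findRowStarts(inCdata, "elem").isEmpty()) throw new AssertionError("rows inside CDATA must be ignored");
String mixed = "<test><elem id=\"1\"/><![CDATA[<elem/>]]><elem></elem></test>";
if (!CdataAwareRowScanner.findRowStarts(mixed, "elem").equals(java.util.Arrays.asList(6, 39))) throw new AssertionError("expected row starts at 6 and 39");
if (!CdataAwareRowScanner.findRowStarts("<elements><elem/></elements>", "elem").equals(java.util.Arrays.asList(10))) throw new AssertionError("prefix match must be rejected");
</test>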