[jira] [Commented] (SPARK-47892) XML: Stop ignoring CDATA within rows.
[ https://issues.apache.org/jira/browse/SPARK-47892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842481#comment-17842481 ]

Yousof Hosny commented on SPARK-47892:
--------------------------------------

This change was implemented as a follow-up under https://issues.apache.org/jira/browse/SPARK-47371. Resolving this issue as a duplicate.

> XML: Stop ignoring CDATA within rows.
> -------------------------------------
>
> Key: SPARK-47892
> URL: https://issues.apache.org/jira/browse/SPARK-47892
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Yousof Hosny
> Priority: Major
> Fix For: 4.0.0
>
> This change ignores CDATA within row tags as well as outside of them. We should
> ignore only CDATA found outside of row tags, because CDATA inside a row tag is
> data belonging to that row.
> [https://github.com/apache/spark/pull/45487]
>
> NOTE: With the current parser implementation, once CDATA within row tags is no
> longer ignored, an edge case remains: a matching closing row tag inside CDATA
> will be parsed as a valid end tag.
> Example:
> {code:java}
> {code}
> Once CDATA within rows is no longer ignored, the parser will match the closing
> tag in the example above, which is incorrect.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47892) XML: Stop ignoring CDATA within rows.
[ https://issues.apache.org/jira/browse/SPARK-47892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yousof Hosny resolved SPARK-47892.
---------------------------------
    Resolution: Duplicate

> XML: Stop ignoring CDATA within rows.
> -------------------------------------
>
> Key: SPARK-47892
> URL: https://issues.apache.org/jira/browse/SPARK-47892
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Yousof Hosny
> Priority: Major
> Fix For: 4.0.0
>
> This change ignores CDATA within row tags as well as outside of them. We should
> ignore only CDATA found outside of row tags, because CDATA inside a row tag is
> data belonging to that row.
> [https://github.com/apache/spark/pull/45487]
>
> NOTE: With the current parser implementation, once CDATA within row tags is no
> longer ignored, an edge case remains: a matching closing row tag inside CDATA
> will be parsed as a valid end tag.
> Example:
> {code:java}
> {code}
> Once CDATA within rows is no longer ignored, the parser will match the closing
> tag in the example above, which is incorrect.
[jira] [Resolved] (SPARK-47275) XML: Change to not support DROPMALFORMED parse mode
[ https://issues.apache.org/jira/browse/SPARK-47275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yousof Hosny resolved SPARK-47275.
---------------------------------
    Resolution: Duplicate

> XML: Change to not support DROPMALFORMED parse mode
> ---------------------------------------------------
>
> Key: SPARK-47275
> URL: https://issues.apache.org/jira/browse/SPARK-47275
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Yousof Hosny
> Priority: Minor
>
> Change XML expressions to not support DROPMALFORMED parse mode. This matches
> JSON expressions, which also do not support it.
[jira] [Commented] (SPARK-47275) XML: Change to not support DROPMALFORMED parse mode
[ https://issues.apache.org/jira/browse/SPARK-47275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842480#comment-17842480 ]

Yousof Hosny commented on SPARK-47275:
--------------------------------------

[~HF] Yes, that is correct. Thanks for the notice. Closed as duplicate.

> XML: Change to not support DROPMALFORMED parse mode
> ---------------------------------------------------
>
> Key: SPARK-47275
> URL: https://issues.apache.org/jira/browse/SPARK-47275
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Yousof Hosny
> Priority: Minor
>
> Change XML expressions to not support DROPMALFORMED parse mode. This matches
> JSON expressions, which also do not support it.
[jira] [Created] (SPARK-47892) XML: Stop ignoring CDATA within rows.
Yousof Hosny created SPARK-47892:
---------------------------------

Summary: XML: Stop ignoring CDATA within rows.
Key: SPARK-47892
URL: https://issues.apache.org/jira/browse/SPARK-47892
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.0.0
Reporter: Yousof Hosny
Fix For: 4.0.0

This change ignores CDATA within row tags as well as outside of them. We should ignore only CDATA found outside of row tags, because CDATA inside a row tag is data belonging to that row.
[https://github.com/apache/spark/pull/45487]

NOTE: With the current parser implementation, once CDATA within row tags is no longer ignored, an edge case remains: a matching closing row tag inside CDATA will be parsed as a valid end tag.
Example:
{code:java}
{code}
Once CDATA within rows is no longer ignored, the parser will match the closing tag in the example above, which is incorrect.
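The edge case in the NOTE above can be illustrated with a minimal, self-contained sketch. This is not Spark's tokenizer: `RowEndSketch`, `findRowEnd`, and the sample input are hypothetical names and data chosen for illustration, and the scan works on an in-memory string rather than a stream. It shows one way to avoid treating a closing row tag inside CDATA as the end of the row.

```scala
object RowEndSketch {
  private val CdataStart = "<![CDATA["
  private val CdataEnd = "]]>"

  /** Returns the index just past the first closing row tag that lies outside
    * any CDATA section, scanning from `from`; returns -1 if there is none.
    * Hypothetical helper, not part of Spark.
    */
  def findRowEnd(xml: String, rowTag: String, from: Int = 0): Int = {
    val endTag = s"</$rowTag>"
    var i = from
    while (i < xml.length) {
      if (xml.startsWith(CdataStart, i)) {
        // CDATA content is character data, so skip the whole section.
        val close = xml.indexOf(CdataEnd, i + CdataStart.length)
        if (close < 0) return -1 // unterminated CDATA: no valid end tag follows
        i = close + CdataEnd.length
      } else if (xml.startsWith(endTag, i)) {
        return i + endTag.length
      } else {
        i += 1
      }
    }
    -1
  }
}
```

With an assumed input such as `<ROW><a><![CDATA[</ROW>]]></a></ROW>`, the closing tag inside the CDATA section is skipped and only the final `</ROW>` ends the row.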
[jira] [Created] (SPARK-47371) XML: Ignore row tags in CDATA
Yousof Hosny created SPARK-47371:
---------------------------------

Summary: XML: Ignore row tags in CDATA
Key: SPARK-47371
URL: https://issues.apache.org/jira/browse/SPARK-47371
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.0.0
Reporter: Yousof Hosny

The current parser does not recognize CDATA sections and thus will read row tags that are enclosed within a CDATA section. The expected behavior is for none of the following rows to be read, but they are all read.
{code:java}
// BUG: rowTag in CDATA section
val xmlString="""
{code}
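A minimal sketch of the expected behavior described in SPARK-47371, under stated assumptions: `CdataAwareRowScan` and `rowStarts` are hypothetical names, the input is a plain string, and row tags are assumed to carry no attributes. It is not the Spark implementation; it only shows a scan that does not report row tags enclosed in a CDATA section.

```scala
import scala.annotation.tailrec

object CdataAwareRowScan {
  private val CdataStart = "<![CDATA["
  private val CdataEnd = "]]>"

  /** Offsets of opening row tags that are not inside a CDATA section.
    * Simplification: only matches the exact form <rowTag>, without attributes.
    */
  def rowStarts(xml: String, rowTag: String): List[Int] = {
    val startTag = s"<$rowTag>"

    @tailrec
    def loop(i: Int, acc: List[Int]): List[Int] =
      if (i >= xml.length) acc.reverse
      else if (xml.startsWith(CdataStart, i)) {
        val close = xml.indexOf(CdataEnd, i + CdataStart.length)
        // An unterminated CDATA section swallows the rest of the input.
        if (close < 0) acc.reverse else loop(close + CdataEnd.length, acc)
      }
      else if (xml.startsWith(startTag, i)) loop(i + startTag.length, i :: acc)
      else loop(i + 1, acc)

    loop(0, Nil)
  }
}
```

For an assumed input where every `<ROW>` sits inside `<![CDATA[ ... ]]>`, `rowStarts` returns an empty list, which matches the ticket's expectation that none of the rows are read.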
[jira] [Updated] (SPARK-47371) XML: Ignore row tags in CDATA Tokenizer
[ https://issues.apache.org/jira/browse/SPARK-47371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yousof Hosny updated SPARK-47371:
---------------------------------
    Summary: XML: Ignore row tags in CDATA Tokenizer  (was: XML: Ignore row tags in CDATA)

> XML: Ignore row tags in CDATA Tokenizer
> ---------------------------------------
>
> Key: SPARK-47371
> URL: https://issues.apache.org/jira/browse/SPARK-47371
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Yousof Hosny
> Priority: Minor
>
> The current parser does not recognize CDATA sections and thus will read row
> tags that are enclosed within a CDATA section. The expected behavior is for
> none of the following rows to be read, but they are all read.
> {code:java}
> // BUG: rowTag in CDATA section
> val xmlString="""
>
>
> {code}
[jira] [Updated] (SPARK-47345) XML: Add XmlFunctionsSuite
[ https://issues.apache.org/jira/browse/SPARK-47345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yousof Hosny updated SPARK-47345:
---------------------------------
    Description: Convert JsonFunctionsSuite.scala to its XML equivalent. Note that XML doesn't implement all JSON functions, such as {{json_tuple}}, {{get_json_object}}, etc.

> XML: Add XmlFunctionsSuite
> --------------------------
>
> Key: SPARK-47345
> URL: https://issues.apache.org/jira/browse/SPARK-47345
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Yousof Hosny
> Priority: Minor
>
> Convert JsonFunctionsSuite.scala to its XML equivalent. Note that XML doesn't
> implement all JSON functions, such as {{json_tuple}},
> {{get_json_object}}, etc.
[jira] [Updated] (SPARK-47345) XML: Add XmlFunctionsSuite
[ https://issues.apache.org/jira/browse/SPARK-47345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yousof Hosny updated SPARK-47345:
---------------------------------
    Summary: XML: Add XmlFunctionsSuite  (was: Add XmlFunctionsSuite)

> XML: Add XmlFunctionsSuite
> --------------------------
>
> Key: SPARK-47345
> URL: https://issues.apache.org/jira/browse/SPARK-47345
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Yousof Hosny
> Priority: Minor
[jira] [Created] (SPARK-47345) Add XmlFunctionsSuite
Yousof Hosny created SPARK-47345:
---------------------------------

Summary: Add XmlFunctionsSuite
Key: SPARK-47345
URL: https://issues.apache.org/jira/browse/SPARK-47345
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.0.0
Reporter: Yousof Hosny
[jira] [Created] (SPARK-47330) XML: Add XmlExpressionsSuite
Yousof Hosny created SPARK-47330:
---------------------------------

Summary: XML: Add XmlExpressionsSuite
Key: SPARK-47330
URL: https://issues.apache.org/jira/browse/SPARK-47330
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.0.0
Reporter: Yousof Hosny

Convert JsonExpressionsSuite.scala to its XML equivalent. Note that XML doesn't implement all JSON functions, such as {{json_tuple}}, {{get_json_object}}, etc.
[jira] [Created] (SPARK-47275) XML: Change to not support DROPMALFORMED parse mode
Yousof Hosny created SPARK-47275:
---------------------------------

Summary: XML: Change to not support DROPMALFORMED parse mode
Key: SPARK-47275
URL: https://issues.apache.org/jira/browse/SPARK-47275
Project: Spark
Issue Type: Sub-task
Components: Spark Core
Affects Versions: 4.0.0
Reporter: Yousof Hosny

Change XML expressions to not support DROPMALFORMED parse mode. This matches JSON expressions, which also do not support it.
[jira] [Updated] (SPARK-47218) XML: Ignore commented Row Tags in XML tokenizer
[ https://issues.apache.org/jira/browse/SPARK-47218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yousof Hosny updated SPARK-47218:
---------------------------------
    Summary: XML: Ignore commented Row Tags in XML tokenizer  (was: XML: Ignore commented row tags in XML tokenizer)

> XML: Ignore commented Row Tags in XML tokenizer
> -----------------------------------------------
>
> Key: SPARK-47218
> URL: https://issues.apache.org/jira/browse/SPARK-47218
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Yousof Hosny
> Priority: Major
>
> The following returns rows that were within comments:
> {code:java}
> // BUG: rowTag in comment -- incorrectly processed
> display(spark.read.xml(write(""" 1
> """))){code}
> This has been reported before: [How to Ignore XML comments like this · Issue #208 ·
> databricks/spark-xml|https://github.com/databricks/spark-xml/issues/208]
[jira] [Updated] (SPARK-47218) XML: Ignore commented row tags in XML tokenizer
[ https://issues.apache.org/jira/browse/SPARK-47218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yousof Hosny updated SPARK-47218:
---------------------------------
    Summary: XML: Ignore commented row tags in XML tokenizer  (was: XML: Skip rowTag in a comment)

> XML: Ignore commented row tags in XML tokenizer
> -----------------------------------------------
>
> Key: SPARK-47218
> URL: https://issues.apache.org/jira/browse/SPARK-47218
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Yousof Hosny
> Priority: Major
>
> The following returns rows that were within comments:
> {code:java}
> // BUG: rowTag in comment -- incorrectly processed
> display(spark.read.xml(write(""" 1
> """))){code}
> This has been reported before: [How to Ignore XML comments like this · Issue #208 ·
> databricks/spark-xml|https://github.com/databricks/spark-xml/issues/208]
[jira] [Updated] (SPARK-47218) XML: Skip rowTag in a comment
[ https://issues.apache.org/jira/browse/SPARK-47218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yousof Hosny updated SPARK-47218:
---------------------------------
    Description:
The following returns rows that were within comments:

```
{{// BUG: rowTag in comment -- incorrectly processed
display(spark.read.xml(write(""" 1
""")))}}
```

This has been reported before: [How to Ignore XML comments like this · Issue #208 · databricks/spark-xml|https://github.com/databricks/spark-xml/issues/208]

  was:
The following returns rows that were within comments:
display(spark.read.xml(write(""" 1
""")))

This has been reported before: [How to Ignore XML comments like this · Issue #208 · databricks/spark-xml|https://github.com/databricks/spark-xml/issues/208]

> XML: Skip rowTag in a comment
> -----------------------------
>
> Key: SPARK-47218
> URL: https://issues.apache.org/jira/browse/SPARK-47218
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Yousof Hosny
> Priority: Major
>
> The following returns rows that were within comments:
> ```
> {{// BUG: rowTag in comment -- incorrectly processed
> display(spark.read.xml(write(""" 1
> """)))}}
> ```
> This has been reported before: [How to Ignore XML comments like this · Issue #208 ·
> databricks/spark-xml|https://github.com/databricks/spark-xml/issues/208]
[jira] [Updated] (SPARK-47218) XML: Skip rowTag in a comment
[ https://issues.apache.org/jira/browse/SPARK-47218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yousof Hosny updated SPARK-47218:
---------------------------------
    Description:
The following returns rows that were within comments:
{code:java}
// BUG: rowTag in comment -- incorrectly processed
display(spark.read.xml(write(""" 1
"""))){code}
This has been reported before: [How to Ignore XML comments like this · Issue #208 · databricks/spark-xml|https://github.com/databricks/spark-xml/issues/208]

  was:
The following returns rows that were within comments:

```
{{// BUG: rowTag in comment -- incorrectly processed
display(spark.read.xml(write(""" 1
""")))}}
```

This has been reported before: [How to Ignore XML comments like this · Issue #208 · databricks/spark-xml|https://github.com/databricks/spark-xml/issues/208]

> XML: Skip rowTag in a comment
> -----------------------------
>
> Key: SPARK-47218
> URL: https://issues.apache.org/jira/browse/SPARK-47218
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Yousof Hosny
> Priority: Major
>
> The following returns rows that were within comments:
> {code:java}
> // BUG: rowTag in comment -- incorrectly processed
> display(spark.read.xml(write(""" 1
> """))){code}
> This has been reported before: [How to Ignore XML comments like this · Issue #208 ·
> databricks/spark-xml|https://github.com/databricks/spark-xml/issues/208]
[jira] [Updated] (SPARK-47218) XML: Skip rowTag in a comment
[ https://issues.apache.org/jira/browse/SPARK-47218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yousof Hosny updated SPARK-47218:
---------------------------------
    Summary: XML: Skip rowTag in a comment  (was: XML: Skip rowTag in a comment,)

> XML: Skip rowTag in a comment
> -----------------------------
>
> Key: SPARK-47218
> URL: https://issues.apache.org/jira/browse/SPARK-47218
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Yousof Hosny
> Priority: Major
>
> The following returns rows that were within comments:
> display(spark.read.xml(write(""" 1
> """)))
>
> This has been reported before: [How to Ignore XML comments like this · Issue #208 ·
> databricks/spark-xml|https://github.com/databricks/spark-xml/issues/208]
[jira] [Created] (SPARK-47218) XML: Skip rowTag in a comment,
Yousof Hosny created SPARK-47218:
---------------------------------

Summary: XML: Skip rowTag in a comment,
Key: SPARK-47218
URL: https://issues.apache.org/jira/browse/SPARK-47218
Project: Spark
Issue Type: Sub-task
Components: Spark Core
Affects Versions: 4.0.0
Reporter: Yousof Hosny

The following returns rows that were within comments:
display(spark.read.xml(write(""" 1
""")))

This has been reported before: [How to Ignore XML comments like this · Issue #208 · databricks/spark-xml|https://github.com/databricks/spark-xml/issues/208]
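The behavior requested in SPARK-47218 can be sketched without Spark. The snippet below is an illustrative assumption, not the tokenizer fix itself: `CommentAwareRowScan`, `stripComments`, and `countRows` are hypothetical names, the whole document is held in one string, and comment-like text inside CDATA is not treated specially. It removes XML comments first and then counts the remaining opening row tags.

```scala
import java.util.regex.Pattern

object CommentAwareRowScan {
  // (?s) lets '.' span newlines; the lazy '.*?' stops at the first "-->".
  private val Comment = "(?s)<!--.*?-->".r

  /** Removes every XML comment from the input string. */
  def stripComments(xml: String): String = Comment.replaceAllIn(xml, "")

  /** Counts opening row tags that are not inside a comment.
    * Simplification: only matches the exact form <rowTag>, without attributes.
    */
  def countRows(xml: String, rowTag: String): Int = {
    val startTag = Pattern.quote(s"<$rowTag>").r
    startTag.findAllIn(stripComments(xml)).size
  }
}
```

A streaming tokenizer would skip comments while reading rather than rewrite the input, so this sketch shows only the intended result: a row that appears inside `<!-- ... -->` is not counted.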