[jira] [Commented] (SPARK-47892) XML: Stop ignoring CDATA within rows.

2024-04-30 Thread Yousof Hosny (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842481#comment-17842481
 ] 

Yousof Hosny commented on SPARK-47892:
--

This change was implemented as a follow-up under SPARK-47371:
https://issues.apache.org/jira/browse/SPARK-47371
Resolving this issue as a duplicate. 

> XML: Stop ignoring CDATA within rows. 
> --
>
> Key: SPARK-47892
> URL: https://issues.apache.org/jira/browse/SPARK-47892
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yousof Hosny
>Priority: Major
> Fix For: 4.0.0
>
>
> The earlier change (linked below) ignores CDATA inside row tags as well as 
> outside them. We should only ignore CDATA found outside row tags, because 
> CDATA inside a row tag is part of that row's data.
> [https://github.com/apache/spark/pull/45487]
>  
> NOTE: With the current parser implementation, once CDATA within row tags is 
> no longer ignored, one edge case remains: a matching closing row tag inside 
> CDATA will be parsed as a valid end tag. 
> Example:
> {code:java}
>   {code}
> Once CDATA within rows is no longer ignored, the parser will match the 
> closing tag in the example above, which is incorrect. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47892) XML: Stop ignoring CDATA within rows.

2024-04-30 Thread Yousof Hosny (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yousof Hosny resolved SPARK-47892.
--
Resolution: Duplicate







[jira] [Resolved] (SPARK-47275) XML: Change to not support DROPMALFORMED parse mode

2024-04-30 Thread Yousof Hosny (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yousof Hosny resolved SPARK-47275.
--
Resolution: Duplicate

> XML: Change to not support DROPMALFORMED parse mode
> ---
>
> Key: SPARK-47275
> URL: https://issues.apache.org/jira/browse/SPARK-47275
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Yousof Hosny
>Priority: Minor
>
> Change XML expressions to not support the DROPMALFORMED parse mode. This 
> matches JSON expressions, which also do not support it. 






[jira] [Commented] (SPARK-47275) XML: Change to not support DROPMALFORMED parse mode

2024-04-30 Thread Yousof Hosny (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842480#comment-17842480
 ] 

Yousof Hosny commented on SPARK-47275:
--

[~HF] Yes, that is correct. Thanks for the notice. Closed as duplicate. 







[jira] [Created] (SPARK-47892) XML: Stop ignoring CDATA within rows.

2024-04-17 Thread Yousof Hosny (Jira)
Yousof Hosny created SPARK-47892:


 Summary: XML: Stop ignoring CDATA within rows. 
 Key: SPARK-47892
 URL: https://issues.apache.org/jira/browse/SPARK-47892
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Yousof Hosny
 Fix For: 4.0.0


The earlier change (linked below) ignores CDATA inside row tags as well as 
outside them. We should only ignore CDATA found outside row tags, because CDATA 
inside a row tag is part of that row's data.
[https://github.com/apache/spark/pull/45487]

NOTE: With the current parser implementation, once CDATA within row tags is no 
longer ignored, one edge case remains: a matching closing row tag inside CDATA 
will be parsed as a valid end tag. 
Example:
{code:java}
  {code}
Once CDATA within rows is no longer ignored, the parser will match the closing 
tag in the example above, which is incorrect. 
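
The edge case in the NOTE above can be illustrated outside Spark. The following is a minimal Python sketch, not the Spark tokenizer: the `ROW` tag name, the sample string, and both helper functions are hypothetical and exist only for illustration. A plain substring search for the closing row tag stops inside the CDATA section, while a scan that steps over CDATA sections reaches the real end of the row.

```python
CDATA_START = "<![CDATA["
CDATA_END = "]]>"


def naive_row_end(xml: str, row_tag: str) -> int:
    """Return the index just past the first closing row tag, ignoring CDATA."""
    end_tag = f"</{row_tag}>"
    pos = xml.find(end_tag)
    return -1 if pos == -1 else pos + len(end_tag)


def cdata_aware_row_end(xml: str, row_tag: str) -> int:
    """Return the index just past the first closing row tag outside CDATA."""
    end_tag = f"</{row_tag}>"
    i = 0
    while i < len(xml):
        if xml.startswith(CDATA_START, i):
            # Step over the whole CDATA section: its content is character
            # data, so anything that looks like a tag inside it is not markup.
            close = xml.find(CDATA_END, i + len(CDATA_START))
            if close == -1:
                return -1  # unterminated CDATA, so no valid end tag follows
            i = close + len(CDATA_END)
        elif xml.startswith(end_tag, i):
            return i + len(end_tag)
        else:
            i += 1
    return -1


# Hypothetical row whose CDATA content contains text that looks like an end tag.
row = "<ROW><a><![CDATA[text </ROW> more]]></a></ROW>"

naive = row[:naive_row_end(row, "ROW")]
aware = row[:cdata_aware_row_end(row, "ROW")]
print(naive)         # the row is cut off inside the CDATA section
print(aware == row)  # the CDATA-aware scan returns the whole row
```

The sketch only handles closing tags written exactly as `</ROW>`; whitespace before `>`, comments, and processing instructions are left out to keep it short.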






[jira] [Created] (SPARK-47371) XML: Ignore row tags in CDATA

2024-03-12 Thread Yousof Hosny (Jira)
Yousof Hosny created SPARK-47371:


 Summary: XML: Ignore row tags in CDATA
 Key: SPARK-47371
 URL: https://issues.apache.org/jira/browse/SPARK-47371
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Yousof Hosny


The current parser does not recognize CDATA sections and thus will read row 
tags that are enclosed within a CDATA section. The expected behavior is for 
none of the following rows to be read, but they are all read. 
{code:java}
// BUG:  rowTag in CDATA section
val xmlString="""


{code}
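
The expected behavior described above can be illustrated outside Spark. The following is a minimal Python sketch, not the Spark tokenizer. The sample document, the `ROW` tag name, and the `find_row_starts` helper are all assumptions made for illustration; the sample is not the example from the original report. With `skip_cdata=False` the scan reports row tags that sit inside CDATA sections; with `skip_cdata=True` it reports none of them.

```python
import re
from typing import List

CDATA_START = "<![CDATA["
CDATA_END = "]]>"


def find_row_starts(xml: str, row_tag: str, skip_cdata: bool) -> List[int]:
    """Return offsets of opening row tags, optionally skipping CDATA sections."""
    # Matches "<ROW>", "<ROW attr=...>" and "<ROW/>", but not "<ROWSET>".
    start_tag = re.compile(rf"<{re.escape(row_tag)}(?=[\s/>])")
    offsets = []
    i = 0
    while i < len(xml):
        if skip_cdata and xml.startswith(CDATA_START, i):
            close = xml.find(CDATA_END, i + len(CDATA_START))
            if close == -1:
                break  # unterminated CDATA: nothing after it is markup
            i = close + len(CDATA_END)
        elif start_tag.match(xml, i):
            offsets.append(i)
            i += 1
        else:
            i += 1
    return offsets


# Hypothetical input: every row tag is enclosed in a CDATA section.
xml_string = (
    "<ROWSET>"
    "<![CDATA[<ROW><a>1</a></ROW>]]>"
    "<![CDATA[<ROW><a>2</a></ROW>]]>"
    "</ROWSET>"
)

print(len(find_row_starts(xml_string, "ROW", skip_cdata=False)))  # rows read by a CDATA-unaware scan
print(len(find_row_starts(xml_string, "ROW", skip_cdata=True)))   # rows read once CDATA is skipped
```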
 






[jira] [Updated] (SPARK-47371) XML: Ignore row tags in CDATA Tokenizer

2024-03-12 Thread Yousof Hosny (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yousof Hosny updated SPARK-47371:
-
Summary: XML: Ignore row tags in CDATA Tokenizer  (was: XML: Ignore row 
tags in CDATA)

> XML: Ignore row tags in CDATA Tokenizer
> ---
>
> Key: SPARK-47371
> URL: https://issues.apache.org/jira/browse/SPARK-47371
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yousof Hosny
>Priority: Minor
>
> The current parser does not recognize CDATA sections and thus will read row 
> tags that are enclosed within a CDATA section. The expected behavior is for 
> none of the following rows to be read, but they are all read. 
> {code:java}
> // BUG:  rowTag in CDATA section
> val xmlString="""
> 
> 
> {code}
>  






[jira] [Updated] (SPARK-47345) XML: Add XmlFunctionsSuite

2024-03-11 Thread Yousof Hosny (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yousof Hosny updated SPARK-47345:
-
Description: Convert JsonFunctionsSuite.scala to its XML equivalent. Note that 
XML doesn’t implement all JSON functions, such as {{json_tuple}}, 
{{get_json_object}}, etc.

> XML: Add XmlFunctionsSuite
> --
>
> Key: SPARK-47345
> URL: https://issues.apache.org/jira/browse/SPARK-47345
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yousof Hosny
>Priority: Minor
>
> Convert JsonFunctionsSuite.scala to its XML equivalent. Note that XML doesn’t 
> implement all JSON functions, such as {{json_tuple}}, 
> {{get_json_object}}, etc.






[jira] [Updated] (SPARK-47345) XML: Add XmlFunctionsSuite

2024-03-11 Thread Yousof Hosny (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yousof Hosny updated SPARK-47345:
-
Summary: XML: Add XmlFunctionsSuite  (was: Add XmlFunctionsSuite)

> XML: Add XmlFunctionsSuite
> --
>
> Key: SPARK-47345
> URL: https://issues.apache.org/jira/browse/SPARK-47345
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yousof Hosny
>Priority: Minor
>







[jira] [Created] (SPARK-47345) Add XmlFunctionsSuite

2024-03-11 Thread Yousof Hosny (Jira)
Yousof Hosny created SPARK-47345:


 Summary: Add XmlFunctionsSuite
 Key: SPARK-47345
 URL: https://issues.apache.org/jira/browse/SPARK-47345
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Yousof Hosny









[jira] [Created] (SPARK-47330) XML: Add XmlExpressionsSuite

2024-03-08 Thread Yousof Hosny (Jira)
Yousof Hosny created SPARK-47330:


 Summary: XML: Add XmlExpressionsSuite 
 Key: SPARK-47330
 URL: https://issues.apache.org/jira/browse/SPARK-47330
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Yousof Hosny


Convert JsonExpressionsSuite.scala to its XML equivalent. Note that XML doesn’t 
implement all JSON functions, such as {{json_tuple}}, {{get_json_object}}, 
etc.






[jira] [Created] (SPARK-47275) XML: Change to not support DROPMALFORMED parse mode

2024-03-04 Thread Yousof Hosny (Jira)
Yousof Hosny created SPARK-47275:


 Summary: XML: Change to not support DROPMALFORMED parse mode
 Key: SPARK-47275
 URL: https://issues.apache.org/jira/browse/SPARK-47275
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Yousof Hosny


Change XML expressions to not support the DROPMALFORMED parse mode. This 
matches JSON expressions, which also do not support it. 
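
A minimal Python sketch of the kind of check this implies, not Spark's implementation. The helper name `resolve_expression_mode`, the option dictionary, and the error messages are hypothetical; the three mode names and the PERMISSIVE default follow Spark's documented parse modes.

```python
ALL_MODES = ("PERMISSIVE", "DROPMALFORMED", "FAILFAST")
# Modes that an expression such as from_xml or from_json would accept.
EXPRESSION_MODES = ("PERMISSIVE", "FAILFAST")


def resolve_expression_mode(options: dict, function_name: str) -> str:
    """Validate the "mode" option for an expression and return it upper-cased."""
    mode = str(options.get("mode", "PERMISSIVE")).upper()
    if mode not in ALL_MODES:
        raise ValueError(f"Unknown parse mode: {mode}")
    if mode not in EXPRESSION_MODES:
        # DROPMALFORMED is a valid mode for file reads, but it is rejected here.
        raise ValueError(
            f"{function_name}() does not support the {mode} mode; "
            f"acceptable modes are {' and '.join(EXPRESSION_MODES)}"
        )
    return mode


print(resolve_expression_mode({}, "from_xml"))
print(resolve_expression_mode({"mode": "failfast"}, "from_xml"))
try:
    resolve_expression_mode({"mode": "DROPMALFORMED"}, "from_xml")
except ValueError as err:
    print(err)
```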






[jira] [Updated] (SPARK-47218) XML: Ignore commented Row Tags in XML tokenizer

2024-02-28 Thread Yousof Hosny (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yousof Hosny updated SPARK-47218:
-
Summary: XML: Ignore commented Row Tags in XML tokenizer  (was: XML: Ignore 
commented row tags in XML tokenizer)

> XML: Ignore commented Row Tags in XML tokenizer
> ---
>
> Key: SPARK-47218
> URL: https://issues.apache.org/jira/browse/SPARK-47218
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Yousof Hosny
>Priority: Major
>
> The following returns rows that were within comments:
> {code:java}
> // BUG: rowTag in comment -- incorrectly processed 
> display(spark.read.xml(write(""" 1 
>  """))){code}
> This has been reported before: [How to Ignore XML comments like this · 
> Issue #208 · 
> databricks/spark-xml|https://github.com/databricks/spark-xml/issues/208]






[jira] [Updated] (SPARK-47218) XML: Ignore commented row tags in XML tokenizer

2024-02-28 Thread Yousof Hosny (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yousof Hosny updated SPARK-47218:
-
Summary: XML: Ignore commented row tags in XML tokenizer  (was: XML: Skip 
rowTag in a comment)

> XML: Ignore commented row tags in XML tokenizer
> ---
>
> Key: SPARK-47218
> URL: https://issues.apache.org/jira/browse/SPARK-47218
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Yousof Hosny
>Priority: Major
>
> The following returns rows that were within comments:
> {code:java}
> // BUG: rowTag in comment -- incorrectly processed 
> display(spark.read.xml(write(""" 1 
>  """))){code}
> This has been reported before: [How to Ignore XML comments like this · 
> Issue #208 · 
> databricks/spark-xml|https://github.com/databricks/spark-xml/issues/208]






[jira] [Updated] (SPARK-47218) XML: Skip rowTag in a comment

2024-02-28 Thread Yousof Hosny (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yousof Hosny updated SPARK-47218:
-
Description: 
The following returns rows that were within comments:
```

{{// BUG: rowTag in comment -- incorrectly processed
display(spark.read.xml(write(""" 1 
 """)))}}

```

This has been reported before: [How to Ignore XML comments like this · 
Issue #208 · 
databricks/spark-xml|https://github.com/databricks/spark-xml/issues/208]

  was:
The following returns rows that were within comments:
display(spark.read.xml(write(""" 1 
 """)))
 
This has been reported before: [How to Ignore XML comments like this · 
Issue #208 · 
databricks/spark-xml|https://github.com/databricks/spark-xml/issues/208]


> XML: Skip rowTag in a comment
> -
>
> Key: SPARK-47218
> URL: https://issues.apache.org/jira/browse/SPARK-47218
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Yousof Hosny
>Priority: Major
>
> The following returns rows that were within comments:
> ```
> {{// BUG: rowTag in comment -- incorrectly processed
> display(spark.read.xml(write(""" 1 
>  """)))}}
> ```
> This has been reported before: [How to Ignore XML comments like this · 
> Issue #208 · 
> databricks/spark-xml|https://github.com/databricks/spark-xml/issues/208]






[jira] [Updated] (SPARK-47218) XML: Skip rowTag in a comment

2024-02-28 Thread Yousof Hosny (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yousof Hosny updated SPARK-47218:
-
Description: 
The following returns rows that were within comments:

{code:java}
// BUG: rowTag in comment -- incorrectly processed 
display(spark.read.xml(write(""" 1 
 """))){code}

This has been reported before: [How to Ignore XML comments like this · 
Issue #208 · 
databricks/spark-xml|https://github.com/databricks/spark-xml/issues/208]

  was:
The following returns rows that were within comments:
```

{{// BUG: rowTag in comment -- incorrectly processed
display(spark.read.xml(write(""" 1 
 """)))}}

```

This has been reported before: [How to Ignore XML comments like this · 
Issue #208 · 
databricks/spark-xml|https://github.com/databricks/spark-xml/issues/208]


> XML: Skip rowTag in a comment
> -
>
> Key: SPARK-47218
> URL: https://issues.apache.org/jira/browse/SPARK-47218
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Yousof Hosny
>Priority: Major
>
> The following returns rows that were within comments:
> {code:java}
> // BUG: rowTag in comment -- incorrectly processed 
> display(spark.read.xml(write(""" 1 
>  """))){code}
> This has been reported before: [How to Ignore XML comments like this · 
> Issue #208 · 
> databricks/spark-xml|https://github.com/databricks/spark-xml/issues/208]






[jira] [Updated] (SPARK-47218) XML: Skip rowTag in a comment

2024-02-28 Thread Yousof Hosny (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yousof Hosny updated SPARK-47218:
-
Summary: XML: Skip rowTag in a comment  (was: XML: Skip rowTag in a 
comment,)

> XML: Skip rowTag in a comment
> -
>
> Key: SPARK-47218
> URL: https://issues.apache.org/jira/browse/SPARK-47218
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Yousof Hosny
>Priority: Major
>
> The following returns rows that were within comments:
> display(spark.read.xml(write(""" 1 
>  """)))
>  
> This has been reported before: [How to Ignore XML comments like this · 
> Issue #208 · 
> databricks/spark-xml|https://github.com/databricks/spark-xml/issues/208]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47218) XML: Skip rowTag in a comment,

2024-02-28 Thread Yousof Hosny (Jira)
Yousof Hosny created SPARK-47218:


 Summary: XML: Skip rowTag in a comment,
 Key: SPARK-47218
 URL: https://issues.apache.org/jira/browse/SPARK-47218
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Yousof Hosny


The following returns rows that were within comments:
display(spark.read.xml(write(""" 1 
 """)))
 
This has been reported before: [How to Ignore XML comments like this · 
Issue #208 · 
databricks/spark-xml|https://github.com/databricks/spark-xml/issues/208]
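
The behavior being asked for can be illustrated outside Spark. The following is a minimal Python sketch, not the Spark XML tokenizer: the sample document, the `ROW` tag name, and the `extract_rows` helper are hypothetical. The scan steps over `<!-- ... -->` comments, so a row tag that appears only inside a comment never produces a row.

```python
from typing import List

COMMENT_START = "<!--"
COMMENT_END = "-->"


def extract_rows(xml: str, row_tag: str) -> List[str]:
    """Return the text of each row element that does not start inside a comment."""
    start_tag = f"<{row_tag}>"
    end_tag = f"</{row_tag}>"
    rows = []
    i = 0
    while i < len(xml):
        if xml.startswith(COMMENT_START, i):
            # Step over the whole comment, including any row tags inside it.
            close = xml.find(COMMENT_END, i + len(COMMENT_START))
            if close == -1:
                break  # unterminated comment: nothing after it is markup
            i = close + len(COMMENT_END)
        elif xml.startswith(start_tag, i):
            end = xml.find(end_tag, i)
            if end == -1:
                break  # row never closes
            rows.append(xml[i:end + len(end_tag)])
            i = end + len(end_tag)
        else:
            i += 1
    return rows


# Hypothetical input: one commented-out row followed by one real row.
doc = "<ROWS><!-- <ROW><a>0</a></ROW> --><ROW><a>1</a></ROW></ROWS>"
print(extract_rows(doc, "ROW"))
```

To stay short, the sketch assumes row tags without attributes and does not handle comments or CDATA sections that appear inside a row.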


