Shujing Yang created SPARK-46382:
------------------------------------

             Summary: XML: Capture values interspersed between elements
                 Key: SPARK-46382
                 URL: https://issues.apache.org/jira/browse/SPARK-46382
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 4.0.0
            Reporter: Shujing Yang


In XML, an element typically consists of a name and a value, with the value 
enclosed between the opening and closing tags. XML also allows arbitrary 
values to be interspersed between elements. To capture these values, we 
provide an option named `valueTags`, which is enabled by default. Consider 
the following example:

```
<ROW>
  <a>1</a>
  value1
  <b>
    value2
    <c>2</c>
    value3
  </b>
</ROW>
```

In this example, `<a>`, `<b>`, and `<c>` are named elements whose values are 
enclosed within their tags. The arbitrary values `value1`, `value2`, and 
`value3` are interspersed between the elements. Note that a single element can 
contain multiple such values (e.g. `value2` and `value3` in the element 
`<b>`).

 

We should parse the values between tags into the `valueTags` field. If an 
element contains multiple such values, the `valueTags` field will be converted 
to an array type.
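
The intended result can be illustrated with a minimal Python sketch that uses 
only the standard library. This is not Spark's implementation; it only mimics 
the behavior described above. The key name `_VALUE` and the helper 
`parse_element` are hypothetical, chosen for illustration, and all values are 
kept as strings (no type inference).

```python
import xml.etree.ElementTree as ET

# Hypothetical field name for the captured values (an assumption for this sketch).
VALUE_KEY = "_VALUE"


def parse_element(elem):
    """Convert an element to a dict, collecting interspersed text under VALUE_KEY."""
    values = []
    record = {}

    # Text that appears before the first child element.
    if elem.text and elem.text.strip():
        values.append(elem.text.strip())

    for child in elem:
        if len(child) == 0:
            # Leaf element: store its text as a scalar.
            record[child.tag] = (child.text or "").strip()
        else:
            # Nested element: recurse.
            record[child.tag] = parse_element(child)
        # Text that follows this child but still belongs to `elem`.
        if child.tail and child.tail.strip():
            values.append(child.tail.strip())

    # A single value stays a scalar; several values become an array.
    if len(values) == 1:
        record[VALUE_KEY] = values[0]
    elif len(values) > 1:
        record[VALUE_KEY] = values
    return record


xml_doc = """<ROW>
  <a>1</a>
  value1
  <b>
    value2
    <c>2</c>
    value3
  </b>
</ROW>"""

row = parse_element(ET.fromstring(xml_doc))
print(row)
# {'a': '1', 'b': {'c': '2', '_VALUE': ['value2', 'value3']}, '_VALUE': 'value1'}
```

Here `value1` is captured as a scalar on the row, while `value2` and `value3` 
are captured as an array on `<b>` because that element contains more than one 
interspersed value.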



