zhengruifeng opened a new pull request, #47355:
URL: https://github.com/apache/spark/pull/47355

   ### What changes were proposed in this pull request?
   Make `from_xml` accept a `StructType` schema in Spark Connect.
   
   ### Why are the changes needed?
   A `StructType` schema was already supported by `from_xml` in Spark Classic, but not in Spark Connect.
   
   This addresses https://github.com/apache/spark/pull/43680#discussion_r1385332357
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   before:
   ```
   from pyspark.sql.types import StructType, LongType
   import pyspark.sql.functions as sf
   data = [(1, '''<p><a>1</a></p>''')]
   df = spark.createDataFrame(data, ("key", "value"))
   
   schema = StructType().add("a", LongType())
   df.select(sf.from_xml(df.value, schema)).show()
   
   ---------------------------------------------------------------------------
   AnalysisException                         Traceback (most recent call last)
   Cell In[1], line 7
   ...
   AnalysisException: [PARSE_SYNTAX_ERROR] Syntax error at or near '{'. SQLSTATE: 42601
   
   JVM stacktrace:
   org.apache.spark.sql.AnalysisException
        at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(parsers.scala:278)
        at org.apache.spark.sql.catalyst.parser.AbstractParser.parse(parsers.scala:98)
        at org.apache.spark.sql.catalyst.parser.AbstractParser.parseDataType(parsers.scala:40)
        at org.apache.spark.sql.types.DataType$.$anonfun$fromDDL$1(DataType.scala:126)
        at org.apache.spark.sql.types.DataType$.parseTypeWithFallback(DataType.scala:145)
        at org.apache.spark.sql.types.DataType$.fromDDL(DataType.scala:127)
   ```
   
   after:
   ```
   +---------------+
   |from_xml(value)|
   +---------------+
   |            {1}|
   +---------------+
   
   ```
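
One plausible reading of the "before" trace: the schema reached `DataType.fromDDL` as a string starting with `{`, which is consistent with a JSON-encoded `StructType` rather than a DDL string such as `a BIGINT`. The sketch below is a toy, Spark-free illustration of that distinction, not the code in this PR. The helper `looks_like_json_schema` and the hand-written `schema_json` value are hypothetical and exist only for this example.

```python
import json

# Hypothetical stand-in for a JSON-encoded struct schema with one long field "a".
# It is written by hand so the snippet runs without a Spark installation.
schema_json = json.dumps(
    {
        "type": "struct",
        "fields": [
            {"name": "a", "type": "long", "nullable": True, "metadata": {}}
        ],
    }
)

# The same schema written as a DDL string, the form a DDL parser expects.
schema_ddl = "a BIGINT"


def looks_like_json_schema(schema: str) -> bool:
    """Return True if `schema` parses as a JSON object, False otherwise."""
    try:
        return isinstance(json.loads(schema), dict)
    except json.JSONDecodeError:
        return False


# A JSON-encoded schema begins with '{', the token named in the syntax error.
print(schema_json[0], looks_like_json_schema(schema_json))
print(schema_ddl[0], looks_like_json_schema(schema_ddl))
```

A receiver that wants to accept both forms could branch on a check like this before choosing a parser. Whether this PR does so is an assumption, not something stated in the description.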
   
   ### How was this patch tested?
   Added a doctest.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   no


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

