[ 
https://issues.apache.org/jira/browse/SPARK-49865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18063665#comment-18063665
 ] 

Holden Karau commented on SPARK-49865:
--------------------------------------

I guess the question here is whether we should implement it client side (I 
don't think so) or add another RPC for Spark Connect. Is that what you're 
suggesting, [~oschistad]?
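
For context on the client-side option: it would mean re-implementing the XSD-to-schema mapping in Python instead of calling {{XSDToSchema}} on the JVM. The following is only a minimal sketch of that idea, not Spark's actual behavior. The function name xsd_to_ddl and the type table XSD_TO_DDL are hypothetical, and it assumes a flat XSD whose elements all use a handful of built-in types. Everything a real XSD can contain beyond that (nested complex types, attributes, occurrence constraints, imports) is left out, which hints at how much logic a client-side port would duplicate.

```python
import xml.etree.ElementTree as ET

# Namespace prefix that ElementTree attaches to XSD tags.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical and deliberately incomplete mapping from XSD built-in
# types to Spark SQL DDL type names (an assumption for this sketch).
XSD_TO_DDL = {
    "string": "STRING",
    "boolean": "BOOLEAN",
    "int": "INT",
    "long": "BIGINT",
    "double": "DOUBLE",
    "date": "DATE",
}


def xsd_to_ddl(xsd_text: str) -> str:
    """Turn a flat XSD into a DDL-style schema string.

    Only elements with a simple built-in ``type`` attribute are handled.
    """
    root = ET.fromstring(xsd_text)
    fields = []
    for elem in root.iter(f"{XS}element"):
        xsd_type = elem.get("type")
        if xsd_type is None:
            # Wrapper element that declares an inline complexType; skip it.
            continue
        local_name = xsd_type.split(":")[-1]  # drop the "xs:" prefix
        if local_name not in XSD_TO_DDL:
            raise ValueError(f"unsupported XSD type: {xsd_type}")
        fields.append(f"{elem.get('name')} {XSD_TO_DDL[local_name]}")
    return ", ".join(fields)


SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="pages" type="xs:int"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

print(xsd_to_ddl(SAMPLE_XSD))  # title STRING, pages INT
```

A DDL string like this can be passed wherever PySpark accepts a DDL-formatted schema, so a sketch of this kind needs no SparkContext, but it would have to be kept in sync with the Scala parser by hand.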

> Extend Spark XSD parser into PySpark
> ------------------------------------
>
>                 Key: SPARK-49865
>                 URL: https://issues.apache.org/jira/browse/SPARK-49865
>             Project: Spark
>          Issue Type: Wish
>          Components: PySpark
>    Affects Versions: 3.5.3
>            Reporter: Ole André Schistad
>            Priority: Major
>
> While PySpark does support XML parsing, it lacks a native XSD parser. 
> Instead, you have to access the native Scala method {{XSDToSchema}} via the 
> Spark Context.
> This does work, but only if you are able to access the SparkContext object.  
> This is going to be a problem for users on Databricks, where Shared Clusters 
> are required for many important features such as row-level security, since 
> the Spark Context is not available from PySpark in this environment.
> So I would like to request that the Apache Spark project implement a PySpark 
> wrapper for the XSDToSchema Scala method. This should be a fairly quick and 
> easy fix, as it is functionally the same as the existing PySpark function 
> {{from_ddl_schema}}.
> This is implemented in PySpark already: 
> [https://github.com/apache/spark/blob/master/python/pyspark/sql/types.py:1903:]
>  
> {code:python}
> def from_ddl_schema(type_str: str) -> DataType:
>     return _parse_datatype_json_string(
>         cast(JVMView, sc._jvm)
>         .org.apache.spark.sql.types.StructType.fromDDL(type_str)
>         .json()
>     )
> {code}
> And so an equivalent function for XSD might be as simple as:
> {code:python}
> def from_xsd_schema(type_str: str) -> DataType:
>     return _parse_datatype_json_string(
>         cast(JVMView, sc._jvm)
>         .org.apache.spark.sql.execution.datasources.xml
>         .XSDToSchema.read(type_str)
>         .json()
>     )
> {code}
>  



