[
https://issues.apache.org/jira/browse/SPARK-49865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18063665#comment-18063665
]
Holden Karau commented on SPARK-49865:
--------------------------------------
I guess the question here is whether we should implement it client side (I
don't think so) or add another RPC for Spark Connect. Is that what you're
suggesting, [~oschistad]?
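
To make the client-side option concrete, here is a rough sketch of what it could involve, using only the Python standard library. Everything in it is an assumption for illustration: the helper name xsd_to_ddl, the type mapping, and the restriction to flat xs:sequence definitions are hypothetical and are not part of Spark or PySpark. It does not reproduce the behaviour of the Scala XSDToSchema, which is one reason a client-side reimplementation is unattractive.

```python
import xml.etree.ElementTree as ET

# Namespace prefix that ElementTree uses for XML Schema elements.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical and deliberately incomplete mapping from XSD built-in
# types to Spark DDL type names (an assumption, not Spark's own table).
XSD_TO_DDL = {
    "string": "STRING",
    "int": "INT",
    "long": "BIGINT",
    "double": "DOUBLE",
    "boolean": "BOOLEAN",
    "date": "DATE",
}


def xsd_to_ddl(xsd_text: str) -> str:
    """Turn a flat XSD into a DDL string such as 'a STRING, b INT'.

    Only xs:element nodes directly inside an xs:sequence are read.
    Nested complex types, attributes, and occurrence constraints are
    ignored, and unknown or missing types fall back to STRING.
    """
    root = ET.fromstring(xsd_text)
    fields = []
    for sequence in root.iter(XS + "sequence"):
        for element in sequence.findall(XS + "element"):
            # Drop the namespace prefix, e.g. "xs:int" -> "int".
            local_type = element.get("type", "xs:string").split(":")[-1]
            ddl_type = XSD_TO_DDL.get(local_type, "STRING")
            fields.append(f"{element.get('name')} {ddl_type}")
    return ", ".join(fields)


EXAMPLE_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="pages" type="xs:int"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

print(xsd_to_ddl(EXAMPLE_XSD))
```

The resulting DDL string could then be passed to spark.read.schema(...), which accepts DDL strings and needs no SparkContext, but every XSD feature beyond this toy case would have to be reimplemented and kept in sync with the JVM parser.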
> Extend Spark XSD parser into PySpark
> ------------------------------------
>
> Key: SPARK-49865
> URL: https://issues.apache.org/jira/browse/SPARK-49865
> Project: Spark
> Issue Type: Wish
> Components: PySpark
> Affects Versions: 3.5.3
> Reporter: Ole André Schistad
> Priority: Major
>
> While PySpark does support XML parsing, it lacks a native XSD parser.
> Instead, you have to access the native Scala method {{XSDToSchema}} via the
> Spark Context.
> This does work, but only if you are able to access the SparkContext object.
> This is going to be a problem for users on Databricks, where Shared Clusters
> are required for many important features such as row-level security, since
> the Spark Context is not available from PySpark in this environment.
> So I would like to request that the Apache Spark project implements a PySpark
> wrapper for the XSDToSchema Scala method. This should be a fairly quick and
> easy fix, as it is functionally the same as the existing PySpark function
> {{from_ddl_schema}}, which is already implemented in PySpark:
> [https://github.com/apache/spark/blob/master/python/pyspark/sql/types.py:1903:]
>
> {code:java}
> def from_ddl_schema(type_str: str) -> DataType:
>     return _parse_datatype_json_string(
>         cast(JVMView, sc._jvm)
>         .org.apache.spark.sql.types.StructType.fromDDL(type_str)
>         .json()
>     )
> {code}
> And so an equivalent function for XSD might be as simple as:
> {code:java}
> def from_xsd_schema(type_str: str) -> DataType:
>     return _parse_datatype_json_string(
>         cast(JVMView, sc._jvm)
>         .org.apache.spark.sql.execution.datasources.xml
>         .XSDToSchema.read(type_str)
>         .json()
>     )
> {code}
>