Re: [DISCUSS] SPIP: XML data source support

Martin Andersson Wed, 19 Jul 2023 01:51:42 -0700

How much of an effort is it to use the spark-xml library today? What's the 
drawback to keeping this as an external library as-is?

Best Regards, Martin
________________________________
From: Hyukjin Kwon <gurwls...@apache.org>
Sent: Wednesday, July 19, 2023 01:27
To: Sandip Agarwala <sandip.agarw...@databricks.com>
Cc: dev@spark.apache.org <dev@spark.apache.org>
Subject: Re: [DISCUSS] SPIP: XML data source support

Yeah, I support this. XML is a pretty outdated format TBH, but it is still used in
many legacy systems. The Wikipedia dump is one example.

Even if you look at usage stats for CSV vs. XML vs. JSON, some show that XML is
used more than CSV.

On Wed, Jul 19, 2023 at 12:58 AM Sandip Agarwala
<sandip.agarw...@databricks.com> wrote:
Dear Spark community,

I would like to start a discussion on "XML data source support".

XML is a widely used data format. An external spark-xml package 
(https://github.com/databricks/spark-xml) is available to read and write XML 
data in Spark. Making spark-xml built-in will provide a better user experience
for Spark SQL and Structured Streaming. The proposal is to inline code from the
spark-xml package.
I am collaborating with Hyukjin Kwon, who is the original author of spark-xml, 
for this effort.

SPIP link:
https://docs.google.com/document/d/1ZaOBT4-YFtN58UCx2cdFhlsKbie1ugAn-Fgz_Dddz-Q/edit?usp=sharing

JIRA:
https://issues.apache.org/jira/browse/SPARK-44265

Looking forward to your feedback.
Thanks, Sandip
