How much of an effort is it to use the spark-xml library today? What's the drawback to keeping this as an external library as-is?
Best Regards,
Martin

________________________________
From: Hyukjin Kwon <gurwls...@apache.org>
Sent: Wednesday, July 19, 2023 01:27
To: Sandip Agarwala <sandip.agarw...@databricks.com>
Cc: dev@spark.apache.org <dev@spark.apache.org>
Subject: Re: [DISCUSS] SPIP: XML data source support

Yeah, I support this. XML is a pretty outdated format TBH, but it is still used in many legacy systems. For example, the Wikipedia dump is one case. Even when you look at usage stats for CSV vs XML vs JSON, some show that XML is used more than CSV.

On Wed, Jul 19, 2023 at 12:58 AM Sandip Agarwala <sandip.agarw...@databricks.com> wrote:

Dear Spark community,

I would like to start a discussion on "XML data source support".

XML is a widely used data format. An external spark-xml package (https://github.com/databricks/spark-xml) is available to read and write XML data in Spark. Making spark-xml built-in will provide a better user experience for Spark SQL and Structured Streaming. The proposal is to inline code from the spark-xml package. I am collaborating with Hyukjin Kwon, who is the original author of spark-xml, for this effort.

SPIP link: https://docs.google.com/document/d/1ZaOBT4-YFtN58UCx2cdFhlsKbie1ugAn-Fgz_Dddz-Q/edit?usp=sharing
JIRA: https://issues.apache.org/jira/browse/SPARK-44265

Looking forward to your feedback.

Thanks,
Sandip