Hi Arunkumar,
I guess your records are self-closing ones. There is an issue open here:
https://github.com/databricks/spark-xml/issues/92

This is about XmlInputFormat.scala, and it seems a bit tricky to handle this case, so I have left it open until now.

Thanks!

2016-05-13 5:03 GMT+09:00 Arunkumar Chandrasekar <chan.arunku...@gmail.com>:
> Hello,
>
> Greetings.
>
> I'm trying to process an XML file exported from the HealthKit application
> using Spark SQL, for learning purposes. The sample record data looks like
> this:
>
> <Record type="HKQuantityTypeIdentifierStepCount" sourceName="Vizhi"
> sourceVersion="9.3" device="<<HKDevice: 0x7896>, name:iPhone,
> manufacturer:Apple, model:iPhone, hardware:iPhone7,2, software:9.3>"
> unit="count" creationDate="2016-04-23 19:31:33 +0530" startDate="2016-04-23
> 19:00:20 +0530" endDate="2016-04-23 19:01:41 +0530" value="31"/>
>
> <Record type="HKQuantityTypeIdentifierStepCount" sourceName="Vizhi"
> sourceVersion="9.3.1" device="<<HKDevice: 0x85746>, name:iPhone,
> manufacturer:Apple, model:iPhone, hardware:iPhone7,2, software:9.3.1>"
> unit="count" creationDate="2016-04-24 05:45:00 +0530" startDate="2016-04-24
> 05:25:04 +0530" endDate="2016-04-24 05:25:24 +0530" value="10"/>
>
> I want the column names of my table to be the attribute names, like type,
> sourceName, and sourceVersion, and the row entries to be their respective
> values, like HKQuantityTypeIdentifierStepCount, Vizhi, 9.3.1, ...
>
> I took a look at Spark-XML <https://github.com/databricks/spark-xml>,
> but didn't find anything covering my case (my XML is not well-formed with
> the tags). Is there any other option to convert the records shown above
> into a schema format for use with Spark SQL?
>
> Thanks in advance.
>
> *Thank You*,
> Arun Chandrasekar
> chan.arunku...@gmail.com
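Until that issue is resolved, one possible workaround (just a sketch of my own, not part of spark-xml) is to recover the attributes yourself. The element is not well-formed XML because the device attribute contains unescaped `<` and `>`, so strict parsers reject it; but since every attribute value is quote-delimited, a simple `key="value"` regex can still pull the pairs out of each `<Record .../>` line. The `parse_record` helper below is a hypothetical name; the resulting dicts could then be handed to something like `spark.createDataFrame` to get a queryable table.

```python
import re

# One of the sample <Record .../> elements (shortened). The device
# attribute contains unescaped '<' and '>', so xml.etree would fail on it.
record = ('<Record type="HKQuantityTypeIdentifierStepCount" sourceName="Vizhi" '
          'sourceVersion="9.3" device="<<HKDevice: 0x7896>, name:iPhone>" '
          'unit="count" value="31"/>')

# Attribute values are delimited by double quotes and contain no quotes
# themselves, so this pattern recovers every name="value" pair even
# though the element is not valid XML.
ATTR = re.compile(r'(\w+)="([^"]*)"')

def parse_record(line):
    """Return the attributes of one <Record .../> line as a dict."""
    return dict(ATTR.findall(line))

row = parse_record(record)
print(row["type"])   # HKQuantityTypeIdentifierStepCount
print(row["value"])  # 31
```

Applied line by line over the exported file, this yields one dict per Record, with the attribute names (type, sourceName, sourceVersion, ...) as keys, which matches the column layout you described.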