Hi:

We are trying to parse XML data stored in a table column to produce the output 
below from the given input sample. Can someone suggest a way to pass one 
DataFrame's output into the load() function, or any other alternative that 
gives this output?

Input Data from Oracle Table XMLBlob:

SequenceID | Name    | City      | XMLComment
1          | Amol    | Kolhapur  | <books><Comments><Comment><Title>Title1.1</Title><Description>Description_1.1</Description></Comment><Comment><Title>Title1.2</Title><Description>Description_1.2</Description></Comment><Comment><Title>Title1.3</Title><Description>Description_1.3</Description></Comment></Comments></books>
2          | Suresh  | Mumbai    | <books><Comments><Comment><Title>Title2</Title><Description>Description_2</Description></Comment></Comments></books>
3          | Vishal  | Delhi     | <books><Comments><Comment><Title>Title3</Title><Description>Description_3</Description></Comment></Comments></books>
4          | Swastik | Bangalore | <books><Comments><Comment><Title>Title4</Title><Description>Description_4</Description></Comment></Comments></books>


Output Data Expected using Spark SQL:

SequenceID | Name    | City      | Title    | Description
1          | Amol    | Kolhapur  | Title1.1 | Description_1.1
1          | Amol    | Kolhapur  | Title1.2 | Description_1.2
1          | Amol    | Kolhapur  | Title1.3 | Description_1.3
2          | Suresh  | Mumbai    | Title2   | Description_2
3          | Vishal  | Delhi     | Title3   | Description_3
4          | Swastik | Bangalore | Title4   | Description_4


I am able to parse a single XML file in spark-shell using the approach below, 
which is based on the following example, but how do we apply the same parsing 
to the XML in every row of the table?
https://community.hortonworks.com/questions/71538/parsing-xml-in-spark-rdd.html.

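// Load books.xml with spark-xml, treating each <book> element as one row.
val dfX = sqlContext.read.format("com.databricks.spark.xml").option("rowTag", "book").load("books.xml")

// registerTempTable returns Unit, so there is nothing to assign.
dfX.registerTempTable("books")

dfX.printSchema()

val books_inexp = sqlContext.sql("select title, author from books where price < 10")

books_inexp.show()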
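
To make the per-row requirement concrete, here is a rough sketch of the 
flattening we are after. It does not use spark-xml's load() at all; it assumes 
the XMLComment value is well-formed and parses it with scala.xml (available in 
spark-shell). The names SourceRow, FlatRow and flatten are hypothetical and 
only for illustration.

```scala
import scala.xml.XML

// Hypothetical case classes mirroring the input and expected output tables.
case class SourceRow(sequenceId: Int, name: String, city: String, xmlComment: String)
case class FlatRow(sequenceId: Int, name: String, city: String, title: String, description: String)

// Parse one row's XML string and emit one FlatRow per <Comment> element.
def flatten(row: SourceRow): Seq[FlatRow] = {
  val root = XML.loadString(row.xmlComment) // root element is <books>
  (root \ "Comments" \ "Comment").toList.map { c =>
    FlatRow(row.sequenceId, row.name, row.city, (c \ "Title").text, (c \ "Description").text)
  }
}

// Two sample rows taken from the input table (row 1 written as well-formed XML).
val input = Seq(
  SourceRow(1, "Amol", "Kolhapur",
    "<books><Comments>" +
      "<Comment><Title>Title1.1</Title><Description>Description_1.1</Description></Comment>" +
      "<Comment><Title>Title1.2</Title><Description>Description_1.2</Description></Comment>" +
      "<Comment><Title>Title1.3</Title><Description>Description_1.3</Description></Comment>" +
      "</Comments></books>"),
  SourceRow(2, "Suresh", "Mumbai",
    "<books><Comments><Comment><Title>Title2</Title>" +
      "<Description>Description_2</Description></Comment></Comments></books>")
)

// flatMap turns each input row into zero or more output rows.
val output = input.flatMap(flatten)
output.foreach(println)
```

If this shape is right, the same flatten function could presumably be applied 
on the cluster through a flatMap over the rows read from Oracle (for example 
via df.rdd.flatMap after mapping each Row to SourceRow), but we have not 
verified that.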


Regards,
Amol