This is more a question about spark-xml, which is not part of Spark.
You can ask at https://github.com/databricks/spark-xml/ but if you do,
please show an example of the XML input, the schema, and the output.
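
For reference, a minimal reproduction might look roughly like the sketch
below. Everything in it apart from the "Book" row tag is an assumption made
for illustration: the sample record, the Title/Authors/Author element names,
and the schema are hypothetical and not taken from the original message. The
XmlReader builder methods are the ones I recall from spark-xml releases around
mid-2020, so check them against the version you use. The point it illustrates
is that spark-xml generally maps a wrapper element to a struct and the
repeated child element inside it to the array; a schema that declares the
wrapper itself as the array is one possible cause of a null field.

```scala
import com.databricks.spark.xml.XmlReader
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types._

object XmlReproSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("spark-xml-repro")
      .getOrCreate()

    // Hypothetical input: one XML record per RDD element.
    val sample =
      """<Book>
        |  <Title>Example</Title>
        |  <Authors>
        |    <Author>A. Writer</Author>
        |    <Author>B. Writer</Author>
        |  </Authors>
        |</Book>""".stripMargin

    // Hypothetical schema: the wrapper <Authors> is a struct, and the
    // repeated child <Author> is the array field inside it.
    val schema = StructType(Seq(
      StructField("Title", StringType),
      StructField("Authors", StructType(Seq(
        StructField("Author", ArrayType(StringType))
      )))
    ))

    val rdd: RDD[String] = spark.sparkContext.parallelize(Seq(sample))

    // Parse the strings against the explicit schema.
    val df: DataFrame = new XmlReader()
      .withRowTag("Book")
      .withSchema(schema)
      .xmlRdd(spark, rdd)

    // Schema and parsed rows: the output to include in a report.
    df.printSchema()
    df.show(false)

    spark.stop()
  }
}
```

Posting the equivalent of these three pieces (one input record, the schema,
and the printed result) makes it much easier to see where the array is lost.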
On Tue, Jun 30, 2020 at 11:39 AM mars76 wrote:
>
> Hi,
>
> I am trying to read XML data from a Kafka topic and use XmlReader to
> convert the RDD[String] into a DataFrame conforming to a predefined schema.
>
> One issue I am running into: after saving the final DataFrame in Avro
> format, most of the elements' data shows up in the Avro files. However, the
> nested element, which is of array type, is not parsed properly and is
> loaded as null into the DataFrame, so when I save it to Avro or to
> JSON that field is always null.
>
> Not sure why this element is not getting parsed.
>
>
> Here is the code I am using:
>
>
> import com.databricks.spark.xml.XmlRelation
> import org.apache.spark.rdd.RDD
> import org.apache.spark.sql.DataFrame
> import spark.implicits._ // needed for .as[String]
>
> // Cast the Kafka key and value from binary to strings.
> val kafkaValueAsStringDF = kafakDF.selectExpr(
>   "CAST(key AS STRING) msgKey",
>   "CAST(value AS STRING) xmlString")
>
> // Each <Book> element becomes one row.
> val parameters = Map("rowTag" -> "Book")
>
> kafkaValueAsStringDF.writeStream.foreachBatch {
>   (batchDF: DataFrame, batchId: Long) =>
>     val xmlStringDF: DataFrame = batchDF.selectExpr("xmlString")
>     xmlStringDF.printSchema()
>
>     val rdd: RDD[String] = xmlStringDF.as[String].rdd
>
>     // Build the relation from the XML strings and the predefined schema.
>     val relation = XmlRelation(
>       () => rdd,
>       None,
>       parameters,
>       xmlSchema)(spark.sqlContext)
>
>     logger.info(".convert() : XmlRelation Schema = {}",
>       relation.schema.treeString)
> }
>   .start()
>   .awaitTermination()
>
>
> Thanks
> Sateesh
>
>
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>