textFileStream and the default fileStream recognize compressed
XML (.xml.gz) files.
Each line in the XML file becomes an element of the RDD[String].
Then the whole RDD is converted to proper XML-format data and stored in a *Scala
variable*.
- I believe storing huge data in a *Scala variable* is not a good idea.
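As a minimal sketch of the line-based approach described above (the paths, batch interval, and the `<record` element filter are all hypothetical), keeping the parsed data distributed on the executors instead of collecting it into a driver-side Scala variable:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object XmlLineStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("XmlLineStream")
    val ssc  = new StreamingContext(conf, Seconds(30))

    // textFileStream yields one RDD[String] per batch; Hadoop's
    // TextInputFormat decompresses .gz files transparently.
    val lines = ssc.textFileStream("hdfs:///incoming/xml") // hypothetical path

    lines.foreachRDD { rdd =>
      // Keep the parsing on the executors. Collecting the whole RDD
      // into a local Scala variable pulls everything to the driver
      // and will not scale to huge inputs.
      val records = rdd.filter(_.trim.startsWith("<record")) // hypothetical element
      println(s"parsed ${records.count()} records in this batch")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```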
One approach: if you are using fileStream, you can access the
individual filenames from the partitions, and with each filename you can
apply your decompression/parsing logic.
Like:
    UnionPartition upp = (UnionPartition) ds.values().getPartitions()[i];
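A sketch of the same idea via the input split rather than the partition object (the directory is hypothetical, and the exact cast depends on the Spark version; `mapPartitionsWithInputSplit` is a developer API on `NewHadoopRDD`):

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.{FileSplit, TextInputFormat}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.{NewHadoopRDD, RDD}

// Hypothetical helper: pair every line with the file it came from, so
// per-file decompression/parsing decisions can be made downstream.
def linesWithFileName(sc: SparkContext, dir: String): RDD[(String, String)] = {
  val rdd = sc.newAPIHadoopFile[LongWritable, Text, TextInputFormat](dir)
  rdd.asInstanceOf[NewHadoopRDD[LongWritable, Text]]
    .mapPartitionsWithInputSplit { (split, iter) =>
      val file = split.asInstanceOf[FileSplit].getPath.toString
      iter.map { case (_, text) => (file, text.toString) }
    }
}
```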
Hi All,
Processing streaming JSON files with Spark features (Spark Streaming and
Spark SQL) is very efficient and works like a charm.
Below is the code snippet to process JSON files.
windowDStream.foreachRDD(IncomingFiles => {
val IncomingFilesTable =
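For completeness, a hedged sketch of how such a `foreachRDD` block typically continued with the Spark SQL APIs of that generation (the temp table name and the query are assumptions, not from the original snippet):

```scala
import org.apache.spark.sql.SQLContext

windowDStream.foreachRDD { incomingFiles =>
  val sqlContext = new SQLContext(incomingFiles.sparkContext)
  // jsonRDD infers a schema from the JSON strings in this batch.
  val incomingFilesTable = sqlContext.jsonRDD(incomingFiles)
  incomingFilesTable.registerTempTable("incoming_files") // hypothetical name
  sqlContext.sql("SELECT COUNT(*) FROM incoming_files").collect()
}
```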