Your file sizes are too small, and this has a significant impact on the NameNode. Use
HBase or perhaps HAWQ to store small writes.
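To see why many small files hurt the NameNode, here is a back-of-the-envelope sketch. It assumes the commonly cited rule of thumb that each namespace object (file or block) costs roughly 150 bytes of NameNode heap; the helper name and the 150-byte figure are illustrative approximations, not exact Hadoop internals.

```python
# Rough estimate of NameNode heap pressure from many small files.
# Assumes ~150 bytes of heap per namespace object (file or block),
# a commonly cited approximation -- not an exact figure.
BYTES_PER_OBJECT = 150

def namenode_heap_bytes(num_files, blocks_per_file=1):
    """Approximate NameNode heap used to track num_files files."""
    objects = num_files * (1 + blocks_per_file)  # one file object plus its blocks
    return objects * BYTES_PER_OBJECT

# Ten million one-block files vs. the same number of blocks packed
# into far fewer, larger files:
small = namenode_heap_bytes(10_000_000)                    # 10M tiny files
large = namenode_heap_bytes(100_000, blocks_per_file=100)  # same blocks, fewer files
print(small, large)
```

The data volume is identical in both cases, but the small-file layout roughly doubles the metadata cost, which is why stores designed for small records (HBase, HAWQ) are suggested instead.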
> On 10 Oct 2016, at 16:25, Kevin Mellott wrote:
>
> Whilst working on this application, I found a setting that drastically
> improved the performance of my particular Spark Streaming application.
The batch interval was set to 30 seconds; however, after getting the
Parquet files to save faster, I lowered the interval to 10 seconds. The
number of log messages contained in each batch varied from just a few up to
around 3,500, with the number of partitions ranging from 1 to around 15.
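Those figures show how quickly small files accumulate: a micro-batch job typically writes one output file per partition per batch, so the file count scales with batches per hour times partitions. A quick illustrative calculation using the intervals and the upper partition count mentioned above (the helper name is hypothetical):

```python
# Files produced per hour by a micro-batch job that writes one
# Parquet file per partition per batch (illustrative arithmetic
# based on the intervals and partition counts in the thread).
def files_per_hour(batch_interval_s, partitions):
    batches = 3600 // batch_interval_s
    return batches * partitions

print(files_per_hour(30, 15))  # 120 batches/hour -> 1800 files
print(files_per_hour(10, 15))  # 360 batches/hour -> 5400 files
```

Tripling the batch frequency triples the hourly file count, which is exactly the kind of growth that puts pressure on the NameNode.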
I will
Hi Kevin,
What is the streaming interval (batch interval) above?
I do analytics on streaming trade data, but after manipulating individual
messages I store the selected ones in HBase. Very fast.
HTH
Dr Mich Talebzadeh
Whilst working on this application, I found a setting that drastically
improved the performance of my particular Spark Streaming application. I'm
sharing the details in hopes that it may help somebody in a similar
situation.
As my program ingested information into HDFS (as parquet files), I