Re: Spark Streaming Advice

2016-10-10 Thread Jörn Franke
Your file size is too small; this has a significant impact on the namenode. Use HBase or maybe HAWQ to store small writes. > On 10 Oct 2016, at 16:25, Kevin Mellott wrote: > > Whilst working on this application, I found a setting that drastically > improved the
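To make the namenode point concrete, here is a rough back-of-the-envelope sketch. It assumes the commonly quoted rule of thumb that every file and block object in the HDFS namespace costs about 150 bytes of namenode heap, and that a file smaller than the block size occupies exactly one block. The data sizes below are hypothetical, chosen only to compare one large file against many small ones holding the same data.

```python
# Rule of thumb (assumption, not an exact figure): each file or block object
# in the HDFS namenode namespace costs roughly 150 bytes of namenode heap.
BYTES_PER_OBJECT = 150


def namenode_heap_bytes(num_files: int, blocks_per_file: int = 1) -> int:
    """Estimate namenode heap consumed by num_files files.

    A file smaller than the HDFS block size still occupies one block, so each
    small file contributes one file object plus one block object.
    """
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT


# Hypothetical: the same ~1 GB of data stored two different ways.
one_big = namenode_heap_bytes(num_files=1, blocks_per_file=8)  # 8 x 128 MB blocks
many_small = namenode_heap_bytes(num_files=100_000)            # ~10 KB per file
print(one_big, many_small)  # 1350 30000000
```

Under these assumptions the small-file layout costs over 20,000 times more namenode memory for the same data, which is why stores built for small writes, such as HBase, are suggested instead.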

Re: Spark Streaming Advice

2016-10-10 Thread Kevin Mellott
The batch interval was set to 30 seconds; however, after getting the Parquet files to save faster, I lowered the interval to 10 seconds. The number of log messages contained in each batch varied from just a few up to around 3500, with the number of partitions ranging from 1 to around 15. I will
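The figures above can be turned into a quick estimate of how many files land in HDFS per day. This sketch assumes that each partition of each micro-batch is written out as its own Parquet file (the usual outcome when a batch is saved without coalescing); the exact count in a real job may differ.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def files_per_day(batch_interval_s: int, partitions_per_batch: int) -> int:
    # Assumption: one Parquet file per partition per micro-batch.
    batches = SECONDS_PER_DAY // batch_interval_s
    return batches * partitions_per_batch


# Batch intervals and partition counts taken from the message above.
for interval in (30, 10):
    for parts in (1, 15):
        print(f"{interval}s interval, {parts} partitions: "
              f"{files_per_day(interval, parts):,} files/day")
```

With a 10-second interval and 15 partitions this comes to 129,600 files per day, and at roughly 3500 messages per batch each file would hold only around 230 log messages.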

Re: Spark Streaming Advice

2016-10-10 Thread Mich Talebzadeh
Hi Kevin, What is the streaming interval (batch interval) above? I do analytics on streaming trade data, but after manipulating individual messages I store the selected ones in HBase. Very fast. HTH, Dr Mich Talebzadeh

Re: Spark Streaming Advice

2016-10-10 Thread Kevin Mellott
Whilst working on this application, I found a setting that drastically improved the performance of my particular Spark Streaming application. I'm sharing the details in the hope that they may help somebody in a similar situation. As my program ingested information into HDFS (as Parquet files), I