subject:"Spark Streaming Advice"

Re: Spark Streaming Advice

2016-10-10 Thread Jörn Franke

Your file size is too small; this has a significant impact on the NameNode. Use HBase or maybe HAWQ to store small writes.
> On 10 Oct 2016, at 16:25, Kevin Mellott wrote:
>
> Whilst working on this application, I found a setting that drastically
> improved the

Re: Spark Streaming Advice

2016-10-10 Thread Kevin Mellott

The batch interval was set to 30 seconds; however, after getting the parquet files to save faster I lowered the interval to 10 seconds. The number of log messages contained in each batch varied from just a few up to around 3500, with the number of partitions ranging from 1 to around 15. I will
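
To put the figures in this message next to the small-file concern raised elsewhere in the thread, here is a rough back-of-the-envelope sketch. It assumes one output file per partition per batch, which is not confirmed by the thread; the function name `files_per_day` is hypothetical and used only for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def files_per_day(batch_interval_s: int, partitions_per_batch: int) -> int:
    """Estimate files written per day.

    Assumption (not from the thread): every batch writes exactly one
    file per partition, and batches run back to back all day.
    """
    batches_per_day = SECONDS_PER_DAY // batch_interval_s
    return batches_per_day * partitions_per_batch


if __name__ == "__main__":
    # Intervals and partition counts quoted in the message: 30 s and 10 s,
    # with between 1 and roughly 15 partitions per batch.
    for interval in (30, 10):
        for partitions in (1, 15):
            count = files_per_day(interval, partitions)
            print(f"interval={interval}s partitions={partitions}: {count} files/day")
```

Under that assumption, a 10-second interval with 15 partitions per batch would produce on the order of 130,000 files per day, each holding at most a few thousand log messages.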

Re: Spark Streaming Advice

2016-10-10 Thread Mich Talebzadeh

Hi Kevin, What is the streaming interval (batch interval) above? I do analytics on streaming trade data, but after manipulating individual messages I store the selected ones in HBase. Very fast. HTH Dr Mich Talebzadeh

Re: Spark Streaming Advice

2016-10-10 Thread Kevin Mellott

Whilst working on this application, I found a setting that drastically improved the performance of my particular Spark Streaming application. I'm sharing the details in hopes that it may help somebody in a similar situation. As my program ingested information into HDFS (as parquet files), I

4 matches
