Re: Measuring cluster utilization of a streaming job

2017-11-14 Thread Teemu Heikkilä
Without knowing anything about your pipeline, the best estimate of the resources needed is to run the job at the same ingestion rate as the normal production load. With Kafka you can enable back pressure, so under high load your latency will just increase, but you don’t have to have capacity for
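
As a rough illustration of turning such a test run into a utilization figure, here is a minimal sketch. It assumes a micro-batch job with a fixed batch interval and treats utilization as the share of each interval spent processing. The function name and the sample numbers are hypothetical, not taken from the thread.

```python
from statistics import mean


def batch_utilization(processing_times_s, batch_interval_s):
    """Return the average fraction of the batch interval spent processing.

    A value near 1.0 means the job has almost no headroom; a value above 1.0
    means batches take longer than the interval and the job is falling behind.
    """
    if batch_interval_s <= 0:
        raise ValueError("batch_interval_s must be positive")
    if not processing_times_s:
        raise ValueError("need at least one batch measurement")
    return mean(processing_times_s) / batch_interval_s


# Hypothetical per-batch processing times (seconds), as they might be
# collected during a test run at the production ingestion rate.
samples = [2.0, 3.0, 2.5, 2.5]
utilization = batch_utilization(samples, batch_interval_s=10.0)
print(f"average utilization: {utilization:.0%}")  # prints "average utilization: 25%"
```

A low ratio suggests the cluster sits idle for most of each interval; a ratio that stays close to 1.0 leaves little room for load spikes, which is where back pressure would start trading latency for capacity.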

Measuring cluster utilization of a streaming job

2017-11-14 Thread Nadeem Lalani
Hi, I was wondering if anyone has done some work on measuring the cluster resource utilization of a "typical" Spark Streaming job. We are trying to build a message ingestion system which will read from Kafka and do some processing. We have had some concerns raised in the team that a 24*7