I'm wondering why a simple Spark program that reads streaming data from 
Kafka and writes the results to Kudu has unpredictable write times. In most 
runs, write times are consistently 4 sec regardless of the number of 
messages (anything from 50 to 2000 messages per batch). But occasionally, 
when the program is started, it runs substantially faster, with write times 
below 0.5 sec, using exactly the same code base, settings, etc.

Our environment is a plain AWS cluster with 3 slaves, where each slave runs 
Kafka and a Kudu tablet server instance, with CDH 5.10, Kudu 1.2 and Spark 1.6.

Any hints on what to look at?

cheers,
-jan
