Hey Anh,

For a simple read/write StreamTask with little logic in it, you should be able to get 10,000+ messages/sec per container with a 1 kB message payload when talking to a remote Kafka broker.
At first glance, setting a batch size of 1 with a sync producer will definitely slow down your task, especially if num.acks is set to a number other than zero. Could you please post your job config file, and your code (if that's OK)?

Cheers,
Chris

On 3/28/14 8:00 AM, "Anh Thu Vu" <[email protected]> wrote:

>I forgot to clarify this. My application is a simple pipeline of two jobs:
>the first reads from a file and writes to a Kafka topic; the second reads
>from that Kafka topic.
>
>The throughput is measured in the second job (by taking a timestamp when
>the 1st, 1000th, ... message is received).
>
>Casey
>
>
>On Fri, Mar 28, 2014 at 3:56 PM, Anh Thu Vu <[email protected]> wrote:
>
>> Hi guys,
>>
>> I'm running my application both locally and on a small cluster of 5
>> nodes (each with 2 GB RAM, 1 core, connected via normal Ethernet, I
>> think), and the observed throughput seems very slow.
>>
>> Do you have any idea of the expected throughput when running with one
>> 7200 RPM hard drive? My estimated throughput is about 1000 messages
>> per second. Each message is slightly more than 1 kB, Kafka batch
>> size = 1, sync producer.
>>
>> When I try an async producer with different batch sizes, there is a
>> slight improvement.
>>
>> The config for the job has only the essential properties.
>>
>> Any suggestions? Could I have misconfigured something?
>>
>> Casey
>>
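For reference, the producer knobs discussed above live in the Samza job config. A minimal sketch of the relevant section, assuming a Kafka system named "kafka" and a Samza/Kafka-0.8-era setup where `systems.<name>.producer.*` keys are forwarded to the Kafka producer (the broker hostname and exact key names here are illustrative; check the docs for your version):

```properties
# Hypothetical config excerpt -- system name and broker list are placeholders.
systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
systems.kafka.producer.metadata.broker.list=broker1:9092

# A sync producer with batch size 1 pays a full round trip per message.
# Switching to async with a larger batch usually recovers most throughput:
systems.kafka.producer.producer.type=async
systems.kafka.producer.batch.num.messages=200

# acks=0 is fire-and-forget (fastest, least durable); acks=1 waits for
# the leader to acknowledge each request:
systems.kafka.producer.request.required.acks=1
```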

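The measurement approach Casey describes (take a timestamp at the 1st, 1000th, ... message and derive a rate from the deltas) can be sketched as a small standalone helper. This is illustrative Java, not code from the job in question; the class and method names are made up for the example:

```java
// Sketch of per-interval throughput measurement: record a timestamp on the
// first message and at every interval boundary, and report messages/sec
// for each completed interval.
public class ThroughputMeter {
    private final long intervalMsgs; // e.g. 1000
    private long count = 0;
    private long intervalStartNanos = 0;

    public ThroughputMeter(long intervalMsgs) {
        this.intervalMsgs = intervalMsgs;
    }

    /**
     * Call once per received message, passing the current time in nanos
     * (e.g. System.nanoTime()). Returns the messages/sec rate when an
     * interval completes, or -1 otherwise.
     */
    public double onMessage(long nowNanos) {
        if (count == 0) {
            intervalStartNanos = nowNanos; // baseline at the first message
        }
        count++;
        if (count % intervalMsgs == 0) {
            double elapsedSec = (nowNanos - intervalStartNanos) / 1e9;
            intervalStartNanos = nowNanos; // start the next interval
            return intervalMsgs / elapsedSec;
        }
        return -1;
    }
}
```

Measuring in the consuming job like this captures the end-to-end rate through Kafka, so a slow producer (sync, batch size 1) shows up directly in the numbers.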