Hi Cody,

I'm going to use an accumulator right now to get an idea of the throughput. Thanks for mentioning the backported module. It also looks like I missed this section of the docs: https://spark.apache.org/docs/1.2.0/streaming-programming-guide.html#reducing-the-processing-time-of-each-batch . Then maybe I should try creating multiple input streams to get more throughput?
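For what it's worth, here's a rough sketch of both ideas together against the Spark 1.2 receiver API — several receiver streams unioned into one DStream, plus an accumulator to count records and gauge read throughput without touching hbase. `zkQuorum`, `groupId`, and `topic` are placeholders for your Kafka settings, and `ssc` is assumed to be your existing StreamingContext; treat this as a sketch, not tested code:

```scala
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.kafka.KafkaUtils

// Placeholders -- substitute your own values.
val zkQuorum = "zk1:2181"
val groupId  = "throughput-test"
val topic    = "my-topic"

// Several receivers in parallel, one per stream, then union them so
// downstream processing sees a single DStream.
val numStreams = 8
val kafkaStreams = (1 to numStreams).map { _ =>
  KafkaUtils.createStream(ssc, zkQuorum, groupId, Map(topic -> 1))
}
val unified = ssc.union(kafkaStreams)

// Accumulator to measure raw read throughput: just deserialize/count,
// no hbase writes, so the receiver side is isolated.
val recordCount = ssc.sparkContext.accumulator(0L, "records")
unified.foreachRDD { rdd =>
  rdd.foreach(_ => recordCount += 1L)
  println(s"records so far: ${recordCount.value}")
}
```

Dividing the accumulator's growth per batch by the batch interval should give a records/second figure for the receive side alone.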
Thanks,
Colin Williams

On Mon, May 2, 2016 at 6:09 PM, Cody Koeninger <c...@koeninger.org> wrote:
> Have you tested for read throughput (without writing to hbase, just
> deserialize)?
>
> Are you limited to using spark 1.2, or is upgrading possible? The
> kafka direct stream is available starting with 1.3. If you're stuck
> on 1.2, I believe there have been some attempts to backport it, search
> the mailing list archives.
>
> On Mon, May 2, 2016 at 12:54 PM, Colin Kincaid Williams <disc...@uw.edu> wrote:
>> I've written an application to get content from a kafka topic with 1.7
>> billion entries, get the protobuf serialized entries, and insert into
>> hbase. Currently the environment that I'm running in is Spark 1.2.
>>
>> With 8 executors and 2 cores, and 2 jobs, I'm only getting between
>> 0-2500 writes / second. This will take much too long to consume the
>> entries.
>>
>> I currently believe that the spark kafka receiver is the bottleneck.
>> I've tried both 1.2 receivers, with the WAL and without, and didn't
>> notice any large performance difference. I've tried many different
>> spark configuration options, but can't seem to get better performance.
>>
>> I saw 80000 requests / second inserting these records into kafka using
>> yarn / hbase / protobuf / kafka in a bulk fashion.
>>
>> While hbase inserts might not deliver the same throughput, I'd like to
>> at least get 10%.
>>
>> My application looks like
>> https://gist.github.com/drocsid/b0efa4ff6ff4a7c3c8bb56767d0b6877
>>
>> This is my first spark application. I'd appreciate any assistance.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
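[Editor's note: for reference, the direct stream Cody mentions (available from Spark 1.3, or via a backport) replaces receivers entirely; RDD partitions map 1:1 to Kafka partitions, which usually lifts read throughput. A minimal sketch, where `brokers` and `topic` are placeholders and `ssc` is an existing StreamingContext:]

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

// Placeholder broker list -- substitute your own.
val brokers = "broker1:9092,broker2:9092"
val topic   = "my-topic"

// Direct stream: no receiver, no WAL needed for at-least-once;
// offsets are tracked by the stream itself.
val kafkaParams = Map("metadata.broker.list" -> brokers)
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set(topic))
```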