Is your process at 100% CPU? I suspect you're spending most of your time in JSON deserialization, but profile it and check.
Michael

On Friday, January 16, 2015, Roger Hoover <[email protected]> wrote:

> Hi guys,
>
> I'm testing a job that needs to load 40M records (6GB in Kafka as JSON)
> from a bootstrap topic. The topic has 4 partitions and I'm running the job
> using the ProcessJobFactory so all four tasks are in one container.
>
> Using RocksDB, it's taking 19 minutes to load all the data which amounts to
> 35k records/sec or 5MB/s based on input size. I ran iostat during this
> time and see the disk write throughput is 14MB/s.
>
> I didn't tweak any of the storage settings.
>
> A few questions:
> 1) Does this seem low? I'm running on a Macbook Pro with SSD.
> 2) Do you have any recommendations for improving the load speed?
>
> Thanks,
>
> Roger
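
As a sanity check on the figures quoted above, here is a minimal Python sketch that recomputes them. It assumes "6GB" means 6 x 10^9 bytes and "40M" means 40,000,000 records; the function and variable names are illustrative only and do not come from Samza or RocksDB.

```python
# Figures taken from the quoted message.
RECORDS = 40_000_000           # "40M records"
INPUT_BYTES = 6 * 10**9        # "6GB", assumed to be decimal gigabytes
ELAPSED_S = 19 * 60            # "19 minutes"
DISK_WRITE_MB_S = 14.0         # iostat write throughput, as reported


def load_throughput(records, input_bytes, elapsed_s):
    """Return (records per second, input megabytes per second)."""
    return records / elapsed_s, input_bytes / elapsed_s / 1e6


rec_per_s, mb_per_s = load_throughput(RECORDS, INPUT_BYTES, ELAPSED_S)

# Average size of one JSON record on the input side.
avg_record_bytes = INPUT_BYTES / RECORDS

# Ratio of the reported disk write rate to the input rate.
write_ratio = DISK_WRITE_MB_S / mb_per_s

print(f"{rec_per_s:,.0f} records/s, {mb_per_s:.2f} MB/s input")
print(f"{avg_record_bytes:.0f} bytes/record on average")
print(f"disk writes are about {write_ratio:.1f}x the input rate")
```

Under those assumptions the quoted 35k records/sec and 5MB/s agree with each other, the average record comes out to about 150 bytes, and the disk is writing roughly 2.7 times the input volume. That ratio only compares the two quoted rates; it does not say where the extra writes come from or whether the job is CPU-bound, which is what profiling would show.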
