Is your process at 100% CPU? I suspect you're spending most of your time in
JSON deserialization, but profile it and check.
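
For a rough sense of scale, here is a back-of-the-envelope sketch of the numbers in your message. The variable and function names are made up for illustration, and it assumes "6GB" means decimal gigabytes and that the 14MB/s iostat figure held for the whole run.

```python
# Figures taken from the quoted message below; all names are illustrative.
RECORDS = 40_000_000
INPUT_BYTES = 6 * 10**9       # "6GB", assumed to be decimal gigabytes
LOAD_SECONDS = 19 * 60        # 19 minutes
DISK_WRITE_MB_S = 14.0        # disk write throughput reported by iostat


def load_stats(records, input_bytes, seconds, disk_write_mb_s):
    """Derive simple throughput figures from the raw load measurements."""
    records_per_sec = records / seconds
    input_mb_s = input_bytes / seconds / 1e6
    return {
        "records_per_sec": records_per_sec,
        "input_mb_s": input_mb_s,
        # Average serialized size of one JSON record.
        "avg_record_bytes": input_bytes / records,
        # Wall-clock time available per record across the whole container.
        "micros_per_record": 1e6 / records_per_sec,
        # Bytes written to disk per byte of input consumed.
        "write_ratio": disk_write_mb_s / input_mb_s,
    }


stats = load_stats(RECORDS, INPUT_BYTES, LOAD_SECONDS, DISK_WRITE_MB_S)
for name, value in stats.items():
    print(f"{name}: {value:.1f}")
```

That works out to about 150 bytes per record, a bit under 30 microseconds per record, and disk writes at roughly 2.7 times the input rate. None of this says where the time goes, so a profiler is still the way to check.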

Michael

On Friday, January 16, 2015, Roger Hoover <[email protected]> wrote:

> Hi guys,
>
> I'm testing a job that needs to load 40M records (6GB in Kafka as JSON)
> from a bootstrap topic.  The topic has 4 partitions and I'm running the job
> using the ProcessJobFactory so all four tasks are in one container.
>
> Using RocksDB, it's taking 19 minutes to load all the data which amounts to
> 35k records/sec or 5MB/s based on input size.  I ran iostat during this
> time and saw that the disk write throughput was 14MB/s.
>
> I didn't tweak any of the storage settings.
>
> A few questions:
> 1) Does this seem low?  I'm running on a Macbook Pro with SSD.
> 2) Do you have any recommendations for improving the load speed?
>
> Thanks,
>
> Roger
>
