Storm performing very slow

2014-09-22 Thread Kushan Maskey
I am trying to load 20 million records into a Cassandra database through
Kafka and Storm. I can post all of the data into Kafka in 5 minutes, but
reading it from Storm and inserting it into Cassandra, Couch and Solr is
very slow. The topology has been running for the past 5 hours and has
loaded only 2 million records so far.

How do I make Storm perform faster? At this pace it will take a couple of
days to load all the data.
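
The pace described above can be turned into a rough rate and time estimate. The sketch below is only back-of-the-envelope arithmetic on the figures in this message (20 million records in total, 2 million loaded after 5 hours); the class and method names are made up for illustration, and it assumes the rate stays constant.

```java
class LoadEstimate {
    // Figures taken from the message above.
    static final long TOTAL_RECORDS = 20_000_000L;
    static final long LOADED_RECORDS = 2_000_000L;
    static final double ELAPSED_HOURS = 5.0;

    /** Average number of records loaded per second over the elapsed time. */
    static double recordsPerSecond(long loaded, double elapsedHours) {
        return loaded / (elapsedHours * 3600.0);
    }

    /** Hours still needed for the remaining records, assuming the rate does not change. */
    static double remainingHours(long total, long loaded, double ratePerSecond) {
        return (total - loaded) / ratePerSecond / 3600.0;
    }

    public static void main(String[] args) {
        double rate = recordsPerSecond(LOADED_RECORDS, ELAPSED_HOURS);
        double hours = remainingHours(TOTAL_RECORDS, LOADED_RECORDS, rate);
        System.out.printf("rate: %.0f records/s%n", rate);
        System.out.printf("remaining: %.0f hours (%.1f days)%n", hours, hours / 24.0);
    }
}
```

This works out to roughly 111 records per second and about 45 more hours for the remaining 18 million records, which is where the "couple of days" comes from.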

--
Kushan Maskey


Re: Storm performing very slow

2014-09-22 Thread Michael Rose
Storm is not your bottleneck. Check your Storm code to ensure that 1) you're
parallelizing your writes and 2) you're batching writes to your external
resources where possible. Some quick napkin math shows you're doing only
about 110 writes/s, which seems awfully low.
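
As one way to picture the batching suggestion, here is a minimal sketch in plain Java with no Storm dependency. `BatchingWriter` and its `sink` callback are hypothetical names; the sink stands in for whatever bulk-write call the Cassandra, Couch or Solr client in use offers, which this thread does not show.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Buffers records and hands them to a sink in groups instead of one at a time. */
class BatchingWriter<T> {
    private final int batchSize;
    // Hypothetical stand-in for a bulk write to an external store.
    private final Consumer<List<T>> sink;
    private final List<T> buffer = new ArrayList<>();

    BatchingWriter(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Adds one record and writes the whole buffer once it reaches batchSize. */
    void add(T record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Writes whatever is buffered; call it on shutdown or on a timer so a partial batch is not lost. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // pass a copy so the sink may keep the list
        buffer.clear();
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        BatchingWriter<String> writer =
                new BatchingWriter<>(100, batch -> batchSizes.add(batch.size()));
        for (int i = 0; i < 250; i++) {
            writer.add("record-" + i);
        }
        writer.flush();
        System.out.println(batchSizes); // prints [100, 100, 50]
    }
}
```

Inside a bolt, the tuples that make up a batch would normally be acked only after the flush succeeds, so the batch size and any flush timer need to stay well within the topology's message timeout.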

Michael Rose (@Xorlev)
Senior Platform Engineer, FullContact
mich...@fullcontact.com



Re: Storm performing very slow

2014-09-22 Thread Kushan Maskey
Here is my Storm config:

storm.config.setMaxTaskParallelism=4
storm.config.setNumWorkers=20
storm.config.setMaxSpoutPending=5000
storm.config.numAckers=1000

I am guessing I need to increase maxTaskParallelism. If that is the case,
how much would you suggest? Any help will be highly appreciated.


Thanks.

--
Kushan Maskey
817.403.7500



Re: Storm performing very slow

2014-09-22 Thread Kushan Maskey
Below are my topology configuration and the topology stats based on that
configuration. Can anyone help me optimize Storm so it processes the
20 million records faster?

Topology stats

Window       Emitted  Transferred  Complete latency (ms)  Acked, Failed
10m 0s       111420   111420       47305.488              2861000
3h 0m 0s     111420   111420       47305.488              2861000
1d 0h 0m 0s  111420   111420       47305.488              2861000
All time     111420   111420       47305.488              2861000

Topology Configuration

Key  Value
dev.zookeeper.path  /tmp/dev-storm-zookeeper
drpc.childopts  -Xmx768m
drpc.invocations.port  3773
drpc.port  3772
drpc.queue.size  128
drpc.request.timeout.secs  600
drpc.worker.threads  64
java.library.path  /usr/local/lib:/opt/local/lib:/usr/lib
logviewer.appender.name  A1
logviewer.childopts  -Xmx128m
logviewer.port  8000
nimbus.childopts  -Xmx1024m
nimbus.cleanup.inbox.freq.secs  600
nimbus.file.copy.expiration.secs  600
nimbus.host  mystormserver
nimbus.inbox.jar.expiration.secs  3600
nimbus.monitor.freq.secs  10
nimbus.reassign  true
nimbus.supervisor.timeout.secs  60
nimbus.task.launch.secs  120
nimbus.task.timeout.secs  30
nimbus.thrift.max_buffer_size  1048576
nimbus.thrift.port  6627
nimbus.topology.validator  backtype.storm.nimbus.DefaultTopologyValidator
storm.cluster.mode  distributed
storm.config.properties  [object Object]
storm.id  CEXPStormTopology-1-1411442050
storm.local.dir  /data/disk00/storm/localdir
storm.local.mode.zmq  false
storm.messaging.netty.buffer_size  5242880
storm.messaging.netty.client_worker_threads  1
storm.messaging.netty.flush.check.interval.ms  10
storm.messaging.netty.max_retries  30
storm.messaging.netty.max_wait_ms  1000
storm.messaging.netty.min_wait_ms  100
storm.messaging.netty.server_worker_threads  1
storm.messaging.netty.transfer.batch.size  262144
storm.messaging.transport  backtype.storm.messaging.netty.Context
storm.thrift.transport  backtype.storm.security.auth.SimpleTransportPlugin
storm.zookeeper.connection.timeout  15000
storm.zookeeper.port  2181
storm.zookeeper.retry.interval  1000
storm.zookeeper.retry.intervalceiling.millis  3
storm.zookeeper.retry.times  5
storm.zookeeper.root  /storm
storm.zookeeper.servers  mystormserver
storm.zookeeper.session.timeout  2
supervisor.childopts  -Xmx256m
supervisor.enable  true
supervisor.heartbeat.frequency.secs  5
supervisor.monitor.frequency.secs  3
supervisor.slots.ports  6700,6701,6702,6703,6704,6705,6706,6707,6708,6709,6710,6711,6712,6713,6714,6715,6716,6717,6718,6719,6720,6721,6722,6723,6724,6725,6726,6727,6728
supervisor.worker.start.timeout.secs  120
supervisor.worker.timeout.secs  30
task.heartbeat.frequency.secs  3
task.refresh.poll.secs  10
topology.acker.executors  1000
topology.builtin.metrics.bucket.size.secs  60
topology.debug  true
topology.disruptor.wait.strategy  com.lmax.disruptor.BlockingWaitStrategy
topology.enable.message.timeouts  true
topology.error.throttle.interval.secs  10
topology.executor.receive.buffer.size  65536
topology.executor.send.buffer.size  65536
topology.fall.back.on.java.serialization  true
topology.kryo.decorators
topology.kryo.factory  backtype.storm.serialization.DefaultKryoFactory
topology.kryo.register  [object Object]
topology.max.error.report.per.interval  5
topology.max.spout.pending  5000
topology.max.task.parallelism  100
topology.message.timeout.secs  60
topology.multilang.serializer  backtype.storm.multilang.JsonSerializer
topology.name  CEXPStormTopology
topology.receiver.buffer.size  8
topology.skip.missing.kryo.registrations  false
topology.sleep.spout.wait.strategy.time.ms  1
topology.spout.wait.strategy  backtype.storm.spout.SleepSpoutWaitStrategy
topology.state.synchronization.timeout.secs  60
topology.stats.sample.rate  0.05
topology.tasks
topology.tick.tuple.freq.secs
topology.transfer.buffer.size  32
topology.trident.batch.emit.interval.millis  500
topology.tuple.serializer  backtype.storm.serialization.types.ListDelegateSerializer
topology.worker.childopts
topology.worker.receiver.thread.count  1
topology.worker.shared.thread.pool.size  4
topology.workers  20
transactional.zookeeper.port
transactional.zookeeper.root  /transactional
transactional.zookeeper.servers
ui.childopts  -Xmx768m
ui.port  8080
worker.childopts  -Xmx1024m
worker.heartbeat.frequency.secs  1
zmq.hwm  0
zmq.linger.millis  5000
zmq.threads  1


--
Kushan Maskey
817.403.7500


Re: Storm performing very slow

2014-09-22 Thread Tom Brown
The screen shows a complete latency of 47 seconds. That is really high. Is
there a screen that shows the performance/capacity of each bolt?
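
To see why a 47-second complete latency matters so much, Little's law (tuples in flight = throughput x latency) gives a ceiling on throughput once the number of pending tuples is capped. The sketch below plugs in the posted topology.max.spout.pending of 5000 and the posted complete latency of 47305.488 ms. The cap applies per spout task, and the thread does not say how many spout tasks are running, so the single spout task here is an assumption; the class and method names are illustrative only.

```java
class SpoutThroughputBound {
    /**
     * Little's law: in-flight tuples = throughput * latency,
     * so throughput cannot exceed in-flight tuples / latency.
     */
    static double maxTuplesPerSecond(int maxPending, double completeLatencyMs) {
        return maxPending / (completeLatencyMs / 1000.0);
    }

    public static void main(String[] args) {
        int maxSpoutPending = 5000;            // topology.max.spout.pending from the posted configuration
        double completeLatencyMs = 47305.488;  // complete latency from the posted topology stats
        int spoutTasks = 1;                    // assumption: number of spout tasks is not given in the thread

        double bound = spoutTasks * maxTuplesPerSecond(maxSpoutPending, completeLatencyMs);
        System.out.printf("upper bound: %.1f tuples/s%n", bound);
    }
}
```

Under that assumption the ceiling comes to a little over 100 tuples per second, which is in the same range as the roughly 110 writes/s observed. If that reading is right, the load is limited by how long each tuple takes to be fully processed, so finding the slow bolt is the place to start.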

--Tom


Re: Storm performing very slow

2014-09-22 Thread Kushan Maskey
That is what is causing Storm to read and process the data so slowly, and
I am not sure what is causing the latency to be that high.

--
Kushan Maskey
817.403.7500
