Re: Storm's performance limits to 1000 tuples/sec
Serialisation across workers might be your problem. If you can use the localOrShuffle grouping and arrange that the number of spouts and bolts is a multiple of the number of workers, this will minimise the serialisation across workers.

If there is only one counting bolt for the topology, then every tuple is serialised and sent to the worker hosting that single counting bolt. A better approach might be to have one counting bolt per worker and aggregate their counts periodically.

Regards
Rob Turner.

On 24 June 2014 15:10, jamesw...@yahoo.com.tw wrote:

Hi all,

I face a critical problem with the performance of my Storm topology. I can only process 1000 tuples/sec from Kafka with kafkaSpout. I use standard Storm (not Trident) for my topology, and my topology information is as follows:

[Machines]
I have 1 nimbus and 3 supervisors, each with a 2-core CPU, in GCE (Google Compute Engine)
Number of workers: 12
Number of executors: 51

[Topology]
Number of kafkaSpout: 13 (fetch 13 topics from Kafka brokers)
Number of bolts: 12 (there are 5 mysql-dumper bolts here)
KafkaSpout(topic) emits to boltA and boltB
boltA (parallelism=9): parse the avro tuple from kafkaSpout
boltB (parallelism=1): counting only

I found that sometimes boltA's capacity is 1 or above in the Storm UI, and the execute latency of my 5 mysql-dumper bolts is more than 300ms (other bolts are less than 10ms). In addition, the complete latency of these kafkaSpouts is more than 2000ms in the beginning, but it drops to 1000ms after a while.

I found this topology can only process 1000 tuples/s or less, but my goal is to process 1 tuples/s. Is anything wrong with my topology config? Actually, my topology only does simple things like counting and dumping to MySQL. It seems Storm does not perform as well as claimed (millions of tuples per second on a 10-node cluster). Can anyone give me some suggestions?

Thanks a lot.

Best regards,
James

--
Cheers
Rob.
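The "one counting bolt per worker, aggregated periodically" idea from the reply above can be sketched in plain Java. This is only an illustration under stated assumptions: it uses no Storm API, and the class names (LocalCounter, GlobalAggregator, PerWorkerCountSketch) are hypothetical. In a real topology the local counter would live inside a bolt fed via the local-or-shuffle grouping, and the flush would be driven by a timer.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for one counting bolt per worker:
// every tuple is counted in-process, so nothing is serialised per tuple.
class LocalCounter {
    private final Map<String, Long> counts = new HashMap<>();

    void count(String key) {
        counts.merge(key, 1L, Long::sum);
    }

    // Returns the partial counts gathered so far and resets local state.
    // Assumed to be called periodically (for example, once per second).
    Map<String, Long> flush() {
        Map<String, Long> snapshot = new HashMap<>(counts);
        counts.clear();
        return snapshot;
    }
}

// Hypothetical stand-in for the single downstream aggregator.
// It receives one small map per worker per flush instead of one message per tuple.
class GlobalAggregator {
    private final Map<String, Long> totals = new HashMap<>();

    void merge(Map<String, Long> partial) {
        partial.forEach((key, value) -> totals.merge(key, value, Long::sum));
    }

    long total(String key) {
        return totals.getOrDefault(key, 0L);
    }
}

class PerWorkerCountSketch {
    public static void main(String[] args) {
        LocalCounter workerA = new LocalCounter();
        LocalCounter workerB = new LocalCounter();

        // Tuples are counted on whichever worker they were produced on.
        workerA.count("topic-1");
        workerA.count("topic-1");
        workerB.count("topic-1");
        workerB.count("topic-2");

        // Periodic aggregation: only the partial counts cross worker boundaries.
        GlobalAggregator aggregator = new GlobalAggregator();
        aggregator.merge(workerA.flush());
        aggregator.merge(workerB.flush());

        System.out.println("topic-1 total: " + aggregator.total("topic-1"));
        System.out.println("topic-2 total: " + aggregator.total("topic-2"));
    }
}
```

The trade-off is that totals lag behind by up to one flush interval, in exchange for replacing per-tuple transfers to a single bolt with a few small messages per interval.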
Re: How/where to specify Workers JMX port?
The range of worker ports is defined in the storm.yaml as follows:

supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703

Cheers
Rob.

On 8 May 2014 15:03, Otis Gospodnetic otis.gospodne...@gmail.com wrote:

Hi,

Where/how does one specify the JMX port range Storm should use for Workers? I'm trying to document that here:
https://sematext.atlassian.net/wiki/display/PUBSPM/SPM+Monitor+-+Standalone#SPMMonitor-Standalone-Storm

All examples of Worker JMX port ranges just show -Dcom.sun.management.jmxremote.port=1%ID%, and ports 16700, 16701... but I can't find where the range is defined. e.g. what if I want to use ports 26700, 26701..., or what if I want to use ports 1, 10001? Where/how would I specify that?

Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr Elasticsearch Support * http://sematext.com/

--
Cheers
Rob.
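To connect the slot ports in the reply with the JMX ports in the question: Storm replaces the %ID% placeholder in the worker JVM options with the worker's slot port, so the literal digit in front of it acts as a prefix (1%ID% with slot 6700 gives 16700). The plain-Java sketch below only mimics that textual substitution; the class and method names are hypothetical and no Storm code is involved.

```java
// Hypothetical helper that imitates the %ID% substitution applied to worker JVM options.
class JmxPortSketch {

    // Replaces every %ID% placeholder with the worker's slot port.
    static String expandChildOpts(String childOpts, int slotPort) {
        return childOpts.replace("%ID%", Integer.toString(slotPort));
    }

    public static void main(String[] args) {
        // Slot ports as listed under supervisor.slots.ports in the reply.
        int[] slotPorts = {6700, 6701, 6702, 6703};

        // Using "2" instead of "1" as the prefix shifts the JMX range to 26700 and up.
        String childOpts = "-Dcom.sun.management.jmxremote.port=2%ID%";

        for (int slotPort : slotPorts) {
            System.out.println(slotPort + " -> " + expandChildOpts(childOpts, slotPort));
        }
    }
}
```

Under this scheme the JMX range is derived entirely from the slot ports plus the chosen prefix, so moving it means changing either the prefix in the JVM options or the slot ports themselves.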
Re: [VOTE] Storm Logo Contest - Round 1
#6 - 5 pts

Rob Turner.

On 15 May 2014 17:28, P. Taylor Goetz ptgo...@gmail.com wrote:

This is a call to vote on selecting the top 3 Storm logos from the 11 entries received. This is the first of two rounds of voting. In the first round the top 3 entries will be selected to move onto the second round where the winner will be selected.

The entries can be viewed on the storm website here:
http://storm.incubator.apache.org/blog.html

VOTING

Each person can cast a single vote. A vote consists of 5 points that can be divided among multiple entries. To vote, list the entry number, followed by the number of points assigned. For example:

#1 - 2 pts.
#2 - 1 pt.
#3 - 2 pts.

Votes cast by PPMC members are considered binding, but voting is open to anyone.

This vote will be open until Thursday, May 22 11:59 PM UTC.

- Taylor

--
Cheers
Rob.
Trident localOrShuffle
This appears to be missing from Stream. Is there any way to specify this, or is there a reason why this is not possible with Trident?

--
Cheers
Rob.
Re: Trident localOrShuffle
Thanks Tom for the quick reply.

I need a single spout, based on IRichSpout, and a parallelism of 4 for my Trident functions per worker. If I do not do a shuffle, the spout parallelism is set to 4 as well. It seems that I need the shuffle in order to have separate parallelism for the spout and the functions. I would like to avoid serialization, so localOrShuffle would be ideal, but it is apparently not an option in Trident.

On 24 February 2014 22:17, Tom Brown tombrow...@gmail.com wrote:

On trident, all operations happen locally unless you explicitly repartition (shuffle, partition by key, etc)

On Monday, February 24, 2014, Robert Turner rob...@bigfoot.com wrote:

This appears to be missing from Stream. Is there any way to specify this, or is there a reason why this is not possible with Trident?

--
Cheers
Rob.

--
Cheers
Rob.