Re: Storm's performance limits to 1000 tuples/sec

2014-06-25 Thread Robert Turner
Serialisation across workers might be your problem. If you use the
localOrShuffle grouping and arrange for the number of spouts and bolts
to be a multiple of the number of workers, this will minimise
serialisation across workers. If there is only one counting bolt for the
topology, then tuples are serialised and sent to the worker hosting
that bolt. A better approach might be to have one counting bolt per
worker and aggregate their counts periodically.
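
A minimal sketch of the per-worker counting idea, in plain Java with no Storm
dependency. The class and method names (LocalCounter, GlobalAggregator,
PerWorkerCountSketch) are hypothetical and only simulate the pattern: each
worker counts locally and forwards a small partial count at intervals, so the
aggregator receives a few messages per worker instead of one per tuple.

```java
// Hypothetical stand-in for one counting bolt instance per worker.
// It only touches local state, so no tuple leaves the worker to be counted.
class LocalCounter {
    private long pending = 0;

    // Called for every tuple handled inside this worker.
    void count() {
        pending++;
    }

    // Called periodically: hand over the partial count and reset it.
    long flush() {
        long partial = pending;
        pending = 0;
        return partial;
    }
}

// Hypothetical stand-in for the single downstream aggregator.
class GlobalAggregator {
    private long total = 0;

    void add(long partial) {
        total += partial;
    }

    long total() {
        return total;
    }
}

class PerWorkerCountSketch {
    // Simulates `workers` workers, each counting `tuplesPerWorker` tuples and
    // flushing to the aggregator every `flushEvery` tuples.
    // Returns {total count, number of messages sent to the aggregator}.
    static long[] run(int workers, int tuplesPerWorker, int flushEvery) {
        GlobalAggregator aggregator = new GlobalAggregator();
        long messages = 0;
        for (int w = 0; w < workers; w++) {
            LocalCounter counter = new LocalCounter();
            for (int t = 1; t <= tuplesPerWorker; t++) {
                counter.count();
                if (t % flushEvery == 0) {
                    aggregator.add(counter.flush());
                    messages++;
                }
            }
            // Flush whatever is left over after the last full interval.
            long rest = counter.flush();
            if (rest > 0) {
                aggregator.add(rest);
                messages++;
            }
        }
        return new long[] {aggregator.total(), messages};
    }

    public static void main(String[] args) {
        long[] result = run(12, 1000, 100);
        // prints "total=12000 messages=120"
        System.out.println("total=" + result[0] + " messages=" + result[1]);
    }
}
```

In this simulation 12,000 tuples reach the aggregator as 120 partial counts.
In a real topology the flush would presumably be driven by a timer rather
than a tuple count, which is an assumption of this sketch.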

Regards
   Rob Turner.


On 24 June 2014 15:10, jamesw...@yahoo.com.tw wrote:

 Hi all,

 I am facing a critical performance problem with my Storm topology. I can
 only process 1000 tuples/sec from Kafka with the KafkaSpout. I use standard
 Storm (not Trident) for my topology, and the topology details are as follows:
 [Machines]
 I have 1 nimbus and 3 supervisors, each with a 2-core CPU, in GCE (Google
 Compute Engine)
 Number of workers: 12
 Number of executors: 51
 [Topology]
 Number of kafkaSpouts: 13 (fetching 13 topics from the Kafka brokers)
 Number of bolts: 12 (5 of them are mysql-dumper bolts)

 KafkaSpout(topic) emits to boltA and boltB
 boltA(parallelism=9): parses the Avro tuple from kafkaSpout
 boltB(parallelism=1): counting only

 I found that boltA's capacity is sometimes 1 or above in the Storm UI, and
 the execute latency of my 5 mysql-dumper bolts is more than 300ms (the other
 bolts are below 10ms). In addition, the complete latency of the kafkaSpouts
 is more than 2000ms in the beginning, but it drops to 1000ms after a while.

 I found this topology can only process 1000 tuples/s or less, but my goal
 is to process 1 tuples/s. Is anything wrong with my topology config? Actually,
 my topology only does simple things like counting and dumping to MySQL.
 Storm does not seem to perform as well as claimed (millions of tuples
 per second on a 10-node cluster). Can anyone give me some suggestions?

 Thanks a lot.

 Best regards,
 James




-- 
Cheers
   Rob.


Re: How/where to specify Workers JMX port?

2014-05-16 Thread Robert Turner
The range of worker ports is defined in storm.yaml as follows:

supervisor.slots.ports:
- 6700
- 6701
- 6702
- 6703
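
A minimal sketch of how the JMX ports in the examples could follow from these
slot ports. It assumes that Storm replaces each %ID% in the worker JVM options
with the worker's slot port, so 1%ID% turns 6700 into 16700. The class and
method names (JmxPortSketch, expand, jmxPorts) are hypothetical and this is
plain Java, not Storm's own code.

```java
import java.util.ArrayList;
import java.util.List;

class JmxPortSketch {
    // Replaces every "%ID%" in the option template with the worker's slot port
    // (assumed substitution rule, see the note above).
    static String expand(String childOptsTemplate, int slotPort) {
        return childOptsTemplate.replace("%ID%", Integer.toString(slotPort));
    }

    // Derives the JMX port for each slot port by putting `prefix` in front of
    // it, mirroring options such as "1%ID%" or "2%ID%".
    static List<Integer> jmxPorts(String prefix, int[] slotPorts) {
        List<Integer> ports = new ArrayList<>();
        for (int slot : slotPorts) {
            ports.add(Integer.parseInt(prefix + slot));
        }
        return ports;
    }

    public static void main(String[] args) {
        int[] slots = {6700, 6701, 6702, 6703};
        String template = "-Dcom.sun.management.jmxremote.port=1%ID%";
        for (int slot : slots) {
            System.out.println(expand(template, slot));
        }
        // prints "[26700, 26701, 26702, 26703]"
        System.out.println(jmxPorts("2", slots));
    }
}
```

Under that assumption the JMX range is not configured separately: it follows
the slot ports above, and changing the prefix in the option from 1 to 2 would
give 26700, 26701 and so on.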

Cheers
   Rob.

On 8 May 2014 15:03, Otis Gospodnetic otis.gospodne...@gmail.com wrote:

 Hi,

 Where/how does one specify the JMX port range Storm should use for Workers?

 I'm trying to document that here:

 https://sematext.atlassian.net/wiki/display/PUBSPM/SPM+Monitor+-+Standalone#SPMMonitor-Standalone-Storm

 All examples of Worker JMX port ranges just show
 -Dcom.sun.management.jmxremote.port=1%ID%, and ports 16700, 16701... but
 I can't find where the range is defined.
 e.g.
 what if I want to use ports 26700, 26701..., or
 what if I want to use ports 1, 10001

 Where/how would I specify that?

 Thanks,
 Otis
 --
 Performance Monitoring * Log Analytics * Search Analytics
 Solr & Elasticsearch Support * http://sematext.com/






Re: [VOTE] Storm Logo Contest - Round 1

2014-05-16 Thread Robert Turner
#6 - 5 pts

Rob Turner.


On 15 May 2014 17:28, P. Taylor Goetz ptgo...@gmail.com wrote:

 This is a call to vote on selecting the top 3 Storm logos from the 11
 entries received. This is the first of two rounds of voting. In the first
 round the top 3 entries will be selected to move on to the second round
 where the winner will be selected.

 The entries can be viewed on the storm website here:

 http://storm.incubator.apache.org/blog.html

 VOTING

 Each person can cast a single vote. A vote consists of 5 points that can
 be divided among multiple entries. To vote, list the entry number, followed
 by the number of points assigned. For example:

 #1 - 2 pts.
 #2 - 1 pt.
 #3 - 2 pts.

 Votes cast by PPMC members are considered binding, but voting is open to
 anyone.

 This vote will be open until Thursday, May 22 11:59 PM UTC.

 - Taylor




-- 
Cheers
   Rob.


Trident localOrShuffle

2014-02-24 Thread Robert Turner
localOrShuffle appears to be missing from Stream. Is there any way to specify
it, or is there a reason why it is not possible with Trident?

-- 
Cheers
   Rob.


Re: Trident localOrShuffle

2014-02-24 Thread Robert Turner
Thanks Tom for the quick reply.

I need a single spout, based on IRichSpout, and a parallelism of 4 per
worker for my Trident functions. If I do not do a shuffle, the spout
parallelism is set to 4 as well. It seems that I need the shuffle in order
to have separate parallelism for the spout and the functions. I would like
to avoid serialization, so localOrShuffle would be ideal, but it is
apparently not an option in Trident.
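
To make the goal concrete, here is a plain-Java sketch of the routing choice
I understand localOrShuffle to make: prefer a target task in the sender's own
worker, and fall back to an ordinary shuffle over all tasks only when there is
none. The names (LocalOrShuffleSketch, candidates, choose) are hypothetical
and this is an illustration, not Storm's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class LocalOrShuffleSketch {
    // targetTaskWorkers[i] is the id of the worker hosting target task i.
    // Returns the tasks in the sender's worker if there are any,
    // otherwise every task (the plain shuffle case).
    static List<Integer> candidates(int senderWorker, int[] targetTaskWorkers) {
        List<Integer> local = new ArrayList<>();
        List<Integer> all = new ArrayList<>();
        for (int task = 0; task < targetTaskWorkers.length; task++) {
            all.add(task);
            if (targetTaskWorkers[task] == senderWorker) {
                local.add(task);
            }
        }
        return local.isEmpty() ? all : local;
    }

    // Picks one candidate at random; assumes at least one target task exists.
    static int choose(int senderWorker, int[] targetTaskWorkers, Random random) {
        List<Integer> pool = candidates(senderWorker, targetTaskWorkers);
        return pool.get(random.nextInt(pool.size()));
    }

    public static void main(String[] args) {
        int[] placement = {0, 0, 1, 1}; // tasks 0,1 on worker 0; tasks 2,3 on worker 1
        // prints "[0, 1]": the sender on worker 0 stays local, no serialization
        System.out.println(candidates(0, placement));
        // prints "[0, 1, 2, 3]": worker 2 hosts no target task, so it shuffles
        System.out.println(candidates(2, placement));
    }
}
```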


On 24 February 2014 22:17, Tom Brown tombrow...@gmail.com wrote:

 In Trident, all operations happen locally unless you explicitly
 repartition (shuffle, partition by key, etc.).


 On Monday, February 24, 2014, Robert Turner rob...@bigfoot.com wrote:

 localOrShuffle appears to be missing from Stream. Is there any way to specify
 it, or is there a reason why it is not possible with Trident?

 --
 Cheers
    Rob.




-- 
Cheers
   Rob.