Do you filter out messages in the nextTuple method, so that some calls emit no
tuple? Storm has an internal mechanism that sleeps for 1 ms before calling
nextTuple again if the previous call emitted nothing.
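That 1 ms back-off comes from the spout wait strategy, which is configurable. A minimal sketch of the relevant topology settings (key names and defaults as I recall them from Storm 0.9.x; verify against your version's defaults.yaml):

```yaml
# Wait strategy applied when nextTuple emits nothing (and when max spout
# pending is reached); SleepSpoutWaitStrategy simply sleeps between calls.
topology.spout.wait.strategy: "backtype.storm.spout.SleepSpoutWaitStrategy"
# Sleep duration in milliseconds between empty nextTuple calls (default 1)
topology.sleep.spout.wait.strategy.time.ms: 1
```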
On 19 March 2014 at 07:14, David Crossland wrote:
>
> Perhaps these screenshots might shed some light? I don't think there is much
> of a latency issue. I'm really starting to suspect there is some consumption
> rate issue from the topic.
>
> I set the spout to a high parallelism value as it did seem to improve
> throughput.
>
> But if there is anything you can spot that would be grand
>
> Thanks
> David
>
> From: Nathan Leung
> Sent: Tuesday, 18 March 2014 21:14
> To: user@storm.incubator.apache.org
>
> It could be bolt 3. What is the latency like between your worker and your
> redis server? Increasing the number of threads for bolt 3 will likely
> increase your throughput. Bolt 1 and 2 are probably CPU bound, but bolt 3 is
> probably restricted by your network access. Also I've found that
> localOrShuffleGrouping can improve performance due to reduced network
> communications.
>
>
> On Tue, Mar 18, 2014 at 3:55 PM, David Crossland
> wrote:
>>
>> A bit more information then
>>
>> There are 4 components
>>
>> Spout - This is reading from an azure service bus topic/subscription. A
>> connection is created in the open() method of the spout, nextTuple does a
>> peek on the message, and invokes the following code:
>>
>> StringWriter writer = new StringWriter();
>> // Commons IO; passing the charset avoids platform-default decoding
>> IOUtils.copy(message.getBody(), writer, "UTF-8");
>> String messageBody = writer.toString();
>>
>> It then deletes the message from the queue.
>>
>> Overall, nothing all that exciting.
>>
>> Bolt 1 - Filtering
>>
>> Parses the message body (a JSON string) and converts it to an object
>> representation. Filters out anything that isn't a monetise message. It
>> then emits the monetise message object to the next bolt. Monetise messages
>> account for ~ 0.03% of the total message volume.
>>
>> Bolt 2 - transformation
>>
>> Basically extracts from the monetise object the values that are interesting
>> and constructs a string, which it emits.
>>
>> Bolt 3 - Storage
>>
>> Stores the transformed string in Redis using the current date/time as key.
>>
>> -
>>
>> Shuffle grouping is used with the topology
>>
>> I ack every tuple irrespective of whether I emit it or not, so Storm
>> should not be attempting to replay tuples.
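The "ack everything, emit selectively" pattern described above can be sketched in plain Java (the `Collector` class here is a stand-in for Storm's `OutputCollector`, and the JSON check is a placeholder for the real monetise filter, not the actual topology code):

```java
import java.util.ArrayList;
import java.util.List;

public class AckSketch {
    // Stand-in for Storm's OutputCollector: records emits and acks.
    static class Collector {
        final List<String> emitted = new ArrayList<>();
        int acked = 0;
        void emit(String v) { emitted.add(v); }
        void ack() { acked++; }
    }

    // Emits only monetise messages but acks every input, so the spout never
    // sees a failed tuple and nothing gets replayed.
    static void execute(String messageBody, Collector collector) {
        if (messageBody.contains("\"type\":\"monetise\"")) { // placeholder filter
            collector.emit(messageBody);
        }
        collector.ack(); // ack unconditionally, even when nothing was emitted
    }

    public static void main(String[] args) {
        Collector c = new Collector();
        execute("{\"type\":\"monetise\"}", c);
        execute("{\"type\":\"other\"}", c);
        System.out.println(c.emitted.size() + " emitted, " + c.acked + " acked");
    }
}
```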
>>
>> -
>>
>> I don't think Bolt 2/3 are the cause of the bottleneck. They don't have to
>> process much data at all tbh.
>>
>> I can accept that perhaps there is something inefficient with the spout,
>> perhaps it just can't read from the service bus quickly enough. I will do
>> some more research on this and have a chat with the colleague who wrote this
>> component.
>>
>> I suppose I'm just trying to identify if I've configured something
>> incorrectly with respect to storm, whether I'm correct to relate the total
>> number of executors and tasks to the total number of cores I have available.
>> I find it strange that I get a better throughput when I choose an arbitrary
>> large number for the parallelism hint than if I constrain myself to a
>> maximum that equates to the number of cores.
>>
>> D
>>
>> From: Nathan Leung
>> Sent: Tuesday, 18 March 2014 18:38
>> To: user@storm.incubator.apache.org
>>
>> In my experience storm is able to make good use of CPU resources, if the
>> application is written appropriately. You shouldn't require too much
>> executor parallelism if your application is CPU intensive. If your bolts
>> are doing things like remote DB/NoSQL accesses, then that changes things and
>> parallelizing bolts will give you more throughput. Not knowing your
>> application, the best way to pin down the problem is to simplify your
>> topology. Cut out everything except for the spout. How is your filtering
>> done? If you return without emitting, the latest versions of Storm will
>> sleep before trying again. It may be worthwhile to loop in the spout until
>> you receive a valid message or the bus is empty. How much throughput can
>> you achieve from the spout, emitting a tuple into the ether? Maybe the
>> problem is your message bus. Once you have achieved a level of performance
>> you are satisfied with from the spout, add one bolt. What bottlenecks does
>> the bolt introduce? And so on.
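The "loop in the spout until you receive a valid message" advice can be sketched in plain Java. Here `bus` is a hypothetical stand-in for the service-bus peek (returning null when empty), and the `startsWith` check is a placeholder for the real monetise filter:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class DrainLoopSketch {
    // Hypothetical stand-in for the service bus; poll() returns null when empty.
    static final Deque<String> bus = new ArrayDeque<>();

    // Instead of returning after one uninteresting message (which would trigger
    // Storm's ~1 ms empty-call sleep), keep polling until we find a message
    // worth emitting or the bus is empty.
    static String nextTupleOnce() {
        String msg;
        while ((msg = bus.poll()) != null) {
            if (msg.startsWith("monetise")) { // placeholder for the real filter
                return msg;                   // emit this one
            }
            // uninteresting message: discard and keep looping, don't return
        }
        return null; // bus empty; let Storm back off briefly
    }

    public static void main(String[] args) {
        bus.add("other");
        bus.add("monetise:xyz");
        System.out.println(nextTupleOnce()); // prints monetise:xyz
    }
}
```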
>>
>>
>> On Tue, Mar 18, 2014 at 2:31 PM, David Crossland
>> wrote:
>>>
>>> Could my issue relate to memory allocated to the JVM? Most of the settings
>>> are pretty much the defaults. Are there any other settings that could be
>>> throttling the topology?
>>>
>>> I'd like to be able to identify the issue without all this constant
>>> “stabbing in the dark”… 😃
>>>
>>> D
>>>
>>> From: David Crossland
>>> Sent: Tuesday, 18 March 2014 16:32
>>> To: user@storm.incubator.apache.org
>>>
>>> Being very new to storm I'