ack in downstream when using all grouping method

2016-12-19 Thread Xunyun Liu
Hi there,

As some grouping methods send multiple copies of an emitted tuple to
downstream bolt instances, I was wondering what happens if any one of
those instances fails to ack the tuple. The underlying question is: when
the all grouping method is used, do the recipients receive the exact same
tuple, or duplicates with different tuple IDs? In the latter case, I
believe the tuple tree expands with the parallelism of the downstream
bolt, and each task has to invoke ack() for the root tuple to be fully
processed.
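For background, here is a toy sketch of the XOR-based tracking this question is about (written in Python purely for illustration; Storm's real acker is internal Java code and its API differs). Every anchored edge gets a random 64-bit id that is XORed into the root's checksum on emit and again on ack, so the root completes only when the checksum returns to zero. Under all grouping, each downstream task gets its own edge:

```python
import random

class AckerSketch:
    """Toy model of Storm's XOR-based acker (names are illustrative,
    not the real API). The root tuple is fully processed only when
    every edge id has been XORed in twice: once on emit, once on ack."""

    def __init__(self):
        self.checksum = 0

    def emit(self, num_tasks):
        # An all-grouped emit creates one edge per downstream task,
        # each with its own random edge id.
        edge_ids = [random.getrandbits(64) for _ in range(num_tasks)]
        for eid in edge_ids:
            self.checksum ^= eid
        return edge_ids

    def ack(self, edge_id):
        self.checksum ^= edge_id

    def fully_processed(self):
        return self.checksum == 0

acker = AckerSketch()
edges = acker.emit(num_tasks=3)     # all grouping to 3 bolt tasks
acker.ack(edges[0])
acker.ack(edges[1])
print(acker.fully_processed())      # False: one copy still pending
acker.ack(edges[2])
print(acker.fully_processed())      # True: every copy has acked
```

The point of the sketch: each of the three copies is a separate edge in the tuple tree, so all three must ack before the root can complete.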

Any idea is much appreciated.


-- 
Best Regards.
==
Xunyun Liu


Re: ack in downstream when using all grouping method

2016-12-19 Thread Ambud Sharma
Yes, that is correct: all downstream tuples must be processed for the root
tuple to be acknowledged.

The type of grouping does not change the acking behavior.



Re: ack in downstream when using all grouping method

2016-12-19 Thread Ambud Sharma
Forgot to answer your specific question: the Storm message id is internal
and will differ per copy, so you will see duplicate tuples with different ids.



Re: ack in downstream when using all grouping method

2016-12-19 Thread Xunyun Liu
Thank you for your answer, Ambud. In my use case, only some of the bolt
instances are critical, and I need them to respond to the signal through
proper acknowledgment. The rest are non-critical and should preferably not
interfere with the normal ack process, much as if they were receiving an
unanchored tuple. Is there any way I can achieve this?



-- 
Best Regards.
==
Xunyun Liu
The Cloud Computing and Distributed Systems (CLOUDS) Laboratory,
The University of Melbourne


Re: ack in downstream when using all grouping method

2016-12-19 Thread Ambud Sharma
Storm is a framework built on replays; fundamentally, replays are how
guaranteed event processing is accomplished. Typically, all instances of a
registered bolt should run the same code unless you are doing some logic
based on task ids, which implies that the bolt instances should behave
similarly as well, barring hardware failure.

If I understand your use case, you can duplicate the data outside Storm
(e.g. write it to separate Kafka topics) and have independent spouts pick
it up, while keeping everything in one topology.

Grouping, however, is applied to a single stream; you can use more than one
stream to get a logical separation as well.

I am still unsure why you would see partial failures unless supervisors are
failing frequently; maybe you can provide more details about your use case.

Lastly, all grouping is usually used for update delivery, where
acknowledgements should matter. However, if you can get away with using
unanchored tuples, that is also an alternative.
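The anchored-vs-unanchored distinction can be sketched in a few lines (a toy model in Python, not the Storm API): unanchored emits never join the tuple tree, so the tracker simply does not wait on them.

```python
class TupleTreeSketch:
    """Toy tracker (illustrative only): anchored emits add edges the
    root tuple must wait on; unanchored emits are not tracked at all."""

    def __init__(self):
        self.pending = set()
        self.counter = 0

    def emit(self, anchored):
        self.counter += 1
        if anchored:
            self.pending.add(self.counter)
        return self.counter

    def ack(self, edge_id):
        self.pending.discard(edge_id)

    def fully_processed(self):
        return not self.pending

tree = TupleTreeSketch()
tracked = tree.emit(anchored=True)    # normal, anchored emit
tree.emit(anchored=False)             # unanchored: fire-and-forget
tree.ack(tracked)
# The root completes even though the unanchored copy never acked.
print(tree.fully_processed())         # True
```

This is why unanchored tuples are an alternative when at-least-once delivery is not needed for a given consumer: a failure on the unanchored path never triggers a replay.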




Re: ack in downstream when using all grouping method

2016-12-19 Thread Xunyun Liu
Yes, my processing logic is task-id dependent, so the behavior of different
bolt instances is similar but not exactly the same. This is also why I want
some instances to be non-critical and not affect the ack procedure.

I would like to explore the possibility of modifying the ack logic so that
tuples emitted to non-critical tasks are not anchored. I will report any
progress I make on this matter.
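The selective anchoring described above could be modeled like this (a Python sketch under the assumption that the emitter knows the critical task ids; `CRITICAL_TASKS` and the class are hypothetical, not Storm code): copies fanned out to critical tasks are anchored, the rest are not.

```python
CRITICAL_TASKS = {3, 5}    # hypothetical ids of the critical bolt tasks

class SelectiveTree:
    """Toy sketch (not Storm code) of the proposed modification: when
    fanning out to all downstream tasks, only the copies bound for
    critical tasks are anchored into the tuple tree."""

    def __init__(self):
        self.pending = set()

    def emit_all(self, task_ids):
        for t in task_ids:
            if t in CRITICAL_TASKS:
                self.pending.add(t)   # anchored copy, gates the root ack
            # non-critical copies go out unanchored: never tracked

    def ack(self, task_id):
        self.pending.discard(task_id)

    def fully_processed(self):
        return not self.pending

tree = SelectiveTree()
tree.emit_all([3, 4, 5, 6])
tree.ack(3)
tree.ack(5)
# Tasks 4 and 6 never ack, yet the root still completes.
print(tree.fully_processed())   # True
```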

Best Regards.





-- 
Best Regards.
==
Xunyun Liu
The Cloud Computing and Distributed Systems (CLOUDS) Laboratory,
The University of Melbourne


Re: ack in downstream when using all grouping method

2016-12-20 Thread S G
I am sure you have thought of it, but I am curious to ask: why not do this
in your bolt?

if (!criticalTask(taskId)) {
    collector.ack(tuple);   // ack immediately via the OutputCollector
}

This way, your bolt acks immediately and the preceding component treats the
tuple as processed successfully.
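The ack-early pattern above can be fleshed out as follows (a simplified Python stand-in for a bolt's execute(); `CRITICAL_TASKS`, the collector stub, and the work functions are illustrative assumptions, not Storm's API). The non-critical path acks before doing its work, so a later failure there cannot hold up, or fail, the root tuple:

```python
CRITICAL_TASKS = {1, 2}          # hypothetical critical task ids

class CollectorStub:
    """Stand-in for Storm's OutputCollector, recording acks."""
    def __init__(self):
        self.acked = []

    def ack(self, tup):
        self.acked.append(tup)

def best_effort_work(tup):
    # Simulate a failure in the non-critical processing.
    raise RuntimeError("non-critical work failed")

def execute(tup, task_id, collector):
    """Sketch of the ack-early pattern suggested above."""
    if task_id not in CRITICAL_TASKS:
        collector.ack(tup)       # ack first: upstream sees success now
        try:
            best_effort_work(tup)
        except RuntimeError:
            pass                 # already acked, so no replay is triggered
        return
    # Critical path: do the real work, then ack.
    collector.ack(tup)

collector = CollectorStub()
execute("tuple-a", task_id=7, collector=collector)   # non-critical task
execute("tuple-b", task_id=1, collector=collector)   # critical task
print(collector.acked)    # ['tuple-a', 'tuple-b']
```

Both tuples end up acked even though the non-critical work blew up, which is exactly the "non-critical instances should not interfere with the normal ack process" behavior asked for earlier in the thread.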
