Re: Does storm support incremental windowing operation?

2017-12-21 Thread Jungtaek Lim
Hi Jinhua,

Could you share the link to the Flink doc? I'm not exactly familiar with
incremental aggregation over windows, so let me take a look.

On Thu, Dec 21, 2017 at 2:09 PM, Jinhua Luo wrote:

> Hi All,
>
> The window may buffer many tuples before evaluation. Does Storm
> support incremental aggregation over the window, like Flink does?
>


Re: Does storm support incremental windowing operation?

2017-12-21 Thread Stephen Powis
Is there a reason to use a Guava cache to aggregate over just a plain old
Map? Curious, as aggregation is a common use case for us and I never thought
to look to Guava for it.
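
(I can see that a Guava cache at least buys bounded size and time-based expiry
over a plain HashMap -- roughly the sketch below, purely illustrative with
made-up names. Curious whether that was the motivation or something else.)

import java.util.concurrent.TimeUnit;

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

public class GuavaAggregateSketch {
    // Per-key running counts; entries are evicted 10 minutes after the last write,
    // so keys that stop receiving tuples eventually fall out of memory on their own.
    private final Cache<String, Long> counts = CacheBuilder.newBuilder()
            .maximumSize(100_000)
            .expireAfterWrite(10, TimeUnit.MINUTES)
            .build();

    public void add(String key) {
        Long current = counts.getIfPresent(key);
        counts.put(key, current == null ? 1L : current + 1L);
    }
}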

Thanks!

On Fri, Dec 22, 2017 at 1:50 PM, Manish Sharma  wrote:

> We added a Guava cache in the bolt's execute method to aggregate tuples and
> wait for the tick signal.
>
> You can control the tick frequency with TOPOLOGY_TICK_TUPLE_FREQ_SECS in the
> topology configuration.
>
>
> This is a Spark use case IMO.
>
>
> cheers, /Manish
>
>
>
>
> On Wed, Dec 20, 2017 at 9:09 PM, Jinhua Luo  wrote:
>
>> Hi All,
>>
>> The window may buffer many tuples before evaluation. Does Storm
>> support incremental aggregation over the window, like Flink does?
>>
>
>


Re: Does storm support incremental windowing operation?

2017-12-21 Thread Manish Sharma
We added a Guava cache in the bolt's execute method to aggregate
tuples and wait for the tick signal.

You can control the tick frequency with TOPOLOGY_TICK_TUPLE_FREQ_SECS
in the topology configuration.
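
Roughly, the pattern looks like this (a simplified sketch, not our production
code -- I'm using a plain HashMap instead of the Guava cache for brevity, and
the class and field names are made up):

import java.util.HashMap;
import java.util.Map;

import org.apache.storm.Config;
import org.apache.storm.Constants;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class TickAggregatingBolt extends BaseRichBolt {
    private transient OutputCollector collector;
    private transient Map<String, Long> counts;

    @Override
    public Map<String, Object> getComponentConfiguration() {
        // Ask Storm to deliver a tick tuple to this bolt every 10 seconds.
        Config conf = new Config();
        conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 10);
        return conf;
    }

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.counts = new HashMap<>();
    }

    @Override
    public void execute(Tuple tuple) {
        if (isTick(tuple)) {
            // Flush the partial aggregates downstream and start over.
            for (Map.Entry<String, Long> e : counts.entrySet()) {
                collector.emit(new Values(e.getKey(), e.getValue()));
            }
            counts.clear();
        } else {
            counts.merge(tuple.getStringByField("key"), 1L, Long::sum);
        }
        collector.ack(tuple);
    }

    private static boolean isTick(Tuple tuple) {
        return Constants.SYSTEM_COMPONENT_ID.equals(tuple.getSourceComponent())
                && Constants.SYSTEM_TICK_STREAM_ID.equals(tuple.getSourceStreamId());
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("key", "count"));
    }
}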


This is a Spark use case IMO.


cheers, /Manish




On Wed, Dec 20, 2017 at 9:09 PM, Jinhua Luo  wrote:

> Hi All,
>
> The window may buffer many tuples before evaluation. Does Storm
> support incremental aggregation over the window, like Flink does?
>


Re: Migrating from storm-kafka to storm-kafka-client

2017-12-21 Thread Manish Sharma
We solved this offset sync issue by making our topology idempotent (we
could do that with our use case). Our Storm topology consumes documents from
Kafka, commits them to Elasticsearch, and inserts records into Cassandra.
The topology can re-consume from the beginning of the queue, and the doc IDs
and primary keys are chosen such that the records get overwritten with the
same document.
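
As a rough illustration (not our actual code; the field names are invented),
the doc ID is derived deterministically from the record's natural key, so
replaying the same Kafka message always writes the same Elasticsearch document
and Cassandra row:

import java.nio.charset.StandardCharsets;
import java.util.UUID;

public final class DocIds {
    private DocIds() {}

    // Same (accountId, eventTime) input always yields the same id, so a replayed
    // record overwrites the earlier write instead of creating a duplicate.
    public static String docIdFor(String accountId, long eventTime) {
        String naturalKey = accountId + "|" + eventTime;
        return UUID.nameUUIDFromBytes(naturalKey.getBytes(StandardCharsets.UTF_8)).toString();
    }
}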

cheers, /Manish


On Thu, Dec 21, 2017 at 1:23 PM, Stig Rohde Døssing  wrote:

> Hi Nasron,
>
> I don't believe there's currently a tool to help you migrate. We did it
> manually by writing a small utility that looked up the commit offsets in
> Storm's Zookeeper, opened a KafkaConsumer with the new consumer group id
> and committed the offsets for the appropriate partitions. We stopped our
> topologies, used this utility and redeployed with the new spout.
>
> Assuming there isn't already a tool for migration floating around
> somewhere, I think we could probably build some migration support into the
> storm-kafka-client spout. If the path to the old offsets in Storm's
> Zookeeper is given, we might be able to extract them and start up the new
> spout from there.
>
> 2017-12-19 21:59 GMT+01:00 Nasron Cheong :
>
>> Hi,
>>
>> I'm trying to determine the steps for migrating to storm-kafka-client in
>> order to use the new Kafka client.
>>
>> It's not quite clear to me how offsets are migrated - is there a specific
>> set of steps to ensure offsets are moved from the ZK-based offsets into the
>> Kafka-based offsets?
>>
>> Or is the original configuration respected, so storm-kafka-client can
>> mostly be a drop-in replacement?
>>
>> I want to avoid having spouts reset to the beginning of topics after
>> deployment due to this change.
>>
>> Thanks.
>>
>> - Nasron
>>
>
>


Re: Migrating from storm-kafka to storm-kafka-client

2017-12-21 Thread Stig Rohde Døssing
Hi Nasron,

I don't believe there's currently a tool to help you migrate. We did it
manually by writing a small utility that looked up the commit offsets in
Storm's Zookeeper, opened a KafkaConsumer with the new consumer group id
and committed the offsets for the appropriate partitions. We stopped our
topologies, used this utility and redeployed with the new spout.

Assuming there isn't already a tool for migration floating around
somewhere, I think we could probably build some migration support into the
storm-kafka-client spout. If the path to the old offsets in Storm's
Zookeeper is given, we might be able to extract them and start up the new
spout from there.
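
Roughly, the utility was along these lines (a from-memory sketch, not the
actual code we ran -- in particular the ZK path layout and the JSON field
names "topic", "partition" and "offset" should be verified against your own
ZooKeeper tree before trusting it):

import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.zookeeper.ZooKeeper;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class OffsetMigrator {
    public static void main(String[] args) throws Exception {
        String zkConnect = args[0];   // e.g. "zk1:2181"
        String spoutRoot = args[1];   // path holding the old spout state, e.g. "/kafka-spout/my-spout-id"
        String bootstrap = args[2];   // e.g. "broker1:9092"
        String newGroupId = args[3];  // consumer group the storm-kafka-client spout will use

        ObjectMapper mapper = new ObjectMapper();
        Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();

        // Read the old storm-kafka commit state (one child znode per partition).
        ZooKeeper zk = new ZooKeeper(zkConnect, 30000, event -> { });
        try {
            for (String child : zk.getChildren(spoutRoot, false)) {   // e.g. "partition_0"
                byte[] data = zk.getData(spoutRoot + "/" + child, false, null);
                JsonNode state = mapper.readTree(data);
                TopicPartition tp = new TopicPartition(
                        state.get("topic").asText(), state.get("partition").asInt());
                offsets.put(tp, new OffsetAndMetadata(state.get("offset").asLong()));
            }
        } finally {
            zk.close();
        }

        // Commit those offsets under the new consumer group so the new spout starts there.
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrap);
        props.put("group.id", newGroupId);
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(offsets.keySet());
            consumer.commitSync(offsets);
        }
    }
}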

2017-12-19 21:59 GMT+01:00 Nasron Cheong :

> Hi,
>
> I'm trying to determine the steps for migrating to storm-kafka-client in
> order to use the new Kafka client.
>
> It's not quite clear to me how offsets are migrated - is there a specific
> set of steps to ensure offsets are moved from the ZK-based offsets into the
> Kafka-based offsets?
>
> Or is the original configuration respected, so storm-kafka-client can
> mostly be a drop-in replacement?
>
> I want to avoid having spouts reset to the beginning of topics after
> deployment due to this change.
>
> Thanks.
>
> - Nasron
>


Storm issue - Shlex module not found

2017-12-21 Thread Sai Suman Mallela
Hello Team,

I am trying to get StormCrawler working on my Debian VM. I get this error
when I run the following command:

root@demo76:/opt# storm

Traceback (most recent call last):
  File "/opt/storm/apache-storm-1.1.0/bin/storm.py", line 23, in <module>
    import shlex
ImportError: No module named shlex

I have Python 2.7.9. I tried a simple Python program with shlex and it works
fine, but I am not sure why storm is not recognizing shlex. Can you please
help?


Kind Regards,

Sai


Re: Back-pressure Mechanism

2017-12-21 Thread Bobby Evans
Algoby,

When you say transfer queue, which queue do you mean exactly? In Storm
there are a lot of queues currently, and they sometimes have confusing names.

There is the receive queue, which holds tuples to be processed by a
specific executor. Then there is the send queue, sometimes called the
batch transfer queue. All of the emit calls from an executor go into this,
and then a second thread handles batching the messages and routing them to
where they need to go. Then there is the transfer queue. This queue gets
all of the tuples that need to be sent outside this worker.

We have looked at supporting all of these different queues for back
pressure. The receive queue is the big one, as it is where most of the user
code likely executes. The send queue tends to back up if the time taken to
serialize an object is more than the processing needed to produce the
object. This is not that common, but I have seen it where a single large
message gets split up into many messages, each of which may be kind of
difficult to serialize. I thought we had a patch to include the send queue
as part of back pressure, but I don't know what happened to it.

The transfer queue is much less likely to back up, but the consequences are
much worse when it does back up. The thread that reads from the transfer
queue only routes messages to clients that are buffering the messages and
sending them to other workers. There is not much work happening here. The
clients themselves don't have any back pressure built in either, so if the
transfer queue is backing up then your worker is likely writing messages
into memory as fast as it can, and you are going to get an OOM sometime
soon. To really make this work you would need some kind of back pressure
in the Netty clients that could also be involved with this.

A patch that we will likely merge into 2.x shortly
(https://github.com/apache/storm/pull/2241) has done all of this and also
redesigned back pressure so it does not rely on high/low water marks with
signals through ZooKeeper, but instead pushes back to the upstream component
when a queue is full. The only downside there is that we will only be able to
support DAGs for processing: no loops in user code, or you could deadlock.
Until we get 2.x out the door and stable (which I really want to do in Q1
2018), you are probably going to have to live with some of these issues.
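
(For reference, the current 1.x watermark-based back pressure can be tuned
with something like the following -- a sketch from memory, so double check the
config key names and defaults against the Config javadocs for your version.)

import org.apache.storm.Config;

public class BackpressureConfigSketch {
    public static Config topologyConf() {
        Config conf = new Config();
        // Enable the ZooKeeper-signalled back pressure (1.x mechanism).
        conf.put(Config.TOPOLOGY_BACKPRESSURE_ENABLE, true);
        // Throttle spouts when an executor receive queue is ~90% full,
        // release the throttle once it drains back below ~40%.
        conf.put(Config.BACKPRESSURE_DISRUPTOR_HIGH_WATERMARK, 0.9);
        conf.put(Config.BACKPRESSURE_DISRUPTOR_LOW_WATERMARK, 0.4);
        return conf;
    }
}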

- Bobby

On Thu, Dec 21, 2017 at 9:50 AM Waleed Al-Gobi 
wrote:

> Dear All,
>
> My concern is about which queue Storm relies on for back-pressure.
>
> I did a simple test of the back-pressure supported by Storm.
> Each instance (executor) maintains an incoming (receive) Q and an
> outgoing (transfer) Q, and according to the min and max thresholds on these
> queues, back-pressure works to slow down the spout in case of queue
> buildup.
>
> The purpose was to make sure whether back-pressure still helps or not in
> the case of a link bottleneck.
> The conclusion: it helps only in the case of queue buildup due to a CPU
> bottleneck. I guess the reason it could not help for a link
> bottleneck is that back-pressure relies only on the executor receive Q.
>
> Does this make sense? If so, could we somehow make the back-pressure also
> work if the executor transfer Q is full in the case of a link bottleneck?
>
> Thanks!
>
> Best,
> Algoby
>