Re: High number of failed messages from MqttSpout

2019-03-06 Thread Ravi Sharma
Hi Kai,
These look like tuple timeout errors (no failed tuples in the bolts, but the spout reports
failures). What's your value for max spout pending (topology.max.spout.pending)?
Set it to a small number like 10 just to test, and then see how high
you can go based on what you are doing in your topology.
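For example, a minimal sketch in the topology's Java submission code (the numbers are just
for testing, not recommendations):

Config conf = new Config();
conf.setMaxSpoutPending(10);     // cap on un-acked tuples in flight per spout task
conf.setMessageTimeoutSecs(60);  // tuple timeout; an assumed value, tune to your processing time
// then submit as usual, e.g. StormSubmitter.submitTopology("my-topology", conf, builder.createTopology());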

Thanks
Ravi.

On Thu, Mar 7, 2019 at 2:58 AM 1733208392 <1733208...@qq.com> wrote:

> Hi storm experts,
>
> As you can see from this diagram, mqtt_spout appears to have a high number
> of failed messages. I have tried the following things:
> 1) Set topology.ackers: 0
> 2) Walk through the topology to see if there is any bolt that does not
> ack or fails unexpectedly, but I didn't see any.
> 3) Set the message timeout to 1 hour
>
> None of the above works, so we are wondering what other options there are,
> why it appears to have such a high fail rate, and how to fix it?
>
> Regards,
> Kai
>


Topology spouts lag

2018-05-24 Thread Ravi Sharma
Hi,
I am using storm 1.2.1, Kafka 0.10.2.1 and storm Kafka client 1.2.1, with
Kafka client 0.10.2.1.

I am able to run the topology and read messages from Kafka, but when I
go to the Storm UI and click on my topology name, it shows me the "Loading topology
summary" message for a minute or so and then shows me all topology stats
with one table, "Topology spout lag error", whose message is "*unable to get
offset lags for kafka. Reason:
org.apache.kafka.shaded.common.errors.TimeoutException: Timeout expired
while fetching topic metadata.*"

It's been a few days that I have been trying to get to the bottom of this problem, but no luck.

I can't see any error in the UI log; the only log I see is that it's trying to access
the //lag endpoint, and that's it.

I tried to get the offset from the same topic outside a Storm topology by writing a
consumer in Java, and I can get the correct offset. Not sure what's wrong
when running as a topology using KafkaSpout from storm-kafka-client 1.2.1.

Does anyone have an idea about this, or an explanation of how topology spout
lag works?

Thanks in advance
Ravi


Re: Large number of very small streams

2016-09-22 Thread Ravi Sharma
Hi Ivan,
I assume you are trying to do a per-user stream so that you can process each
user's events in the same sequence as they arrive. Is that a correct assumption?

If yes, then within Storm you can manage this even if you read from one
Kafka topic using one spout and output events on one stream: just consume the
stream with a grouping on the user field (*fields grouping*).
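As a rough sketch (component and class names here are made up, not from your setup):

// One spout, one stream; fields grouping on the user id sends all of a user's
// tuples to the same bolt task, so per-user ordering is preserved.
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("kafka-spout", new MyKafkaSpout(), 4);             // hypothetical spout
builder.setBolt("per-user-bolt", new PerUserProcessingBolt(), 16)   // hypothetical bolt
       .fieldsGrouping("kafka-spout", new Fields("userId"));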

If no, then can you please explain a bit more about why you want to create a stream
per user?

Thanks
Ravi

On Wed, Sep 21, 2016 at 11:00 PM, Ivan Gozali  wrote:

> Hi everyone,
>
> I'm very new to Storm, and have read various documentation but haven't
> started using it.
>
> I have a use case where I could potentially have many users producing data
> points that are accumulated in one huge, single Kafka topic/Kinesis stream,
> and I was going to use Storm to "route" per-user mini-streams coming from
> this single huge stream to multiple processors.
>
> I was wondering how this use case is typically handled. I was going to
> create a topology (where the spout consumes the big stream) for each user's
> mini-stream that is then pushed to some new derived stream in
> Kinesis/Kafka, but this doesn't seem right, since there could be 100,000s,
> if not 1,000,000s of users and I would be creating 1,000,000 topics.
>
> Thanks in advance for any advice!
>
> --
> Regards,
>
>
> Ivan Gozali
>


Storm Integration Tests

2016-09-15 Thread Ravi Sharma
Hi Guys,
Recently I have written a small framework for integration tests (including
Flux YAML files) and thought of sharing it with you all. Maybe it can help
someone.

https://github.com/ping2ravi/storm-integration-test


Thanks
Ravi.


Re: How will storm replay the tuple tree?

2016-09-14 Thread Ravi Sharma
Hi T.I.
A few reasons why the spout is responsible for replay rather than the various bolts:

1. Ack and fail messages carry only the message ID. Usually your spout
generates the message ID and knows what tuple/message is linked to it (via the
source, i.e. JMS etc.). When an ack or fail happens, the spout can do various
things, like delete the message from the queue on ack or put it in a dead-letter
queue on fail. An intermediate bolt won't know what message it sent, unless you
implement something of your own. Technically you could put the "delete message
from JMS" logic in bolts, but then your whole topology knows where you are
getting data from; what if tomorrow you start processing data from JMS, an HTTP
REST service, a database, a file system, etc.?

2. Say BoltB fails and tells BoltA, and BoltA retries 3 times and it fails all 3 times. Now
what should BoltA do? Send it to another bolt (say a BoltPreA that sits between
it and the spout), or send it to the spout?
    If it sends it to BoltPreA, that means BoltPreA will retry 3 times (just
using 3 as the number; consider it N), and for each try from BoltPreA, BoltA
will retry again 3 times, so 9 retries in total. (Basically, with TB bolts between the
spout and the failing bolt and TR retries at each, the total retries will
be like TR + TR^2 + ... + TR^TB.)
    If you send the failure back from BoltA to the spout, then we can argue:
why not send it to the spout from BoltB directly? As a framework, I shouldn't be looking
into whether BoltB is really costly or BoltA is really costly.

3. Also, failure scenarios are supposed to be really, really rare, and if your
database is down (meaning 100% of tuples will fail), then performance won't be your
only concern. Your concern will be to make sure the database comes back up and to
reprocess all the failed tuples.

4. Also, you would have to take care of retry logic in every bolt. Currently
it lives in only one place.



*There is one thing I am looking forward to from Storm: informing the spout
about what kind of failure it was.* I.e., if it was a ConnectionTimeout or
ReadTimeout etc., that means if I retry it may pass. But say it was a null
pointer exception (Java world): I know the data that is expected is
not there and my code is not handling that scenario, so either I will have
to change the code or ask the data provider to send that field; a retry won't help
me.

Currently the only way to do this is to use an outside datastore like Redis: in whichever
bolt you fail, add a key with the messageId and the exception/error detail in Redis
before calling fail, and then let the spout read that data from Redis with the
messageId received in the fail call; the spout can then decide whether it wants to
retry or not. I would usually create two wrappers, a retryable exception and a
*non*-retryable exception, so each bolt can indicate whether a retry can help
or not. It's up to you where you put this decision-making logic.
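As a rough sketch of the bolt side (not from this thread; the Jedis client, the key format,
and the assumption that the spout emits its message id as a tuple field are all mine):

import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;
import redis.clients.jedis.Jedis;

public class FailureReportingBolt extends BaseRichBolt {

    private transient OutputCollector collector;
    private transient Jedis redis;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.redis = new Jedis("localhost");   // assumed Redis host
    }

    @Override
    public void execute(Tuple tuple) {
        try {
            // ... real processing goes here ...
            collector.ack(tuple);
        } catch (Exception e) {
            // assumes the spout also emitted its message id as the "msgId" field
            String msgId = tuple.getStringByField("msgId");
            // record why it failed before failing the tuple; expire the key after an hour
            redis.setex("fail:" + msgId, 3600, e.getClass().getSimpleName());
            collector.fail(tuple);
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // nothing emitted downstream in this sketch
    }
}

The spout's fail(Object msgId) can then look up "fail:" + msgId in Redis and replay only
when the recorded error looks retryable (e.g. a timeout rather than a NullPointerException).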



Thanks
Ravi.






On Wed, Sep 14, 2016 at 6:43 AM, Tech Id  wrote:

> Thanks Ambud,
>
> I did read some very good things about acking mechanism in Storm but I am
> not sure it explains why point to point checking is expensive.
>
> Consider the example: Spout--> BoltA--->BoltB.
>
> If BoltB fails, it will report failure to the acker.
> If the acker can ask the Spout to replay, then why can't the acker ask the
> parent of BoltB to replay at this point?
> I don't think keeping parent of a bolt could be expensive.
>
>
> On a related note, I am a little confused about a statement "When a new
> tupletree is born, the spout sends the XORed edge-ids of each tuple
> recipient, which the acker records in its pending ledger" in
> Acking-framework-implementation.html.
> How does the spout know before hand which bolts would receive the tuple?
> Bolts forward tuples to other bolts based on groupings and dynamically
> generated fields. How does spout know what fields will be generated and
> which bolts will receive the tuples? If it does not know that, then how
> does it send the XOR of each tuple recipient in a tuple's path because each
> tuple's path will be different (I think, not sure though).
>
>
> Thx,
> T.I.
>
>
> On Tue, Sep 13, 2016 at 6:37 PM, Ambud Sharma 
> wrote:
>
>> Here is a post on it: https://bryantsai.com/fault-tolerant-message-processing-in-storm/.
>>
>> Point to point tracking is expensive unless you are using transactions.
>> Flume does point to point transfers using transactions.
>>
>> On Sep 13, 2016 3:27 PM, "Tech Id"  wrote:
>>
>>> I agree with this statement about code/architecture but in case of some
>>> system outages, like one of the end-points (Solr, Couchbase, Elastic-Search
>>> etc.) being down temporarily, a very large number of other fully-functional
>>> and healthy systems will receive a large number of duplicate replays
>>> (especially in heavy throughput topologies).
>>>
>>> If you can elaborate a little more on the performance cost of tracking
>>> tuples or point to a document reflecting the same, that will be of great
>>> 

Re: Fwd: [meetup-group-aYOqisgB] [SURVEY] What version of Storm are you using?

2016-09-10 Thread Ravi Sharma
1.0.2

On Wed, Sep 7, 2016 at 2:22 PM, davo...@crossing-technologies.com <
davo...@crossing-technologies.com> wrote:

> Ùuuh
>
>
>
> Sent from my Samsung Galaxy smartphone.
>
>  Original message 
> From: Manu Zhang 
> Date: 9/7/16 14:15 (GMT+01:00)
> To: user 
> Subject: Fwd: [meetup-group-aYOqisgB] [SURVEY] What version of Storm are
> you using?
>
> This answer is from an engineer at vipshop, an online discount retailer
> for brands in China. The words are in Chinese so I translate them to
> English here.
>
> *1. What version of Storm are you currently using?*
> The big clusters at vipshop are using Storm 0.9.4.  We plan to upgrade to
> 1.0 gradually.
> *2. If you are not on the most recent version, what is preventing you from
> upgrading?*
> The main concern for upgrade is its impact on business. It's very costly
> to upgrade a huge cluster with hundreds of jobs, where an important job even
> requires two copies to run simultaneously to ensure correctness.
> Besides, we are looking forward to Storm 2.0 which has a Java core since
> Clojure has a high barrier.
>
>
> -- Forwarded message -
> From: Clark,Song(vip.com) 
> Date: Wed, Sep 7, 2016 at 10:50 AM
> Subject: Re: [meetup-group-aYOqisgB] [SURVEY] What version of Storm are
> you using?
> To: owenzhang1...@gmail.com 
>
>
> hi:
> *1. What version of Storm are you currently using?*
> The several big real-time clusters at vipshop are all on V0.9.4; we plan to upgrade to V1.0 gradually.
> *2. If you are not on the most recent version, what is preventing you from
> upgrading?*
> Version upgrades are mainly weighed against their impact on the business. Especially for a
> large cluster with a hundred-odd jobs, the cost of upgrading is very high; important jobs
> even need to run two copies in parallel to ensure data quality.
> *Also:*
> Really looking forward to Storm 2.0's Java core; Clojure has a high barrier to entry.
>
>
>
>
> --
> *From:* meetup-group-ayoqisgb-annou...@meetup.com on behalf of manuzhang
> *Sent:* September 6, 2016 18:56
> *To:* meetup-group-ayoqisgb-annou...@meetup.com
> *Subject:* [meetup-group-aYOqisgB] [SURVEY] What version of Storm are you
> using?
>
>
> This is a survey started by Storm PMC Chair Taylor Goetz. I quote his
> words here
>
> On the Storm developer list, there are a number of discussions regarding
> ending support for various older versions of Storm. In order to make an
> informed decision I’d like to get an idea of what versions the user
> community is actively using. I’d like to ask the user community to answer
> the following questions so we can best determine which version lines we
> should continue to support, and which ones can be EOL’ed.
>
> *1. What version of Storm are you currently using? *
>
> *2. If you are not on the most recent version, what is preventing you from
> upgrading? *
>
> Thanks in advance. -Taylor
>
> The original link is at http://mail-archives.apache.org/mod_mbox/storm-
> user/201608.mbox/%3C5DCE24EC-0E48-4AA1-A700-F2AB48248304%40gmail.com%3E
>
> Please kindly answer Taylor's two questions by replying this email.
>
>
>
>
> --
> This message was sent by Meetup on behalf of manuzhang
> 
> from Shanghai Big Data Streaming Meetup
> .
> To report this message, please click here
> 
> To block the sender of this message, please click here
> 
>
> Never miss a last-minute change. Get the app.
> [image: iPhone App Store]
> <#m_-205107936281958342_m_-3615088255113836396_link_site(+'/z/'+'redirect_loc'+'https://itunes.apple.com/app/apple-store/id375990038?pt=288116=emailfooter=8'+)>[image:
> Google Play]
> <#m_-205107936281958342_m_-3615088255113836396_link_site(+'/z/'+'redirect_loc'+'http://play.google.com/store/apps/details?id=com.meetup=utm_source%3Demailfooter'+)>
>
> You're getting this message because your Meetup account is connected to
> this email address.
>
> Unsubscribe
> <#m_-205107936281958342_m_-3615088255113836396_link_unsubscribe(> from
> similar emails from this Meetup group. Manage your settings
> <#m_-205107936281958342_m_-3615088255113836396_link_site(+'/account/comm/'+)>
> for all types of email updates.
>
> Visit your account page
> <#m_-205107936281958342_m_-3615088255113836396_link_site(+'/account/'+)>
> to change your contact details, privacy settings, and other settings.
>
> Meetup Inc.
> <#m_-205107936281958342_m_-3615088255113836396_link_site(+'/'+)>, POB
> 4668 #37895 New York NY USA 10163
> 本电子邮件可能为保密文件。如果阁下非电子邮件所指定之收件人,谨请立即通知本人。敬请阁下不要使用、保存、复印、打印、散布本电子邮件及其内容,或将其用于其他任何目的或向任何人披露。谢谢您的合作!
> This communication is intended only for the addressee(s) and may contain
> information that is privileged and confidential. You are hereby notified
> that, if you are not an intended recipient listed above, or an authorized
> employee or agent of an addressee of this communication responsible for
> delivering e-mail messages to an intended recipient, any dissemination,
> distribution or 

Re: Sending Some Context when Failing a tuple.

2016-01-05 Thread Ravi Sharma
Hi Ankur,

Thanks for the detailed reply. I haven't tried the code on a cluster yet, I'm just
running it in my mind.

Let's say a single instance of Bean b was created for a message and passed both as
part of the tuple and as the messageId.

I don't know the implementation, but I will assume the messageId and the tuple will be
serialized separately and deserialized separately on another node. So while
serializing, the same object will be serialized twice, but when deserializing,
two new objects will be created with exactly the same state/content.

So basically

(Bean)tuple.getValue(1) != messageIdObject, comparing only the memory
addresses after deserializing.


So when I change Bean b in the tuple, it will not be reflected in the messageId.

It may work in dev mode, as there is no serialization/deserialization, but
in cluster mode it may not work. I will try to run it in cluster mode
and see whether it actually does what I have just said.


Also, the messageId and the tuple are supposed to be read-only objects, which means
that even if it works now, it may not work in a future version if some kind of
caching comes into the picture, i.e. the spout keeps a cache, on ack the bolt sends
only a key of the tuple, and the spout gets it from the cache, etc.


Thanks

Ravi.






On 4 Jan 2016 5:14 pm, "Ankur Garg" <ankurga...@gmail.com> wrote:

> My Bad Ravi that I could not explain my point properly .
>
> Like you said in the fail method  when u call outputCollector.fail(tuple)
> , ISpout.fail(Object messageId) gets invoked.
>
> May be u r passing String as this messageId but the argument itself says
> Object .
>
> So what I meant to say that you can pass your same object as MessageId
> (perhaps name it better for understanding) .
>
> To Elaborate , Lets say I pass along simple Java Bean from Spout to Bolt .
>
> Here is my sample Bean
>
> public class Bean implements Serializable {
>
> private String a; // your custom attributes which are serializable .
>
> private int b; // your custom attributes which are serializable .
>
> private int c; // your custom attributes which are serializable .
>
> private String msgId; //your custom attributes which are serializable .
>
>private String failureReason;   // Failure Reason ..to be populated
> inside bolt when tuple fails
>
>
> //getter setter Methods
>
> }
>
> In your Spout inside nextTuple . Taking example from word count of Storm
> Starter project
>
>
> public void nextTuple() {
>
> try{
>
> final String[] words = new String[] { "nathan", "mike", "jackson",
>
> "golda", "bertels" };
>
> final Random rand = new Random();
>
> final String word = words[rand.nextInt(words.length)];
>
> String msgId = word +  UUID.randomUUID().toString();
>
> Bean b = new Bean();
>
> b.setA("String A");
>
> b.setB(123);
>
> b.setC(456);
>
> b.setMsgId(msgId); // not necessary to do
>
> * _collector.emit(new Values(word,b) , b);*
>
> LOG.info("Exit nextTuple Method ");
>
> }
>
> catch(Exception ex)
>
> {
>
>   ex.printStackTrace();
>
> }
>
> LOG.info("Final Exit nextTuple method ");
>
> }
>
>
> See the _collector.emit . I am passing the same bean object as MessageId
> Object .
>
> *And declareOutputFields method as *
>
> public void declareOutputFields(OutputFieldsDeclarer declarer) {
>
> declarer.declare(new Fields("word" , "correlationObject"));
>
> }
>
>
> Now in my Bolt
>
> In the execute Method
>
> @Override
>
> public void execute(Tuple tuple) {
>
>   Bean b1 = (Bean)tuple.getValue(1);
>
>  * b1.setFailureReason("It failed ");*
>
> _collector.fail(tuple);
>
> }
>
>
> Now , when _collector.fail method is called Spout's fail method gets
> invoked
>
> public void fail(Object msgId) {
>
>  Bean b1 = (Bean) msgId;
>
>  String failureReason = b1.getFailureReason();
>
> }
>
>
> *You will see the failureReason u set inside ur bolt received here inside
> fail method . *
>
> *Again , I am not saying that this is the best way to achieve what u want
> , but just proposing a way it can be done.*
>
> Hope this helps.
>
>
> Thanks
>
> Ankur
>
>
> On Mon, Jan 4, 2016 at 2:24 PM, Ravi Sharma <ping2r...@gmail.com> wrote:
>
>> Hi Ankur,
>> Various Storm API for this are like this
>>
>> Bolts recieve Tuple, which is immutable object.
>> Once something fails in Bolt we call outputCollector.fail(tuple)
>>
>> which in turn invoke Spout's ISpout.fail(Object messageId) method.
>>
>>
>>
>> now spout gets only the messageId back(which i created when processing
>> s

Sending Some Context when Failing a tuple.

2016-01-03 Thread Ravi Sharma
Hi All,
I would like to send some extra information back to the spout when a tuple
fails in some bolt, so that the spout can decide whether it wants to replay it or
just put the message into a queue outside Storm for admins to view.

So is there any way I can attach some more information when sending a
failed tuple back to the spout?

One way I can think of is keeping such information outside Storm in some
datastore, keyed by tuple id, so the spout can look it up; but I am looking for some
way to do it via Storm without bringing in another integration/datastore.


Thanks
Ravi.


Re: Store previous calculated result

2015-11-09 Thread Ravi Sharma
Calculated in previous Bolt/Spout or calculated in previous run?

Ravi

On Sat, Nov 7, 2015 at 11:27 AM, Miguel Ángel Fernández Fernández <
miguelangelprogramac...@gmail.com> wrote:

> In a trident scenario, a realtime operation needs to know the previous
> calculated result.
>
> My current solution is very poor and probably incorrect (a hashmap in
> bolts). Now I'm thinking to incorporate a cache (redis, memcached ...)
>
> However, I *suppose* that
> there is a standard solution for this problem in Trident (maybe a special
> state).
>
> What do you think is the best approach?
>
> Thanks for your time
>


Re: Does Storm work with Spring

2015-10-19 Thread Ravi Sharma
You may have to tell Spring that your .yaml file is your resource file.

Ravi.

On Mon, Oct 19, 2015 at 3:25 PM, Ankur Garg <ankurga...@gmail.com> wrote:

> Hi Ravi ,
>
> Need your help . So I created a local cluster and deployed my topology to
> it . Inside my Spout and Bolts , I am launching a Spring Boot application
> wrapped inside a singleton to initialise my context . Unfortunately , it
> appears to me that it is not working :  and annotations like
> @EnableAutoConfiguration is not picking up yml files from the classpath and
> injecting their values in the bean. And I am getting exceptions like
>
> Error creating bean with name 'inputQueueManager': Injection of autowired
> dependencies failed; nested exception is
> org.springframework.beans.factory.BeanCreationException: Could not autowire
> field: private int
> mqclient.rabbitmq.manager.impl.InputQueueManagerImpl.rabbitMqPort; nested
> exception is org.springframework.beans.TypeMismatchException: Failed to
> convert value of type 'java.lang.String' to required type 'int'; nested
> exception is java.lang.NumberFormatException: For input string:
> "${input.rabbitmq.port}" at
>
> has anyone here ever tried injecting dependencies from Spring . I am not
> sure why this is not working .
>
> It works like a charm in Local Cluster and now I am not passing context as
> a constructor argument , rather declaring and initializing it inside each
> spout and bolts :( .
>
> Is there any reason why Spring Annotations dont work inside a Remote
> Cluster .
>
> Need help urgently here .
>
> Thanks
> Ankur
>
> On Sun, Oct 11, 2015 at 1:01 PM, Ankur Garg <ankurga...@gmail.com> wrote:
>
>> I think I don't  need to Autowire beans inside my spout and bolts .
>>
>> All I want my context to be available . Since I use Spring Boot , I am
>> delegating it to initialise all the beans and set up every bean (reading
>> yml file and create DB connections , connections to Message brokers etc ) .
>>
>> On my local cluster I am passing it as a constructor argument to Spouts
>> and Bolts . Since all r running in same jvm its available to all spouts and
>> bolts .
>>
>> But in a distributed cluster , this will blow up as Context is not
>> serializable and cannot be passed like above .
>>
>> So the problem is only to make this context available once per jvm .
>> Hence I thought I will wrap it under a singleton and make this available to
>> all spouts and bolts per jvm.
>>
>> Once I have this context initialized and loaded all I need to do is to
>> get the bean which I will do the same way I am doing inside local cluster
>> spouts and bolts .
>>
>>
>>
>>
>>
>> On Sun, Oct 11, 2015 at 12:46 PM, Ravi Sharma <ping2r...@gmail.com>
>> wrote:
>>
>>> Yes ur assumption is right
>>> Jvm1 will create application contexts say ac1
>>>
>>> And jvm2 will create another application instance ac2
>>>
>>> And all of it can be done via singleton classes.
>>>
>>> All bolts and spouts in same jvm instance need to access same
>>> application context.
>>>
>>> I have done same in cluster and it works
>>>
>>> Remember all spring beans need to be transient and also u need to set
>>> required=false in case u r going create spout and bolt using spring
>>>
>>> public class MyBolt {
>>> @Autowired(required=false)
>>> private transient MyServiceBean myServiceBean;
>>>
>>> 
>>> ...
>>> }
>>>
>>> Ravi
>>> On 11 Oct 2015 07:59, "Ankur Garg" <ankurga...@gmail.com> wrote:
>>>
>>>> Also , I think there can be some instances of spouts/bolts running on
>>>> JVM 1 and some on JVM 2 and so on...
>>>>
>>>> Is it possible for spouts and bolts running on same jvm to access same
>>>> applicationContext .
>>>>
>>>> I am thinking that I can make the place where I  launch my spring Boot
>>>> application  inside a singleton class , and so all the spouts and bolts
>>>> running on say JVM1 will have access to same context  (instead of launching
>>>> it in all spouts and bolts) . And for those in JVM 2 they will still
>>>> initialise it once and all the rest will get the same application Context .
>>>>
>>>> But all above is theoretical assumption  . I still need to try it out
>>>>  (unfortunately i dont have a cluster setup at my end) but if possible
>>>> please let me know if this can work .
>>>>
&

Re: Java Async in Storm Bolts - Good idea?

2015-10-17 Thread Ravi Sharma
Hi Rajiv,
I am not sure this will increase throughput in any way. You still have
fixed resources, and the work done is still the same; it's just that instead of using the bolt's
main thread you are spawning a new thread.
I see it as all negative, because all your work should be done on the bolt thread:
that's how you will scale it, know exactly what's happening in your
topology, and know whether you need to increase the parallelism of your bolts. With an async
thread you will see that your bolts appear to be running really fast and can take
more work, but in reality they are not.

Ravi.






On Fri, Oct 16, 2015 at 2:19 AM, Rajiv Jivan  wrote:

> In order to increase throughput a colleague suggest making the logic in
> the bolt asynchronous by using Guava Futures. The bolt is writing to a
> cassadra db. Is this a recommended way to relieve back pressure?
>
> Pseudo code
>
> @Override
> public void execute(Tuple tuple)  {
>
> // execute query on cassandra
> ResultSetFuture future =  session.executeAsync(".");
>
> Futures.addCallback(future, new FutureCallback<ResultSet>() {
> @Override
> public void onSuccess(ResultSet result) {
> outputCollector.ack(tuple);
> }
>
> @Override
> public void onFailure(Throwable t)  {
> outputCollector.fail(tuple);
> }
> }, MoreExecutors.sameThreadExecutor());
> }
>
>


Re: Java Async in Storm Bolts - Good idea?

2015-10-17 Thread Ravi Sharma
I mean hidden from your Storm dashboard. For such threads you will have to
build your own monitoring etc.

Ravi.

On Sat, Oct 17, 2015 at 10:31 AM, Enno Shioji <eshi...@gmail.com> wrote:

> Not all asynchronous processing implies hidden threads... Some clients use
> threads, others use async IO. E.g. a network client could be handling many
> thousands of connections with just a couple of threads.
>
> But agreed that it's uncommon that you benefit from it.
>
> On Sat, Oct 17, 2015 at 9:43 AM, Ravi Sharma <ping2r...@gmail.com> wrote:
>
>> Even in those cases it's better to increase parallelism of bolts.
>>
>> So right now if u create a thread pool of say 50 threads and then from
>> bolt put the work on this thread pool to do async work then it's better
>> just put the bolt parallelism to 50,  that way u will be able to track
>> performance of ur topology properly,  no hidden threads. Bolt capacity will
>> be true value, bolt latency will be true value etc etc
>>
>> Ravi
>> On 17 Oct 2015 09:27, "Enno Shioji" <eshi...@gmail.com> wrote:
>>
>>> Assuming the client does async IO, this strategy can be useful in some
>>> cases. For example, imagine instead of Cassandra you are fetching URL
>>> content from the web (from different domains). Because you can talk to a
>>> lot of servers that are slow to respond, it makes sense to fire a lot of
>>> connections at the same time and process them as they finish (and thus have
>>> thousands of live connections at a time). By using async IO, you can handle
>>> thousands of such mostly idle connections with very small number of threads.
>>>
>>> So if your Cassandra cluster is massive compared to your Storm cluster,
>>> and your queries are such that they take a long time and data returned to
>>> you is small enough etc. such that you can handle the responses in your
>>> smaller Storm cluster, it may well be sensible. But I think a situation
>>> like this will be very uncommon.
>>>
>>> If the client doesn't do async IO, it shouldn't be very different from
>>> increasing your task concurrency, while of course it won't be exactly the
>>> same. This type of clients are mostly there to allow you to do useful work
>>> while the request is running and thus improve your response time.
>>>
>>> That said you can always test to find out :)
>>>
>>>
>>>
>>> On Sat, Oct 17, 2015 at 7:53 AM, Ravi Sharma <ping2r...@gmail.com>
>>> wrote:
>>>
>>>> Hi Rajiv,
>>>> I am not sure if this will increase throughput in any ways. You still
>>>> have fixed resources and work done is still same. Its just instead of using
>>>> Bolt main thread you are spawning new thread.
>>>> I see it as all -ve, because your all work must be done on Bolt thread,
>>>> thats how you will scale it and will know exactly whats happening in your
>>>> topology and if you need to increase parallelism of your bolts. With async
>>>> thread you will see that ur bolts are running really fast and you can do
>>>> more work, but in reality they are not.
>>>>
>>>> Ravi.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Oct 16, 2015 at 2:19 AM, Rajiv Jivan <raji...@yahoo.com> wrote:
>>>>
>>>>> In order to increase throughput a colleague suggest making the logic
>>>>> in the bolt asynchronous by using Guava Futures. The bolt is writing to a
>>>>> cassadra db. Is this a recommended way to relieve back pressure?
>>>>>
>>>>> Pseudo code
>>>>>
>>>>> @Override
>>>>> public void execute(Tuple tuple)  {
>>>>>
>>>>> // execute query on cassandra
>>>>> ResultSetFuture future =  session.executeAsync(".");
>>>>>
>>>>> Futures.addCallback(future, new FutureCallback<ResultSet>() {
>>>>> @Override
>>>>> public void onSuccess(ResultSet result) {
>>>>> outputCollector.ack(tuple);
>>>>> }
>>>>>
>>>>> @Override
>>>>> public void onFailure(Throwable t)  {
>>>>> outputCollector.fail(tuple);
>>>>> }
>>>>> }, MoreExecutors.sameThreadExecutor());
>>>>> }
>>>>>
>>>>>
>>>>
>>>
>


Re: Java Async in Storm Bolts - Good idea?

2015-10-17 Thread Ravi Sharma
Even in those cases it's better to increase the parallelism of the bolts.

So if right now you create a thread pool of, say, 50 threads and then from the bolt
put the work on this thread pool to do async work, it's better to just set
the bolt parallelism to 50; that way you will be able to track the performance
of your topology properly, with no hidden threads. Bolt capacity will be a true
value, bolt latency will be a true value, etc.
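As a rough illustration (component and class names are placeholders, not from this thread):

// 50 bolt executors instead of one bolt hiding a 50-thread pool; the Storm UI then
// shows the real capacity and latency of the writing stage.
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("events", new EventSpout(), 4);                     // hypothetical spout
builder.setBolt("cassandra-writer", new CassandraWriterBolt(), 50)   // parallelism hint 50
       .shuffleGrouping("events");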

Ravi
On 17 Oct 2015 09:27, "Enno Shioji" <eshi...@gmail.com> wrote:

> Assuming the client does async IO, this strategy can be useful in some
> cases. For example, imagine instead of Cassandra you are fetching URL
> content from the web (from different domains). Because you can talk to a
> lot of servers that are slow to respond, it makes sense to fire a lot of
> connections at the same time and process them as they finish (and thus have
> thousands of live connections at a time). By using async IO, you can handle
> thousands of such mostly idle connections with very small number of threads.
>
> So if your Cassandra cluster is massive compared to your Storm cluster,
> and your queries are such that they take a long time and data returned to
> you is small enough etc. such that you can handle the responses in your
> smaller Storm cluster, it may well be sensible. But I think a situation
> like this will be very uncommon.
>
> If the client doesn't do async IO, it shouldn't be very different from
> increasing your task concurrency, while of course it won't be exactly the
> same. This type of clients are mostly there to allow you to do useful work
> while the request is running and thus improve your response time.
>
> That said you can always test to find out :)
>
>
>
> On Sat, Oct 17, 2015 at 7:53 AM, Ravi Sharma <ping2r...@gmail.com> wrote:
>
>> Hi Rajiv,
>> I am not sure if this will increase throughput in any ways. You still
>> have fixed resources and work done is still same. Its just instead of using
>> Bolt main thread you are spawning new thread.
>> I see it as all -ve, because your all work must be done on Bolt thread,
>> thats how you will scale it and will know exactly whats happening in your
>> topology and if you need to increase parallelism of your bolts. With async
>> thread you will see that ur bolts are running really fast and you can do
>> more work, but in reality they are not.
>>
>> Ravi.
>>
>>
>>
>>
>>
>>
>> On Fri, Oct 16, 2015 at 2:19 AM, Rajiv Jivan <raji...@yahoo.com> wrote:
>>
>>> In order to increase throughput a colleague suggest making the logic in
>>> the bolt asynchronous by using Guava Futures. The bolt is writing to a
>>> cassadra db. Is this a recommended way to relieve back pressure?
>>>
>>> Pseudo code
>>>
>>> @Override
>>> public void execute(Tuple tuple)  {
>>>
>>> // execute query on cassandra
>>> ResultSetFuture future =  session.executeAsync(".");
>>>
>>> Futures.addCallback(future, new FutureCallback<ResultSet>() {
>>> @Override
>>> public void onSuccess(ResultSet result) {
>>> outputCollector.ack(tuple);
>>> }
>>>
>>> @Override
>>> public void onFailure(Throwable t)  {
>>> outputCollector.fail(tuple);
>>> }
>>> }, MoreExecutors.sameThreadExecutor());
>>> }
>>>
>>>
>>
>


Re: Multiple Spouts in Same topology or Topology per spout

2015-10-11 Thread Ravi Sharma
That depends on whether your spout error has affected the JVM or is a normal application error.

As for a performance issue in the case of a lot of errors: I don't think there is any
issue because of the errors themselves, but of course if you are retrying these
messages on failure, then you will be processing more messages
than normal and overall throughput will go down.
Ravi

If your topology has acknowledgment enabled, that means the spout will always
receive
On 11 Oct 2015 18:15, "Ankur Garg" <ankurga...@gmail.com> wrote:

>
> Thanks for the reply Abhishek and Ravi .
>
> One question though , going with One topology with multiple spouts ...What
> if something goes wrong in One spout or its associated bolts .. Does it
> impact other Spout as well?
>
> Thanks
> Ankur
>
> On Sun, Oct 11, 2015 at 10:21 PM, Ravi Sharma <ping2r...@gmail.com> wrote:
>
>> No 100% right ansers , u will have to test and see what will fit..
>>
>> persoanlly i wud suggest Multiple spouts in one Topology and if you have
>> N node where topology will be running then each Spout(reading from one
>> queue) shud run N times in parallel.
>>
>> if 2 Queues and say 4 Nodes
>> then one topolgy
>> 4 Spouts reading from Queue1 in different nodes
>> 4 spouts reading from Queue2 in different nodes
>>
>> Ravi.
>>
>> On Sun, Oct 11, 2015 at 5:25 PM, Abhishek priya <abhishek.pr...@gmail.com
>> > wrote:
>>
>>> I guess this is a question where there r no really correct answers. I'll
>>> certainly avoid#1 as it is better to keep logic separate and lightweight.
>>>
>>> If your downstream bolts are same, then it makes senses to keep them in
>>> same topology but if they r totally different, I'll keep them in two
>>> different topologies. That will allow me to independently deploy and scale
>>> the topology. But if the rest of logic is same I topology scaling and
>>> resource utilization will be better with one topology.
>>>
>>> I hope this helps..
>>>
>>> Sent somehow
>>>
>>> > On Oct 11, 2015, at 9:07 AM, Ankur Garg <ankurga...@gmail.com> wrote:
>>> >
>>> > Hi ,
>>> >
>>> > So I have a situation where I want to read messages from different
>>> queues hosted in a Rabbitmq Server .
>>> >
>>> > Now , there are three ways which I can think to leverage Apache Storm
>>> here :-
>>> >
>>> > 1) Use the same Spout (say Spout A) to read messages from different
>>> queues and based on the messages received emit it to different Bolts.
>>> >
>>> > 2) Use different Spout (Spout A and Spout B and so on) within the same
>>> topology (say Topology A) to read messages from different queues .
>>> >
>>> > 3) Use Different Spouts one within eachTopology (Topology A , Topology
>>> B and so on) to read messages from different queues .
>>> >
>>> > Which is the best way to process this considering I want high
>>> throughput (more no of queue messages to be processed concurrently) .
>>> >
>>> > Also , If In use same Topology for all Spouts (currently though
>>> requirement is for 2 spouts)  will failure in one Spout (or its associated
>>> Bolts) effect the second or will they both continue working separately even
>>> if some failure is in Spout B ?
>>> >
>>> > Cost wise , how much would it be to maintain two different topologies .
>>> >
>>> > Looking for inputs from members here.
>>> >
>>> > Thanks
>>> > Ankur
>>> >
>>> >
>>>
>>
>>
>


Re: Multiple Spouts in Same topology or Topology per spout

2015-10-11 Thread Ravi Sharma
There are no 100% right answers; you will have to test and see what fits.

Personally I would suggest multiple spouts in one topology, and if you have N
nodes where the topology will be running, then each spout (reading from one queue)
should run N times in parallel.

So with 2 queues and, say, 4 nodes:
one topology,
4 spouts reading from Queue1 on different nodes,
4 spouts reading from Queue2 on different nodes.
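
For example, roughly (queue names and spout/bolt classes are placeholders):

// One topology, one spout per queue, each spout with parallelism 4 so every node
// can host an instance of each spout.
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("queue1-spout", new RabbitMqSpout("queue1"), 4);
builder.setSpout("queue2-spout", new RabbitMqSpout("queue2"), 4);
builder.setBolt("processor", new ProcessingBolt(), 8)
       .shuffleGrouping("queue1-spout")
       .shuffleGrouping("queue2-spout");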

Ravi.

On Sun, Oct 11, 2015 at 5:25 PM, Abhishek priya 
wrote:

> I guess this is a question where there r no really correct answers. I'll
> certainly avoid#1 as it is better to keep logic separate and lightweight.
>
> If your downstream bolts are same, then it makes senses to keep them in
> same topology but if they r totally different, I'll keep them in two
> different topologies. That will allow me to independently deploy and scale
> the topology. But if the rest of logic is same I topology scaling and
> resource utilization will be better with one topology.
>
> I hope this helps..
>
> Sent somehow
>
> > On Oct 11, 2015, at 9:07 AM, Ankur Garg  wrote:
> >
> > Hi ,
> >
> > So I have a situation where I want to read messages from different
> queues hosted in a Rabbitmq Server .
> >
> > Now , there are three ways which I can think to leverage Apache Storm
> here :-
> >
> > 1) Use the same Spout (say Spout A) to read messages from different
> queues and based on the messages received emit it to different Bolts.
> >
> > 2) Use different Spout (Spout A and Spout B and so on) within the same
> topology (say Topology A) to read messages from different queues .
> >
> > 3) Use Different Spouts one within eachTopology (Topology A , Topology B
> and so on) to read messages from different queues .
> >
> > Which is the best way to process this considering I want high throughput
> (more no of queue messages to be processed concurrently) .
> >
> > Also , If In use same Topology for all Spouts (currently though
> requirement is for 2 spouts)  will failure in one Spout (or its associated
> Bolts) effect the second or will they both continue working separately even
> if some failure is in Spout B ?
> >
> > Cost wise , how much would it be to maintain two different topologies .
> >
> > Looking for inputs from members here.
> >
> > Thanks
> > Ankur
> >
> >
>


Re: Does Storm work with Spring

2015-10-10 Thread Ravi Sharma
Hi Ankur,
Locally it may be working, but it won't work in an actual cluster.

Think about it: the Spring context is a collection of many of your resources, like
database connections, maybe HTTP connections, thread pools, etc.
These things won't get serialized, travel to other machines, and just start
working.

So basically, in the init methods of your bolts and spouts, you need to call some
singleton class like this:

ApplicationContext ac = SingletonApplicationContext.getContext();

SingletonApplicationContext will have a static ApplicationContext variable,
and in getContext you check whether the static variable has been initialized; if
not, you initialize it, and then return it (a normal singleton class).
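
A minimal sketch of such a holder (AppConfig here is an assumed Spring @Configuration
class, not something from your project):

import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;

public final class SingletonApplicationContext {

    private static volatile ApplicationContext context;

    private SingletonApplicationContext() { }

    public static ApplicationContext getContext() {
        if (context == null) {
            synchronized (SingletonApplicationContext.class) {
                if (context == null) {
                    // build the Spring context once per worker JVM, on first use
                    context = new AnnotationConfigApplicationContext(AppConfig.class);
                }
            }
        }
        return context;
    }
}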


Now when the topology moves to any other node, the bolts and spouts will start,
the first init call will initialize the context, and the other bolts/spouts will just use
it.

As John mentioned, it's very important to mark all Spring beans and the context
as transient.

Hope it helps.

Ravi.





On Sat, Oct 10, 2015 at 6:25 AM, Ankur Garg <ankurga...@gmail.com> wrote:

> Hi Javier ,
>
> So , I am using a Local cluster on my dev machine where I am using Eclipse
> . Here , I am passing Springs ApplicationContext as constructor argument to
> spouts and bolts .
>
> TopologyBuilder builder = new TopologyBuilder();
>
> builder.setSpout("rabbitMqSpout", new RabbitListnerSpout(appContext), 10);
>
> builder.setBolt("mapBolt", new GroupingBolt(appContext),
> 10).shuffleGrouping("rabbitMqSpout");
>
> builder.setBolt("reduceBolt", new PublishingBolt(appContext),
> 10).shuffleGrouping("mapBolt");
>
> Config conf = new Config();
>
> conf.registerSerialization(EventBean.class); /
>
> conf.registerSerialization(InputQueueManagerImpl.class);
>
> conf.setDebug(true);
>
>  LocalCluster cluster = new LocalCluster();
>
> cluster.submitTopology("test", conf, builder.createTopology());
>
>
> And in my spouts and Bolts ,
>
> I make my Application Context variable as static  . So when it is launched
> by c;uster.submitTopology , my context is still avalilable
>
>
> private static ApplicationContext ctx;
>
> public RabbitListnerSpout(ApplicationContext appContext) {
>
> LOG.info("RabbitListner Constructor called");
>
> ctx = appContext;
>
> }
>
>
> @SuppressWarnings("rawtypes")
>
> @Override
>
> public void open(Map conf, TopologyContext context,SpoutOutputCollector
> collector) {
>
> LOG.info("Inside the open Method for RabbitListner Spout");
>
> inputManager = (InputQueueManagerImpl) ctx.getBean(InputQueueManagerImpl.
> class);
>
> notificationManager = (NotificationQueueManagerImpl) ctx
> .getBean(NotificationQueueManagerImpl.class);
>
> eventExchange = ctx.getEnvironment().getProperty(
> "input.rabbitmq.events.exchange");
>
> routingKey = ctx.getEnvironment().getProperty(
> "input.rabbitmq.events.routingKey");
>
> eventQueue = ctx.getEnvironment().getProperty(
> "input.rabbitmq.events.queue");
>
> _collector = collector;
>
> LOG.info("Exiting the open Method for RabbitListner Spout");
>
> }
>
>
> This is working like a charm (my ApplicationContext is initialized
> seperately ) . As we all know , ApplicationContext is not serializable .
> But this works well in LocalCluster.
>
> My assumption is that it will work in a seperate Cluster too . Is my
> assumption correct ??
>
> On Fri, Oct 9, 2015 at 9:04 PM, Javier Gonzalez <jagon...@gmail.com>
> wrote:
>
>> IIRC, only if everything you use in your spouts and bolts is
>> serializable.
>> On Oct 6, 2015 11:29 PM, "Ankur Garg" <ankurga...@gmail.com> wrote:
>>
>>> Hi Ravi ,
>>>
>>> I was able to make an Integration with Spring but the problem is that I
>>> have to autowire for every bolt and spout . That means that even if i
>>> parallelize spout and bolt it will get started to each instance  . Is there
>>> some way that I only have to do for bolts and spouts once (I mean if I
>>> parallelize bolts or spouts individually it can share the conf from
>>> somewhere) . IS this possible??
>>>
>>> Thanks
>>> Ankur
>>>
>>> On Tue, Sep 29, 2015 at 7:57 PM, Ravi Sharma <ping2r...@gmail.com>
>>> wrote:
>>>
>>>> Yes this is for annotation also...
>>>>
>>>> you can call this method in prepare()  method of bolt and onOpen()
>>>> method
>>>> in every Spout and make sure you don't use any autowire bean before this
>>>> call.
>>>>
>>>>
>>>>
>>>>
>>>>

Re: Does Storm work with Spring

2015-10-10 Thread Ravi Sharma
Basically you will have two contexts defined at different times/phases.

When you are about to submit the topology, you need to build the topology; that
context only needs information about spouts and bolts. You don't need any
application bean like database accessors or your services etc., as at this
level you are not running your application, you are just creating a topology and
defining how bolts and spouts are connected to each other.

Now once the topology is submitted, it will be moved to one of the
supervisor nodes and will start running, and all spouts and bolts will be
initialized. At this moment you will need your application context, which
doesn't need your earlier topology context.

So I would suggest keeping both contexts separate.

A topology is not complex to build; a smaller topology can be built via code
only, i.e. which bolt listens to which spout. But if you want to go with a
good design, I would say just write a small wrapper that reads some JSON where you
can define your bolts and spouts and use that to build the topology (you can use
Spring, but it's not really needed).

In the past I have done it using both JSON settings (without Spring) and XML
settings (with Spring); both work well.

Ravi
On 11 Oct 2015 06:38, "Ankur Garg" <ankurga...@gmail.com> wrote:

> Oh The problem here is I have many beans and which need to be initialized
> (some are reading conf from yml files , database connection , thread pool
> initialization etc) .
>
>
> Now , I have written a spring boot application which takes care of all the
> above and I define my topology inside one of the beans , Here is my bean
>
> @Autowired
> ApplicationContext appContext;
>
> @Bean
> public void submitTopology() throws
> AlreadyAliveException,InvalidTopologyException {
>
>TopologyBuilder builder = new TopologyBuilder();
>
>builder.setSpout("rabbitMqSpout", new RabbitListnerSpout(appContext),
> 10);
>
>builder.setBolt("mapBolt", new GroupingBolt(appContext),
> 10).shuffleGrouping("rabbitMqSpout");
>
> builder.setBolt("reduceBolt", new PublishingBolt(appContext),
> 10).shuffleGrouping("mapBolt");
>
> Config conf = new Config();
>
> conf.registerSerialization(EventBean.class); // To be registered with Kyro
> for Storm
>
> conf.registerSerialization(InputQueueManagerImpl.class);
>
> conf.setDebug(true);
>
>  conf.setMessageTimeoutSecs(200);
>
>LocalCluster cluster = new LocalCluster();
>
>   cluster.submitTopology("test", conf, builder.createTopology());
>
> }
>
>
> When this bean is initialized , I already have appContext initialized by
> my Spring Boot Application . So , the thing is , I am using SpringBoot to
> initialize and load my context with all beans .
>
> Now this is the context which I want to leverage in my spouts and bolts .
>
> So , if what I suggested earlier does  not work on Storm Distributed
> Cluster , I need to find a way of initializing my AppContext somehow:(
>
> I would be really thankful if anyone here can help me :(
>
>
> Thanks
>
> Ankur
>
> On Sun, Oct 11, 2015 at 5:54 AM, Javier Gonzalez <jagon...@gmail.com>
> wrote:
>
>> The local cluster runs completely within a single JVM AFAIK. The local
>> cluster is useful for development, testing your topology, etc. The real
>> deployment has to go through nimbus, run on workers started by supervisors
>> on one or more nodes, etc. Kind of difficult to simulate all that on a
>> single box.
>>
>> On Sat, Oct 10, 2015 at 1:45 PM, Ankur Garg <ankurga...@gmail.com> wrote:
>>
>>> Oh ...So I will have to test it in a cluster.
>>>
>>> Having said that, how is local cluster which we use is too different
>>> from normal cluster.. Ideally ,it shud simulate normal cluster..
>>> On Oct 10, 2015 7:51 PM, "Ravi Sharma" <ping2r...@gmail.com> wrote:
>>>
>>>> Hi Ankur,
>>>> local it may be working but It wont work in Actual cluster.
>>>>
>>>> Think about SpringContext is collection of your so many resoucres, like
>>>> Database connections , may be HTTP connections , Thread pools etc.
>>>> These things wont get serialised and just go to other machines and
>>>> start working.
>>>>
>>>> SO basically in init methods of bolt and spout, you need to call some
>>>> singloton class like this
>>>>
>>>> ApplicationContext ac = SingletonApplicationContext.getContext();
>>>>
>>>> SingletonApplicationContext will have a static variable
>>>> ApplicationContext and in getContext you will check if static variable has
>>>> been initialised if not then u will initilize

Re: Please help

2015-10-08 Thread Ravi Sharma
Hi Steve,
Storm's basic design is to process a stream (open-ended, i.e. with no end) in
real time. There are a few hacky ways of stopping the cluster once the file is
finished, but I guess none of them will be good-looking.
Basically, your Storm cluster should be running all the time, waiting for
more messages to process.

One dirty way is to send a poison tuple after the end of the file; the bolt will receive
it and acknowledge it, and then when you receive the ack in the spout you can kill your
JVM manually. (Not good-looking.)
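
A rough sketch of that hack (not from this thread; the file name and field names are made up,
the main thread is assumed to poll the finished flag and call cluster.shutdown(), and the
backtype.storm imports match the 0.9/0.10 line — on 1.x they become org.apache.storm):

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Map;
import java.util.UUID;

public class FileLineSpout extends BaseRichSpout {

    public static volatile boolean finished = false;   // polled by the main thread

    private transient SpoutOutputCollector collector;
    private transient BufferedReader reader;
    private boolean poisonSent = false;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        try {
            this.reader = new BufferedReader(new FileReader("sentences.txt"));  // assumed path
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void nextTuple() {
        try {
            String line = reader.readLine();
            if (line != null) {
                collector.emit(new Values(line), UUID.randomUUID().toString());
            } else if (!poisonSent) {
                collector.emit(new Values("__EOF__"), "poison");   // the poison tuple
                poisonSent = true;
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void ack(Object msgId) {
        if ("poison".equals(msgId)) {
            finished = true;   // main thread sees this and calls cluster.shutdown()
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sentence"));
    }
}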

Ravi.

On Thu, Oct 8, 2015 at 5:17 PM, steve tueno  wrote:

> Hi,
> I'm implementing WordCount with storm. Sentences are read in a file and my
> problem is what it the best way to stop LocalCluster when splout reach end
> of file...
>
> Cordialement,
> TUENO FOTSO STEVE JEFFREY
> Élève Ingénieur
> 5GI ENSP
> +237 676 57 17 28
> https://play.google.com/store/apps/details?id=com.polytech.remotecomputer
> http://github.com/stuenofotso/remoteComputer
> https://play.google.com/store/apps/details?id=com.polytech.androidsmssender
>
> https://github.com/stuenofotso/notre-jargon
>


Re: Approach to parallelism

2015-10-06 Thread Ravi Sharma
Nick,
Look into your queue sizing, both network-bound and in-memory.

Also, I try to use this pattern:

Say I have spout S1 and two bolts B1 and B2 doing something for it (S1 ->
B1 -> B2).

Let's say I have to run the bolts in parallel (meaning 2 instances of B1 and two
instances of B2),

and assume I have 2 workers available on two different nodes.

Then I try to run S1 also 2 times (if my spout source allows concurrency),

so that one S1 -> B1 -> B2 chain is in one JVM
and the second (S2 -> B1 -> B2) is in the other JVM; that decreases the network-bound
messages from one JVM to the other.

So if you have 6 JVMs running, I say start with 6 spouts.

This is a simple scenario; topologies can be complex, and you can tweak this
rule a little bit to fit your use case.
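
As a rough sketch of that starting point (numbers and class names are illustrative only):

Config conf = new Config();
conf.setNumWorkers(6);                           // 6 worker JVMs
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("s1", new MySpout(), 6);        // one spout executor per worker
builder.setBolt("b1", new BoltB1(), 6).localOrShuffleGrouping("s1");  // prefer same-worker transfer
builder.setBolt("b2", new BoltB2(), 6).localOrShuffleGrouping("b1");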

Thanks
Ravi.


On Mon, Oct 5, 2015 at 4:29 PM, Nick R. Katsipoulakis  wrote:

> Hello guys,
>
> This is a really interesting discussion. I am also trying to fine-tune the
> performance of my cluster and especially my end-to-end-latency which ranges
> from 200-1200 msec for a topology with 2 spouts (each one with 2k tuples
> per second input rate) and 3 bolts. My cluster consists of 3 zookeeper
> nodes (1 shared with nimbus) and 6 supervisor nodes, all of them being AWS
> m4.xlarge instances.
>
> I am pretty sure that the latency I am experiencing is ridiculous and I
> currently have no ideas what to do to improve that. I have 3 workers per
> node, which I will drop it to one worker per node after this discussion and
> see if I have better results.
>
> Thanks,
> Nick
>
> On Mon, Oct 5, 2015 at 10:40 AM, Kashyap Mhaisekar 
> wrote:
>
>> Anshu,
>> My methodology was as follows. Since the true parallelism of a machine is
>> the the no. of cores, I set the workers equal to no. of cores. (5 in my
>> case). That being said, since we have 32 GB per box, we usually leave 50%
>> off leaving us 16 GB spread across 5 machines. Hence we set the worker heap
>> at 3g.
>>
>> This was before Javiers and Michaels suggestion of keeping one JVM per
>> node...
>>
>> Ours is a single topology running on the boxes and hence I would be
>> changing it to one JVM (worker) per box and rerunning.
>>
>> Thanks
>> Kashyap
>>
>> On Mon, Oct 5, 2015 at 9:18 AM, anshu shukla 
>> wrote:
>>
>>> Sorry for reposting !! Any suggestions Please .
>>>
>>> Just one query How we can map -
>>> *1-no of workers to number of  cores *
>>> *2-no of slots on one machine to number of cores over that machine*
>>>
>>> On Mon, Oct 5, 2015 at 7:32 PM, John Yost 
>>> wrote:
>>>
 Hi Javier,

 Gotcha, I am seeing the same thing, and I see a ton of worker restarts
 as well.

 Thanks

 --John

 On Mon, Oct 5, 2015 at 9:01 AM, Javier Gonzalez 
 wrote:

> I don't have numbers, but I did see a very noticeable degradation of
> throughput and latency when using multiple workers per node with the same
> topology.
> On Oct 5, 2015 7:25 AM, "John Yost"  wrote:
>
>> Hi Everyone,
>>
>> I am curious--are there any benchmark numbers that demonstrate how
>> much better one worker per node is?  The reason I ask is that I may need 
>> to
>> double up the workers on my cluster and I was wondering how much of a
>> throughput hit I may take from having two workers per node.
>>
>> Any info would be very much appreciated--thanks! :)
>>
>> --John
>>
>>
>>
>> On Sat, Oct 3, 2015 at 9:04 AM, Javier Gonzalez 
>> wrote:
>>
>>> I would suggest sticking with a single worker per machine. It makes
>>> memory allocation easier and it makes inter-component communication much
>>> more efficient. Configure the executors with your parallelism hints to 
>>> take
>>> advantage of all your availabe CPU cores.
>>>
>>> Regards,
>>> JG
>>>
>>> On Sat, Oct 3, 2015 at 12:10 AM, Kashyap Mhaisekar <
>>> kashya...@gmail.com> wrote:
>>>
 Hi,
 I was trying to come up with an approach to evaluate the
 parallelism needed for a topology.

 Assuming I have 5 machines with 8 cores and 32 gb. And my topology
 has one spout and 5 bolts.

 1. Define one worker port per CPU to start off. (= 8 workers per
 machine ie 40 workers over all)
 2. Each worker spawns one executor per component per worker, it
 translates to 6 executors per worker which is 40x6= 240 executors.
 3. Of this, if the bolt logic is CPU intensive, then leave
 parallelism hint  at 40 (total workers), else increase parallelism hint
 beyond 40 till you hit a number beyond which there is no more visible
 performance.

 Does this look right?

 Thanks
 Kashyap

>>>
>>>
>>>
>>> --
>>> Javier González Nicolini
>>>

Re: Does Storm work with Spring

2015-09-29 Thread Ravi Sharma
Yes, this works for annotations too.

You can call this method in the prepare() method of every bolt and the open() method
of every spout, and make sure you don't use any autowired bean before this
call.
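
For example, roughly (SpringContext is the holder class from my earlier reply quoted
below; the bolt and its fields are placeholders):

@Override
public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
    // wire this bolt's @Autowired fields before any of them are used
    SpringContext.getContext()
                 .getAutowireCapableBeanFactory()
                 .autowireBeanProperties(this, AutowireCapableBeanFactory.AUTOWIRE_AUTODETECT, false);
    this.collector = collector;
}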




Ravi.




On Tue, Sep 29, 2015 at 2:22 PM, Ankur Garg <ankurga...@gmail.com> wrote:

> Hi Ravi ,
>
> Thanks for your reply . I am using annotation based configuration and using
> Spring Boot.
>
> Any idea how to do it using annotations ?
>
>
>
> On Tue, Sep 29, 2015 at 6:41 PM, Ravi Sharma <ping2r...@gmail.com> wrote:
>
> > Bolts and Spouts are created by Storm and not known to Spring Context.
> You
> > need to manually add them to SpringContext, there are few methods
> available
> > i.e.
> >
> >
> >
> SpringContext.getContext().getAutowireCapableBeanFactory().autowireBeanProperties(this,
> > AutowireCapableBeanFactory.AUTOWIRE_AUTODETECT, false);
> >
> > SpringContext is my own class where i have injected SpringContext so
> > SpringContext.getContext() returns the actuall Spring Context
> >
> >
> >
> >
> > Ravi.
> >
> >
> > On Tue, Sep 29, 2015 at 1:03 PM, Ankur Garg <ankurga...@gmail.com>
> wrote:
> >
> > > Hi ,
> > >
> > > I am building a Storm topology with set of Spouts and Bolts  and also
> > using
> > > Spring for Dependency Injection .
> > >
> > > Unfortunately , none of my fields are getting autowired even though I
> > have
> > > declared all my spouts and Bolts as @Components .
> > >
> > > However the place where I am declaring my topology , Spring is working
> > fine
> > > .
> > >
> > > Is it because cluster.submitTopology("test", conf,
> > > builder.createTopology())
> > >  submits the topology to a cluster (locally it spawns different thread
> > for
> > > Spouts and Bolts) that Autowiring is not working?
> > >
> > > Please suggest .
> > >
> >
>