Re: How long until fields grouping gets overwhelmed with data?

2016-08-11 Thread Gireesh Ramji
It does not matter who hashes it. As long as they all use the same hash 
function, it will go to the same bolt.

  From: Navin Ipe 
 To: user@storm.apache.org 
 Sent: Thursday, August 11, 2016 4:56 PM
 Subject: Re: How long until fields grouping gets overwhelmed with data?
   
If the hash is dynamically computed and is stateless, then that brings up one 
more question.

Let's say there are two spout classes S1 and S2. I create 10 tasks of S1 and 10 
tasks of S2.
There are 10 tasks of a bolt B.

S1 and S2 are fieldsGrouped with B.

I receive a value x in S1 and the same value x in S2.

If S1's emit of x goes to task1 of B, then will S2's emit of x also go to task1 
of B?

Basically the question is: Is the hash value decided by the Spout or by Storm? 
Because if it is decided by the spout, then S1's emit of x can go to task 1 but 
S2's emit of x might go to some other task of the bolt, and that won't serve 
the purpose of someone who wants all x'es to go to one bolt.




On Wed, Aug 10, 2016 at 8:58 PM, Navin Ipe wrote:

Oh that's good to know. I assume it works like this: 
https://en.wikipedia.org/wiki/Hash_function#Hashing_uniformly_distributed_data

On Wed, Aug 10, 2016 at 6:23 PM, Nathan Leung wrote:

It's based on a modulo of a hash of the field. The fields grouping is stateless.
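
A minimal sketch of the "modulo of a hash" idea described above, and of why 
no key-to-bolt map needs to be stored. This is an illustration, not Storm's 
actual source: the class name FieldsGroupingSketch and the helper 
chooseTaskIndex are hypothetical, and it assumes the target is picked as 
hash(grouping field values) mod (number of target tasks).

```java
import java.util.Arrays;
import java.util.List;

public class FieldsGroupingSketch {

    // Hypothetical helper: maps the grouping-field values of a tuple to a
    // task index in [0, numTasks). Nothing is remembered between calls.
    static int chooseTaskIndex(List<Object> groupingValues, int numTasks) {
        // floorMod keeps the result non-negative even for negative hashes.
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasksOfB = 10;

        // The same field value emitted from two different spouts (S1 and S2).
        int fromS1 = chooseTaskIndex(Arrays.<Object>asList("x"), numTasksOfB);
        int fromS2 = chooseTaskIndex(Arrays.<Object>asList("x"), numTasksOfB);

        // Both emitters compute the same index because the function depends
        // only on the value and the task count, not on who calls it.
        System.out.println("same target task: " + (fromS1 == fromS2));
    }
}
```

Because the index is recomputed from the value on every emit, memory use does 
not grow with the number of distinct field values seen.
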
On Aug 10, 2016 8:18 AM, "Navin Ipe" wrote:

Hi,

For a spout to be able to consistently send fields-grouped tuples to the same 
bolt, it would have to store a key-value map, something like this, right?

field1023 ---> Bolt1
field1343 ---> Bolt3
field1629 ---> Bolt5
field1726 ---> Bolt1
field1481 ---> Bolt3

So if my topology runs for a very long time and the spout generates many unique 
field values, won't this key-value map eventually run out of memory?

Or is there a failsafe or a map limit that Storm uses to handle this without 
crashing?

If memory problems could happen, what would be an alternative way to solve this 
problem where many unique fields could get generated over time?

-- 
Regards,
Navin




-- 
Regards,
Navin



-- 
Regards,
Navin

   

Token awareness of storm bolts

2016-03-07 Thread Gireesh Ramji
In my topology, I have a bolt that subscribes from its predecessor using a 
fields grouping. Is there an easy way for me to know, in the bolt's execute 
method, the range of keys that will get hashed to a given instance of the bolt 
under the fields grouping?
Thanks,
Gireesh

Questions on storm-cassandra-cql

2016-01-28 Thread Gireesh Ramji
1.) Am I correct in saying that if I define a CassandraCqlMapState with a 
parallelism of N, then N different instances of the Session object will be 
created? If this is correct, does this not go against what is recommended by 
DataStax: 
http://www.datastax.com/dev/blog/4-simple-rules-when-using-the-datastax-drivers-for-cassandra
where they say that a single Session object should be used across the 
application? Or does that recommendation not apply in the Storm context, where 
the state could be distributed across different physical nodes?
2.) For all queries, CassandraCqlMapState prepares a BatchStatement to submit 
to Cassandra. Is this guaranteed to perform better than executing the 
individual statements asynchronously? Ref: 
https://lostechies.com/ryansvihla/2014/08/28/cassandra-batch-loading-without-the-batch-keyword/
 
Thanks,
Gireesh