Re: Implementing a Trident Spout

2014-03-02 Thread David Smith
Also, I don't see a way to fail a batch programmatically like you can with
traditional Storm. What happens if I throw a FailedException from within a
function, state query, or persist?


On Sun, Mar 2, 2014 at 9:27 AM, David Smith davidksmit...@gmail.com wrote:

 I'm trying to implement ITridentSpout but I'm having a hard time figuring
 out where acking for a batch happens. What's the difference between:

- ITridentSpout.BatchCoordinator.success
- ITridentSpout.Emitter.success

 Which one will be called when the whole batch is completed by the Trident
 topology?

 Thanks,
 David



Zookeeper on different ports

2014-03-02 Thread Arun Sethia
Hi,

We have set up three ZooKeeper instances on one virtual machine; they
are running on different ports (2181, 2182, 2183).

Eventually, in production, we will have each instance on a separate
virtual machine and can use the same port (2181).

We have seen that we can configure multiple ZooKeeper instances (a cluster)
using storm.zookeeper.servers, and that storm.zookeeper.port can be used to
define a port.

Since we have ZooKeeper on one machine on different ports
(2181, 2182, 2183), we are not able to configure the different ports using
storm.zookeeper.port.

Any help would be greatly appreciated.

Regards,
Arun
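
For reference, a rough sketch of how these two settings are usually wired up
on a topology's Config. The host names are placeholders, and the comment is
the limitation described above: storm.zookeeper.port is a single value shared
by every host listed in storm.zookeeper.servers, so per-host ports
(2181/2182/2183 on one box) cannot be expressed with these keys alone.
Constant names assume the 0.9.x backtype.storm.Config API.

import backtype.storm.Config;
import java.util.Arrays;

public class ZkConfigSketch {
    public static Config zkConfig() {
        Config conf = new Config();
        // One entry per ZooKeeper host (placeholder host names).
        conf.put(Config.STORM_ZOOKEEPER_SERVERS,
                 Arrays.asList("zk1.example.com", "zk2.example.com", "zk3.example.com"));
        // A single port, applied to every host above -- there is no per-host
        // port setting, which is why three instances on one machine with
        // different ports cannot be targeted this way.
        conf.put(Config.STORM_ZOOKEEPER_PORT, 2181);
        return conf;
    }
}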


Re: Netty Errors, chain reaction, topology breaks down

2014-03-02 Thread Sean Allen
We have the same issue and after attempting a few fixes, we switched back
to using 0mq for now.
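
For anyone else hitting this, the transport is a plain config switch; below is
a hedged sketch of what "switching back to 0mq" looks like, assuming the
0.9.0.x key and class names (the Netty transport being
backtype.storm.messaging.netty.Context, and the older 0mq transport requiring
native ZeroMQ/JZMQ installed on the workers):

import backtype.storm.Config;

public class TransportSketch {
    public static Config withZmqTransport() {
        Config conf = new Config();
        // 0.9.x selects the worker-to-worker transport via this key; the Netty
        // implementation is "backtype.storm.messaging.netty.Context".
        conf.put("storm.messaging.transport", "backtype.storm.messaging.zmq");
        return conf;
    }
}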


On Sun, Mar 2, 2014 at 2:46 PM, Drew Goya d...@gradientx.com wrote:

 Hey All, I'm running a 0.9.0.1 storm topology in AWS EC2 and I
 occasionally run into a strange and pretty catastrophic error.  One of my
 workers is either overloaded or stuck and gets killed and restarted.  This
 usually works fine but once in a while the whole topology breaks down, all
 the workers are killed and restarted continually.  Looking through the logs
 it looks like some netty errors on initialization kill the Async Loop.  The
 topology is never able to recover; I have to kill it manually and relaunch
 it.

 Is this something anyone else has come across?  Any tips? Config settings
 I could change?

 This is a pastebin of the errors:  http://pastebin.com/XXZBsEj1




-- 

Ce n'est pas une signature


Re: Tuning and nimbus at 99%

2014-03-02 Thread Sean Solbak
This is the first step of 4. When I save to the db I'm actually saving to a
queue (just using the db for now).  In the 2nd step we index the data, and in
the 3rd we do aggregation/counts for reporting.  The last is a search that I'm
planning on using DRPC for.  Within step 2 we pipe certain datasets in real
time to the clients they apply to.  I'd like this and the DRPC to be sub-2s,
which should be reasonable.

You're right that I could speed up step 1 by not using Trident, but our
requirements seem like a good use case for the other 3 steps.  With many
results per second, batching shouldn't affect performance a ton if the batch
size is small enough.

What would cause nimbus to be at 100% CPU with the topologies killed? 

Sent from my iPhone

 On Mar 2, 2014, at 5:46 PM, Sean Allen s...@monkeysnatchbanana.com wrote:
 
 Is there a reason you are using trident? 
 
 If you don't need to handle the events as a batch, you are probably going to 
 get better performance w/o it.
 
 
 On Sun, Mar 2, 2014 at 2:23 PM, Sean Solbak s...@solbak.ca wrote:
 I'm writing a fairly basic Trident topology as follows:
 
 - 4 spouts of events
 - merges into one stream
 - serializes the object as an event in a string
 - saves to db
 
 I split the serialization task away from the spout, as it was CPU intensive, 
 to speed it up.
 
 The problem I have is that after 10 minutes there are over 910k tuples 
 emitted/transferred but only 193k records are saved.
 
 The overall load of the topology seems fine.
  
 - 536.404 ms complete latency at the topology level
 - The highest capacity of any bolt is 0.3 which is the serialization one.
 - each bolt task has sub 20 ms execute latency and sub 40 ms process latency.
 
 So it seems trident has all the records internally, but I need these events 
 as close to realtime as possible.
 
 Does anyone have any guidance as to how to increase the throughput?  Is it 
 simply a matter of tweaking max spout pending and the batch size?
 
 I'm running it on 2 m1-smalls for now.  I don't see the need to upgrade it 
 until the demand on the boxes seems higher, although CPU usage on the 
 nimbus box is pinned.  It's at like 99%.  Why would that be?  It's at 99% even 
 when all the topologies are killed.
 
 We are currently targeting processing 200 million records per day, which 
 seems like it should be quite easy based on what I've read that other people 
 have achieved.  I realize that hardware should be able to boost this as well, 
 but my first goal is to get Trident to push the records to the db quicker.
 
 Thanks in advance,
 Sean
 
 
 
 -- 
 
 Ce n'est pas une signature
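
On the max-spout-pending / batch-size question quoted above, both knobs live
on the topology Config; a rough sketch follows. The numbers are illustrative
only, and the batch-interval key name is an assumption based on the 0.9.x
Trident defaults rather than a recommendation:

import backtype.storm.Config;

public class TridentTuningSketch {
    public static Config tuningConf() {
        Config conf = new Config();
        // In Trident this limits in-flight *batches*, not individual tuples.
        conf.setMaxSpoutPending(10);
        // How often Trident starts a new batch; smaller intervals mean smaller,
        // more frequent batches (assumed key: topology.trident.batch.emit.interval.millis).
        conf.put("topology.trident.batch.emit.interval.millis", 500);
        conf.setNumWorkers(2);
        return conf;
    }
}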


Re: Zookeeper on different ports

2014-03-02 Thread Michael Rose
I'd recommend just using one Zookeeper instance if they're on the same
physical host. There's no reason why a development ZK ensemble needs 3
nodes.

Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact http://www.fullcontact.com/
mich...@fullcontact.com


On Sun, Mar 2, 2014 at 10:15 AM, Arun Sethia sethia.a...@gmail.com wrote:

 Hi,

 We have set up three ZooKeeper instances on one virtual machine; they
 are running on different ports (2181, 2182, 2183).

 Eventually, in production, we will have each instance on a separate
 virtual machine and can use the same port (2181).

 We have seen that we can configure multiple ZooKeeper instances (a cluster)
 using storm.zookeeper.servers, and that storm.zookeeper.port can be used to
 define a port.

 Since we have ZooKeeper on one machine on different ports
 (2181, 2182, 2183), we are not able to configure the different ports using
 storm.zookeeper.port.

 Any help would be greatly appreciated.

 Regards,
 Arun



Re: Tuning and nimbus at 99%

2014-03-02 Thread Sean Solbak
No, they are on separate machines.  It's a 4-machine cluster - 2 workers, 1
nimbus and 1 zookeeper.

I suppose I could just create a new cluster, but I'd like to know why this is
occurring to avoid future production outages.

Thanks,
S



On Sun, Mar 2, 2014 at 6:19 PM, Michael Rose mich...@fullcontact.com wrote:

 Are you running Zookeeper on the same machine as the Nimbus box?

 Michael Rose (@Xorlev https://twitter.com/xorlev)
 Senior Platform Engineer, FullContact http://www.fullcontact.com/
 mich...@fullcontact.com


 On Sun, Mar 2, 2014 at 6:16 PM, Sean Solbak s...@solbak.ca wrote:

 This is the first step of 4. When I save to the db I'm actually saving to a
 queue (just using the db for now).  In the 2nd step we index the data, and in
 the 3rd we do aggregation/counts for reporting.  The last is a search that I'm
 planning on using DRPC for.  Within step 2 we pipe certain datasets in real
 time to the clients they apply to.  I'd like this and the DRPC to be sub-2s,
 which should be reasonable.

 You're right that I could speed up step 1 by not using Trident, but our
 requirements seem like a good use case for the other 3 steps.  With many
 results per second, batching shouldn't affect performance a ton if the batch
 size is small enough.

 What would cause nimbus to be at 100% CPU with the topologies killed?

 Sent from my iPhone

 On Mar 2, 2014, at 5:46 PM, Sean Allen s...@monkeysnatchbanana.com
 wrote:

 Is there a reason you are using trident?

 If you don't need to handle the events as a batch, you are probably going
 to get better performance w/o it.


 On Sun, Mar 2, 2014 at 2:23 PM, Sean Solbak s...@solbak.ca wrote:

 I'm writing a fairly basic Trident topology as follows:

 - 4 spouts of events
 - merges into one stream
 - serializes the object as an event in a string
 - saves to db

 I split the serialization task away from the spout as it was cpu
 intensive to speed it up.

 The problem I have is that after 10 minutes there are over 910k tuples
 emitted/transferred but only 193k records are saved.

 The overall load of the topology seems fine.

 - 536.404 ms complete latency at the topology level
 - The highest capacity of any bolt is 0.3 which is the serialization one.
 - each bolt task has sub 20 ms execute latency and sub 40 ms process
 latency.

 So it seems trident has all the records internally, but I need these
 events as close to realtime as possible.

 Does anyone have any guidance as to how to increase the throughput?  Is
  it simply a matter of tweaking max spout pending and the batch size?

 I'm running it on 2 m1-smalls for now.  I don't see the need to upgrade it
 until the demand on the boxes seems higher, although CPU usage on the
 nimbus box is pinned.  It's at like 99%.  Why would that be?  It's at 99%
 even when all the topologies are killed.

 We are currently targeting processing 200 million records per day, which
 seems like it should be quite easy based on what I've read that other people
 have achieved.  I realize that hardware should be able to boost this as
 well, but my first goal is to get Trident to push the records to the db
 quicker.

 Thanks in advance,
 Sean




 --

 Ce n'est pas une signature





-- 
Thanks,

Sean Solbak, BsC, MCSD
Solbak Technologies Inc.
780.893.7326 (m)


Re: Tuning and nimbus at 99%

2014-03-02 Thread Sean Solbak
  uintx ErgoHeapSizeLimit              = 0             {product}
  uintx InitialHeapSize               := 27080896      {product}
  uintx LargePageHeapSizeThreshold     = 134217728     {product}
  uintx MaxHeapSize                   := 698351616     {product}


so the initial heap size is ~25 MB and the max is ~666 MB

It's a client process (not server, i.e. the command is java -client
-Dstorm.options...).  The process gets killed and restarted continuously
with a new PID (which makes it tough to get stats on by PID).  I don't
have VisualVM, but if I run

jstat -gc PID, I get

 S0C    S1C    S0U   S1U    EC      EU      OC       OU      PC       PU       YGC  YGCT   FGC  FGCT   GCT
 832.0  832.0  0.0   352.9  7168.0  1115.9  17664.0  1796.0  21248.0  16029.6  5    0.268  0    0.000  0.268

At this point I'll likely just rebuild the cluster.  It's not in prod yet, as
I still need to tune it.  I should have written 2 separate emails :)

Thanks,
S




On Sun, Mar 2, 2014 at 7:10 PM, Michael Rose mich...@fullcontact.com wrote:

 I'm not seeing too much to substantiate that. What size heap are you
 running, and is it near filled? Perhaps attach VisualVM and check for GC
 activity.

  Michael Rose (@Xorlev https://twitter.com/xorlev)
 Senior Platform Engineer, FullContact http://www.fullcontact.com/
 mich...@fullcontact.com


 On Sun, Mar 2, 2014 at 6:54 PM, Sean Solbak s...@solbak.ca wrote:

 Here it is.  Appears to be some kind of race condition.

 http://pastebin.com/dANT8SQR


 On Sun, Mar 2, 2014 at 6:42 PM, Michael Rose mich...@fullcontact.com wrote:

 Can you do a thread dump and pastebin it? It's a nice first step to
 figure this out.

 I just checked on our Nimbus and while it's on a larger machine, it's
 using 1% CPU. Also look in your logs for any clues.


 Michael Rose (@Xorlev https://twitter.com/xorlev)
 Senior Platform Engineer, FullContact http://www.fullcontact.com/
 mich...@fullcontact.com


 On Sun, Mar 2, 2014 at 6:31 PM, Sean Solbak s...@solbak.ca wrote:

 No, they are on separate machines.  It's a 4-machine cluster - 2
 workers, 1 nimbus and 1 zookeeper.

 I suppose I could just create a new cluster, but I'd like to know why
 this is occurring to avoid future production outages.

 Thanks,
 S



 On Sun, Mar 2, 2014 at 6:19 PM, Michael Rose 
 mich...@fullcontact.com wrote:

 Are you running Zookeeper on the same machine as the Nimbus box?

  Michael Rose (@Xorlev https://twitter.com/xorlev)
 Senior Platform Engineer, FullContact http://www.fullcontact.com/
 mich...@fullcontact.com


 On Sun, Mar 2, 2014 at 6:16 PM, Sean Solbak s...@solbak.ca wrote:

 This is the first step of 4. When I save to the db I'm actually saving to
 a queue (just using the db for now).  In the 2nd step we index the data and
 in the 3rd we do aggregation/counts for reporting.  The last is a search
 that I'm planning on using DRPC for.  Within step 2 we pipe certain datasets
 in real time to the clients they apply to.  I'd like this and the DRPC to be
 sub-2s, which should be reasonable.

 You're right that I could speed up step 1 by not using Trident, but our
 requirements seem like a good use case for the other 3 steps.  With many
 results per second, batching shouldn't affect performance a ton if the batch
 size is small enough.

 What would cause nimbus to be at 100% CPU with the topologies killed?

 Sent from my iPhone

 On Mar 2, 2014, at 5:46 PM, Sean Allen s...@monkeysnatchbanana.com
 wrote:

 Is there a reason you are using trident?

 If you don't need to handle the events as a batch, you are probably
 going to get better performance w/o it.


 On Sun, Mar 2, 2014 at 2:23 PM, Sean Solbak s...@solbak.ca wrote:

 I'm writing a fairly basic Trident topology as follows:

 - 4 spouts of events
 - merges into one stream
 - serializes the object as an event in a string
 - saves to db

 I split the serialization task away from the spout as it was cpu
 intensive to speed it up.

 The problem I have is that after 10 minutes there are over 910k
 tuples emitted/transferred but only 193k records are saved.

 The overall load of the topology seems fine.

 - 536.404 ms complete latency at the topology level
 - The highest capacity of any bolt is 0.3 which is the serialization
 one.
 - each bolt task has sub 20 ms execute latency and sub 40 ms process
 latency.

 So it seems trident has all the records internally, but I need these
 events as close to realtime as possible.

 Does anyone have any guidance as to how to increase the throughput?
  Is it simply a matter of tweaking max spout pending and the batch size?

 I'm running it on 2 m1-smalls for now.  I don't see the need to
 upgrade it until the demand on the boxes seems higher, although CPU usage
 on the nimbus box is pinned.  It's at like 99%.  Why would that be?  It's
 at 99% even when all the topologies are killed.

 We are currently targeting processing 200 million records per day,
 which seems like it should be quite easy based on what I've read that
 other people have achieved.

Re: Tuning and nimbus at 99%

2014-03-02 Thread Michael Rose
The fact that the process is being killed constantly is a red flag. Also,
why are you running it as a client VM?

Check your nimbus.log to see why it's restarting.

Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact http://www.fullcontact.com/
mich...@fullcontact.com


On Sun, Mar 2, 2014 at 7:50 PM, Sean Solbak s...@solbak.ca wrote:

   uintx ErgoHeapSizeLimit              = 0             {product}
   uintx InitialHeapSize               := 27080896      {product}
   uintx LargePageHeapSizeThreshold     = 134217728     {product}
   uintx MaxHeapSize                   := 698351616     {product}


 so the initial heap size is ~25 MB and the max is ~666 MB

 It's a client process (not server, i.e. the command is java -client
 -Dstorm.options...).  The process gets killed and restarted continuously
 with a new PID (which makes it tough to get stats on by PID).  I don't
 have VisualVM, but if I run

 jstat -gc PID, I get

  S0C    S1C    S0U   S1U    EC      EU      OC       OU      PC       PU       YGC  YGCT   FGC  FGCT   GCT
  832.0  832.0  0.0   352.9  7168.0  1115.9  17664.0  1796.0  21248.0  16029.6  5    0.268  0    0.000  0.268

 At this point I'll likely just rebuild the cluster.  It's not in prod yet,
 as I still need to tune it.  I should have written 2 separate emails :)

 Thanks,
 S




 On Sun, Mar 2, 2014 at 7:10 PM, Michael Rose mich...@fullcontact.com wrote:

 I'm not seeing too much to substantiate that. What size heap are you
 running, and is it near filled? Perhaps attach VisualVM and check for GC
 activity.

  Michael Rose (@Xorlev https://twitter.com/xorlev)
 Senior Platform Engineer, FullContact http://www.fullcontact.com/
 mich...@fullcontact.com


 On Sun, Mar 2, 2014 at 6:54 PM, Sean Solbak s...@solbak.ca wrote:

 Here it is.  Appears to be some kind of race condition.

 http://pastebin.com/dANT8SQR


 On Sun, Mar 2, 2014 at 6:42 PM, Michael Rose mich...@fullcontact.com wrote:

 Can you do a thread dump and pastebin it? It's a nice first step to
 figure this out.

 I just checked on our Nimbus and while it's on a larger machine, it's
 using 1% CPU. Also look in your logs for any clues.


 Michael Rose (@Xorlev https://twitter.com/xorlev)
 Senior Platform Engineer, FullContact http://www.fullcontact.com/
 mich...@fullcontact.com


 On Sun, Mar 2, 2014 at 6:31 PM, Sean Solbak s...@solbak.ca wrote:

 No, they are on separate machines.  It's a 4-machine cluster - 2
 workers, 1 nimbus and 1 zookeeper.

 I suppose I could just create a new cluster, but I'd like to know why
 this is occurring to avoid future production outages.

 Thanks,
 S



 On Sun, Mar 2, 2014 at 6:19 PM, Michael Rose 
 mich...@fullcontact.com wrote:

 Are you running Zookeeper on the same machine as the Nimbus box?

  Michael Rose (@Xorlev https://twitter.com/xorlev)
 Senior Platform Engineer, FullContact http://www.fullcontact.com/
 mich...@fullcontact.com


 On Sun, Mar 2, 2014 at 6:16 PM, Sean Solbak s...@solbak.ca wrote:

 This is the first step of 4. When I save to the db I'm actually saving
 to a queue (just using the db for now).  In the 2nd step we index the data
 and in the 3rd we do aggregation/counts for reporting.  The last is a search
 that I'm planning on using DRPC for.  Within step 2 we pipe certain datasets
 in real time to the clients they apply to.  I'd like this and the DRPC to be
 sub-2s, which should be reasonable.

 You're right that I could speed up step 1 by not using Trident, but our
 requirements seem like a good use case for the other 3 steps.  With many
 results per second, batching shouldn't affect performance a ton if the batch
 size is small enough.

 What would cause nimbus to be at 100% CPU with the topologies
 killed?

 Sent from my iPhone

 On Mar 2, 2014, at 5:46 PM, Sean Allen s...@monkeysnatchbanana.com
 wrote:

 Is there a reason you are using trident?

 If you don't need to handle the events as a batch, you are probably
 going to get better performance w/o it.


 On Sun, Mar 2, 2014 at 2:23 PM, Sean Solbak s...@solbak.ca wrote:

 I'm writing a fairly basic Trident topology as follows:

 - 4 spouts of events
 - merges into one stream
 - serializes the object as an event in a string
 - saves to db

 I split the serialization task away from the spout as it was cpu
 intensive to speed it up.

 The problem I have is that after 10 minutes there are over 910k
 tuples emitted/transferred but only 193k records are saved.

 The overall load of the topology seems fine.

 - 536.404 ms complete latency at the topology level
 - The highest capacity of any bolt is 0.3 which is the
 serialization one.
 - each bolt task has sub 20 ms execute latency and sub 40 ms
 process latency.

 So it seems trident has all the records internally, but I need
 these events as close to realtime as possible.

 Does anyone have any guidance as to how to increase the throughput?
  Is it simply a matter of tweaking max spout pending and the batch size?

 I'm running it on 2 m1-smalls for now.

Re: Snapshottable and SnapshotGet

2014-03-02 Thread Nathan Marz
Snapshottable is used for storing a single value, like a global count.
SnapshotGet retrieves that value into your Stream.

The globalKey is fixed
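
A minimal sketch of that use case: a non-grouped persistentAggregate keeps one
global count in a Snapshottable-backed state, and a DRPC stream reads it back
with SnapshotGet. Stream and function names are placeholders, and the API
shapes assume 0.9.x Trident (storm.trident.* packages):

import backtype.storm.LocalDRPC;
import backtype.storm.topology.IRichSpout;
import backtype.storm.tuple.Fields;
import storm.trident.TridentState;
import storm.trident.TridentTopology;
import storm.trident.operation.builtin.Count;
import storm.trident.operation.builtin.SnapshotGet;
import storm.trident.testing.MemoryMapState;

public class GlobalCountSketch {
    public static TridentTopology build(IRichSpout spout, LocalDRPC drpc) {
        TridentTopology topology = new TridentTopology();
        // No groupBy, so the whole stream collapses to a single value and the
        // backing state is used as a Snapshottable (one global snapshot).
        TridentState globalCount = topology.newStream("events", spout)
                .persistentAggregate(new MemoryMapState.Factory(), new Count(),
                                     new Fields("count"));
        // SnapshotGet pulls that single snapshot value into the DRPC stream.
        topology.newDRPCStream("get-count", drpc)
                .stateQuery(globalCount, new SnapshotGet(), new Fields("count"));
        return topology;
    }
}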


On Sun, Mar 2, 2014 at 8:02 PM, Jahagirdar, Madhu 
madhu.jahagir...@philips.com wrote:

   All,

  1) Could anyone explain the use case where Snapshottable and SnapshotGet
 would be used?
 2) Also, while using Snapshottable a globalKey = $GLOBAL$ is used; is it
 fixed, or does it get replaced by something at run time?

  Thanks and Regards,
 Madhu Jahagirdar

 --
 The information contained in this message may be confidential and legally
 protected under applicable law. The message is intended solely for the
 addressee(s). If you are not the intended recipient, you are hereby
 notified that any use, forwarding, dissemination, or reproduction of this
 message is strictly prohibited and may be unlawful. If you are not the
 intended recipient, please contact the sender by return e-mail and destroy
 all copies of the original message.




-- 
Twitter: @nathanmarz
http://nathanmarz.com


Re: Implementing a Trident Spout

2014-03-02 Thread Nathan Marz
Throwing a FailedException is how you programmatically fail a batch.
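
A minimal sketch of doing that from inside a Trident function (class and
package names assume 0.9.x; the same pattern applies inside state queries and
state updaters):

import backtype.storm.topology.FailedException;
import backtype.storm.tuple.Values;
import storm.trident.operation.BaseFunction;
import storm.trident.operation.TridentCollector;
import storm.trident.tuple.TridentTuple;

public class ValidateOrFail extends BaseFunction {
    @Override
    public void execute(TridentTuple tuple, TridentCollector collector) {
        Object payload = tuple.getValue(0);
        if (payload == null) {
            // Throwing FailedException tells Trident to fail the current batch
            // so the spout can replay it, instead of treating this as an
            // unexpected error.
            throw new FailedException("null payload, failing the batch");
        }
        collector.emit(new Values(payload));
    }
}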


On Sun, Mar 2, 2014 at 8:45 AM, David Smith davidksmit...@gmail.com wrote:

 Also, I don't see a way to fail a batch programmatically like you can with
 traditional Storm. What happens if I throw a FailedException from within a
 function, state query, or persist?


 On Sun, Mar 2, 2014 at 9:27 AM, David Smith davidksmit...@gmail.com wrote:

 I'm trying to implement ITridentSpout but I'm having a hard time figuring
 out where acking for a batch happens. What's the difference between:

- ITridentSpout.BatchCoordinator.success
- ITridentSpout.Emitter.success

 Which one will be called when the whole batch is completed by the Trident
 topology?

 Thanks,
 David





-- 
Twitter: @nathanmarz
http://nathanmarz.com