Re: Implementing a Trident Spout
Also, I don't see a way to fail a batch programmatically like you can with traditional Storm. What happens if I throw a FailedException from within a function, state query, or persist?

On Sun, Mar 2, 2014 at 9:27 AM, David Smith davidksmit...@gmail.com wrote:

I'm trying to implement ITridentSpout, but I'm having a hard time figuring out where acking for a batch happens. What's the difference between:
- ITridentSpout.BatchCoordinator.success
- ITridentSpout.Emitter.success

Which one is called when the whole batch is completed by the Trident topology?

Thanks, David
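For readers following along, the two callbacks in question live on the two inner interfaces of storm.trident.spout.ITridentSpout. The sketch below is abridged to just those methods, and the semantics in the comments are my reading of the 0.9-era API, so verify them against your Storm version:

    import storm.trident.topology.TransactionAttempt;

    // Abridged from storm.trident.spout.ITridentSpout (0.9-era API);
    // everything except the success callbacks is omitted.
    interface BatchCoordinator<X> {
        // Coordinator-side: called once transaction `txid` is known to
        // have completed successfully across the whole topology.
        void success(long txid);
    }

    interface Emitter<X> {
        // Task-side: called on each emitter task when its attempt of
        // the batch has been processed successfully.
        void success(TransactionAttempt tx);
    }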
Zookeeper on different ports
Hi, We have set up three Zookeeper instances on one virtual machine, running on different ports (2181, 2182, 2183). Eventually in production each instance will be on a separate virtual machine, and they can all use the same port (2181). We have seen that we can configure multiple Zookeeper instances (a cluster) using storm.zookeeper.servers, and we can use storm.zookeeper.port to define a port. But since our Zookeeper instances are on one machine on different ports (2181, 2182, 2183), we are not able to configure the different ports using storm.zookeeper.port. Any help would be great. Regards, Arun
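For context, storm.zookeeper.port is a single value applied to every host in storm.zookeeper.servers, which is why per-instance ports can't be expressed. A typical storm.yaml looks like this (hostnames are placeholders):

    # storm.yaml: every host listed below is assumed to answer on the
    # single port given; per-server ports are not supported here.
    storm.zookeeper.servers:
      - "zk1.example.com"
      - "zk2.example.com"
      - "zk3.example.com"
    storm.zookeeper.port: 2181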
Re: Netty Errors, chain reaction, topology breaks down
We have the same issue, and after attempting a few fixes we switched back to using 0mq for now.

On Sun, Mar 2, 2014 at 2:46 PM, Drew Goya d...@gradientx.com wrote:

Hey All, I'm running a 0.9.0.1 Storm topology in AWS EC2 and I occasionally run into a strange and pretty catastrophic error. One of my workers is either overloaded or stuck and gets killed and restarted. This usually works fine, but once in a while the whole topology breaks down: all the workers are killed and restarted continually.

Looking through the logs, it looks like some Netty errors on initialization kill the Async Loop. The topology is never able to recover; I have to kill it manually and relaunch it.

Is this something anyone else has come across? Any tips? Config settings I could change? This is a pastebin of the errors: http://pastebin.com/XXZBsEj1

-- Ce n'est pas une signature
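For anyone weighing the same trade-off, the transport is selected in storm.yaml. The Netty values below are the ones documented with the 0.9.0 release, so treat them as a starting point rather than a fix:

    # Revert to the ZeroMQ transport (the pre-0.9 default):
    storm.messaging.transport: "backtype.storm.messaging.zmq"

    # Or stay on Netty and tune it (0.9.0 documented settings):
    # storm.messaging.transport: "backtype.storm.messaging.netty.Context"
    # storm.messaging.netty.server_worker_threads: 1
    # storm.messaging.netty.client_worker_threads: 1
    # storm.messaging.netty.buffer_size: 5242880
    # storm.messaging.netty.max_retries: 100
    # storm.messaging.netty.max_wait_ms: 1000
    # storm.messaging.netty.min_wait_ms: 100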
Re: Tuning and nimbus at 99%
This is the first step of 4. When I save to the db I'm actually saving to a queue (just using the db for now). In the 2nd step we index the data, and in the 3rd we do aggregation/counts for reporting. The last is a search that I'm planning on using DRPC for. Within step 2 we pipe certain datasets in real time to the clients they apply to. I'd like this and the DRPC to be sub-2s, which should be reasonable.

You're right that I could speed up step 1 by not using Trident, but our requirements seem like a good use case for the other 3 steps. With many results per second, batching shouldn't affect performance much if the batch size is small enough.

What would cause nimbus to be at 100% CPU with the topologies killed?

Sent from my iPhone

On Mar 2, 2014, at 5:46 PM, Sean Allen s...@monkeysnatchbanana.com wrote:

Is there a reason you are using Trident? If you don't need to handle the events as a batch, you are probably going to get better performance without it.

On Sun, Mar 2, 2014 at 2:23 PM, Sean Solbak s...@solbak.ca wrote:

I'm writing a fairly basic Trident topology as follows:
- 4 spouts of events
- merges into one stream
- serializes the object as an event in a string
- saves to db

I split the serialization task away from the spout to speed it up, as it was CPU intensive. The problem I have is that after 10 minutes there are over 910k tuples emitted/transferred but only 193k records saved.

The overall load of the topology seems fine:
- 536.404 ms complete latency at the topology level
- the highest capacity of any bolt is 0.3, which is the serialization one
- each bolt task has sub-20 ms execute latency and sub-40 ms process latency

So it seems Trident has all the records internally, but I need these events as close to real time as possible. Does anyone have any guidance as to how to increase the throughput? Is it simply a matter of tweaking max spout pending and the batch size?

I'm running it on 2 m1.smalls for now. I don't see the need to upgrade until the demand on the boxes is higher. Although CPU usage on the nimbus box is pinned: it's at like 99%. Why would that be? It's at 99% even when all the topologies are killed.

We are currently targeting processing 200 million records per day, which seems like it should be quite easy based on what I've read other people have achieved. I realize that hardware should be able to boost this as well, but my first goal is to get Trident to push the records to the db quicker.

Thanks in advance, Sean

-- Ce n'est pas une signature
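On the max-spout-pending and batch-size question, a minimal sketch of where those knobs are set, assuming the 0.9-era config key strings (values are illustrative, not recommendations):

    import backtype.storm.Config;

    public class TridentTuning {
        public static Config tunedConf() {
            Config conf = new Config();
            // With Trident, max spout pending is the number of *batches*
            // allowed in flight at once, not individual tuples.
            conf.setMaxSpoutPending(2);
            // Cap on tuples per batch for an IRichSpout wrapped by Trident
            // (key string from the 0.9-era defaults.yaml).
            conf.put("topology.spout.max.batch.size", 1000);
            // How often Trident emits new batches; lower = closer to real time.
            conf.put("topology.trident.batch.emit.interval.millis", 500);
            return conf;
        }
    }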
Re: Zookeeper on different ports
I'd recommend just using one Zookeeper instance if they're on the same physical host. There's no reason why a development ZK ensemble needs 3 nodes.

Michael Rose (@Xorlev https://twitter.com/xorlev) Senior Platform Engineer, FullContact http://www.fullcontact.com/ mich...@fullcontact.com

On Sun, Mar 2, 2014 at 10:15 AM, Arun Sethia sethia.a...@gmail.com wrote: ...
Re: Tuning and nimbus at 99%
No, they are on separate machines. It's a 4-machine cluster: 2 workers, 1 nimbus, and 1 Zookeeper. I suppose I could just create a new cluster, but I'd like to know why this is occurring to avoid future production outages.

Thanks, S

On Sun, Mar 2, 2014 at 6:19 PM, Michael Rose mich...@fullcontact.com wrote:

Are you running Zookeeper on the same machine as the Nimbus box?

Michael Rose (@Xorlev https://twitter.com/xorlev) Senior Platform Engineer, FullContact http://www.fullcontact.com/ mich...@fullcontact.com

On Sun, Mar 2, 2014 at 6:16 PM, Sean Solbak s...@solbak.ca wrote: ...

-- Thanks, Sean Solbak, BsC, MCSD Solbak Technologies Inc. 780.893.7326 (m)
Re: Tuning and nimbus at 99%
uintx ErgoHeapSizeLimit          = 0          {product}
uintx InitialHeapSize           := 27080896   {product}
uintx LargePageHeapSizeThreshold = 134217728  {product}
uintx MaxHeapSize               := 698351616  {product}

So an initial heap of ~25 MB and a max of ~666 MB. It's a client process (not server, i.e. the command is java -client -Dstorm.options...). The process gets killed and restarted continuously with a new PID (which makes getting a PID to run stats on tough). I don't have VisualVM, but if I run jstat -gc PID, I get:

 S0C    S1C    S0U   S1U    EC      EU      OC       OU      PC       PU      YGC  YGCT   FGC  FGCT   GCT
 832.0  832.0  0.0   352.9  7168.0  1115.9  17664.0  1796.0  21248.0  16029.6 5    0.268  0    0.000  0.268

At this point I'll likely just rebuild the cluster. It's not in prod yet, as I still need to tune it. I should have written 2 separate emails :)

Thanks, S

On Sun, Mar 2, 2014 at 7:10 PM, Michael Rose mich...@fullcontact.com wrote:

I'm not seeing too much to substantiate that. What size heap are you running, and is it near filled? Perhaps attach VisualVM and check for GC activity.

Michael Rose (@Xorlev https://twitter.com/xorlev) Senior Platform Engineer, FullContact http://www.fullcontact.com/ mich...@fullcontact.com

On Sun, Mar 2, 2014 at 6:54 PM, Sean Solbak s...@solbak.ca wrote:

Here it is. Appears to be some kind of race condition. http://pastebin.com/dANT8SQR

On Sun, Mar 2, 2014 at 6:42 PM, Michael Rose mich...@fullcontact.com wrote:

Can you do a thread dump and pastebin it? It's a nice first step to figuring this out. I just checked on our Nimbus, and while it's on a larger machine, it's using 1% CPU. Also look in your logs for any clues.

Michael Rose (@Xorlev https://twitter.com/xorlev) Senior Platform Engineer, FullContact http://www.fullcontact.com/ mich...@fullcontact.com

On Sun, Mar 2, 2014 at 6:31 PM, Sean Solbak s...@solbak.ca wrote: ...
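One way around the changing PID is to resolve it at invocation time; this assumes pgrep is available and that the Nimbus daemon's command line contains its main class, backtype.storm.daemon.nimbus:

    # Re-resolve the (changing) nimbus PID on each run and
    # sample GC utilization every 5 seconds:
    jstat -gcutil $(pgrep -f backtype.storm.daemon.nimbus) 5000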
Re: Tuning and nimbus at 99%
The fact that the process is being killed constantly is a red flag. Also, why are you running it as a client VM? Check your nimbus.log to see why it's restarting.

Michael Rose (@Xorlev https://twitter.com/xorlev) Senior Platform Engineer, FullContact http://www.fullcontact.com/ mich...@fullcontact.com

On Sun, Mar 2, 2014 at 7:50 PM, Sean Solbak s...@solbak.ca wrote: ...
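For reference, the Nimbus daemon's JVM flags come from nimbus.childopts in storm.yaml, so switching to the server VM would look roughly like this (heap sizes here are illustrative):

    # storm.yaml on the nimbus host; restart nimbus after changing.
    nimbus.childopts: "-server -Xms512m -Xmx1024m"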
Re: Snapshottable and Snapshotget
Snapshottable is used for storing a single value, like a global count. SnapshotGet retrieves that value into your Stream. The globalKey is fixed.

On Sun, Mar 2, 2014 at 8:02 PM, Jahagirdar, Madhu madhu.jahagir...@philips.com wrote:

All,
1) Could anyone explain the use case where Snapshottable and SnapshotGet would be used?
2) Also, while using Snapshottable a globalKey = $GLOBAL$ is used; is it fixed, or does the $ get replaced by something at run time?

Thanks and Regards, Madhu Jahagirdar

-- Twitter: @nathanmarz http://nathanmarz.com
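A sketch of the usual pattern, pieced together from the standard Trident global-count examples; MyEventSpout and the stream/field names are placeholders:

    import backtype.storm.LocalDRPC;
    import backtype.storm.tuple.Fields;
    import storm.trident.TridentState;
    import storm.trident.TridentTopology;
    import storm.trident.operation.builtin.Count;
    import storm.trident.operation.builtin.SnapshotGet;
    import storm.trident.testing.MemoryMapState;

    public class GlobalCountSketch {
        public static TridentTopology build(LocalDRPC drpc) {
            TridentTopology topology = new TridentTopology();
            // A non-grouped persistentAggregate stores one global value
            // in a Snapshottable state, under the fixed global key.
            TridentState count = topology
                .newStream("events", new MyEventSpout()) // placeholder spout
                .persistentAggregate(new MemoryMapState.Factory(),
                                     new Count(), new Fields("count"));
            // SnapshotGet reads that single value back into a DRPC stream.
            topology.newDRPCStream("get-count", drpc)
                    .stateQuery(count, new SnapshotGet(), new Fields("count"));
            return topology;
        }
    }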
Re: Implementing a Trident Spout
Throwing a FailedException is how you programmatically fail a batch.

On Sun, Mar 2, 2014 at 8:45 AM, David Smith davidksmit...@gmail.com wrote: ...

-- Twitter: @nathanmarz http://nathanmarz.com
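To make that concrete, a minimal sketch of a Trident function that fails its batch; process() is a stand-in for whatever side effect might fail:

    import backtype.storm.topology.FailedException;
    import backtype.storm.tuple.Values;
    import storm.trident.operation.BaseFunction;
    import storm.trident.operation.TridentCollector;
    import storm.trident.tuple.TridentTuple;

    public class FailableFunction extends BaseFunction {
        @Override
        public void execute(TridentTuple tuple, TridentCollector collector) {
            try {
                collector.emit(new Values(process(tuple.getString(0))));
            } catch (Exception e) {
                // Throwing FailedException fails the whole batch,
                // which Trident will then replay.
                throw new FailedException("batch failed", e);
            }
        }

        // Placeholder for the real work.
        private String process(String input) {
            return input;
        }
    }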