Re: Performance problems with opensocial rpc calls
Matt, I think further investigation is warranted. I really think you need to find a way to trace through the code and find where the slowdown is occurring. That will help us narrow down what the problem is. I know it is production, but getting some code on there that starts timing method calls and such can be very useful.

On Tue, Jul 15, 2014 at 3:04 PM, Merrill, Matt mmerr...@mitre.org wrote:

Hi Ryan, thanks for responding! I've attached our ehcacheConfig; comparing it to the default configuration, the only differences are the overall number of elements (1 in ours vs. 1000 in the default) and the temp disk store location.

I'm assuming you are asking whether each user in our system has the exact same set of gadgets to render, correct? If so: different users have different sets of gadgets, but many of them get a default set when they are initially set up in our system, so many people hit the same gadgets over and over again. This default subset is about 10-12 gadgets and is by and large what most users have. However, we have a total of 48 different gadgets that could be rendered by a user at any given time on this instance of shindig. We do run another instance of shindig which could render a different subset of gadgets, but that has much lower usage and only renders about 10 different gadgets altogether.

I am admittedly rusty with my ehCache configuration knowledge, but here are a couple of things I noticed:
* The maxBytesLocalHeap in the ehCacheConfig is 50mb, which seems low; however, this is the same setting we had in shindig 2.0, so I have to wonder whether it has anything to do with it.
* Our old ehCache configuration for shindig 2.0 specified a defaultCache maxElementsInMemory of 1000 but NO sizeOfPolicy at all.
* Our new ehCache configuration for shindig 2.5 specifies a sizeOfPolicy maxDepth of 1 but NO defaultCache maxElementsInMemory.

Our heap sizes in Tomcat are 2048mb, which, given a 50mb max heap for a cache, seems adequate. This is the same heap size from when we were using shindig 2.0. Unfortunately, we don't have profiling tools enabled on our Tomcat instances, so I can't see what the heap looked like when things crashed, and, like I said, we're unable to reproduce this in int.

I think we might be on to something here… I will keep searching, but if any devs out there have any ideas, please let me know. Thanks shindig list! -Matt

On 7/13/14, 10:12 AM, Ryan Baxter rbaxte...@gmail.com wrote:

Matt, can you tell us more about how you have configured the caches in shindig? When you are rendering these gadgets, are you rendering the same gadget across all users? -Ryan

On Jul 9, 2014, at 3:31 PM, Merrill, Matt mmerr...@mitre.org wrote:

Stanton, thanks for responding! This is one instance of shindig. If you mean the configuration within the container and for the shindig Java app, then yes, the locked domains are the same. In fact, the configuration, with the exception of shindig's host URLs, is exactly the same from what I can tell. Unfortunately, I don't have any way to trace that exact message, but I did do a traceroute from the server running shindig to the URL that is being called for rpc calls to make sure there weren't any extra network hops, and there weren't; it actually only had one, as expected for an app making an HTTP call to itself. Thanks again for responding. -Matt

On 7/9/14, 3:08 PM, Stanton Sievers ssiev...@apache.org wrote:

Hi Matt, is the configuration for locked domains and security tokens consistent between your test and production environments? Do you have any way of tracing the request in the log entry you provided through the network? Is this a single Shindig server, or is there any load balancing occurring? Regards, -Stanton

On Wed, Jul 9, 2014 at 2:40 PM, Merrill, Matt mmerr...@mitre.org wrote:

Hi shindig devs, we are in the process of upgrading from shindig 2.0 to 2.5-update1 and everything has gone OK; however, once we got into our production environment, we are seeing significant slowdowns in the opensocial RPC calls that shindig makes to itself when rendering a gadget. This is obviously very dependent on how we've implemented the shindig interfaces in our own code, and on our infrastructure, so we're hoping someone on the list can give us more ideas for areas to investigate, inside shindig itself or in general. Here's what's happening:
* Gadgets load fine when the app is not experiencing much load (10 users rendering 10-12 gadgets on a page)
* Once a reasonable subset of users begins rendering gadgets, gadget render calls through the "ifr" endpoint start taking a very long time to respond
* The problem gets worse from there
* Even with extensive load testing we can't recreate this problem in our testing environments
* Our system administrators have assured us that the configurations of our servers are the same between int and prod
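For reference, the difference Matt describes between the two configurations might look roughly like the following ehcache.xml fragments. This is a hypothetical sketch assuming Ehcache 2.x syntax; attribute values other than the ones Matt names (1000 elements, 50M, maxDepth 1) are illustrative, not taken from the attached config:

```xml
<!-- Shindig 2.0-era style (as described): count-based eviction,
     no size-of engine involved at all -->
<ehcache>
  <defaultCache maxElementsInMemory="1000"
                eternal="false"
                overflowToDisk="false"/>
</ehcache>

<!-- Shindig 2.5-era style (as described): byte-based cap. Ehcache must
     now *size* each entry it stores, and sizeOfPolicy limits how deep
     that object-graph walk may go -->
<ehcache maxBytesLocalHeap="50M">
  <sizeOfPolicy maxDepth="1" maxDepthExceededBehavior="continue"/>
  <defaultCache eternal="false"
                overflowToDisk="false"/>
</ehcache>
```

One difference worth noting between the two styles: with maxBytesLocalHeap, Ehcache computes the byte size of entries as they are put into the cache, and with maxDepthExceededBehavior="continue" it can also log a warning each time the depth limit is exceeded. Both add per-put overhead that a purely count-based (maxElementsInMemory) configuration does not incur, which may matter under load.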
Re: Performance problems with opensocial rpc calls
Yep, that's where I'm headed next. Obviously there's some hesitation to do that on the part of our product owners, so it takes a while to get to that point. Will let you know what I find. Thanks! -Matt

On 7/18/14, 8:44 AM, Ryan Baxter rbaxte...@gmail.com wrote:

Matt, I think further investigation is warranted. I really think you need to find a way to trace through the code and find where the slowdown is occurring. That will help us narrow down what the problem is. I know it is production, but getting some code on there that starts timing method calls and such can be very useful. [...]
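Ryan's suggestion of getting timing code into production can start as simply as a nanoTime wrapper around suspect calls. A minimal sketch; the `Timed` class and its names are illustrative, not part of Shindig:

```java
// Minimal timing harness of the kind Ryan suggests: wrap a suspect call,
// report how long it took, and return the elapsed time. Class and method
// names here are hypothetical, not Shindig APIs.
public class Timed {

    // Runs the task and returns wall-clock elapsed milliseconds,
    // printing a label so log lines can be grepped later.
    static long timeMillis(String label, Runnable task) {
        long start = System.nanoTime();
        task.run();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(label + " took " + elapsedMs + " ms");
        return elapsedMs;
    }

    public static void main(String[] args) {
        // Stand-in for a slow RPC/render call: sleep ~50 ms.
        long elapsed = timeMillis("fakeRpcCall", () -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // Sanity check, with slack for scheduler jitter.
        System.out.println("slept at least 40 ms: " + (elapsed >= 40));
    }
}
```

In a servlet container the output would go through the app's logger rather than stdout; wrapping the calls around the code that issues the RPC fetch (the log example in this thread comes from BasicHttpFetcher) would show whether the time is spent in the HTTP round-trip or elsewhere.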
Re: Performance problems with opensocial rpc calls
Hi Ryan, thanks for responding! I've attached our ehcacheConfig; comparing it to the default configuration, the only differences are the overall number of elements (1 in ours vs. 1000 in the default) and the temp disk store location.

I'm assuming you are asking whether each user in our system has the exact same set of gadgets to render, correct? If so: different users have different sets of gadgets, but many of them get a default set when they are initially set up in our system, so many people hit the same gadgets over and over again. This default subset is about 10-12 gadgets and is by and large what most users have. However, we have a total of 48 different gadgets that could be rendered by a user at any given time on this instance of shindig. We do run another instance of shindig which could render a different subset of gadgets, but that has much lower usage and only renders about 10 different gadgets altogether.

I am admittedly rusty with my ehCache configuration knowledge, but here are a couple of things I noticed:
* The maxBytesLocalHeap in the ehCacheConfig is 50mb, which seems low; however, this is the same setting we had in shindig 2.0, so I have to wonder whether it has anything to do with it.
* Our old ehCache configuration for shindig 2.0 specified a defaultCache maxElementsInMemory of 1000 but NO sizeOfPolicy at all.
* Our new ehCache configuration for shindig 2.5 specifies a sizeOfPolicy maxDepth of 1 but NO defaultCache maxElementsInMemory.

Our heap sizes in Tomcat are 2048mb, which, given a 50mb max heap for a cache, seems adequate. This is the same heap size from when we were using shindig 2.0. Unfortunately, we don't have profiling tools enabled on our Tomcat instances, so I can't see what the heap looked like when things crashed, and, like I said, we're unable to reproduce this in int.

I think we might be on to something here… I will keep searching, but if any devs out there have any ideas, please let me know. Thanks shindig list! -Matt

On 7/13/14, 10:12 AM, Ryan Baxter rbaxte...@gmail.com wrote:

Matt, can you tell us more about how you have configured the caches in shindig? When you are rendering these gadgets, are you rendering the same gadget across all users? -Ryan [...]

On Wed, Jul 9, 2014 at 2:40 PM, Merrill, Matt mmerr...@mitre.org wrote:

[...] Our system administrators have assured us that the configurations of our servers are the same between int and prod. This is an example of what we're seeing from the logs inside BasicHttpFetcher:
http://238redacteddnsprefix234.gadgetsv2.company.com:7001/gmodules/rpc?st=mycontainer%3AvY2rb-teGXuk9HX8d6W0rm6wE6hkLxM95ByaSMQlV8RudwohiAFqAliywVwc5yQ8maFSwK7IEhogNVnoUXa-doA3_h7EbSDGq_DW5i_VvC0CFEeaTKtr70A9XgYlAq5T95j