Hi Reka,

This looks similar. I can only get the autoscaler rule to execute for one, and only one, of a given set of cartridge subscriptions, which looks like the same symptom as the issue you list. The description would also account for it: if stats are sent to only one cluster, the reset flags would remain false for all the other clusters.
Is there any log pattern I can look for that would confirm this is the same issue?

Thanks
David.

From: Reka Thirunavukkarasu [mailto:r...@wso2.com]
Sent: 13 August 2014 06:16
To: dev
Cc: Akila Ravihansa Perera; Shaheed Haque
Subject: Re: autoscaler issue with multiple cartridges + subscriptions

Hi All,

Sorry for being late on this thread. A similar issue has been reported in https://issues.apache.org/jira/browse/STRATOS-675. The issue showed up when running multiple clusters. In that case, CEP always sends the stats to one particular cluster rather than sending stats per cluster, so one cluster got autoscaled all the time even though there were multiple clusters. The root cause is that the CEP execution plan was not partitioned by cluster and network partition.

@David, are you also experiencing a similar issue?

Thanks,
Reka

On Mon, Aug 11, 2014 at 7:51 PM, David Waddell <david.wadd...@owmobility.com> wrote:

Hi guys

Just wondering if anybody was able to look at this? We're just not sure whether we are set up incorrectly or whether there are issues in the autoscaler.

Thanks again
David

-----Original Message-----
From: David Waddell [mailto:david.wadd...@owmobility.com]
Sent: 08 August 2014 16:22
To: Akila Ravihansa Perera
Cc: dev@stratos.apache.org; Shaheed Haque
Subject: RE: autoscaler issue with multiple cartridges + subscriptions

Guys

Shaheed has been in transit so I'm not sure when he will be able to answer.

Akila - I diff'd the scaling.drl file from github against ours and they are the same. Debug logging was not on for the rules; perhaps this led to your query.

I've tried again today to look at this; running various combinations, I only ever see one of my cartridge subscriptions enter the auto scaling rule.

I think there may be race conditions / mismodelling in the way the NetworkPartitionContext object is being used.
I may have misunderstood it but here goes:

1) There is a cluster per cartridge subscription, and the clusters (subscriptions) are all using the same NetworkPartitionContext instance in our example (N1).
2) Aggregate perf data, when received for a cluster, is written into the NetworkPartitionContext instance.
3) A cluster monitor exists for each cluster, each on its own thread, and sleeps for 60 seconds between checks.

So 2 issues:

a) A cluster monitor may pick up stats for a cluster other than its own, as these can be overwritten by another cluster's data being written into the NetworkPartitionContext.
b) As ClusterMonitor instances share the same NetworkPartitionContext, the first ClusterMonitor to wake from its sleep will set reset=false in the NetworkPartitionContext, preventing subsequent clusters from evaluating rules.

Is it possible we are misconfigured and should not have multiple subscriptions in a network partition? Or are these genuine issues?

Thanks
David.

-----Original Message-----
From: David Waddell
Sent: 06 August 2014 16:40
To: 'Akila Ravihansa Perera'
Cc: dev@stratos.apache.org; Shaheed Haque
Subject: RE: autoscaler issue with multiple cartridges + subscriptions

Hi Akila

Please find the rule attached.

Product version -

netiq@octl-01:/opt/wso2/apache-stratos$ cat bin/version.txt
Apache Stratos v4.0.0
netiq@octl-01:/opt/wso2/apache-stratos$ cat ./bin/wso2carbon-version.txt
WSO2 Carbon Framework v4.2.0

Shaheed - can you comment on the build origin?

Thanks
David.

-----Original Message-----
From: Akila Ravihansa Perera [mailto:raviha...@wso2.com]
Sent: 06 August 2014 16:28
To: David Waddell
Cc: dev@stratos.apache.org; Shaheed Haque
Subject: Re: autoscaler issue with multiple cartridges + subscriptions

Hi David,

It seems the Drools file for the scaling rule (scaling.drl) [1] is different in the packs that you are working with. Have you made changes to the existing scaling rule?
Could you please tell us the Stratos version you're working with (locally built from source, distribution release, etc.)? Please share the scaling rule found in <stratos_path>/repository/conf/scaling.drl

[1] https://github.com/apache/stratos/blob/4.0.0/products/stratos/modules/distribution/src/main/conf/scaling.drl

Thanks.

On Wed, Aug 6, 2014 at 6:20 PM, David Waddell <david.wadd...@owmobility.com> wrote:
> .. clarified cartridges/policies inline below..
>
> -----Original Message-----
> From: David Waddell [mailto:david.wadd...@owmobility.com]
> Sent: 06 August 2014 13:46
> To: Akila Ravihansa Perera; dev
> Cc: Shaheed Haque
> Subject: RE: autoscaler issue with multiple cartridges + subscriptions
>
> Hi Guys
> Sorry for delayed reply.
>
> Akila - apologies for confusion on deployment policies - I had tested
> with 2 policies, and the logs show the mix. The logs this time are not
> mixed, and the issue still exists.
> Cloud-controller.xml attached.
>
> I reproduced this with DEBUG on for scaler. I ran with 2 cartridges:
> opwv-fe with policy static-1, with min/max instances of 1; and opwv-vos
> with policy autoscale-1-2, min=1/max=2.
>
> When launched I then stressed the CPU on the opwv-vos instance to induce
> scaling.
>
> What I found was (log name in brackets)
> - when only the opwv-vos cartridge is subscribed, scaling did
> occur (wso2carbon.log.singlesub)
> - with the opwv-fe subscribed first, then the opwv-vos cartridge
> subscribed, scaling did not occur (wso2carbon.log.2sub)
> - with the opwv-vos subscribed first, then the opwv-fe, scaling
> did occur (wso2carbon.log.2sub_reverse).
>
> I.e. it appears the order of subscription creation affects the outcome.
> At a rough glance it might be that both are sharing the same
> NetworkPartitionContext instance.
>
> Please let me know if you would like any more detail, and thanks.
>
> Rgds
> David
>
> -----Original Message-----
> From: Akila Ravihansa Perera [mailto:raviha...@wso2.com]
> Sent: 01 August 2014 07:58
> To: dev
> Cc: Shaheed Haque
> Subject: Re: autoscaler issue with multiple cartridges + subscriptions
>
> Hi David,
>
> I have a few concerns about your deployment policies.
>
> 1. I see only 2 deployment policies defined: static-1 and autoscale-1-2. But in
> the cartridge subscription I can see a deployment policy named autoscale-1-5.
> Where did that come from? Can you share your complete deployment
> policy/policies?
>
> 2. In deployment policy autoscale-1-2, you have given the provider as
> "provider":"openstack-Core". Is this correct? Can you share your
> cloud-controller.xml?
>
> As Nirmal suggested, enabling DEBUG logs will give more insight into what is
> actually causing this issue. It's better if you can enable DEBUG logs for the
> autoscaler package to get the complete picture.
>
> log4j.logger.org.apache.stratos.autoscaler=DEBUG
>
> Thanks.
>
>
> On Fri, Aug 1, 2014 at 11:30 AM, Nirmal Fernando <nirmal070...@gmail.com> wrote:
>>
>> On Fri, Aug 1, 2014 at 11:30 AM, Nirmal Fernando <nirmal070...@gmail.com> wrote:
>>>
>>> Hi David,
>>>
>>> Is there any possibility of enabling the following logger in the
>>> log4j.properties file?
>>>
>>> log4j.logger.org.apache.stratos.autoscaler.rule.RuleLog=DEBUG
>>>
>>> For each service cluster, we run a Cluster Monitor and that is
>>> responsible for monitoring and scaling the cluster.
>>>
>>> On Thu, Jul 31, 2014 at 10:56 PM, David Waddell <david.wadd...@owmobility.com> wrote:
>>>>
>>>> Hi guys
>>>>
>>>> We’re experiencing an issue on stratos 4.0 - the autoscaler
>>>> doesn’t seem to be kicking in when multiple cartridges are subscribed.
>>>>
>>>> When deploying only one cartridge, the autoscaler works as
>>>> expected.
>>>>
>>>> 3 cartridges are defined: opwv-oam-01, opwv-oam-02, opwv-vos.
>>>>
>>>> {"displayName":"opwv-vos","description":"opwv-vos Cartridge","cartridgeAlias":"-","cartridgeType":"opwv-vos","activeInstances":0,"provider":"cisco","version":"1","multiTenant":false,"hostName":"qmog.cisco.com","loadBalancer":false}
>>>>
>>>> {"displayName":"opwv-oam-01","description":"opwv-oam-01 Cartridge","cartridgeAlias":"-","cartridgeType":"opwv-oam-01","activeInstances":0,"provider":"cisco","version":"1","multiTenant":false,"hostName":"qmog.cisco.com","loadBalancer":false}
>>>>
>>>> {"displayName":"opwv-oam-02","description":"opwv-oam-02 Cartridge","cartridgeAlias":"-","cartridgeType":"opwv-oam-02","activeInstances":0,"provider":"cisco","version":"1","multiTenant":false,"hostName":"qmog.cisco.com","loadBalancer":false}
>>>>
>>>> Deployment policies:
>>>>
>>>> {"id":"static-1","partitionGroup":[{"id":"N1","partitionAlgo":"one-after-another","partition":[{"id":"RegionOne","partitionMin":1,"partitionMax":1,"provider":"openstack-Core","property":[{"name":"region","value":"RegionOne"}]}]}]}
>>>>
>>>> {"id":"autoscale-1-2","partitionGroup":[{"id":"N1","partitionAlgo":"one-after-another","partition":[{"id":"RegionOne","partitionMin":1,"partitionMax":2,"provider":"openstack-Core","property":[{"name":"region","value":"RegionOne"}]}]}]}
>>>>
>>>> Scaling policy:
>>>>
>>>> {"id":"economyPolicy","loadThresholds":{"requestsInFlight":{"average":300.0,"secondDerivative":0.0,"gradient":0.0,"scaleDownMarginOfGradient":1.0,"scaleDownMarginOfSecondDerivative":0.2},"memoryConsumption":{"average":6000.0,"secondDerivative":0.0,"gradient":0.0,"scaleDownMarginOfGradient":1.0,"scaleDownMarginOfSecondDerivative":0.2},"loadAverage":{"average":40.0,"secondDerivative":0.0,"gradient":0.0,"scaleDownMarginOfGradient":1.0,"scaleDownMarginOfSecondDerivative":0.2}}}
>>>>
>>>> If we subscribe cartridge opwv-vos by itself:
>>>>
>>>> TID: [0] [STRATOS] [2014-07-31 15:15:39,836] INFO {org.apache.stratos.manager.manager.CartridgeSubscriptionManager} - Successful Subscription: CartridgeSubscription [subscriptionId=0, type=opwv-vos, alias=opwv-vos, autoscalingPolicyName=economyPolicy, deploymentPolicyName=autoscale-1-5, subscriber=Subscriber [adminUserName=admin, tenantId=-1234, tenantDomain=carbon.super], repository=Repository [id=0, url=null, userName=, isPrivateRepository=false], cartridgeInfo=org.apache.stratos.cloud.controller.stub.pojo.CartridgeInfo@5288e5b6, payload=SERVICE_NAME=opwv-vos,HOST_NAME=opwv-vos.qmog.cisco.com,MULTITENANT=false,TENANT_ID=-1234,TENANT_RANGE=-1234,CARTRIDGE_ALIAS=opwv-vos,CLUSTER_ID=opwv-vos.opwv-vos.domain,CARTRIDGE_KEY=J5xTyGg9k1od0Dvl,REPO_URL=null,PORTS=22,PROVIDER=cisco,PUPPET_IP=PUPPET_IP,PUPPET_HOSTNAME=PUPPET_HOSTNAME,PUPPET_ENV=PUPPET_ENV,OPWV_INTEGRA_oam_ro=opwv-oam-02,TRUSTSTORE_PASSWORD=wso2carbon,OPWV_INTEGRA_fe_server_type=VOS,OPWV_INTEGRA_wait_for_hosts=oam01~oam02,CEP_PORT=7611,MONITORING_SERVER_SECURE_PORT=0,NO_CARTRIDGE_SUBSCRIBE=false,MB_PORT=61616,MB_IP=octl.qmog.cisco.com,CEP_IP=octl.qmog.cisco.com,DEPLOYMENT=default,OPWV_INTEGRA_region=Core,ENABLE_DATA_PUBLISHER=false,OPWV_INTEGRA_swap_size=2G,MONITORING_SERVER_ADMIN_PASSWORD=xxxx,MONITORING_SERVER_IP=octl.qmog.cisco.com,COMMIT_ENABLED=false,MONITORING_SERVER_ADMIN_USERNAME=xxxx,OPWV_INTEGRA_oam_server_role=,CERT_TRUSTSTORE=/opt/apache-stratos-cartridge-agent/security/client-truststore.jks,OPWV_INTEGRA_oam_server_type=OAMClient,MONITORING_SERVER_PORT=0,OPWV_INTEGRA_oam_rw=opwv-oam-01,OPWV_INTEGRA_sys_component=Core~CC, cluster=opwv-vos.opwv-vos.domain], subscriptionDomainMap={} {org.apache.stratos.manager.manager.CartridgeSubscriptionManager}
>>>>
>>>> VM is created, we run a stress load on that VM, we see the
>>>> load average increase in aggregator.log, and the autoscaler
>>>> correctly kicks in and spawns a second instance:
>>>>
>>>> TID: [0] [STRATOS] [2014-07-31 15:24:42,614] INFO {org.apache.stratos.autoscaler.rule.RuleLog} - [scale-up] Partition available, hence trying to spawn an instance to scale up! {org.apache.stratos.autoscaler.rule.RuleLog}
>>>>
>>>> However when we subscribe all the cartridges together, a
>>>> stress on the opwv-vos instance does not trigger autoscale (although
>>>> the aggregate log correctly reports the load).
>>>>
>>>> Logs are attached.
>>>>
>>>> Run with single subscription + successful auto scale is from
>>>> 2014-07-31 16:03:25,320 -> 2014-07-31 16:07:48,220
>>>>
>>>> Run with multiple subscription and no auto scale occurring is from
>>>> 2014-07-31 15:59:50,635 -> 2014-07-31 15:50:08,122
>>>>
>>>> Apologies for hitting the dev list but there doesn’t appear to
>>>> be a user list. If anyone can take a look, appreciated :)
>>>>
>>>> Thanks
>>>>
>>>> David.
>>>
>>> --
>>> Best Regards,
>>> Nirmal
>>>
>>> Nirmal Fernando.
>>> PPMC Member & Committer of Apache Stratos, Senior Software Engineer,
>>> WSO2 Inc.
>>>
>>> Blog: http://nirmalfdo.blogspot.com/
>>
>> --
>> Best Regards,
>> Nirmal
>>
>> Nirmal Fernando.
>> PPMC Member & Committer of Apache Stratos, Senior Software Engineer,
>> WSO2 Inc.
>>
>> Blog: http://nirmalfdo.blogspot.com/
>
> --
> Akila Ravihansa Perera
> Software Engineer
> WSO2 Inc.
> http://wso2.com
>
> Phone: +94 77 64 154 38
> Blog: http://ravihansa3000.blogspot.com
>

--
Akila Ravihansa Perera
Software Engineer
WSO2 Inc.
http://wso2.com

Phone: +94 77 64 154 38
Blog: http://ravihansa3000.blogspot.com

--
Reka Thirunavukkarasu
Senior Software Engineer,
WSO2, Inc.: http://wso2.com,
Mobile: +94776442007
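
The shared-NetworkPartitionContext hypothesis from the 08 August message (issues a and b) can be illustrated with a small toy model. This is a sketch in Python, not Stratos code: the names PartitionContext, publish, monitor_tick and run are hypothetical, and the model simply assumes that stats overwrite one another in the context and that a monitor clears the reset flag when it evaluates. The 40.0 threshold is the loadAverage "average" from economyPolicy; the 90.0 and 5.0 load values are made up for illustration. It does not establish the actual root cause; the unpartitioned CEP execution plan described in STRATOS-675 could produce the same symptom.

```python
from dataclasses import dataclass

LOAD_AVERAGE_THRESHOLD = 40.0  # loadAverage "average" from economyPolicy


@dataclass
class PartitionContext:
    """Hypothetical stand-in for a NetworkPartitionContext."""
    load_average: float = 0.0
    reset: bool = False  # True when unread stats are waiting

    def publish(self, load_average: float) -> None:
        # The latest stats overwrite whatever was stored before (issue a).
        self.load_average = load_average
        self.reset = True


def monitor_tick(cluster_id, context, scaled):
    """One wake-up of a hypothetical cluster monitor."""
    if not context.reset:
        return  # no fresh stats, so the rule is skipped
    context.reset = False  # consume the flag (issue b when context is shared)
    if context.load_average > LOAD_AVERAGE_THRESHOLD:
        scaled.append(cluster_id)


def run(shared: bool):
    """Return the clusters that scale up after one round of monitor ticks."""
    clusters = ["opwv-fe", "opwv-vos"]  # subscription order
    if shared:
        one = PartitionContext()
        contexts = {c: one for c in clusters}
    else:
        contexts = {c: PartitionContext() for c in clusters}

    contexts["opwv-vos"].publish(90.0)  # stressed instance (made-up value)
    contexts["opwv-fe"].publish(5.0)    # idle instance, reported last

    scaled = []
    for cluster_id in clusters:
        monitor_tick(cluster_id, contexts[cluster_id], scaled)
    return scaled


print(run(shared=True))   # shared context
print(run(shared=False))  # one context per cluster
```

In this model the shared context yields an empty list, because the idle cluster's stats overwrite the stressed cluster's and the first monitor clears the flag, while one context per cluster lets opwv-vos scale. The outcome with a shared context depends on the order in which stats arrive and monitors wake.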