Re: After box restart CS 4.5.2 fails to start

2015-09-15 Thread Rohit Yadav
Did you try running cloudstack-setup-management as the root user?
On 15-Sep-2015, at 8:24 pm, Keerthiraja SJ <sjkeer...@gmail.com> wrote:

Hi All,

Today I installed CS 4.5.2 on CentOS 6.7 and was able to start the app
successfully.

Then the box suddenly rebooted, and when I started CloudStack again it failed
to start. I can see a different issue and an ERROR in catalina.out.

ERROR
==
INFO  [c.c.s.ConfigurationServerImpl] (main:null) Processing
updateSSLKeyStore
INFO  [c.c.s.ConfigurationServerImpl] (main:null) SSL keystore located at
/etc/cloudstack/management/cloudmanagementserver.keystore
WARN  [c.c.s.ConfigurationServerImpl] (main:null) Would use fail-safe
keystore to continue.
java.io.IOException: Fail to create keystore file!
   at
com.cloud.server.ConfigurationServerImpl.updateSSLKeystore(ConfigurationServerImpl.java:664)
   at
com.cloud.server.ConfigurationServerImpl.persistDefaultValues(ConfigurationServerImpl.java:304)
   at
com.cloud.server.ConfigurationServerImpl.configure(ConfigurationServerImpl.java:166)
   at
org.apache.cloudstack.spring.lifecycle.CloudStackExtendedLifeCycle$3.with(CloudStackExtendedLifeCycle.java:114)
   at
org.apache.cloudstack.spring.lifecycle.CloudStackExtendedLifeCycle.with(CloudStackExtendedLifeCycle.java:153)
   at
org.apache.cloudstack.spring.lifecycle.CloudStackExtendedLifeCycle.configure(CloudStackExtendedLifeCycle.java:110)
   at
org.apache.cloudstack.spring.lifecycle.CloudStackExtendedLifeCycle.start(CloudStackExtendedLifeCycle.java:56)
   at
org.springframework.context.support.DefaultLifecycleProcessor.doStart(DefaultLifecycleProcessor.java:167)
   at
org.springframework.context.support.DefaultLifecycleProcessor.access$200(DefaultLifecycleProcessor.java:51)
   at
org.springframework.context.support.DefaultLifecycleProcessor$LifecycleGroup.start(DefaultLifecycleProcessor.java:339)
   at
org.springframework.context.support.DefaultLifecycleProcessor.startBeans(DefaultLifecycleProcessor.java:143)
   at
org.springframework.context.support.DefaultLifecycleProcessor.onRefresh(DefaultLifecycleProcessor.java:108)
   at
org.springframework.context.support.AbstractApplicationContext.finishRefresh(AbstractApplicationContext.java:945)
   at
org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:482)
   at
org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet.loadContext(DefaultModuleDefinitionSet.java:145)
   at
org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet$2.with(DefaultModuleDefinitionSet.java:122)
   at
org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet.withModule(DefaultModuleDefinitionSet.java:245)
   at
org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet.withModule(DefaultModuleDefinitionSet.java:250)
   at
org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet.withModule(DefaultModuleDefinitionSet.java:250)
   at
org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet.withModule(DefaultModuleDefinitionSet.java:233)
   at
org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet.loadContexts(DefaultModuleDefinitionSet.java:117)
   at
org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet.load(DefaultModuleDefinitionSet.java:79)
   at
org.apache.cloudstack.spring.module.factory.ModuleBasedContextFactory.loadModules(ModuleBasedContextFactory.java:37)
   at
org.apache.cloudstack.spring.module.factory.CloudStackSpringContext.init(CloudStackSpringContext.java:70)
   at
org.apache.cloudstack.spring.module.factory.CloudStackSpringContext.<init>(CloudStackSpringContext.java:57)
   at
org.apache.cloudstack.spring.module.factory.CloudStackSpringContext.<init>(CloudStackSpringContext.java:61)
   at
org.apache.cloudstack.spring.module.web.CloudStackContextLoaderListener.contextInitialized(CloudStackContextLoaderListener.java:52)
   at
org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4210)
   at
org.apache.catalina.core.StandardContext.start(StandardContext.java:4709)
   at
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
   at
org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
   at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:526)
   at
org.apache.catalina.startup.HostConfig.deployDirectory(HostConfig.java:1041)
   at
org.apache.catalina.startup.HostConfig.deployDirectories(HostConfig.java:964)
   at
org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:502)
   at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1277)
   at
org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:321)
   at
org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
   at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
   at org.apache.catalina.core.StandardHost.start(StandardHost.java:722)
   at org.apache.c

Re: Recurring snapshot taking full every time

2015-09-15 Thread Abhinandan Prateek
Can you check the SMlog on the affected host for any error logged when the 
plugin is being invoked?
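
Before searching SMlog, it may help to pull the failing plugin command and its arguments out of the management-server log, so you know which snapshot UUID to look for on the host. The following is only a rough sketch: the function name `parse_plugin_failure` is made up for illustration, the pattern is derived solely from the `callHostPlugin failed` warning quoted later in this thread, and it assumes the log line has been unwrapped onto a single line (the mail client wrapped it here).

```python
import re

# Pattern modelled on the single WARN line quoted in this thread; other
# CloudStack versions may word the message differently.
PLUGIN_FAILURE = re.compile(
    r"callHostPlugin failed for cmd: (?P<cmd>\w+) "
    r"with args (?P<args>.*?),\s+due to (?P<reason>.+)"
)


def parse_plugin_failure(line):
    """Return cmd, args and reason from a callHostPlugin warning, or None."""
    match = PLUGIN_FAILURE.search(line)
    if match is None:
        return None
    # Arguments appear as "name: value" pairs separated by ", ".
    args = dict(
        pair.split(": ", 1)
        for pair in match.group("args").split(", ")
        if ": " in pair
    )
    return {
        "cmd": match.group("cmd"),
        "args": args,
        "reason": match.group("reason").strip(),
    }
```

Running this over the management-server log and collecting `args["snapshotUuid"]` gives the UUIDs to grep for in SMlog on the host.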

> On 14-Sep-2015, at 5:01 pm, raja sekhar  wrote:
>
> Hi Abhinandan,
>
> Thanks for your reply.
> We are facing the issue only for root volumes on that host. The recurring
> snapshot policy for data disks on the same host is working fine.
> If the host were not properly instrumented, the policy for data disks would
> also fail, but it is working fine.
> Please advise.
>
> Note: Manual snapshots of the root disk work fine, and a child snapshot is
> taken when a second manual snapshot is made. The problem is only with the
> daily root snapshots: every day a full snapshot is taken instead of a child
> snapshot, with the error "Failed to get parent snapshot, due to There was a
> failure communicating with the plugin".
>
> Regards,
> rajasekhar.
>
> On Mon, Sep 14, 2015 at 4:00 PM, Abhinandan Prateek <
> abhinandan.prat...@shapeblue.com> wrote:
>
>> It looks like the host is not instrumented properly and the management
>> server is unable to invoke the plugin on the host.
>> Can you force-reconnect the host that is throwing this error? That should
>> recopy the plugins onto the host.
>>
>>> On 14-Sep-2015, at 3:15 pm, raja sekhar  wrote:
>>>
>>> Hi All,
>>>
>>> I am using CS 4.3.1 and XenServer 6.2.
>>> I have configured a daily recurring snapshot policy for all the root
>>> volumes, but for the last three days it has been taking full snapshots
>>> and no child snapshots.
>>> I have configured snapshot.delta.max to 16.
>>> Please help; my secondary storage is filling up.
>>>
>>> The log file shows:
>>>
>>> 2015-09-14 01:03:29,973 DEBUG [c.c.a.m.DirectAgentAttache]
>>> (DirectAgent-24:ctx-d68622c5) Seq 4-482873005: Response Received:
>>> 2015-09-14 01:03:29,973 DEBUG [c.c.a.m.ClusteredAgentManagerImpl]
>>> (DirectAgent-24:ctx-d68622c5) Seq 4-482873005: MgmtId 195780927071877:
>>> Resp: Routing to peer
>>> 2015-09-14 01:03:32,233 DEBUG [c.c.c.ConsoleProxyManagerImpl]
>>> (consoleproxy-1:ctx-b1db30de) Zone 1 is ready to launch console proxy
>>> 2015-09-14 01:03:32,334 DEBUG [c.c.s.s.SecondaryStorageManagerImpl]
>>> (secstorage-1:ctx-e024d41b) Zone 1 is ready to launch secondary storage
>> VM
>>> 2015-09-14 01:03:33,934 WARN  [c.c.u.n.Link] (AgentManager-Selector:null)
>>> SSL: Fail to find the generated keystore. Loading fail-safe one to
>> continue.
>>> 2015-09-14 01:03:34,044 DEBUG [c.c.a.t.Request]
>>> (AgentManager-Handler-10:null) Seq 1-457572529: Executing:  { Cmd ,
>> MgmtId:
>>> 108371418365724, via: 1, Ver: v1, Flags: 100011,
>>>
>> [{"org.apache.cloudstack.storage.command.CreateObjectCommand":{"data":{"org.apache.cloudstack.storage.to.SnapshotObjectTO":{"volume":{"uuid":"bcf7653a-33db-461a-9355-e5d518012679","volumeType":"ROOT","dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"165fb777-4225-326a-bc48-f5b465f9486b","id":1,"poolType":"NetworkFilesystem","host":"172.30.36.51","path":"/vS02304090GCSP_NAS05","port":2049,"url":"NetworkFilesystem://
>>>
>> 172.30.36.51//vS02304090GCSP_NAS05/?ROLE=Primary&STOREUUID=165fb777-4225-326a-bc48-f5b465f9486b
>>>
>> "}},"name":"ROOT-189","size":85899345920,"path":"9deefbf6-86e6-4359-acb3-99e1aebcb359","volumeId":696,"vmName":"i-11-189-VM","accountId":11,"format":"VHD","id":696,"deviceId":0,"hypervisorType":"XenServer"},"parentSnapshotPath":"3a6e9e1a-fd95-4530-ae67-03a368b3e2fd","dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"165fb777-4225-326a-bc48-f5b465f9486b","id":1,"poolType":"NetworkFilesystem","host":"172.30.36.51","path":"/vS02304090GCSP_NAS05","port":2049,"url":"NetworkFilesystem://
>>>
>> 172.30.36.51//vS02304090GCSP_NAS05/?ROLE=Primary&STOREUUID=165fb777-4225-326a-bc48-f5b465f9486b
>> "}},"vmName":"i-11-189-VM","name":"SAGXD02_ROOT-189_20150914090333","hypervisorType":"XenServer","id":6732,"quiescevm":false,"parents":["3a6e9e1a-fd95-4530-ae67-03a368b3e2fd"],"physicalSize":0}},"wait":0}}]
>>> }
>>> 2015-09-14 01:03:34,045 DEBUG [c.c.a.m.DirectAgentAttache]
>>> (DirectAgent-333:ctx-c3e7c331) Seq 1-457572529: Executing request
>>> 2015-09-14 01:03:35,421 WARN  [c.c.h.x.r.CitrixResourceBase]
>>> (DirectAgent-333:ctx-c3e7c331) callHostPlugin failed for cmd:
>> getVhdParent
>>> with args snapshotUuid: 3a6e9e1a-fd95-4530-ae67-03a368b3e2fd, isISCSI:
>>> false, primaryStorageSRUuid: a6a7f65a-a1d2-83e3-d3b0-f38792863fb6,  due
>> to
>>> There was a failure communicating with the plugin.
>>> 2015-09-14 01:03:35,421 DEBUG [c.c.h.x.r.XenServerStorageProcessor]
>>> (DirectAgent-333:ctx-c3e7c331) Failed to get parent snapshot
>>> com.cloud.utils.exception.CloudRuntimeException: callHostPlugin failed
>> for
>>> cmd: getVhdParent with args snapshotUuid:
>>> 3a6e9e1a-fd95-4530-ae67-03a368b3e2fd, isISCSI: false,
>> primaryStorageSRUuid:
>>> a6a7f65a-a1d2-83e3-d3b0-f38792863fb6,  due to There was a failure
>>> communicating with the plugin.
>>>   at
>>>
>> com.cloud.hypervisor.xen.resource.CitrixResourceBase.callHos

Re: After box restart CS 4.5.2 fails to start

2015-09-15 Thread Abhinandan Prateek
This looks like a user privilege issue. Can you make sure that you are running 
CloudStack with sufficient privileges (sudo included)?
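
One quick way to test the privilege theory is to check whether the account in question can actually write the keystore named in the log. The sketch below is illustrative only: `keystore_problems` is a hypothetical helper, the path is copied from the catalina.out excerpt in this thread, and it has to be run as the same account the management server runs under (which account that is depends on your installation) for the result to mean anything.

```python
import os

# Path copied from the catalina.out excerpt in this thread.
KEYSTORE = "/etc/cloudstack/management/cloudmanagementserver.keystore"


def keystore_problems(path):
    """List reasons the current user could not create or rewrite `path`."""
    problems = []
    directory = os.path.dirname(path) or "."
    if not os.path.isdir(directory):
        problems.append("directory missing: " + directory)
    elif not os.access(directory, os.W_OK | os.X_OK):
        # Creating a new file needs write and search permission on the directory.
        problems.append("directory not writable: " + directory)
    if os.path.exists(path) and not os.access(path, os.W_OK):
        problems.append("file not writable: " + path)
    return problems


if __name__ == "__main__":
    for message in keystore_problems(KEYSTORE) or ["no permission problem found"]:
        print(message)
```

An empty result only rules out plain file permissions; it says nothing about SELinux or a missing keytool binary, which this sketch does not check.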

> On 15-Sep-2015, at 8:24 pm, Keerthiraja SJ  wrote:
>
> Hi All,
>
> Today I installed CS 4.5.2 on CentOS 6.7 and was able to start the app
> successfully.
>
> Then the box suddenly rebooted, and when I started CloudStack again it failed
> to start. I can see a different issue and an ERROR in catalina.out.
>
> ERROR
> ==
> INFO  [c.c.s.ConfigurationServerImpl] (main:null) Processing
> updateSSLKeyStore
> INFO  [c.c.s.ConfigurationServerImpl] (main:null) SSL keystore located at
> /etc/cloudstack/management/cloudmanagementserver.keystore
> WARN  [c.c.s.ConfigurationServerImpl] (main:null) Would use fail-safe
> keystore to continue.
> java.io.IOException: Fail to create keystore file!
>at
> com.cloud.server.ConfigurationServerImpl.updateSSLKeystore(ConfigurationServerImpl.java:664)
>at
> com.cloud.server.ConfigurationServerImpl.persistDefaultValues(ConfigurationServerImpl.java:304)
>at
> com.cloud.server.ConfigurationServerImpl.configure(ConfigurationServerImpl.java:166)
>at
> org.apache.cloudstack.spring.lifecycle.CloudStackExtendedLifeCycle$3.with(CloudStackExtendedLifeCycle.java:114)
>at
> org.apache.cloudstack.spring.lifecycle.CloudStackExtendedLifeCycle.with(CloudStackExtendedLifeCycle.java:153)
>at
> org.apache.cloudstack.spring.lifecycle.CloudStackExtendedLifeCycle.configure(CloudStackExtendedLifeCycle.java:110)
>at
> org.apache.cloudstack.spring.lifecycle.CloudStackExtendedLifeCycle.start(CloudStackExtendedLifeCycle.java:56)
>at
> org.springframework.context.support.DefaultLifecycleProcessor.doStart(DefaultLifecycleProcessor.java:167)
>at
> org.springframework.context.support.DefaultLifecycleProcessor.access$200(DefaultLifecycleProcessor.java:51)
>at
> org.springframework.context.support.DefaultLifecycleProcessor$LifecycleGroup.start(DefaultLifecycleProcessor.java:339)
>at
> org.springframework.context.support.DefaultLifecycleProcessor.startBeans(DefaultLifecycleProcessor.java:143)
>at
> org.springframework.context.support.DefaultLifecycleProcessor.onRefresh(DefaultLifecycleProcessor.java:108)
>at
> org.springframework.context.support.AbstractApplicationContext.finishRefresh(AbstractApplicationContext.java:945)
>at
> org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:482)
>at
> org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet.loadContext(DefaultModuleDefinitionSet.java:145)
>at
> org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet$2.with(DefaultModuleDefinitionSet.java:122)
>at
> org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet.withModule(DefaultModuleDefinitionSet.java:245)
>at
> org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet.withModule(DefaultModuleDefinitionSet.java:250)
>at
> org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet.withModule(DefaultModuleDefinitionSet.java:250)
>at
> org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet.withModule(DefaultModuleDefinitionSet.java:233)
>at
> org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet.loadContexts(DefaultModuleDefinitionSet.java:117)
>at
> org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet.load(DefaultModuleDefinitionSet.java:79)
>at
> org.apache.cloudstack.spring.module.factory.ModuleBasedContextFactory.loadModules(ModuleBasedContextFactory.java:37)
>at
> org.apache.cloudstack.spring.module.factory.CloudStackSpringContext.init(CloudStackSpringContext.java:70)
>at
> org.apache.cloudstack.spring.module.factory.CloudStackSpringContext.<init>(CloudStackSpringContext.java:57)
>at
> org.apache.cloudstack.spring.module.factory.CloudStackSpringContext.<init>(CloudStackSpringContext.java:61)
>at
> org.apache.cloudstack.spring.module.web.CloudStackContextLoaderListener.contextInitialized(CloudStackContextLoaderListener.java:52)
>at
> org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4210)
>at
> org.apache.catalina.core.StandardContext.start(StandardContext.java:4709)
>at
> org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
>at
> org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
>at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:526)
>at
> org.apache.catalina.startup.HostConfig.deployDirectory(HostConfig.java:1041)
>at
> org.apache.catalina.startup.HostConfig.deployDirectories(HostConfig.java:964)
>at
> org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:502)
>at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1277)
>at
> org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:321)
>at
> org.apache.catalina.util.Lifecycl

RE: Dynamic Scalable Template issue

2015-09-15 Thread Somesh Naidu
I believe XS does not allocate more than the dynamic-max to the VM. Task 
Manager, perfmon, etc. would report incorrect memory usage.

One way to find out how much memory is available to the Windows guest is to 
run the following in PowerShell (with admin privileges):
$a = gwmi -n root\wmi -cl CitrixXenStoreSession
$a[0].getvalue("memory/target").value

The value returned represents the “real” amount of memory (in KB) available to 
the Windows guest.
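
To compare that number with a service offering, it needs converting from KB. A minimal sketch, assuming the value is a count of 1024-byte units as stated above; the helper names `kb_to_gib` and `exceeds_offering` are invented for this example.

```python
def kb_to_gib(kb):
    """Convert a KB count (string or number) to GiB."""
    return int(kb) / (1024 * 1024)


def exceeds_offering(target_kb, offering_gib):
    """True if the reported target is larger than the offering allows."""
    return kb_to_gib(target_kb) > offering_gib
```

For an 8 GB offering, a reported value of 8388608 converts to exactly 8.0 GiB, so `exceeds_offering("8388608", 8)` is false.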

Regards,
Somesh

From: Todd Pigram [mailto:t...@toddpigram.com]
Sent: Tuesday, September 15, 2015 4:26 PM
To: Somesh Naidu
Cc: CloudStack Users
Subject: Re: Dynamic Scalable Template issue

I understand this, but MS SQL will use all the memory that is 
available to Windows. In my case a DDC with SQL was using 31.5 GB of RAM 
according to perfmon, but the system offering was limited to 8 GB.

Basically, Dynamic Scaling works but is bad for SQL servers. I just turned it off.

On Tuesday, September 15, 2015, Somesh Naidu <somesh.na...@citrix.com> wrote:
Well, that defect is considered a doc defect from CCP/CS perspective so there 
are no code changes involved. After listing the limitation in the CCP 4.5 
release notes the defect is marked as fixed.

The behavior is controlled/affected by how XS DMC works and how Windows guests 
view available memory. Windows guests always see static-max whereas the usable 
memory (allowed by Xen) for the guest can never exceed dynamic-max. In case of 
dynamic scaling, CCP/CS sets static-max higher (4 times dynamic-min) than 
dynamic-max. The memory represented by static-max – dynamic-max is occupied by 
XS balloon driver (contained in the XS tools) which inflates/deflates based on 
dynamic memory scaling requests.

Essentially, what Windows guests report as available memory is incorrect. The 
functionality itself isn’t affected, that is, the actual memory available to 
the Windows guest is correct.

Somesh
CloudPlatform Escalations
Citrix Systems, Inc.

From: Todd Pigram 
[mailto:t...@toddpigram.com]
Sent: Tuesday, September 15, 2015 11:33 AM
To: Somesh Naidu
Cc: CloudStack Users
Subject: Re: Dynamic Scalable Template issue

Is this fixed in 4.5.1 as the issue is not listed in the release notes under 
CS-27425/CS-21217?

As an FYI, the new tools worked on the one VM but not on any others. I had to 
set ‘enable.dynamic.scale.vm’ to false and restart the management service. Then 
I had to shut down the instance and start it again to get the right memory. A 
reboot doesn’t fix it.



Todd Pigram
http://about.me/ToddPigram
www.linkedin.com/in/toddpigram/
@pigram86 on twitter
https://plus.google.com/+ToddPigram86
Mobile - 216-224-5769

PGP Public Key

On Sep 15, 2015, at 11:11 AM, Somesh Naidu wrote:

A similar issue is listed in CCP 4.5 release notes – look for 
“CS-27425/CS-21217”.

Somesh
CloudPlatform Escalations
Citrix Systems, Inc.

From: Todd Pigram 
[mailto:t...@toddpigram.com]
Sent: Wednesday, September 09, 2015 11:22 AM
To: CloudStack Users
Subject: Re: Dynamic Scalable Template issue

Well, it seems to be an XS Tools issue. I am waiting to hear back from support 
on the ‘Official’ word, but with the new tools installed and the Dynamic 
Scalable option set, I am currently running like I was before the upgrade to 
CCP 4.3.0.2.

At this point it is closed.

Thanks for the help.

Todd Pigram
http://about.me/ToddPigram
www.linkedin.com/in/toddpigram/
@pigram86 on twitter
https://plus.google.com/+ToddPigram86
Mobile - 216-224-5769

PGP Public Key

On Sep 9, 2015, at 10:53 AM, Vadim Kimlaychuk <va...@kickcloud.net> wrote:

Todd,

  Does the VM guest show the correct amount of RAM? Is this issue resolved?

Vadim.

On 2015-09-09 15:34, Todd Pigram wrote:


Vadim,
Yes to both with the new tools from XS62ESP1028. Both Windows and CentOS have 
'Dynamic Scalable' Option selected and in XenCenter they show the correct RAM.
Todd Pigram
http://about.me/ToddPigram [1]
www.linkedin.com/in/toddpigram/ [2]
@pigram86 on twitter
https://plus.google.com/+ToddPigram86 [3] Mobile - 216-224-5769
PGP Public Key [4]
On Sep 9, 2015, at 2:49 AM, Vadim Kimlaychuk <va...@kickcloud.net> wrote:
Todd,
Have you tried to do the following manual tests on the cluster where you have 
problem:
1. Dynamically scalable with CentOS ?
2. Dynamically scalable with Windows ?
What do they show as available RAM?
Regards,
On 2015-09-08 19:44, Todd Pigram wrote:
Vadim,
After installing XS62ESP1028 via CLI (no reboot on hosts yet) and building a 
Centos65 instance w/o 'Dyna

Re: Dynamic Scalable Template issue

2015-09-15 Thread Todd Pigram
I understand this, but MS SQL will use all the memory that is available to
Windows. In my case a DDC with SQL was using 31.5 GB of RAM according to
perfmon, but the system offering was limited to 8 GB.

Basically, Dynamic Scaling works but is bad for SQL servers. I just turned it
off.

On Tuesday, September 15, 2015, Somesh Naidu wrote:

> Well, that defect is considered a doc defect from CCP/CS perspective so
> there are no code changes involved. After listing the limitation in the CCP
> 4.5 release notes the defect is marked as fixed.
>
>
>
> The behavior is controlled/affected by how XS DMC works and how Windows
> guests view available memory. Windows guests always see static-max whereas
> the usable memory (allowed by Xen) for the guest can never exceed
> dynamic-max. In case of dynamic scaling, CCP/CS sets static-max higher (4
> times dynamic-min) than dynamic-max. The memory represented by static-max –
> dynamic-max is occupied by XS balloon driver (contained in the XS tools)
> which inflates/deflates based on dynamic memory scaling requests.
>
>
>
> Essentially, what Windows guests report as available memory is incorrect.
> The functionality itself isn’t affected, that is, the actual memory
> available to the Windows guest is correct.
>
>
>
> Somesh
>
> CloudPlatform Escalations
>
> Citrix Systems, Inc.
>
>
>
> *From:* Todd Pigram [mailto:t...@toddpigram.com]
> *Sent:* Tuesday, September 15, 2015 11:33 AM
> *To:* Somesh Naidu
> *Cc:* CloudStack Users
> *Subject:* Re: Dynamic Scalable Template issue
>
>
>
> Is this fixed in 4.5.1 as the issue is not listed in the release notes
> under CS-27425/CS-21217?
>
>
>
> As an FYI, the new tools worked on the one VM but not on any others. I had
> to set ‘enable.dynamic.scale.vm’ to false and restart the management service.
> Then I had to shut down the instance and start it again to get the right
> memory. A reboot doesn’t fix it.
>
>
>
>
>
>
>
> Todd Pigram
> http://about.me/ToddPigram
> www.linkedin.com/in/toddpigram/
>
> @pigram86 on twitter
> https://plus.google.com/+ToddPigram86
>
> Mobile - 216-224-5769
>
>
>
> PGP Public Key
> 
>
>
>
> On Sep 15, 2015, at 11:11 AM, Somesh Naidu wrote:
>
>
>
> A similar issue is listed in CCP 4.5 release notes – look for
> “CS-27425/CS-21217”.
>
> Somesh
> CloudPlatform Escalations
> Citrix Systems, Inc.
>
> From: Todd Pigram [mailto:t...@toddpigram.com]
> Sent: Wednesday, September 09, 2015 11:22 AM
> To: CloudStack Users
> Subject: Re: Dynamic Scalable Template issue
>
> Well, it seems to be an XS Tools issue. I am waiting to hear back from
> support on the ‘Official’ word, but with the new tools installed and the
> Dynamic Scalable option set, I am currently running like I was before the
> upgrade to CCP 4.3.0.2.
>
> At this point it is closed.
>
> Thanks for the help.
>
> Todd Pigram
> http://about.me/ToddPigram
> www.linkedin.com/in/toddpigram/
> @pigram86 on twitter
> https://plus.google.com/+ToddPigram86
> Mobile - 216-224-5769
>
> PGP Public Key <http://pgp.mit.edu/pks/lookup?op=get&search=0x96B7B0F0C55933BB>
>
> On Sep 9, 2015, at 10:53 AM, Vadim Kimlaychuk <va...@kickcloud.net> wrote:
>
> Todd,
>
>   Does the VM guest show the correct amount of RAM? Is this issue resolved?
>
> Vadim.
>
> On 2015-09-09 15:34, Todd Pigram wrote:
>
>
> Vadium,
> Yes to both with the new tools from XS62ESP1028. Both Windows and CentOS
> have 'Dynamic Scalable' Option selected and in XenCenter they show the
> correct RAM.
> Todd Pigram
> http://about.me/ToddPigram [1]
> www.linkedin.com/in/toddpigram/ [2]
> @pigram86 on twitter
> https://plus.google.com/+ToddPigram86 [3] Mobile - 216-224-5769
> PGP Public Key [4]
> On Sep 9, 2015, at 2:49 AM, Vadim Kimlaychuk <va...@kickcloud.net> wrote:
> Todd,
> Have you tried to do the following manual tests on the cluster where you
> have problem:
> 1. Dynamically scalable with CentOS ?
> 2. Dynamically scalable with Windows ?
> What do they show as available RAM?
> Regards,
> On 2015-09-08 19:44, Todd Pigram wrote:
> Vadium,
> After installing XS62ESP1028 via CLI (no reboot on hosts yet) and building
> an Centos65 instance w/o 'Dynamic Scalable' option checked, it showed right
> in XenCenter. I installed the new tools from (XS62ESP1028) and still good.
> I created a template from this instance. Deployed said template, and the
> memory is still good in XenCenter.
> I tested on a Windows VM and it is the same with the new tools, however,
> as this particular tenant bypasses the virtual router, I have to reboot
> twice and reset networking as t

RE: Dynamic Scalable Template issue

2015-09-15 Thread Somesh Naidu
Well, that defect is considered a doc defect from CCP/CS perspective so there 
are no code changes involved. After listing the limitation in the CCP 4.5 
release notes the defect is marked as fixed.

The behavior is controlled/affected by how XS DMC works and how Windows guests 
view available memory. Windows guests always see static-max whereas the usable 
memory (allowed by Xen) for the guest can never exceed dynamic-max. In case of 
dynamic scaling, CCP/CS sets static-max higher (4 times dynamic-min) than 
dynamic-max. The memory represented by static-max – dynamic-max is occupied by 
XS balloon driver (contained in the XS tools) which inflates/deflates based on 
dynamic memory scaling requests.

Essentially, what Windows guests report as available memory is incorrect. The 
functionality itself isn’t affected, that is, the actual memory available to 
the Windows guest is correct.
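
The arithmetic behind this can be sketched in a few lines. This is only an illustration of the relationship described above (static-max = 4 x dynamic-min, with the balloon driver holding static-max minus dynamic-max); the class name `MemoryLimits` and the example figures are assumptions, not values read from any real VM.

```python
from dataclasses import dataclass


@dataclass
class MemoryLimits:
    dynamic_min_gib: float
    dynamic_max_gib: float

    @property
    def static_max_gib(self):
        # With dynamic scaling enabled, static-max is set to 4 x dynamic-min.
        return 4 * self.dynamic_min_gib

    @property
    def balloon_gib(self):
        # Memory the guest sees but cannot use; held by the balloon driver.
        return self.static_max_gib - self.dynamic_max_gib


# Hypothetical 8 GB offering with dynamic-min equal to dynamic-max.
limits = MemoryLimits(dynamic_min_gib=8, dynamic_max_gib=8)
print(limits.static_max_gib, limits.balloon_gib)
```

With those assumed figures the guest would see 32 GB while only 8 GB is usable, which is in line with the 8 GB offering and roughly 32 GB reported elsewhere in this thread.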

Somesh
CloudPlatform Escalations
Citrix Systems, Inc.

From: Todd Pigram [mailto:t...@toddpigram.com]
Sent: Tuesday, September 15, 2015 11:33 AM
To: Somesh Naidu
Cc: CloudStack Users
Subject: Re: Dynamic Scalable Template issue

Is this fixed in 4.5.1 as the issue is not listed in the release notes under 
CS-27425/CS-21217?

As an FYI, the new tools worked on the one VM but not on any others. I had to 
set ‘enable.dynamic.scale.vm’ to false and restart the management service. Then 
I had to shut down the instance and start it again to get the right memory. A 
reboot doesn’t fix it.



Todd Pigram
http://about.me/ToddPigram
www.linkedin.com/in/toddpigram/
@pigram86 on twitter
https://plus.google.com/+ToddPigram86
Mobile - 216-224-5769

PGP Public Key

On Sep 15, 2015, at 11:11 AM, Somesh Naidu <somesh.na...@citrix.com> wrote:

A similar issue is listed in CCP 4.5 release notes – look for 
“CS-27425/CS-21217”.

Somesh
CloudPlatform Escalations
Citrix Systems, Inc.

From: Todd Pigram [mailto:t...@toddpigram.com]
Sent: Wednesday, September 09, 2015 11:22 AM
To: CloudStack Users
Subject: Re: Dynamic Scalable Template issue

Well, it seems to be an XS Tools issue. I am waiting to hear back from support 
on the ‘Official’ word, but with the new tools installed and the Dynamic 
Scalable option set, I am currently running like I was before the upgrade to 
CCP 4.3.0.2.

At this point it is closed.

Thanks for the help.

Todd Pigram
http://about.me/ToddPigram
www.linkedin.com/in/toddpigram/
@pigram86 on twitter
https://plus.google.com/+ToddPigram86
Mobile - 216-224-5769

PGP Public Key

On Sep 9, 2015, at 10:53 AM, Vadim Kimlaychuk <va...@kickcloud.net> wrote:

Todd,

  Does the VM guest show the correct amount of RAM? Is this issue resolved?

Vadim.

On 2015-09-09 15:34, Todd Pigram wrote:


Vadium,
Yes to both with the new tools from XS62ESP1028. Both Windows and CentOS have 
'Dynamic Scalable' Option selected and in XenCenter they show the correct RAM.
Todd Pigram
http://about.me/ToddPigram [1]
www.linkedin.com/in/toddpigram/ [2]
@pigram86 on twitter
https://plus.google.com/+ToddPigram86 [3] Mobile - 216-224-5769
PGP Public Key [4]
On Sep 9, 2015, at 2:49 AM, Vadim Kimlaychuk <va...@kickcloud.net> wrote:
Todd,
Have you tried to do the following manual tests on the cluster where you have 
problem:
1. Dynamically scalable with CentOS ?
2. Dynamically scalable with Windows ?
What do they show as available RAM?
Regards,
On 2015-09-08 19:44, Todd Pigram wrote:
Vadium,
After installing XS62ESP1028 via CLI (no reboot on hosts yet) and building an 
Centos65 instance w/o 'Dynamic Scalable' option checked, it showed right in 
XenCenter. I installed the new tools from (XS62ESP1028) and still good. I 
created a template from this instance. Deployed said template, and the memory 
is still good in XenCenter.
I tested on a Windows VM and it is the same with the new tools, however, as 
this particular tenant bypasses the virtual router, I have to reboot twice and 
reset networking as the new XenTools reset the networking stack.
Todd Pigram
http://about.me/ToddPigram [1]
www.linkedin.com/in/toddpigram/ [2]
@pigram86 on twitter
https://plus.google.com/+ToddPigram86 [3] Mobile - 216-224-5769
PGP Public Key [4]
On Sep 6, 2015, at 4:57 AM, Vadim Kimlaychuk <va...@kickcloud.net> wrote:
Todd,
Can you try Linux template with same dynamic s

Re: Dynamic Scalable Template issue

2015-09-15 Thread Todd Pigram
Is this fixed in 4.5.1 as the issue is not listed in the release notes under 
CS-27425/CS-21217?

As an FYI, the new tools worked on the one VM but not on any others. I had to 
set ‘enable.dynamic.scale.vm’ to false and restart the management service. Then 
I had to shut down the instance and start it again to get the right memory. A 
reboot doesn’t fix it.




Todd Pigram
http://about.me/ToddPigram 
www.linkedin.com/in/toddpigram/ 
@pigram86 on twitter
https://plus.google.com/+ToddPigram86 
Mobile - 216-224-5769

PGP Public Key 
> On Sep 15, 2015, at 11:11 AM, Somesh Naidu  wrote:
> 
> A similar issue is listed in CCP 4.5 release notes – look for 
> “CS-27425/CS-21217”.
> 
> Somesh
> CloudPlatform Escalations
> Citrix Systems, Inc.
> 
> From: Todd Pigram [mailto:t...@toddpigram.com]
> Sent: Wednesday, September 09, 2015 11:22 AM
> To: CloudStack Users
> Subject: Re: Dynamic Scalable Template issue
> 
> Well, it seems to be an XS Tools issue. I am waiting to hear back from support 
> on the ‘Official’ word, but with the new tools installed and the Dynamic 
> Scalable option set, I am currently running like I was before the upgrade to 
> CCP 4.3.0.2.
> 
> At this point it is closed.
> 
> Thanks for the help.
> 
> Todd Pigram
> http://about.me/ToddPigram
> www.linkedin.com/in/toddpigram/
> @pigram86 on twitter
> https://plus.google.com/+ToddPigram86
> Mobile - 216-224-5769
> 
> PGP Public Key
> 
> On Sep 9, 2015, at 10:53 AM, Vadim Kimlaychuk <va...@kickcloud.net> wrote:
> 
> Todd,
> 
>   Does the VM guest show the correct amount of RAM? Is this issue resolved?
> 
> Vadim.
> 
> On 2015-09-09 15:34, Todd Pigram wrote:
> 
> 
> Vadium,
> Yes to both with the new tools from XS62ESP1028. Both Windows and CentOS have 
> 'Dynamic Scalable' Option selected and in XenCenter they show the correct RAM.
> Todd Pigram
> http://about.me/ToddPigram [1]
> www.linkedin.com/in/toddpigram/ [2] 
> @pigram86 on twitter
> https://plus.google.com/+ToddPigram86 [3] Mobile - 216-224-5769
> PGP Public Key [4]
> On Sep 9, 2015, at 2:49 AM, Vadim Kimlaychuk <va...@kickcloud.net> wrote:
> Todd,
> Have you tried to do the following manual tests on the cluster where you have 
> problem:
> 1. Dynamically scalable with CentOS ?
> 2. Dynamically scalable with Windows ?
> What do they show as available RAM?
> Regards,
> On 2015-09-08 19:44, Todd Pigram wrote:
> Vadium,
> After installing XS62ESP1028 via CLI (no reboot on hosts yet) and building an 
> Centos65 instance w/o 'Dynamic Scalable' option checked, it showed right in 
> XenCenter. I installed the new tools from (XS62ESP1028) and still good. I 
> created a template from this instance. Deployed said template, and the memory 
> is still good in XenCenter.
> I tested on a Windows VM and it is the same with the new tools, however, as 
> this particular tenant bypasses the virtual router, I have to reboot twice 
> and reset networking as the new XenTools reset the networking stack.
> Todd Pigram
> http://about.me/ToddPigram [1]
> www.linkedin.com/in/toddpigram/ [2]
> @pigram86 on twitter
> https://plus.google.com/+ToddPigram86 [3] Mobile - 216-224-5769
> PGP Public Key [4]
> On Sep 6, 2015, at 4:57 AM, Vadim Kimlaychuk 
> mailto:va...@kickcloud.net>> wrote:
> Todd,
> Can you try Linux template with same dynamic scale option on the same pool? I 
> wonder if there is a problem with Windows or any guest OS.
> Regards,
> Vadim.
> On 2015-09-05 21:04, Todd Pigram wrote:
> Vadium
> That makes sense. I will see if I can replicate the issue in a lab. But given 
> the holiday weekend, might not be until next week
> On Saturday, September 5, 2015, Vadim Kimlaychuk 
> mailto:va...@kickcloud.net>> wrote:
> Todd,
> I have seeing similar problem with Xen 4.1 (not XenServer). Linux guests were 
> able to see (and use) entire host resources on any guest VM. That was a bug 
> of configuration. If you think about what could be different after CS update 
> - it could be VM registration procedure. Still guest VM should not be able to 
> see static max. Your XenCenter shows that effective dynamic VM memory size is 
> 8Gb while maximium is 32Gb. So CS configured VM guest correctly. This is 
> problem of hypervisor <-> guest VM communitcation. That is why I asked you to 
> try to register VM manually. I believe you will have the same result. Than 
> means your server pool of XS62ESP1027 is broken. 3 other pools are not. I see 
> no reason to update to 4.5.1, because I think this is not the problem of CS, 
> but particularly this XenServer pool + this type of Windows guest (if other 
> templates with dynamic offer are good).
> Vadim.
> On 2015-09-05

RE: Dynamic Scalable Template issue

2015-09-15 Thread Somesh Naidu
A similar issue is listed in CCP 4.5 release notes – look for 
“CS-27425/CS-21217”.

Somesh
CloudPlatform Escalations
Citrix Systems, Inc.

From: Todd Pigram [mailto:t...@toddpigram.com]
Sent: Wednesday, September 09, 2015 11:22 AM
To: CloudStack Users
Subject: Re: Dynamic Scalable Template issue

Well, it seems to be an XS Tools issue. I am waiting to hear back from support 
on the ‘Official’ word, but with the new tools installed and the Dynamic 
Scalable option set, I am currently running as I was before the upgrade to 
CCP 4.3.0.2.

At this point it is closed.

Thanks for the help.

Todd Pigram
http://about.me/ToddPigram
www.linkedin.com/in/toddpigram/
@pigram86 on twitter
https://plus.google.com/+ToddPigram86
Mobile - 216-224-5769

PGP Public Key

On Sep 9, 2015, at 10:53 AM, Vadim Kimlaychuk 
mailto:va...@kickcloud.net>> wrote:

Todd,

   Does the VM guest show the correct amount of RAM? Is this issue resolved?

Vadim.

On 2015-09-09 15:34, Todd Pigram wrote:


Vadim,
Yes to both with the new tools from XS62ESP1028. Both Windows and CentOS have the 
'Dynamic Scalable' option selected, and in XenCenter they show the correct RAM.
Todd Pigram
http://about.me/ToddPigram
www.linkedin.com/in/toddpigram/
@pigram86 on twitter
https://plus.google.com/+ToddPigram86
Mobile - 216-224-5769
PGP Public Key
On Sep 9, 2015, at 2:49 AM, Vadim Kimlaychuk 
mailto:va...@kickcloud.net>> wrote:
Todd,
Have you tried the following manual tests on the cluster where you have the 
problem:
1. Dynamically scalable with CentOS ?
2. Dynamically scalable with Windows ?
What do they show as available RAM?
Regards,
On 2015-09-08 19:44, Todd Pigram wrote:
Vadim,
After installing XS62ESP1028 via CLI (no reboot on hosts yet) and building a 
Centos65 instance w/o the 'Dynamic Scalable' option checked, the memory showed 
correctly in XenCenter. I installed the new tools (from XS62ESP1028) and it was 
still good. I created a template from this instance and deployed it, and the 
memory is still good in XenCenter.
I tested on a Windows VM and it is the same with the new tools. However, as 
this particular tenant bypasses the virtual router, I have to reboot twice and 
reset networking, because the new XenTools reset the networking stack.
Todd Pigram
http://about.me/ToddPigram
www.linkedin.com/in/toddpigram/
@pigram86 on twitter
https://plus.google.com/+ToddPigram86
Mobile - 216-224-5769
PGP Public Key
On Sep 6, 2015, at 4:57 AM, Vadim Kimlaychuk 
mailto:va...@kickcloud.net>> wrote:
Todd,
Can you try Linux template with same dynamic scale option on the same pool? I 
wonder if there is a problem with Windows or any guest OS.
Regards,
Vadim.
On 2015-09-05 21:04, Todd Pigram wrote:
Vadim,
That makes sense. I will see if I can replicate the issue in a lab, but given 
the holiday weekend, it might not be until next week.
On Saturday, September 5, 2015, Vadim Kimlaychuk 
mailto:va...@kickcloud.net>> wrote:
Todd,
I have seen a similar problem with Xen 4.1 (not XenServer). Linux guests were 
able to see (and use) the entire host's resources from any guest VM. That was a 
configuration bug. If you think about what could be different after the CS 
update, it could be the VM registration procedure. Still, a guest VM should not 
be able to see static max. Your XenCenter shows that the effective dynamic VM 
memory size is 8 GB while the maximum is 32 GB, so CS configured the VM guest 
correctly. This is a problem of hypervisor <-> guest VM communication. That is 
why I asked you to try to register a VM manually. I believe you will have the 
same result. That means your server pool of XS62ESP1027 is broken; the 3 other 
pools are not. I see no reason to update to 4.5.1, because I think this is not 
a problem with CS, but specifically with this XenServer pool + this type of 
Windows guest (if other templates with the dynamic offering are good).
Vadim.
On 2015-09-05 14:45, Todd Pigram wrote:
Vadim
I have 3 other pools (1 XS6.2sp1 and 2x xs65sp1) and I have no issue with these.
Based on the design doc, what I was experiencing is by design. OK, I will turn 
it off.
But now my question is: why didn't I have this issue on 4.3, but after 
installing 4.3.0.2 it changed?
Was 4.3 broken or is 4.3.0.2? Will upgrading to CCP 4.5.1 be better?
If this is truly by design, I will not be able to use dynamic scalable for my 
Windows instances.
On Saturday, September 5, 2015, Vadim Kimlaychuk 
mailto:va...@kickcloud.net>> wrote:
Todd,
You may try to create a VM on XenServer without CloudStack, just using the XE 
tool (or XenCenter). If your manually created VM with static max <> dynamic max 
is OK, then there is a problem with CloudStack.
Vadim.
On 2015-09-04 21:51, Todd Pigram wrote:
Latest as of XS62ESP1027. I know XS62ESP1028 comes with new XenTools.
On Friday, September 4, 2015, Vadim Kimlaychuk 
mailto:va...@kickcloud

After box restart CS 4.5.2 fails to start

2015-09-15 Thread Keerthiraja SJ
Hi All,

Today I installed CS 4.5.2 on CentOS 6.7 and was able to start the app
successfully.

All of a sudden the box rebooted, and when I then started CloudStack it failed
to start. I can see a different issue and an ERROR in catalina.out

ERROR
==
INFO  [c.c.s.ConfigurationServerImpl] (main:null) Processing updateSSLKeyStore
INFO  [c.c.s.ConfigurationServerImpl] (main:null) SSL keystore located at /etc/cloudstack/management/cloudmanagementserver.keystore
WARN  [c.c.s.ConfigurationServerImpl] (main:null) Would use fail-safe keystore to continue.
java.io.IOException: Fail to create keystore file!
   at com.cloud.server.ConfigurationServerImpl.updateSSLKeystore(ConfigurationServerImpl.java:664)
   at com.cloud.server.ConfigurationServerImpl.persistDefaultValues(ConfigurationServerImpl.java:304)
   at com.cloud.server.ConfigurationServerImpl.configure(ConfigurationServerImpl.java:166)
   at org.apache.cloudstack.spring.lifecycle.CloudStackExtendedLifeCycle$3.with(CloudStackExtendedLifeCycle.java:114)
   at org.apache.cloudstack.spring.lifecycle.CloudStackExtendedLifeCycle.with(CloudStackExtendedLifeCycle.java:153)
   at org.apache.cloudstack.spring.lifecycle.CloudStackExtendedLifeCycle.configure(CloudStackExtendedLifeCycle.java:110)
   at org.apache.cloudstack.spring.lifecycle.CloudStackExtendedLifeCycle.start(CloudStackExtendedLifeCycle.java:56)
   at org.springframework.context.support.DefaultLifecycleProcessor.doStart(DefaultLifecycleProcessor.java:167)
   at org.springframework.context.support.DefaultLifecycleProcessor.access$200(DefaultLifecycleProcessor.java:51)
   at org.springframework.context.support.DefaultLifecycleProcessor$LifecycleGroup.start(DefaultLifecycleProcessor.java:339)
   at org.springframework.context.support.DefaultLifecycleProcessor.startBeans(DefaultLifecycleProcessor.java:143)
   at org.springframework.context.support.DefaultLifecycleProcessor.onRefresh(DefaultLifecycleProcessor.java:108)
   at org.springframework.context.support.AbstractApplicationContext.finishRefresh(AbstractApplicationContext.java:945)
   at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:482)
   at org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet.loadContext(DefaultModuleDefinitionSet.java:145)
   at org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet$2.with(DefaultModuleDefinitionSet.java:122)
   at org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet.withModule(DefaultModuleDefinitionSet.java:245)
   at org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet.withModule(DefaultModuleDefinitionSet.java:250)
   at org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet.withModule(DefaultModuleDefinitionSet.java:250)
   at org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet.withModule(DefaultModuleDefinitionSet.java:233)
   at org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet.loadContexts(DefaultModuleDefinitionSet.java:117)
   at org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet.load(DefaultModuleDefinitionSet.java:79)
   at org.apache.cloudstack.spring.module.factory.ModuleBasedContextFactory.loadModules(ModuleBasedContextFactory.java:37)
   at org.apache.cloudstack.spring.module.factory.CloudStackSpringContext.init(CloudStackSpringContext.java:70)
   at org.apache.cloudstack.spring.module.factory.CloudStackSpringContext.<init>(CloudStackSpringContext.java:57)
   at org.apache.cloudstack.spring.module.factory.CloudStackSpringContext.<init>(CloudStackSpringContext.java:61)
   at org.apache.cloudstack.spring.module.web.CloudStackContextLoaderListener.contextInitialized(CloudStackContextLoaderListener.java:52)
   at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4210)
   at org.apache.catalina.core.StandardContext.start(StandardContext.java:4709)
   at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
   at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
   at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:526)
   at org.apache.catalina.startup.HostConfig.deployDirectory(HostConfig.java:1041)
   at org.apache.catalina.startup.HostConfig.deployDirectories(HostConfig.java:964)
   at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:502)
   at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1277)
   at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:321)
   at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
   at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
   at org.apache.catalina.core.StandardHost.start(StandardHost.java:722)
   at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
   at org.apache.catalina.core.StandardEngin
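
[Editor's sketch] The "Fail to create keystore file!" exception in the log above suggests the management server could not write the keystore. One way to narrow that down is to check whether the file, or its parent directory, is writable by the account that runs the server. The helper below is a minimal sketch, not part of CloudStack; the function name check_keystore and its output strings are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with CloudStack): report whether the
# current user could create or rewrite the given keystore file.
check_keystore() {
    ks="$1"
    dir=$(dirname "$ks")
    if [ -e "$ks" ]; then
        # The file exists: can this user overwrite it?
        if [ -w "$ks" ]; then
            echo "writable"
        else
            echo "exists-not-writable"
        fi
    elif [ -d "$dir" ] && [ -w "$dir" ]; then
        # The file is missing but the directory would allow creating it.
        echo "missing-can-create"
    else
        # The directory is missing or not writable for this user.
        echo "cannot-create"
    fi
}
```

To use it, call check_keystore /etc/cloudstack/management/cloudmanagementserver.keystore (the path from the log) while logged in as the management server's service account. Which account that is depends on the packaging and is an assumption to verify locally. Note that root passes every write test, so a check run as root says little.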

Re: CS 4.5.2: all hosts reboot after 3 days at production

2015-09-15 Thread Remi Bergsma
Hi,

You cannot sync because that will also try to write to remote disks and that 
doesn’t work; otherwise you wouldn’t be in trouble in the first place. In the 
past we have seen situations where the box was supposed to be rebooted, but 
that took a long time due to the sync. Instead, you wait a little bit and then 
fence it. Otherwise your ring buffers will fill up and the disks will get 
corrupted.

There are global settings to tweak the parameters, if I recall correctly.

This all gets less relevant due to XenHA. This process is smarter and also 
fences the box, always leaving behind log trails. Running XenServer without 
turning on XenHA is asking for trouble anyway. I always set the XenHA timeout 
to 180 seconds.

Regards,
Remi
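
[Editor's sketch] Since the advice here depends on whether XenHA is actually on, the xe pool-list check quoted elsewhere in this thread can be wrapped in a small filter. This is a minimal sketch that assumes the output contains a line of the form "ha-enabled ( RO): false", as shown in the thread. The function name ha_state is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: reads `xe pool-list params=all` output on stdin
# and prints a one-line summary of the pool's HA state.
ha_state() {
    # Keep only the value from a line such as "ha-enabled ( RO): false".
    state=$(sed -n 's/^[[:space:]]*ha-enabled[^:]*:[[:space:]]*\([a-z]*\).*/\1/p' | head -n 1)
    case "$state" in
        true)  echo "HA is on" ;;
        false) echo "HA is off" ;;
        *)     echo "HA state unknown" ;;
    esac
}
```

On a XenServer host it would be used as: xe pool-list params=all | ha_state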




On 15/09/15 10:19, "Vadim Kimlaychuk"  wrote:

>Abhinandan and Frank,
>
>1. xenheartbeat.sh is designed to monitor iSCSI and NFS mounts
>2. default installation monitors only presence of 
>/opt/cloud/bin/heartbeat local file. Administrator must run script 
>setup_heartbeat_file.sh with host UUID and SR UUID it wants to monitor 
>and then heartbeat file will contain information about what storage to 
>monitor.
>3. if p2. was set up, script will try to read 100 bytes from the 
>mounted storage to /dev/null and if read is successful withing 1 min. it 
>sleeps for 1 min. Otherwise it will report problem, exit endless cycle 
>and do reboot
>4. it does reboot by calling "echo b > /proc/sysrq-trigger". I am 
>not confident that such method is safe in terms of local disk writes. I 
>mean script log may not be flushed to hard drive and thus after reboot 
>admin may not see any problems in syslog, because local disk writes are 
>discarded.
>
>BR,
>
>Vadim.
>
>On 2015-09-15 10:41, Frank Louwers wrote:
>
>> Important correction: it monitors the health of the first primary NFS 
>> (or otherwise "distributed and mounted") filesystem. If you don't use 
>> NFS als (main) primary storage, it's safe to disable that reboot. If 
>> you know your NFS has "issues" from time to time, and have controls 
>> around that, yes, you can disable that...
>> 
>> Frank
>> 
>> On 15 Sep 2015, at 09:08, Abhinandan Prateek 
>>  wrote:
>> 
>> The heartbeat script monitors the health of the primary storage by 
>> using a timestamp that is written to each primary store.
>> In case the primary storage is unreachable it reboots the XenServer in 
>> order to protect the virtual machines from corruption.
>> 
>> On 14-Sep-2015, at 8:48 pm, Vadim Kimlaychuk  
>> wrote:
>> 
>> Remi,
>> 
>> I will definitely enable HA when find who is rebooting the host. I 
>> known circumstances when it happens and I know that it is 
>> storage-related. Hardware health is monitored by SNMP and there were no 
>> problems with temperature, CPU, RAM or HDD ranges. In case of HW 
>> failure I should theoretically have kernel panic or crash dumps. But 
>> there is none. Will experiment a bit.
>> 
>> Thank you,
>> 
>> Vadim.
>> 
>> On 2015-09-14 17:35, Remi Bergsma wrote:
>> 
>> Hi Vadim,
>> It can also be XenHA but I remember you already said it is off. Did you 
>> check the hardware health?
>> I'd recommend turning on XenHA as otherwise in case of a failure you 
>> will not have an automatic recovery.
>> Regards,
>> Remi
>> On 14/09/15 15:09, "Vadim Kimlaychuk"  wrote:
>> Remi,
>> I have analyzed script xenheartbeat.sh and it seems it is useless,
>> because relies on file /opt/cloud/bin/heartbeat that has 0 length. It 
>> is
>> not set-up during installation and there is no such a step in
>> documentation for setting it up. Logically admin must run
>> "setup_heartbeat_file.sh" to make heartbeat work. If this file is 0
>> length then script checks nothing and log this message every minute:
>> Sep 14 04:43:53 xcp1 heartbeat: Problem with heartbeat, no iSCSI or NFS
>> mount defined in /opt/cloud/bin/heartbeat!
>> That means it can't reboot host, because it doesn't check
>> anything. Isn't it ?
>> Is there any other script that may reboot host if when there is a
>> problem with storage?
>> Vadim.
>> On 2015-09-14 15:40, Remi Bergsma wrote:
>> Hi Vadim,
>> This does indeed reboot a box, once storage fails:
>> echo b > /proc/sysrq-trigger
>> Removing it doesn't make sense, as there are serious issues once you
>> hit this code. I'd recommend making sure the storage is reliable.
>> Regards, Remi
>> On 14/09/15 08:13, "Vadim Kimlaychuk"  wrote:
>> Remi,
>> I have analyzed situation and found that storage may cause problem
>> with host reboot as you wrote before in this thread. Reason for that --
>> we do offline backups from NFS server at that time when hosts fail.
>> Basically we copy all files in primary and secondary storage offsite.
>> This process starts precisely at 00:00 and somewhere around 00:10 -
>> 00:40 XenServer host starts to reboot.
>> Reading old threads I have found that
>> /opt/cloud/bin/xenheartbeat.sh may do this job. Particularly last lines
>> at my xenheartbeat.sh are:
>> -
>> /usr/bin/logger -t hear

Re: CS 4.5.2: all hosts reboot after 3 days at production

2015-09-15 Thread Vadim Kimlaychuk

Abhinandan and Frank,

   1. xenheartbeat.sh is designed to monitor iSCSI and NFS mounts.
   2. The default installation monitors only the presence of the local file 
/opt/cloud/bin/heartbeat. The administrator must run the script 
setup_heartbeat_file.sh with the host UUID and the UUID of the SR to monitor; 
the heartbeat file will then contain information about which storage to 
monitor.
   3. If step 2 was done, the script tries to read 100 bytes from the 
mounted storage to /dev/null. If the read succeeds within 1 min., it 
sleeps for 1 min. Otherwise it reports a problem, exits the endless loop 
and reboots.
   4. It reboots by calling "echo b > /proc/sysrq-trigger". I am 
not confident that this method is safe in terms of local disk writes: the 
script's log may not be flushed to the hard drive, so after the reboot the 
admin may not see any problems in syslog, because the local disk writes are 
discarded.
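
[Editor's sketch] The four points above can be sketched as a single pass of such a check. This is not the shipped xenheartbeat.sh: the function name check_heartbeats is hypothetical, the list file is assumed to hold one path per line (the real heartbeat file format is not shown in this thread), the timeout utility is assumed to be installed, and the reboot is replaced by a message so the sketch is safe to run.

```shell
#!/bin/sh
# Sketch of ONE pass of a storage heartbeat check; it never reboots anything.
# Assumption: the list file holds one monitored file path per line.
check_heartbeats() {
    list="$1"          # file listing the paths to probe (hypothetical format)
    limit="${2:-60}"   # seconds allowed per read, following the 1 min. above
    if [ ! -s "$list" ]; then
        # An empty or missing list means there is nothing to monitor.
        echo "no mount defined in $list"
        return 2
    fi
    while IFS= read -r hb; do
        [ -n "$hb" ] || continue
        # Read up to 100 bytes and discard them; a hung mount trips the timeout.
        if ! timeout "$limit" dd if="$hb" of=/dev/null bs=100 count=1 \
                2>/dev/null </dev/null; then
            echo "problem with $hb: would reboot here"
            return 1
        fi
    done < "$list"
    echo "all heartbeats readable"
    return 0
}
```

A real monitor would call this in a loop with a sleep between passes and act on a return code of 1. The return code 2 corresponds to the "no iSCSI or NFS mount defined" case discussed in this thread, where nothing is checked and so nothing can trigger a reboot.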


BR,

Vadim.

On 2015-09-15 10:41, Frank Louwers wrote:

Important correction: it monitors the health of the first primary NFS 
(or otherwise "distributed and mounted") filesystem. If you don't use 
NFS as (main) primary storage, it's safe to disable that reboot. If 
you know your NFS has "issues" from time to time, and have controls 
around that, yes, you can disable that...


Frank

On 15 Sep 2015, at 09:08, Abhinandan Prateek 
 wrote:


The heartbeat script monitors the health of the primary storage by 
using a timestamp that is written to each primary store.
In case the primary storage is unreachable it reboots the XenServer in 
order to protect the virtual machines from corruption.


On 14-Sep-2015, at 8:48 pm, Vadim Kimlaychuk  
wrote:


Remi,

I will definitely enable HA when I find out what is rebooting the host. I 
know the circumstances in which it happens and I know that it is 
storage-related. Hardware health is monitored by SNMP and there were no 
problems with temperature, CPU, RAM or HDD ranges. In case of a HW 
failure I should theoretically have a kernel panic or crash dumps, but 
there are none. I will experiment a bit.


Thank you,

Vadim.

On 2015-09-14 17:35, Remi Bergsma wrote:

Hi Vadim,
It can also be XenHA but I remember you already said it is off. Did you 
check the hardware health?
I'd recommend turning on XenHA as otherwise in case of a failure you 
will not have an automatic recovery.

Regards,
Remi
On 14/09/15 15:09, "Vadim Kimlaychuk"  wrote:
Remi,
I have analyzed the script xenheartbeat.sh and it seems it is useless,
because it relies on the file /opt/cloud/bin/heartbeat, which has 0 length.
It is not set up during installation and there is no such step in the
documentation for setting it up. Logically the admin must run
"setup_heartbeat_file.sh" to make the heartbeat work. If this file has 0
length, then the script checks nothing and logs this message every minute:
Sep 14 04:43:53 xcp1 heartbeat: Problem with heartbeat, no iSCSI or NFS
mount defined in /opt/cloud/bin/heartbeat!
That means it can't reboot the host, because it doesn't check
anything, does it?
Is there any other script that may reboot the host when there is a
problem with storage?
Vadim.
On 2015-09-14 15:40, Remi Bergsma wrote:
Hi Vadim,
This does indeed reboot a box, once storage fails:
echo b > /proc/sysrq-trigger
Removing it doesn't make sense, as there are serious issues once you
hit this code. I'd recommend making sure the storage is reliable.
Regards, Remi
On 14/09/15 08:13, "Vadim Kimlaychuk"  wrote:
Remi,
I have analyzed the situation and found that storage may be causing the
host reboots, as you wrote earlier in this thread. The reason is that
we do offline backups from the NFS server at the time when the hosts fail.
Basically we copy all files in primary and secondary storage offsite.
This process starts precisely at 00:00, and somewhere around 00:10 -
00:40 the XenServer host starts to reboot.
Reading old threads, I found that
/opt/cloud/bin/xenheartbeat.sh may do this job. In particular, the last lines
of my xenheartbeat.sh are:
-
/usr/bin/logger -t heartbeat "Problem with $hb: not reachable for $(($(date +%s) - $lastdate)) seconds, rebooting system!"
echo b > /proc/sysrq-trigger
-
The only unclear point is that I don't have such a line in my logs.
Could the command "echo b > /proc/sysrq-trigger" prevent the message from
being written to the syslog file? The documentation says that it reboots
immediately without syncing filesystems. It seems there is no other place
that could do it, but I am still not 100% sure.
Vadim.
On 2015-09-13 18:26, Vadim Kimlaychuk wrote:
Remi,
Thank you for hint. At least one problem is identified:
[root@xcp1 ~]# xe pool-list params=all | grep -E "ha-enabled|ha-config"
ha-enabled ( RO): false
ha-configuration ( RO):
Where should I look for storage errors? Host? Management server? I have
checked /var/log/messages and there were only regular messages, no
"fence" or "reboot" commands.
I have dedicated NFS server that should be accessible all the time (at
least NIC interfaces are bonded in master-slave mode). Server is used
fo

Re: CS 4.5.2: all hosts reboot after 3 days at production

2015-09-15 Thread Frank Louwers
Important correction: it monitors the health of the first primary NFS (or 
otherwise “distributed and mounted”) filesystem. If you don’t use NFS as 
(main) primary storage, it’s safe to disable that reboot. If you know your NFS 
has “issues” from time to time, and have controls around that, yes, you can 
disable that…

Frank



> On 15 Sep 2015, at 09:08, Abhinandan Prateek 
>  wrote:
> 
> The heartbeat script monitors the health of the primary storage by using a 
> timestamp that is written to each primary store.
> In case the primary storage is unreachable it reboots the XenServer in order 
> to protect the virtual machines from corruption.
> 
>> On 14-Sep-2015, at 8:48 pm, Vadim Kimlaychuk  wrote:
>> 
>> Remi,
>> 
>>   I will definitely enable HA when find who is rebooting the host. I known 
>> circumstances when it happens and I know that it is storage-related. 
>> Hardware health is monitored by SNMP and there were no problems with 
>> temperature, CPU, RAM or HDD ranges. In case of HW failure I should 
>> theoretically have kernel panic or crash dumps. But there is none. Will 
>> experiment a bit.
>> 
>>   Thank you,
>> 
>> Vadim.
>> 
>> On 2015-09-14 17:35, Remi Bergsma wrote:
>> 
>>> Hi Vadim,
>>> It can also be XenHA but I remember you already said it is off. Did you 
>>> check the hardware health?
>>> I'd recommend turning on XenHA as otherwise in case of a failure you will 
>>> not have an automatic recovery.
>>> Regards,
>>> Remi
>>> On 14/09/15 15:09, "Vadim Kimlaychuk"  wrote:
>>> Remi,
>>> I have analyzed script xenheartbeat.sh and it seems it is useless,
>>> because relies on file /opt/cloud/bin/heartbeat that has 0 length. It is
>>> not set-up during installation and there is no such a step in
>>> documentation for setting it up. Logically admin must run
>>> "setup_heartbeat_file.sh" to make heartbeat work. If this file is 0
>>> length then script checks nothing and log this message every minute:
>>> Sep 14 04:43:53 xcp1 heartbeat: Problem with heartbeat, no iSCSI or NFS
>>> mount defined in /opt/cloud/bin/heartbeat!
>>> That means it can't reboot host, because it doesn't check
>>> anything. Isn't it ?
>>> Is there any other script that may reboot host if when there is a
>>> problem with storage?
>>> Vadim.
>>> On 2015-09-14 15:40, Remi Bergsma wrote:
>>> Hi Vadim,
>>> This does indeed reboot a box, once storage fails:
>>> echo b > /proc/sysrq-trigger
>>> Removing it doesn't make sense, as there are serious issues once you
>>> hit this code. I'd recommend making sure the storage is reliable.
>>> Regards, Remi
>>> On 14/09/15 08:13, "Vadim Kimlaychuk"  wrote:
>>> Remi,
>>> I have analyzed situation and found that storage may cause problem
>>> with host reboot as you wrote before in this thread. Reason for that --
>>> we do offline backups from NFS server at that time when hosts fail.
>>> Basically we copy all files in primary and secondary storage offsite.
>>> This process starts precisely at 00:00 and somewhere around 00:10 -
>>> 00:40 XenServer host starts to reboot.
>>> Reading old threads I have found that
>>> /opt/cloud/bin/xenheartbeat.sh may do this job. Particularly last lines
>>> at my xenheartbeat.sh are:
>>> -
>>> /usr/bin/logger -t heartbeat "Problem with $hb: not reachable for
>>> $(($(date +%s) - $lastdate)) seconds, rebooting system!"
>>> echo b > /proc/sysrq-trigger
>>> -
>>> The only "unclear" moment is -- I don't have such line in my logs.
>>> May this command "echo b > /proc/sysrq-trigger" prevent from writing to
>>> syslog file? Documentation says that it does reboot immediately without
>>> synchronizing FS. It seems there is no other place that may do it, but
>>> still I am not 100% sure.
>>> Vadim.
>>> On 2015-09-13 18:26, Vadim Kimlaychuk wrote:
>>> Remi,
>>> Thank you for hint. At least one problem is identified:
>>> [root@xcp1 ~]# xe pool-list params=all | grep -E "ha-enabled|ha-config"
>>> ha-enabled ( RO): false
>>> ha-configuration ( RO):
>>> Where should I look for storage errors? Host? Management server? I have
>>> checked /var/log/messages and there were only regular messages, no
>>> "fence" or "reboot" commands.
>>> I have dedicated NFS server that should be accessible all the time (at
>>> least NIC interfaces are bonded in master-slave mode). Server is used
>>> for both primary and secondary storage.
>>> Thanks,
>>> Vadim.
>>> On 2015-09-13 14:38, Remi Bergsma wrote:
>>> Hi Vadim,
>>> Not sure what the problem is. Although I do know that when shared
>>> storage is used, both CloudStack and XenServer will fence (reboot) the
>>> box to prevent corruption in case access to the network or the storage
>>> is not possible. What storage do you use?
>>> What does this return on a XenServer?:
>>> xe pool-list params=all | grep -E "ha-enabled|ha-config"
>>> HA should be on, or else a hypervisor crash will not recover properly.
>>> If you search the logs for Fence or reboot, does anything come back?
>>> T

Re: CS 4.5.2: all hosts reboot after 3 days at production

2015-09-15 Thread Abhinandan Prateek
The heartbeat script monitors the health of the primary storage by using a 
timestamp that is written to each primary store.
In case the primary storage is unreachable it reboots the XenServer in order to 
protect the virtual machines from corruption.

> On 14-Sep-2015, at 8:48 pm, Vadim Kimlaychuk  wrote:
>
> Remi,
>
>I will definitely enable HA when find who is rebooting the host. I known 
> circumstances when it happens and I know that it is storage-related. Hardware 
> health is monitored by SNMP and there were no problems with temperature, CPU, 
> RAM or HDD ranges. In case of HW failure I should theoretically have kernel 
> panic or crash dumps. But there is none. Will experiment a bit.
>
>Thank you,
>
> Vadim.
>
> On 2015-09-14 17:35, Remi Bergsma wrote:
>
>> Hi Vadim,
>> It can also be XenHA but I remember you already said it is off. Did you 
>> check the hardware health?
>> I'd recommend turning on XenHA as otherwise in case of a failure you will 
>> not have an automatic recovery.
>> Regards,
>> Remi
>> On 14/09/15 15:09, "Vadim Kimlaychuk"  wrote:
>> Remi,
>> I have analyzed script xenheartbeat.sh and it seems it is useless,
>> because relies on file /opt/cloud/bin/heartbeat that has 0 length. It is
>> not set-up during installation and there is no such a step in
>> documentation for setting it up. Logically admin must run
>> "setup_heartbeat_file.sh" to make heartbeat work. If this file is 0
>> length then script checks nothing and log this message every minute:
>> Sep 14 04:43:53 xcp1 heartbeat: Problem with heartbeat, no iSCSI or NFS
>> mount defined in /opt/cloud/bin/heartbeat!
>> That means it can't reboot host, because it doesn't check
>> anything. Isn't it ?
>> Is there any other script that may reboot host if when there is a
>> problem with storage?
>> Vadim.
>> On 2015-09-14 15:40, Remi Bergsma wrote:
>> Hi Vadim,
>> This does indeed reboot a box, once storage fails:
>> echo b > /proc/sysrq-trigger
>> Removing it doesn't make sense, as there are serious issues once you
>> hit this code. I'd recommend making sure the storage is reliable.
>> Regards, Remi
>> On 14/09/15 08:13, "Vadim Kimlaychuk"  wrote:
>> Remi,
>> I have analyzed situation and found that storage may cause problem
>> with host reboot as you wrote before in this thread. Reason for that --
>> we do offline backups from NFS server at that time when hosts fail.
>> Basically we copy all files in primary and secondary storage offsite.
>> This process starts precisely at 00:00 and somewhere around 00:10 -
>> 00:40 XenServer host starts to reboot.
>> Reading old threads I have found that
>> /opt/cloud/bin/xenheartbeat.sh may do this job. Particularly last lines
>> at my xenheartbeat.sh are:
>> -
>> /usr/bin/logger -t heartbeat "Problem with $hb: not reachable for
>> $(($(date +%s) - $lastdate)) seconds, rebooting system!"
>> echo b > /proc/sysrq-trigger
>> -
>> The only "unclear" moment is -- I don't have such line in my logs.
>> May this command "echo b > /proc/sysrq-trigger" prevent from writing to
>> syslog file? Documentation says that it does reboot immediately without
>> synchronizing FS. It seems there is no other place that may do it, but
>> still I am not 100% sure.
>> Vadim.
>> On 2015-09-13 18:26, Vadim Kimlaychuk wrote:
>> Remi,
>> Thank you for hint. At least one problem is identified:
>> [root@xcp1 ~]# xe pool-list params=all | grep -E "ha-enabled|ha-config"
>> ha-enabled ( RO): false
>> ha-configuration ( RO):
>> Where should I look for storage errors? Host? Management server? I have
>> checked /var/log/messages and there were only regular messages, no
>> "fence" or "reboot" commands.
>> I have dedicated NFS server that should be accessible all the time (at
>> least NIC interfaces are bonded in master-slave mode). Server is used
>> for both primary and secondary storage.
>> Thanks,
>> Vadim.
>> On 2015-09-13 14:38, Remi Bergsma wrote:
>> Hi Vadim,
>> Not sure what the problem is. Although I do know that when shared
>> storage is used, both CloudStack and XenServer will fence (reboot) the
>> box to prevent corruption in case access to the network or the storage
>> is not possible. What storage do you use?
>> What does this return on a XenServer?:
>> xe pool-list params=all | grep -E "ha-enabled|ha-config"
>> HA should be on, or else a hypervisor crash will not recover properly.
>> If you search the logs for Fence or reboot, does anything come back?
>> The logs you mention are nothing to worry about.
>> Can you tell us in some more details what happens and how we can
>> reproduce it?
>> Regards,
>> Remi
>> -----Original Message-----
>> From: Vadim Kimlaychuk [mailto:va...@kickcloud.net]
>> Sent: zondag 13 september 2015 9:32
>> To: users@cloudstack.apache.org
>> Cc: Remi Bergsma
>> Subject: Re: CS 4.5.2: all hosts reboot after 3 days at production
>> Hello Remi,
>> This issue has nothing to do with CS 4.5.2. We got host reboot after
>> p