Re: DHCP instance/vm issue

2019-07-09 Thread Yiping Zhang
Jesse:

As Andrija said, this is purely a DHCP issue; it has nothing to do with CloudStack.

We have a similar setup here, where both ACS VM instances and non-ACS servers 
exist on the same subnet, and they are served by separate DHCP servers (stock 
ISC dhcp server on RHEL).  Here is how we solved the conflict.

For the non-ACS DHCP server, in dhcpd.conf we define a class for CloudStack 
instances based on their MAC addresses.  Then, in the pool stanza for each subnet 
served by this (non-ACS) DHCP server, we simply deny members of the Cloudstack class.

Here are the relevant snippets in dhcpd.conf:

class "Cloudstack" {
  match if substring(hardware,1,1) = 06;
}
...

# Prod_DMZ_subnet
subnet 10.0.8.0 netmask 255.255.252.0 {
  pool {
    deny members of "Cloudstack";
    ...
  }
  ...
}

In earlier versions of ACS, all VMs' MAC addresses started with "06:".  I think 
in current versions of ACS they start with "1e:" now.
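If it helps, the class match above can be sanity-checked outside dhcpd. This is a minimal sketch of the same first-octet classification logic in plain shell; the sample MAC is made up, and the "1e" prefix is included only per the note above:

```shell
#!/bin/sh
# Classify a MAC the same way the dhcpd "Cloudstack" class does:
# look only at the first octet of the hardware address.
mac="06:3a:12:00:00:01"   # hypothetical ACS instance MAC
case "${mac%%:*}" in
  06|1e) echo "cloudstack" ;;   # would be denied a lease by the non-ACS pool
  *)     echo "other"      ;;   # would be served by the non-ACS DHCP server
esac
```

The same one-octet test is what `substring(hardware, 1, 1)` performs in dhcpd (octet 0 of `hardware` is the interface type, so octet 1 is the first MAC byte).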

Hope this helps.

Yiping




On 7/9/19, 9:44 AM, "jesse.wat...@gmail.com"  wrote:

Interesting

proxy in to vm
pkill dhclient
dhclient -x
dhclient eth0

get ip I expected, odd
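One way to confirm which server actually granted a lease (a sketch, not from the thread): dhclient records the server identifier in its lease file. The lease text below is a made-up sample; real files typically live under /var/lib/dhclient/ or /var/lib/dhcp/, and paths vary by distribution.

```shell
#!/bin/sh
# Print the DHCP server that granted the lease by parsing dhclient lease text.
# The sample stands in for e.g. /var/lib/dhclient/dhclient-eth0.leases.
lease='lease {
  interface "eth0";
  fixed-address 10.0.8.42;
  option dhcp-server-identifier 10.0.8.1;
}'
echo "$lease" | awk '/dhcp-server-identifier/ { sub(/;$/, "", $3); print $3 }'
```

If the printed address is the virtual router's, ACS answered; if it is the corporate DHCP server's, the VR lost the race.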


On Tue, Jul 9, 2019 at 11:16 AM  wrote:

>
> My VM was assigned an IP from our endpoint DHCP server, not from the VR. Do I
> need to add firewall rule(s) to force DHCP requests to the VR? I probably
> missed a part of the setup w/ KVM hosts, or within management, when I defined
> the zone/pod/...
>
> This seems to be correct; the VR is running on a different host than the VM.
>
> Chain i-2-11-VM-eg (1 references)
>  pkts bytes target prot opt in out source
> destination
> 0 0 RETURN all  --  *  *   0.0.0.0/0
> 0.0.0.0/0
>
> Chain i-2-11-def (2 references)
>  pkts bytes target prot opt in out source
> destination
> 0 0 ACCEPT all  --  *  *   0.0.0.0/0
> 0.0.0.0/0state RELATED,ESTABLISHED
> 0 0 ACCEPT udp  --  *  *   0.0.0.0/0
> 0.0.0.0/0PHYSDEV match --physdev-in vnet0
> --physdev-is-bridged udp spt:68 dpt:67
> 0 0 ACCEPT udp  --  *  *   0.0.0.0/0
> 0.0.0.0/0PHYSDEV match --physdev-out vnet0
> --physdev-is-bridged udp spt:67 dpt:68
> 0 0 DROP   all  --  *  *   0.0.0.0/0
> 0.0.0.0/0PHYSDEV match --physdev-in vnet0
> --physdev-is-bridged ! match-set i-2-11-VM src
> 0 0 RETURN udp  --  *  *   0.0.0.0/0
> 0.0.0.0/0PHYSDEV match --physdev-in vnet0
> --physdev-is-bridged match-set i-2-11-VM src udp dpt:53
> 0 0 RETURN tcp  --  *  *   0.0.0.0/0
> 0.0.0.0/0PHYSDEV match --physdev-in vnet0
> --physdev-is-bridged match-set i-2-11-VM src tcp dpt:53
> 0 0 i-2-11-VM-eg  all  --  *  *   0.0.0.0/0
> 0.0.0.0/0PHYSDEV match --physdev-in vnet0
> --physdev-is-bridged match-set i-2-11-VM src
>15  1963 i-2-11-VM  all  --  *  *   0.0.0.0/0
> 0.0.0.0/0PHYSDEV match --physdev-out vnet0
> --physdev-is-bridged
>
>
>
> Thanks for quick response Andrija!
>
> -  Jesse
>
>
>
>
> On Tue, Jul 9, 2019 at 10:39 AM Andrija Panic 
> wrote:
>
>> ACS will only offer DHCP leases to its VMs, via DHCP reservations. If you
>> have another DHCP server in your area, then it might be quicker to offer a
>> lease to a VM. You have to either remove your non-ACS DHCP server
>> completely, OR make sure it uses reservations for non-ACS servers/hosts,
>> i.e. NOT let it issue leases freely to anyone who asks for one. It is a
>> pure DHCP "problem" - i.e. nothing to do with ACS specifically.
>>
>> Best,
>> Andrija
>>
>> On Tue, Jul 9, 2019, 20:27  wrote:
>>
>> > Have a DHCP issue where a VM sometimes pulls from the ACS proxy properly
>> > and other times it pulls from our normal DHCP server for end-points.
>> >
>> > Network layout is flat, and ACS is using a basic network with security
>> > groups. The IP range for ACS is within the range of our normal network so
>> > VMs and endpoints will flow without additional hardware. How do I ensure
>> > DHCP requests are served by the router VM and not our normal DHCP server?
>> >
>> > TIA,
>> >   Jesse
>> >
>>
>
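As a complement to the class-based deny shown at the top of the thread, Andrija's reservation approach can look like the fragment below in the non-ACS server's dhcpd.conf. This is only a sketch: the host names, MACs, and addresses are invented, and the real file would live at /etc/dhcp/dhcpd.conf rather than /tmp.

```shell
#!/bin/sh
# Write a hypothetical dhcpd.conf fragment: known non-ACS hosts get fixed
# addresses; with no dynamic range declared, unknown clients (ACS VMs)
# get no lease from this server at all.
cat > /tmp/dhcpd-reservations.conf <<'EOF'
host fileserver01 {
  hardware ethernet 00:25:90:aa:bb:cc;  # real NIC MAC of a non-ACS host
  fixed-address 10.0.8.10;
}
host buildbox02 {
  hardware ethernet 00:25:90:dd:ee:ff;
  fixed-address 10.0.8.11;
}
EOF
grep -c 'fixed-address' /tmp/dhcpd-reservations.conf
```

Either approach works; the class-based deny scales better when non-ACS hosts are numerous and churn often.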




Re: Can't start systemVM in a new advanced zone deployment

2019-06-06 Thread Yiping Zhang
The NFS volume definitely allows root mount and has RW permissions, as we 
already see the volume mounted and the template staged on primary storage. The 
volume is mounted as an NFSv3 datastore in vSphere.

Volume snapshots are enabled; I can ask to have snapshots disabled to see if it 
makes any difference.  I need to find out more about the NFS version and qtree 
mode from our storage admin.

One thing I noticed is that when CloudStack templates are staged onto primary 
storage, a snapshot is created which does not exist in the original OVA or on 
secondary storage.  I suppose this is the expected behavior?

Yiping
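For the NFS-version question, a quick client-side check is possible on any Linux host mounting the same export: the negotiated options are visible in /proc/mounts. The entry below is an invented stand-in, so treat this as a sketch rather than output from the environment in the thread:

```shell
#!/bin/sh
# Pull the negotiated NFS version out of a mount-table entry.
# Sample line stands in for a real /proc/mounts record.
entry="filer01:/vol/acs_primary /mnt/primary nfs rw,vers=3,hard,tcp 0 0"
echo "$entry" | tr ' ,' '\n\n' | grep '^vers='
```

On the ESXi side, `esxcli storage nfs list` should show the NFS 3 datastores (an assumption about the hosts' ESXi tooling; NFS 4.1 datastores are listed separately).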

On 6/6/19, 6:59 AM, "Sergey Levitskiy"  wrote:

This option is 'vol options name_of_volume nosnapdir on'; however, if I recall 
right, it is supposed to work even with the .snapshot directory visible.
Can you find out all vol options on your netapp volume? I would be most 
concerned about:
- NFS version - NFS v4 should be disabled
- security qtree mode to be set to UNIX
- allow root mount

I am also wondering whether ACS is able to create the ROOT-XX folder, so you 
might want to watch the contents of the datastore while ACS tries the operations.
 

On 6/5/19, 11:43 PM, "Paul Angus"  wrote:

Hi Yiping,

do you have snapshots enabled on the NetApp filer? (It used to be seen as a 
".snapshot" subdirectory in each directory.)

If so, try disabling snapshots - there used to be a bug where the .snapshot 
directory would confuse CloudStack.

paul.an...@shapeblue.com
www.shapeblue.com
Amadeus House, Floral Street, London WC2E 9DP, UK
@shapeblue


-Original Message-----
From: Yiping Zhang  
Sent: 05 June 2019 23:38
To: users@cloudstack.apache.org
Subject: Re: Can't start systemVM in a new advanced zone deployment

Hi, Sergey:

I found more logs in vpxa.log (the ESXi hosts are using the UTC time zone, so I 
was looking at the wrong time periods earlier).  I have uploaded more logs to 
pastebin.

From these log entries, it appears that when copying the template to a VM, it 
tried to open the destination VMDK file and got a file-not-found error.

In the case where CloudStack attempted to create a systemVM, the destination 
VMDK file path it is looking for is "//.vmdk"; see uploaded log at 
https://pastebin.com/aFysZkTy

In the case where I manually created a new VM from a (different) template in 
the vCenter UI, the destination VMDK file path it is looking for is 
"//.vmdk"; see uploaded log at 
https://pastebin.com/yHcsD8xB

So I am confused: how was the path for the destination VMDK determined, by 
CloudStack or by VMware, and how did I end up with this?

Yiping


On 6/5/19, 12:32 PM, "Sergey Levitskiy"  wrote:

Some operation logs get transferred to the vCenter log vpxd.log. It is not 
straightforward to trace, but VMware will be able to help should you open a 
case with them.
    

On 6/5/19, 11:39 AM, "Yiping Zhang"  
wrote:

Hi, Sergey:

During the time period when I had problems cloning the template, there are only 
a few unique entries in vmkernel.log, and they were repeated 
hundreds/thousands of times by all the CPU cores:

2019-06-02T16:47:00.633Z cpu9:8491061)FSS: 6751: Failed to open 
file 'hpilo-d0ccb15'; Requested flags 0x5, world: 8491061 [ams-ahs], (Existing 
flags 0x5, world: 8491029 [ams-main]): Busy
2019-06-02T16:47:49.320Z cpu1:66415)nhpsa: 
hpsa_vmkScsiCmdDone:6384: Sense data: error code: 0x70, key: 0x5, info:00 00 00 
00 , cmdInfo:00 00 00 00 , CmdSN: 0xd5c, worldId: 0x818e8e, Cmd: 0x85, ASC: 
0x20, ASCQ: 0x0
2019-06-02T16:47:49.320Z cpu1:66415)ScsiDeviceIO: 2948: 
Cmd(0x43954115be40) 0x85, CmdSN 0xd5c from world 8490638 to dev 
"naa.600508b1001c6d77d7dd6a0cc0953df1" failed H:0x0 D:0x2 P:0x0 Va

Re: Can't start systemVM in a new advanced zone deployment

2019-06-05 Thread Yiping Zhang
Hi, Sergey:

I found more logs in vpxa.log (the ESXi hosts are using the UTC time zone, so I 
was looking at the wrong time periods earlier).  I have uploaded more logs to 
pastebin.

From these log entries, it appears that when copying the template to a VM, it 
tried to open the destination VMDK file and got a file-not-found error.

In the case where CloudStack attempted to create a systemVM, the destination 
VMDK file path it is looking for is "//.vmdk"; 
see uploaded log at https://pastebin.com/aFysZkTy

In the case where I manually created a new VM from a (different) template in 
the vCenter UI, the destination VMDK file path it is looking for is 
"//.vmdk"; see uploaded log at 
https://pastebin.com/yHcsD8xB

So I am confused: how was the path for the destination VMDK determined, by 
CloudStack or by VMware, and how did I end up with this?

Yiping


On 6/5/19, 12:32 PM, "Sergey Levitskiy"  wrote:

Some operation logs get transferred to the vCenter log vpxd.log. It is not 
straightforward to trace, but VMware will be able to help should you open a 
case with them.

    
    On 6/5/19, 11:39 AM, "Yiping Zhang"  wrote:

Hi, Sergey:

During the time period when I had problems cloning the template, there are only 
a few unique entries in vmkernel.log, and they were repeated 
hundreds/thousands of times by all the CPU cores:

2019-06-02T16:47:00.633Z cpu9:8491061)FSS: 6751: Failed to open file 
'hpilo-d0ccb15'; Requested flags 0x5, world: 8491061 [ams-ahs], (Existing flags 
0x5, world: 8491029 [ams-main]): Busy
2019-06-02T16:47:49.320Z cpu1:66415)nhpsa: hpsa_vmkScsiCmdDone:6384: 
Sense data: error code: 0x70, key: 0x5, info:00 00 00 00 , cmdInfo:00 00 00 00 
, CmdSN: 0xd5c, worldId: 0x818e8e, Cmd: 0x85, ASC: 0x20, ASCQ: 0x0
2019-06-02T16:47:49.320Z cpu1:66415)ScsiDeviceIO: 2948: 
Cmd(0x43954115be40) 0x85, CmdSN 0xd5c from world 8490638 to dev 
"naa.600508b1001c6d77d7dd6a0cc0953df1" failed H:0x0 D:0x2 P:0x0 Valid sense 
data: 0x5 0x20 0x0.

The device " naa.600508b1001c6d77d7dd6a0cc0953df1" is the local disk on 
this host.

Yiping


On 6/5/19, 11:15 AM, "Sergey Levitskiy"  wrote:

This must be specific to that environment.  For full clone mode, ACS simply 
calls the cloneVMTask vSphere API, so basically until cloning of that template 
succeeds when attempted in the vSphere client, it would keep failing in ACS. 
Can you post vmkernel.log from your ESX host esx-0001-a-001?
    

On 6/5/19, 8:47 AM, "Yiping Zhang"  
wrote:

Well, I can always reproduce it in this particular vSphere setup, but in a 
different ACS+vSphere environment I don't see this problem.

Yiping

On 6/5/19, 1:00 AM, "Andrija Panic"  
wrote:

Yiping,

if you are sure you can reproduce the issue, it would be 
good to raise a
GitHub issue and provide as much detail as possible.

    Andrija

On Wed, 5 Jun 2019 at 05:29, Yiping Zhang 

wrote:

> Hi, Sergey:
>
> Thanks for the tip. After setting vmware.create.full.clone=false, I was
> able to create and start system VM instances. However, I feel that the
> underlying problem still exists, and I am just working around it instead
> of fixing it, because in my lab CloudStack instance with the same version
> of ACS and vSphere, I still have vmware.create.full.clone=true and all is
> working as expected.
>
> I did some reading on VMware docs regarding full clone vs. linked clone.
> It seems that the best practice is to use full clones for production,
> especially if there are high rates of change to the disks. So eventually
> I need to understand and fix the root cause of this issue. At least for
> now, I am over this hurdle and I can move on.
>
> Thanks again,
>
> Yiping
>
> On 6/4/19, 11:13 AM, "Sergey Levitskiy" 
 wrote:
>
> Everything looks good and consistent, including all references in the VMDK
> and its snapshot. I would try these 2 routes:
> 1. Figure out what the vSphere error actually means from the vmkernel log
> of the ESX host when ACS tries to clone the template.

Re: Can't start systemVM in a new advanced zone deployment

2019-06-05 Thread Yiping Zhang
Hi, Sergey:

During the time period when I had problems cloning the template, there are only 
a few unique entries in vmkernel.log, and they were repeated 
hundreds/thousands of times by all the CPU cores:

2019-06-02T16:47:00.633Z cpu9:8491061)FSS: 6751: Failed to open file 
'hpilo-d0ccb15'; Requested flags 0x5, world: 8491061 [ams-ahs], (Existing flags 
0x5, world: 8491029 [ams-main]): Busy
2019-06-02T16:47:49.320Z cpu1:66415)nhpsa: hpsa_vmkScsiCmdDone:6384: Sense 
data: error code: 0x70, key: 0x5, info:00 00 00 00 , cmdInfo:00 00 00 00 , 
CmdSN: 0xd5c, worldId: 0x818e8e, Cmd: 0x85, ASC: 0x20, ASCQ: 0x0
2019-06-02T16:47:49.320Z cpu1:66415)ScsiDeviceIO: 2948: Cmd(0x43954115be40) 
0x85, CmdSN 0xd5c from world 8490638 to dev 
"naa.600508b1001c6d77d7dd6a0cc0953df1" failed H:0x0 D:0x2 P:0x0 Valid sense 
data: 0x5 0x20 0x0.

The device " naa.600508b1001c6d77d7dd6a0cc0953df1" is the local disk on this 
host.

Yiping


On 6/5/19, 11:15 AM, "Sergey Levitskiy"  wrote:

This must be specific to that environment.  For full clone mode, ACS simply 
calls the cloneVMTask vSphere API, so basically until cloning of that template 
succeeds when attempted in the vSphere client, it would keep failing in ACS. 
Can you post vmkernel.log from your ESX host esx-0001-a-001?


    On 6/5/19, 8:47 AM, "Yiping Zhang"  wrote:

Well, I can always reproduce it in this particular vSphere setup, but in a 
different ACS+vSphere environment I don't see this problem.

Yiping

On 6/5/19, 1:00 AM, "Andrija Panic"  wrote:

Yiping,

if you are sure you can reproduce the issue, it would be good to 
raise a
GitHub issue and provide as much detail as possible.

Andrija
    
    On Wed, 5 Jun 2019 at 05:29, Yiping Zhang 

wrote:

> Hi, Sergey:
>
> Thanks for the tip. After setting vmware.create.full.clone=false, I was
> able to create and start system VM instances. However, I feel that the
> underlying problem still exists, and I am just working around it instead
> of fixing it, because in my lab CloudStack instance with the same version
> of ACS and vSphere, I still have vmware.create.full.clone=true and all is
> working as expected.
>
> I did some reading on VMware docs regarding full clone vs. linked clone.
> It seems that the best practice is to use full clones for production,
> especially if there are high rates of change to the disks. So eventually
> I need to understand and fix the root cause of this issue. At least for
> now, I am over this hurdle and I can move on.
>
> Thanks again,
>
> Yiping
>
> On 6/4/19, 11:13 AM, "Sergey Levitskiy"  
wrote:
>
> Everything looks good and consistent, including all references in the VMDK
> and its snapshot. I would try these 2 routes:
> 1. Figure out what the vSphere error actually means from the vmkernel log
> of the ESX host when ACS tries to clone the template. If the same error
> happens while doing it outside of ACS, then a support case with VMware can
> be an option.
> 2. Try using linked clones. This can be done with this global setting and
> restarting the management server:
> vmware.create.full.clone = false
>
>
> On 6/4/19, 9:57 AM, "Yiping Zhang" 
 wrote:
>
> Hi, Sergey:
>
> Thanks for the help. By now, I have dropped and recreated the DB,
> re-deployed this zone multiple times, blown away primary and secondary
> storage (including all contents on them), or just deleted the template
> itself from primary storage, multiple times. Every time I ended up with
> the same error at the same place.
>
> The full management server log, from the point I seeded the
> systemvmtemplate for vmware, to deploying a new advanced zone and enabling
> the zone to let CS create system VMs, and finally disabling the zone to
> stop the infinite loop of trying to recreate failed system VMs, is posted
> at pastebin:
>
> https://pastebin.com/c05wiQ3R

Re: Can't start systemVM in a new advanced zone deployment

2019-06-04 Thread Yiping Zhang
Hi, Sergey:

Thanks for the tip. After setting vmware.create.full.clone=false, I was able 
to create and start system VM instances.  However, I feel that the underlying 
problem still exists, and I am just working around it instead of fixing it, 
because in my lab CloudStack instance with the same version of ACS and 
vSphere, I still have vmware.create.full.clone=true and all is working as 
expected.

I did some reading on VMware docs regarding full clone vs. linked clone.  It 
seems that the best practice is to use full clones for production, especially 
if there are high rates of change to the disks.  So eventually I need to 
understand and fix the root cause of this issue.  At least for now, I am over 
this hurdle and I can move on.

Thanks again,

Yiping

On 6/4/19, 11:13 AM, "Sergey Levitskiy"  wrote:

Everything looks good and consistent, including all references in the VMDK and 
its snapshot. I would try these 2 routes:
1. Figure out what the vSphere error actually means from the vmkernel log of 
the ESX host when ACS tries to clone the template. If the same error happens 
while doing it outside of ACS, then a support case with VMware can be an option.
2. Try using linked clones. This can be done with this global setting and 
restarting the management server:
vmware.create.full.clone = false


On 6/4/19, 9:57 AM, "Yiping Zhang"  wrote:

Hi, Sergey:

Thanks for the help. By now, I have dropped and recreated the DB, re-deployed 
this zone multiple times, blown away primary and secondary storage (including 
all contents on them), or just deleted the template itself from primary 
storage, multiple times.  Every time I ended up with the same error at the 
same place.

The full management server log, from the point I seeded the systemvmtemplate 
for vmware, to deploying a new advanced zone and enabling the zone to let CS 
create system VMs, and finally disabling the zone to stop the infinite loop of 
trying to recreate failed system VMs, is posted at pastebin:


https://pastebin.com/c05wiQ3R

Here are the content of relevant files for the template on primary 
storage:

1) /vmfs/volumes:

ls -l /vmfs/volumes/
total 2052
drwxr-xr-x1 root root 8 Jan  1  1970 
414f6a73-87cd6dac-9585-133ddd409762
lrwxr-xr-x1 root root17 Jun  4 16:37 
42054b8459633172be231d72a52d59d4 -> afc5e946-03bfe3c2  <== this is the 
NFS datastore for primary storage
drwxr-xr-x1 root root 8 Jan  1  1970 
5cd4b46b-fa4fcff0-d2a1-00215a9b31c0
drwxr-xr-t1 root root  1400 Jun  3 22:50 
5cd4b471-c2318b91-8fb2-00215a9b31c0
drwxr-xr-x1 root root 8 Jan  1  1970 
5cd4b471-da49a95b-bdb6-00215a9b31c0
drwxr-xr-x4 root root  4096 Jun  3 23:38 
afc5e946-03bfe3c2
drwxr-xr-x1 root root 8 Jan  1  1970 
b70c377c-54a9d28a-6a7b-3f462a475f73

2) content in template dir on primary storage:

ls -l 
/vmfs/volumes/42054b8459633172be231d72a52d59d4/533b6fcf3fa6301aadcc2b168f3f999a/
total 1154596
-rw---1 root root  8192 Jun  3 23:38 
533b6fcf3fa6301aadcc2b168f3f999a-01-delta.vmdk
-rw---1 root root   366 Jun  3 23:38 
533b6fcf3fa6301aadcc2b168f3f999a-01.vmdk
-rw-r--r--1 root root   268 Jun  3 23:38 
533b6fcf3fa6301aadcc2b168f3f999a-7d5d73de.hlog
-rw---1 root root  9711 Jun  3 23:38 
533b6fcf3fa6301aadcc2b168f3f999a-Snapshot1.vmsn
-rw---1 root root 2097152000 Jun  3 23:38 
533b6fcf3fa6301aadcc2b168f3f999a-flat.vmdk
-rw---1 root root   518 Jun  3 23:38 
533b6fcf3fa6301aadcc2b168f3f999a.vmdk
-rw-r--r--1 root root   471 Jun  3 23:38 
533b6fcf3fa6301aadcc2b168f3f999a.vmsd
-rwxr-xr-x1 root root  1402 Jun  3 23:38 
533b6fcf3fa6301aadcc2b168f3f999a.vmtx

3) *.vmdk file content:

cat 
/vmfs/volumes/42054b8459633172be231d72a52d59d4/533b6fcf3fa6301aadcc2b168f3f999a/533b6fcf3fa6301aadcc2b168f3f999a.vmdk
# Disk DescriptorFile
version=1
encoding="UTF-8"
CID=ecb01275
parentCID=
isNativeSnapshot="no"
createType="vmfs"

# Extent description
RW 4096000 VMFS "533b6fcf3fa6301aadcc2b168f3f999a-flat.vmdk"

# The Disk Data Base 
#DDB
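For what it's worth, the descriptor above is the base disk (createType="vmfs" with a full flat extent), while the -01.vmdk from the error is a snapshot delta, which identifies itself by a parent pointer. A sketch with made-up descriptor text illustrating how to tell the two apart:

```shell
#!/bin/sh
# A snapshot / linked-clone delta descriptor references its parent via
# parentFileNameHint (and uses createType "vmfsSparse" on VMFS);
# a base disk descriptor has neither.
desc='# Disk DescriptorFile
createType="vmfsSparse"
parentFileNameHint="533b6fcf3fa6301aadcc2b168f3f999a.vmdk"'
if echo "$desc" | grep -q 'parentFileNameHint'; then
  echo "delta disk (has a parent)"
else
  echo "base disk"
fi
```

So a quick `grep parentFileNameHint *-01.vmdk` on the datastore would confirm whether the file the clone fails on is really a delta chained to the base template disk.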

d

Re: Can't start systemVM in a new advanced zone deployment

2019-06-04 Thread Yiping Zhang
ide0:0.fileName = "CD/DVD drive 0"
ide0:0.present = "TRUE"
scsi0:0.deviceType = "scsi-hardDisk"
scsi0:0.fileName = "533b6fcf3fa6301aadcc2b168f3f999a-01.vmdk"
scsi0:0.present = "TRUE"
displayName = "533b6fcf3fa6301aadcc2b168f3f999a"
annotation = "systemvmtemplate-4.11.2.0-vmware"
guestOS = "otherlinux-64"
toolScripts.afterPowerOn = "TRUE"
toolScripts.afterResume = "TRUE"
toolScripts.beforeSuspend = "TRUE"
toolScripts.beforePowerOff = "TRUE"
uuid.bios = "42 02 f1 40 33 e8 de e5-1a c5 93 2a c9 12 47 61"
vc.uuid = "50 02 5b d9 e9 c9 77 86-28 3e 84 00 22 2b eb d3"
firmware = "bios"
migrate.hostLog = "533b6fcf3fa6301aadcc2b168f3f999a-7d5d73de.hlog"


6) *.vmsd file content:

cat 
/vmfs/volumes/42054b8459633172be231d72a52d59d4/533b6fcf3fa6301aadcc2b168f3f999a/533b6fcf3fa6301aadcc2b168f3f999a.vmsd
.encoding = "UTF-8"
snapshot.lastUID = "1"
snapshot.current = "1"
snapshot0.uid = "1"
snapshot0.filename = "533b6fcf3fa6301aadcc2b168f3f999a-Snapshot1.vmsn"
snapshot0.displayName = "cloud.template.base"
snapshot0.description = "Base snapshot"
snapshot0.createTimeHigh = "363123"
snapshot0.createTimeLow = "-679076964"
snapshot0.numDisks = "1"
snapshot0.disk0.fileName = "533b6fcf3fa6301aadcc2b168f3f999a.vmdk"
snapshot0.disk0.node = "scsi0:0"
snapshot.numSnapshots = "1"

7) *-Snapshot1.vmsn content:

cat 
/vmfs/volumes/42054b8459633172be231d72a52d59d4/533b6fcf3fa6301aadcc2b168f3f999a/533b6fcf3fa6301aadcc2b168f3f999a-Snapshot1.vmsn
 
ҾSnapshot\?%?cfgFilet%t%.encoding = "UTF-8"
config.version = "8"
virtualHW.version = "8"
nvram = "533b6fcf3fa6301aadcc2b168f3f999a.nvram"
pciBridge0.present = "TRUE"
svga.present = "TRUE"
pciBridge4.present = "TRUE"
pciBridge4.virtualDev = "pcieRootPort"
pciBridge4.functions = "8"
pciBridge5.present = "TRUE"
pciBridge5.virtualDev = "pcieRootPort"
pciBridge5.functions = "8"
pciBridge6.present = "TRUE"
pciBridge6.virtualDev = "pcieRootPort"
pciBridge6.functions = "8"
pciBridge7.present = "TRUE"
pciBridge7.virtualDev = "pcieRootPort"
pciBridge7.functions = "8"
vmci0.present = "TRUE"
hpet0.present = "TRUE"
floppy0.present = "FALSE"
memSize = "256"
scsi0.virtualDev = "lsilogic"
scsi0.present = "TRUE"
ide0:0.startConnected = "FALSE"
ide0:0.deviceType = "atapi-cdrom"
ide0:0.fileName = "CD/DVD drive 0"
ide0:0.present = "TRUE"
scsi0:0.deviceType = "scsi-hardDisk"
scsi0:0.fileName = "533b6fcf3fa6301aadcc2b168f3f999a.vmdk"
scsi0:0.present = "TRUE"
displayName = "533b6fcf3fa6301aadcc2b168f3f999a"
annotation = "systemvmtemplate-4.11.2.0-vmware"
guestOS = "otherlinux-64"
toolScripts.afterPowerOn = "TRUE"
toolScripts.afterResume = "TRUE"
toolScripts.beforeSuspend = "TRUE"
toolScripts.beforePowerOff = "TRUE"
uuid.bios = "42 02 f1 40 33 e8 de e5-1a c5 93 2a c9 12 47 61"
vc.uuid = "50 02 5b d9 e9 c9 77 86-28 3e 84 00 22 2b eb d3"
firmware = "bios"
migrate.hostLog = "533b6fcf3fa6301aadcc2b168f3f999a-7d5d73de.hlog"




That's all the data on the template VMDK.

Much appreciate your time!

Yiping



On 6/4/19, 9:29 AM, "Sergey Levitskiy"  wrote:

Have you tried deleting the template from PS and letting ACS recopy it? If the 
issue is reproducible we can try to look at what is wrong with the VMDK. Please 
post the content of 533b6fcf3fa6301aadcc2b168f3f999a.vmdk, 
533b6fcf3fa6301aadcc2b168f3f999a-01.vmdk and 
533b6fcf3fa6301aadcc2b168f3f999a.vmx (their equivalent after ACS finishes 
copying the template). Also post, from one of your ESX hosts, the output of this:
ls -al /vmfs/volumes
ls -al /vmfs/volumes/*/533b6fcf3fa6301aadcc2b168f3f999a (their equivalent 
after ACS finishes copying the template)

Can you also post the management server log starting from the point you 
unregister and delete the template from vCenter.

On 6/4/19, 8:37 AM, "Yiping Zhang"  wrote:

I have manually imported the OVA to vCenter and successfully cloned a 
VM instance with it, on the same NFS datastore.


On 6/4/19, 8:25 AM, "Sergey Levitskiy"  wrote:

I would suspect the template is corrupted on the secondary storage. You can 
try disabling/enabling the link clone feature and see if it works the other 
way:
vmware.create.full.clone = false

Also, the systemVM template might have been generated on a newer version of 
vSphere and not be compatible with ESXi 6.5.

Re: Can't start systemVM in a new advanced zone deployment

2019-06-04 Thread Yiping Zhang
I have manually imported the OVA to vCenter and successfully cloned a VM 
instance with it, on the same NFS datastore.


On 6/4/19, 8:25 AM, "Sergey Levitskiy"  wrote:

I would suspect the template is corrupted on the secondary storage. You can 
try disabling/enabling the link clone feature and see if it works the other 
way:
vmware.create.full.clone = false

Also, the systemVM template might have been generated on a newer version of 
vSphere and not be compatible with ESXi 6.5. What you can do to validate this 
is to manually deploy the OVA that is in secondary storage and try to spin up 
a VM from it directly in vCenter.



On 6/3/19, 5:41 PM, "Yiping Zhang"  wrote:

Hi, list:

I am struggling with deploying a new advanced zone using ACS 4.11.2.0 + vSphere 
6.5 + NetApp volumes for primary and secondary storage. The initial setup of 
the CS management server, seeding of the systemVM template, and the advanced 
zone deployment all went smoothly.

Once I enabled the zone in the web UI, the systemVM template got copied/staged 
onto the primary storage device, but subsequent VM creations from this 
template would fail with errors:


2019-06-03 18:38:15,764 INFO  [c.c.h.v.m.HostMO] 
(DirectAgent-7:ctx-d01169cb esx-0001-a-001.example.org, job-3/job-29, cmd: 
CopyCommand) VM 533b6fcf3fa6301aadcc2b168f3f999a not found in host cache

2019-06-03 18:38:17,017 INFO  [c.c.h.v.r.VmwareResource] 
(DirectAgent-4:ctx-08b54fbd esx-0001-a-001.example.org, job-3/job-29, cmd: 
CopyCommand) VmwareStorageProcessor and VmwareStorageSubsystemCommandHandler 
successfully reconfigured

2019-06-03 18:38:17,128 INFO  [c.c.s.r.VmwareStorageProcessor] 
(DirectAgent-4:ctx-08b54fbd esx-0001-a-001.example.org, job-3/job-29, cmd: 
CopyCommand) creating full clone from template

2019-06-03 18:38:17,657 INFO  [c.c.h.v.u.VmwareHelper] 
(DirectAgent-4:ctx-08b54fbd esx-0001-a-001.example.org, job-3/job-29, cmd: 
CopyCommand) [ignored]failed toi get message for exception: Error caused by 
file 
/vmfs/volumes/afc5e946-03bfe3c2/533b6fcf3fa6301aadcc2b168f3f999a/533b6fcf3fa6301aadcc2b168f3f999a-01.vmdk

2019-06-03 18:38:17,658 ERROR [c.c.s.r.VmwareStorageProcessor] 
(DirectAgent-4:ctx-08b54fbd esx-0001-a-001.example.org, job-3/job-29, cmd: 
CopyCommand) clone volume from base image failed due to Exception: 
java.lang.RuntimeException

Message: Error caused by file 
/vmfs/volumes/afc5e946-03bfe3c2/533b6fcf3fa6301aadcc2b168f3f999a/533b6fcf3fa6301aadcc2b168f3f999a-01.vmdk



If I try to create “new VM from template” 
(533b6fcf3fa6301aadcc2b168f3f999a) on vCenter UI manually,  I will receive 
exactly the same error message. The name of the VMDK file in the error message 
is a snapshot of the base disk image, but it is not part of the original 
template OVA on the secondary storage.  So, in the process of copying the 
template from secondary to primary storage, a snapshot got created and the disk 
became corrupted/unusable.

Much later in the log file,  there is another error message “failed to 
fetch any free public IP address” (for ssvm, I think).  I don’t know if these 
two errors are related or if one is the root cause for the other error.

The full management server log is uploaded as https://pastebin.com/c05wiQ3R

Any help or insight on what went wrong here are much appreciated.

Thanks

Yiping






Can't start systemVM in a new advanced zone deployment

2019-06-03 Thread Yiping Zhang
Hi, list:

I am struggling with deploying a new advanced zone using ACS 4.11.2.0 + vSphere 
6.5 + NetApp volumes for primary and secondary storage devices. The initial 
setup of CS management server, seeding of systemVM template, and advanced zone 
deployment all went smoothly.

Once I enabled the zone in the web UI, the systemVM template got copied/staged 
onto the primary storage device, but subsequent VM creations from this 
template would fail with errors:


2019-06-03 18:38:15,764 INFO  [c.c.h.v.m.HostMO] (DirectAgent-7:ctx-d01169cb 
esx-0001-a-001.example.org, job-3/job-29, cmd: CopyCommand) VM 
533b6fcf3fa6301aadcc2b168f3f999a not found in host cache

2019-06-03 18:38:17,017 INFO  [c.c.h.v.r.VmwareResource] 
(DirectAgent-4:ctx-08b54fbd esx-0001-a-001.example.org, job-3/job-29, cmd: 
CopyCommand) VmwareStorageProcessor and VmwareStorageSubsystemCommandHandler 
successfully reconfigured

2019-06-03 18:38:17,128 INFO  [c.c.s.r.VmwareStorageProcessor] 
(DirectAgent-4:ctx-08b54fbd esx-0001-a-001.example.org, job-3/job-29, cmd: 
CopyCommand) creating full clone from template

2019-06-03 18:38:17,657 INFO  [c.c.h.v.u.VmwareHelper] 
(DirectAgent-4:ctx-08b54fbd esx-0001-a-001.example.org, job-3/job-29, cmd: 
CopyCommand) [ignored]failed toi get message for exception: Error caused by 
file 
/vmfs/volumes/afc5e946-03bfe3c2/533b6fcf3fa6301aadcc2b168f3f999a/533b6fcf3fa6301aadcc2b168f3f999a-01.vmdk

2019-06-03 18:38:17,658 ERROR [c.c.s.r.VmwareStorageProcessor] 
(DirectAgent-4:ctx-08b54fbd esx-0001-a-001.example.org, job-3/job-29, cmd: 
CopyCommand) clone volume from base image failed due to Exception: 
java.lang.RuntimeException

Message: Error caused by file 
/vmfs/volumes/afc5e946-03bfe3c2/533b6fcf3fa6301aadcc2b168f3f999a/533b6fcf3fa6301aadcc2b168f3f999a-01.vmdk



If I try to create “new VM from template” (533b6fcf3fa6301aadcc2b168f3f999a) on 
vCenter UI manually,  I will receive exactly the same error message. The name 
of the VMDK file in the error message is a snapshot of the base disk image, but 
it is not part of the original template OVA on the secondary storage.  So, in 
the process of copying the template from secondary to primary storage, a 
snapshot got created and the disk became corrupted/unusable.

Much later in the log file,  there is another error message “failed to fetch 
any free public IP address” (for ssvm, I think).  I don’t know if these two 
errors are related or if one is the root cause for the other error.

The full management server log is uploaded as https://pastebin.com/c05wiQ3R

Any help or insight on what went wrong here are much appreciated.

Thanks

Yiping


Error running configureVirtualRouterElement api

2019-05-27 Thread Yiping Zhang
Hi, List:



I am trying to run the configureVirtualRouterElement API directly, but it 
throws an error.  Is this a known bug in 4.11.2.0?

I am using the cs python client to run the APIs.  Following are the command 
outputs:

# cs listNetworkServiceProviders name=VirtualRouter 
physicalnetworkid=a493b1a4-333a-404a-af82-5c4bbdf6a467

{

  "count": 1,

  "networkserviceprovider": [

{

  "canenableindividualservice": true,

  "id": "b69088bb-ec44-41fa-ab19-3d1a00c11b59",

  "name": "VirtualRouter",

  "physicalnetworkid": "a493b1a4-333a-404a-af82-5c4bbdf6a467",

  "servicelist": [

"Vpn",

"Dhcp",

"Dns",

"Gateway",

"Firewall",

"Lb",

"SourceNat",

"StaticNat",

"PortForwarding",

"UserData"

  ],

  "state": "Disabled"

}

  ]

}

#

# cs configureVirtualRouterElement enabled=true 
id=b69088bb-ec44-41fa-ab19-3d1a00c11b59

CloudStack error: HTTP 431 response from CloudStack

{u'errorcode': 431, u'uuidList': [], u'cserrorcode': , u'errortext': 
u'Unable to execute API command configurevirtualrouterelement due to invalid 
value. Invalid parameter id value=b69088bb-ec44-41fa-ab19-3d1a00c11b59 due to 
incorrect long value format, or entity does not exist or due to incorrect 
parameter annotation for the field in api cmd class.'}

{

  "configurevirtualrouterelementresponse": {

"cserrorcode": ,

"errorcode": 431,

"errortext": "Unable to execute API command configurevirtualrouterelement 
due to invalid value. Invalid parameter id 
value=b69088bb-ec44-41fa-ab19-3d1a00c11b59 due to incorrect long value format, 
or entity does not exist or due to incorrect parameter annotation for the field 
in api cmd class.",

"uuidList": []

  }

}
#
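For what it's worth, one likely cause (an assumption on my part, not confirmed in this thread): configureVirtualRouterElement expects the id of the virtual router *element* returned by listVirtualRouterElements, not the id of the network service provider shown above. A minimal sketch of picking the matching element id out of a listVirtualRouterElements-style response (the `virtualrouterelement` and `nspid` field names are assumed from the 4.11 API docs; check your own output):

```python
def pick_element_id(response: dict, nsp_id: str):
    """Return the virtual router element id matching a network service provider."""
    for elem in response.get("virtualrouterelement", []):
        if elem.get("nspid") == nsp_id:
            return elem["id"]
    return None

# Illustrative response shape only; real ids come from the API.
sample = {
    "count": 1,
    "virtualrouterelement": [
        {"id": "0862d5a1-e0bf-4b72-9a87-ad63cdca8af9",
         "nspid": "b69088bb-ec44-41fa-ab19-3d1a00c11b59"},
    ],
}
print(pick_element_id(sample, "b69088bb-ec44-41fa-ab19-3d1a00c11b59"))
```

The enable call would then be `cs configureVirtualRouterElement enabled=true id=<element id>` rather than passing the provider id.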

Does anyone have a clue what's going on here?

Thanks

Yiping


Re: questions on cloudstack and VMware hypervisors

2019-05-22 Thread Yiping Zhang
Hi, Dag:

Thanks for the quick reply.

Yes, that's it!  The custom attribute "cloud.zone" was set to true for the 
datacenter in vCenter.  Once I deleted that attribute, or changed its value to 
"false", I could re-associate the datacenter with the new CloudStack zone.  If I 
delete the VMware datacenter from the zone in the web UI, its value is set to 
false automatically.

Regarding the Ansible modules, I am wondering why you chose to generate a 
shell script to deploy a CloudStack environment instead of doing it directly with 
Ansible.  Is it because the Ansible modules for CloudStack are still too incomplete 
to fulfill all the tasks, or for some other historical reason?  For example, I may 
have missed it, but I could not find a module to add a VMware datacenter to a 
zone.

Yiping

On 5/22/19, 1:49 AM, "Dag Sonstebo"  wrote:

Hi Yiping,

    See 
https://docs.vmware.com/en/VMware-vSphere/6.5/com.vmware.vsphere.vcenterhost.doc/GUID-EC0F7308-96AE-4089-9DD4-B42AF50AABDC.html

    You should find a custom attribute on your virtual DC called something like 
"cloud.zone" which I believe you have to wipe before you can re-add the DC to a new 
CloudStack instance. If the above doesn't work, you may have to drop to the command 
line and do it with something like PowerCLI.

    With regards to deploying a full zone, this is part of what Trillian 
(https://github.com/shapeblue/Trillian) does in its cloudstack-config role - but 
please be aware this role simply populates a "deployzone.sh" Jinja template with 
the right options before running it as a normal bash script.

Regards,
Dag Sonstebo
Cloud Architect
    ShapeBlue
 

On 22/05/2019, 06:31, "Yiping Zhang"  wrote:

Hi, all:

I am creating a new CloudStack 4.11.2.0 instance with VMware 
hypervisors.   I have some general questions regarding such a set up:

My first attempt had some errors, so I decided to blow away the 
CloudStack database and start from scratch again.  Now when I try to add VMware 
datacenter to the zone, I receive an error message saying that: “Failed to add 
VMware DC to zone due to : This DC is being managed by other CloudStack 
deployment. Cannot add this DC to zone”

This sounds like my last try has left something on the datacenter in 
vCenter so that I can’t associate it to another zone any more. If this is true, 
do I have to blow away the datacenter in vCenter as well to start over again? 
Is there any way to clean up on the vCenter side without rebuilding the 
datacenter?  In hindsight, I should have deleted all CloudStack objects from 
web GUI instead of taking the nuclear option of blowing away the database!

A related question,  has anyone used CloudStack ansible module to 
deploy a complete zone (including all necessary objects zone/pod/physical 
networks/cluster/host/primary_storage/secondary_storage etc)?  If so, would you 
mind sharing your ansible playbook(s)?

Thanks,

Yiping



dag.sonst...@shapeblue.com 

www.shapeblue.com
Amadeus House, Floral Street, London  WC2E 9DPUK
@shapeblue
  
 





questions on cloudstack and VMware hypervisors

2019-05-21 Thread Yiping Zhang
Hi, all:

I am creating a new CloudStack 4.11.2.0 instance with VMware hypervisors.   I 
have some general questions regarding such a setup:

My first attempt had some errors, so I decided to blow away the CloudStack 
database and start from scratch again.  Now when I try to add VMware datacenter 
to the zone, I receive an error message saying that: “Failed to add VMware DC 
to zone due to : This DC is being managed by other CloudStack deployment. 
Cannot add this DC to zone”

This sounds like my last try has left something on the datacenter in vCenter so 
that I can’t associate it to another zone any more. If this is true, do I have 
to blow away the datacenter in vCenter as well to start over again? Is there 
any way to clean up on the vCenter side without rebuilding the datacenter?  In 
hindsight, I should have deleted all CloudStack objects from web GUI instead of 
taking the nuclear option of blowing away the database!

A related question,  has anyone used CloudStack ansible module to deploy a 
complete zone (including all necessary objects zone/pod/physical 
networks/cluster/host/primary_storage/secondary_storage etc)?  If so, would you 
mind sharing your ansible playbook(s)?

Thanks,

Yiping


Re: ACS management server running out of loop back devices

2019-04-24 Thread Yiping Zhang
Hi, Andrija:

Thanks for looking into it.

In my case, there is no log entry for "Failed to unmount old iso" messages. 
The loopback devices are not mounted at all, at least when I was checking. BTW, 
what's the mount point for the loopback device for systemvm.iso? I can't tell 
whether the loopback device was ever mounted, failed to mount, or something 
else.  The systemvm.iso has 644 permissions on the file system.  I think the 
problem is that after some sort of failure, the loopback device should have 
been deleted rather than left behind.  Of course, I still need to figure out 
what caused the failure in the first place.

Yiping

On 4/24/19, 10:47 AM, "Andrija Panic"  wrote:

Hi Yiping,

    Based on 
https://github.com/apache/cloudstack/blob/4.11.2.0/scripts/vm/systemvm/injectkeys.sh
 , I would say to see what keeps these loop devices mounted - i.e. if you can 
unmount them manually.

Based on code from above, unmount is run, but it might fail in your 
environment due to different things. Also check systemvm.iso permissions.

Grep logs for " Failed to unmount old iso" lines...

Best,
Andrija

andrija.pa...@shapeblue.com 

www.shapeblue.com
Amadeus House, Floral Street, London  WC2E 9DPUK
@shapeblue
  
 


-Original Message-
From: Yiping Zhang  
Sent: 24 April 2019 18:42
To: users@cloudstack.apache.org
Subject: ACS management server running out of loop back devices

Hi,

My lab ACS server (version 4.11.2.0) recently starts to die off a few hours 
after a restart, with following error message in the log:


2019-04-24 10:38:35,237 INFO  [c.c.s.ConfigurationServerImpl] (main:null) 
Processing updateKeyPairs

2019-04-24 10:38:35,237 INFO  [c.c.s.ConfigurationServerImpl] (main:null) 
Keypairs already in database, updating local copy

2019-04-24 10:38:35,241 INFO  [c.c.s.ConfigurationServerImpl] (main:null) 
Going to update systemvm iso with generated keypairs if needed

2019-04-24 10:38:35,241 INFO  [c.c.s.ConfigurationServerImpl] (main:null) 
Trying to inject public and private keys into systemvm iso

2019-04-24 10:38:35,288 INFO  [c.c.s.ConfigurationServerImpl] (main:null) 
Injected public and private keys into systemvm iso with result : mount: could 
not find any free loop device

2019-04-24 10:38:35,288 WARN  [c.c.s.ConfigurationServerImpl] (main:null) 
Failed to inject generated public key into systemvm iso mount: could not find 
any free loop device

2019-04-24 10:38:35,290 WARN  [o.a.c.s.m.c.ResourceApplicationContext] 
(main:null) Exception encountered during context initialization - cancelling 
refresh attempt: org.springframework.context.ApplicationContextException: 
Failed to start bean 'cloudStackLifeCycle'; nested exception is 
com.cloud.utils.exception.CloudRuntimeException: Failed to inject generated 
public key into systemvm iso mount: could not find any free loop device

2019-04-24 10:38:35,291 WARN  [o.e.j.w.WebAppContext] (main:null) Failed 
startup of context 
o.e.j.w.WebAppContext@78a2da20{/client,file:///usr/share/cloudstack-management/webapp/,UNAVAILABLE}{/usr/share/cloudstack-management/webapp}

org.springframework.context.ApplicationContextException: Failed to start 
bean 'cloudStackLifeCycle'; nested exception is 
com.cloud.utils.exception.CloudRuntimeException: Failed to inject generated 
public key into systemvm iso mount: could not find any free loop device


And sure enough, all /dev/loopX are in use:


# losetup -a

/dev/loop0: [ca06]:1315130 (/usr/share/cloudstack-common/vms/systemvm.iso)

/dev/loop1: [ca06]:1315130 (/usr/share/cloudstack-common/vms/systemvm.iso)

/dev/loop2: [ca06]:1315130 (/usr/share/cloudstack-common/vms/systemvm.iso)

/dev/loop3: [ca06]:1315130 (/usr/share/cloudstack-common/vms/systemvm.iso)

/dev/loop4: [ca06]:1315130 (/usr/share/cloudstack-common/vms/systemvm.iso)

/dev/loop5: [ca06]:1315130 (/usr/share/cloudstack-common/vms/systemvm.iso)

/dev/loop6: [ca06]:1315130 (/usr/share/cloudstack-common/vms/systemvm.iso)

/dev/loop7: [ca06]:1315130 (/usr/share/cloudstack-common/vms/systemvm.iso)
#

Recent changes in the lab include adding a VMware cluster, registering new 
systemvm templates for VMware, and creating our own template for VMware.

ACS management server running out of loop back devices

2019-04-24 Thread Yiping Zhang
Hi,

My lab ACS server (version 4.11.2.0) recently starts to die off a few hours 
after a restart, with following error message in the log:


2019-04-24 10:38:35,237 INFO  [c.c.s.ConfigurationServerImpl] (main:null) 
Processing updateKeyPairs

2019-04-24 10:38:35,237 INFO  [c.c.s.ConfigurationServerImpl] (main:null) 
Keypairs already in database, updating local copy

2019-04-24 10:38:35,241 INFO  [c.c.s.ConfigurationServerImpl] (main:null) Going 
to update systemvm iso with generated keypairs if needed

2019-04-24 10:38:35,241 INFO  [c.c.s.ConfigurationServerImpl] (main:null) 
Trying to inject public and private keys into systemvm iso

2019-04-24 10:38:35,288 INFO  [c.c.s.ConfigurationServerImpl] (main:null) 
Injected public and private keys into systemvm iso with result : mount: could 
not find any free loop device

2019-04-24 10:38:35,288 WARN  [c.c.s.ConfigurationServerImpl] (main:null) 
Failed to inject generated public key into systemvm iso mount: could not find 
any free loop device

2019-04-24 10:38:35,290 WARN  [o.a.c.s.m.c.ResourceApplicationContext] 
(main:null) Exception encountered during context initialization - cancelling 
refresh attempt: org.springframework.context.ApplicationContextException: 
Failed to start bean 'cloudStackLifeCycle'; nested exception is 
com.cloud.utils.exception.CloudRuntimeException: Failed to inject generated 
public key into systemvm iso mount: could not find any free loop device

2019-04-24 10:38:35,291 WARN  [o.e.j.w.WebAppContext] (main:null) Failed 
startup of context 
o.e.j.w.WebAppContext@78a2da20{/client,file:///usr/share/cloudstack-management/webapp/,UNAVAILABLE}{/usr/share/cloudstack-management/webapp}

org.springframework.context.ApplicationContextException: Failed to start bean 
'cloudStackLifeCycle'; nested exception is 
com.cloud.utils.exception.CloudRuntimeException: Failed to inject generated 
public key into systemvm iso mount: could not find any free loop device


And sure enough, all /dev/loopX are in use:


# losetup -a

/dev/loop0: [ca06]:1315130 (/usr/share/cloudstack-common/vms/systemvm.iso)

/dev/loop1: [ca06]:1315130 (/usr/share/cloudstack-common/vms/systemvm.iso)

/dev/loop2: [ca06]:1315130 (/usr/share/cloudstack-common/vms/systemvm.iso)

/dev/loop3: [ca06]:1315130 (/usr/share/cloudstack-common/vms/systemvm.iso)

/dev/loop4: [ca06]:1315130 (/usr/share/cloudstack-common/vms/systemvm.iso)

/dev/loop5: [ca06]:1315130 (/usr/share/cloudstack-common/vms/systemvm.iso)

/dev/loop6: [ca06]:1315130 (/usr/share/cloudstack-common/vms/systemvm.iso)

/dev/loop7: [ca06]:1315130 (/usr/share/cloudstack-common/vms/systemvm.iso)
#

Recent changes in the lab include adding a VMware cluster, registering new 
systemvm templates for VMware, and creating our own template for VMware.

It looks like the updateKeyPairs process runs once an hour and fails to 
clean up its loopback device.  So, in about 8 hours, the management server 
runs out of loopback devices and dies.
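In the meantime, a small helper to spot the leaked devices can be sketched like this (illustrative only; it parses `losetup -a` output of the shape shown above and prints the detach commands rather than running them, since detaching is only safe once you know nothing still mounts the device):

```python
def stale_systemvm_loops(losetup_output: str, iso: str = "systemvm.iso"):
    """Parse `losetup -a` output and return the loop devices backed by the iso."""
    devices = []
    for line in losetup_output.splitlines():
        if iso in line:
            # Each line looks like "/dev/loop0: [ca06]:1315130 (/path/to/systemvm.iso)"
            devices.append(line.split(":", 1)[0])
    return devices

sample = """/dev/loop0: [ca06]:1315130 (/usr/share/cloudstack-common/vms/systemvm.iso)
/dev/loop1: [ca06]:1315130 (/usr/share/cloudstack-common/vms/systemvm.iso)"""

for dev in stale_systemvm_loops(sample):
    # Print the cleanup commands instead of executing them.
    print(f"losetup -d {dev}")
```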

Any suggestions how I troubleshoot this further?

Thanks

Yiping


Re: CloudStack affinity group and vSphere DRS

2019-03-28 Thread Yiping Zhang
Thanks Andrija,

It sure would be nice if CloudStack's anti-host affinity groups could be mapped 
to vSphere's corresponding native affinity rules (or to any hypervisor's equivalent 
native feature, where available) for tighter integration and a better experience.

Regards,
Yiping

On 3/28/19, 1:34 PM, "Andrija Panic"  wrote:

Hi Yiping,

    unless I'm mistaken, CloudStack should have nothing to do with DRS - it
    only chooses the host on which to deploy a VM initially, but later
    VMware DRS can kick in and move it on its own.

Cheers,
Andrija

On Thu, 28 Mar 2019 at 21:10, Yiping Zhang 
wrote:

> Hi, All:
>
> I started playing with vSphere 6.5 hypervisors under ACS 4.11.2.0 in our
> lab and have some questions on this set up.
>
> According to release notes, VMware DRS support in CloudStack was added in
> ACS 4.4.4 (here:
> 
http://docs.cloudstack.apache.org/projects/cloudstack-release-notes/en/4.4.4/about.html#vmware-support-for-drs).
> However,  in my CPU load tests,  VM instances are moved around by DRS in
> violation of CloudStack anti-host affinity group assignment for these VM
> instances.
>
> Did I configure it wrong or missed some step?  If anyone else are using
> these features, please share your experiences.
>
> Thanks
>
> Yiping
>


-- 

Andrija Panić




CloudStack affinity group and vSphere DRS

2019-03-28 Thread Yiping Zhang
Hi, All:

I started playing with vSphere 6.5 hypervisors under ACS 4.11.2.0 in our lab 
and have some questions on this setup.

According to release notes, VMware DRS support in CloudStack was added in ACS 
4.4.4 (here: 
http://docs.cloudstack.apache.org/projects/cloudstack-release-notes/en/4.4.4/about.html#vmware-support-for-drs).
  However, in my CPU load tests, VM instances are moved around by DRS in 
violation of the CloudStack anti-host affinity group assignments for those VM 
instances.

Did I configure it wrong or miss a step?  If anyone else is using these 
features, please share your experiences.

Thanks

Yiping


Re: how to run rhel 6.x VM as PV VM on xenserver 7.1CU1?

2019-01-31 Thread Yiping Zhang
Hi, Andrija:

I am willing to try this approach, given that we are working in a lab 
environment.  Otherwise we would have to downgrade to XenServer 7.1 and 
install the security patches afterwards.

Since we also need to add a new entry in the hypervisor_capabilities table for 
XenServer 7.1.1, and there is no API to *add* new entries into this table, we 
decided to use SQL directly.  After reading the source code in the 
cloudstack/engine/schema/src/main/resources/META-INF/db directory on GitHub, we 
came up with the following SQL statements to seed the DB tables for XenServer 
7.1CU1 support:

INSERT IGNORE INTO hypervisor_capabilities (uuid, hypervisor_type, 
hypervisor_version, max_guests_limit, max_data_volumes_limit, 
storage_motion_supported) values (UUID(), "XenServer", "7.1.1", 500, 13, 1);

INSERT IGNORE INTO guest_os_hypervisor (uuid,hypervisor_type, 
hypervisor_version, guest_os_name, guest_os_id, created, is_user_defined)
  SELECT UUID(),"Xenserver", "7.1.1", guest_os_name, guest_os_id, 
utc_timestamp(), 0
  FROM guest_os_hypervisor
  WHERE hypervisor_type="Xenserver"
  AND hypervisor_version="7.1.0";
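If it helps anyone repeating this, the row-duplication effect of the second statement (the INSERT ... SELECT) can be rehearsed on a scratch database before touching the real cloud DB. A minimal SQLite sketch (SQLite has no UUID() or utc_timestamp(), so stand-ins are used, and the table is reduced to the columns involved; this is illustrative only, not the real schema):

```python
import datetime
import sqlite3
import uuid

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE guest_os_hypervisor (
    uuid TEXT, hypervisor_type TEXT, hypervisor_version TEXT,
    guest_os_name TEXT, guest_os_id INTEGER, created TEXT, is_user_defined INTEGER)""")

# Pretend these are the existing 7.1.0 mappings.
for os_id, name in [(1, "CentOS 6 (64-bit)"), (2, "RHEL 6 (64-bit)")]:
    db.execute("INSERT INTO guest_os_hypervisor VALUES (?,?,?,?,?,?,0)",
               (str(uuid.uuid4()), "Xenserver", "7.1.0", name, os_id,
                datetime.datetime.now(datetime.timezone.utc).isoformat()))

# Same shape as the MySQL INSERT ... SELECT above: duplicate 7.1.0 rows as 7.1.1.
db.execute("""INSERT INTO guest_os_hypervisor
    SELECT lower(hex(randomblob(16))), 'Xenserver', '7.1.1',
           guest_os_name, guest_os_id, datetime('now'), 0
    FROM guest_os_hypervisor
    WHERE hypervisor_type='Xenserver' AND hypervisor_version='7.1.0'""")

n = db.execute("SELECT count(*) FROM guest_os_hypervisor "
               "WHERE hypervisor_version='7.1.1'").fetchone()[0]
print(n)  # every 7.1.0 mapping now has a 7.1.1 twin
```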

After executing these two SQL statements and restarting the management service, 
all my RHEL 6.x VM instances start successfully as PV instances.  Now we just 
have to do a lot more validation to make sure all is well, especially with our 
own particular setups and usage.  I'd appreciate it very much if anyone who has 
done anything similar could share feedback, gotchas, and any areas I should pay 
more attention to.

Thanks

Yiping



On 1/30/19, 2:34 AM, "Andrija Panic"  wrote:

Hi Yiping,

As far as I can expect, patch level should not break any functionality with 
ACS (except, obviously, guest OS mappings...) so I assume it should work same 
was as unpatched/vanila version.


https://cloudstack.apache.org/api/apidocs-4.11/apis/addGuestOsMapping.html

    Please use the above API call to create the needed guest OS mappings - i.e. 
collect the needed (or all?) OS types (the ID value from the guest_os table for each 
guest OS type in ACS) and use them to generate the appropriate API calls, which will 
create the missing mapping records in the guest_os_hypervisor table.

    Alternatively, just copy all 192 rows that you have for 7.1.0, duplicating 
them with hypervisor_version changed to 7.1.1 - I assume a mgmt. server restart 
might be needed, but since it's a test env, it doesn't hurt.

Let me know how this worked for you,

Best
Andrija

andrija.pa...@shapeblue.com 

www.shapeblue.com
Amadeus House, Floral Street, London  WC2E 9DPUK
@shapeblue
  
 


-Original Message-
From: Yiping Zhang  
Sent: 29 January 2019 21:54
To: users@cloudstack.apache.org
Subject: Re: how to run rhel 6.x VM as PV VM on xenserver 7.1CU1?

Hi, Andrija:

I think you are on to something here:

Here are my query results for these sql statements:

mysql> select id,name,hypervisor_type,hypervisor_version from host where 
type="Routing" and removed is NULL;
+----+----------+-----------------+--------------------+
| id | name     | hypervisor_type | hypervisor_version |
+----+----------+-----------------+--------------------+
| 56 | lab-hv03 | XenServer       | 7.1.1              |
| 57 | lab-hv02 | XenServer       | 7.1.1              |
| 58 | lab-hv04 | XenServer       | 7.1.1              |
+----+----------+-----------------+--------------------+
3 rows in set (0.00 sec)

mysql> SELECT count(*) FROM guest_os_hypervisor WHERE 
hypervisor_type="Xenserver" AND hypervisor_version = "7.1.0";
+----------+
| count(*) |
+----------+
|      192 |
+----------+
1 row in set (0.00 sec)

mysql> SELECT count(*) FROM guest_os_hypervisor WHERE 
hypervisor_type="Xenserver" AND hypervisor_version = "7.1.1";
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.01 sec)

mysql>


As can be seen,  there are 192 entries for hypervisor_version "7.1.0", but 
zero entries for "7.1.1" which is what my hypervisors are.  I went back to read 
4.11.2.0 release 

Re: how to run rhel 6.x VM as PV VM on xenserver 7.1CU1?

2019-01-29 Thread Yiping Zhang
To: users@cloudstack.apache.org
Subject: RE: how to run rhel 6.x VM as PV VM on xenserver 7.1CU1?

Hi Yiping,

If you do the following SQL: SELECT guest_os_name FROM guest_os_hypervisor WHERE 
hypervisor_type="XenServer" and hypervisor_version="7.1.0" AND guest_os_id IN 
(SELECT id FROM guest_os WHERE display_name="CentOS 6.4 (64-bit)");
+-------------------+
| guest_os_name     |
+-------------------+
| CentOS 6 (64-bit) |
+-------------------+

It basically shows you that for the ACS OS type called "CentOS 6.4 (64-bit)", 
this is translated/mapped to "CentOS 6 (64-bit)" as seen from the 
XenServer 7.1.x hypervisor. 
Now, there IS a possibility that some of these mappings are incorrect...
I would just go to XenServer and try to deploy a VM manually (via XenCenter etc.), 
selecting the OS type from the SQL result set above (CentOS 6 (64-bit)), and 
observe whether it gets provisioned as HVM or PV. If XenServer brings it up as 
HVM, then you can see it's XenServer making it HVM. But if XenServer starts 
that manually deployed VM as PV, then we can assume some wrong mapping on the 
ACS side.

If you have time to test this, I'm also interested in the root cause - since I've 
seen with 7.1.x XenServer that you can't even restore a VM from a snapshot if you 
change the OS type on an existing VM in ACS from, e.g., CentOS 6.4 to 6.5.

Kind regards,
Andrija

andrija.pa...@shapeblue.com

www.shapeblue.com
Amadeus House, Floral Street, London  WC2E 9DPUK @shapeblue
  
 


-Original Message-
From: Yiping Zhang 
Sent: 28 January 2019 23:16
To: users@cloudstack.apache.org
Subject: how to run rhel 6.x VM as PV VM on xenserver 7.1CU1?

Hi, All:

I have a large number of RHEL 6.x VM instances running in our ACS 
environment. Last time, when I upgraded our XenServer from 6.5SP1 to 7.0, I 
had to change my templates to assign the OS type "RHEL 6.4 (64bit)" so that my 
VM instances would start as PV instances. Anything above "RHEL 6.5 (64bit)" 
would start as an HVM instance and get stuck during boot.

    Last week, after I upgraded my lab hypervisors to XenServer 7.1CU1, 
all my (lab) RHEL 6.x VM instances started as HVM instances and thus got 
stuck during boot.  I even tried changing the template's OS type to other types 
such as "rhel 5.10/5.0/6.0, Other PV (64bit)" etc. without any luck.

    What did I miss?  My lab is running ACS 4.11.2.0 packages from ShapeBlue.  
According to the Citrix document 
https://docs.citrix.com/en-us/xenserver/7-1/downloads/vm-users-guide.pdf,
 RHEL 6.x should always be started as a PV instance. So why do I only get HVM 
instances instead?

    Is it CloudStack or XenServer that decides whether an instance starts as PV 
or HVM?

Thanks, all helps are appreciated.

Yiping







Re: Possible bug fix - sanity check please

2019-01-24 Thread Yiping Zhang
Hi, Jon:

Would you please describe this bug a little more? How do I reproduce it?  Is 
there a Jira or Github issue number for it?

It sounds like a bug in 4.11.2.0 affecting VM live migration.  I am in the 
middle of upgrading to 4.11.2.0, and on my lab system I see that the line 488 
of file /usr/share/cloudstack-common/scripts/vm/network/security_group.py does 
have a ";" instead of a ":".

Thanks,

Yiping


On 1/24/19, 12:54 AM, "Jon Marshall"  wrote:

    Please ignore; this has already been fixed, but the fix is not included in the 
4.11.2 release (it is due in 4.11.3).


From: Jon Marshall 
Sent: 23 January 2019 15:30
To: users@cloudstack.apache.org
Subject: Possible bug fix - sanity check please

The following issue was seen using  CS 4.11.2 in advanced mode with 
security group isolation.

VM (internal name i-2-29-VM)  - is created and works fine with default 
security group allowing inbound SSH and ICMP echo request.

    Migrate the VM to another of the compute nodes: the VM migrates, and from 
the proxy console it can connect out, but the default security group's inbound 
rules are not copied across to the new compute node.   The 
/var/log/cloudstack/agent/security_group.log on the compute node the VM 
migrated to shows -

2019-01-18 14:54:25,724 - Ignoring failure to delete ebtables chain for vm 
i-2-29-VM
2019-01-18 14:54:25,724 - ebtables -t nat -F i-2-29-VM-out
2019-01-18 14:54:25,730 - Ignoring failure to delete ebtables chain for vm 
i-2-29-VM
2019-01-18 14:54:25,730 - ebtables -t nat -F i-2-29-VM-in-ips
2019-01-18 14:54:25,735 - Ignoring failure to delete ebtables chain for vm 
i-2-29-VM
2019-01-18 14:54:25,735 - ebtables -t nat -F i-2-29-VM-out-ips
2019-01-18 14:54:25,740 - Ignoring failure to delete ebtables chain for vm 
i-2-29-VM
2019-01-18 14:54:25,741 - iptables -N i-2-29-VM
2019-01-18 14:54:25,745 - ip6tables -N i-2-29-VM
2019-01-18 14:54:25,749 - iptables -N i-2-29-VM-eg
2019-01-18 14:54:25,753 - ip6tables -N i-2-29-VM-eg
2019-01-18 14:54:25,758 - iptables -N i-2-29-def
2019-01-18 14:54:25,763 - ip6tables -N i-2-29-def
2019-01-18 14:54:25,767 - Creating ipset chain  i-2-29-VM
2019-01-18 14:54:25,768 - ipset -F i-2-29-VM
2019-01-18 14:54:25,772 - ipset chain not exists creating i-2-29-VM
2019-01-18 14:54:25,772 - ipset -N i-2-29-VM iphash family inet
    2019-01-18 14:54:25,777 - vm ip 172.30.6.60
    2019-01-18 14:54:25,777 - ipset -A i-2-29-VM 172.30.6.60
2019-01-18 14:54:25,782 - Failed to network rule !
Traceback (most recent call last):
  File "/usr/share/cloudstack-common/scripts/vm/network/security_group.py", 
line 995, in add_network_rules
default_network_rules(vmName, vm_id, vm_ip, vm_ip6, vmMac, vif, brname, 
sec_ips)
  File "/usr/share/cloudstack-common/scripts/vm/network/security_group.py", 
line 490, in default_network_rules
if ips[0] == "0":
IndexError: list index out of range

 Added a few lines to debug the script security_group.py and it would 
appear this line (line 487) is the culprit -

ips = sec_ips.split(';')

    as far as I can tell the separator should be a colon (':') and not a 
semicolon, at least on my setup.  Once changed to -

ips = sec_ips.split(':')

the iptables rules were updated correctly on the host the VM was migrated 
to.
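The effect of the wrong separator is easy to see in isolation. A toy check (the exact wire format of sec_ips is an assumption inferred from the traceback; "0" is taken as the "no secondary IPs" sentinel that the `ips[0] == "0"` test in default_network_rules looks for):

```python
# Assumed format: colon-separated list, "0" sentinel meaning "no secondary IPs".
sec_ips = "0:"

by_semicolon = sec_ips.split(';')   # the whole string stays in one element
by_colon = sec_ips.split(':')       # sentinel ends up in ips[0], as the check expects

print(by_semicolon, by_colon)
```

With ';' the sentinel comparison can never match, so the code falls through into branches that assume a parsed list; with ':' the check behaves as intended.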

    I don't know if this is the right change to make, as the script is over 
1000 lines long and imports other modules, so I would appreciate any input, 
since this seems to be a key function of advanced networking with security groups.

Thanks

Jon






what happened to /dev/xvdd in my instance?

2018-12-06 Thread Yiping Zhang
Hi, All:

I am working on a script to handle data disks, so I added a few data disks to 
my instance. Then I noticed that in the listVolumes API output, the third data disk 
actually has a deviceid of 4, instead of 3 as I would expect.  Sure enough, in the 
guest OS, I see three data disks as /dev/xvdb, /dev/xvdc, and /dev/xvde.  
There is no /dev/xvdd.

Why does CloudStack skip device /dev/xvdd?  I have verified this behavior on 
both ACS versions 4.9.3.0 and 4.11.2.0.
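For anyone hitting the same surprise: device id 3 appears to be reserved by CloudStack (commonly said to be held for the CD-ROM device; that explanation is an assumption here, not something traced in the code), so data-disk ids run 1, 2, 4, 5, ... and the XenServer device letter follows the id. A quick sketch of the observed mapping:

```python
def xen_device_name(device_id: int) -> str:
    # XenServer exposes disks as /dev/xvd<letter>, where the letter index
    # matches the CloudStack device id (0 -> a, 1 -> b, ...).
    return "/dev/xvd" + chr(ord("a") + device_id)

# 0 is the root disk; 3 is skipped (assumed reserved for the CD-ROM).
RESERVED_DEVICE_IDS = {0, 3}

def next_data_device_ids(count: int):
    """Return the device ids the first `count` data disks receive."""
    ids, candidate = [], 1
    while len(ids) < count:
        if candidate not in RESERVED_DEVICE_IDS:
            ids.append(candidate)
        candidate += 1
    return ids

print([xen_device_name(i) for i in next_data_device_ids(3)])
# three data disks land on b, c and e; d (device id 3) is skipped
```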

Yiping


Re: number of cores

2018-11-19 Thread Yiping Zhang
Eric:

If you change any of the memory or CPU over-provisioning factors, you need to 
restart (from the GUI or API) all running VM instances; otherwise the reported 
allocation/available numbers will still reflect the data from before your changes. If 
you also add/delete instances at the same time, you can lose track of the actual 
allocated and available resources very quickly until you restart all your 
instances.

Yiping

On 11/19/18, 3:59 PM, "Eric Lee Green"  wrote:

On 11/19/18 3:47 PM, Yiping Zhang wrote:
> Eric:
>
> What's your value for global setting cpu.overprovisioning.factor?
>
> I have this value set to 3.0. Right now, one of my servers with 32 cores 
@ 2.0 GHz (with HT enabled), I can allocate a total of 79 vCPU and 139 GHz to 
26 VM instances.  That's over 200% over provisioning!

I changed it to 4.0 and restarted the management server first thing. It 
started out at 1.5. At 1.5, my zone shows 41% usage with the typical 
workload -- 169.60Ghz / 409.78Ghz.  I have 2x24x3.03ghz and 1x24x2.40ghz 
servers for a total of 203.04Ghz actual, so even without the multiplier 
I'm not over provisioning my CPU Mhz.

> On 11/19/18, 6:43 AM, "Andrija Panic"  wrote:
>
>  Unless someone gives you better answer, I guess it's for fun - to 
have more
>  detailed numbers in dashboard (may be it's related to other 
hypervisor
>  types, just assuming... or not...)
>  
>  Cheers
>  
>  On Mon, 19 Nov 2018 at 14:11, Ugo Vasi  wrote:
>  
>  > Hi Andrija,
>  > not having noticed this new voice before I wondered if it is 
limiting
>  > the fact of reaching or exceeding the number of physical cores.
>  >
>  > What is the purpose of this dashboard pane?
>  >
>  >
>  > Il 19/11/18 12:56, Andrija Panic ha scritto:
>  > > Hi Ugo,
>  > >
>  > > Why would you want to do this, just curious ?
>  > >
>  > > I believe it's not possible, but anyway (at least with KVM, 
probably same
>  > > for other hypervisors) it doesn't even makes sense/use, since 
when
>  > > deploying a VM, ACS query host free/unused number of MHz (GHz), 
so it's
>  > not
>  > > even relevant for ACS - number of cores in not relevant in ACS
>  > calculations
>  > > during VM deployment.
>  > >
>  > >
>  > > Cheers,
>  > > Andrija
>  > >
>  > > On Mon, Nov 19, 2018, 11:31 Ugo Vasi   > >
>  > >> Hi all,
>  > >> in the dashboard of an ACS installation vesion 4.11.1.0 (Ubuntu 
16.04
>  > >> with KVM hypervisor), the new entry "# of CPU Cores" appears.
>  > >> Is it possible to over-provision like for MHz or storage?
>  > >>
>  > >> Thanks
>  > >>
>  > >>
>  > >> --
>  > >>
>  > >> *Ugo Vasi* / System Administrator
>  > >> ugo.v...@procne.it <mailto:ugo.v...@procne.it>
>  > >>
>  > >>
>  > >>
>  > >>
>  > >> *Procne S.r.l.*
>  > >> +39 0432 486 523
>  > >> via Cotonificio, 45
>  > >> 33010 Tavagnacco (UD)
>  > >> www.procne.it <http://www.procne.it/>
>  > >>
>  > >>
    >  > >> The information contained in this message and any attachments may 
    >  > >> be confidential and is, in any case, intended exclusively for the 
    >  > >> persons or the Company indicated above. Dissemination, distribution 
    >  > >> and/or copying of the transmitted document by anyone other than the 
    >  > >> addressee is prohibited, both under Art. 616 of the Italian Penal 
    >  > >> Code and under Legislative Decree no. 196/2003 ("Personal Data 
    >  > >> Protection Code"). If you have received this message in error, 
    >  > >> please destroy it and immediately notify Procne S.r.l. at the 
    >  > >> e-mail address i...@procne.it <mailto:i...@procne.it>.
>  > >>
>  > >>
>  > >
>  > >
>  

Re: number of cores

2018-11-19 Thread Yiping Zhang
Eric:

What's your value for global setting cpu.overprovisioning.factor?

I have this value set to 3.0. Right now, one of my servers with 32 cores @ 2.0 
GHz (with HT enabled), I can allocate a total of 79 vCPU and 139 GHz to 26 VM 
instances.  That's over 200% over provisioning!

Yiping

On 11/19/18, 6:43 AM, "Andrija Panic"  wrote:

Unless someone gives you a better answer, I guess it's for fun - to have more
detailed numbers in the dashboard (maybe it's related to other hypervisor
types, just assuming... or not...)

Cheers

On Mon, 19 Nov 2018 at 14:11, Ugo Vasi  wrote:

> Hi Andrija,
> not having noticed this new entry before, I wondered whether it limits
> reaching or exceeding the number of physical cores.
>
> What is the purpose of this dashboard pane?
>
>
> Il 19/11/18 12:56, Andrija Panic ha scritto:
> > Hi Ugo,
> >
> > Why would you want to do this, just curious ?
> >
> > I believe it's not possible, but anyway (at least with KVM, probably the
> > same for other hypervisors) it doesn't even make sense, since when
> > deploying a VM, ACS queries the host's free/unused MHz (GHz) - the number
> > of cores is not relevant in ACS calculations during VM deployment.
> >
> >
> > Cheers,
> > Andrija
> >
> > On Mon, Nov 19, 2018, 11:31 Ugo Vasi  >
> >> Hi all,
> >> in the dashboard of an ACS installation version 4.11.1.0 (Ubuntu 16.04
> >> with KVM hypervisor), the new entry "# of CPU Cores" appears.
> >> Is it possible to over-provision like for MHz or storage?
> >>
> >> Thanks
> >>
> >>
> >> --
> >>
> >> *Ugo Vasi* / System Administrator
> >> ugo.v...@procne.it 
> >>
> >>
> >>
> >>
> >> *Procne S.r.l.*
> >> +39 0432 486 523
> >> via Cotonificio, 45
> >> 33010 Tavagnacco (UD)
> >> www.procne.it 
> >>
> >>
> >>
> >>
> >
> >
> >
>
>
> --
>
> *Ugo Vasi* / System Administrator
> ugo.v...@procne.it 
>
>
>
>
> *Procne S.r.l.*
> +39 0432 486 523
> via Cotonificio, 45
> 33010 Tavagnacco (UD)
> www.procne.it 
>
>
>
>

-- 

Andrija Panić




Re: primary storage best practices?

2018-11-16 Thread Yiping Zhang
Hi, Ivan:

I think one or more deployment planners for storage, to handle automatic 
storage placement for new images, is a good idea (when multiple primary 
storages are available).  But on top of that, letting admins manually pick the 
storage device (to override the deployment planner's selection) is also a good 
thing to have, given that it is simply not possible for any deployment planner 
to handle all possible situations out there.

Yiping

On 11/16/18, 10:49 AM, "Ivan Kudryavtsev"  wrote:

Hi, Yiping. This is an important feature, especially for those who use local
storage deployments.

But I don't think regular users should be able to do that. Admins may have
that feature, but users should perceive the cloud as an encapsulated service
with hidden topology. What they need is a deployment planner for storage.

The request itself is useful, but the feature design must fit every kind of
cloud use case, not only yours.


пт, 16 нояб. 2018 г., 13:10 Yiping Zhang yzh...@marketo.com:

> It sounds like we have an enhancement/feature request here: to be able to
> specify the primary storage device on which the new image is to be created
> when calling the deployVirtualMachine API.
>
> Where should I file this request, in Github or the original Apache's
> CloudStack Jira?
>
> Yiping
>
> On 11/15/18, 2:27 PM, "Andrija Panic"  wrote:
>
> I believe (if not mistaken) that CloudStack will match first available
> storage based on storage tags and availability, and will always choose
> first storage pool, even though you have 3 of them available for
> particular
> cluster.
> In this sense, you can not really balance load across multiple Primary
> Storages... (I have actually just tested this, having 2 pools with 
same
> storage tag, and deploying a few volumes - all of them were created on
> first storage available...)
>
> You could configure them with different storage tags, but not sure 
that
> solves your problem - i.e. some Compute/Disk offerings will be
> targeting
> NetApp Cluster1, some NetApp 2, some NetApp3 - but this is 
impractical.
>
Not sure if someone else can shed some light on this scenario ? (I
> could
> atm imagine a very specific game with editing storage tags on
> storage_pool
> via SQL (scheduled job), in order to "rotate" list of available 
storage
> pools...)
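That "rotate storage tags via SQL on a schedule" idea could look roughly like the sketch below. Everything specific here is an assumption: the pool names, the `rotating` tag, and the `cloud.storage_pool_tags` table layout (`pool_id`, `tag`, as found in 4.x-era schemas) should all be verified against your own database first. The script only prints the SQL it would run; nothing is executed against MySQL.

```shell
# Hypothetical "rotate the storage tag" cron job: each run points a shared
# tag at a different primary storage pool, so volumes created next land on
# a different pool. Prints the SQL instead of running it -- review first.
POOLS="NetApp-Cluster1 NetApp-Cluster2 NetApp-Cluster3"   # placeholder names
COUNT=$(echo "$POOLS" | wc -w | tr -d ' ')
IDX=$(( $(date +%s) % COUNT + 1 ))          # pick a pool based on the clock
TARGET=$(echo "$POOLS" | cut -d' ' -f"$IDX")

cat <<SQL
-- move the 'rotating' tag to the chosen pool (schema names are assumptions)
DELETE FROM cloud.storage_pool_tags WHERE tag = 'rotating';
INSERT INTO cloud.storage_pool_tags (pool_id, tag)
  SELECT id, 'rotating' FROM cloud.storage_pool WHERE name = '${TARGET}';
SQL
```

Disk offerings tagged `rotating` would then follow whichever pool currently carries the tag, which is exactly the "impractical but possible" game described above.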
>
> Cheers
>
> On Thu, 15 Nov 2018 at 23:01, Yiping Zhang  wrote:
>
> > Hi, all:
> >
> > At my site, our current practice is to have only one primary
> storage
> > device for each CloudStack cluster, serving up to 500 disk volumes
> with
> > total of 10 – 20TB disk space.  Now, we are replacing old NetApp
> clusters
> > with new ones and moving to SSD disks,  so I need to recreate all my
> > primary storage devices.
> >
> > I am thinking of configuring three primary storage volumes, each
> served by
> > a different NetApp cluster,  for each CloudStack cluster to divide
> work
> > load on the NetApp end, and to provide some storage redundancy in
> > CloudStack.
> >
> > My question is when creating new VM instances,  how would I
> distribute new
> > disk volumes on to different primary storage devices evenly and
> > automatically?
> >
> > I am wondering how other users configure their (NFS) primary
> storage
> > devices?  What are your best practices in this area?
> >
> > Thanks
> >
> > Yiping
> >
>
>
> --
>
> Andrija Panić
>
>
>




Re: primary storage best practices?

2018-11-16 Thread Yiping Zhang
It sounds like we have an enhancement/feature request here: to be able to 
specify the primary storage device on which the new image is to be created 
when calling the deployVirtualMachine API.

Where should I file this request, in Github or the original Apache's CloudStack 
Jira?

Yiping

On 11/15/18, 2:27 PM, "Andrija Panic"  wrote:

I believe (if not mistaken) that CloudStack will match first available
storage based on storage tags and availability, and will always choose
first storage pool, even though you have 3 of them available for particular
cluster.
In this sense, you can not really balance load across multiple Primary
Storages... (I have actually just tested this, having 2 pools with same
storage tag, and deploying a few volumes - all of them were created on
first storage available...)

You could configure them with different storage tags, but not sure that
solves your problem - i.e. some Compute/Disk offerings will be targeting
NetApp Cluster1, some NetApp 2, some NetApp3 - but this is impractical.

Not sure if someone else can shed some light on this scenario ? (I could
atm imagine a very specific game with editing storage tags on storage_pool
via SQL (scheduled job), in order to "rotate" list of available storage
pools...)

Cheers

On Thu, 15 Nov 2018 at 23:01, Yiping Zhang  wrote:

> Hi, all:
>
> At my site, our current practice is to have only one primary storage
> device for each CloudStack cluster, serving up to 500 disk volumes with
> total of 10 – 20TB disk space.  Now, we are replacing old NetApp clusters
> with new ones and moving to SSD disks,  so I need to recreate all my
> primary storage devices.
>
> I am thinking of configuring three primary storage volumes, each served by
> a different NetApp cluster,  for each CloudStack cluster to divide work
> load on the NetApp end, and to provide some storage redundancy in
> CloudStack.
>
> My question is when creating new VM instances,  how would I distribute new
> disk volumes on to different primary storage devices evenly and
> automatically?
>
> I am wondering how other users configure their (NFS) primary storage
> devices?  What are your best practices in this area?
>
> Thanks
>
> Yiping
>


-- 

Andrija Panić




primary storage best practices?

2018-11-15 Thread Yiping Zhang
Hi, all:

At my site, our current practice is to have only one primary storage device 
for each CloudStack cluster, serving up to 500 disk volumes with total of 10 – 
20TB disk space.  Now, we are replacing old NetApp clusters with new ones and 
moving to SSD disks,  so I need to recreate all my primary storage devices.

I am thinking of configuring three primary storage volumes, each served by a 
different NetApp cluster,  for each CloudStack cluster to divide work load on 
the NetApp end, and to provide some storage redundancy in CloudStack.

My question is when creating new VM instances,  how would I distribute new disk 
volumes on to different primary storage devices evenly and automatically?

I am wondering how other users configure their (NFS) primary storage 
devices?  What are your best practices in this area?

Thanks

Yiping


Re: 4.9 to 4.11 upgrade broken

2018-10-04 Thread Yiping Zhang
Gosh,  I just encountered exactly the same problem!

If I understand this thread correctly, the root cause is that the document 
contains wrong url for systemvm template to download, it points to 4.11.0 
version instead of 4.11.1 version of templates for all hypervisors!

If so, why hasn’t anyone fixed the following document yet: 
http://docs.cloudstack.apache.org/projects/cloudstack-release-notes/en/4.11.0.0/upgrade/upgrade-4.9.html
 ???

Yiping

On 8/21/18, 11:28 PM, "Asai"  wrote:

Before upgrading the router, can I restart the network and check "Make 
redundant" so that VMs don't become inaccessible during the upgrade?  Will this 
work without upgrading first?

Asai


> On Aug 21, 2018, at 11:18 PM, Sergey Levitskiy  
wrote:
> 
> You can either Restart the network with cleanup or simply destroy VR and 
let it be created on the next VM deployment.
> 
> On 8/21/18, 11:13 PM, "Asai"  wrote:
> 
>Thanks, nearly back up and running.  One question, what about the 
Virtual Router upgrade?  What do I do if the upgrade fails on the Virtual 
Router?  Looking for docs on this, but can’t find anything.
> 
>Thanks for your assistance.
> 
>Asai
> 
> 
>> On Aug 21, 2018, at 6:35 PM, Sergey Levitskiy  
wrote:
>> 
>> Yes. this should bring you back.
>> However, if you perform what you described in your previous reply + 
rename the template in the CS DB to systemvm-kvm-4.11.1 from what it is now 
(systemvm-kvm-4.11), you should be able to bring it all up as it is. Updating 
the template image alone is not enough.
>> 
>> On 8/21/18, 4:04 PM, "Asai"  wrote:
>> 
>>   OK thanks a lot, Sergey,
>> 
>>   That helps.  What’s the best method to roll back?  Just use yum to 
roll back to 4.9 and rebuild the DB from backup?
>> 
>> 
>>> On Aug 21, 2018, at 3:59 PM, Sergey Levitskiy  
wrote:
>>> 
>>> The fastest and easiest way is to roll back both the DB and the management 
server and start over. You need to have the correct systemVM template 
registered before you initiate an upgrade.
>>> 
>>> Thanks,
>>> Sergey
>>> 
>>> 
>>> On 8/21/18, 2:30 PM, "Asai"  wrote:
>>> 
>>>  Is there anybody out there that can assist with this?  
>>> 
>>>  Asai
>>> 
>>> 
 On Aug 21, 2018, at 2:01 PM, Asai  wrote:
 
 Is there any more specific instruction about this?
 
 What is the best practice?  Should I roll back first?  Is there any 
documentation about rolling back?  Do I uninstall cloudstack management and 
re-install 4.9? 
 
 Or is it as simple as just overwriting the file?  If so, what about 
the template.properties file and the metadata in there like qcow2.size?
 
 filename=9cebb971-8605-3493-86f3-f5d1aef1715e.qcow2
 id=225
 qcow2.size=316310016
 public=true
 uniquename=225-2-826a2950-bb8e-34dd-9420-1eb24ea16b4a
 qcow2.virtualsize=2516582400
 virtualsize=2516582400
 checksum=2d8d1e4eacc976814b97f02849481433
 hvm=true
 description=systemvm-kvm-4.11
 qcow2=true
 qcow2.filename=9cebb971-8605-3493-86f3-f5d1aef1715e.qcow2
 size=316310016
 
 
 Asai
 
 
> On Aug 21, 2018, at 1:56 PM, ilya musayev 
 wrote:
> 
> yes - please try the proper 4.11 systemvm templates.
> 
>> On Aug 21, 2018, at 1:54 PM, Asai  wrote:
>> 
>> Can I manually download the systemvm template from here? 
http://download.cloudstack.org/systemvm/4.11/ 

>> 
>> Then manually overwrite it in the filesystem and update it 
accordingly in the database?
>> 
>> Asai
>> 
>> 
>>> On Aug 21, 2018, at 1:40 PM, Asai  
wrote:
>>> 
>>> 4.11.0
>>> 
>>> As outlined in this 
http://docs.cloudstack.apache.org/projects/cloudstack-release-notes/en/4.11.0.0/upgrade/upgrade-4.9.html
 

 On Aug 21, 2018, at 1:37 PM, ilya musayev 
 wrote:
 
 which template did you use? 
 
> On Aug 21, 2018, at 1:36 PM, Asai  
wrote:
> 
> Greetings,
> 
> I just tried to upgrade from 4.9 to 4.11, but it looks like the 
system VM template I downloaded according to the upgrade guide is the wrong 
template.  It’s 4.11, but I upgraded to 4.11.1 and I get this error message:
> 
> Caused by: com.cloud.utils.exception.CloudRuntimeException: 
4.11.1.0KVM SystemVm template not found. Cannot upgrade system Vms
>   at 
com.cloud.upgrade.dao.Upgrade41100to41110.updateSystemVmTemplates(Upgrade41100to41110.java:281)
>   at 

vCPU priority setting for Xen VM

2018-06-08 Thread Yiping Zhang
Hi, all:

I am trying to find out more info about VM’s vCPU priority settings on 
XenServer.

I noticed that my VM instances have various vCPU weights associated with them, 
even for instances using the same service offering. I am wondering how 
CloudStack sets the vCPU priority for VM instances.

Thanks,

Yiping


Re: How exactly does CloudStack stop a VM?

2018-06-06 Thread Yiping Zhang
Our VM instances do have xentools installed, though still at version 6.2, 
whereas our hypervisors have been upgraded to XenServer 6.5 since the VM 
instances were created.


On 6/6/18, 4:20 PM, "Jean-Francois Nadeau"  wrote:

If the xentools are installed and running in the guest OS it should detect
the shutdown sent via XAPI.

On Wed, Jun 6, 2018 at 6:58 PM, Yiping Zhang  wrote:

> We are using XenServers with our CloudStack instances.
>
> On 6/6/18, 3:11 PM, "Jean-Francois Nadeau" 
> wrote:
>
> On KVM,  AFAIK the shutdown is the equivalent of pressing the power
> button.  To get the Linux OS to catch this and initiate a clean
> shutdown,
> you need the ACPID service running in the guest OS.
    >
> On Wed, Jun 6, 2018 at 6:01 PM, Yiping Zhang 
> wrote:
>
> > Hi, all:
> >
> > We have a few VM instances which hang when issued a Stop command from
> > the CloudStack web UI or through API calls, because the app's own
> > startup/stop script in the guest OS is not properly invoked.  The app's
> > startup/stop script works properly if we issue a shutdown/reboot command
> > in the guest OS directly.
> >
> > Hence here is my question:  when CloudStack tries to stop a running
> VM
> > instance, what is the exact command it sends to VM to stop it, with
> or
> > without forced flag?  What are the interactions between the
> CloudStack, the
> > hypervisor and the guest VM?
> >
> > Yiping
> >
>
>
>




Re: How exactly does CloudStack stop a VM?

2018-06-06 Thread Yiping Zhang
We are using XenServers with our CloudStack instances.

On 6/6/18, 3:11 PM, "Jean-Francois Nadeau"  wrote:

On KVM,  AFAIK the shutdown is the equivalent of pressing the power
button.  To get the Linux OS to catch this and initiate a clean shutdown,
you need the ACPID service running in the guest OS.

On Wed, Jun 6, 2018 at 6:01 PM, Yiping Zhang  wrote:

> Hi, all:
>
> We have a few VM instances which hang when issued a Stop command from
> the CloudStack web UI or through API calls, because the app's own startup/stop
> script in the guest OS is not properly invoked.  The app's startup/stop script
> works properly if we issue a shutdown/reboot command in the guest OS directly.
>
> Hence here is my question:  when CloudStack tries to stop a running VM
> instance, what is the exact command it sends to VM to stop it, with or
> without forced flag?  What are the interactions between the CloudStack, 
the
> hypervisor and the guest VM?
>
> Yiping
>




How exactly does CloudStack stop a VM?

2018-06-06 Thread Yiping Zhang
Hi, all:

We have a few VM instances which hang when issued a Stop command from the 
CloudStack web UI or through API calls, because the app's own startup/stop 
script in the guest OS is not properly invoked.  The app's startup/stop script 
works properly if we issue a shutdown/reboot command in the guest OS directly.

Hence here is my question:  when CloudStack tries to stop a running VM 
instance, what is the exact command it sends to VM to stop it, with or without 
forced flag?  What are the interactions between the CloudStack, the hypervisor 
and the guest VM?

Yiping
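The answers further up the thread (clean shutdown via XAPI if xentools/ACPID are alive in the guest, hard power-off when forced) can be summarized as the pair of xe commands below. This is a dry-run sketch: the commands are printed rather than executed, and `VM_UUID` is a placeholder you would fill in from `xe vm-list` on the pool master.

```shell
# What a Stop roughly maps to on XenServer. Non-forced: a clean shutdown
# request the guest must handle (xentools / ACPI). Forced: a hard power-off.
VM_UUID="replace-with-uuid-from-xe-vm-list"   # placeholder
run() { echo "+ $*"; }                        # dry-run: print, don't execute

run xe vm-shutdown uuid="$VM_UUID"            # clean: guest sees the event
run xe vm-shutdown uuid="$VM_UUID" --force    # hard: like pulling the power
```

If the clean variant hangs for a given VM, that points at the guest side (missing or stale xentools, or a blocking shutdown script) rather than at CloudStack itself, which matches the behavior described in this thread.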


Re: 4.11 without Host-HA framework

2018-05-23 Thread Yiping Zhang
I can say for a fact that VMs using an HA-enabled service offering will be 
restarted by CS on another host when their original host crashes, assuming 
there are enough capacity/resources in the cluster, regardless of whether that 
host comes back or not.

The simplest way to test the VM HA feature with a VM instance using an 
HA-enabled service offering is to issue a shutdown command in the guest OS and 
watch it get restarted by the CS manager.

On 5/23/18, 1:23 PM, "Paul Angus"  wrote:

Hi Jon,

Don't worry, TBH I'm dubious about those claiming to have VM-HA working 
when a host crashes (but doesn't restart).
I'll check in with the guys that set values for host-ha when testing, to 
see which ones they change and what they set them to. 

paul.an...@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue
  
 


-Original Message-
From: Jon Marshall  
Sent: 23 May 2018 21:10
To: users@cloudstack.apache.org
Subject: Re: 4.11 without Host-HA framework

Rohit / Paul


Thanks again for answering.


I am a Cisco guy with an ex Unix background but no virtualisation 
experience and I can honestly say I have never felt this stupid before 


I have Cloudstack working but failover is killing me.


When you say VM HA relies on the host telling CS the VM is down, how does that 
work? If you crash the host, how does it tell CS anything? And when you say 
"tell CS", do you mean the CS manager?


I guess I am just not understanding all the moving parts. I have had HOST HA 
working (to an extent), although it takes a long time to fail over even after 
tweaking the timers. But the fact that I keep finding references to people 
saying it should fail over even without HOST HA (and mine doesn't) makes me 
think I have configured it incorrectly somewhere along the line.


I have configured a compute offering with HA and I am crashing the host 
with the echo command as suggested but still nothing.


I understand what you are saying Paul about it not being a good idea to 
rely on VM HA so I will go back to Host HA and try to speed up failover times.


Can I ask, from your experience, what is a realistic failover time for CS, 
e.g. if a host fails?


Jon





From: Paul Angus 
Sent: 23 May 2018 19:55
To: users@cloudstack.apache.org
Subject: RE: 4.11 without Host-HA framework

Jon,

As Rohit says, it is very important to understand the difference between VM 
HA and host HA.
VM HA relies on the HOST telling CloudStack that the VM is down in order 
for CloudStack to start it again (wherever that ends up being).
Any sequence of events that ends up with VM HA restarting the VM when 
CloudStack can't contact the host is luck/fluke/unreliable/bad(tm)

The purpose of Host HA was to create a reliable mechanism to determine that 
a host has 'crashed' and that the VMs within it are inoperative. Then take 
appropriate action, including ultimately telling VM HA to restart the VM 
elsewhere.





paul.an...@shapeblue.com
www.shapeblue.com



53 Chandos Place, Covent Garden, London  WC2N 4HSUK @shapeblue




-Original Message-
From: Rohit Yadav 
Sent: 23 May 2018 10:45
To: users@cloudstack.apache.org
Subject: Re: 4.11 without Host-HA framework

Jon,


In the VM's compute offering, make sure that HA is ticked/enabled. Then use 
that HA-enabled VM offering while deploying a VM. Around testing - it depends 
how you're crashing. In case of KVM, you can try to cause host crash (example: 
echo c > /proc/sysrq-trigger) and see if HA-enabled VMs gets started on a 
different host.


- Rohit






From: Jon Marshall 
Sent: Tuesday, May 22, 2018 8:28:06 PM
To: users@cloudstack.apache.org
Subject: Re: 4.11 without Host-HA framework

Hi Rohit


Thanks for responding.


I have not had much luck with HA at all.  I crash a server and nothing 
happens  in terms of VMs migrating to another host. Monitoring the management 
log file it seems the management server recognises the host has stopped 
responding to pings but doesn't think it has 

can't start guest VM assigned with new empty affinity group

2018-04-27 Thread Yiping Zhang
Hi, list:



My user is creating a bunch of guest VM instances using affinity groups, and we 
started to see InsufficientServerCapacityException errors: new VM instances 
won't start, even though this is for a newly created, empty affinity group and 
both my clusters have plenty of CPU/memory/IP address resources.



This happened after we had created 4 new groups and added 8 instances into 
each of them. Now the 5th group is acting up, and the only error in the log is 
the following stack trace.



INFO  [o.a.c.f.j.i.AsyncJobMonitor] (API-Job-Executor-4:ctx-197fec7e job-20856) 
Add job-20856 into job monitoring

INFO  [o.a.c.a.c.a.v.StartVMCmdByAdmin] (API-Job-Executor-4:ctx-197fec7e 
job-20856 ctx-1d550d44) 
com.cloud.exception.InsufficientServerCapacityException: Unable to create a 
deployment for VM[User|i-3-393-VM]Scope=interface com.cloud.dc.DataCenter; id=1

INFO  [o.a.c.a.c.a.v.StartVMCmdByAdmin] (API-Job-Executor-4:ctx-197fec7e 
job-20856 ctx-1d550d44) Unable to create a deployment for VM[User|i-3-393-VM], 
Please check the affinity groups provided, there may not be sufficient capacity 
to follow them

com.cloud.exception.InsufficientServerCapacityException: Unable to create a 
deployment for VM[User|i-3-393-VM]Scope=interface com.cloud.dc.DataCenter; id=1

at 
org.apache.cloudstack.engine.cloud.entity.api.VMEntityManagerImpl.reserveVirtualMachine(VMEntityManagerImpl.java:214)

at 
org.apache.cloudstack.engine.cloud.entity.api.VirtualMachineEntityImpl.reserve(VirtualMachineEntityImpl.java:200)

at 
com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:4100)

at 
com.cloud.vm.UserVmManagerImpl.startVirtualMachine(UserVmManagerImpl.java:2592)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:606)

at 
org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)

at 
org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)

at 
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)

at 
org.apache.cloudstack.network.contrail.management.EventUtils$EventInterceptor.invoke(EventUtils.java:107)

at 
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)

at 
com.cloud.event.ActionEventInterceptor.invoke(ActionEventInterceptor.java:51)

at 
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)

at 
org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)

at 
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)

at 
org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)

at com.sun.proxy.$Proxy197.startVirtualMachine(Unknown Source)

at 
org.apache.cloudstack.api.command.admin.vm.StartVMCmdByAdmin.execute(StartVMCmdByAdmin.java:51)

at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:150)

at 
com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:108)

at 
org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:554)

at 
org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)

at 
org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)

at 
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)

at 
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)

at 
org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)

at 
org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:502)

at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)

at java.util.concurrent.FutureTask.run(FutureTask.java:262)

at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:745)

INFO  [o.a.c.f.j.i.AsyncJobMonitor] (API-Job-Executor-4:ctx-197fec7e job-20856) 
Remove job-20856 from job monitoring


Any helps are much appreciated,

Thanks

Yiping


[Solved] Re: PV vs HVM guest on XenServer 7.0

2018-04-12 Thread Yiping Zhang
After more investigation, I found out that changing a template's "OS Type" 
assignment affects whether a VM is started as a PV or HVM guest on both 
XenServer 6.5 and 7.0 clusters:

On XenServer 6.5 clusters, 64-bit RHEL guests with versions up to 6.7 (the last 
6.x version listed by ACS 4.9.3.0) are always started as PV guests, and they all 
run properly.
On XenServer 7.0 clusters, 64-bit RHEL guests with versions up to 6.4 are 
started as PV guests and run properly; guests from version 6.5 onwards are 
started as HVM guests, and these won't start.

So, the simple workaround is to assign "RHEL 6.4 (64bit)" to all my RHEL 6.x 
templates, so that guests using these templates are always started as PV guests 
regardless of the version of the XenServer cluster they are running on.

IMHO, this is a bug in CloudStack: all RHEL 6.x guests should be started as PV 
guests regardless of the Xen hypervisor version (I only verified on XenServer 
6.5 and 7.0).

Yiping

On 4/11/18, 2:31 PM, "Yiping Zhang" <yzh...@marketo.com> wrote:

Hi, list:


I am in the process of upgrading my hypervisors clusters from XenServer 6.5 
to 7.0 for all my ACS instances.

My XS 7.0 clusters are patched up to XS70E050.

During cluster rolling upgrade, VM instances are live migrated several 
times, eventually all of them running on XS 7.0 hosts.
However, out of a hundred or so instances, a few RHEL 6.x VMs got stuck 
at the “Plex86/Bochs VGABios …” screen with at least one vCPU at 100%.
Comparing VM parameters between running and stuck instances, I can see that 
running instances are PV guests while stuck instances are HVM guests.
RHEL 7.x guests are always in HVM mode and always run on both XS 6.5 and 
7.0 hosts.

Running RHEL6.x guests:

   HVM-boot-policy ( RW):

   HVM-boot-params (MRW):

 HVM-shadow-multiplier ( RW): 1.000

   PV-args ( RW): graphical utf8

 PV-bootloader ( RW): pygrub

Stuck RHEL 6.x guests:

   HVM-boot-policy ( RW): BIOS order

   HVM-boot-params (MRW): order: dc

 HVM-shadow-multiplier ( RW): 1.000

   PV-args ( RW):

 PV-bootloader ( RW):

Why does CloudStack convert just a few RHEL 6.x instances into HVM mode, 
while leaving most in PV mode?
How would I force them back to PV guests?

Thanks for all the help

Yiping




PV vs HVM guest on XenServer 7.0

2018-04-11 Thread Yiping Zhang
Hi, list:


I am in the process of upgrading my hypervisors clusters from XenServer 6.5 to 
7.0 for all my ACS instances.

My XS 7.0 clusters are patched up to XS70E050.

During cluster rolling upgrade, VM instances are live migrated several times, 
eventually all of them running on XS 7.0 hosts.
However, out of a hundred or so instances, a few RHEL 6.x VMs got stuck at the 
“Plex86/Bochs VGABios …” screen with at least one vCPU at 100%.
Comparing VM parameters between running and stuck instances, I can see that 
running instances are PV guests while stuck instances are HVM guests.
RHEL 7.x guests are always in HVM mode and always run on both XS 6.5 and 7.0 
hosts.

Running RHEL6.x guests:

   HVM-boot-policy ( RW):

   HVM-boot-params (MRW):

 HVM-shadow-multiplier ( RW): 1.000

   PV-args ( RW): graphical utf8

 PV-bootloader ( RW): pygrub

Stuck RHEL 6.x guests:

   HVM-boot-policy ( RW): BIOS order

   HVM-boot-params (MRW): order: dc

 HVM-shadow-multiplier ( RW): 1.000

   PV-args ( RW):

 PV-bootloader ( RW):

Why does CloudStack convert just a few RHEL 6.x instances into HVM mode, while 
leaving most in PV mode?
How would I force them back to PV guests?

Thanks for all the help

Yiping
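For an already-stuck guest, the PV/HVM parameter diff shown in this message suggests a manual fix with `xe` on the pool master: clear the HVM boot policy and restore the pygrub PV settings that the working guests show. The sketch below is a dry-run (commands printed, not executed); the UUID is a placeholder, and this should only be attempted with the VM halted, as an assumption-laden last resort rather than a documented procedure.

```shell
# Hedged sketch: flip a stuck HVM guest back to PV by hand, mirroring the
# parameters of the working PV guests above. Dry-run: print, don't execute.
VM_UUID="replace-with-vm-uuid"     # placeholder; get it from xe vm-list
run() { echo "+ $*"; }

run xe vm-param-set uuid="$VM_UUID" HVM-boot-policy=        # empty = PV boot
run xe vm-param-set uuid="$VM_UUID" PV-bootloader=pygrub
run xe vm-param-set uuid="$VM_UUID" PV-args="graphical utf8"
```

Fixing the template's OS type (the workaround in the [Solved] follow-up) is still needed, or CloudStack may set the guest back to HVM on the next start.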


Re: upgraded XenServer host stays in Alert state

2018-03-27 Thread Yiping Zhang
Hi, Dag:

Thanks for reminding me about the backup partition!  Yesterday I was in a bit 
of a panic when I sent that message, because over 100 busy VM instances for QE 
are running on this cluster and this week is our release week!  
I have restored the master from the backup partition (using the XenServer 7.0 
installation ISO) and now both the Xen pool and ACS are happy.  I'll leave this 
cluster alone till after the release frenzy is over!

But seriously, I still would like to know what went wrong this time.  I have 
built two brand-new XenServer 7.0 clusters and upgraded another two clusters 
from 6.5 SP1 to 7.0 so far, for three separate ACS instances (all running 
version 4.9.3.0), without encountering this issue before.

In the log file, I saw following WARN message:

WARN  [c.c.h.x.r.CitrixResourceBase] (DirectAgent-219:ctx-c04388fd) 
callHostPlugin failed for cmd: setIptables with args  due to The requested 
plugin could not be found.

My questions are:
* Which plugin is requested by setIpTables cmd? What is its name and 
expected full path?
* Is this plugin part of ACS or XenServer distribution?
* Where is it missing from, on Xen host or on management server (this 
is unlikely since this ACS instance is managing six other XS 7.0 hosts already)?

Yiping
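Partially answering the questions above: the `setIptables` call is handled by CloudStack's XAPI plugins, which the management server copies onto each XenServer host under `/etc/xapi.d/plugins/` as part of host setup; a fresh OS partition after an ISO upgrade can lose them. The `vmops` plugin name is my assumption of where this particular call lives (check against the install docs for your version). A quick presence check, safe to run anywhere:

```shell
# Check whether CloudStack's XAPI plugins survived the upgrade.
# Plugin directory path is standard XAPI; "vmops" name is an assumption.
PLUGIN_DIR=/etc/xapi.d/plugins
if [ -d "$PLUGIN_DIR" ]; then
    ls "$PLUGIN_DIR" | grep -i vmops || echo "vmops plugin missing from $PLUGIN_DIR"
else
    echo "$PLUGIN_DIR absent (not a XenServer host?)"
fi
```

If the plugins are missing, copying them back per the hypervisor setup section of the installation guide (as Kristian suggests below) and reconnecting the host would be the next step to try.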


On 3/27/18, 12:56 AM, "Dag Sonstebo" <dag.sonst...@shapeblue.com> wrote:

Hi Yiping,

If I remember correctly a full ISO upgrade of a XenServer actually backs up 
the existing version (OS partition), then installs a brand new XS version on 
top before copying settings across from the backup. As a result you are 
effectively looking at more or less a new install – and anything CloudStack 
related may not have been copied across.

So – as Kristian said take a look at the docs and work out which files are 
missing. If this still fails you may need to promote another poolmaster, eject 
the broken host, rebuild and re-add it.

Regards,
Dag Sonstebo
Cloud Architect
ShapeBlue

On 27/03/2018, 08:36, "Kristian Liivak" <k...@wavecom.ee> wrote:

Hi

It's a really good question. I ran into a similar issue.
But did you follow the XenServer upgrade instructions at the end of 
http://docs.cloudstack.apache.org/projects/cloudstack-installation/en/4.11/hypervisor/xenserver.html
  
As I recall, some of the paths where files must be copied have changed and were 
not updated in the documentation.


Lugupidamisega / Regards
 
Kristian Liivak

WaveCom As
Endla 16, 10142 Tallinn
Estonia
Tel: +3726850001
Gsm: +37256850001
E-mail: k...@wavecom.ee
Skype: kristian.liivak
http://www.wavecom.ee
http://www.facebook.com/wavecom.ee


dag.sonst...@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue
  
 

- Original Message -
From: "Yiping Zhang" <yzh...@marketo.com>
To: "users" <users@cloudstack.apache.org>
Sent: Monday, March 26, 2018 11:47:24 PM
Subject: upgraded XenServer host stays in Alert state

Hi, all:



I am upgrading my ACS clusters from XenServer 6.5 to XenServer 7.0.  I 
am on ACS version 4.9.3.0. On this ACS instance, I have another fully 
functioning XenServer 7.0 cluster already.



This time, after I upgraded the pool master, it remains in “Alert” 
state, while all the slave hosts eventually are in “Up” state. Attempts to 
reconnect the host (via UI or API) or restart the management service have no 
effect.



Looking at the catalina.out log, there is an error executing the following 
command on the host:  xe sm-list | grep "resigning of duplicates". What exactly 
does this command do, and how do I fix it?



Note:  I did a manual upgrade of the pool master (from the XenServer 7.0 
ISO image) in order to keep the existing partition table and cluster 
configurations. The following are the error logs from the catalina.out file:



Yiping







INFO  [c.c.h.x.r.CitrixResourceBase] (AgentTaskPool-4:ctx-7e09325c) 
XenServer Version is 7.0.0 for host 10.0.1.18

INFO  [c.c.h.x.r.CitrixResourceBase] (AgentTaskPool-4:ctx-7e09325c) 
Private Network is mgmt for host 10.0.1.18

INFO  [c.c.h.x.r.CitrixResourceBase] (AgentTaskPool-4:ctx-7e09325c) 
Guest Network is mgmt for host 10.0.1.18

INFO  [c.c.h.x.r.CitrixResourceBase] (AgentTaskPool-4:ctx-7e09325c) 
Public Network is mgmt for host 10.0.1.18

ERROR [c.c.u.s.SshHelper] (AgentTaskPool-11:ctx-3ef0dede) SSH execution 
of command xe sm-list 

Re: upgraded XenServer host stays in Alert state

2018-03-27 Thread Yiping Zhang
Hi, Kristian:

Thanks for the link.  I have checked it out, but its contents are quite dated, 
even though the link itself implies it is for ACS 4.11.  
The doc does not mention whether it covers upgrading from 
XenServer 6.5 SP1 to 7.0 with ACS 4.9, so I am somewhat reluctant to 
follow it as is.  Besides, I have already upgraded two separate XenServer 
clusters without any issues by following my own current process, 
including one cluster on the current ACS instance.

Yiping

On 3/27/18, 12:37 AM, "Kristian Liivak" <k...@wavecom.ee> wrote:

Hi

It's a really good question. I ran into a similar issue.
But did you follow the XenServer upgrade instructions at the end of 
http://docs.cloudstack.apache.org/projects/cloudstack-installation/en/4.11/hypervisor/xenserver.html
  
As I recall, some of the paths where files must be copied have changed and were not 
updated in the documentation.


Lugupidamisega / Regards
 
Kristian Liivak

WaveCom As
Endla 16, 10142 Tallinn
Estonia
Tel: +3726850001
Gsm: +37256850001
E-mail: k...@wavecom.ee
Skype: kristian.liivak
http://www.wavecom.ee
http://www.facebook.com/wavecom.ee

- Original Message -
    From: "Yiping Zhang" <yzh...@marketo.com>
To: "users" <users@cloudstack.apache.org>
Sent: Monday, March 26, 2018 11:47:24 PM
Subject: upgraded XenServer host stays in Alert state

Hi, all:



I am upgrading my ACS clusters from XenServer 6.5 to XenServer 7.0.  I am 
on ACS version 4.9.3.0. On this ACS instance, I have another fully functioning 
XenServer 7.0 cluster already.



This time, after I upgraded the pool master, it remains in “Alert” state, 
while all the slave hosts eventually are in “Up” state. Attempts to reconnect 
the host (via UI or API) or restart the management service have no effect.



Looking at the catalina.out log, there is an error executing the following command 
on the host:  xe sm-list | grep "resigning of duplicates". What exactly does 
this command do, and how do I fix it?



Note:  I did a manual upgrade of the pool master (from the XenServer 7.0 ISO 
image) in order to keep the existing partition table and cluster 
configurations. The following are the error logs from the catalina.out file:



Yiping







INFO  [c.c.h.x.r.CitrixResourceBase] (AgentTaskPool-4:ctx-7e09325c) 
XenServer Version is 7.0.0 for host 10.0.1.18

INFO  [c.c.h.x.r.CitrixResourceBase] (AgentTaskPool-4:ctx-7e09325c) Private 
Network is mgmt for host 10.0.1.18

INFO  [c.c.h.x.r.CitrixResourceBase] (AgentTaskPool-4:ctx-7e09325c) Guest 
Network is mgmt for host 10.0.1.18

INFO  [c.c.h.x.r.CitrixResourceBase] (AgentTaskPool-4:ctx-7e09325c) Public 
Network is mgmt for host 10.0.1.18

ERROR [c.c.u.s.SshHelper] (AgentTaskPool-11:ctx-3ef0dede) SSH execution of 
command xe sm-list | grep "resigning of duplicates" has an error

status code in return. Result output:

INFO  [c.c.h.x.d.XcpServerDiscoverer] (AgentTaskPool-11:ctx-3ef0dede) Host: 
 connected with hypervisor type: XenServer. Checking CIDR...

INFO  [c.c.a.m.DirectAgentAttache] (AgentTaskPool-11:ctx-3ef0dede) 
StartupAnswer received 71 Interval = 60

WARN  [c.c.h.x.d.XcpServerDiscoverer] (AgentTaskPool-11:ctx-3ef0dede) 
defaulting to xenserver650 resource for product brand: XenServer with product 
version: 7.0.0

INFO  [c.c.h.x.r.CitrixResourceBase] (DirectAgent-219:ctx-c04388fd) Host 
10.0.1.18 OpaqueRef:3a71d366-1db2-b082-93e0-73a70dd9d409: Host 10.0.1.18 is 
already setup.

INFO  [c.c.h.x.r.CitrixResourceBase] (DirectAgent-219:ctx-c04388fd) Host 
10.0.1.18 OpaqueRef:3a71d366-1db2-b082-93e0-73a70dd9d409: Host 10.0.1.18 is 
already setup.

WARN  [c.c.h.x.r.CitrixResourceBase] (DirectAgent-219:ctx-c04388fd) 
callHostPlugin failed for cmd: setIptables with args  due to The requested 
plugin could not be found.

WARN  [c.c.h.x.r.w.x.CitrixSetupCommandWrapper] 
(DirectAgent-219:ctx-c04388fd) Unable to setup

com.cloud.utils.exception.CloudRuntimeException: callHostPlugin failed for 
cmd: setIptables with args  due to The requested plugin could not be found.

at 
com.cloud.hypervisor.xenserver.resource.CitrixResourceBase.callHostPlugin(CitrixResourceBase.java:340)

at 
com.cloud.hypervisor.xenserver.resource.CitrixResourceBase.setIptables(CitrixResourceBase.java:4555)

at 
com.cloud.hypervisor.xenserver.resource.wrapper.xenbase.CitrixSetupCommandWrapper.execute(CitrixSetupCommandWrapper.java:63)

at 
com.cloud.hypervisor.xenserver.resource.wrapper.xenbase.CitrixSetupCommandWrapper.execute(CitrixSetupCommandWrapper.java:45)

at 
com.cloud.hypervisor.xenserver.resource.wrappe

upgraded XenServer host stays in Alert state

2018-03-26 Thread Yiping Zhang
Hi, all:



I am upgrading my ACS clusters from XenServer 6.5 to XenServer 7.0.  I am on 
ACS version 4.9.3.0. On this ACS instance, I have another fully functioning 
XenServer 7.0 cluster already.



This time, after I upgraded the pool master, it remains in “Alert” state, while 
all the slave hosts eventually are in “Up” state. Attempts to reconnect the 
host (via UI or API) or restart the management service have no effect.



Looking at the catalina.out log, there is an error executing the following command on 
the host:  xe sm-list | grep "resigning of duplicates". What exactly does this 
command do, and how do I fix it?



Note:  I did a manual upgrade of the pool master (from the XenServer 7.0 ISO 
image) in order to keep the existing partition table and cluster 
configurations. The following are the error logs from the catalina.out file:



Yiping







INFO  [c.c.h.x.r.CitrixResourceBase] (AgentTaskPool-4:ctx-7e09325c) XenServer 
Version is 7.0.0 for host 10.0.1.18

INFO  [c.c.h.x.r.CitrixResourceBase] (AgentTaskPool-4:ctx-7e09325c) Private 
Network is mgmt for host 10.0.1.18

INFO  [c.c.h.x.r.CitrixResourceBase] (AgentTaskPool-4:ctx-7e09325c) Guest 
Network is mgmt for host 10.0.1.18

INFO  [c.c.h.x.r.CitrixResourceBase] (AgentTaskPool-4:ctx-7e09325c) Public 
Network is mgmt for host 10.0.1.18

ERROR [c.c.u.s.SshHelper] (AgentTaskPool-11:ctx-3ef0dede) SSH execution of 
command xe sm-list | grep "resigning of duplicates" has an error

status code in return. Result output:

INFO  [c.c.h.x.d.XcpServerDiscoverer] (AgentTaskPool-11:ctx-3ef0dede) Host: 
 connected with hypervisor type: XenServer. Checking CIDR...

INFO  [c.c.a.m.DirectAgentAttache] (AgentTaskPool-11:ctx-3ef0dede) 
StartupAnswer received 71 Interval = 60

WARN  [c.c.h.x.d.XcpServerDiscoverer] (AgentTaskPool-11:ctx-3ef0dede) 
defaulting to xenserver650 resource for product brand: XenServer with product 
version: 7.0.0

INFO  [c.c.h.x.r.CitrixResourceBase] (DirectAgent-219:ctx-c04388fd) Host 
10.0.1.18 OpaqueRef:3a71d366-1db2-b082-93e0-73a70dd9d409: Host 10.0.1.18 is 
already setup.

INFO  [c.c.h.x.r.CitrixResourceBase] (DirectAgent-219:ctx-c04388fd) Host 
10.0.1.18 OpaqueRef:3a71d366-1db2-b082-93e0-73a70dd9d409: Host 10.0.1.18 is 
already setup.

WARN  [c.c.h.x.r.CitrixResourceBase] (DirectAgent-219:ctx-c04388fd) 
callHostPlugin failed for cmd: setIptables with args  due to The requested 
plugin could not be found.

WARN  [c.c.h.x.r.w.x.CitrixSetupCommandWrapper] (DirectAgent-219:ctx-c04388fd) 
Unable to setup

com.cloud.utils.exception.CloudRuntimeException: callHostPlugin failed for cmd: 
setIptables with args  due to The requested plugin could not be found.

at 
com.cloud.hypervisor.xenserver.resource.CitrixResourceBase.callHostPlugin(CitrixResourceBase.java:340)

at 
com.cloud.hypervisor.xenserver.resource.CitrixResourceBase.setIptables(CitrixResourceBase.java:4555)

at 
com.cloud.hypervisor.xenserver.resource.wrapper.xenbase.CitrixSetupCommandWrapper.execute(CitrixSetupCommandWrapper.java:63)

at 
com.cloud.hypervisor.xenserver.resource.wrapper.xenbase.CitrixSetupCommandWrapper.execute(CitrixSetupCommandWrapper.java:45)

at 
com.cloud.hypervisor.xenserver.resource.wrapper.xenbase.CitrixRequestWrapper.execute(CitrixRequestWrapper.java:122)

at 
com.cloud.hypervisor.xenserver.resource.CitrixResourceBase.executeRequest(CitrixResourceBase.java:1693)

at 
com.cloud.agent.manager.DirectAgentAttache$Task.runInContext(DirectAgentAttache.java:315)

at 
org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)

at 
org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)

at 
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)

at 
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)

at 
org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)

at java.util.concurrent.FutureTask.run(FutureTask.java:262)

at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)

at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)

at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:745)

WARN  [c.c.h.x.d.XcpServerDiscoverer] (AgentTaskPool-11:ctx-3ef0dede) Unable to 
setup agent 71 due to callHostPlugin failed for cmd: setIptables with args  due 
to The requested plugin could not be found.

INFO  [c.c.u.e.CSExceptionErrorCode] (AgentTaskPool-11:ctx-3ef0dede) Could not 
find exception: 

RE: EOL for supported OSes & Hypervisors

2018-01-16 Thread Yiping Zhang
From an end user’s perspective, listing just the OS and hypervisor versions’ EOL 
status is only half the story.  We also need to consider Apache CloudStack’s own EOL 
status to help users choose which combination of OS/hypervisor/ACS versions to 
deploy for a stable and fully supported environment.

For example, I am currently on RHEL 6.7 + XenServer 6.5 SP1 + ACS 4.8.0. When 
all things are considered, my only upgrade path is RHEL 6.7 + ACS 
4.9.3 + XenServer 7.0.  But the hotfix for Meltdown/Spectre for XenServer 7.0 is 
not available either!   Therefore, I am stuck with my current environment!

BTW, is there any plan to add support for XenServer 7.1 LTSR to a stable ACS 
version?

Yiping

On 1/12/18, 9:24 AM, "Eric Green"  wrote:

Official EOL for Centos 6 / RHEL 6 as declared by Red Hat Software is 
11/30/2020. Jumping the gun a bit there, padme. 

People on Centos 6 should certainly be working on a migration strategy 
right now, but the end is not here *yet*. Furthermore, the install 
documentation is still written for Centos 6 rather than Centos 7. That needs to 
be fixed before discontinuing support for Centos 6, eh?

> On Jan 12, 2018, at 04:35, Rohit Yadav  wrote:
> 
> +1 I've updated the page with upcoming Ubuntu 18.04 LTS.
> 
> 
> After 4.11, I think 4.12 (assuming releases by mid of 2018) should remove 
"declared" (they might still work with 4.12+ but in docs and by project we 
should officially support them) support for following:
> 
> 
> a. Hypervisor:
> 
> XenServer - 6.2, 6.5,
> 
> KVM - CentOS6, RHEL6, Ubuntu12.04 (I think this is already removed, 
packages don't work I think?)
> 
> vSphere/Vmware - 4.x, 5.0, 5.1, 5.5
> 
> 
> b. Remove packaging for CentOS6.x, RHEL 6.x (the el6 packages), and 
Ubuntu 12.04 (any non-systemd debian distro).
> 
> 
> Thoughts, comments?
> 





Re: why instance must be stopped in order to update its affinity groups?

2018-01-12 Thread Yiping Zhang
Well, I am not a Java developer, so this task is beyond my ability.   But I am 
more than willing to work with someone to come up with a feature 
description/user story, and to test it when it becomes available, if it ever 
reaches that stage.

Thanks

Yiping

On 1/12/18, 7:10 AM, "Paul Angus" <paul.an...@shapeblue.com> wrote:

At a high level, that is quite possible to do.  In practice there would need to be a 
number of safety nets in place; staged moving of VMs is always a little fraught 
without being able to reserve the resources ahead of time.

Are you volunteering to write it?

Kind regards,

Paul Angus

paul.an...@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue
  
 


-Original Message-
    From: Yiping Zhang [mailto:yzh...@marketo.com] 
Sent: 11 January 2018 19:16
To: users@cloudstack.apache.org
Subject: Re: why instance must be stopped in order to update its affinity 
groups?

Paul, Marc:

Thanks for clarifying.

As cloud admin/operator, I do care about the instance’s placement and that 
is why I’d like to apply affinity groups to all instances whenever possible.

It sounds like there are no fundamental technical reasons why a running 
instance’s affinity group membership can’t be updated.  Then why not allow this 
operation?  The logic could be as simple as follows:

If current host placement is compatible with new affinity group’s placement:
then 
   let the update succeed
else
   if auto-migration is true && there is a suitable host to migrate to
   then
  live migrate instance to new host and update instance’s affinity 
group membership
   else
  raise an exception
   end
end

Here “auto-migrate” is controlled by a new global setting parameter, and it 
is for migrating the VM to another host in the same cluster. IOW, it does not 
involve storage migration.  If for some technical reason live migration 
can’t be done here, then that inner “if ... else ... end” block can be reduced 
to just “raise an exception”.

Is this reasonable?
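For concreteness, the proposed flow above can be written as a runnable sketch. All class and method names here (`Group`, `VM`, `live_migrate`, `find_suitable_host`) are invented for illustration; none of this is CloudStack code:

```python
# Hypothetical sketch of the proposed affinity-group update logic.
class Group:
    def __init__(self, allowed_hosts):
        self.allowed_hosts = set(allowed_hosts)

    def is_compatible(self, host):
        # Placement is valid if the host satisfies the group's constraints.
        return host in self.allowed_hosts

class VM:
    def __init__(self, host):
        self.host = host
        self.affinity_group = None

    def live_migrate(self, host):
        # Stand-in for a same-cluster live migration (no storage migration).
        self.host = host

def update_affinity_group(vm, new_group, auto_migrate, find_suitable_host):
    # Case 1: current placement already satisfies the new group.
    if new_group.is_compatible(vm.host):
        vm.affinity_group = new_group
        return "updated"
    # Case 2: optionally live-migrate within the cluster first.
    if auto_migrate:
        target = find_suitable_host(vm, new_group)
        if target is not None:
            vm.live_migrate(target)
            vm.affinity_group = new_group
            return "migrated-and-updated"
    # Case 3: refuse, as the proposal's fallback suggests.
    raise RuntimeError("current placement incompatible with new affinity group")
```

The key design point is that the check-then-migrate happens atomically from the caller's view: the membership only changes once the placement is known to be valid.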

Yiping

On 1/11/18, 12:19 AM, "Marc-Aurèle Brothier" <ma...@exoscale.ch> wrote:

Hi Yiping,

To add to Paul's comment, you also need to understand the goal of the
anti-affinity groups. If your users don't care, you should simply block the
command so that they don't use it (you can list the
createAffinityGroup command as a root-admin-only call in the 
commands.properties
file by changing its flag value).
The goal is to spread a group of VMs, a cluster of a service, so that in
case of a hardware failure on one hypervisor, the cluster can be sure 
that
only one of its instances will go down and the service can keep running.

On Thu, Jan 11, 2018 at 9:01 AM, Paul Angus <paul.an...@shapeblue.com>
wrote:

> Hi Yiping,
>
> Anti-affinity groups deal with the placement of VMs when they are 
started,
> but doesn't/can't 'move' running VMs (it isn't like vSphere DRS).  If 
you
> change a VM's anti-affinity group, it's current placement on a host 
may
> suddenly become invalid.  As the Anti-Affinity group code isn't 
designed to
> move VMs, the safest option is to ensure that the VM is stopped when 
its
> group is changed so that when it is started again, CloudStack can then
> properly decide where it can/should go.
>
>
>
> Kind regards,
>
> Paul Angus
>
> paul.an...@shapeblue.com
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
> @shapeblue
>
>
>
>
> -Original Message-
> From: Yiping Zhang [mailto:yzh...@marketo.com]
> Sent: 10 January 2018 19:51
> To: users@cloudstack.apache.org
> Subject: why instance must be stopped in order to update its affinity
> groups?
>
> Hi, List:
>
> Can someone please explain why a VM instance must be in stopped state 
when
> updating its affinity group memberships?   This requirement is in 
“Feature
> assumptions” section of the original 4.2 design document (
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/
> FS+-+Affinity-Anti-affinity+groups).
>
> My users either don’t understand or don’t care about affinity groups and I
> see a large number of instances with sub-optimal host placement (from an
> anti-host affinity group point of view).  But it is too much trouble for me
> to coordinate with so many users to shut them down in order to fix their
> host placement.  What bad things would happen if a running instance’s
> affinity group is changed?

newly installed cloudmonkey 5.3.3 won't run

2018-01-11 Thread Yiping Zhang
Hi, there:

I just installed the latest cloudmonkey (version 5.3.3) with pip on a RHEL 6.7 
VM, but when I run it, it throws a stack trace:

# pip install cloudmonkey
DEPRECATION: Python 2.6 is no longer supported by the Python core team, please 
upgrade your Python. A future version of pip will drop support for Python 2.6
Collecting cloudmonkey
/usr/lib/python2.6/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:318:
 SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name 
Indication) extension to TLS is not available on this platform. This may cause 
the server to present an incorrect TLS certificate, which can cause validation 
failures. You can upgrade to a newer version of Python to solve this. For more 
information, see 
https://urllib3.readthedocs.io/en/latest/security.html#snimissingwarning.
  SNIMissingWarning
/usr/lib/python2.6/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:122:
 InsecurePlatformWarning: A true SSLContext object is not available. This 
prevents urllib3 from configuring SSL appropriately and may cause certain SSL 
connections to fail. You can upgrade to a newer version of Python to solve 
this. For more information, see 
https://urllib3.readthedocs.io/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
  Using cached cloudmonkey-5.3.3.tar.gz
Requirement already satisfied: Pygments>=1.5 in 
/usr/lib/python2.6/site-packages (from cloudmonkey)
Requirement already satisfied: argcomplete in /usr/lib/python2.6/site-packages 
(from cloudmonkey)
Requirement already satisfied: dicttoxml in /usr/lib/python2.6/site-packages 
(from cloudmonkey)
Requirement already satisfied: prettytable>=0.6 in 
/usr/lib/python2.6/site-packages (from cloudmonkey)
Requirement already satisfied: requests in /usr/lib/python2.6/site-packages 
(from cloudmonkey)
Requirement already satisfied: requests-toolbelt in 
/usr/lib/python2.6/site-packages (from cloudmonkey)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in 
/usr/lib/python2.6/site-packages (from requests->cloudmonkey)
Requirement already satisfied: certifi>=2017.4.17 in 
/usr/lib/python2.6/site-packages (from requests->cloudmonkey)
Requirement already satisfied: urllib3<1.23,>=1.21.1 in 
/usr/lib/python2.6/site-packages (from requests->cloudmonkey)
Requirement already satisfied: idna<2.7,>=2.5 in 
/usr/lib/python2.6/site-packages (from requests->cloudmonkey)
Installing collected packages: cloudmonkey
  Running setup.py install for cloudmonkey ... done
Successfully installed cloudmonkey-5.3.3
#
#
# cloudmonkey
Traceback (most recent call last):
  File "/usr/bin/cloudmonkey", line 5, in 
from pkg_resources import load_entry_point
  File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 2655, in 

working_set.require(__requires__)
  File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 648, in require
needed = self.resolve(parse_requirements(requirements))
  File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 546, in resolve
raise DistributionNotFound(req)
pkg_resources.DistributionNotFound: requests-toolbelt
#

I have verified that requests-toolbelt (0.8.0) is installed.   How do I fix 
this?
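For background, the stack trace comes from `pkg_resources` failing to resolve a dependency declared by the cloudmonkey entry-point script at launch, even when the package files are on disk (typically stale or mismatched egg metadata). A minimal reproduction of the mechanism, not the fix:

```python
# The generated "cloudmonkey" console script calls pkg_resources at startup
# to resolve its declared requirements; DistributionNotFound aborts the launch.
import pkg_resources

def launcher_would_fail(requirement):
    """Return True if pkg_resources cannot resolve `requirement`,
    mimicking what the generated entry-point script does."""
    try:
        pkg_resources.require(requirement)
        return False
    except pkg_resources.DistributionNotFound:
        return True

print(launcher_would_fail("definitely-not-a-real-dist-xyz"))  # True
```

A commonly suggested (but unverified here) remedy is `pip install --force-reinstall requests-toolbelt`, so the metadata that pkg_resources reads is regenerated for the same interpreter.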

Thanks

Yiping



Re: why instance must be stopped in order to update its affinity groups?

2018-01-11 Thread Yiping Zhang
Paul, Marc:

Thanks for clarifying.

As cloud admin/operator, I do care about the instance’s placement and that is 
why I’d like to apply affinity groups to all instances whenever possible.

It sounds like there are no fundamental technical reasons why a running 
instance’s affinity group membership can’t be updated.  Then why not allow this 
operation?  The logic could be as simple as follows:

If current host placement is compatible with new affinity group’s placement:
then 
   let the update succeed
else
   if auto-migration is true && there is a suitable host to migrate to
   then
  live migrate instance to new host and update instance’s affinity group 
membership
   else
  raise an exception
   end
end

Here “auto-migrate” is controlled by a new global setting parameter, and it is 
for migrating the VM to another host in the same cluster. IOW, it does not involve 
storage migration.  If for some technical reason live migration can’t be 
done here, then that inner “if ... else ... end” block can be reduced to just 
“raise an exception”.

Is this reasonable?

Yiping

On 1/11/18, 12:19 AM, "Marc-Aurèle Brothier" <ma...@exoscale.ch> wrote:

Hi Yiping,

To add to Paul's comment, you also need to understand the goal of the
anti-affinity groups. If your users don't care, you should simply block the
command so that they don't use it (you can list the
createAffinityGroup command as a root-admin-only call in the commands.properties
file by changing its flag value).
The goal is to spread a group of VMs, a cluster of a service, so that in
case of a hardware failure on one hypervisor, the cluster can be sure that
only one of its instances will go down and the service can keep running.

On Thu, Jan 11, 2018 at 9:01 AM, Paul Angus <paul.an...@shapeblue.com>
wrote:

> Hi Yiping,
>
> Anti-affinity groups deal with the placement of VMs when they are started,
> but doesn't/can't 'move' running VMs (it isn't like vSphere DRS).  If you
> change a VM's anti-affinity group, it's current placement on a host may
> suddenly become invalid.  As the Anti-Affinity group code isn't designed 
to
> move VMs, the safest option is to ensure that the VM is stopped when its
> group is changed so that when it is started again, CloudStack can then
> properly decide where it can/should go.
>
>
>
> Kind regards,
>
> Paul Angus
>
> paul.an...@shapeblue.com
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
> @shapeblue
>
>
>
>
> -Original Message-
> From: Yiping Zhang [mailto:yzh...@marketo.com]
> Sent: 10 January 2018 19:51
> To: users@cloudstack.apache.org
> Subject: why instance must be stopped in order to update its affinity
> groups?
>
> Hi, List:
>
> Can someone please explain why a VM instance must be in stopped state when
> updating its affinity group memberships?   This requirement is in “Feature
> assumptions” section of the original 4.2 design document (
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/
> FS+-+Affinity-Anti-affinity+groups).
>
> My users either don’t understand or don’t care about affinity groups and I
> see a large number of instances with sub-optimal host placement (from
> anti-host affinity group point of view).  But it is too much trouble for 
me
> to coordinate with so many users to shut them down in order to fix their
> host placement.  What bad things would happen if a running instance’s
> affinity group is changed?
>
> Thanks,
>
> Yiping
>
>




why instance must be stopped in order to update its affinity groups?

2018-01-10 Thread Yiping Zhang
Hi, List:

Can someone please explain why a VM instance must be in stopped state when 
updating its affinity group memberships?   This requirement is in “Feature 
assumptions” section of the original 4.2 design document 
(https://cwiki.apache.org/confluence/display/CLOUDSTACK/FS+-+Affinity-Anti-affinity+groups).

My users either don’t understand or don’t care about affinity groups and I see 
a large number of instances with sub-optimal host placement (from anti-host 
affinity group point of view).  But it is too much trouble for me to coordinate 
with so many users to shut them down in order to fix their host placement.  
What bad things would happen if a running instance’s affinity group is changed?

Thanks,

Yiping



[SOLVED] Re: How to modify Pod start/end IP ?

2017-10-25 Thread Yiping Zhang
Dag,  Nitin:

Thanks very much for your help. With info from both of you, I was able to solve 
the problem.

Based on Dag’s query, without the clause “AND 
cloud.op_dc_ip_address_alloc.reservation_id = cloud.nics.reservation_id)”, I 
found a stale IP reservation belonging to the very first SSVM (long 
gone!), created when the site was first built almost three years ago:

++-+++++
| id | ip_address  | pod_id | nic_id | network_id | name   |
++-+++++
|  7 | 10.0.100.57 |  1 |  6 |200 | s-2-VM |
++-+++++

So, I ran the following update statement to set all the relevant columns to NULL:

mysql> update op_dc_ip_address_alloc set 
nic_id=null,reservation_id=null,taken=null where id=7;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

Once this was done, I was able to modify the reserved IP ranges on the Pod details 
page via the web GUI.  Since this is my lab ACS environment, and over the last 
three years it went through multiple upgrades and was broken numerous times, I am 
not surprised at all that there is some junk remaining in my DB!

Have good day!

Yiping

On 10/25/17, 2:00 AM, "Nitin Kumar Maharana" 
<nitinkumar.mahar...@accelerite.com> wrote:

Hi Yiping,

As Dag said, the table (cloud.op_dc_ip_address_alloc) contains the details of 
each IP, including its reservation status etc.
The fields (reservation_id and taken) indicate whether the IP is allocated to 
anyone: if both values are non-NULL, it is allocated.
Before shrinking, we should make sure the IP is not allocated to 
anyone (it should be free).

We once tried shrinking the range using a DB hack in our dev environment, but 
I won’t recommend it in a production environment.
Basically, you have to reduce the range value in the (cloud.host_pod_ref) table. For 
example, replace the range (10.102.193.170-10.102.193.209) with 
(10.102.193.171-10.102.193.208); it’s just a string. Before doing that, we 
should make sure the IPs being shrunk away are not allocated to anyone.

Next, delete those IP entries from the table (cloud.op_dc_ip_address_alloc). 
Following the example above, we should delete the entries for 10.102.193.170 and 
10.102.193.209. (Note: make sure the reservation_id and taken fields are NULL 
for both IPs.)
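As a hedged sketch of those steps in SQL (pod id, column usage, and IP values are illustrative examples taken from the discussion above, not verified against a live schema; take a DB backup first, and note this is explicitly not recommended for production):

```sql
-- 1. Shrink the stored range string (example values; column assumed from the
--    discussion above, where host_pod_ref holds the range as "start-end").
UPDATE cloud.host_pod_ref
SET description = '10.102.193.171-10.102.193.208'
WHERE id = 1;  -- hypothetical pod id

-- 2. Verify the boundary IPs are unallocated before touching them.
SELECT id, ip_address, reservation_id, taken
FROM cloud.op_dc_ip_address_alloc
WHERE ip_address IN ('10.102.193.170', '10.102.193.209');

-- 3. Delete only rows whose reservation_id and taken were NULL above.
DELETE FROM cloud.op_dc_ip_address_alloc
WHERE ip_address IN ('10.102.193.170', '10.102.193.209')
  AND reservation_id IS NULL AND taken IS NULL;
```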


We have a pending PR (https://github.com/apache/cloudstack/pull/2048) which 
allows a user to perform the operations below:
- Add a mutually exclusive range.
- Delete a range if none of its IPs are allocated.
- List the ranges in the GUI. (Home -> Infrastructure -> Zones -> 
 ->  -> Management -> IP Ranges)

I recommend you try the above feature, which lets you flexibly add 
and delete ranges at any time.


Thanks,
Nitin


On 25-Oct-2017, at 2:01 PM, Dag Sonstebo 
<dag.sonst...@shapeblue.com<mailto:dag.sonst...@shapeblue.com>> wrote:

Hi Yiping,

Which hypervisor do you use? Keep in mind that if you use VMware then every VR 
will also get an IP address in this same range – hence you need enough 
IP addresses to cover all your networks, VPCs and the SSVM+CPVM.

If you want to check in your DB the management IP range is defined in:

- host_pod_ref.description: I think this is cosmetic
- cloud.op_dc_ip_address_alloc: one row per IP address – you have to 
reference “nic_id” and “reservation_id” back to the “nics” table and from there 
reference the “vm_instance” table.
Something like this should work – but please double-check it; it was 
just quickly thrown together:

SELECT
   cloud.op_dc_ip_address_alloc.id,
   cloud.op_dc_ip_address_alloc.ip_address,
   cloud.op_dc_ip_address_alloc.pod_id,
   cloud.op_dc_ip_address_alloc.nic_id,
   cloud.nics.network_id,
   cloud.vm_instance.name
FROM
   cloud.op_dc_ip_address_alloc
LEFT JOIN
   cloud.nics ON (cloud.op_dc_ip_address_alloc.nic_id = cloud.nics.id
   AND cloud.op_dc_ip_address_alloc.reservation_id = 
cloud.nics.reservation_id)
LEFT JOIN
cloud.vm_instance on (cloud.nics.instance_id = cloud.vm_instance.id)
WHERE
   cloud.op_dc_ip_address_alloc.reservation_id IS NOT NULL;

Regards,
Dag Sonstebo
Cloud Architect
ShapeBlue

On 25/10/2017, 01:28, "Yiping Zhang" 
<yzh...@marketo.com<mailto:yzh...@marketo.com>> wrote:

   Hi, all:

    In ACS web GUI, on the Pod Details page, the fields “Start IP” and “End 
IP” are editable.  This IP range is reserved for system VMs (SSVM/CPVM).  When 
I built my ACS environments three years ago, I gave this a large range without 
fully understanding what it is for.  Now I am short of IPs on the management 
network.

How to modify Pod start/end IP ?

2017-10-24 Thread Yiping Zhang
Hi, all:

In ACS web GUI, on the Pod Details page, the fields “Start IP” and “End IP” are 
editable.  This IP range is reserved for system VMs (SSVM/CPVM).  When I built 
my ACS environments three years ago, I gave this a large range without fully 
understanding what it is for.  Now I am short of IPs on the management network, 
and I would like to reduce this reserved range so that I can use the extra IPs 
for other things, like more hypervisor hosts!

When I try to trim some IPs off either end, I get an error message: “The 
specified pod has allocated private IP addresses. So its IP address range can 
be extended only”, even though I know for a fact that my system VMs are not 
using any of the IPs I am trying to free up.

So, I did the obvious thing: disabled the zone and deleted all the SSVMs and 
CPVMs. I even restarted the management service.  However, when I try again to 
edit the “Start IP” and/or “End IP” fields on the Pod details page, I still get the 
same error message, even though I now have zero system VMs! This does not 
seem correct to me.

I guess, at this point, my only option is direct DB modification.  Can 
someone please tell me which table contains this information?  If any of you 
have successfully modified pod start/end IPs without resorting to direct DB hacks, 
please share your approach.

Thanks

Yiping


Re: Retiring OLD Primary Storage - Dealing with System VMs and (assumed) Snapshot related artifacts

2017-06-09 Thread Yiping Zhang
Maybe off topic, but the subject of storage migration (for both primary and 
secondary storage devices) comes up every few months on this list.  I am 
really surprised that there is still NO official documentation on how to do 
this properly, or an SOP for users to follow.  I went through the same thing 
recently and had to search old email threads to come up with a process for doing it.

It’s about the time to produce some SOP for this process!

Just my $0.02.

Yiping

On 6/9/17, 2:23 AM, "Daan Hoogland"  wrote:

Hi David,

I am not licensed to give support on a vendor's proprietary product, so take 
all of this with a grain of the appropriate crystal;

Ad 1: you are right, but migrating to another host should work as well.
Ad 2: There is a table that tells you that a snapshot or image is on a 
certain primary storage, but you'll need to track it there. Easiest is to see if 
the id of the primary storage still occurs in snapshot_store_ref, for example: 
‘SELECT * FROM snapshot_store_ref WHERE store_id = <primary storage id>’. You 
might want to make sure all looks all right in storage_pool, 
storage_pool_host_ref and storage_pool_work as well. And please keep in mind I 
am looking at ACS 4.5, not CP, even though the difference in this area should be 
at most trivial.

On 06/06/2017, 19:10, "David Merrill"  
wrote:

We're in the process of phasing out primary storage in one of our zones 
and had some questions about dealing with: 


1. system VMs (Console Proxy & Secondary Storage) 
2. what looks like (to me), left over snapshot artifacts 

We're running CloudPlatform 4.5.1 with XenServer hypervisors and the 
storage to be retired are Dell Equallogic servers. 

For the first item I think all I need to do is mark the primary storage 
on the Dell EQL's into maintenance mode and then destroy them so that they'll 
come up on the new primary storage (already provisioned & configured in 
CloudPlatform). 

It's the second item that's giving me pause, when looking at the SR 
associated with the primary storage I'm seeing disks named like: 


* ABCServer_ROOT-685_20160905000250 

and in CloudPlatform have found an associated volume snapshot with the 
same date data. This seemed odd because I've understood that volume snapshots 
should be on secondary storage. 

I deleted the snapshot in CloudPlatform thinking that perhaps the disk 
above would disappear from view when looking at the SR in XenCenter, but it's 
still there. The impression I'm getting is that perhaps this is left over from 
some volume snapshot failure to clean up? 

How can I best chase this down? I'd like to retire this storage but I'd 
like to make sure that once it's gone there isn't something in the database 
that's still referring items there. 

Thanks, 
David 

 
A Message From... 
David Merrill - Senior Systems Engineer 


"Uptime. All the time." ® www.reliablenetworks.com 

Fifteen Years In Business 2003 - 2017 

477 Congress Street, Suite 812 | Portland, ME 04101 | ( 207) 772-5678 

private/hybrid cloud hosting | Zimbra groupware | managed services 
proactive maintenance and monitoring | technology consulting 

Maine's only managed services and cloud hosting provider with a 
SOC 2 Type II audit covering Security, Availability and Confidentiality 

This email may contain information that is privileged and confidential. 
If you suspect that you were not intended to receive it, please delete 
it and notify us as soon as possible. Thank you. 



daan.hoogl...@shapeblue.com 
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HSUK
@shapeblue
  
 





Re: SSVM NIO SSL Handshake error

2017-05-23 Thread Yiping Zhang
Hi, Jason:

By any chance, did your “full update” happen to also update your java version 
to IBM java?

Yiping

From: Jason Kinsella 
Reply-To: "users@cloudstack.apache.org" 
Date: Tuesday, May 23, 2017 at 7:19 AM
To: "users@cloudstack.apache.org" 
Subject: Re: SSVM NIO SSL Handshake error

Hi Vivek,

Version is centos-release-6-5.el6.centos.11.2.x86_64

Nss* packages before update are

nss-softokn-freebl-3.14.3-23.3.el6_8.x86_64
nss-tools-3.21.3-2.el6_8.x86_64
nss-util-3.21.3-1.el6_8.x86_64
nss-softokn-3.14.3-23.3.el6_8.x86_64
nss-3.21.3-2.el6_8.x86_64
nss-sysinit-3.21.3-2.el6_8.x86_64
nss-softokn-freebl-3.14.3-23.3.el6_8.i686

I’ve just updated all nss* packages; a restart of the MS and a destroy & replace 
of the SSVM did not help.

I will try a full yum update tomorrow, just in case we missed something last 
time we tried.

Jason

From: Vivek Kumar 
Reply-To: "users@cloudstack.apache.org" 
Date: Tuesday, 23 May 2017 at 11:55 pm
To: "users@cloudstack.apache.org" 
Subject: Re: SSVM NIO SSL Handshake error

Hello Jason,

I also faced this issue earlier with my CentOS 6. Can you confirm the exact 
version of CentOS? If possible, please update all packages related to nss* 
and then restart the management services.

Vivek Kumar
Virtualization and Cloud Consultant

IndiQus Technologies Pvt Ltd
A-98, LGF, C.R.Park, New Delhi - 110019
24x7 +91 11 4055 1409 | M +91 7503460090
www.indiqus.com

On 23-May-2017, at 6:35 PM, Jason Kinsella 
> wrote:

Hi Erik,
Yes - the box is a CentOS 6 box. It hadn’t been updated in a while, but a 
full update did not fix it. Also, the dev server is exactly the same 
CentOS 6 version and is happily running 4.9.2.0.
Thanks,
Jason



On 23/5/17, 10:56 pm, "Erik Weber" 
> wrote:

   What's the OS? Think I saw something like this on some old CentOS 6 machines

   --
   Erik

   On Tue, May 23, 2017 at 2:11 PM, Jason Kinsella
   > wrote:


Hi,
We recently upgraded from 4.5.0 to 4.9.2.0 and encountered a problem with the 
SSVM and Console Proxy. They cannot connect to the management server. The SSVM 
cloud.log repeats this error every couple of seconds.

2017-05-23 11:58:22,461 INFO  [utils.nio.NioClient] (main:null) Connecting to 
192.168.12.1:8250
2017-05-23 11:58:22,465 WARN  [utils.nio.Link] (main:null) This SSL engine was 
forced to close inbound due to end of stream.
2017-05-23 11:58:22,465 ERROR [utils.nio.Link] (main:null) Failed to send 
server's CLOSE message due to socket channel's failure.
2017-05-23 11:58:22,466 ERROR [utils.nio.NioClient] (main:null) SSL Handshake 
failed while connecting to host: 192.168.12.1 port: 8250
2017-05-23 11:58:22,466 ERROR [utils.nio.NioConnection] (main:null) Unable to 
initialize the threads.
java.io.IOException: SSL Handshake failed while connecting to host: 
192.168.12.1 port: 8250
   at com.cloud.utils.nio.NioClient.init(NioClient.java:67)
   at com.cloud.utils.nio.NioConnection.start(NioConnection.java:88)
   at com.cloud.agent.Agent.start(Agent.java:237)
   at com.cloud.agent.AgentShell.launchAgent(AgentShell.java:399)
   at 
com.cloud.agent.AgentShell.launchAgentFromClassInfo(AgentShell.java:367)
   at com.cloud.agent.AgentShell.launchAgent(AgentShell.java:351)
   at com.cloud.agent.AgentShell.start(AgentShell.java:456)
   at com.cloud.agent.AgentShell.main(AgentShell.java:491)
2017-05-23 11:58:22,468 INFO  [utils.exception.CSExceptionErrorCode] 
(main:null) Could not find exception: 
com.cloud.utils.exception.NioConnectionException in error code list for 
exceptions
2017-05-23 11:58:22,468 WARN  [cloud.agent.Agent] (main:null) NIO Connection 
Exception  com.cloud.utils.exception.NioConnectionException: SSL Handshake 
failed while connecting to host: 192.168.12.1 port: 8250

The setup is very simple. Single management server and ports are open.

Things checked / tried:

· Destroyed SSVM multiple times – still same problem.

· SSH to SSVM from MS using ssh -i 
/var/cloudstack/management/.ssh/id_rsa -p 3922 root@IPADDRESS – PASS

· SSVM telnet on 8250 to MS – PASS

I’ve also tested a restore of the DB into our working development 4.9.2.0 
server. It also exhibits the handshake errors, so most likely DB related.
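One quick way to separate a TLS-level failure from plain connectivity is to 
compare connect attempts with handshake failures in cloud.log. A sketch against 
a saved sample (on the real SSVM the file is /var/log/cloud/cloud.log):

```shell
# Two sample lines standing in for the SSVM's cloud.log.
cat > /tmp/cloud.log.sample <<'EOF'
2017-05-23 11:58:22,461 INFO  [utils.nio.NioClient] (main:null) Connecting to 192.168.12.1:8250
2017-05-23 11:58:22,466 ERROR [utils.nio.NioClient] (main:null) SSL Handshake failed while connecting to host: 192.168.12.1 port: 8250
EOF
# If these two counts grow in lockstep, TCP to 8250 works and the failure
# is inside the TLS handshake (pointing at nss/java, not firewalling).
grep -c 'Connecting to'        /tmp/cloud.log.sample   # prints 1
grep -c 'SSL Handshake failed' /tmp/cloud.log.sample   # prints 1
```

Independently, `openssl s_client -connect 192.168.12.1:8250` from the SSVM 
shows the server certificate (or the handshake error) directly.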

I’ve used up all my skills. Please help

Regards,
Jason




Re: Unable to start a VM due to insufficient capacity

2017-04-24 Thread Yiping Zhang
When you say there are enough resources on the hosts, exactly what is the memory 
and CPU allocation on each of them?  Unless you have updated the following 
global settings, the defaults disable VM allocation once a cluster's overall 
CPU or memory allocation exceeds 75%, and you will get an insufficient-resources 
error when creating new instances.

cluster.cpu.allocated.capacity.disablethreshold = 0.75
cluster.memory.allocated.capacity.disablethreshold = 0.75

There are other situations where you could also receive this error, such as 
specifying the wrong network, or tagged resources (storage, host, etc.) being 
unavailable or carrying the wrong tag, and so on.  As given, this error message 
is pretty much useless, to cloud operators and administrators anyway.
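The gist of the disable-threshold check can be sketched in a few lines (a 
simplification, not the actual ACS planner code; the numbers are made up):

```python
def allocation_disabled(allocated: float, total: float,
                        disable_threshold: float = 0.75) -> bool:
    """True when the cluster is over its disable threshold, i.e.
    new VM allocation in this cluster will be refused."""
    return allocated / total >= disable_threshold

# A cluster with 80 of 100 GHz allocated trips the default 0.75 threshold:
print(allocation_disabled(allocated=80.0, total=100.0))          # True
# Raising the threshold (via the global setting) re-enables allocation:
print(allocation_disabled(80.0, 100.0, disable_threshold=0.9))   # False
```

Note this is checked against *allocated* capacity (what offerings reserve), 
not actual usage, which is why a lightly loaded cluster can still refuse VMs.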

Yiping

On 4/21/17, 4:21 PM, "Marty Godsey"  wrote:

Without looking too deeply into this, it seems that there may be a stuck task 
trying to snapshot the VHD.

result:
errorInfo: [OTHER_OPERATION_IN_PROGRESS, VDI, 
OpaqueRef:1d681250-d950-81ec-9f87-f7eb3d8b0cc1, snapshot]
  otherConfig: {}

Again, based on what you sent, I would look at any hung xapi tasks on the 
host in question.

Regards,
Marty Godsey
nSource Solutions

-Original Message-
From: J Andersons 
Sent: Friday, April 21, 2017 2:20 AM
To: users@cloudstack.apache.org
Subject: Unable to start a VM due to insufficient capacity

Hi!
Just like that, it is impossible to start any VM (ACS 4.8, XenServer 6.5): 
"Unable to start a VM due to insufficient capacity". I can see in XenCenter that 
it tries to start the VM on all hosts but fails, even though there are enough 
resources on the hosts.
Maybe it's a XenServer issue, because it is also unable to stop VMs.

In log when starting VM:
2017-04-21 08:50:35,806 WARN  [c.c.h.x.r.CitrixResourceBase]
(DirectAgent-28:ctx-6bbd1566) (logid:91c36627) Task failed! Task 
record: uuid: a9ac4c12-a6ba-6d9a-556f-b5d88f8cd2d3
nameLabel: Async.VM.start_on
  nameDescription:
allowedOperations: []
currentOperations: {}
  created: Fri Apr 21 08:48:34 EEST 2017
 finished: Fri Apr 21 08:50:35 EEST 2017
   status: failure
   residentOn: com.xensource.xenapi.Host@c07df7eb
 progress: 1.0
 type: 
   result:
errorInfo: [OTHER_OPERATION_IN_PROGRESS, VDI, 
OpaqueRef:1d681250-d950-81ec-9f87-f7eb3d8b0cc1, snapshot]
  otherConfig: {}
subtaskOf: com.xensource.xenapi.Task@aaf13f6f
 subtasks: []

2017-04-21 08:50:35,818 WARN  [c.c.h.x.r.CitrixResourceBase]
(DirectAgent-28:ctx-6bbd1566) (logid:91c36627) Unable to start
VM(i-95-292-VM) on host(a6601054-b730-460a-b654-a29f1a5b8a1c) due to 
Task failed! Task record: uuid: 
a9ac4c12-a6ba-6d9a-556f-b5d88f8cd2d3
nameLabel: Async.VM.start_on
  nameDescription:
allowedOperations: []
currentOperations: {}
  created: Fri Apr 21 08:48:34 EEST 2017
 finished: Fri Apr 21 08:50:35 EEST 2017
   status: failure
   residentOn: com.xensource.xenapi.Host@c07df7eb
 progress: 1.0
 type: 
   result:
errorInfo: [OTHER_OPERATION_IN_PROGRESS, VDI, 
OpaqueRef:1d681250-d950-81ec-9f87-f7eb3d8b0cc1, snapshot]
  otherConfig: {}
subtaskOf: com.xensource.xenapi.Task@aaf13f6f
 subtasks: []

Task failed! Task record: uuid: 
a9ac4c12-a6ba-6d9a-556f-b5d88f8cd2d3
nameLabel: Async.VM.start_on
  nameDescription:
allowedOperations: []
currentOperations: {}
  created: Fri Apr 21 08:48:34 EEST 2017
 finished: Fri Apr 21 08:50:35 EEST 2017
   status: failure
   residentOn: com.xensource.xenapi.Host@c07df7eb
 progress: 1.0
 type: 
   result:
errorInfo: [OTHER_OPERATION_IN_PROGRESS, VDI, 
OpaqueRef:1d681250-d950-81ec-9f87-f7eb3d8b0cc1, snapshot]
  otherConfig: {}
subtaskOf: com.xensource.xenapi.Task@aaf13f6f
 subtasks: []

 at

com.cloud.hypervisor.xenserver.resource.CitrixResourceBase.checkForSuccess(CitrixResourceBase.java:461)
 at

com.cloud.hypervisor.xenserver.resource.CitrixResourceBase.startVM(CitrixResourceBase.java:4807)
 at

com.cloud.hypervisor.xenserver.resource.wrapper.xenbase.CitrixStartCommandWrapper.execute(CitrixStartCommandWrapper.java:126)
 at


Re: vm.network.throttling.rate changed from 200 to 1000 but VMs are still limited to 200

2017-03-06 Thread Yiping Zhang
After this change, you need to stop and start your VMs from the GUI or via API calls.
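A hedged cloudmonkey sketch of that sequence (verb and parameter names per the 
ACS API; the VM id is a placeholder). If I recall the precedence correctly, a 
network rate baked into the compute or network offering overrides the global, 
so check that too:

```shell
cloudmonkey update configuration name=vm.network.throttling.rate value=1000
# Restart the management server, then bounce each affected VM so its
# NIC is re-created with the new rate:
cloudmonkey stop virtualmachine id=<vm-uuid>
cloudmonkey start virtualmachine id=<vm-uuid>
```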


On 3/6/17, 1:46 PM, "Nezar Madbouh"  wrote:

Hi,

I changed vm.network.throttling.rate from 200 to 1000 but when I run iperf
between 2 VMs the maximum speed I get is still 200.

Can you please advise why this didn't take effect.

Thanks & Regards,
Nezar.




Question on host metrics (ACS version 4.8.0)

2017-01-26 Thread Yiping Zhang
Hi,

I am looking at the “Home > Infrastructure > Clusters > Cluster Metrics > Host 
Metrics” page, and I am confused by the values in the used-CPU column.

My hosts (running XenServer 6.5 SP1) have 32 cores with a total of 83.10 GHz per 
host. On average, CPU allocation is >100% for all hosts (I enabled CPU 
overprovisioning). However, whenever I check, the reported CPU usage on the 
Host Metrics page has always been <0.5 GHz for each host.  But in XenCenter, on 
each hypervisor’s Performance tab, my hosts’ CPU usage is between 20-30% on 
average across all 32 CPUs for all hosts, with a few hosts going as high as 60% 
on average.

Therefore, I would expect each host's CPU usage on ACS's Host Metrics page to be 
around 15-24 GHz, rather than <0.5 GHz; that's a difference of 30-40 fold.
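The expected range is a back-of-envelope computation from the numbers above 
(not ACS code, just arithmetic):

```python
# 32 cores totalling 83.10 GHz per host; XenCenter reports 20-30% average use.
total_ghz = 83.10
expected_low = total_ghz * 0.20    # GHz implied by 20% utilization
expected_high = total_ghz * 0.30   # GHz implied by 30% utilization
print(f"{expected_low:.1f}-{expected_high:.1f} GHz expected")  # 16.6-24.9 GHz expected
# How far off the <0.5 GHz reading is, at minimum:
print(f"at least {expected_low / 0.5:.0f}x higher than reported")
```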

Did I read it completely wrong somehow?  Are the used-CPU values averaged 
across some long time period when reported in ACS?

Yiping




Xen hypervisor question

2017-01-04 Thread Yiping Zhang
Hi,

This is a XenServer question, but since it is part of my ACS setup, I hope the 
XenServer experts on this list can provide some help.

Every month or so, one of my XenServer pools (6.5 SP1 with most patches 
installed), with ten hypervisor nodes, will go crazy: in CS, only the pool master 
stays in the Up state; all slaves are in either Alert or Connecting state, and 
CS can't perform any VM operations if the VM is running on one of the slaves. On 
the hypervisor CLI, xe commands are extremely slow on the slaves and often just 
fail, but on the pool master xe commands behave normally.  It seems that the 
pool slaves just can't communicate with the master properly.

I have managed to recover the pool each time by switching the pool master to 
another hypervisor (often this step proceeds with great difficulty due to poor 
communication between the master and slaves), followed by running 
xe-toolstack-restart on all pool members.
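For reference, the recovery sequence described above, as a sketch (the host 
uuid is a placeholder taken from `xe host-list` output):

```shell
# On a reachable pool member: promote a healthy host to master.
xe host-list params=uuid,name-label          # pick the new master's uuid
xe pool-designate-new-master host-uuid=<new-master-uuid>
# Then restart the toolstack on every member, finishing with the old master.
xe-toolstack-restart
```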

What is the root cause of this condition? How could I avoid getting into such 
situation in the first place?

Thanks

Yiping



Re: AW: API migrateVirtualMachine does not respect affinity group assignment

2016-11-11 Thread Yiping Zhang

I just filed CLOUDSTACK-9596


On 11/9/16, 1:20 AM, "S. Brüseke - proIO GmbH" <s.brues...@proio.com> wrote:

We run into this "problem" too. Here are my 2 cents:

The API call should respect affinity groups, but an administrator should be 
able to force a migration (force=true).
As an administrator you cannot control (or keep in mind) all affinity 
groups when you need to evacuate a host. At the moment you can run into the 
situation where you migrate a VM to a host where another VM of the same 
affinity group is running; when you then stop that VM, you are unable to start 
it again because the affinity group kicks in.

Mit freundlichen Grüßen / With kind regards,

Swen


-Ursprüngliche Nachricht-
Von: Marc-Aurèle Brothier [mailto:ma...@exoscale.ch] 
Gesendet: Mittwoch, 9. November 2016 08:41
An: users@cloudstack.apache.org
Betreff: Re: API migrateVirtualMachine does not respect affinity group 
assignment

IMHO it's something desirable, because in case of emergency it's better to 
migrate a VM to a host that does not satisfy the anti-affinity group than to 
leave the VM on a host that must be shut down, for example, and lose the VM. 
It's up to the admin to keep this transgression to the shortest amount 
of time.
Those migration API calls are always done by an admin, and therefore should 
take care of such case, which is not very complicated. I have a python script 
that does the job (
https://gist.github.com/marcaurele/dc1774b1ea13d81be702faf235bf2afe) for 
live migration for example.

On Wed, Nov 9, 2016 at 2:47 AM, Simon Weller <swel...@ena.com> wrote:

> Can you open a jira issue on this?
>
> Simon Weller/ENA
> (615) 312-6068
>
> -Original Message-
> From: Yiping Zhang [yzh...@marketo.com]
> Received: Tuesday, 08 Nov 2016, 8:03PM
> To: users@cloudstack.apache.org [users@cloudstack.apache.org]
> Subject: API migrateVirtualMachine does not respect affinity group 
> assignment
>
> Hi,
>
> It seems that the API migrateVirtualMachine does not respect 
> instance’s affinity group assignment.  Is this intentional?
>
> To reproduce:
>
> Assigning two VM instances running on different hosts, say v1 running 
> on
> h1 and v2 running on h2, to the same affinity group.  In GUI, it won’t 
> let you migrate v1 and v2 to the same host, but if you use 
> cloudmonkey,  you are able to move both instances to h1 or h2 with 
> migrateVirtualMachine API call.
>
> IMHO, the API call should return with an error message that the 
> migration is prohibited by affinity group assignment. However, if the 
> current behavior is desirable in some situations, then a parameter 
> like ignore-affinity-group=true should be passed to the API call (or 
> vice versa, depending on which behavior is chosen as the default)
>
> Yiping
>


- proIO GmbH -
Geschäftsführer: Swen Brüseke
Sitz der Gesellschaft: Frankfurt am Main

USt-IdNr. DE 267 075 918
Registergericht: Frankfurt am Main - HRB 86239

Diese E-Mail enthält vertrauliche und/oder rechtlich geschützte 
Informationen. 
Wenn Sie nicht der richtige Adressat sind oder diese E-Mail irrtümlich 
erhalten haben, 
informieren Sie bitte sofort den Absender und vernichten Sie diese Mail. 
Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Mail sind 
nicht gestattet. 

This e-mail may contain confidential and/or privileged information. 
If you are not the intended recipient (or have received this e-mail in 
error) please notify 
the sender immediately and destroy this e-mail.  
Any unauthorized copying, disclosure or distribution of the material in 
this e-mail is strictly forbidden. 






API migrateVirtualMachine does not respect affinity group assignment

2016-11-08 Thread Yiping Zhang
Hi,

It seems that the API migrateVirtualMachine does not respect instance’s 
affinity group assignment.  Is this intentional?

To reproduce:

Assigning two VM instances running on different hosts, say v1 running on h1 and 
v2 running on h2, to the same affinity group.  In GUI, it won’t let you migrate 
v1 and v2 to the same host, but if you use cloudmonkey,  you are able to move 
both instances to h1 or h2 with migrateVirtualMachine API call.

IMHO, the API call should return an error message saying that the migration is 
prohibited by the affinity group assignment. However, if the current behavior is 
desirable in some situations, then a parameter like ignore-affinity-group=true 
should be passed to the API call (or vice versa, depending on which behavior is 
chosen as the default)
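For reference, the cloudmonkey reproduction looks roughly like this (verb and 
parameter names per the ACS API; the UUIDs are placeholders):

```shell
# v1 and v2 share an anti-affinity group; v2 already runs on host h2.
cloudmonkey list virtualmachines keyword=v1 filter=id,hostid
# This succeeds even though it co-locates the two group members:
cloudmonkey migrate virtualmachine virtualmachineid=<v1-uuid> hostid=<h2-uuid>
```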

Yiping


updating network throttling rate

2016-10-25 Thread Yiping Zhang
Hi,

I have updated global settings for network.throttling.rate and 
vm.network.throttling.rate from 200 to 500 Mbps, and restarted mgmt. service.  
Now I restarted two of my test VM instances, but using iperf3 I still see these 
two test VM instances got 200 Mbps network bandwidth.  What else do I need to 
do to increase their network bandwidth?

I know I could create a new service offering with a higher network rate and 
apply it to my instances, but I just want all my instances to use the updated 
network rate without creating duplicates of all my existing service offerings.

Thanks

Yiping


Re: CloudStack Anti-Affinity

2016-10-06 Thread Yiping Zhang
Starting a VM instance will end in an error if there is no host suitable for it.  
We normally limit the number of VMs in an affinity group to one less than the 
number of hosts in the cluster. 

Let’s say I have 5 instances for my app, and I have 5 hosts in my cluster.  I 
will only put 4 instances in an affinity group and let the 5th instance float.  
That way, if one of my hypervisor hosts goes down, all 5 instances will still 
be able to run on the remaining 4 hosts (assuming all my instances use a 
service offering with HA enabled).
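The sizing rule above can be sketched as (a simplification; the function name 
is mine, not an ACS API):

```python
def max_group_size(num_hosts: int) -> int:
    """Largest anti-affinity group that still fits after one host failure."""
    return num_hosts - 1

hosts = 5
grouped = max_group_size(hosts)   # 4 VMs pinned to distinct hosts
# After one host fails, 4 hosts remain: still one per grouped VM, and the
# 5th ("floating") instance may share a host with any of them.
print(grouped, "grouped +", hosts - grouped, "floating on", hosts - 1, "surviving hosts")
```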

On 10/4/16, 3:43 PM, "ilya"  wrote:

I'm curious if anyone uses Anti-Affinity and how restrictive is it?
Meaning what happens if there are no nodes to satisfy the anti-affinity
rule? Would it still place it or deny and throw 530?

Any help is appreciated

Thanks,
ilya




Re: Adding Hosts to XenCenter Pool

2016-09-08 Thread Yiping Zhang

On the switch side, you need to configure the port channel for LACP passive 
mode.  On the XenServer side, you need to remove all bond configurations 
and assign the mgmt. IP directly to NIC4.  Once Xen4 joins the pool, its mgmt. 
IP will be assigned to Bond4+5, and you can manually assign IPs to the other 
newly created bonds on Xen4.

Yiping


On 9/8/16, 11:08 AM, "Jeremy Peterson" <jpeter...@acentek.net> wrote:

Xen4 is before joining the pool.

My question is how will I have mgmt?  I have a port channel setup for the 
bonded interfaces if I break the bond my port channel won't come up.

Jeremy


-Original Message-
    From: Yiping Zhang [mailto:yzh...@marketo.com] 
Sent: Thursday, September 8, 2016 12:56 PM
To: users@cloudstack.apache.org
Subject: Re: Adding Hosts to XenCenter Pool

Is the xen4 config shown from before or after xen4 has joined the pool?  
If it is after, then I don’t see any problem, but if it is before, then you 
need to wipe out its bond settings first.

As I said, the incoming server, i.e. the server joining the pool (in this 
case Xen4), should NOT have any bond / NIC settings, except the mgmt IP 
address on NIC4.

Yiping

On 9/8/16, 10:23 AM, "Jeremy Peterson" <jpeter...@acentek.net> wrote:

Ok so I created a new pool.  Added Xen3

Then tried to add Xen4 once that works I'll add Xen5 Xen6 then add to 
cloudstack.

http://screencast.com/t/LNPu9XchUU 

You can see Xen3's config and Xen4

Xen3 is pool master.  I need to add Xen4 and more still same error.

Jeremy


-Original Message-
From: Yiping Zhang [mailto:yzh...@marketo.com] 
Sent: Thursday, September 8, 2016 11:48 AM
To: users@cloudstack.apache.org
Subject: Re: Adding Hosts to XenCenter Pool

The problem is that for Xen1 and Xen2, the mgmt. interface is NIC4, 
while for Xen3, the mgmt. interface is Bond4+5.  According to docs, incoming 
server should not have any bond configurations, it will automatically get those 
from the pool once it becomes a pool member.

What you need to do is 1) wipe out all bond assignments on Xen3; 2) 
assign mgmt. IP to NIC4,  3) join Xen3 into the pool.  You can assign any 
storage IPs for Xen3 after it has joined the pool.

Yiping

PS:  if this is a brand new pool,  I would recommend that you rebuild 
the pool so that the mgmt. NIC is also a bond.

On 9/8/16, 8:43 AM, "Jeremy Peterson" <jpeter...@acentek.net> wrote:

Ok so do I have to change my existing pool hosts to have a bond? Is 
that the issue ? 

Jeremy


-Original Message-
From: Jeremy Peterson [mailto:jpeter...@acentek.net] 
Sent: Thursday, September 8, 2016 10:41 AM
To: users@cloudstack.apache.org
Subject: RE: Adding Hosts to XenCenter Pool

I changed the "name" of the Bond 4+5 to MGMT to match what my pool 
has, and got the same error.

Jeremy


-Original Message-
From: Jeremy Peterson [mailto:jpeter...@acentek.net]
Sent: Thursday, September 8, 2016 10:03 AM
To: users@cloudstack.apache.org
Subject: RE: Adding Hosts to XenCenter Pool

Super XenServer related but G!
http://screencast.com/t/2qpuXJaR9rhU

Xen3 is the server I want to add to the pool of xen1 and xen2

I see my management is set up on all three; IPs are reachable, since 
my XenCenter server can see everything for both groups of servers.  

Only difference is Management is called MGMT as network label on my 
current pool and Bond 4+5 on the new hosts I want to add.

http://screencast.com/t/sBiNU9XAtr

That is the error I get when I try to join the pool.

The server joining the pool must have a physical management NIC  
(i.e. the management NIC must not be on a VLAN or bonded PIF)


Jeremy


-Original Message-
From: Stephan Seitz [mailto:s.se...@secretresearchfacility.com]
Sent: Thursday, September 8, 2016 5:17 AM
To: users@cloudstack.apache.org
Subject: Re: Adding Hosts to XenCenter Pool

Am Mittwoch, den 07.09.2016, 22:57 + schrieb Jeremy Peterson:
> So I am running XenCenter and I am trying to create a second pool 
to 
> add hosts to but when I join the pool the erro

Re: Adding Hosts to XenCenter Pool

2016-09-08 Thread Yiping Zhang
Is the xen4 config shown from before or after xen4 has joined the pool?  If 
it is after, then I don’t see any problem, but if it is before, then you need 
to wipe out its bond settings first.
to wipe out its bond settings first.

As I said, the incoming server, i.e. the server joining the pool (in this 
case Xen4), should NOT have any bond / NIC settings, except the mgmt IP 
address on NIC4.

Yiping

On 9/8/16, 10:23 AM, "Jeremy Peterson" <jpeter...@acentek.net> wrote:

Ok so I created a new pool.  Added Xen3

Then tried to add Xen4 once that works I'll add Xen5 Xen6 then add to 
cloudstack.

http://screencast.com/t/LNPu9XchUU 

You can see Xen3's config and Xen4

Xen3 is pool master.  I need to add Xen4 and more still same error.

Jeremy


-Original Message-
    From: Yiping Zhang [mailto:yzh...@marketo.com] 
Sent: Thursday, September 8, 2016 11:48 AM
To: users@cloudstack.apache.org
Subject: Re: Adding Hosts to XenCenter Pool

The problem is that for Xen1 and Xen2, the mgmt. interface is NIC4, while 
for Xen3, the mgmt. interface is Bond4+5.  According to docs, incoming server 
should not have any bond configurations, it will automatically get those from 
the pool once it becomes a pool member.

What you need to do is 1) wipe out all bond assignments on Xen3; 2) assign 
mgmt. IP to NIC4,  3) join Xen3 into the pool.  You can assign any storage IPs 
for Xen3 after it has joined the pool.

Yiping

PS:  if this is a brand new pool,  I would recommend that you rebuild the 
pool so that the mgmt. NIC is also a bond.

On 9/8/16, 8:43 AM, "Jeremy Peterson" <jpeter...@acentek.net> wrote:

Ok so do I have to change my existing pool hosts to have a bond? Is 
that the issue ? 

Jeremy


-Original Message-
From: Jeremy Peterson [mailto:jpeter...@acentek.net] 
Sent: Thursday, September 8, 2016 10:41 AM
To: users@cloudstack.apache.org
Subject: RE: Adding Hosts to XenCenter Pool

I changed the "name" of the Bond 4+5 to MGMT to match what my pool has 
and got the same error.

Jeremy


-Original Message-
From: Jeremy Peterson [mailto:jpeter...@acentek.net]
Sent: Thursday, September 8, 2016 10:03 AM
To: users@cloudstack.apache.org
Subject: RE: Adding Hosts to XenCenter Pool

Super XenServer related but G!
http://screencast.com/t/2qpuXJaR9rhU

Xen3 is the server I want to add to the pool of xen1 and xen2

I see my management is set up on all three; IPs are reachable, since my 
XenCenter server can see everything for both groups of servers.  

Only difference is Management is called MGMT as network label on my 
current pool and Bond 4+5 on the new hosts I want to add.

http://screencast.com/t/sBiNU9XAtr

That is the error I get when I try to join the pool.

The server joining the pool must have a physical management NIC  (i.e. 
the management NIC must not be on a VLAN or bonded PIF)


Jeremy


-Original Message-
From: Stephan Seitz [mailto:s.se...@secretresearchfacility.com]
Sent: Thursday, September 8, 2016 5:17 AM
To: users@cloudstack.apache.org
Subject: Re: Adding Hosts to XenCenter Pool

Am Mittwoch, den 07.09.2016, 22:57 + schrieb Jeremy Peterson:
> So I am running XenCenter and I am trying to create a second pool to 
> add hosts to but when I join the pool the errors are coming up
> 
> The server joining the pool must have a physical management NIC  
(i.e. 
> the management NIC must not be on a VLAN or bonded PIF)

Indeed, this is XenServer-specific.
Start with one host meant to be the initial pool-master. That is: configure 
these trunks and networks on just this single host. Don't forget to label these 
networks to match the respective ACS labels.
If you'd like, add this host to ACS. This can be done now or later.
Additional hosts meant as additional pool-members shouldn't be 
configured. Just do a simple installation and define the host's management IP 
*on one of the meant-to-be MGMT trunk-ports*.
Join the additional host to the pool-master and you're done. The pool 
configuration will be populated to all new pool-members. I'd recommend using 
identical NICs on every host - that's way easier.


> I have 6 nics
> 
> 2 on board 1GB NIC ( LACP BOND with port channel to two different 
> nexus 5k) MGMT
> 2 10GB NIC ( LACP BOND with port channel to two different nexus 5k) 
> Primary storage
 

Re: Adding Hosts to XenCenter Pool

2016-09-08 Thread Yiping Zhang
The problem is that for Xen1 and Xen2, the mgmt. interface is NIC4, while for 
Xen3, the mgmt. interface is Bond4+5.  According to docs, incoming server 
should not have any bond configurations, it will automatically get those from 
the pool once it becomes a pool member.

What you need to do is 1) wipe out all bond assignments on Xen3; 2) assign 
mgmt. IP to NIC4,  3) join Xen3 into the pool.  You can assign any storage IPs 
for Xen3 after it has joined the pool.

Yiping

PS:  if this is a brand new pool,  I would recommend that you rebuild the pool 
so that the mgmt. NIC is also a bond.

On 9/8/16, 8:43 AM, "Jeremy Peterson"  wrote:

Ok so do I have to change my existing pool hosts to have a bond? Is that 
the issue ? 

Jeremy


-Original Message-
From: Jeremy Peterson [mailto:jpeter...@acentek.net] 
Sent: Thursday, September 8, 2016 10:41 AM
To: users@cloudstack.apache.org
Subject: RE: Adding Hosts to XenCenter Pool

I changed my "name" of the Box 4+5 to MGMT to match what my pool has and 
same thing error.

Jeremy


-Original Message-
From: Jeremy Peterson [mailto:jpeter...@acentek.net]
Sent: Thursday, September 8, 2016 10:03 AM
To: users@cloudstack.apache.org
Subject: RE: Adding Hosts to XenCenter Pool

Super XenServer related but G!
http://screencast.com/t/2qpuXJaR9rhU

Xen3 is the server I want to add to the pool of xen1 and xen2

I see my management is set up on all three; IPs are reachable, since my 
XenCenter server can see everything for both groups of servers.  

Only difference is Management is called MGMT as network label on my current 
pool and Bond 4+5 on the new hosts I want to add.

http://screencast.com/t/sBiNU9XAtr

That is the error I get when I try to join the pool.

The server joining the pool must have a physical management NIC  (i.e. the 
management NIC must not be on a VLAN or bonded PIF)


Jeremy


-Original Message-
From: Stephan Seitz [mailto:s.se...@secretresearchfacility.com]
Sent: Thursday, September 8, 2016 5:17 AM
To: users@cloudstack.apache.org
Subject: Re: Adding Hosts to XenCenter Pool

Am Mittwoch, den 07.09.2016, 22:57 + schrieb Jeremy Peterson:
> So I am running XenCenter and I am trying to create a second pool to 
> add hosts to but when I join the pool the errors are coming up
> 
> The server joining the pool must have a physical management NIC  (i.e. 
> the management NIC must not be on a VLAN or bonded PIF)

Indeed, this is XenServer-specific.
Start with one host meant to be the initial pool-master. That is: configure 
these trunks and networks on just this single host. Don't forget to label these 
networks to match the respective ACS labels.
If you'd like, add this host to ACS. This can be done now or later.
Additional hosts meant as additional pool-members shouldn't be configured. 
Just do a simple installation and define the host's management IP *on one of the 
meant-to-be MGMT trunk-ports*.
Join the additional host to the pool-master and you're done. The pool 
configuration will be populated to all new pool-members. I'd recommend using 
identical NICs on every host - that's way easier.


> I have 6 nics
> 
> 2 on board 1GB NIC ( LACP BOND with port channel to two different 
> nexus 5k) MGMT
> 2 10GB NIC ( LACP BOND with port channel to two different nexus 5k) 
> Primary storage
> 2 10GB NIC ( LACP BOND with port channel to two different nexus 5k) 
> Sec Storage & Guest & Public traffic
> 
> Everything looks good outside of a xenserver pool the minute I want to 
> add to a pool I get the above error.
> 
> I am on CS 4.5.0 XS 6.5
> 
> Maybe I should ask XenServer about this, but I use CloudStack, and I saw 
> someone talk about XenServer MGMT LACP a couple of weeks ago but I don't 
> think I saw a good answer on whether it works or not.
> 
> I'm confused why it doesn't work.






Re: All Xen hosts except poolmaster stuck in alerting/connecting state

2016-09-08 Thread Yiping Zhang
After we switched the pool master to another host with the command “xe 
pool-designate-new-master host-uuid=”, the CloudStack mgmt. server eventually 
connected to all hosts successfully, with all hosts in the Running state.

So I can only speculate that something is wrong with the Xen cluster itself, 
but I have no direct evidence from the Xen server logs to support this claim.  
Once the master was switched over, I restarted the Xen toolstack on the old 
master node just to be safe.
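For reference, the switch-over looks roughly like this in xe terms (the uuid is a placeholder; run the last command on the old master host):

```shell
# Find the uuid of the host that should become the new pool master.
xe host-list

# Promote it; pool members reconnect to the new master.
xe pool-designate-new-master host-uuid=<new-master-uuid>

# Afterwards, on the old master, restart the toolstack just to be safe.
xe-toolstack-restart
```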

Yiping

On 9/7/16, 7:15 PM, "Ezequiel Mc Govern" <ezequiel.mcgov...@gmail.com> wrote:

We have the same problem,one Pool get in connecting or Alert state.

The configuration is the same.

> On Sep 7, 2016, at 17:50, Yiping Zhang <yzh...@marketo.com> wrote:
> 
> Hi,
> 
> In one of our xen clusters, all hosts except the pool master are stuck in 
alerting or connecting state.  There are no scary log entries in catalina.out 
or management-server.log.
> 
> I have tried to restart management service already, but it has no effect. 
What other steps can I take?   I am thinking of restarting xen toolstack on the 
pool master (already tried on one of slave host, no effect), is this a safe 
operation to do with tons of VM running on all hosts and the pool master?
> 
> Yiping
> 
> PS: we are running CS 4.8.0 on RHEL 6.7 and Xen 6.5 with all recommended 
security patches installed.





All Xen hosts except poolmaster stuck in alerting/connecting state

2016-09-07 Thread Yiping Zhang
Hi,

In one of our xen clusters, all hosts except the pool master are stuck in 
alerting or connecting state.  There are no scary log entries in catalina.out 
or management-server.log.

I have tried to restart management service already, but it has no effect. What 
other steps can I take?   I am thinking of restarting xen toolstack on the pool 
master (already tried on one of slave host, no effect), is this a safe 
operation to do with tons of VM running on all hosts and the pool master?

Yiping

PS: we are running CS 4.8.0 on RHEL 6.7 and Xen 6.5 with all recommended 
security patches installed.


Re: Cloudstack - volume migration between clusters (primary storage)

2016-08-30 Thread Yiping Zhang
I think the work is mostly performed by the hypervisors. I have seen the 
following during a storage live migration in XenCenter:

Highlight the primary storage for the departing cluster, then select the 
“Storage” tab on the right side panel.  You should see disk volumes on that 
primary storage. The far right column is the “Virtual Machine” the disk belongs 
to.

While the live storage migration is running, the migrating volume is shown as 
attached to a VM with the name “control domain for host xxx”, instead of the VM 
name it actually belongs to.

To me, this is pretty convincing that Xen cluster is doing the migration.

Yiping

On 8/27/16, 5:10 AM, "Makrand"  wrote:

Hello ilya,

If I am not mistaken, an NFS server IP and path are all one specifies in
CloudStack when adding secondary storage. Running df -h on the ACS management
server shows the secondary storage mounted there. I don't think the hypervisor
sees that NFS share (even if primary storage and NFS come from the same storage
box). Plus, during activities like VM deploys and snapshots, things always move
from secondary to primary storage via the SSVM.

Have you actually seen any setup where you have verified this?

@ cs user,
When you're moving the volumes, are they attached to a running VM, or are they
just standalone orphan volumes?



--
Makrand


On Thu, Aug 25, 2016 at 4:24 AM, ilya  wrote:

> Not certain how Xen Storage Migration is implemented in 4.5.2
>
> I'd suspect legacy mode would be
>
> 1) copy disks from primary store to secondary NFS
> 2) copy disks from secondary NFS to new primary store
>
> it might be slow... but if you have enough space - it should work...
>
> My understanding is that NFS is mounted directly on hypervisors. I'd ask
> someone else to confirm though...
>
> On 8/24/16 7:20 AM, cs user wrote:
> > Hi All,
> >
> > Xenserver 6.5, cloudstack 4.5.2. NFS primary storage volumes
> >
> > Lets say I have 1 pod, with 2 clusters, each cluster has its own primary
> > storage.
> >
> > If I migrate a volume from one primary storage to the other one, using
> > cloudstack, what aspect of the environment is responsible for this copy?
> >
> > I'm trying to identify bottlenecks but I can't see what is responsible
> for
> > this copying. Is it is the xen hosts themselves or the secondary storage
> vm?
> >
> > Thanks!
> >
>




Re: Mess after volume migration.

2016-08-09 Thread Yiping Zhang
I encountered the same problem a few months ago.  With help from this list, I 
fixed my problems without any data loss, and posted my solution on the list.  
If you search for the subject line “corrupt DB after VM live migration with 
storage migration”, you should see my posts.

Good luck

Yiping

On 8/9/16, 3:30 AM, "Makrand"  wrote:

Ilya,

The point to note is that my job didn't fail because of the timeout, but
rather because of some VDI problem at the XenServer level, with the exception
below:

[SR_BACKEND_FAILURE_80, , Failed to mark VDI hidden [opterr=SR
96e879bf-93aa-47ca-e2d5-e595afbab294: error aborting existing process]]

I am still digging into this error in SMlog etc. on the XenServer. But in
reality the volume was migrated, and I think that's what matters.


I did, of course, face timeout errors during initial testing, and after some
trial and error I realised that there is a not-so-aptly-named parameter called
*wait* (default value 1800) that also needs to be modified to make the timeout
errors go away.

So all in all I modified parameters as below:-

migratewait: 36000
storage.pool.max.waitseconds: 36000
vm.op.cancel.interval: 36000
vm.op.cleanup.wait: 36000
wait: 18000
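For anyone following along, these are CloudStack global settings, so one way to apply them is via cloudmonkey's updateConfiguration API call. This is a sketch only: it assumes cloudmonkey is already configured with admin API keys, and most global settings require a management server restart to take effect.

```shell
# Apply the timeout values quoted above (names/values from this thread).
cloudmonkey update configuration name=migratewait value=36000
cloudmonkey update configuration name=storage.pool.max.waitseconds value=36000
cloudmonkey update configuration name=vm.op.cancel.interval value=36000
cloudmonkey update configuration name=vm.op.cleanup.wait value=36000
cloudmonkey update configuration name=wait value=18000
```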





--
Best,
Makrand


On Tue, Aug 9, 2016 at 6:07 AM, ilya  wrote:

> this happened to us on non XEN hypervisor as well.
>
> CloudStack has a timeout for long-running jobs, which I assume has been
> exceeded in your case.
>
> Changing the volumes table to reference the proper pool_id should be enough.
> Just make sure that the data size matches on both ends.
>
> consider changing
> "copy.volume.wait" (if that does not help) also "vm.job.timeout"
>
>
> Regards
> ilya
>
> On 8/8/16 3:54 AM, Makrand wrote:
> > Guys,
> >
> > My setup:- ACS 4.4.2. Hypervisor: XENserver 6.2.
> >
> > I tried moving a volume in running VM from primary storage A to primary
> > storage B (using GUI of cloudstack). Please note, primary storage A LUN
> > (LUN7)is coming out of one storage box and  primary storage  B LUN
> (LUN14)
> > is from another.
> >
> > For VM1 with 250GB data volume (51 GB used space), I was able to move
> this
> > volume without any glitch in about 26mins.
> >
> > But for VM2 with a 250 GB data volume (182 GB used space), the migration
> > continued for about ~110 mins and then failed at the very end with the
> > following exception:-
> >
> > 2016-08-06 14:30:57,481 WARN  [c.c.h.x.r.CitrixResourceBase]
> > (DirectAgent-192:ctx-5716ad6d) Task failed! Task record:
> > uuid: 308a8326-2622-e4c5-2019-3beb
> > 87b0d183
> >nameLabel: Async.VDI.pool_migrate
> >  nameDescription:
> >allowedOperations: []
> >currentOperations: {}
> >  created: Sat Aug 06 12:36:27 UTC 2016
> > finished: Sat Aug 06 14:30:32 UTC 2016
> >   status: failure
> >   residentOn: com.xensource.xenapi.Host@f242d3ca
> > progress: 1.0
> > type: 
> >   result:
> >errorInfo: [SR_BACKEND_FAILURE_80, , Failed to mark VDI 
hidden
> > [opterr=SR 96e879bf-93aa-47ca-e2d5-e595afbab294: error aborting existing
> > process]]
> >  otherConfig: {}
> >subtaskOf: com.xensource.xenapi.Task@aaf13f6f
> > subtasks: []
> >
> >
> > So CloudStack just removed the job and reported it as failed, according to
> > the management server log.
> >
> > A) But when I check at the hypervisor level, the volume is on the new SR,
> > i.e. on LUN14. Strange, huh? So the new uuid for this volume from the XE
> > cli is:
> >
> > [root@gcx-bom-compute1 ~]# xe vbd-list
> > vm-uuid=3fcb3070-e373-3cf9-d0aa-0a657142a38d
> > uuid ( RO) : f15dc54a-3868-8de8-5427-314e341879c6
> >   vm-uuid ( RO): 3fcb3070-e373-3cf9-d0aa-0a657142a38d
> > vm-name-label ( RO): i-22-803-VM
> >  vdi-uuid ( RO): cc1f8e83-f224-44b7-9359-282a1c1e3db1
> > empty ( RO): false
> >device ( RO): hdb
> >
> > B) But luckily I had the entry taken before migration  and it shows
> like:-
> >
> > uuid ( RO) : f15dc54a-3868-8de8-5427-314e341879c6
> > vm-uuid ( RO): 3fcb3070-e373-3cf9-d0aa-0a657142a38d
> > vm-name-label ( RO): i-22-803-VM
> > vdi-uuid ( RO): 7c073522-a077-41a0-b9a7-7b61847d413b
> > empty ( RO): false
> > device ( RO): hdb
> >
> > C) Since this failed at the CloudStack level, the DB is still holding the
> > old values. Here is the current volumes table entry in the DB:
> >
> > id: 1004
> >> account_id: 22
> >>   

Re: Migrate vm from local storage to shared - change service offering

2016-07-05 Thread Yiping Zhang
Before starting the VM, you should also switch the service offering of the 
migrated VM in the UI, picking one that uses “shared” storage instead of “local” 
storage.


On 7/5/16, 12:21 AM, "cs user"  wrote:

Hi Shweta,

That's interesting. Yep, with 4.5.2 I am able to power down a VM and migrate
the root volume from local storage to NFS shared storage. The problem is that
the VM ends up in a state of limbo with regard to the service offering and HA.

Would there be any ill effects of manually going into the database and
changing the service offering and setting HA to be true? I assume I'd only
need to update the vm_instance table.
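For reference, the manual change being discussed might look roughly like this. This is a sketch only: the id values are placeholders, and while `service_offering_id` and `ha_enabled` are columns I believe exist on `vm_instance` in 4.5.x, verify them against your own schema and take a backup first. As noted elsewhere in the thread, this path is unsupported.

```shell
# Always back up the cloud database first (credentials omitted here).
mysqldump cloud > cloud-backup.sql

mysql cloud <<'SQL'
-- 1234 and 42 are placeholder ids: the stopped VM's row in vm_instance
-- and the id of an HA-enabled service offering, respectively.
UPDATE vm_instance
   SET service_offering_id = 42,
       ha_enabled = 1
 WHERE id = 1234
   AND state = 'Stopped';
SQL
```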

Thanks!

On Tue, Jul 5, 2016 at 5:47 AM, Shweta Agarwal <
shweta.agarw...@accelerite.com> wrote:

> Cold migration of a VM/volume from local to NFS storage, or vice versa, is
> not supported in CloudStack. If you are able to do this, it's a bug.
>
>
> Thanks
> Shweta
>
> On 7/4/16, 5:33 PM, "cs user"  wrote:
>
> >Hi Folks,
> >
> >This question relates to cloudstack 4.5.2. I know this is a bit behind now
> >but it's still quite stable for us :-)
> >
> >The problem I have is that lets say I do the following:
> >
> >1. Launch a vm using a service offering which does not have shared
> storage,
> >and does not offer ha (local only).
> >
> >2. I then shutdown this vm and manually migrate the root volume to shared
> >storage.
> >
> >3. I then want to enable ha.
> >
> >
> >The difficulty here is that when you go to change the service offering
> >within cloudstack, it still only offers service offerings which don't have
> >HA. There doesn't seem to be a way to transition from one state to
> another,
> >at least using only the GUI.
> >
> >Is there a way to do this using cloudmonkey ?
> >
> >If not, what steps could be followed to change the config of this vm
> within
> >the DB?
> >
> >Thanks!
>
>
>
>




Re: Can't stop mgmt server cleanly for CS 4.8.0

2016-06-10 Thread Yiping Zhang
RHEL 6.7 / java-1.7.0-openjdk-1.7.0.85-2.6.1.3 / tomcat6-6.0.24-90

On 6/10/16, 12:49 PM, "Marc-Andre Jutras" <mar...@marcuspocus.com> wrote:

>which java / tomcat / centos version you're running on ?
>
>
>On 2016-06-10 3:41 PM, Yiping Zhang wrote:
>> Hi, all:
>>
>> We have a cron job to restart mgmt. service once a week. However, after we 
>> upgraded to CS 4.8.0, the cron job would leave the service stopped and 
>> Nagios starts to page oncall.  We traced the problem to that the mgmt. 
>> service won’t stop cleanly, even when we try to stop it manually from CLI:
>>
>> # service cloudstack-management stop
>> Stopping cloudstack-management:[FAILED]
>> # service cloudstack-management status
>> cloudstack-management dead but pid file exists
>> The pid file locates at /var/run/cloudstack-management.pid and lock file at 
>> /var/lock/subsys/cloudstack-management.
>>  Starting cloudstack-management will take care of them or you can 
>> manually clean up.
>> #
>>
>> The catalina.out log file has following exception:
>>
>> INFO  [o.a.c.s.l.CloudStackExtendedLifeCycle] (Thread-95:null) stopping bean 
>> ClusterManagerImpl
>> INFO  [c.c.c.ClusterManagerImpl] (Thread-95:null) Stopping Cluster manager, 
>> msid : 60274787591663
>> ERROR [c.c.c.ClusterServiceServletContainer] (Thread-11:null) Unexpected 
>> exception
>> java.net.SocketException: Socket closed
>>   at java.net.PlainSocketImpl.socketAccept(Native Method)
>>   at 
>> java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
>>   at java.net.ServerSocket.implAccept(ServerSocket.java:530)
>>   at java.net.ServerSocket.accept(ServerSocket.java:498)
>>   at 
>> com.cloud.cluster.ClusterServiceServletContainer$ListenerThread.run(ClusterServiceServletContainer.java:131)
>> log4j:WARN No appenders could be found for logger 
>> (com.cloud.cluster.ClusterServiceServletContainer).
>> log4j:WARN Please initialize the log4j system properly.
>> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for 
>> more info.
>> Exception in thread "Timer-2" java.lang.NullPointerException
>>   at 
>> org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:304)
>>   at 
>> org.apache.cloudstack.managed.context.ManagedContextRunnable.getContext(ManagedContextRunnable.java:66)
>>   at 
>> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>>   at 
>> org.apache.cloudstack.managed.context.ManagedContextTimerTask.run(ManagedContextTimerTask.java:27)
>>   at java.util.TimerThread.mainLoop(Timer.java:555)
>>   at java.util.TimerThread.run(Timer.java:505)
>> Exception in thread "Timer-1" java.lang.NullPointerException
>>   at 
>> org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:304)
>>   at 
>> org.apache.cloudstack.managed.context.ManagedContextRunnable.getContext(ManagedContextRunnable.java:66)
>>   at 
>> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>>   at 
>> org.apache.cloudstack.managed.context.ManagedContextTimerTask.run(ManagedContextTimerTask.java:27)
>>   at java.util.TimerThread.mainLoop(Timer.java:555)
>>   at java.util.TimerThread.run(Timer.java:505)
>>
>> The SocketException does not show up in every CS instances, but those 
>> NullPointerException for thread Timer-1/2 are present for all CS instances.
>>
>> As a work around, we have to stop the service again, to clean up leftover 
>> pid and lock files,  before starting the service again.
>>
>> Has anyone else seen this problem ?
>>
>> Thanks,
>>
>> Yiping
>



Can't stop mgmt server cleanly for CS 4.8.0

2016-06-10 Thread Yiping Zhang
Hi, all:

We have a cron job to restart the mgmt. service once a week. However, after we 
upgraded to CS 4.8.0, the cron job would leave the service stopped and Nagios 
would start to page oncall.  We traced the problem to the mgmt. service not 
stopping cleanly, even when we try to stop it manually from the CLI:

# service cloudstack-management stop
Stopping cloudstack-management:[FAILED]
# service cloudstack-management status
cloudstack-management dead but pid file exists
The pid file locates at /var/run/cloudstack-management.pid and lock file at 
/var/lock/subsys/cloudstack-management.
Starting cloudstack-management will take care of them or you can 
manually clean up.
#

The catalina.out log file has the following exception:

INFO  [o.a.c.s.l.CloudStackExtendedLifeCycle] (Thread-95:null) stopping bean 
ClusterManagerImpl
INFO  [c.c.c.ClusterManagerImpl] (Thread-95:null) Stopping Cluster manager, 
msid : 60274787591663
ERROR [c.c.c.ClusterServiceServletContainer] (Thread-11:null) Unexpected 
exception
java.net.SocketException: Socket closed
 at java.net.PlainSocketImpl.socketAccept(Native Method)
 at 
java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
 at java.net.ServerSocket.implAccept(ServerSocket.java:530)
 at java.net.ServerSocket.accept(ServerSocket.java:498)
 at 
com.cloud.cluster.ClusterServiceServletContainer$ListenerThread.run(ClusterServiceServletContainer.java:131)
log4j:WARN No appenders could be found for logger 
(com.cloud.cluster.ClusterServiceServletContainer).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
info.
Exception in thread "Timer-2" java.lang.NullPointerException
 at org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:304)
 at 
org.apache.cloudstack.managed.context.ManagedContextRunnable.getContext(ManagedContextRunnable.java:66)
 at 
org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
 at 
org.apache.cloudstack.managed.context.ManagedContextTimerTask.run(ManagedContextTimerTask.java:27)
 at java.util.TimerThread.mainLoop(Timer.java:555)
 at java.util.TimerThread.run(Timer.java:505)
Exception in thread "Timer-1" java.lang.NullPointerException
 at org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:304)
 at 
org.apache.cloudstack.managed.context.ManagedContextRunnable.getContext(ManagedContextRunnable.java:66)
 at 
org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
 at 
org.apache.cloudstack.managed.context.ManagedContextTimerTask.run(ManagedContextTimerTask.java:27)
 at java.util.TimerThread.mainLoop(Timer.java:555)
 at java.util.TimerThread.run(Timer.java:505)

The SocketException does not show up in every CS instance, but the 
NullPointerExceptions for threads Timer-1/2 are present in all CS instances.

As a workaround, we have to run the stop command a second time, to clean up the 
leftover pid and lock files, before starting the service again.
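The weekly cron wrapper therefore ends up looking roughly like this (a sketch of the workaround described above, not a proper fix):

```shell
#!/bin/sh
# First stop attempt may leave a stale pid/lock file behind.
service cloudstack-management stop
# If the service reports dead-but-pid-file-exists, a second stop cleans up
# the leftover pid and lock files before we start again.
service cloudstack-management status || service cloudstack-management stop
service cloudstack-management start
```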

Has anyone else seen this problem ?

Thanks,

Yiping


Re: cloudmonkey config file get reset to default settings

2016-05-26 Thread Yiping Zhang
In my case, we have scripts run by Puppet, by cron hourly, and by Nagios (once 
every 5 min, I think), all as the root user or via sudo to root (in the case of 
Nagios).  In most of our scripts, we call “set display default” first. Only 
occasionally do admins log in and manually run cloudmonkey with sudo.
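In script form, each cloudmonkey invocation is a separate session, e.g. (a sketch; it assumes the API keys are already present in the config file):

```shell
# Each CLI call reads the config fresh; force machine-friendly output first.
# Note that "set" itself persists to the config file, which is part of the
# behaviour under discussion in this thread.
cloudmonkey set display default
cloudmonkey list zones
```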

As for the config file, we have puppet manage it to make sure that our 
apikey/secretkey, and logfile location are set correctly. 

IMHO, the set command should NOT automatically save current settings to disk.  
For interactive use, set should just change settings for the current session; 
an explicit save command could persist the changes to disk. For CLI use, each 
call to cloudmonkey is its own session, so the user had better have all the 
correct settings in the config file to begin with.

Thanks,
Yiping

On 5/25/16, 10:58 PM, "Rohit Yadav" <rohit.ya...@shapeblue.com> wrote:

>Correction from previous reply:
>
>"I'll see what I can do, in general you should NOT be replacing or changing 
>the cloudmonkey config file outside of cloudmonkey itself."
>
>Regards,
>Rohit Yadav
>
>
>rohit.ya...@shapeblue.com 
>www.shapeblue.com
>53 Chandos Place, Covent Garden, London  WC2N 4HSUK
>@shapeblue
>
>
>On May 26 2016, at 11:27 am, Rohit Yadav <rohit.ya...@shapeblue.com> wrote:
>Whenever a set command is called, it would save/update the config file. When 
>you run set profile xyz; it needs to make that profile the default profile and 
>update other parameters associated with the profile which may be set as well 
>(such as url, username, password, apikey, secretkey etc). When cloudmonkey is 
>running, and you replace the config file; on calling 'set' it would save the 
>config file based on its in-memory config dictionary.
>
>I'll see what I can do, in general you should be replacing or changing the 
>cloudmonkey config file outside of cloudmonkey itself. If you want to create 
>new profile, set new rules; you should call cloudmonkey set   
>either on command line or use puppet to execute them. The tool was intended 
>for single user, in case of multi-user or concurrent usage, there is no 
>concurrency control wrt configs.
>
>One solution could be that, each server profile has their own config file 
>instead of a single config file, and you can start cloudmonkey to pick a 
>server profile with a command line flag such as -p . I'll see how I 
>may improve this, for this I would like to know how exactly you are using 
>puppet or any other automation?
>
>Regards,
>Rohit Yadav
>
>On May 26 2016, at 3:40 am, ilya <ilya.mailing.li...@gmail.com> wrote:
>
>I've seen similar behaviour.
>
>For some reason cloudmonkey tries to persist the config each time you
>run something.
>
>If I open cloudmonkey in multiple terminals, using different profiles, and
>execute commands in multiple terminals in parallel, I've seen cloudmonkey
>mess up the config for one of the open profiles.
>
>Specifically, the URL of cloudstack in profile1 might be changed with
>url of cloudstack in profile2.
>
>Rohit, is there a reason why cloudmonkey tries to update the settings in the
>config file each time something gets executed?
>
>On 5/24/16 1:31 PM, Yiping Zhang wrote:
>> Hi,
>>
>> We have a few scripts that use cloudmonkey to talk to CloudStack server. The 
>> scripts are invoked by Puppet once per hour.
>>
>> However, every once a while, the /root/.cloudmonkey/config file would be 
>> over written with default settings. That is, blank apikey/secretkey, default 
>> password, default log file location etc.
>>
>> I am wondering by any chance that cloudmonkey would put a default config 
>> file in place for some reason ?
>>
>> Thanks,
>>
>> Yiping
>>



cloudmonkey config file get reset to default settings

2016-05-24 Thread Yiping Zhang
Hi,

We have a few scripts that use cloudmonkey to talk to CloudStack server.  The 
scripts are invoked by Puppet once per hour.

However, every once in a while, the /root/.cloudmonkey/config file gets 
overwritten with default settings: blank apikey/secretkey, default password, 
default log file location, etc.

I am wondering whether cloudmonkey might, for some reason, be putting a default 
config file in place?

Thanks,

Yiping


[SOLVED]: corrupt DB after VM live migration with storage migration

2016-05-05 Thread Yiping Zhang
First,  I’d like to thank Ilya and Ahmad for their feedback.

Here is the procedure I followed to get this issue fixed:

1. For safety,  I updated following global settings:
expunge.delay => 86400 (original value: 60 sec)
expunge.interval => 86400 (original value: 60 sec)
storage.cleanup.enabled => false  (original value: true)
2. Shutdown cloudstack-management service
3. Do a full database backup using mysqldump for database cloud.
4. As we are using NetApp for primary storage, I took a manual snapshot of the 
volume.
5. Start cloudstack-management service for the new global settings to take 
effect, then shut down the VM instance from the web UI.
6. Do database fixups:

6a. Find out the volume names for the instance you are fixing. In this example, 
the instance has two volumes, named ROOT-98 and DATA-98.
6b. Get the current info for these volumes from the volumes table. Here are the 
current entries (before fixing):

mysql> select id, name, instance_id, uuid, path, pool_id, state, removed from 
volumes where name='ROOT-98' or name='DATA-98';
+-+-+-+--+--+-+-+-+
| id  | name| instance_id | uuid | path 
| pool_id | state   | removed |
+-+-+-+--+--+-+-+-+
| 126 | ROOT-98 |  98 | ebc10ccc-9f58-4b2a-8748-f52caacb587c | 
25400c2c-0f39-475f-9f9c-50fdd05afab3 |   1 | Ready   | NULL|
| 127 | DATA-98 |  98 | f8794a2c-6cd0-4e26-a3c7-fdb7ec465ba3 | 
b54b2f04-dfec-4623-90fa-41c726067e7f |   1 | Ready   | NULL|
| 322 | ROOT-98 |NULL | 0f183764-2349-42c9-9fdd-944b892173ab | NULL 
|   8 | Destroy | 2016-05-03 19:06:39 |
| 323 | ROOT-98 |NULL | f2753635-4616-48c8-94bc-97d2a09b72a3 | NULL 
|   8 | Destroy | 2016-05-04 11:01:19 |
+-+-+-+--+--+-+-+-+

6c. Find out the UUIDs for volumes ROOT-98 and DATA-98 on the new hypervisor 
pool, using the xe CLI tool on one of the hypervisors in the pool:


# xe vdi-list name-label=ROOT-98 read-only=false | grep "^uuid"
uuid ( RO): 27be8a27-e26a-457b-9140-6181a1bc6bd2
# xe vdi-list name-label=DATA-98 read-only=false | grep "^uuid"
uuid ( RO): 1c5d388a-fc36-4e0c-94dd-64e450eef7ab

6d. Now run SQL UPDATE statements to modify the volume rows with id=126 and 
id=323 (the root disk rows) and id=127 (the data disk), so that they look like 
the following:


mysql> select id, name, instance_id, uuid, path, pool_id, state, removed from 
volumes where name='ROOT-98' or name='DATA-98';
+-+-+-+--+--+-+-+-+
| id  | name| instance_id | uuid | path 
| pool_id | state   | removed |
+-+-+-+--+--+-+-+-+
| 126 | ROOT-98 |NULL | ebc10ccc-9f58-4b2a-8748-f52caacb587c | 
25400c2c-0f39-475f-9f9c-50fdd05afab3 |   1 | Ready   | 2016-05-05 18:53:36 |
| 127 | DATA-98 |  98 | f8794a2c-6cd0-4e26-a3c7-fdb7ec465ba3 | 
1c5d388a-fc36-4e0c-94dd-64e450eef7ab |   8 | Ready   | NULL|
| 322 | ROOT-98 |NULL | 0f183764-2349-42c9-9fdd-944b892173ab | NULL 
|   8 | Destroy | 2016-05-03 19:06:39 |
| 323 | ROOT-98 |  98 | f2753635-4616-48c8-94bc-97d2a09b72a3 | 
27be8a27-e26a-457b-9140-6181a1bc6bd2 |   8 | Ready   | NULL|
+-+-+-+--+--+-+-+-+
4 rows in set (0.00 sec)

Note:  Entry with id=126 has columns instance_id and removed updated;
   Entry with id=127 has columns path and pool_id updated;
   Entry with id=323 has columns instance_id, path, state and removed updated;
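The UPDATE statements for step 6d, reconstructed from the before/after tables above (a sketch against the `cloud` database; double-check the ids and uuids against your own data before running anything):

```shell
mysql cloud <<'SQL'
-- id=126: detach the stale old-pool root volume and mark it removed
UPDATE volumes SET instance_id = NULL, removed = NOW() WHERE id = 126;

-- id=127: point the data disk at its new VDI uuid (from xe vdi-list) and pool
UPDATE volumes
   SET path = '1c5d388a-fc36-4e0c-94dd-64e450eef7ab', pool_id = 8
 WHERE id = 127;

-- id=323: reattach the new-pool root volume to the instance and revive it
UPDATE volumes
   SET instance_id = 98,
       path = '27be8a27-e26a-457b-9140-6181a1bc6bd2',
       state = 'Ready',
       removed = NULL
 WHERE id = 323;
SQL
```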

6e. Start the VM instance.  Just to be really sure, I also stopped and started 
the instance once more through the web UI to make sure that the instance can be 
rebooted normally.

Repeat steps 5 and 6a-e for each VM instance that needs the fix.

That’s all I had to do to recover all four VM instances.


Now, I still don’t know the root cause of this problem or how I can avoid it 
in the future.


Have a good day, all.

Yiping







On 5/4/16, 11:25 PM, "Yiping Zhang" <yzh...@marketo.com> wrote:

>Thanks, it’s a good idea to back

Re: [Urgent]: corrupt DB after VM live migration with storage migration

2016-05-05 Thread Yiping Zhang
Thanks, it’s a good idea to back up those “removed” disks first before 
attempting DB surgery!




On 5/4/16, 9:57 PM, "ilya" <ilya.mailing.li...@gmail.com> wrote:

>never mind - on the "removed" disks - it deletes well.
>
>On 5/4/16 9:55 PM, ilya wrote:
>> I'm pretty certain cloudstack does not have purging on data disks as i
>> had to write my own :)
>> 
>> On 5/4/16 9:51 PM, Ahmad Emneina wrote:
>>> I'm not sure if the expunge interval/delay plays a part... but you might
>>> want to set: storage.cleanup.enabled to false. That might prevent your
>>> disks from being purged. You might also look to export those volumes, or
>>> copy them to a safe location, out of band.
>>>
>>> On Wed, May 4, 2016 at 8:49 PM, Yiping Zhang <yzh...@marketo.com> wrote:
>>>
>>>> Before I try the direct DB modifications, I would first:
>>>>
>>>> * shutdown the VM instances
>>>> * stop cloudstack-management service
>>>> * do a DB backup with mysqldump
>>>>
>>>> What I worry the most is that the volumes on new cluster’s primary storage
>>>> device are marked as “removed”, so if I shutdown the instances, the
>>>> cloudstack may kick off a storage cleanup job to remove them from new
>>>> cluster’s primary storage  before I can get the fixes in.
>>>>
>>>> Is there a way to temporarily disable storage cleanups ?
>>>>
>>>> Yiping
>>>>
>>>>
>>>>
>>>>
>>>> On 5/4/16, 3:22 PM, "Yiping Zhang" <yzh...@marketo.com> wrote:
>>>>
>>>>> Hi, all:
>>>>>
>>>>> I am in a situation that I need some help:
>>>>>
>>>>> I did a live migration with storage migration required for a production
>>>> VM instance from one cluster to another.  The first migration attempt
>>>> failed after some time, but the second attempt succeeded. During all this
>>>> time the VM instance is accessible (and it is still up and running).
>>>> However, when I use my api script to query volumes, it still reports that
>>>> the volume is on the old cluster’s primary storage.  If I shut down this
>>>> VM,  I am afraid that it won’t start again as it would try to use
>>>> non-existing volumes.
>>>>>
>>>>> Checking database, sure enough, the DB still has old info about these
>>>> volumes:
>>>>>
>>>>>
>>>>> mysql> select id,name from storage_pool where id=1 or id=8;
>>>>>
>>>>> ++--+
>>>>>
>>>>> | id | name |
>>>>>
>>>>> ++--+
>>>>>
>>>>> |  1 | abprod-primary1  |
>>>>>
>>>>> |  8 | abprod-p1c2-pri1 |
>>>>>
>>>>> ++--+
>>>>>
>>>>> 2 rows in set (0.01 sec)
>>>>>
>>>>>
>>>>> Here the old cluster’s primary storage has id=1, and the new cluster’s
>>>> primary storage has id=8.
>>>>>
>>>>>
>>>>> Here are the entries with wrong info in volumes table:
>>>>>
>>>>>
>>>>> mysql> select id,name, uuid, path,pool_id, removed from volumes where
>>>> name='ROOT-97' or name='DATA-97';
>>>>>
>>>>
>>>>> +-+-+--+--+-+-+
>>>>>
>>>>> | id  | name| uuid | path
>>>>  | pool_id | removed |
>>>>>
>>>>
>>>>> +-+-+--+--+-+-+
>>>>>
>>>>> | 124 | ROOT-97 | 224bf673-fda8-4ccc-9c30-fd1068aee005 |
>>>> 5d1ab4ef-2629-4384-a56a-e2dc1055d032 |   1 | NULL|
>>>>>
>>>>> | 125 | DATA-97 | d385d635-9230-4130-8d1f-702dbcf0f22c |
>>>> 6b75496d-5907-46c3-8836-5618f11dac8e |   1 | NULL|
>>>>>
>>>>> | 316 | ROOT-97 | 691b5c12-7ec4-408d-b66f-1ff041f149c1 | NULL
>>>>  |   8 | 2016-05-03 06:10:40 |
>>>>>
>>>>> | 317 | ROOT-97 | 8ba2

Re: [Urgent]: corrupt DB after VM live migration with storage migration

2016-05-04 Thread Yiping Zhang
Before I try the direct DB modifications, I would first:

* shutdown the VM instances
* stop cloudstack-management service
* do a DB backup with mysqldump

What I worry about most is that the volumes on the new cluster’s primary 
storage device are marked as “removed”, so if I shut down the instances, 
CloudStack may kick off a storage cleanup job to remove them from the new 
cluster’s primary storage before I can get the fixes in.

Is there a way to temporarily disable storage cleanups ?

Yiping





[Urgent]: corrupt DB after VM live migration with storage migration

2016-05-04 Thread Yiping Zhang
Hi, all:

I am in a situation where I need some help:

I did a live migration (with storage migration required) of a production VM 
instance from one cluster to another.  The first migration attempt failed after 
some time, but the second attempt succeeded.  During all this time the VM 
instance remained accessible (and it is still up and running).  However, when I 
use my API script to query volumes, it still reports that the volume is on the 
old cluster’s primary storage.  If I shut down this VM, I am afraid that it 
won’t start again, as it would try to use non-existent volumes.

Checking the database, sure enough, it still has old info about these volumes:


mysql> select id,name from storage_pool where id=1 or id=8;
+----+------------------+
| id | name             |
+----+------------------+
|  1 | abprod-primary1  |
|  8 | abprod-p1c2-pri1 |
+----+------------------+
2 rows in set (0.01 sec)


Here the old cluster’s primary storage has id=1, and the new cluster’s primary 
storage has id=8.


Here are the entries with wrong info in the volumes table:


mysql> select id,name,uuid,path,pool_id,removed from volumes where name='ROOT-97' or name='DATA-97';
+-----+---------+--------------------------------------+--------------------------------------+---------+---------------------+
| id  | name    | uuid                                 | path                                 | pool_id | removed             |
+-----+---------+--------------------------------------+--------------------------------------+---------+---------------------+
| 124 | ROOT-97 | 224bf673-fda8-4ccc-9c30-fd1068aee005 | 5d1ab4ef-2629-4384-a56a-e2dc1055d032 |       1 | NULL                |
| 125 | DATA-97 | d385d635-9230-4130-8d1f-702dbcf0f22c | 6b75496d-5907-46c3-8836-5618f11dac8e |       1 | NULL                |
| 316 | ROOT-97 | 691b5c12-7ec4-408d-b66f-1ff041f149c1 | NULL                                 |       8 | 2016-05-03 06:10:40 |
| 317 | ROOT-97 | 8ba29fcf-a81a-4ca0-9540-0287230f10c7 | NULL                                 |       8 | 2016-05-03 06:10:45 |
+-----+---------+--------------------------------------+--------------------------------------+---------+---------------------+
4 rows in set (0.01 sec)

On the XenServer of the old cluster, the volumes do not exist:


[root@abmpc-hv01 ~]# xe vdi-list name-label='ROOT-97'

[root@abmpc-hv01 ~]# xe vdi-list name-label='DATA-97'

[root@abmpc-hv01 ~]#

But the volumes are on the new cluster’s primary storage:


[root@abmpc-hv04 ~]# xe vdi-list name-label=ROOT-97
uuid ( RO): a253b217-8cdc-4d4a-a111-e5b6ad48a1d5
  name-label ( RW): ROOT-97
name-description ( RW):
 sr-uuid ( RO): 6d4bea51-f253-3b43-2f2f-6d7ba3261ed3
virtual-size ( RO): 34359738368
sharable ( RO): false
   read-only ( RO): true

uuid ( RO): c46b7a61-9e82-4ea1-88ca-692cd4a9204b
  name-label ( RW): ROOT-97
name-description ( RW):
 sr-uuid ( RO): 6d4bea51-f253-3b43-2f2f-6d7ba3261ed3
virtual-size ( RO): 34359738368
sharable ( RO): false
   read-only ( RO): false

[root@abmpc-hv04 ~]# xe vdi-list name-label=DATA-97
uuid ( RO): bc868e3d-b3c0-4c6a-a6fc-910bc4dd1722
  name-label ( RW): DATA-97
name-description ( RW):
 sr-uuid ( RO): 6d4bea51-f253-3b43-2f2f-6d7ba3261ed3
virtual-size ( RO): 107374182400
sharable ( RO): false
   read-only ( RO): false

uuid ( RO): a8c187cc-2ba0-4928-8acf-2afc012c036c
  name-label ( RW): DATA-97
name-description ( RW):
 sr-uuid ( RO): 6d4bea51-f253-3b43-2f2f-6d7ba3261ed3
virtual-size ( RO): 107374182400
sharable ( RO): false
   read-only ( RO): true


The following is how I plan to fix the corrupted DB entries.  Note: I am using 
the UUID of the VDI with read/write access as the new path value:


1) For the ROOT-97 volume:

Update volumes set removed=NOW() where id=124;
Update volumes set removed=NULL where id=317;
Update volumes set path='c46b7a61-9e82-4ea1-88ca-692cd4a9204b' where id=317;


2) For the DATA-97 volume:

Update volumes set pool_id=8 where id=125;
Update volumes set path='bc868e3d-b3c0-4c6a-a6fc-910bc4dd1722' where id=125;


Would this work?
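One low-risk way to check statements like these is to rehearse the swap against a toy copy of the table first. A minimal sketch using SQLite (abbreviated columns and shortened UUIDs, not the real CloudStack schema):

```python
import sqlite3

# Toy stand-in for the relevant volumes columns, pre-loaded with the
# two conflicting ROOT-97 rows described above (values abbreviated).
con = sqlite3.connect(":memory:")
con.execute("create table volumes "
            "(id int, name text, path text, pool_id int, removed text)")
con.executemany("insert into volumes values (?,?,?,?,?)", [
    (124, "ROOT-97", "5d1ab4ef", 1, None),    # stale row pointing at old pool
    (317, "ROOT-97", None, 8, "2016-05-03"),  # new pool, wrongly marked removed
])

# Same logic as the proposed fix: retire the stale row, revive the row
# on the new pool, and set its path to the read/write VDI uuid.
con.execute("update volumes set removed=datetime('now') where id=124")
con.execute("update volumes set removed=NULL, path='c46b7a61' where id=317")

live = con.execute("select id, path, pool_id from volumes "
                   "where name='ROOT-97' and removed is null").fetchall()
print(live)  # [(317, 'c46b7a61', 8)] -- exactly one live row, on pool 8
```

The invariant worth asserting before touching production is the one checked at the end: after the swap, exactly one non-removed row per volume, carrying the new pool_id and a non-NULL path.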


Thanks for any help anyone can provide.  I have a total of 4 VM instances with 
8 volumes in this situation that need to be fixed.


Yiping


Re: AW: API deploy virtualmachine failed with custom offering

2016-03-23 Thread Yiping Zhang
That syntax worked for me (CS 4.5.1, cloudmonkey 5.3.1).

Can you create a VM in UI, using the same 
zone/domain/template/network/serviceoffering selections as used in your API 
call?  I suspect that your combined selections are not valid; for example, the 
template or network is not available in the selected zone, or something to 
that effect.

Yiping



On 3/23/16, 9:28 AM, "Martin Emrich"  wrote:

>Hi!
>
>Sorry, same effect:
>
>(local)  > deploy virtualmachine zoneid=9c667177-4f6e-4d51-b841-2b83f5e851ac 
>domainid=6ff469a6-d975-11e5-98dc-001e8c29bd11 
>templateid=14c5e883-c6c3-4fed-83d5-451d1064dcad 
>networkids=9cb521cf-4e31-4e4e-9cc3-0594c0d86b23  
>serviceofferingid=8aab1d01-1ec3-4eec-85ab-6caba370288a details[0].cpuNumber=2 
>details[0].cpuSpeed=1000 details[0].Memory=512 name=test1
>Error 401 Authentication error
>errorcode = 401
>errortext = unable to verify user credentials and/or request signature
>uuidList:
>
>(BTW: testing now on my litte test cloud running 4.7.1, cloudmonkey is 5.3.2)
>
>Thanks,
>
>Martin
>
>-----Original Message-----
>From: Patrick Miller [mailto:patrick.mil...@sungardas.com] 
>Sent: Wednesday, 23 March 2016 16:50
>To: users@cloudstack.apache.org
>Subject: Re: API deploy virtualmachine failed with custom offering
>
>Martin:
>If you look at the UI request for memory:
>details%5B0%5D.memory=512  Notice there is no "M"; megabytes is implied.
>
>Please try:
>deploy virtualmachine zoneid=9c667177-4f6e-4d51-b841-2b83f5e851ac
>domainid=6ff469a6-d975-11e5-98dc-001e8c29bd11
>templateid=14c5e883-c6c3-4fed-83d5-451d1064dcad
>networkids=9cb521cf-4e31-4e4e-9cc3-0594c0d86b23
>serviceofferingid=8aab1d01-1ec3-4eec-85ab-6caba370288a
>details[0].cpuNumber=2 details[0].cpuSpeed=1000 details[0].memory=512
>name=test1
>
>Just removing the "M" from 512.
>
>Hope this helps.
>
>Patrick
>
>Patrick Miller ▪ Senior Systems Engineer ▪ Sungard Availability Services
>2481 Deerwood Dr, San Ramon, Ca 94583 ▪  Office: 925-831-7738 
>patrick.mil...@sungardas.com  ▪ www.sungardas.com
>
>
>CONFIDENTIALITY:  This e-mail (including any attachments) may contain 
>confidential, proprietary and privileged information, and unauthorized 
>disclosure or use is prohibited.  If you received this e-mail in error, please 
>notify the sender and delete this e-mail from your system.
>
>On Wed, Mar 23, 2016 at 8:14 AM, Martin Emrich 
>wrote:
>
>> Hi!
>>
>> I also tried with [0], but it fails, too.
>>
>> The command line:
>>
>> (local)  > deploy virtualmachine
>> zoneid=9c667177-4f6e-4d51-b841-2b83f5e851ac
>> domainid=6ff469a6-d975-11e5-98dc-001e8c29bd11
>> templateid=14c5e883-c6c3-4fed-83d5-451d1064dcad
>> networkids=42b962bf-27f1-434b-bd14-239a909a206e
>> serviceofferingid=8aab1d01-1ec3-4eec-85ab-6caba370288a
>> details[0].cpuNumber=2 details[0].cpuSpeed=1000 details[0].Memory=512M
>> name=test1
>> Error 401 Authentication error
>> errorcode = 401
>> errortext = unable to verify user credentials and/or request signature
>> uuidList:
>>
>> From the cloudmonkey log:
>>
>> 2016-03-23 15:09:23,953 - connectionpool.py:207 - [INFO] Starting new 
>> HTTP connection (1): localhost
>> 2016-03-23 15:09:23,984 - connectionpool.py:387 - [DEBUG] "GET 
>> /client/api?networkids=42b962bf-27f1-434b-bd14-239a909a206e=6
>> ff469a6-d975-11e5-98dc-001e8c29bd11=M5-9oj4BbFevrrNL92dHA3N_ema
>> WHirVzFLxhKenCT8z8dMpzb1PmeeAJv2ICmNuSsnj6tl371T8VYxjGtPLJg=test1
>> %5B0%5D.cpuSpeed=1000=2016-03-23T14%3A19%3A23%2B
>> ignatureversion=3=9c667177-4f6e-4d51-b841-2b83f5e851ac%
>> 5B0%5D.cpuNumber=2=deployVirtualMachine=14c5e883-c6
>> c3-4fed-83d5-451d1064dcad=json=VDammkzNSFJI25IZNRzM
>> X9I0K7U%3D=8aab1d01-1ec3-4eec-85ab-6caba370288a
>> ils%5B0%5D.Memory=512M
>> HTTP/1.1" 401 137
>> 2016-03-23 15:09:23,985 - requester.py:49 - [DEBUG] Request sent:
>> http://localhost:8080/client/api?networkids=42b962bf-27f1-434b-bd14-23
>> 9a909a206e=6ff469a6-d975-11e5-98dc-001e8c29bd11=M5-9oj
>> 4BbFevrrNL92dHA3N_emaWHirVzFLxhKenCT8z8dMpzb1PmeeAJv2ICmNuSsnj6tl371T8
>> VYxjGtPLJg=test1%5B0%5D.cpuSpeed=1000=2016-03-23T
>> 14%3A19%3A23%2B=3=9c667177-4f6e-4d51-b841-
>> 2b83f5e851ac%5B0%5D.cpuNumber=2=deployVirtualMachine
>> emplateid=14c5e883-c6c3-4fed-83d5-451d1064dcad=json
>> =VDammkzNSFJI25IZNRzMX9I0K7U%3D=8aab1d01-1ec3-4eec-8
>> 5ab-6caba370288a%5B0%5D.Memory=512M
>> 2016-03-23 15:09:23,985 - requester.py:49 - [DEBUG] Response received:
>> {"deployvirtualmachineresponse":{"uuidList":[],"errorcode":401,"errort
>> ext":"unable to verify user credentials and/or request signature"}}
>> 2016-03-23 15:09:23,985 - requester.py:49 - [DEBUG] Error: 401 
>> Authentication error
>> 2016-03-23 15:09:23,985 - requester.py:49 - [DEBUG]  END 
>> Request 
>>
>> This is the corresponding line in the cloudstack management log:
>>
>> 2016-03-23 15:09:23,963 DEBUG [c.c.a.ApiServlet]
>> (catalina-exec-10:ctx-57c16d4d) (logid:e9a26d51) ===START===
>> 0:0:0:0:0:0:0:1 -- GET

Re: XenServer cluster size

2016-03-09 Thread Yiping Zhang
I think I didn’t make it clear.  Those 10 hosts are all Gen8 blades; that’s why 
I want them to form a pool of their own.  New Gen9 blades will go into a 
different pool.



On 3/9/16, 10:14 AM, "Tim Mackey" <tmac...@gmail.com> wrote:

>In that case, Yiping, I would *definitely* recommend putting those servers
>into at least two pools. The processors used in Gen8 and Gen9 servers can
>not currently be joined into the same pool, and you actually need to be
>very sensitive to the processor steppings. Dundee should fix that, but no
>current version of CloudStack supports Dundee (and neither does Citrix at
>the moment).
>
>-tim
>
>On Wed, Mar 9, 2016 at 12:38 PM, Yiping Zhang <yzh...@marketo.com> wrote:
>
>> Hi, Tim:
>>
>> Thanks for very detailed reply.
>>
>> These are Gen8 HP blades and all my new servers will be Gen9.  That’s why
>> I’d like to combine them into one maxed out cluster.  I have only two guest
>> VLAN’s and roughly 400 VM instances for this 10 hosts cluster.  So I think
>> performance wise I should be OK.
>>
>> Yiping
>>
>>
>>
>> On 3/8/16, 4:28 PM, "Tim Mackey" <tmac...@gmail.com> wrote:
>>
>> >Yiping,
>> >
>> >Here's the detailed answer 
>> >
>> >From the XenServer perspective, there are a number of factors which go
>> into
>> >how various configuration limits are arrived at. Most of the time, they
>> >aren't hard limits (for example I know of users with more than 16 hosts in
>> >a pool). What the XenServer team do is for a given metric they determine
>> >the point at which overall scalability is reduced to a target threshold.
>> >That then becomes the "configuration limit" for a given release, and we
>> >retest with every version.
>> >
>> >In the case of the "hosts per pool" limit, we need to ensure that all
>> >operations we have can be performed without impairment with a given number
>> >of hosts in a pool. We've kept the same maximum number of hosts in a pool
>> >for a very long time (close to ten years so far), and that's a direct
>> >reflection of how much we've increased individual host scalability.
>> >
>> >From a CloudStack perspective, there have been a number of serious scale
>> >limits which have pushed XenServer. Hundreds of VLANs is one example that
>> >Ahmad cites, but its also a case of the number of VMs and needing to
>> manage
>> >all those VM objects.  iirc, the eight host recommendation came from some
>> >large deployment requirements. If you don't have a need for 100s of VLANs
>> >per pool, or aren't running 100s of VMs per host, you likely will be able
>> >to get more than eight hosts per pool.
>> >
>> >From an operations perspective, I would look closely at your pool size and
>> >ask the question of why you want to such a large pool.  I'd argue having
>> >two pools of five hosts is more efficient in CloudStack than a single pool
>> >of ten hosts, plus if something should happen to one pool, the remaining
>> >pool will continue to be available.  CloudStack is very efficient at
>> >managing resource pools, so many of the reasons traditional server admins
>> >cite for wanting large pool sizes aren't as relevant in CloudStack.
>> >
>> >Of particular note is how you scale. With a ten host pool size, that's
>> your
>> >scalability block size, so as you grow you'll want to increase capacity in
>> >chunks of ten hosts. With a smaller pool size, you'd be able to add
>> >capacity in much smaller chunks.
>> >
>> >-tim
>> >
>> >On Tue, Mar 8, 2016 at 5:24 PM, Ahmad Emneina <aemne...@gmail.com> wrote:
>> >
>> >> IIRC, its just a recommendation. I think it stemmed from performance
>> >> impact, due to numerous VLAN's present, in environments with lots of
>> >> tenants.
>> >>
>> >> On Tue, Mar 8, 2016 at 2:10 PM, Yiping Zhang <yzh...@marketo.com>
>> wrote:
>> >>
>> >> > Hi, all:
>> >> >
>> >> > The CloudStack doc recommends that for XenServer, do not put more
>> than 8
>> >> > hosts in a cluster, while the Citrix XenServer doc says that XenServer
>> >> 6.5
>> >> > can natively support 16 hosts in a cluster (resource pool).
>> >> >
>> >> > I am wondering why CloudStack is recommending a smaller cluster size
>> than
>> >> > that XenServer can natively support?  If I create a cluster with 10
>> >> > XenServers, what could go wrong for me ?  Has any one tried with CS
>> >> cluster
>> >> > with >8 XenServer hosts ?
>> >> >
>> >> > My environment is CS 4.5.1 (soon to be upgraded to 4.8.0) on RHEL 6.7
>> and
>> >> > XenServer 6.5, using NetApp volumes for both primary and secondary
>> >> storages.
>> >> >
>> >> > Yiping
>> >> >
>> >>
>>


Re: XenServer cluster size

2016-03-09 Thread Yiping Zhang
Hi, Tim:

Thanks for very detailed reply.

These are Gen8 HP blades and all my new servers will be Gen9.  That’s why I’d 
like to combine them into one maxed out cluster.  I have only two guest VLAN’s 
and roughly 400 VM instances for this 10 hosts cluster.  So I think performance 
wise I should be OK.

Yiping



On 3/8/16, 4:28 PM, "Tim Mackey" <tmac...@gmail.com> wrote:

>Yiping,
>
>Here's the detailed answer 
>
>From the XenServer perspective, there are a number of factors which go into
>how various configuration limits are arrived at. Most of the time, they
>aren't hard limits (for example I know of users with more than 16 hosts in
>a pool). What the XenServer team do is for a given metric they determine
>the point at which overall scalability is reduced to a target threshold.
>That then becomes the "configuration limit" for a given release, and we
>retest with every version.
>
>In the case of the "hosts per pool" limit, we need to ensure that all
>operations we have can be performed without impairment with a given number
>of hosts in a pool. We've kept the same maximum number of hosts in a pool
>for a very long time (close to ten years so far), and that's a direct
>reflection of how much we've increased individual host scalability.
>
>From a CloudStack perspective, there have been a number of serious scale
>limits which have pushed XenServer. Hundreds of VLANs is one example that
>Ahmad cites, but its also a case of the number of VMs and needing to manage
>all those VM objects.  iirc, the eight host recommendation came from some
>large deployment requirements. If you don't have a need for 100s of VLANs
>per pool, or aren't running 100s of VMs per host, you likely will be able
>to get more than eight hosts per pool.
>
>From an operations perspective, I would look closely at your pool size and
>ask the question of why you want to such a large pool.  I'd argue having
>two pools of five hosts is more efficient in CloudStack than a single pool
>of ten hosts, plus if something should happen to one pool, the remaining
>pool will continue to be available.  CloudStack is very efficient at
>managing resource pools, so many of the reasons traditional server admins
>cite for wanting large pool sizes aren't as relevant in CloudStack.
>
>Of particular note is how you scale. With a ten host pool size, that's your
>scalability block size, so as you grow you'll want to increase capacity in
>chunks of ten hosts. With a smaller pool size, you'd be able to add
>capacity in much smaller chunks.
>
>-tim
>
>On Tue, Mar 8, 2016 at 5:24 PM, Ahmad Emneina <aemne...@gmail.com> wrote:
>
>> IIRC, its just a recommendation. I think it stemmed from performance
>> impact, due to numerous VLAN's present, in environments with lots of
>> tenants.
>>
>> On Tue, Mar 8, 2016 at 2:10 PM, Yiping Zhang <yzh...@marketo.com> wrote:
>>
>> > Hi, all:
>> >
>> > The CloudStack doc recommends that for XenServer, do not put more than 8
>> > hosts in a cluster, while the Citrix XenServer doc says that XenServer
>> 6.5
>> > can natively support 16 hosts in a cluster (resource pool).
>> >
>> > I am wondering why CloudStack is recommending a smaller cluster size than
>> > that XenServer can natively support?  If I create a cluster with 10
>> > XenServers, what could go wrong for me ?  Has any one tried with CS
>> cluster
>> > with >8 XenServer hosts ?
>> >
>> > My environment is CS 4.5.1 (soon to be upgraded to 4.8.0) on RHEL 6.7 and
>> > XenServer 6.5, using NetApp volumes for both primary and secondary
>> storages.
>> >
>> > Yiping
>> >
>>


working with over provisioning factors

2016-03-02 Thread Yiping Zhang
Hi,

I have to change the global and cluster-level setting 
memory.overprovisioning.factor for one of my clusters, which has hundreds of 
running VM instances.  For the change to take effect, I need to restart all 
running instances.  Is there a way to update all running VM instances to 
reflect the updated memory allocation without restarting every single one?
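For context on what the factor does: as far as I understand, it scales the memory capacity the CloudStack allocator believes a host has (and on XenServer it also feeds the dynamic memory range computed when a VM starts, which is why running instances don't pick up the change until restarted). A rough sketch of the capacity arithmetic, not actual CloudStack code:

```python
def allocatable_memory_mb(physical_mb: int, overprov_factor: float) -> int:
    # Capacity accounting only: the allocator treats the host as having
    # physical * factor memory to hand out; guests still see real RAM.
    return int(physical_mb * overprov_factor)

# Raising the factor from 1.0 to 1.5 on a 256 GiB host adds 128 GiB of
# schedulable (not physical) capacity.
extra = (allocatable_memory_mb(262144, 1.5)
         - allocatable_memory_mb(262144, 1.0))
print(extra)  # 131072
```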

While I am on the subject of restarting VM instances: I went into the database 
to get the POWER_STATE_UPDATE_TIME values from the vm_instance table directly. 
I noticed that this value is not consistently updated for the VM instances I 
restarted (either in the UI or via API calls with stop followed by start 
actions).  Is this a bug?

Yiping


Re: R: A Story of a Failed XenServer Upgrade

2016-01-08 Thread Yiping Zhang
Hmm, I definitely included the attachment (I can see from my Outlook sent 
folder that the message has an attachment).

I am wondering if the mailing list requires any special privilege to send 
messages with attachments.  Does anyone know?  I can forward the doc to someone 
who has the power to resend it to the list, if anyone volunteers to do so.

To answer Nux!’s suggestion of writing a blog post: I am an old-fashioned guy 
and have not tried that yet :)
Yiping




On 1/8/16, 10:35 AM, "Davide Pala" <davide.p...@gesca.it> wrote:

>I think you've forgotten the attachment ...
>
>
>
>Sent from my Samsung device
>
>
>---- Original message ----
>From: Yiping Zhang <yzh...@marketo.com>
>Date: 08/01/2016 18:44 (GMT+01:00)
>To: users@cloudstack.apache.org
>Subject: Re: A Story of a Failed XenServer Upgrade
>
>
>See attached pdf document. This is the final procedure we adopted after 
>upgrading seven XenServer pools.
>
>Yiping
>
>
>
>
>
>On 1/8/16, 2:20 AM, "Alessandro Caviglione" <c.alessan...@gmail.com> wrote:
>
>>Hi Yiping,
>>yes, thank you very much!!
>>Please share the doc so I can try again the upgrade process and see if it
>>was only a "unfortunate coincidence of events" or a wrong upgrade process.
>>
>>Thanks!
>>
>>On Fri, Jan 8, 2016 at 10:20 AM, Nux! <n...@li.nux.ro> wrote:
>>
>>> Yiping,
>>>
>>> Why not make a blog post about it so everyone can benefit? :)
>>>
>>> Lucian
>>>
>>> --
>>> Sent from the Delta quadrant using Borg technology!
>>>
>>> Nux!
>>> www.nux.ro
>>>
>>> - Original Message -
>>> > From: "Yiping Zhang" <yzh...@marketo.com>
>>> > To: users@cloudstack.apache.org, aemne...@gmail.com
>>> > Sent: Friday, 8 January, 2016 01:31:21
>>> > Subject: Re: A Story of a Failed XenServer Upgrade
>>>
>>> > Hi, Alessandro
>>> >
>>> > Late to the thread.  Is this still an issue for you ?
>>> >
>>> > I went thru this process before and I have a step by step document that
>>> I can
>>> > share if you still need it.
>>> >
>>> > Yiping
>>> >
>>> >
>>> >
>>> >
>>> > On 1/2/16, 4:43 PM, "Ahmad Emneina" <aemne...@gmail.com> wrote:
>>> >
>>> >>Hi Alessandro,
>>> >>Without seeing the logs, or DB, it will be hard to diagnose the issue.
>>> I've
>>> >>seen something similar in the past, where the XenServer host version isnt
>>> >>getting updated in the DB, as part of the XS upgrade process. That caused
>>> >>CloudStack to use the wrong hypervisor resource to try connecting back to
>>> >>the XenServers... ending up in failure. If you could share sanitized
>>> >>versions of your log and db, someone here might be able to give you the
>>> >>necessary steps to get your cluster back under CloudStack control.
>>> >>
>>> >>On Sat, Jan 2, 2016 at 1:27 PM, Alessandro Caviglione <
>>> >>c.alessan...@gmail.com> wrote:
>>> >>
>>> >>> No guys,as the article wrote, my first action was to put in Maintenance
>>> >>> Mode the Pool Master INSIDE CS; "It is vital that you upgrade the
>>> XenServer
>>> >>> Pool Master first before any of the Slaves.  To do so you need to
>>> empty the
>>> >>> Pool Master of all CloudStack VMs, and you do this by putting the Host
>>> into
>>> >>> Maintenance Mode within CloudStack to trigger a live migration of all
>>> VMs
>>> >>> to alternate Hosts"
>>> >>>
>>> >>> This is exactly what I've done and after the XS upgrade, no hosts was
>>> able
>>> >>> to communicate with CS and also with the upgraded host.
>>> >>>
>>> >>> Putting an host in Maint Mode within CS will trigger MM also on
>>> XenServer
>>> >>> host or just will move the VMs to other hosts?
>>> >>>
>>> >>> And again what's the best practices to upgrade a XS cluster?
>>> >>>
>>> >>> On Sat, Jan 2, 2016 at 7:11 PM, Remi Bergsma <
>>> rberg...@schubergphilis.com>
>>> >>> wrote:
>>> >>>
>>> >>> > CloudStack should always do the migration of VM'

Re: A Story of a Failed XenServer Upgrade

2016-01-08 Thread Yiping Zhang

See attached pdf document. This is the final procedure we adopted after 
upgrading seven XenServer pools.

Yiping





On 1/8/16, 2:20 AM, "Alessandro Caviglione" <c.alessan...@gmail.com> wrote:

>Hi Yiping,
>yes, thank you very much!!
>Please share the doc so I can try again the upgrade process and see if it
>was only a "unfortunate coincidence of events" or a wrong upgrade process.
>
>Thanks!
>
>On Fri, Jan 8, 2016 at 10:20 AM, Nux! <n...@li.nux.ro> wrote:
>
>> Yiping,
>>
>> Why not make a blog post about it so everyone can benefit? :)
>>
>> Lucian
>>
>> --
>> Sent from the Delta quadrant using Borg technology!
>>
>> Nux!
>> www.nux.ro
>>
>> - Original Message -
>> > From: "Yiping Zhang" <yzh...@marketo.com>
>> > To: users@cloudstack.apache.org, aemne...@gmail.com
>> > Sent: Friday, 8 January, 2016 01:31:21
>> > Subject: Re: A Story of a Failed XenServer Upgrade
>>
>> > Hi, Alessandro
>> >
>> > Late to the thread.  Is this still an issue for you ?
>> >
>> > I went thru this process before and I have a step by step document that
>> I can
>> > share if you still need it.
>> >
>> > Yiping
>> >
>> >
>> >
>> >
>> > On 1/2/16, 4:43 PM, "Ahmad Emneina" <aemne...@gmail.com> wrote:
>> >
>> >>Hi Alessandro,
>> >>Without seeing the logs, or DB, it will be hard to diagnose the issue.
>> I've
>> >>seen something similar in the past, where the XenServer host version isnt
>> >>getting updated in the DB, as part of the XS upgrade process. That caused
>> >>CloudStack to use the wrong hypervisor resource to try connecting back to
>> >>the XenServers... ending up in failure. If you could share sanitized
>> >>versions of your log and db, someone here might be able to give you the
>> >>necessary steps to get your cluster back under CloudStack control.
>> >>
>> >>On Sat, Jan 2, 2016 at 1:27 PM, Alessandro Caviglione <
>> >>c.alessan...@gmail.com> wrote:
>> >>
>> >>> No guys,as the article wrote, my first action was to put in Maintenance
>> >>> Mode the Pool Master INSIDE CS; "It is vital that you upgrade the
>> XenServer
>> >>> Pool Master first before any of the Slaves.  To do so you need to
>> empty the
>> >>> Pool Master of all CloudStack VMs, and you do this by putting the Host
>> into
>> >>> Maintenance Mode within CloudStack to trigger a live migration of all
>> VMs
>> >>> to alternate Hosts"
>> >>>
>> >>> This is exactly what I've done and after the XS upgrade, no hosts was
>> able
>> >>> to communicate with CS and also with the upgraded host.
>> >>>
>> >>> Putting an host in Maint Mode within CS will trigger MM also on
>> XenServer
>> >>> host or just will move the VMs to other hosts?
>> >>>
>> >>> And again what's the best practices to upgrade a XS cluster?
>> >>>
>> >>> On Sat, Jan 2, 2016 at 7:11 PM, Remi Bergsma <
>> rberg...@schubergphilis.com>
>> >>> wrote:
>> >>>
>> >>> > CloudStack should always do the migration of VM's not the Hypervisor.
>> >>> >
>> >>> > That's not true. You can safely migrate outside of CloudStack as the
>> >>> power
>> >>> > report will tell CloudStack where the vms live and the db gets
>> updated
>> >>> > accordingly. I do this a lot while patching and that works fine on
>> 6.2
>> >>> and
>> >>> > 6.5. I use both CloudStack 4.4.4 and 4.7.0.
>> >>> >
>> >>> > Regards, Remi
>> >>> >
>> >>> >
>> >>> > Sent from my iPhone
>> >>> >
>> >>> > On 02 Jan 2016, at 16:26, Jeremy Peterson <jpeter...@acentek.net
>> > >>> > jpeter...@acentek.net>> wrote:
>> >>> >
>> >>> > I don't use XenServer maintenance mode until after CloudStack has
>> put the
>> >>> > Host in maintenance mode.
>> >>> >
>> >>> > When you initiate maintenance mode from the host rather than
>> CloudStack
>> >>> > the db does not know where the VM's are and your UUID's get jacked.
>>

Re: R: A Story of a Failed XenServer Upgrade

2016-01-08 Thread Yiping Zhang
Since I can’t use attachments, I will just include the doc in the message. 
Hopefully the indentation and formatting will come through properly.

Yiping

--
XenServer pool manual upgrade from 6.2 to 6.5 using ISO
Reference article for upgrading XenServer pool used for Cloudstack

http://www.shapeblue.com/how-to-upgrade-an-apache-cloudstack-citrix-xenserver-cluster


Manual upgrade to XenServer from 6.2 to 6.5 using ISO


On CloudStack Management server

  *   Edit the file /etc/cloudstack/management/environment.properties to include 
the following line at the end:
 *   manage.xenserver.pool.master=false
  *   Restart cloudstack-management service
 *   service cloudstack-management restart

Pre-upgrade steps

  *   Disable XenServer pool HA from XenCenter or CLI
  *   Backup XenServer resource pool configurations
 *   Take a screenshot of the pool network settings in XenCenter
 *   Take note of the Storage Repository mount points and NFS volumes.

Inside CloudStack Web UI

  *   Put pool master host into Maintenance (CLOUDSTACK ONLY!). This should 
migrate all VM instances currently running on the pool master onto other hosts
  *   Unmanage the cluster.  This should make the hypervisors show as 
disconnected in the UI.  PLEASE MAKE SURE THAT YOU CLICK "Unmanage Cluster", 
NOT "Disable Cluster"!!!

On Pool Master

  *   Connect to physical console using DRAC/iLO/equivalent
  *   Attach XenServer 6.5 ISO as virtual DVD
  *   Verify that this host uses Legacy BIOS rather than UEFI, as UEFI is NOT 
supported by XenServer.  (May not be needed, as XS 6.2 also requires Legacy 
BIOS)
  *   Reboot physical server
  *   Once the server boots up off DVD image:
Note: we used an answer file to allow automated upgrade.  You can just manually 
do the upgrade
 *   At the first prompt, hit F2 to get the advanced menu, which won't time 
out quickly
 *   Type:
*   menu.c32
 *   Hit Enter, then hit the Tab key.
 *   You will be presented with a boot line that looks similar to the 
following:
*   mboot.c32 /boot/xen.gz dom0_max_vcpus=2 dom0_mem=752M 
com1=115200,8n1 console=com1,vga --- /boot/vmlinuz xencons=hvc console=hvc0 
console=tty0 --- /install.img
 *   You will need to edit this line to add the answerfile parameter.  Nothing 
else in this line needs to be changed:
*   mboot.c32 /boot/xen.gz dom0_max_vcpus=2 dom0_mem=752M 
com1=115200,8n1 console=com1,vga --- /boot/vmlinuz xencons=hvc console=hvc0 
console=tty0 answerfile=http://server_ip/path/to/answer-file.xml --- /install.img
 *   Hit Enter and watch the system upgrade itself.
 *   Once complete, eject the DVD image and reboot the host.

Verify network and storage settings of upgraded host (in XenCenter):

  *   Configure networks if necessary, just in case any additional NIC's need 
to be configured
  *   Repair the HA SR, if necessary

Inside CloudStack Web UI

  *   Re-manage the cluster
  *   Wait for all hosts to be in the Up state (except the pool master, which 
will stay as "disconnected")
  *   Wait for all SR are connected and online
  *   Take the pool master out of Maintenance mode

On the CloudStack management server in a terminal window

  *   Undo the change in /etc/cloudstack/management/environment.properties file
  *   Restart cloudstack-management service and wait for the dust to settle on 
all servers/storage being visible. All hosts should be in Up state, including 
the pool master.

Now for each slave hosts:

In CloudStack web UI

  *   Put slave host into Maintenance mode (CLOUDSTACK ONLY!), in order to 
evacuate all instances running on this host
  *   On the physical console, follow the same steps to attach the DVD image and 
perform the upgrade as for the pool master node
  *   Perform the eject, reboot, and host verification steps as listed above
  *   Verify the networks and that the host has properly rejoined the XenServer 
pool (in XenCenter)
  *   Take the host out of Maintenance mode in the CloudStack UI

Post-operation steps for each host:

  *   Confirm XenCenter Licensing
 *   You may need to remove the license from the cluster before applying a 
new one
  *   Enable HA

You are done !


Re: A Story of a Failed XenServer Upgrade

2016-01-07 Thread Yiping Zhang
Hi, Alessandro

Late to the thread.  Is this still an issue for you ?

I went thru this process before and I have a step by step document that I can 
share if you still need it.

Yiping




On 1/2/16, 4:43 PM, "Ahmad Emneina"  wrote:

>Hi Alessandro,
>Without seeing the logs, or DB, it will be hard to diagnose the issue. I've
>seen something similar in the past, where the XenServer host version isnt
>getting updated in the DB, as part of the XS upgrade process. That caused
>CloudStack to use the wrong hypervisor resource to try connecting back to
>the XenServers... ending up in failure. If you could share sanitized
>versions of your log and db, someone here might be able to give you the
>necessary steps to get your cluster back under CloudStack control.
>
>On Sat, Jan 2, 2016 at 1:27 PM, Alessandro Caviglione <
>c.alessan...@gmail.com> wrote:
>
>> No guys, as the article says, my first action was to put the Pool Master in
>> Maintenance Mode INSIDE CS: "It is vital that you upgrade the XenServer
>> Pool Master first before any of the Slaves.  To do so you need to empty the
>> Pool Master of all CloudStack VMs, and you do this by putting the Host into
>> Maintenance Mode within CloudStack to trigger a live migration of all VMs
>> to alternate Hosts"
>>
>> This is exactly what I've done, and after the XS upgrade no host was able
>> to communicate with CS, nor with the upgraded host.
>>
>> Does putting a host in Maint Mode within CS also trigger MM on the XenServer
>> host, or does it just move the VMs to other hosts?
>>
>> And again, what is the best practice for upgrading a XS cluster?
>>
>> On Sat, Jan 2, 2016 at 7:11 PM, Remi Bergsma 
>> wrote:
>>
>> > CloudStack should always do the migration of VM's not the Hypervisor.
>> >
>> > That's not true. You can safely migrate outside of CloudStack as the
>> power
>> > report will tell CloudStack where the vms live and the db gets updated
>> > accordingly. I do this a lot while patching and that works fine on 6.2
>> and
>> > 6.5. I use both CloudStack 4.4.4 and 4.7.0.
>> >
>> > Regards, Remi
>> >
>> >
>> > Sent from my iPhone
>> >
>> > On 02 Jan 2016, at 16:26, Jeremy Peterson <jpeter...@acentek.net> wrote:
>> >
>> > I don't use XenServer maintenance mode until after CloudStack has put the
>> > Host in maintenance mode.
>> >
>> > When you initiate maintenance mode from the host rather than CloudStack
>> > the db does not know where the VM's are and your UUID's get jacked.
>> >
>> > CS is your brains not the hypervisor.
>> >
>> > Maintenance in CS.  All VM's will migrate.  Maintenance in XenCenter.
>> > Upgrade.  Reboot.  Join Pool.  Remove Maintenance starting at hypervisor
>> if
>> > needed and then CS and move on to the next Host.
>> >
>> > CloudStack should always do the migration of VM's not the Hypervisor.
>> >
>> > Jeremy
>> >
>> >
>> > -Original Message-
>> > From: Davide Pala [mailto:davide.p...@gesca.it]
>> > Sent: Friday, January 1, 2016 5:18 PM
>> > To: users@cloudstack.apache.org
>> > Subject: R: A Story of a Failed XenServer Upgrade
>> >
>> > Hi Alessandro. If you put the master into maintenance mode, you force the
>> > election of a new pool master. Then, when you see the upgraded host as
>> > disconnected, you are connected to the new pool master, and the host (as a
>> > pool member) cannot communicate with a pool master of an earlier version.
>> > The solution? Launch the upgrade on the pool master without entering
>> > maintenance mode. And remember to take a consistent backup!!!
>> >
>> >
>> >
>> > Sent from my Samsung device
>> >
>> >
>> >  Original message 
>> > From: Alessandro Caviglione <c.alessan...@gmail.com>
>> > Date: 01/01/2016 23:23 (GMT+01:00)
>> > To: users@cloudstack.apache.org
>> > Subject: A Story of a Failed XenServer Upgrade
>> >
>> > Hi guys,
>> > I want to share my XenServer upgrade adventure, to understand if I did
>> > something wrong.
>> > I upgraded CS from 4.4.4 to 4.5.2 without any issues. After all the VRs
>> > had been upgraded, I started the upgrade process of my XenServer hosts from
>> > 6.2 to 6.5.
>> > I did not have Pool HA enabled, so I followed this article:
>> >
>> >
>> http://www.shapeblue.com/how-to-upgrade-an-apache-cloudstack-citrix-xenserver-cluster/
>> >
>> > The cluster consists of 3 XenServer hosts.
>> >
>> > First of all I added manage.xenserver.pool.master=false to the
>> > environment.properties file and restarted the cloudstack-management
>> service.
>> >
>> > After that I put the Pool Master host into Maintenance Mode and, after all
>> > VMs had been migrated, I unmanaged the cluster.
>> > At this point all hosts appeared as "Disconnected" in the CS interface, and
>> > this should be right.
>> > Then I put the XenServer 6.5 CD in the host in Maintenance Mode and started
>> > an in-place upgrade.
>> > After XS6.5 has been 

Re: SSVM and CPVM

2015-11-24 Thread Yiping Zhang
Which version of the OS and Java are you using?

We had encountered a similar error before.  AFAICT, we were running RHEL 6.5 
and CS 4.3.2 at the time, on a newly set up management server.  The Java version 
installed was the IBM version when we saw this error.  Once we changed the Java 
package back to java-1.7.0-openjdk, everything worked without any other changes.

Good luck.

Yiping
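Before chasing keystore corruption, it can also help to confirm that the system VM can even reach the management server's agent port (8250) -- the "Connection closed with -1 on reading size" errors in the quoted logs occur during the handshake, after the TCP connect succeeds. A minimal reachability probe, as a sketch (run it with Python inside the SSVM; host and port per your setup):

```python
import socket

def can_reach(host, port, timeout=5.0):
    """Return True if a plain TCP connection to host:port succeeds.

    This only exercises the TCP layer; the SSL handshake the agent
    performs afterwards can still fail even when this returns True.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If this returns False for your management server IP and port 8250, the problem is network/firewall, not the keystore.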




On 11/23/15, 11:21 PM, "kotipalli venkatesh"  
wrote:

>Hi All,
>
>Can anyone help me apply the solution above?
>
>Regards,
>Venkatesh.k
>
>On Tue, Nov 24, 2015 at 12:49 PM, kotipalli venkatesh <
>venkateshcloudt...@gmail.com> wrote:
>
>> Hi All,
>>
>> 2014-06-16 08:08:01,108 INFO  [utils.nio.NioClient] (Agent-Selector:null)
>> Connecting to 10.102.192.247:8250
>>
>> 2014-06-16 08:08:01,872 ERROR [utils.nio.NioConnection]
>> (Agent-Selector:null) Unable to initialize the threads.
>>
>> java.io.IOException: SSL: Fail to init SSL! java.io.IOException:
>> Connection closed with -1 on reading size.
>>
>> at com.cloud.utils.nio.NioClient.init(NioClient.java:84)
>>
>> at com.cloud.utils.nio.NioConnection.run(NioConnection.java:108)
>>
>> at java.lang.Thread.run(Thread.java:701)
>>
>> *Solution* : -
>>
>>
>>
>> 1. rm
>> ./client/target/cloud-client-ui-4.3.0.0/WEB-INF/classes/cloudmanagementserver.keystore
>> ./client/target/conf/cloudmanagementserver.keystore
>> ./client/target/generated-webapp/WEB-INF/classes/cloudmanagementserver.keystore
>>
>>
>>
>> 2. remove root entry from cloud.keystore;
>>
>>
>>
>> 3. remove ssl.keystore from cloud.configuration where description like
>> '%key%';
>>
>>
>>
>> 4. restart MS agent in ssvm
>>
>>
>> Regards,
>>
>> Venkatesh.k
>>
>>
>> On Tue, Nov 24, 2015 at 12:44 PM, Yan Bai  wrote:
>>
>>> By the way, there is a similar error mentioned in the document below:
>>>
>>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/SSVM,+templates,+Secondary+storage+troubleshooting
>>>
>>> Seems this error is caused by a corrupted keystore. I tried the solution in
>>> the document, but it did not solve the issue.
>>>
>>> On Tue, Nov 24, 2015 at 1:11 AM, Yan Bai  wrote:
>>>
>>> > I got the same error when I deployed CloudStack 4.5 on CentOS 6.5 with the
>>> > yum repository (http://cloudstack.apt-get.eu/). I investigated this issue a
>>> > lot online but failed to find a solution. I then deployed
>>> > CloudStack 4.3 on Ubuntu instead; I did not see this error and the system
>>> > is running well.
>>> >
>>> > -Yan
>>> >
>>> > On Tue, Nov 24, 2015 at 1:01 AM, kotipalli venkatesh <
>>> > venkateshcloudt...@gmail.com> wrote:
>>> >
>>> >> Hi All,
>>> >>
>>> >> I have successfully installed CS 4.6.0, with XenServer 6.5 as the hypervisor.
>>> >> The agent state is not Up on either system VM (CPVM and SSVM).
>>> >>
>>> >> Please find the below SSVM log error :
>>> >>
>>> >> 2015-11-24 06:43:48,995 DEBUG [utils.script.Script] (main:null) Looking
>>> >> for
>>> >> createvolume.sh in the classpath
>>> >> 2015-11-24 06:43:48,995 DEBUG [utils.script.Script] (main:null) System
>>> >> resource: null
>>> >> 2015-11-24 06:43:48,995 DEBUG [utils.script.Script] (main:null)
>>> Classpath
>>> >> resource:
>>> >>
>>> file:/usr/local/cloud/systemvm/scripts/storage/secondary/createvolume.sh
>>> >> 2015-11-24 06:43:48,995 DEBUG [utils.script.Script] (main:null)
>>> Absolute
>>> >> path =
>>> >> /usr/local/cloud/systemvm/scripts/storage/secondary/createvolume.sh
>>> >> 2015-11-24 06:43:48,995 INFO  [storage.template.DownloadManagerImpl]
>>> >> (main:null) createvolume.sh found in
>>> >> /usr/local/cloud/systemvm/scripts/storage/secondary/createvolume.sh
>>> >> 2015-11-24 06:43:49,002 INFO  [storage.template.UploadManagerImpl]
>>> >> (main:null) UploadManager: starting additional services since we are
>>> >> inside
>>> >> system vm
>>> >> 2015-11-24 06:43:49,357 INFO  [cloud.serializer.GsonHelper] (main:null)
>>> >> Default Builder inited.
>>> >> 2015-11-24 06:43:49,367 DEBUG [cloud.agent.Agent] (main:null) Adding
>>> >> shutdown hook
>>> >> 2015-11-24 06:43:49,368 INFO  [cloud.agent.Agent] (main:null) Agent
>>> [id =
>>> >> new : type = PremiumSecondaryStorageResource : zone = 2 : pod = 2 :
>>> >> workers
>>> >> = 5 : host = 10.0.0.170 : port = 8250
>>> >> 2015-11-24 06:43:49,379 INFO  [utils.nio.NioClient] (main:null)
>>> Connecting
>>> >> to 10.0.0.170:8250
>>> >> 2015-11-24 06:44:49,607 *ERROR [utils.nio.NioConnection] (main:null)
>>> >> Unable
>>> >> to initialize the threads.*
>>> >> java.io.IOException: Connection closed with -1 on reading size.
>>> >> at com.cloud.utils.nio.Link.doHandshake(Link.java:513)
>>> >> at com.cloud.utils.nio.NioClient.init(NioClient.java:80)
>>> >> at
>>> com.cloud.utils.nio.NioConnection.start(NioConnection.java:88)
>>> >> at com.cloud.agent.Agent.start(Agent.java:227)
>>> >> at com.cloud.agent.AgentShell.launchAgent(AgentShell.java:399)
>>> >> at
>>> >>
>>> 

Re: [Proposal] Template for CloudStack API Reference Pages

2015-11-11 Thread Yiping Zhang
As a user who uses the API a lot, I would like to see the following improvements 
in the API reference pages:

1) In the brief description in the Title section, please specify whether the 
referenced API is async or not.  Currently, this info is available only on the 
API listing pages, as "(A)" after the API name, and is not available or obvious 
anywhere on the API reference page itself. 

2) For each parameter, in addition to its existing attributes, it would be 
great to also provide the following:
 format := integer | string | array | enumerate | boolean, etc.
 default := true | false | null | 0, etc.

A Notes subsection for parameters: IMHO, there are several reasons such a 
section would be useful:
* A list of values which have special meaning to the API, and what their 
special meanings are, if any.  For example, for the listVirtualMachines API, 
projectid=-1 returns instances belonging to ALL projects.  Here the value "-1" 
is special.
* Combinations of certain parameters that are mutually exclusive, or that are 
required together.  Some of this info is currently present in the parameter's 
description field, but it is usually too brief, hard to read, and hard to 
understand.


3) Add a Limitations section:
   This section would describe scenarios to which the referenced API does not 
apply, where it is not yet implemented, or where it is known not to work 
properly.  Many APIs have limitations, and the information is scattered all over 
the documentation, if it exists at all.  So most often users can only find out 
by trial and error.
   
For example, the assignVirtualMachine API has the following limitations: 1) it 
does not work with VM instances belonging to a project, and 2) it is not 
implemented for Advanced networking with security groups enabled.

4) Add an Authorization section, or just provide the info somewhere on the 
page: describe who can make this API call: root admin, domain admin, or regular 
users.  Currently, this info is provided only by listing the available APIs on 
different pages titled "Root admin API", "Domain admin API", and "User API".  
Personally, I would prefer a separate section on each API's reference page for 
this info, so that it can't be missed.
   
5) Error responses: I really like the idea of adding this section to the 
reference page.  Please list both the HTTP response code and the CloudStack 
internal error code and error messages.




Finally, please get someone to proofread all descriptions.  Some of the current 
API documentation is really hard to understand!

BTW: which release is this proposal targeted for?

Just my $0.02.

Yiping
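Since the thread is about making the API easier to consume, here is a sketch of the request signing that every such call needs, following CloudStack's documented HMAC-SHA1 scheme (the parameter values in the usage note are placeholders, not real credentials):

```python
import base64
import hashlib
import hmac
import urllib.parse

def sign_request(params, secret_key):
    """Build a signed CloudStack API query string.

    Parameters are sorted by (lowercased) name, URL-encoded, the whole
    query is lowercased and HMAC-SHA1-signed with the secret key, and the
    base64-encoded signature is appended as the final parameter.
    """
    query = "&".join(
        f"{k}={urllib.parse.quote(str(v), safe='')}"
        for k, v in sorted(params.items(), key=lambda kv: kv[0].lower())
    )
    digest = hmac.new(secret_key.encode(),
                      query.lower().encode(),
                      hashlib.sha1).digest()
    signature = urllib.parse.quote(base64.b64encode(digest).decode(), safe="")
    return f"{query}&signature={signature}"
```

Append the result to your endpoint, e.g. `http://<management-server>:8080/client/api?` with `command=listZones`, your `apiKey`, and `response=json`.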


On 11/10/15, 9:10 PM, "Daejuan Jacobs"  wrote:

>I assume by "Format" you mean data type.
>
>But I think this looks good. It's simple, yet it manages to nail all the
>points you need when developing on a software's API.
>
>On Tue, Nov 10, 2015 at 8:33 AM Rajsekhar K 
>wrote:
>
>> Hi, All,
>>
>> This is the proposal for a new template for CloudStack API reference
>> pages. This template is based on the reference page templates for REST APIs.
>>
>> Please find attached the following documents for your review:
>>
>>- Template for normal and asynchronous CloudStack API references.
>>- Sample API reference page using the template for a CloudStack API
>>(listZones).
>>
>>
>> Please review this template and let me know your thoughts on this.
>>
>> Thanks,
>> Rajsekhar
>>


Re: problem assigning an instance to a different account in a different domain

2015-11-09 Thread Yiping Zhang
The assignVirtualMachine API exists for this exact purpose. It even seems to be 
able to move virtual machines into different zones (I never tried this, since I 
have only one zone).

However, there are some limitations:  it does not support advanced networking 
with SecurityGroup enabled.  Since I did not have to move instances to 
different networks,  I just ignored the error. The domain and account are 
assigned properly.

Yiping




On 11/9/15, 12:49 PM, "Stephan Seitz"  
wrote:

>Hi there!
>
>If anyone knows, how to get an instance assigned to a different account
>in a different domain, I'ld be very happy :) Even if it has to be done
>via DB manipulation...
>
>Just if... :)
>
>Stephan
>
>
>Am Mittwoch, den 04.11.2015, 15:59 +0100 schrieb Stephan Seitz:
>> Hi!
>> 
>> I'm trying to assign instances to a different account in a different
>> domain. Currently with no success.
>> 
>> The particular instances have been deployed by the initial "admin"
>> account in the ROOT domain, and should be assigned to a domain-admin
>> account.
>> 
>> id = 0d7a4ee7-5c6f-11e5-a590-3400a30d0aba <--- current domain
>> path = ROOT
>> =
>> ===
>> id = 4298cfba-aa4d-4baa-8b0e-53e70d0ebbe5 <--- destination domain
>> path = ROOT//yyy
>> 
>> 
>> id = 4b143f31-5c6f-11e5-a590-3400a30d0aba <-- current user in ROOT
>> account = admin
>> accountid = 4b14365a-5c6f-11e5-a590-3400a30d0aba
>> =
>> ===
>> id = 54e79c7a-f3de-4b76-8c99-ffc18c555f5d <-- dest. user in dest.
>> domain
>> account = zzz@yy
>> accountid = 76ec77a0-e0ca-459e-b211-eeacce52055c
>> 
>> 
>> With cloudmonkey (logged in as the admin in ROOT), I got following
>> result:
>> 
>> (local)  > assign virtualmachine
>> virtualmachineid=9b76aa5a-f97f-4bd0-8e9d-350816e42515
>> domainid=4298cfba-aa4d-4baa-8b0e-53e70d0ebbe5 
>> account=zzz@yy
>> Error 530: Failed to move vm
>> Acct[76ec77a0-e0ca-459e-b211-eeacce52055c-zzz@yy] does
>> not
>> have permission to operate within domain
>> id=0d7a4ee7-5c6f-11e5-a590-3400a30d0aba
>> cserrorcode = 
>> errorcode = 530
>> errortext = Failed to move vm
>> Acct[76ec77a0-e0ca-459e-b211-eeacce52055c-zzz@yy] does
>> not
>> have permission to operate within domain
>> id=0d7a4ee7-5c6f-11e5-a590-3400a30d0aba
>> 
>> 
>> This looks like, the destination user, who is domain-admin of it's
>> domain needs to have access to the ROOT domain. I think this makes no
>> sense, since I wan't to assign the instance TO it.
>> 
>> Could someone please shed some light how to assign an instance to
>> another user in another domain?
>> 
>> Thanks in advance!
>> 
>> Stephan
>> 
>> 


Re: problem assigning an instance to a different account in a different domain

2015-11-09 Thread Yiping Zhang
Yes, that's the other limitation of the API which I wanted to mention, but I got 
distracted and forgot.




On 11/9/15, 2:05 PM, "Rafael Weingärtner" <rafaelweingart...@gmail.com> wrote:

>I think that method does not work if the VM is assigned to a Project. I
>remember that I tried to use that method once, but that did not work and I
>had to do the change manually.
>
>On Mon, Nov 9, 2015 at 7:37 PM, Yiping Zhang <yzh...@marketo.com> wrote:
>
>> The api assignVirtualMachine is for this exact purpose. It seems even to
>> be able to move virtual machines into different zones (I never tried this,
>> since I have only one zone)
>>
>> However, there are some limitations:  it does not support advanced
>> networking with SecurityGroup enabled.  Since I did not have to move
>> instances to different networks,  I just ignored the error. The domain and
>> account are assigned properly.
>>
>> Yiping
>>
>>
>>
>>
>> On 11/9/15, 12:49 PM, "Stephan Seitz" <s.se...@secretresearchfacility.com>
>> wrote:
>>
>> >Hi there!
>> >
>> >If anyone knows, how to get an instance assigned to a different account
>> >in a different domain, I'ld be very happy :) Even if it has to be done
>> >via DB manipulation...
>> >
>> >Just if... :)
>> >
>> >Stephan
>> >
>> >
>> >Am Mittwoch, den 04.11.2015, 15:59 +0100 schrieb Stephan Seitz:
>> >> Hi!
>> >>
>> >> I'm trying to assign instances to a different account in a different
>> >> domain. Currently with no success.
>> >>
>> >> The particular instances have been deployed by the initial "admin"
>> >> account in the ROOT domain, and should be assigned to a domain-admin
>> >> account.
>> >>
>> >> id = 0d7a4ee7-5c6f-11e5-a590-3400a30d0aba <--- current domain
>> >> path = ROOT
>> >> =
>> >> ===
>> >> id = 4298cfba-aa4d-4baa-8b0e-53e70d0ebbe5 <--- destination domain
>> >> path = ROOT//yyy
>> >>
>> >>
>> >> id = 4b143f31-5c6f-11e5-a590-3400a30d0aba <-- current user in ROOT
>> >> account = admin
>> >> accountid = 4b14365a-5c6f-11e5-a590-3400a30d0aba
>> >> =
>> >> ===
>> >> id = 54e79c7a-f3de-4b76-8c99-ffc18c555f5d <-- dest. user in dest.
>> >> domain
>> >> account = zzz@yy
>> >> accountid = 76ec77a0-e0ca-459e-b211-eeacce52055c
>> >>
>> >>
>> >> With cloudmonkey (logged in as the admin in ROOT), I got following
>> >> result:
>> >>
>> >> (local)  > assign virtualmachine
>> >> virtualmachineid=9b76aa5a-f97f-4bd0-8e9d-350816e42515
>> >> domainid=4298cfba-aa4d-4baa-8b0e-53e70d0ebbe5
>> >> account=zzz@yy
>> >> Error 530: Failed to move vm
>> >> Acct[76ec77a0-e0ca-459e-b211-eeacce52055c-zzz@yy] does
>> >> not
>> >> have permission to operate within domain
>> >> id=0d7a4ee7-5c6f-11e5-a590-3400a30d0aba
>> >> cserrorcode = 
>> >> errorcode = 530
>> >> errortext = Failed to move vm
>> >> Acct[76ec77a0-e0ca-459e-b211-eeacce52055c-zzz@yy] does
>> >> not
>> >> have permission to operate within domain
>> >> id=0d7a4ee7-5c6f-11e5-a590-3400a30d0aba
>> >>
>> >>
>> >> This looks like, the destination user, who is domain-admin of it's
>> >> domain needs to have access to the ROOT domain. I think this makes no
>> >> sense, since I wan't to assign the instance TO it.
>> >>
>> >> Could someone please shed some light how to assign an instance to
>> >> another user in another domain?
>> >>
>> >> Thanks in advance!
>> >>
>> >> Stephan
>> >>
>> >>
>>
>
>
>
>-- 
>Rafael Weingärtner


Re: how to assign an instance to a different project ?

2015-11-04 Thread Yiping Zhang
Ilya:

Would you mind sharing your SQL statement for doing this? I only have a couple 
of instances left to move, so a workaround is acceptable.

Thanks

Yiping




On 11/4/15, 1:07 PM, "ilya" <ilya.mailing.li...@gmail.com> wrote:

>I did this through db update, worked as expected..
>
>
>On 11/3/15 2:05 PM, Somesh Naidu wrote:
>> I believe this functionality does not exist yet. The only way to achieve it is 
>> to perform manual DB updates, but this has not been tested. If you decide to go 
>> with the manual DB update approach, you should take care of the network 
>> association for that VM.
>> 
>> Regards,
>> Somesh
>> 
>> -Original Message-
>> From: Yiping Zhang [mailto:yzh...@marketo.com] 
>> Sent: Tuesday, November 03, 2015 2:43 PM
>> To: users@cloudstack.apache.org
>> Subject: how to assign an instance to a different project ?
>> 
>> Hi,
>> 
>> I need to move some instances from project-1 in domain-1 to project-2 in 
>> domain-2. But I can’t find any API to do so.
>> 
>> The API  “assignVirtualMachine” can move instances between accounts (in the 
>> same or different domains), but it does not work with projects. If there is 
>> an API to move an instance from an account to a project, that would work for 
>> me as well.
>> 
>> How would I do this ?
>> 
>> Thanks
>> 
>> Yiping
>> 


how to assign an instance to a different project ?

2015-11-03 Thread Yiping Zhang
Hi,

I need to move some instances from project-1 in domain-1 to project-2 in 
domain-2. But I can’t find any API to do so.

The API  “assignVirtualMachine” can move instances between accounts (in the 
same or different domains), but it does not work with projects. If there is an 
API to move an instance from an account to a project, that would work for me as 
well.

How would I do this?

Thanks

Yiping


Re: Unable to put host into maintenance mode. Unable to migrate VM

2015-10-30 Thread Yiping Zhang
Some more considerations:

How much free CPU and memory is available on your remaining hypervisors?  If 
your XenServer pool's resource usage is near its maximum limits, putting one 
hypervisor into maintenance could very well push the usage over the limits for 
the remaining hypervisors/pool.  Check the global settings 
cluster.cpu.allocated.capacity.disablethreshold and 
cluster.memory.allocated.capacity.disablethreshold; I think their default value 
is 0.75, or 75% of total.  You can increase these limits, or shut down some 
non-critical instances running on your remaining hypervisors, to see if you can 
migrate more instances off the hypervisor you are patching.
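A back-of-the-envelope version of this check, with hypothetical numbers (run it once for CPU and once for memory, using the allocated/total figures CloudStack reports per host):

```python
def can_evacuate(allocated, capacity, evac_host, threshold=0.75):
    """Rough, optimistic check: can the capacity allocated on evac_host fit
    into the headroom left on the other hosts below the disable threshold?

    allocated / capacity: dicts of per-host allocated and total capacity
    (e.g. CPU in MHz or RAM in MB; run once per resource type).
    threshold: mirrors cluster.*.allocated.capacity.disablethreshold.

    This pools headroom across hosts and ignores individual VM sizes, so a
    True here can still fail in practice if no single host can take the
    largest VM.
    """
    moving = allocated[evac_host]
    headroom = sum(capacity[h] * threshold - allocated[h]
                   for h in capacity if h != evac_host)
    return headroom >= moving
```

If this returns False for either resource type, raising the thresholds or stopping non-critical instances (as described above) is what creates the missing headroom.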

As a last resort, you could just shut down the remaining instances, reboot the 
hypervisor, and restart the stopped instances once the hypervisor is back!

Yiping




On 10/30/15, 6:06 AM, "Jeremy Peterson"  wrote:

>I was able to move a couple servers back to Flex-Xen1 and those VM's are 
>running fine.
>
>I am tailing /var/log/SMlog I will try to migrate another VM from Xen2 to Xen1 
>and see what shows up in the logs.
>
>Jeremy
>
>
>-Original Message-
>From: Abhinandan Prateek [mailto:abhinandan.prat...@shapeblue.com] 
>Sent: Friday, October 30, 2015 12:05 AM
>To: users@cloudstack.apache.org
>Subject: Re: Unable to put host into maintenance mode. Unable to migrate VM
>
>The errors are originating on XenServer Flex-Xen1, check the SMlog for clues.
>Also check if the VMs on that host are running fine.
>
>
>> On 30-Oct-2015, at 2:11 AM, Jeremy Peterson  wrote:
>>
>> http://pastebin.com/jpPbsJb4
>>
>> I put a host in maintenance mode to apply XenServer patches and it gets all 
>> but 5 vm's migrated and now if I cancel maintenance mode and try to manually 
>> move from Flex-Xen1 to Flex-Xen2 I get these errors.
>>
>> XenServer 6.5 SP1
>> CloudStack 4.5.0
>>
>> Advanced Networking
>>
>> iSCSI storage LUN's w/4 multipaths for primary FreeNAS NFS for 
>> secondary Jeremy Peterson
>>
>>
>
>Find out more about ShapeBlue and our range of CloudStack related services
>
>IaaS Cloud Design & Build
>CSForge – rapid IaaS deployment framework
>CloudStack Consulting
>CloudStack Software 
>Engineering
>CloudStack Infrastructure 
>Support
>CloudStack Bootcamp Training Courses
>
>This email and any attachments to it may be confidential and are intended 
>solely for the use of the individual to whom it is addressed. Any views or 
>opinions expressed are solely those of the author and do not necessarily 
>represent those of Shape Blue Ltd or related companies. If you are not the 
>intended recipient of this email, you must neither take any action based upon 
>its contents, nor copy or show it to anyone. Please contact the sender if you 
>believe you have received this email in error. Shape Blue Ltd is a company 
>incorporated in England & Wales. ShapeBlue Services India LLP is a company 
>incorporated in India and is operated under license from Shape Blue Ltd. Shape 
>Blue Brasil Consultoria Ltda is a company incorporated in Brasil and is 
>operated under license from Shape Blue Ltd. ShapeBlue SA Pty Ltd is a company 
>registered by The Republic of South Africa and is traded under license from 
>Shape Blue Ltd. ShapeBlue is a registered trademark.


Re: XenServer is disconnected after CS hosts shutdown

2015-07-29 Thread Yiping Zhang
Well,  sometimes people can’t answer a question because of lack of
relevant information, or simply because no one has encountered a similar
situation before.

Looking at your past messages on this thread, there were no mentions about
primary storage. Obviously, your primary storage configuration had changed
between the time you shut down CS manager and xenservers and the time you
restarted them. That is the vital info the list didn’t know.

To best of my knowledge, zone wide primary storage has never been
supported for Xen hypervisors.

I do have to say that quite often CloudStack error messages are very cryptic 
and do not provide enough *useful* information to help users identify and 
troubleshoot actual problems. That stack trace output might be a gold mine for 
developers, but it is utterly useless for end users.

Just my $0.02

Yiping

On 7/28/15, 11:19 PM, tony_caot...@163.com wrote:


Hi. Finally I resolved this problem by myself.

  * Primary Storage: A storage resource typically provided to a single
cluster for the actual running of instance disk images. (Zone-wide
primary storage is an option, though not typically used.)

This line above is from
http://docs.cloudstack.apache.org/en/master/concepts.html

Because I had zone-wide primary storage, ACS could not find the correct primary 
storage belonging to the XenServer cluster after the reboot.

Then I changed the zone-wide primary storage to cluster-wide, and that resolved 
it.

Right now I have two primary storage pools: one cluster-wide for KVM, and 
another cluster-wide for XenServer.

The above is for people who run into the same problem one day.

By the way, I am very curious why I never receive replies from such a big 
community, except at the very beginning.

Is my English really so poor that nobody can understand what I am saying?

---
Cao Tong

On 07/22/2015 09:03 PM, tony_caot...@163.com wrote:

 Hey!  help please...

 Some news:
 I think the cause is that the ACS host can't communicate with the
 XenServer host.
 ACS keeps outputting logs like this:

 2015-07-22 20:42:13,555 DEBUG [c.c.a.m.ClusteredAgentAttache]
 (AgentManager-Handler-7:null) Seq 5-8174877748607582212: Forwarding
 Seq 5-8174877748607582212:  { Cmd , MgmtId: 279278805451459, via: 5,
 Ver: v1, Flags: 100111,
 [{com.cloud.agent.api.MaintainCommand:{wait:0}}] } to
280345368052992

 I am not sure whether the ACS status is wrong or some services on
 the XenServer are not running.

 On the XenServer, I found that *xenheartbeat.sh is not running.*
 *(/bin/bash /opt/cloud/bin/xenheartbeat.sh
 00d8e0d0-8561-4b3d-9044-cbc496ff22cc 120 60)*

 Because some operations involving the XenServer were pending, the XenServer
 could not be deleted from the web UI.

 I found a temporary solution:

 1. delete jobs from DB cloud.vm_work_job.
 2. delete xenserver from DB cloud.host.
 3. add xenserver host back from web UI.

 then it works.

 Does anyone have an idea about this?

 Could anyone tell me what ACS does on a XenServer host when adding
 it?

 Thanks,

 ---
 Cao Tong

 On 07/22/2015 04:26 PM, tony_caot...@163.com wrote:

 @prashant, following are the answers to your questions:

 1. Yes, primary storage is connected fine for my XenServer.

 2. No, the XenServer's password has not changed.

 3. Yes, the web UI is fine, and I can log in.

 4. Before the reboot I unmanaged and disabled the resources, and after the
 reboot I enabled all of them.

 5. The host state is Up.

 6. No yum update anywhere.

 7. The system VMs' status is fine, I think.

 ---
 Cao Tong

 On 07/22/2015 04:13 PM, tony_caot...@163.com wrote:

 Hi,

 After reinstall, I got the problem again

 So, I will describe once again.

 What my environment looks like:

 I have an ACS server host and a XenServer host. After both rebooted, I
 cannot create a VM on the XenServer through ACS.
 A KVM host and an NFS server are running together on the ACS manager host.

 The status of the new VM is always 'Starting' in the web UI, but I can
 create a new VM using XenCenter.

 - ERR LOGS --
 2015-07-22 15:56:56,357 DEBUG [c.c.s.StorageManagerImpl]
 (StatsCollector-3:ctx-1aa2e8c9) Unable to send storage pool command
 to Pool[4|NetworkFilesystem] via 4
 com.cloud.exception.OperationTimedoutException: Commands
 2829104990918803478 to Host 4 timed out after 3600

 2015-07-22 15:56:56,358 INFO  [c.c.s.StatsCollector]
 (StatsCollector-3:ctx-1aa2e8c9) Unable to reach
 Pool[4|NetworkFilesystem]
 com.cloud.exception.StorageUnavailableException: Resource
 [StoragePool:4] is unreachable: Unable to send command to the pool


 - and there are lots of DEBUG infos  --- repeat
 again and again ---

 2015-07-22 15:36:12,887 DEBUG [c.c.a.m.ClusteredAgentAttache]
 (AgentManager-Handler-14:null) Seq 4-8064821032713715922: Forwarding
 Seq 4-8064821032713715922:  { Cmd , MgmtId: 227448510156211, via: 4,
 Ver: v1, Flags: 100111,
 [{com.cloud.agent.api.MaintainCommand:{wait:0}}] } to
 116784073679673
 2015-07-22 15:36:12,889 DEBUG 

Re: number of expected tcp connections to RabbitMQ cluster

2015-07-17 Thread Yiping Zhang
Hi, Rohit:

Thanks for the reply.

We have CS instances connecting to either a single RabbitMQ broker, or a cluster 
of two or three brokers through a load balancer VIP.  The results are always the 
same.  And yes, the messages are published to the brokers.

Yiping

From: Rohit Yadav <rohit.ya...@shapeblue.com>
Reply-To: users@cloudstack.apache.org
Date: Friday, July 17, 2015 at 2:32 AM
To: users@cloudstack.apache.org
Subject: Re: number of expected tcp connections to RabbitMQ cluster


On 17-Jul-2015, at 12:27 am, Yiping Zhang <yzh...@marketo.com> wrote:

Hi, all:

We are in the process of upgrading our CloudStack from 4.3.2 to 4.5.1.   I just 
noticed that for CS 4.3.2,  there are two TCP connections from 
cloudstack-management java process to RabbitMQ cluster, but for CS 4.5.1 there 
is only one TCP connection to RabbitMQ cluster.

Can someone please confirm that this is correct?  If this is the expected 
behavior,  I need to update our Nagios setup accordingly.

Is your mgmt server publishing to one broker (RabbitMQ)? 4.5.1 uses the newer 
RabbitMQ client library, so I'm not sure how the communication actually happens 
at the TCP level; from CloudStack's side we simply make a connection using the 
Java client/RabbitMQ library. I would say, if you're able to publish messages to 
it, you don't need to worry about anything.


Thanks,

Yiping

Regards,
Rohit Yadav
Software Architect, ShapeBlue




M. +91 88 262 30892 | rohit.ya...@shapeblue.com
Blog: bhaisaab.org | Twitter: @_bhaisaab






number of expected tcp connections to RabbitMQ cluster

2015-07-16 Thread Yiping Zhang
Hi, all:

We are in the process of upgrading our CloudStack from 4.3.2 to 4.5.1.   I just 
noticed that for CS 4.3.2,  there are two TCP connections from 
cloudstack-management java process to RabbitMQ cluster, but for CS 4.5.1 there 
is only one TCP connection to RabbitMQ cluster.

Can someone please confirm that this is correct?  If this is the expected 
behavior,  I need to update our Nagios setup accordingly.

Thanks,

Yiping


[Urgent]: xenserver hosts stuck in alert state after 4.3.2 - 4.5.1 upgrade

2015-07-09 Thread Yiping Zhang
Hi, all:

We just did an upgrade from CS 4.3.2 to 4.5.1.  Our environment is RHEL 6 + 
Adv. Zone with SecurityGroup + XenServer 6.2.

After the upgrade, all xenserver hosts were in alert state and the MS can’t connect 
to them, with the following errors:


INFO  [c.c.h.x.r.CitrixResourceBase] (DirectAgent-9:ctx-7fc226ce) Host 
10.0.100.25 OpaqueRef:996c575a-ad04-5f3f-cd6d-56b7daa16844: Host 10.0.100.25 is 
already setup.

WARN  [c.c.h.x.r.CitrixResourceBase] (DirectAgent-9:ctx-7fc226ce) Failed to 
configure brige firewall

WARN  [c.c.h.x.r.CitrixResourceBase] (DirectAgent-9:ctx-7fc226ce) Check host 
10.0.100.25 for CSP is installed or not and check network mode for bridge

WARN  [c.c.h.x.d.XcpServerDiscoverer] (AgentTaskPool-13:ctx-b4517020) Unable to 
setup agent 5 due to Failed to configure brige firewall

INFO  [c.c.u.e.CSExceptionErrorCode] (AgentTaskPool-13:ctx-b4517020) Could not 
find exception: com.cloud.exception.ConnectionException in error code list for 
exceptions

WARN  [c.c.a.m.AgentManagerImpl] (AgentTaskPool-13:ctx-b4517020) Monitor 
XcpServerDiscoverer says there is an error in the connect process for 5 due to 
Reinitialize agent after setup.

INFO  [c.c.a.m.AgentManagerImpl] (AgentTaskPool-13:ctx-b4517020) Host 5 is 
disconnecting with event AgentDisconnected

WARN  [c.c.r.ResourceManagerImpl] (AgentTaskPool-13:ctx-b4517020) Unable to 
connect due to

com.cloud.exception.ConnectionException: Reinitialize agent after setup.

at 
com.cloud.hypervisor.xenserver.discoverer.XcpServerDiscoverer.processConnect(XcpServerDiscoverer.java:621)

at 
com.cloud.agent.manager.AgentManagerImpl.notifyMonitorsOfConnection(AgentManagerImpl.java:539)

at 
com.cloud.agent.manager.AgentManagerImpl.handleDirectConnectAgent(AgentManagerImpl.java:1447)

at 
com.cloud.resource.ResourceManagerImpl.createHostAndAgent(ResourceManagerImpl.java:1794)

at 
com.cloud.resource.ResourceManagerImpl.createHostAndAgent(ResourceManagerImpl.java:1920)

at sun.reflect.GeneratedMethodAccessor142.invoke(Unknown Source)

at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:606)

at 
org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)

at 
org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)

at 
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)

at 
org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)

at 
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)

at 
org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)

at com.sun.proxy.$Proxy149.createHostAndAgent(Unknown Source)

at 
com.cloud.agent.manager.AgentManagerImpl$SimulateStartTask.runInContext(AgentManagerImpl.java:1078)

at 
org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)

at 
org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)

at 
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)

at 
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)

at 
org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)

at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:745)

The error implies that it was expecting a Linux bridge on XenServer and couldn’t 
find it. After checking on the XenServer hosts, we have been using openvswitch 
instead of the Linux bridge as the network backend all this time.

So the question comes down to:

 1.  If a CS advanced zone with SecurityGroup is only supported on XenServer using 
the Linux bridge backend, why did 4.3.x allow us to run this unsupported 
configuration for so long without any apparent issues, and does CS 4.5.1 now 
enforce this requirement ?
 2.  If we switch the network backend from openvswitch to Linux bridge, would this 
fix the problem?  We are hoping to avoid this step, as it requires rebooting 
all XenServer hosts and shuffling around hundreds of VM instances.
 3.  Are there any other solutions to make the newly upgraded CloudStack 4.5.1 
management server reconnect to our XenServer hosts ?
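For the second question, one way to confirm which backend each host is actually running before deciding anything is to read the host's network configuration. This is a sketch under the assumption of a stock XenServer 6.x layout, where `/etc/xensource/network.conf` holds the backend name and `xe-switch-network-backend` is the usual switching tool (which does require a host reboot):

```shell
#!/bin/sh
# Print the current XenServer network backend ("openvswitch" or "bridge").
# Falls back to "unknown" when run off-host, so the script is safe anywhere.
BACKEND=$(cat /etc/xensource/network.conf 2>/dev/null || echo "unknown")
echo "Current network backend: $BACKEND"

# To switch to the Linux bridge backend (run on the host, then reboot it):
# xe-switch-network-backend bridge
```

Run on each pool member in turn; mixed backends within one pool are not supported, so switch and reboot host by host after evacuating VMs.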

Thanks in advance,

Yiping


Re: [Urgent]: xenserver hosts stuck in alert state after 4.3.2 - 4.5.1 upgrade

2015-07-09 Thread Yiping Zhang
Hi, Lucian:

Thanks for the reply. When I said it worked all this time I really meant
that the CS instance worked as expected for what we were doing with it,
not to mean that the SecurityGroup feature worked.

To be honest, we did not really use security group feature per se,  as
this is a private cloud running on our own networks and we picked advanced
networking with SG only to avoid assigning a public network IP range for
guests.  The CS instance went through upgrades from 4.3.0 to 4.3.1 to
4.3.2 without a hiccup, until we tried to upgrade to 4.5.1.

Yiping


On 7/9/15, 12:31 PM, Nux! n...@li.nux.ro wrote:

Hello,

As far as I can tell, Xenserver has always required the network to be in
bridge mode for security groups to work, at least since 4.1 that I've
been playing with. Not sure how exactly it was working in your case ...
did you actually test the firewall rules were doing anything? (Sorry for
dumb question)

Lucian

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro

- Original Message -
 From: Yiping Zhang yzh...@marketo.com
 To: users@cloudstack.apache.org
 Sent: Thursday, 9 July, 2015 19:22:01
 Subject: [Urgent]:  xenserver hosts stuck in alert state after 4.3.2 -
4.5.1 upgrade

 Hi, all:
 
 We just did an upgrade from CS 4.3.2 - 4.5.1.  Our environment is rhel
6 + Adv.
 Zone with SecurityGroup + XenServer 6.2.
 
 After the upgrade, all xenserver hosts were in alert state and MS can't
connect
 to them with following errors:
 
 
 INFO  [c.c.h.x.r.CitrixResourceBase] (DirectAgent-9:ctx-7fc226ce) Host
 10.0.100.25 OpaqueRef:996c575a-ad04-5f3f-cd6d-56b7daa16844: Host
10.0.100.25 is
 already setup.
 
 WARN  [c.c.h.x.r.CitrixResourceBase] (DirectAgent-9:ctx-7fc226ce)
Failed to
 configure brige firewall
 
 WARN  [c.c.h.x.r.CitrixResourceBase] (DirectAgent-9:ctx-7fc226ce) Check
host
 10.0.100.25 for CSP is installed or not and check network mode for
bridge
 
 WARN  [c.c.h.x.d.XcpServerDiscoverer] (AgentTaskPool-13:ctx-b4517020)
Unable to
 setup agent 5 due to Failed to configure brige firewall
 
 INFO  [c.c.u.e.CSExceptionErrorCode] (AgentTaskPool-13:ctx-b4517020)
Could not
 find exception: com.cloud.exception.ConnectionException in error code
list for
 exceptions
 
 WARN  [c.c.a.m.AgentManagerImpl] (AgentTaskPool-13:ctx-b4517020) Monitor
 XcpServerDiscoverer says there is an error in the connect process for 5
due to
 Reinitialize agent after setup.
 
 INFO  [c.c.a.m.AgentManagerImpl] (AgentTaskPool-13:ctx-b4517020) Host 5
is
 disconnecting with event AgentDisconnected
 
 WARN  [c.c.r.ResourceManagerImpl] (AgentTaskPool-13:ctx-b4517020)
Unable to
 connect due to
 
 com.cloud.exception.ConnectionException: Reinitialize agent after setup.
 
at

com.cloud.hypervisor.xenserver.discoverer.XcpServerDiscoverer.processConn
ect(XcpServerDiscoverer.java:621)
 
at

com.cloud.agent.manager.AgentManagerImpl.notifyMonitorsOfConnection(Agent
ManagerImpl.java:539)
 
at

com.cloud.agent.manager.AgentManagerImpl.handleDirectConnectAgent(AgentMa
nagerImpl.java:1447)
 
at

com.cloud.resource.ResourceManagerImpl.createHostAndAgent(ResourceManager
Impl.java:1794)
 
at

com.cloud.resource.ResourceManagerImpl.createHostAndAgent(ResourceManager
Impl.java:1920)
 
at sun.reflect.GeneratedMethodAccessor142.invoke(Unknown Source)
 
at

sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorI
mpl.java:43)
 
at java.lang.reflect.Method.invoke(Method.java:606)
 
at

org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(A
opUtils.java:317)
 
at

org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpo
int(ReflectiveMethodInvocation.java:183)
 
at

org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(Refl
ectiveMethodInvocation.java:150)
 
at

org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(Ex
poseInvocationInterceptor.java:91)
 
at

org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(Refl
ectiveMethodInvocation.java:172)
 
at

org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAop
Proxy.java:204)
 
at com.sun.proxy.$Proxy149.createHostAndAgent(Unknown Source)
 
at

com.cloud.agent.manager.AgentManagerImpl$SimulateStartTask.runInContext(A
gentManagerImpl.java:1078)
 
at

org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(Manage
dContextRunnable.java:49)
 
at

org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(D
efaultManagedContext.java:56)
 
at

org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWith
Context(DefaultManagedContext.java:103)
 
at

org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithC
ontext

ghost volumes belonging to a project

2015-04-09 Thread Yiping Zhang
Hi,

We always create VM instances in a project context, so that all resources are 
assigned to the project instead of individual accounts.  I have a script that calls 
the listProjects API to show each project’s resource limits and current usage.

Recently, I noticed that for a couple of projects, my script reports a 
different number of volumes from what I can see in web UI for these projects.  
Poking around in mysql db, it looks like web UI only reports volumes in Ready 
state, while listProjects API reports volumes in both Ready and Destroy states.

Following is a project where UI says it has two volumes while listProjects API 
says it has nine:


mysql> select v.id, v.name, v.state, v.removed, v.path
    -> from volumes v, projects p
    -> where v.account_id = p.project_account_id
    ->   and p.name = 'activityservice'
    ->   and v.state != 'Expunged';
+------+-----------+---------+---------------------+--------------------------------------+
| id   | name      | state   | removed             | path                                 |
+------+-----------+---------+---------------------+--------------------------------------+
|  961 | ROOT-1053 | Destroy | 2015-03-02 21:36:09 | NULL                                 |
|  965 | ROOT-1061 | Destroy | 2015-03-03 00:25:17 | NULL                                 |
|  966 | ROOT-1063 | Destroy | 2015-03-03 00:28:16 | NULL                                 |
|  967 | ROOT-1066 | Destroy | 2015-03-03 00:41:48 | NULL                                 |
|  968 | ROOT-1067 | Destroy | 2015-03-03 00:49:11 | NULL                                 |
|  969 | ROOT-1069 | Destroy | 2015-03-03 00:53:08 | NULL                                 |
|  970 | ROOT-1070 | Destroy | 2015-03-03 00:53:52 | NULL                                 |
| 1000 | ROOT-1100 | Ready   | NULL                | 2c315f38-3a5f-4b2e-a765-332ddbe7a938 |
| 1217 | ROOT-1314 | Ready   | NULL                | 1d3f6716-d607-4d9d-b71c-93b7a88cd064 |
+------+-----------+---------+---------------------+--------------------------------------+
9 rows in set (0.00 sec)

mysql>


These ghost volumes are counted into project’s total usage of primary storage, 
so I’d like to get rid of them to free up space for real volumes.


How do I clean them up ? Is this a known issue ?
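Before any cleanup, it helps to enumerate exactly which Destroy-state volumes are still counted against a project. This is a hedged helper that only generates and prints the review query (assumptions: the `cloud` database schema and join shown above; the project name and credentials are placeholders; never run destructive SQL against the cloud DB without a backup):

```shell
#!/bin/sh
# Build a read-only query listing Destroy-state volumes for one project,
# so they can be reviewed before cleanup (e.g. by the storage.cleanup thread
# or with vendor support). Project name is a placeholder argument.
PROJECT="${1:-activityservice}"
SQL=$(cat <<EOF
SELECT v.id, v.name, v.state, v.removed
FROM volumes v JOIN projects p ON v.account_id = p.project_account_id
WHERE p.name = '${PROJECT}' AND v.state = 'Destroy';
EOF
)
echo "$SQL"
# To run against the cloud DB (adjust user/password/host as needed):
# mysql -u cloud -p cloud -e "$SQL"
```

The query is read-only; whether the rows can then be expunged manually or should be left to CloudStack's own cleanup thread depends on your version and is worth confirming before touching the database.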


Thanks,


Yiping


inventory storage volumes on primary storage

2015-03-31 Thread Yiping Zhang
HI, all:

I am doing an inventory of all VHD files on my primary storage device.   I have 
only eight active VM instances, but I have 43 VHD files on the file system.  
Running the command “xe vdi-list” on XenServer provides info on all VDIs.  From 
the output, I can tell that the “name-label” field generally matches the disk 
names shown in the CS web UI for active volumes, or for templates copied over to 
primary storage.  However, there are a number of VDIs whose “name-label” 
field is either blank or “base copy”, as shown below:

# xe vdi-list

uuid ( RO): 1e2b3e3d-514c-4c99-bf55-d32b60d8eefe

  name-label ( RW): base copy

name-description ( RW):

 sr-uuid ( RO): b3f5cc51-a68b-842e-e5a5-4ffb042eaade

virtual-size ( RO): 21474836480

sharable ( RO): false

   read-only ( RO): true


uuid ( RO): b7470f8a-f014-49b0-8263-65ffe348446c

  name-label ( RW):

name-description ( RW):

 sr-uuid ( RO): b3f5cc51-a68b-842e-e5a5-4ffb042eaade

virtual-size ( RO): 21474836480

sharable ( RO): false

   read-only ( RO): false


uuid ( RO): 4c3b92df-3cda-4bf0-97c3-9d88818e855a

  name-label ( RW): ROOT-26

name-description ( RW):

 sr-uuid ( RO): b3f5cc51-a68b-842e-e5a5-4ffb042eaade

virtual-size ( RO): 262144

sharable ( RO): false

   read-only ( RO): false


uuid ( RO): 2a4992e4-fb97-41d3-95ed-3bf188ace7c1

  name-label ( RW): Template 8206c6e7-1b7f-4365-bdf2-020a47318c3e

name-description ( RW):

 sr-uuid ( RO): b3f5cc51-a68b-842e-e5a5-4ffb042eaade

virtual-size ( RO): 21474836480

sharable ( RO): false

   read-only ( RO): false



(more output)


My questions are: 1) what are those VDIs whose “name-label” is blank, and can I 
safely delete them from the file system ? And 2) what are those “base copy” VDIs, 
and how do I find out which VMs / templates they belong to ?


Thanks,


Yiping




customize DNS domain names

2015-03-11 Thread Yiping Zhang
Hi, All:

I have been reading the CS admin guide (the latest one is for 4.2.0) about 
customizing DNS domains:  
https://cloudstack.apache.org/docs/en-US/Apache_CloudStack/4.2.0/html/Admin_Guide/customizing-dns.html

After trying it out,  I came to the conclusion that this section applies only 
to isolated networks.  Is this correct?

I am using advanced networking with SecurityGroup (CS version 4.3.2), and my 
VM instances get their DNS domains from: 1) the guest network’s own network domain 
setting, if one is set, or 2) the global setting guest.domain.suffix.  Network 
domain settings for zones, domains, or accounts do not seem to change how 
instances get their DNS domain names.

I have a guest network shared by multiple domains,  I would like instances on 
this network be assigned different DNS domains depending on which domain the 
instance belongs to.  Is this doable with shared networks?

Thanks

Yiping




Re: host global config

2015-03-06 Thread Yiping Zhang
If your CloudStack management server is multi-homed, CloudStack will populate
the host global setting with the IP address of the interface that is on the
same network as the default gateway on your mgmt node.  If this IP is not
your (CloudStack) management interface IP, then you need to change the host
value, as well as the global setting management.network.cidr.

Yiping

On 3/6/15, 11:21 AM, Somesh Naidu somesh.na...@citrix.com wrote:

Fedi,

That IP will be used by agents (hosts and System VMs) to communicate with
management server on port 8250.

Generally, this would be the same IP as listed in the output of select *
from mshost;.

Regards,
Somesh

-Original Message-
From: Fedi Ben Ali [mailto:ben.ali.fe...@gmail.com]
Sent: Friday, March 06, 2015 7:13 AM
To: users@cloudstack.apache.org
Subject: host global config

Hello,

I'm wondering which IP address of the management server to put in the host
entry in the global config: is it the public or the management IP?

Thx.



Re: how to assign an AffinityGroups to VM instances belonging to a project

2015-03-05 Thread Yiping Zhang
Hi, Rene:

Thanks, much appreciated.

Yiping

On 3/5/15, 6:04 AM, Rene Moser m...@renemoser.net wrote:

Hi

On 05.03.2015 02:17, Yiping Zhang wrote:
 How would one bump up priority for this issue and hopefully get looked
at
 by someone sooner than later ?

I will try to jump in fixing the project support.

Regards
René



Re: how to assign an AffinityGroups to VM instances belonging to a project

2015-03-04 Thread Yiping Zhang
How would one bump up priority for this issue and hopefully get looked at
by someone sooner than later ?

Yiping

On 3/4/15, 11:50 AM, Rene Moser m...@renemoser.net wrote:

Hi

On 03/04/2015 08:42 PM, Yiping Zhang wrote:
 Hi, all:

 I am trying to assign an affinity group I just created to VM instances
I am creating under a project context, but I got following error:


 INFO  [c.c.a.ApiServer] (catalina-exec-16:ctx-025997c3 ctx-7db7fbec)
PermissionDenied: Entity
AffinityGroup[f55a0c4e-c9f5-4498-b637-3523b6af46d9] and entity
Acct[d79defc7-529c-40ea-920d-75648f64f02b-PrjAcct-Infrastructure-3]
belong to different accounts on objs: []

 Looking at the affinity group with
uuid=f55a0c4e-c9f5-4498-b637-3523b6af46d9, it belongs to my account.
It seems that I can't create new affinity groups under project context
(IOW, an affinity group owned by a project, rather than owned by my own
account).  If I am not in any project, just in "Default View", then I
can create VM instances and assign them to my affinity group.

 How would one assign affinity groups to VM instance belonging to a
project?

Not (yet) supported, see
https://issues.apache.org/jira/browse/CLOUDSTACK-6237. I am waiting for
this feature ever since.


Yours
René



Re: Cloudstack + XenServer 6.2 + NetApp in production

2015-02-15 Thread Yiping Zhang
Tim,

Thanks, for the reply.

In our case, the NetApp cluster as a whole did not fail.  The NetApp
cluster failover happened because the Operations team was performing
scheduled maintenance; this is normal behavior. To the best of my knowledge,
a NetApp head failover should take anywhere from 10 to 15 seconds.

As you guessed correctly, our XenServer resource pool does have HA
enabled, and the HA shared SR is indeed on the same NetApp cluster as the
primary storage SR.  Though I am not sure if enabling Xen pool HA is the
cause of the XenServers' rebooting under this particular scenario.

I am not sure if I understand your statement that "In that case, HA would
detect the storage failure and fence the XenServer host".  Can you
elaborate a little more on this statement?

Thanks again,

Yiping


On 2/14/15, 6:26 AM, Tim Mackey tmac...@gmail.com wrote:

Yiping,

The specific problem covered by that note was solved a long time ago.
Timeouts can be caused by a number of things, and if the entire NetApp
cluster went offline, the XenServer host would be impacted.  Since you are
experiencing a host reboot when this happens, I suspect you have XenServer
HA enabled with the heartbeat on the same NetApp cluster.  In that case,
HA
would detect the storage failure and fence the XenServer host.

The solution here would be to understand why your NetApp cluster failed
during scheduled maintenance. Something in your configuration has created
a
single point of failure. If you've enabled HA, I also would like to
understand why you've chosen to do that.  Going slightly commercial for a
second, I would also advise you to look into a commercial support contract
for your production XenServer hosts. That team is going to be able to go
deeper, and much quicker, when production issues arise than this list.
NetApp and XenServer is used in a very large number of deployments, so if
there is something wrong they'll be more likely to know. For example,
there
could be a set of XenServer or OnTap patches to help sort this out.

-tim

On Fri, Feb 13, 2015 at 7:36 PM, Yiping Zhang yzh...@marketo.com wrote:

 Hi, all:

 I am wondering if any one is running their CloudStack in production
 deployments with  XenServer 6.2 + NetApp clusters ?

 Recently, in our non production deployment (rhel 6.6 + CS 4.3.0 +
 XenServer 6.2 cluster + NetApp cluster), all our XenServer rebooted
 automatically because of NFS timeout, when our NetApp cluster failover
 happened during a scheduled filer maintenance. My google search turned
up
 this Citrix hot fix: http://support.citrix.com/article/CTX135623 for
 XenServer 6.0.2, and this post about XenServer 6.2:
 http://www.gossamer-threads.com/lists/xen/devel/320020 .

 Obviously the problem still exists for XenServer 6.2 and we are very
 concerned about going to production deployment based on this technology
 stack.

 If anyone has a similar setup, please share your experiences.

 Thanks,

 Yiping






Re: Cloudstack + XenServer 6.2 + NetApp in production

2015-02-15 Thread Yiping Zhang
Hi, Tim and Adriano:

Thanks very much for detailed and very insightful replies.

After going back and rereading the log files, I am now fully convinced that
it was indeed the HA feature that caused the XenServers to fence themselves.
I'll follow that Citrix support article to determine the best HA timeout
value to use in our environment.

Yiping


On 2/15/15, 1:50 PM, Tim Mackey tmac...@gmail.com wrote:

Here's a KB which covers how to change the *XenServer* HA setting (not the
CloudStack one): http://support.citrix.com/article/CTX139166.  It would be
good to check /var/log/xha.log to see if any issues were logged there.
Also note that with HA you want always have your hosts NTP sync'd.  With
the default timeout being 30 seconds, I'd start by verifying from your
NetApp admins how long the head was actually offline.  I'd also look into
any network config issues (assuming you've bonded your storage network).

-tim

On Sun, Feb 15, 2015 at 4:28 PM, Adriano Paterlini paterl...@usp.br
wrote:

 Yiping,

 We do have a production environment with similar configuration, you can
 check some parameters and logs.

 First of all, xenserver nfs timeout will occur every time nfs server
takes
 more than 13.3 (40.0/3.0) seconds to answer read or write nfs calls,
this
 is defined as SOFTMOUNT_TIMEOUT at /opt/xensource/sm/nfs.py. There are
some
 xenserver forum discussions about changing this parameter, my conclusion
 that its not recommended, the consequence would be virtual machines
going
 into ready only mode, unless vm parameters are also modified, linux
 defaults usually are 30 seconds. NFS timeouts are shown at
 /var/log/kern.log.

 However, the timeout itself does not cause host reboot, the reboot is
 probably due to cloudstack HA storage fence, just as Tim mentioned,
storage
 fence is enforced at the script  /opt/cloud/bin/xenheartbeat.sh, you can
 check for the log entries to confirm if it was the case. If its is
really
 the case you case adjust the cloudstack global settings parameters
 xenserver.heartbeat.interval and xenserver.heartbeat.timeout to
accommodate
 planned maintenance and even automatic storage side HA, you should check
 with Netapp for recommend values for your environment, takeover/giveback
 delays may vary according to controller version and even current
controller
 load, Netapp documentation mention 180 seconds as maximum delay. Also
check
 if script is running correctly #ps -aux | grep heartbeat, it should
take 3
 parameters, if not you may be affected by
 https://issues.apache.org/jira/browse/CLOUDSTACK-7184.

 Hope the comments help your decision.


 Regards,
 Adriano
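The timeout arithmetic Adriano describes can be sketched as a quick back-of-envelope check. The 40.0/3.0 SOFTMOUNT_TIMEOUT and the 180-second NetApp worst-case takeover figure are taken from this thread, not verified against any particular XenServer or ONTAP release:

```shell
#!/bin/sh
# Compare the XenServer NFS soft-mount window against an expected NetApp
# takeover delay. SOFTMOUNT_TIMEOUT semantics per /opt/xensource/sm/nfs.py
# as described above; the takeover figure is the documented worst case.
SOFTMOUNT_TIMEOUT=$(awk 'BEGIN { printf "%.1f", 40.0/3.0 }')   # ~13.3 s
TAKEOVER_DELAY="${1:-180}"                                     # seconds

echo "NFS soft-mount window: ${SOFTMOUNT_TIMEOUT}s; takeover up to ${TAKEOVER_DELAY}s"
if awk -v t="$TAKEOVER_DELAY" -v s="$SOFTMOUNT_TIMEOUT" 'BEGIN { exit !(t > s) }'; then
    echo "Takeover can exceed the NFS window: expect I/O errors (and possible"
    echo "HA fencing) unless heartbeat/timeout settings are raised accordingly."
fi
```

The same comparison applies to xenserver.heartbeat.timeout: it needs to be comfortably larger than the worst-case takeover delay your storage team quotes, or the heartbeat script will fence hosts during planned failovers.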


 On Sun, Feb 15, 2015 at 12:38 PM, cyr...@usp.br wrote:

  FYI
 
  Sent from my iPhone
 
  Begin forwarded message:
 
  *From:* Yiping Zhang yzh...@marketo.com
  *Date:* February 15, 2015 at 2:00:05 AM GMT-2
  *To:* users@cloudstack.apache.org users@cloudstack.apache.org
  *Subject:* *Re: Cloudstack + XenServer 6.2 + NetApp in production*
  *Reply-To:* users@cloudstack.apache.org
 
  [forwarded message body trimmed; it duplicates Yiping's question and Tim's
reply quoted earlier in this thread]

Re: how to configure multi-homed management server?

2015-02-13 Thread Yiping Zhang
Update:

Now my mgmt server's eth0 is configured with an IP address on the lab network,
and the hostname is set to the name of the lab network IP. However, after I run
the cloudstack-setup-databases and cloudstack-setup-management scripts, I
still find the wrong IP address in the global setting param "host", and a wrong
value for the param "management.network.cidr".

It looks like running the cloudstack-setup-management script (with an
immediate start of the cloudstack-management service) populates the DB
tables.  So my question comes down to: how does the cloudstack-management
service determine these configuration values?

Thanks

Yiping

On 2/11/15, 8:21 PM, Praveen B pbprave...@gmail.com wrote:

Hi Zhang,

host parameter in the global settings will decide system VMs to
communicate to MGMT ip address on port 8250. Since you have two IP
addresses on management server, CloudStack has picked up your corporate
network IP address.

As a fix, change the host parameter to your lab network IP address and
destroy system VMs. New system VMs will point to correct mgmt IP. Let me
know how it goes.

Thanks,
Praveen

On Thu, Feb 12, 2015 at 5:39 AM, Yiping Zhang yzh...@marketo.com wrote:

 Hi, all:

 My  CS management server has two IP addresses: one IP address on our
 corporate network (for general access) and one IP address on lab
network ,
 which is used at CloudStack¹s management network.

 When I run the cloudstack-setup-databases script, I gave the "--mshost"
 option with its IP address on the lab network.  However, when the CPVM comes
 up, the cloud service is not running. Looking at the /var/log/cloud.log file
 on the CPVM indicates that it is trying to connect to port 8250 of the
 management server's IP address on the corporate network instead of the IP
 address on the lab network.  On the SSVM, its cloud service also tries to
 connect to port 8250 of the wrong MS IP address.

 How do systemVMs decide which IP address on the management server to use
 when starting the cloud service ?  How do I make systemVMs use the proper
 interface on the management server ?

 Thanks

 Yiping




Cloudstack + XenServer 6.2 + NetApp in production

2015-02-13 Thread Yiping Zhang
Hi, all:

I am wondering if any one is running their CloudStack in production deployments 
with  XenServer 6.2 + NetApp clusters ?

Recently, in our non production deployment (rhel 6.6 + CS 4.3.0 + XenServer 6.2 
cluster + NetApp cluster), all our XenServer rebooted automatically because of 
NFS timeout, when our NetApp cluster failover happened during a scheduled filer 
maintenance. My google search turned up this Citrix hot fix: 
http://support.citrix.com/article/CTX135623 for XenServer 6.0.2, and this post 
about XenServer 6.2: http://www.gossamer-threads.com/lists/xen/devel/320020 .

Obviously the problem still exists for XenServer 6.2 and we are very concerned 
about going to production deployment based on this technology stack.

If anyone has a similar setup, please share your experiences.

Thanks,

Yiping




how to configure multi-homed management server?

2015-02-11 Thread Yiping Zhang
Hi, all:

My  CS management server has two IP addresses: one IP address on our corporate 
network (for general access) and one IP address on lab network , which is used 
at CloudStack’s management network.

When I run the cloudstack-setup-databases script, I gave the "--mshost" option with 
its IP address on the lab network.  However, when the CPVM comes up, the cloud service 
is not running. Looking at the /var/log/cloud.log file on the CPVM indicates that it 
is trying to connect to port 8250 of the management server's IP address on the 
corporate network instead of the IP address on the lab network.  On the SSVM, its 
cloud service also tries to connect to port 8250 of the wrong MS IP address.

How do systemVMs decide which IP address on the management server to use when 
starting the cloud service?  How do I make systemVMs use the proper interface on 
the management server?

Thanks

Yiping

