Re: ACS 4.8.1: Virtual router - /var/log always full

2016-11-27 Thread Will Stevens
Are you using XenServer?  We recently found a bug in XenServer which has
similar characteristics.  I can put together details of the problem if
so...

Cheers,

Will

*Will STEVENS*
Lead Developer



On Sun, Nov 27, 2016 at 11:36 PM, Cloud List  wrote:

> Dear all,
>
> Please ignore this, managed to find the reason - the passwd_server_ip
> process is holding on to a log file which was deleted earlier to fix the
> disk space issue. Restarted the process and the disk space is fine now.
>
> ===
> passwd_se  2199  root  1w  REG  254,10  171802407  15  /var/log/cloud.log (deleted)
> passwd_se  2199  root  2w  REG  254,10  171802407  15  /var/log/cloud.log (deleted)
> passwd_se  2199  root  3w  REG  254,10  171802407  15  /var/log/cloud.log (deleted)
> python     2202  root  3w  REG  254,10  171802407  15  /var/log/cloud.log (deleted)
>
> 2199 ?  S  0:00 /bin/bash /opt/cloud/bin/passwd_server_ip X.X.X.2 dummy
> 2202 ?  S  7:13 python /opt/cloud/bin/passwd_server_ip.py X.X.X.2
> ===
>
> Thank you.
>
>
> On Mon, Nov 28, 2016 at 11:46 AM, Cloud List  wrote:
>
> > Dear all,
> >
> > After upgrading to ACS 4.8.1, the /var/log partition on one of our
> > virtual routers is always full and fills up again quite fast. This
> > leaves the VR unable to serve DHCP and password requests from VMs.
> >
> > root@r-4155-VM:/var/log# df -h
> > Filesystem                                              Size  Used Avail Use% Mounted on
> > rootfs                                                  461M  158M  280M  37% /
> > udev                                                     10M     0   10M   0% /dev
> > tmpfs                                                    25M  236K   25M   1% /run
> > /dev/disk/by-uuid/30c81d3d-ee9f-4a88-81c1-5f349b22ba1d  461M  158M  280M  37% /
> > tmpfs                                                   5.0M     0  5.0M   0% /run/lock
> > tmpfs                                                   157M     0  157M   0% /run/shm
> > /dev/vda1                                                73M   23M   47M  33% /boot
> > /dev/vda6                                                92M  5.6M   81M   7% /home
> > /dev/vda8                                               184M  6.2M  169M   4% /opt
> > /dev/vda11                                               92M  5.6M   81M   7% /tmp
> > /dev/vda7                                               751M  493M  219M  70% /usr
> > /dev/vda9                                               563M  282M  252M  53% /var
> > /dev/vda10                                              184M  176M     0 100% /var/log
> >
> > Even after rotating and clearing the logs, du reports the usage of
> > /var/log as only 4.7M, so I am not sure where the 176M is coming from.
> >
> > root@r-4155-VM:/var/log# du -h
> > 1.0K  ./samba
> > 3.8M  ./sysstat
> > 68K   ./apt
> > 7.0K  ./apache2
> > 3.0K  ./fsck
> > 317K  ./installer/cdebconf
> > 809K  ./installer
> > 1.0K  ./news
> > 12K   ./lost+found
> > 1.0K  ./ntpstats
> > 4.7M  .
> >
> > /dev/vda10  184M  175M  475K  100%  /var/log
> >
> > I need to clear the logs and do a "service dnsmasq restart" regularly
> > to get the VR functioning again, which is quite troublesome.
> >
> > Any advice is greatly appreciated.
> >
> > Looking forward to your reply, thank you.
> >
> > Cheers.
> >
>


Re: ACS 4.8.1: Virtual router - /var/log always full

2016-11-27 Thread Cloud List
Dear all,

Please ignore this, managed to find the reason - the passwd_server_ip
process is holding on to a log file which was deleted earlier to fix the
disk space issue. Restarted the process and the disk space is fine now.

===
passwd_se  2199  root  1w  REG  254,10  171802407  15  /var/log/cloud.log (deleted)
passwd_se  2199  root  2w  REG  254,10  171802407  15  /var/log/cloud.log (deleted)
passwd_se  2199  root  3w  REG  254,10  171802407  15  /var/log/cloud.log (deleted)
python     2202  root  3w  REG  254,10  171802407  15  /var/log/cloud.log (deleted)

2199 ?  S  0:00 /bin/bash /opt/cloud/bin/passwd_server_ip X.X.X.2 dummy
2202 ?  S  7:13 python /opt/cloud/bin/passwd_server_ip.py X.X.X.2
===
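
For reference, the space stays pinned because an unlinked file's blocks are
only freed when the last open file descriptor on it is closed. A minimal
sketch for finding and reclaiming such space - the PIDs and the fd number
below are taken from the lsof output above, so adjust them to your own
output:

===
# List open-but-deleted files; +L1 selects files with a link count below 1
lsof +L1 | grep '/var/log/cloud.log'

# Option 1: restart the holders (PIDs 2199 and 2202 above)
kill 2199 2202

# Option 2: truncate the unlinked file in place via /proc without killing
# the process; the "3" matches the "3w" fd column in the lsof output
: > /proc/2202/fd/3
===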

Thank you.


On Mon, Nov 28, 2016 at 11:46 AM, Cloud List  wrote:

> Dear all,
>
> After upgrading to ACS 4.8.1, the /var/log partition on one of our
> virtual routers is always full and fills up again quite fast. This
> leaves the VR unable to serve DHCP and password requests from VMs.
>
> root@r-4155-VM:/var/log# df -h
> Filesystem                                              Size  Used Avail Use% Mounted on
> rootfs                                                  461M  158M  280M  37% /
> udev                                                     10M     0   10M   0% /dev
> tmpfs                                                    25M  236K   25M   1% /run
> /dev/disk/by-uuid/30c81d3d-ee9f-4a88-81c1-5f349b22ba1d  461M  158M  280M  37% /
> tmpfs                                                   5.0M     0  5.0M   0% /run/lock
> tmpfs                                                   157M     0  157M   0% /run/shm
> /dev/vda1                                                73M   23M   47M  33% /boot
> /dev/vda6                                                92M  5.6M   81M   7% /home
> /dev/vda8                                               184M  6.2M  169M   4% /opt
> /dev/vda11                                               92M  5.6M   81M   7% /tmp
> /dev/vda7                                               751M  493M  219M  70% /usr
> /dev/vda9                                               563M  282M  252M  53% /var
> /dev/vda10                                              184M  176M     0 100% /var/log
>
> Even after rotating and clearing the logs, du reports the usage of
> /var/log as only 4.7M, so I am not sure where the 176M is coming from.
>
> root@r-4155-VM:/var/log# du -h
> 1.0K  ./samba
> 3.8M  ./sysstat
> 68K   ./apt
> 7.0K  ./apache2
> 3.0K  ./fsck
> 317K  ./installer/cdebconf
> 809K  ./installer
> 1.0K  ./news
> 12K   ./lost+found
> 1.0K  ./ntpstats
> 4.7M  .
>
> /dev/vda10  184M  175M  475K  100%  /var/log
>
> I need to clear the logs and do a "service dnsmasq restart" regularly
> to get the VR functioning again, which is quite troublesome.
>
> Any advice is greatly appreciated.
>
> Looking forward to your reply, thank you.
>
> Cheers.
>


ACS 4.8.1: Virtual router - /var/log always full

2016-11-27 Thread Cloud List
Dear all,

After upgrading to ACS 4.8.1, the /var/log partition on one of our virtual
routers is always full and fills up again quite fast. This leaves the VR
unable to serve DHCP and password requests from VMs.

root@r-4155-VM:/var/log# df -h
Filesystem                                              Size  Used Avail Use% Mounted on
rootfs                                                  461M  158M  280M  37% /
udev                                                     10M     0   10M   0% /dev
tmpfs                                                    25M  236K   25M   1% /run
/dev/disk/by-uuid/30c81d3d-ee9f-4a88-81c1-5f349b22ba1d  461M  158M  280M  37% /
tmpfs                                                   5.0M     0  5.0M   0% /run/lock
tmpfs                                                   157M     0  157M   0% /run/shm
/dev/vda1                                                73M   23M   47M  33% /boot
/dev/vda6                                                92M  5.6M   81M   7% /home
/dev/vda8                                               184M  6.2M  169M   4% /opt
/dev/vda11                                               92M  5.6M   81M   7% /tmp
/dev/vda7                                               751M  493M  219M  70% /usr
/dev/vda9                                               563M  282M  252M  53% /var
/dev/vda10                                              184M  176M     0 100% /var/log

Even after rotating and clearing the logs, du reports the usage of /var/log
as only 4.7M, so I am not sure where the 176M is coming from.

root@r-4155-VM:/var/log# du -h
1.0K  ./samba
3.8M  ./sysstat
68K   ./apt
7.0K  ./apache2
3.0K  ./fsck
317K  ./installer/cdebconf
809K  ./installer
1.0K  ./news
12K   ./lost+found
1.0K  ./ntpstats
4.7M  .

/dev/vda10  184M  175M  475K  100%  /var/log
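
A gap this large between df and du is the classic signature of files that
were deleted while a process still held them open (which is what the
follow-up in this thread found). A quick hedged check, assuming lsof is
available on the VR:

===
# Block-level vs file-level usage; a large difference usually means space
# is pinned by deleted-but-still-open files rather than visible ones
df -h /var/log
du -sh /var/log

# Show processes holding unlinked files under /var/log (+D scans recursively)
lsof +L1 +D /var/log
===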

I need to clear the logs and do a "service dnsmasq restart" regularly to get
the VR functioning again, which is quite troublesome.
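
As a workaround for the manual clear-and-restart cycle, rotating
/var/log/cloud.log with logrotate's copytruncate directive would sidestep the
problem found later in this thread: the file is truncated in place rather
than renamed, so a long-running writer never ends up holding an unlinked
inode. A hedged sketch (the stanza below is illustrative, not from a stock
VR template):

===
# /etc/logrotate.d/cloud (illustrative)
/var/log/cloud.log {
    daily
    rotate 5
    size 10M
    compress
    missingok
    notifempty
    copytruncate
}
===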

Any advice is greatly appreciated.

Looking forward to your reply, thank you.

Cheers.


Re: AW: 2node HA ACS Cluster with DRBD

2016-11-27 Thread Adrian Sender
I have set up and run CloudStack using KVM, DRBD and CLVM primary/primary - I
used to run the VMs on the storage nodes themselves (better IO). No need to
run iSCSI / NFS exports that slow everything down.

Since DRBD is currently limited to 2 nodes, I run multiple 2-node KVM
clusters with DRBD.

I noticed much better IO with CLVM compared to the GFS2 file system.

Everything worked great and I ran this configuration for years. Never had
corruption or data loss.
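
For anyone wanting to reproduce such a setup, a minimal DRBD dual-primary
resource looks roughly like the sketch below. Hostnames, disks and IPs are
placeholders, and with two primaries you want protocol C (synchronous
replication) plus proper fencing before trusting it with data:

===
# /etc/drbd.d/r0.res - illustrative dual-primary resource backing CLVM
resource r0 {
    protocol C;                   # synchronous replication
    net {
        allow-two-primaries yes;  # required for primary/primary
    }
    on node1 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.1:7788;
        meta-disk internal;
    }
    on node2 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.2:7788;
        meta-disk internal;
    }
}
===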

Regards,
Adrian Sender

-- Original Message ---
From: Dag Sonstebo 
To: "users@cloudstack.apache.org" ,
"jeroen.ke...@keerl-it.com" 
Sent: Fri, 25 Nov 2016 09:32:42 +
Subject: Re: AW: 2node HA ACS Cluster with DRBD

> Hi Jeroen,
> 
> Sorry I missed you yesterday – meant to catch up after the user
> group but had to run to catch my flight.
> 
> I think what you describe will work – however I have my doubts it
> will fail over gracefully. For pacemaker to fail over cleanly the
> failover has to be perfectly synched – i.e. all packets have to be
> written in both primary storage pools, traffic ideally quiesced –
> then pacemaker can move the NFS or iSCSI endpoint. If you are even a
> byte out you could end up with data corruption – and even if this
> does work I have my doubts the VMs would stay online afterwards.
> 
> Anyway – as the proverb goes, the proof is in the pudding – so I
> can only suggest you test this out. Very interested in the result
> though – so please let us know how you get on (if it works it
> would be a good talk for the next user group ☺ ).
> 
> Regards,
> Dag Sonstebo
> Cloud Architect
> ShapeBlue
> 
> From: Jeroen Keerl 
> Reply-To: "users@cloudstack.apache.org" , "jeroen.ke...@keerl-it.com" 
> Date: Wednesday, 23 November 2016 at 22:43
> To: "users@cloudstack.apache.org" 
> Subject: AW: 2node HA ACS Cluster with DRBD
> 
> Hi Dag, Erik,
> 
> thanks for your input so far.
> What I am aiming for is a "HyperConverged" infrastructure, if 
> possible with just two servers.
> 
> The reason why I didn't look into Ceph any further is that they
> explicitly state that you'll need 3 hosts. Apart from that, there
> seems to be quite a lot of resources needed to get Ceph up & running.
> 
> DRBD and GlusterFS look like they're not that heavy on load.
> GlusterFS has moved away from 2 hosts only as well, and it seems 
> less flexible when it comes to expansion, if I recall correctly.
> 
> Hence: DRBD, which runs in Master-Slave or Dual Master mode.
> Together with Pacemaker and NFS or iSCSI software, this could work,
> albeit - after thinking it all over - probably in master-slave mode,
> since the shared / clustered IP address can only be available on one
> of the two nodes.
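
A rough illustration of the master-slave arrangement described above, using
pcs on top of a DRBD resource r0 - resource names, paths and the IP are
placeholders, and the exact syntax varies between pacemaker/pcs versions:

===
# Illustrative only - adjust names, device and IP to your environment
pcs resource create r0_drbd ocf:linbit:drbd drbd_resource=r0
pcs resource master r0_master r0_drbd master-max=1 master-node-max=1 \
    clone-max=2 notify=true
pcs resource create nfs_fs ocf:heartbeat:Filesystem \
    device=/dev/drbd0 directory=/export/primary fstype=ext4
pcs resource create nfs_ip ocf:heartbeat:IPaddr2 ip=10.0.0.100 cidr_netmask=24
pcs resource group add nfs_group nfs_fs nfs_ip

# The NFS group may only run where DRBD is promoted, and only after promotion
pcs constraint colocation add nfs_group with master r0_master INFINITY
pcs constraint order promote r0_master then start nfs_group
===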
> 
> As written before: HA-Lizard does all this out of the box, including 
> HA - if needed, and fairly well too.
> 
> Since I'll be hopping over to visit the CS User Group tomorrow, I'll
> have no time to look into this any further until Tuesday.
> (Dag, will I have the chance to see you there as well?)
> 
> Cheers
> JK
> 
> dag.sonst...@shapeblue.com 
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
> @shapeblue
> 
> -----Original Message-----
> From: Dag Sonstebo [mailto:dag.sonst...@shapeblue.com]
> Sent: Wednesday, 23 November 2016 10:35
> To: users@cloudstack.apache.org
> Subject: Re: 2node HA ACS Cluster with DRBD
> 
> Hi Jeroen,
> 
> My twopence worth:
> 
> First of all – I guess your plan is to run two nodes – each with
> CloudStack management, MySQL (master-slave), KVM and storage?
> 
> This all depends on your use case. As a bit of an experiment or as
> a small scale lab I think this may work – but I would be very
> reluctant to rely on this for a production workload. I think you
> will potentially have stability and integrity issues at the storage
> level in a HA failover scenario; on top of this I don't think this
> will scale well. You may also end up with considerable storage
> overhead depending on the number of nodes + technology used. With
> two nodes you immediately have only 50% max space utilization.
> 
> Putting all of that aside I think it could work, I've played with
> similar ideas in the past (without actually spending enough time to
> get it working). I think you could get around the heartbeating /
> split brain situations relatively easily. The CloudStack and MySQL
> installs as well as KVM should work OK, but your challenge will be
> storage, which both has to work in the synchronized setup you want +
> very importantly fail over gracefully. I guess you would probably
> look at Ceph - if Wido or any of the other Ceph users read this they
> are probably better placed to advise.
> 
> Regards,
> Dag Sonstebo
> Cloud Architect
> ShapeBlue
> 
> From: Jeroen Keer

Re: Fault percentage value of CPU usage in Cloud Platform

2016-11-27 Thread Sergey Levitskiy
First of all, if you don’t have support for CCP you need to strongly consider 
migrating to ACP. This community doesn’t have access to the CCP code, so we 
have very limited ways to assist you.
In any case, looking at the log you provided it seems you have 2 clusters: one 
(cluster id=5) that crossed your allocation threshold for CPU capacity. The 
other cluster (id=1) simply has no host matching the host tag of your service 
offering (“WinL”), so that cluster can’t be allocated either.
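
Separately, the host-tag side can be confirmed straight from the database (a
hedged sketch; assumes direct MySQL access to the standard `cloud` schema):

===
# The offering's "WinL" tag must appear next to at least one enabled host
mysql -u cloud -p cloud -e "
    SELECT h.id, h.name, ht.tag
    FROM host h LEFT JOIN host_tags ht ON ht.host_id = h.id
    WHERE h.removed IS NULL AND h.type = 'Routing';"
===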
Can you execute this SQL query and post the results back:

SELECT 
    `c`.`uuid` AS `cluster_uuid`,
    `c`.`id` AS `cluster_id`,
    `c`.`name` AS `cluster_name`,
    `d`.`uuid` AS `zone_uuid`,
    `d`.`name` AS `zone_name`,
    IF((`capacity`.`capacity_type` = 1), 'cpu', 'ram') AS `type`,
    ((SUM(`capacity`.`used_capacity`) + SUM(`capacity`.`reserved_capacity`)) + 0) AS `used_capacity`,
    SUM((`capacity`.`total_capacity` * `overcommit`.`value`)) AS `total_capacity`,
    FORMAT((((SUM(`capacity`.`used_capacity`) + SUM(`capacity`.`reserved_capacity`)) + 0)
            / SUM((`capacity`.`total_capacity` * `overcommit`.`value`))), 2) AS `utilization`,
    (CASE (SELECT COUNT(0)
           FROM `cluster_details` `details`
           WHERE ((`details`.`cluster_id` = `capacity`.`cluster_id`)
               AND (`details`.`name` = 'cluster.cpu.allocated.capacity.disablethreshold')))
        WHEN 1 THEN
            (CASE
                WHEN ISNULL((SELECT `details`.`value`
                             FROM `cluster_details` `details`
                             WHERE ((`details`.`cluster_id` = `capacity`.`cluster_id`)
                                 AND (`details`.`name` = 'cluster.cpu.allocated.capacity.disablethreshold'))))
                THEN (SELECT `config`.`value`
                      FROM `configuration` `config`
                      WHERE (`config`.`name` = 'cluster.cpu.allocated.capacity.disablethreshold'))
                ELSE (SELECT `details`.`value`
                      FROM `cluster_details` `details`
                      WHERE ((`details`.`cluster_id` = `capacity`.`cluster_id`)
                          AND (`details`.`name` = 'cluster.cpu.allocated.capacity.disablethreshold')))
            END)
        ELSE (SELECT `config`.`value`
              FROM `configuration` `config`
              WHERE (`config`.`name` = 'cluster.cpu.allocated.capacity.disablethreshold'))
    END) AS `threshold`,
    (CASE (SELECT COUNT(0)
           FROM `cluster_details` `details`
           WHERE ((`details`.`cluster_id` = `capacity`.`cluster_id`)
               AND (`details`.`name` = 'cpuOvercommitRatio')))
        WHEN 1 THEN
            (CASE
                WHEN ISNULL((SELECT `details`.`value`
                             FROM `cluster_details` `details`
                             WHERE ((`details`.`cluster_id` = `capacity`.`cluster_id`)
                                 AND (`details`.`name` = 'cpuOvercommitRatio'))))
                THEN (SELECT `config`.`value`
                      FROM `configuration` `config`
                      WHERE (`config`.`name` = 'cpuOvercommitRatio'))
                ELSE (SELECT `details`.`value`
                      FROM `cluster_details` `details`
                      WHERE ((`details`.`cluster_id` = `capacity`.`cluster_id`)
                          AND (`details`.`name` = 'cpuOvercommitRatio')))
            END)
        ELSE (SELECT `config`.`value`
              FROM `configuration` `config`
              WHERE (`config`.`name` = 'cpuOvercommitRatio'))
    END) AS `overprovisioning`
FROM
    `op_host_capacity` `capacity`
    JOIN `cluster_details` `overcommit` ON ((`overcommit`.`cluster_id` = `capacity`.`cluster_id`))
    JOIN `cluster` `c` ON ((`c`.`id` =