from:"Andrei Mikhailovsky"

Re: Upgrade from 4.19.0.1 to 4.19.1.1 fails due to DB schema upgrade errors

2024-08-15 Thread Andrei Mikhailovsky

Joao, I spoke too soon. With a bit of help from Claude, I got the following SQL
to work:

DROP PROCEDURE IF EXISTS `cloud_usage`.`IDEMPOTENT_ADD_COLUMN`;

DELIMITER //

CREATE PROCEDURE `cloud_usage`.`IDEMPOTENT_ADD_COLUMN` (
IN in_table_name VARCHAR(200),
IN in_column_name VARCHAR(200),
IN in_column_definition VARCHAR(1000)
)
BEGIN
DECLARE CONTINUE HANDLER FOR 1060 BEGIN END;

SET @ddl = CONCAT('ALTER TABLE ', in_table_name);
SET @ddl = CONCAT(@ddl, ' ', 'ADD COLUMN');
SET @ddl = CONCAT(@ddl, ' ', in_column_name);
SET @ddl = CONCAT(@ddl, ' ', in_column_definition);

PREPARE stmt FROM @ddl;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END //

DELIMITER ;
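For anyone wondering what the procedure does: it glues its three arguments into an `ALTER TABLE ... ADD COLUMN ...` statement and runs it, while the `CONTINUE HANDLER FOR 1060` turns a "duplicate column name" error into a no-op, so the upgrade script can call it repeatedly. The Python sketch below mimics that behaviour against an in-memory SQLite database as a stand-in for MariaDB. `build_ddl` and `idempotent_add_column` are illustrative names, not CloudStack code, and SQLite's duplicate-column error text is used in place of MariaDB error 1060.

```python
import sqlite3


def build_ddl(table: str, column: str, definition: str) -> str:
    # Mirrors the procedure's CONCAT calls: the parts joined by single spaces.
    return " ".join(["ALTER TABLE", table, "ADD COLUMN", column, definition])


def idempotent_add_column(conn: sqlite3.Connection, table: str,
                          column: str, definition: str) -> bool:
    """Return True if the column was added, False if it already existed."""
    try:
        conn.execute(build_ddl(table, column, definition))
        return True
    except sqlite3.OperationalError as exc:
        # SQLite's counterpart of MariaDB error 1060 (duplicate column name),
        # which the procedure's CONTINUE HANDLER silently swallows.
        if "duplicate column name" in str(exc):
            return False
        raise


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cloud_usage (id INTEGER PRIMARY KEY)")
print(build_ddl("cloud_usage.cloud_usage", "state", "VARCHAR(100) DEFAULT NULL"))
print(idempotent_add_column(conn, "cloud_usage", "state", "VARCHAR(100) DEFAULT NULL"))  # True
print(idempotent_add_column(conn, "cloud_usage", "state", "VARCHAR(100) DEFAULT NULL"))  # False
```

The second call returning False rather than raising is the property that lets the same upgrade script be re-run safely.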

After restarting the management server, it looks like the problem is solved.
Thanks for your help.

Cheers

Andrei


- Original Message -
> From: "Andrei Mikhailovsky" 
> To: "users" 
> Sent: Thursday, 15 August, 2024 22:27:58
> Subject: Re: Upgrade from 4.19.0.1 to 4.19.1.1 fails due to DB schema upgrade 
> errors

> Hi Joao,
> 
> Do you have a link to the unformatted SQL for creating the procedure? When I
> copy and paste it, I get a bunch of SQL errors.
> 
> Cheers
> 
> - Original Message -
>> From: "João Jandre Paraquetti" 
>> To: "users" 
>> Sent: Thursday, 15 August, 2024 22:13:53
>> Subject: Re: Upgrade from 4.19.0.1 to 4.19.1.1 fails due to DB schema upgrade
>> errors
> 
>> Hello, Andrei
>> 
>> This is happening because the procedure IDEMPOTENT_ADD_COLUMN does not
>> exist in your cloud_usage DB. You can create it manually with the
>> following queries:
>> 
>> DROP PROCEDURE IF EXISTS `cloud_usage`.`IDEMPOTENT_ADD_COLUMN`;
>> CREATE PROCEDURE `cloud_usage`.`IDEMPOTENT_ADD_COLUMN` (
>>     IN in_table_name VARCHAR(200)
>>   , IN in_column_name VARCHAR(200)
>>   , IN in_column_definition VARCHAR(1000)
>> )
>> BEGIN
>>     DECLARE CONTINUE HANDLER FOR 1060 BEGIN END;
>>     SET @ddl = CONCAT('ALTER TABLE ', in_table_name);
>>     SET @ddl = CONCAT(@ddl, ' ', 'ADD COLUMN');
>>     SET @ddl = CONCAT(@ddl, ' ', in_column_name);
>>     SET @ddl = CONCAT(@ddl, ' ', in_column_definition);
>>     PREPARE stmt FROM @ddl;
>>     EXECUTE stmt;
>>     DEALLOCATE PREPARE stmt;
>> END;
>> 
>> After defining the procedure, you will have to restart your
>> cloudstack-management service.
>> 
>> This type of issue should not happen in the next releases, as
>> https://github.com/apache/cloudstack/pull/9385 reworked how procedures
>> are defined for ACS DBs.
>> 
>> Best regards,
>> 
>> João Jandre
>> 

Re: Upgrade from 4.19.0.1 to 4.19.1.1 fails due to DB schema upgrade errors

2024-08-15 Thread Andrei Mikhailovsky

Hi Joao,

Do you have a link to the unformatted SQL for creating the procedure? When I
copy and paste it, I get a bunch of SQL errors.

Cheers
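The copy/paste errors are most likely the `;` problem: the mysql/mariadb command-line client ends a statement at the first `;`, so a procedure body containing `DECLARE ...;` is cut into fragments unless the delimiter is changed first, which is what the `DELIMITER //` ... `DELIMITER ;` wrapper in the working version does. A simplified Python model of that splitting illustrates the difference. It ignores quotes and comments and is not the client's real parser; `split_statements` is a hypothetical name.

```python
def split_statements(script: str) -> list[str]:
    """Split a script roughly the way the mysql/mariadb client does.

    Simplified model: honours DELIMITER lines but ignores quoting and
    comments, so a delimiter inside a string literal would still split.
    """
    delimiter = ";"
    statements: list[str] = []
    pending = ""
    for line in script.splitlines():
        if line.strip().upper().startswith("DELIMITER "):
            delimiter = line.strip().split(None, 1)[1]  # switch terminator
            continue
        pending += line + "\n"
        # Emit every complete statement seen so far.
        while delimiter in pending:
            stmt, pending = pending.split(delimiter, 1)
            if stmt.strip():
                statements.append(" ".join(stmt.split()))  # collapse whitespace
    if pending.strip():
        statements.append(" ".join(pending.split()))
    return statements


body = "CREATE PROCEDURE p() BEGIN DECLARE x INT; SET x = 1; END"
print(split_statements(body + ";"))  # three broken fragments
print(split_statements("DELIMITER //\n" + body + " //\nDELIMITER ;"))  # one statement
```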


Re: Upgrade from 4.19.0.1 to 4.19.1.1 fails due to DB schema upgrade errors

2024-08-15 Thread Andrei Mikhailovsky

Thanks Joao,

I will try it and report back.

Cheers

Re: Upgrade from 4.19.0.1 to 4.19.1.1 fails due to DB schema upgrade errors

2024-08-15 Thread Andrei Mikhailovsky

Further to my previous message, I've done some more digging on the DB side:

SHOW PROCEDURE STATUS WHERE Db = 'cloud_usage';
Empty set (0.001 sec)


Doesn't look like there are any procedures in the cloud_usage database. 

MariaDB [cloud_usage]> show tables;
+----------------------------+
| Tables_in_cloud_usage      |
+----------------------------+
| account                    |
| bucket_statistics          |
| cloud_usage                |
| quota_account              |
| quota_balance              |
| quota_credits              |
| quota_email_templates      |
| quota_tariff               |
| quota_usage                |
| usage_backup               |
| usage_event                |
| usage_event_details        |
| usage_ip_address           |
| usage_job                  |
| usage_load_balancer_policy |
| usage_network              |
| usage_network_offering     |
| usage_networks             |
| usage_port_forwarding      |
| usage_security_group       |
| usage_snapshot_on_primary  |
| usage_storage              |
| usage_vm_disk              |
| usage_vm_instance          |
| usage_vmsnapshot           |
| usage_volume               |
| usage_vpc                  |
| usage_vpn_user             |
| user_statistics            |
| vm_disk_statistics         |
+----------------------------+
30 rows in set (0.000 sec)


There are a lot of tables, and the backup DB size is over 40 MB.



In contrast, the 'cloud' database does contain a bunch of procedures, including
IDEMPOTENT_ADD_COLUMN, which is missing from the cloud_usage DB and seems to be
the cause of the DB schema upgrade failure:


SHOW PROCEDURE STATUS WHERE Db = 'cloud';
+-------+-------------------------------------+-----------+-----------------+---------------------+---------------------+---------------+---------+----------------------+----------------------+--------------------+
| Db    | Name                                | Type      | Definer         | Modified            | Created             | Security_type | Comment | character_set_client | collation_connection | Database Collation |
+-------+-------------------------------------+-----------+-----------------+---------------------+---------------------+---------------+---------+----------------------+----------------------+--------------------+
| cloud | ADD_GUEST_OS_AND_HYPERVISOR_MAPPING | PROCEDURE | cloud@localhost | 2023-04-17 19:12:20 | 2023-04-17 19:12:20 | DEFINER       |         | utf8mb4              | utf8mb4_general_ci   | utf8mb4_general_ci |
| cloud | IDEMPOTENT_ADD_COLUMN               | PROCEDURE | cloud@localhost | 2023-04-17 19:21:55 | 2023-04-17 19:21:55 | DEFINER       |         | utf8mb4              | utf8mb4_general_ci   | utf8mb4_general_ci |
| cloud | IDEMPOTENT_ADD_FOREIGN_KEY          | PROCEDURE | cloud@localhost | 2023-04-17 19:21:55 | 2023-04-17 19:21:55 | DEFINER       |         | utf8mb4              | utf8mb4_general_ci   | utf8mb4_general_ci |
| cloud | IDEMPOTENT_ADD_KEY                  | PROCEDURE | cloud@localhost | 2023-04-17 19:21:55 | 2023-04-17 19:21:55 | DEFINER       |         | utf8mb4              | utf8mb4_general_ci   | utf8mb4_general_ci |
| cloud | IDEMPOTENT_ADD_UNIQUE_KEY           | PROCEDURE | cloud@localhost | 2023-04-17 19:21:55 | 2023-04-17 19:21:55 | DEFINER       |         | utf8mb4              | utf8mb4_general_ci   | utf8mb4_general_ci |
| cloud | IDEMPOTENT_CHANGE_COLUMN            | PROCEDURE | cloud@localhost | 2023-04-17 19:21:55 | 2023-04-17 19:21:55 | DEFINER       |         | utf8mb4              | utf8mb4_general_ci   | utf8mb4_general_ci |
| cloud | IDEMPOTENT_DROP_FOREIGN_KEY         | PROCEDURE | cloud@localhost | 2023-04-17 19:21:55 | 2023-04-17 19:21:55 | DEFINER       |         | utf8mb4              | utf8mb4_general_ci   | utf8mb4_general_ci |
+-------+-------------------------------------+-----------+-----------------+---------------------+---------------------+---------------+---------+----------------------+----------------------+--------------------+
7 rows in set (0.001 sec)
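The same comparison can be scripted. A minimal sketch, assuming a DB-API cursor from a MariaDB/MySQL driver that uses the `%s` parameter style (for example PyMySQL or mysqlclient); it reads `information_schema.ROUTINES` instead of parsing `SHOW PROCEDURE STATUS`, and the function names are hypothetical:

```python
# Assumed to work on MariaDB/MySQL; %s is the placeholder style used by
# drivers such as PyMySQL and mysqlclient.
PROCEDURES_SQL = (
    "SELECT ROUTINE_NAME FROM information_schema.ROUTINES "
    "WHERE ROUTINE_SCHEMA = %s AND ROUTINE_TYPE = 'PROCEDURE'"
)


def procedure_names(cursor, schema: str) -> set[str]:
    """Names of the stored procedures defined in `schema`."""
    cursor.execute(PROCEDURES_SQL, (schema,))
    return {row[0] for row in cursor.fetchall()}


def missing_procedures(cursor, reference: str = "cloud",
                       target: str = "cloud_usage") -> list[str]:
    """Procedures defined in `reference` but absent from `target`, sorted."""
    return sorted(procedure_names(cursor, reference) - procedure_names(cursor, target))
```

With the output shown above, `missing_procedures(cursor)` would be expected to list all seven procedures from `cloud`, since `cloud_usage` has none.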


Any advice on what I am missing and how to get it fixed?

Cheers

Andrei



Upgrade from 4.19.0.1 to 4.19.1.1 fails due to DB schema upgrade errors

2024-08-15 Thread Andrei Mikhailovsky

Hello all, 

I've tried to upgrade my CloudStack from 4.19.0.1 to the latest 4.19.1.1. I am
running Ubuntu Server 20.04 with the latest updates, using Galera +
mariadb-server version 10.3.39-0ubuntu0.20.04.2. The CloudStack installation is
pretty old (over 10 years) and I have been upgrading with every major release
and most of the minor releases too.

After the cloudstack-* packages were upgraded to the latest version and the
cloudstack-management server restarted, I got the following in the
management-server.log file:

2024-08-15 19:58:06,899 INFO [c.c.u.DatabaseUpgradeChecker] (main:null) (logid:) DB version = 4.19.0.1 Code Version = 4.19.1.1
2024-08-15 19:58:06,899 INFO [c.c.u.DatabaseUpgradeChecker] (main:null) (logid:) Database upgrade must be performed from 4.19.0.1 to 4.19.1.1
2024-08-15 19:58:06,969 DEBUG [c.c.u.DatabaseUpgradeChecker] (main:null) (logid:) Running upgrade Upgrade41900to41910 to upgrade from 4.19.0.0-4.19.1.0 to 4.19.1.0
2024-08-15 19:58:06,971 DEBUG [c.c.u.d.ScriptRunner] (main:null) (logid:) -- Schema upgrade from 4.19.0.0 to 4.19.1.0
2024-08-15 19:58:06,972 DEBUG [c.c.u.d.ScriptRunner] (main:null) (logid:) -- Updates the populated Quota tariff's types VM_DISK_BYTES_READ, VM_DISK_BYTES_WRITE, VM_DISK_IO_READ and VM_DISK_IO_WRITE to the correct unit.
2024-08-15 19:58:06,972 DEBUG [c.c.u.d.ScriptRunner] (main:null) (logid:) UPDATE cloud_usage.quota_tariff SET usage_unit = 'Bytes', updated_on = NOW() WHERE effective_on = '2010-05-04 00:00:00' AND name IN ('VM_DISK_BYTES_READ', 'VM_DISK_BYTES_WRITE')
2024-08-15 19:58:06,972 DEBUG [c.c.u.d.ScriptRunner] (main:null) (logid:) UPDATE cloud_usage.quota_tariff SET usage_unit = 'IOPS', updated_on = NOW() WHERE effective_on = '2010-05-04 00:00:00' AND name IN ('VM_DISK_IO_READ', 'VM_DISK_IO_WRITE')
2024-08-15 19:58:06,973 DEBUG [c.c.u.d.ScriptRunner] (main:null) (logid:) -- PR #7236 - [Usage] Create network billing
2024-08-15 19:58:06,973 DEBUG [c.c.u.d.ScriptRunner] (main:null) (logid:) CREATE TABLE IF NOT EXISTS `cloud_usage`.`usage_networks` ( `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT, `network_offering_id` bigint(20) unsigned NOT NULL, `zone_id` bigint(20) unsigned NOT NULL, `network_id` bigint(20) unsigned NOT NULL, `account_id` bigint(20) unsigned NOT NULL, `domain_id` bigint(20) unsigned NOT NULL, `state` varchar(100) DEFAULT NULL, `removed` datetime DEFAULT NULL, `created` datetime NOT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB CHARSET=utf8
2024-08-15 19:58:06,990 DEBUG [c.c.u.d.ScriptRunner] (main:null) (logid:) -- allow for bigger urls
2024-08-15 19:58:06,990 DEBUG [c.c.u.d.ScriptRunner] (main:null) (logid:) ALTER TABLE `cloud`.`vm_template` MODIFY COLUMN `url` VARCHAR(1024) DEFAULT NULL COMMENT 'the url where the template exists externally'
2024-08-15 19:58:06,996 DEBUG [c.c.u.d.ScriptRunner] (main:null) (logid:) -- PR #7235 - [Usage] Create VPC billing
2024-08-15 19:58:06,997 DEBUG [c.c.u.d.ScriptRunner] (main:null) (logid:) CREATE TABLE IF NOT EXISTS `cloud_usage`.`usage_vpc` ( `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT, `vpc_id` bigint(20) unsigned NOT NULL, `zone_id` bigint(20) unsigned NOT NULL, `account_id` bigint(20) unsigned NOT NULL, `domain_id` bigint(20) unsigned NOT NULL, `state` varchar(100) DEFAULT NULL, `created` datetime NOT NULL, `removed` datetime DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB CHARSET=utf8
2024-08-15 19:58:07,004 DEBUG [c.c.u.d.ScriptRunner] (main:null) (logid:) CALL `cloud_usage`.`IDEMPOTENT_ADD_COLUMN`('cloud_usage.cloud_usage', 'state', 'VARCHAR(100) DEFAULT NULL')

2024-08-15 19:58:07,014 ERROR [c.c.u.d.ScriptRunner] (main:null) (logid:) Error executing: CALL `cloud_usage`.`IDEMPOTENT_ADD_COLUMN`('cloud_usage.cloud_usage', 'state', 'VARCHAR(100) DEFAULT NULL')
2024-08-15 19:58:07,015 ERROR [c.c.u.d.ScriptRunner] (main:null) (logid:) java.sql.SQLSyntaxErrorException: PROCEDURE cloud_usage.IDEMPOTENT_ADD_COLUMN does not exist
2024-08-15 19:58:07,015 ERROR [c.c.u.DatabaseUpgradeChecker] (main:null) (logid:) Unable to execute upgrade script

2024-08-15 19:58:07,015 ERROR [c.c.u.DatabaseUpgradeChecker] (main:null) (logid:) Unable to execute upgrade script
java.sql.SQLSyntaxErrorException: PROCEDURE cloud_usage.IDEMPOTENT_ADD_COLUMN does not exist
at com.cloud.utils.db.ScriptRunner.runScript(ScriptRunner.java:185)
at com.cloud.utils.db.ScriptRunner.runScript(ScriptRunner.java:87)
at com.cloud.upgrade.DatabaseUpgradeChecker.runScript(DatabaseUpgradeChecker.java:236)
at com.cloud.upgrade.DatabaseUpgradeChecker.upgrade(DatabaseUpgradeChecker.java:320)
at com.cloud.upgrade.DatabaseUpgradeChecker.check(DatabaseUpgradeChecker.java:435)
at org.apache.cloudstack.spring.lifecycle.CloudStackExtendedLifeCycle.checkIntegrity(CloudStackExtendedLifeCycle.java:64)
at org.apache.cloudstack.spring.lifecycle.CloudStackExtendedLifeCycle.start(CloudStackExtendedLifeCycle.java:54)
at org.springframe

Re: change Temp / tmp folder path

2024-06-24 Thread Andrei Mikhailovsky

Thanks

- Original Message -
> From: "Nux" 
> To: "users" 
> Cc: "Andrei Mikhailovsky" 
> Sent: Friday, 21 June, 2024 23:45:52
> Subject: Re: change Temp / tmp folder path

> If you are sure it's Cloudstack's fault, then you can try to adjust
> JAVA_OPTS in /etc/default/cloudstack-management and define a tmp dir
> like this (untested..):
> -Djava.io.tmpdir=/var/tmp
> 
> hth
> 

change Temp / tmp folder path

2024-06-21 Thread Andrei Mikhailovsky

Hi, 

I am noticing that the /tmp folder on my host/management servers is filling up
on a regular basis. I suspect that some CloudStack management or agent
processes might be running scripts and saving data in /tmp. I would like to
change the location of the temporary folder to /var/tmp instead of /tmp. I've
looked in the /etc/cloudstack folder but was not able to locate the setting,
and I couldn't find anything useful in the CloudStack documentation either.
Could someone suggest how to change the temp folder for the CloudStack
management, usage and agent services?

P.S. I've already changed the default /tmp location of the SQL server, but it
didn't help.
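To test the suspicion before moving anything, one option is to total the space used under /tmp per owning user; a large share owned by the account the CloudStack services run as would point at them. A minimal Python sketch (Unix only, it uses the standard `pwd` module; the function name is hypothetical, and it can only count what the running user is allowed to read):

```python
import os
import pwd
from collections import defaultdict


def usage_by_owner(root: str = "/tmp") -> dict[str, int]:
    """Total size in bytes of the non-directory entries under `root`, per owner."""
    totals: dict[str, int] = defaultdict(int)
    # onerror skips directories we are not allowed to read.
    for dirpath, _dirs, files in os.walk(root, onerror=lambda err: None):
        for name in files:
            try:
                st = os.lstat(os.path.join(dirpath, name))  # don't follow symlinks
            except OSError:
                continue  # entry vanished or is unreadable
            try:
                owner = pwd.getpwuid(st.st_uid).pw_name
            except KeyError:
                owner = str(st.st_uid)  # uid without a passwd entry
            totals[owner] += st.st_size
    return dict(totals)


for owner, size in sorted(usage_by_owner().items(), key=lambda kv: -kv[1]):
    print(f"{owner:<16} {size / 2**20:10.1f} MiB")
```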


Thanks 

Andrei

Re: UI Slowness while populating Instance

2024-06-20 Thread Andrei Mikhailovsky

Nixon, 

Also, I would suggest keeping an eye on the number of rows in the vm_stats
table. In my case, ACS hasn't been removing them properly, and as a result I've
accumulated over 25M rows in that table, which caused the slow response in the
ACS GUI.

Something like this:

SELECT table_name, table_rows FROM information_schema.tables WHERE table_schema 
= 'cloud';

Check the vm_stats row count. If you need to remove some old data, you could do
something like this:

DELETE FROM vm_stats WHERE timestamp < '2024-05-20 00:00:01';

That will remove all rows with a timestamp older than 20 May 2024. If you have
a lot of data, you might need to remove it in smaller increments. I found that
my /tmp folder didn't have enough space, so I had to play around with the dates
to remove all the old data.
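The "smaller increments" approach can be generated instead of typed by hand. A sketch, assuming the `vm_stats` table and `timestamp` column from the query above, and a start date close to the oldest row (for example from `SELECT MIN(timestamp) FROM vm_stats`), because the first statement removes everything older than start plus one step. `stepped_deletes` is a hypothetical helper that only prints SQL; run the statements one at a time.

```python
from datetime import date, timedelta


def stepped_deletes(start: date, final: date, step_days: int = 1) -> list[str]:
    """DELETE statements whose cutoff moves forward `step_days` at a time.

    Once the earlier statements have run, each one only has to remove the
    rows between the previous cutoff and the new one.
    """
    if step_days < 1:
        raise ValueError("step_days must be at least 1")
    statements = []
    cutoff = start
    while cutoff < final:
        cutoff = min(cutoff + timedelta(days=step_days), final)
        statements.append(
            f"DELETE FROM vm_stats WHERE timestamp < '{cutoff.isoformat()} 00:00:00';"
        )
    return statements


for stmt in stepped_deletes(date(2024, 5, 17), date(2024, 5, 20)):
    print(stmt)
```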

Now the ACS UI is pretty fast and usable again.

Hope that helps.

Andriej

- Original Message -
> From: "Nixon Varghese K S" 
> To: "users" 
> Sent: Thursday, 20 June, 2024 12:14:16
> Subject: Re: UI Slowness while populating Instance

> Hi,
> 
> Thank you for the suggestions..
> 
> @Andrei I had done the same settings on global configuration and now UI
> seems to be pretty fast.. Thank you so much for the help...
> 
> @Joao Thanks for the information that 4.19.1 will have much improved
> functions...
> 
> Thank you guys...
> 
> With Regards
> Nixon Varghese
> 
> On Tue, Jun 18, 2024 at 11:47 PM João Jandre Paraquetti <
> j...@scclouds.com.br> wrote:
> 
>> Hi, Nixon
>>
>> What you are experiencing is most likely the same as Andrei (see
>> https://lists.apache.org/thread/ltsw9tkkxv6pl2tr9r4q5m34xwlxxbqg), by
>> default, the API used by the UI to list the VMs also lists the VM's
>> metrics; since you have 100+ VMs, it is understandable that it would
>> take some time to list all of those metrics. This behavior has been
>> discussed and changed with PR
>> https://github.com/apache/cloudstack/pull/8782. On the next minor
>> release (4.19.1) there will be a configuration to let you change the
>> behavior of the `listVirtualMachines` API so that it does not return the
>> metrics by default.
>>
>> Also, if you have too many metrics collected, you might run into the
>> issue that is described here
>> https://github.com/apache/cloudstack/pull/8740, where due to the amount
>> of metrics that ACS tries to delete in a single query, the query always
>> times out, snowballing into a huge amount of metrics on your DB, slowing
>> you down even more. The linked PR solves this adding another
>> configuration to limit the amount of metrics deleted per query,
>> hopefully it will be in by 4.19.1.0. Until then, if you notice that the
>> metrics are not being deleted, you might have to manually delete the old
>> ones on the DB.
>>
>> Best regards,
>>
>> João Jandre
>>
> 
> 
> --
> With Regards,
> Nixon Varghese

Re: Disabling VM metrics

2024-06-18 Thread Andrei Mikhailovsky

Thank you, Joao,

This feature would be a great help for me, for sure. Can't wait for 4.19.1 to
be released!

In the meantime, I will check out the cloud-usage database and purge the old
records. Perhaps that would help too.
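Purging old rows in bounded batches, rather than with one huge DELETE, is also the idea behind the per-query limit Joao mentions below. A self-contained sketch using Python's `sqlite3` as a stand-in for MariaDB, with a simplified two-column `vm_stats` table (an assumption, not the real schema); standard SQLite builds lack `DELETE ... LIMIT`, hence the subquery, whereas on MariaDB the single-table `DELETE ... WHERE ... LIMIT n` form can be used directly. `purge_in_batches` is a hypothetical name.

```python
import sqlite3


def purge_in_batches(conn: sqlite3.Connection, cutoff: str,
                     batch_size: int = 1000) -> int:
    """Delete rows older than `cutoff`, at most `batch_size` per statement."""
    deleted = 0
    while True:
        cur = conn.execute(
            "DELETE FROM vm_stats WHERE id IN "
            "(SELECT id FROM vm_stats WHERE timestamp < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # commit each batch so no single transaction grows large
        if cur.rowcount == 0:
            return deleted
        deleted += cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vm_stats (id INTEGER PRIMARY KEY, timestamp TEXT)")
conn.executemany(
    "INSERT INTO vm_stats (timestamp) VALUES (?)",
    [("2024-05-01 00:00:00",)] * 2500 + [("2024-06-01 00:00:00",)] * 10,
)
print(purge_in_batches(conn, "2024-05-20 00:00:01", batch_size=1000))  # 2500
```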

Cheers


- Original Message -
> From: "João Jandre Paraquetti" 
> To: "users" 
> Sent: Tuesday, 18 June, 2024 19:09:30
> Subject: Re: Disabling VM metrics

> Hello, Andrei
> 
> Currently you cannot disable listing metrics when using the UI. This
> behavior has been discussed and changed with PR
> https://github.com/apache/cloudstack/pull/8782. On the next minor
> release (4.19.1) there will be a configuration to let you change the
> behavior of the `listVirtualMachines` API so that it does not return the
> metrics by default.
> 
> Furthermore, if you do not care about the metrics, you could change the
> `vm.stats.max.retention.time` value to 1, so that the VM's metrics are
> only kept for one minute, thus minimizing the amount of metrics being
> listed. Beware that setting this value to 0 or lower will disable the
> metrics cleanup, so 1 is the lowest value you can use.
> 
> Also, if you have too many metrics collected, you might run into the
> issue described in
> https://github.com/apache/cloudstack/pull/8740: because of the number
> of metrics that ACS tries to delete in a single query, the query always
> times out and this snowballs into a huge amount of metrics on your DB.
> The linked PR solves this by adding another configuration to limit the
> number of metrics deleted per query; hopefully it will be in by
> 4.19.1.0. Until then, if you notice that the metrics are not being
> deleted, you might have to manually delete the old ones on the DB.
> 
> Best regards,
> 
> João Jandre
> 
> On 6/18/24 03:40, Andrei Mikhailovsky wrote:
>> Hello everyone,
>>
>> Could someone recommend a way to disable the collection of VM metrics? It
>> currently takes a long time (10-20 seconds) for the VM information to show
>> up when clicking on a VM. Also, when trying to attach volumes to a VM, the
>> full list of VMs does not populate (my guess is due to a timeout when
>> retrieving VM metrics). I've tried reducing the length of collected data in
>> the CloudStack settings, as suggested by someone on this list, but this did
>> not solve the problem.
>>
>> Currently, the VM metrics data are not being used, and their collection is
>> causing too much inconvenience and at times makes the GUI unusable. Could
>> someone suggest how the collection of VM metrics data could be switched
>> off?
>>
>> Many thanks
>>
>> Andrei

Re: UI Slowness while populating Instance

2024-06-18 Thread Andrei Mikhailovsky

One thing to note and check is the following Global setting:
vm.stats.user.vm.only

If you set this to true, system VM stats will not be collected, so when
clicking on system VMs and virtual routers, the information should be
populated pretty much instantly. This is true in my case. However, I still
have the issues with the regular virtual machines.


My Global settings are:

vm.stats.interval - 60
vm.stats.max.retention.time - 60

With the above settings, the data from VMs should be collected every 10 minutes
and kept for one hour, so 6 readings per VM. I have under 100 running VMs, and
yet the UI performance is pretty appalling.
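The estimate above is simply retention divided by collection interval. A small sketch of that arithmetic, assuming (check the descriptions in your own Global Settings) that `vm.stats.interval` is given in milliseconds and `vm.stats.max.retention.time` in minutes; under that assumption a 10-minute interval would be entered as 600000. `readings_per_vm` is a hypothetical helper.

```python
MS_PER_MINUTE = 60_000


def readings_per_vm(interval_ms: int, retention_min: int) -> int:
    """Approximate stats rows kept per VM once cleanup reaches steady state."""
    if interval_ms <= 0:
        raise ValueError("interval must be positive")
    if retention_min <= 0:
        # Per the thread, retention <= 0 disables cleanup, so rows keep growing.
        raise ValueError("retention <= 0 disables cleanup; no steady state")
    return retention_min * MS_PER_MINUTE // interval_ms


# 10-minute interval and 60-minute retention, as described above.
per_vm = readings_per_vm(10 * MS_PER_MINUTE, 60)
print(per_vm, per_vm * 100)  # prints: 6 600 (per VM, and for 100 VMs)
```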

Andriej  

- Original Message -
> From: "Nixon Varghese K S" 
> To: "users" 
> Sent: Tuesday, 18 June, 2024 10:31:03
> Subject: UI Slowness while populating Instance

> Hello,
> 
> I am using ACS 4.18.0.0v in my production environment, and more than 100
> instances, including Kubernetes instances, are running on my setup. The
> user interface appears to be stuck in the loading phase when you click on
> the instance page; you will need to wait five to ten minutes for the
> instance list to appear. It is not just the instance page but every place
> where instances are listed, for example when adding port forwarding or
> listing the instances running behind a VR; everywhere it is the same. Is anyone
> else facing the same issue?
> I checked the management log and saw that there was no error message and
> that the management server's resource utilization was normal. If someone
> could provide some troubleshooting steps to identify the issue, that would
> be very helpful.
> 
> --
> With Regards,
> Nixon Varghese

Re: UI Slowness while populating Instance

2024-06-18 Thread Andrei Mikhailovsky

Hi Nixon,

I have similar issues, but in my case it takes around 15 seconds to load. I've 
investigated this before and posted here on the list, but none of the suggestions 
resolved the issue.

What I've noticed is that it seems to be related to the collection and 
display of Metrics data on vms. My deduction was that it takes almost exactly 
as long to show the Metrics page of a vm as to show the vm itself. All the other 
vm tabs show me the information pretty much instantly.

You could look into reducing the amount of metrics data you store, but 
that didn't really help in my case.

I asked this morning whether there is a way to switch off the collection of vm 
metrics data altogether, but no one has replied yet.

Please share your thoughts and successes. It would be good to learn how to deal 
with it.

P.S. I am using the latest 4.19.x branch, but had the same issue with 4.18 too. 
The problem started to show up when we upgraded to either the 4.18 or the 4.17 
branch; I can't remember exactly. It's been well over a year, I think.

Cheers

Andrei

- Original Message -
> From: "Nixon Varghese K S" 
> To: "users" 
> Sent: Tuesday, 18 June, 2024 10:31:03
> Subject: UI Slowness while populating Instance

> Hello,
> 
> I am using ACS 4.18.0.0v in my production environment, and more than 100
> instances, including Kubernetes instances, are running on my setup. The
> user interface appears to be stuck in the loading phase when you click on
> the instance page; you will need to wait five to ten minutes for the
> instance list to appear. It is not just the instance page but every place
> where instances are listed, for example when adding port forwarding or
> listing the instances running behind a VR; everywhere it is the same. Is anyone
> else facing the same issue?
> I checked the management log and saw that there was no error message and
> that the management server's resource utilization was normal. If someone
> could provide some troubleshooting steps to identify the issue, that would
> be very helpful.
> 
> --
> With Regards,
> Nixon Varghese

Disabling VM metrics

2024-06-17 Thread Andrei Mikhailovsky

Hello everyone, 

Could someone recommend a way to disable the collection of vm metrics? It is 
currently taking a long time (10-20 seconds) for the vm information to show up 
when clicking on a vm. Also, when trying to attach a volume(s) to a vm, the 
full list of vms is not populating (my guess due to the timeout when retrieving 
vm metrics). I've tried reducing the length of collected data in the cloudstack 
settings, as suggested by someone on this list, but this did not solve the 
problem. 

Currently, the vm metrics data are not being used and their collection is 
causing too much inconvenience and at times makes the gui unusable. Could 
someone suggest the way the collection of vm metrics data could be switched 
off? 

Many thanks 

Andrei

Re: Slow Metrics output in GUI

2024-03-06 Thread Andrei Mikhailovsky

Thanks Joan,

I have updated vm.stats.max.retention.time value to 60 to see if this resolves 
my slow performance.

Cheers

- Original Message -
> From: "Joan g" 
> To: "users" 
> Sent: Tuesday, 27 February, 2024 10:01:05
> Subject: Re: Slow Metrics output in GUI

> just cleaning up the vm_stats table in cloudstack 'cloud' db.
> 
>> truncate table vm_stats;
> 
> and setting vm.stats.max.retention.time to a lower value addressed our
> issues.
> 
> Joan
> 
> On Tue, Feb 27, 2024 at 12:19 AM Andrei Mikhailovsky
>  wrote:
> 
>> Interesting.
>>
>> Joan, do you mind sharing how you are doing it?
>>
>> Thanks
>>
>> - Original Message -
>> > From: "Joan g" 
>> > To: "users" 
>> > Sent: Monday, 26 February, 2024 18:06:58
>> > Subject: Re: Slow Metrics output in GUI
>>
>> > I am facing the same problem in my 4.17.2 version. We are manually clearing
>> > the stats table to make the instance list page load faster :(
>> >
>> >
>> > Joan
>> >
>> > On Mon, 26 Feb, 2024, 22:24 Andrei Mikhailovsky wrote:
>> >
>> >> Hello everyone,
>> >>
>> >> My setup: ACS 4.18.1.0 on Ubuntu 20.04.6. Two management servers and mysql
>> >> active-active replication.
>> >>
>> >>
>> >> I seem to have a very slow response on viewing vms. It takes about 20
>> >> seconds for the vm data to show when I click on any vm under Compute >
>> >> Instances. When I click on various vm tabs (like NICs, Disks, Details, etc)
>> >> the only tab that takes about 15-20 seconds to refresh is the Metrics tab.
>> >> When the spinner stops I get the following message: "No data to show for
>> >> the selected period." Also this information is shown in red colour: The
>> >> Control Plane Status of this instance is Offline. Some actions on this
>> >> instance will fail, if so please wait a while and retry. When I click on
>> >> the 12 or 24 hours tab it takes a bit of time, but it does show the tables
>> >> and the message in red colour is not shown.
>> >> On mysql server I see the mysql process is using over 100% cpu (with 0%
>> >> iowait) while ACS tries to retrieve the Metrics data. Also, the
>> >> cloudstack-management server cpu usage goes to 200-400%.
>> >>
>> >>
>> >> I've tried all the obvious (restarting management servers, stopping one of
>> >> the management servers, restarting host servers).
>> >>
>> >> Does anyone know what the issue is? Why does it take so long to retrieve
>> >> the vm data and metrics? I don't remember having this problem before 4.18.
>> >>
>> >> Many thanks for any pointers.
>> >>
>> >> cheers
>> >>
>> >> Andrei
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
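Joan's workaround above empties `vm_stats` completely. A narrower option is to delete only the rows older than the retention window, which is presumably what lowering `vm.stats.max.retention.time` achieves over time. The sketch below shows that pattern against an in-memory SQLite stand-in. The table layout and the `vm_id`/`collected_at` columns are hypothetical and do not reproduce the real `cloud.vm_stats` schema, so check the actual columns and take a backup before running anything similar against the CloudStack database.

```python
import sqlite3
from datetime import datetime, timedelta

# In-memory stand-in; vm_id/collected_at are hypothetical column names,
# NOT the real cloud.vm_stats schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vm_stats (vm_id INTEGER, collected_at TEXT)")

now = datetime(2024, 3, 6, 12, 0, 0)
# Two VMs, each with samples 5, 30, 90 and 600 minutes old.
rows = [(vm, (now - timedelta(minutes=m)).isoformat(sep=" "))
        for vm in (1, 2) for m in (5, 30, 90, 600)]
conn.executemany("INSERT INTO vm_stats VALUES (?, ?)", rows)


def prune_older_than(conn, now, retention_minutes):
    """Delete only rows older than the retention window (unlike TRUNCATE)."""
    cutoff = (now - timedelta(minutes=retention_minutes)).isoformat(sep=" ")
    # ISO-8601 strings in one format sort chronologically, so < works here.
    cur = conn.execute("DELETE FROM vm_stats WHERE collected_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


deleted = prune_older_than(conn, now, retention_minutes=60)
remaining = conn.execute("SELECT COUNT(*) FROM vm_stats").fetchone()[0]
print(deleted, remaining)  # 4 4
```

Unlike a truncate, this keeps the most recent hour of samples, so the Metrics tab still has something to show.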

Re: Slow Metrics output in GUI

2024-02-26 Thread Andrei Mikhailovsky

Interesting.

Joan, do you mind sharing how you are doing it?

Thanks

- Original Message -
> From: "Joan g" 
> To: "users" 
> Sent: Monday, 26 February, 2024 18:06:58
> Subject: Re: Slow Metrics output in GUI

> I am facing the same problem in my 4.17.2 version. We are manually clearing
> the stats table to make the instance list page load faster :(
> 
> 
> Joan
> 
> On Mon, 26 Feb, 2024, 22:24 Andrei Mikhailovsky, 
> wrote:
> 
>> Hello everyone,
>>
>> My setup: ACS 4.18.1.0 on Ubuntu 20.04.6. Two management servers and mysql
>> active-active replication.
>>
>>
>> I seem to have a very slow response on viewing vms. It takes about 20
>> seconds for the vm data to show when I click on any vm under Compute >
>> Instances. When I click on various vm tabs (like NICs, Disks, Details, etc)
>> the only tab that takes about 15-20 seconds to refresh is the Metrics tab.
>> When the spinner stops I get the following message: "No data to show for
>> the selected period." Also this information is shown in red colour: The
>> Control Plane Status of this instance is Offline. Some actions on this
>> instance will fail, if so please wait a while and retry. When I click on
>> the 12 or 24 hours tab it takes a bit of time, but it does show the tables
>> and the message in red colour is not shown.
>> On mysql server I see the mysql process is using over 100% cpu (with 0%
>> iowait) while ACS tries to retrieve the Metrics data. Also, the
>> cloudstack-management server cpu usage goes to 200-400%.
>>
>>
>> I've tried all the obvious (restarting management servers, stopping one of
>> the management servers, restarting host servers).
>>
>> Does anyone know what the issue is? Why does it take so long to retrieve
>> the vm data and metrics? I don't remember having this problem before 4.18.
>>
>> Many thanks for any pointers.
>>
>> cheers
>>
>> Andrei
>>
>>
>>
>>
>>
>>
>>

Slow Metrics output in GUI

2024-02-26 Thread Andrei Mikhailovsky

Hello everyone, 

My setup: ACS 4.18.1.0 on Ubuntu 20.04.6. Two management servers and mysql 
active-active replication. 


I seem to have a very slow response on viewing vms. It takes about 20 seconds 
for the vm data to show when I click on any vm under Compute > Instances. When 
I click on various vm tabs (like NICs, Disks, Details, etc) the only tab that 
takes about 15-20 seconds to refresh is the Metrics tab. When the spinner stops 
I get the following message: "No data to show for the selected period." Also 
this information is shown in red colour: The Control Plane Status of this 
instance is Offline. Some actions on this instance will fail, if so please wait 
a while and retry. When I click on the 12 or 24 hours tab it takes a bit of 
time, but it does show the tables and the message in red colour is not shown. 
On mysql server I see the mysql process is using over 100% cpu (with 0% iowait) 
while ACS tries to retrieve the Metrics data. Also, the cloudstack-management 
server cpu usage goes to 200-400%. 


I've tried all the obvious (restarting management servers, stopping one of the 
management servers, restarting host servers). 

Does anyone know what the issue is? Why does it take so long to retrieve the vm 
data and metrics? I don't remember having this problem before 4.18. 

Many thanks for any pointers. 

cheers 

Andrei

Re: KVM + Ceph volume attach problem

2023-12-18 Thread Andrei Mikhailovsky

Thanks, Jayanth,

I've tried it and it didn't solve the issue. I will check out the bug page to 
open a ticket.

Cheers
- Original Message -
> From: "Jayanth Reddy" 
> To: "users" 
> Sent: Sunday, 17 December, 2023 23:46:29
> Subject: Re: KVM + Ceph volume attach problem

> Hello Andrei,
> Please try setting "default.page.size" to 500 or less and see if the issue
> still persists. IIRC, a couple of PRs were merged into 4.17.x to deal with
> large numbers of items in API responses, aimed at improving the UI.
> 
> We've also got v4.16.1.0 with over 4000 compute offerings and
> "default.page.size" set to 2000, where it takes 10 to 15 seconds for the first
> page of 20 items to be returned. I've not experienced any weird responses
> from the API though.
> 
> 
> Regards,
> Jayanth
> 
> From: Andrei Mikhailovsky 
> Sent: Monday, December 18, 2023 2:41:38 AM
> To: users 
> Subject: Re: KVM + Ceph volume attach problem
> 
> Sure, here are the values for:
> 
> default.page.size: 5000
> default.ui.page.size: 20
> 
> My setup is: ACS: 4.18.1.0 ; Ceph: 17.2.7
> 
> The cluster was set up around 10 years ago and has been upgraded through every
> major version since. I've done most of the minor version upgrades too, but may
> have missed one or two over the 10-year period.
> 
> I have two management servers which are running active/active. I've tested with
> one of the management servers offline and still have this issue. There have been
> no infrastructure changes for a good few years. The only changes made were the
> standard OS-level security updates and minor version updates for both ACS and
> Ceph.
> 
> Thanks for trying to help.
> 
> Andrei
> 
> - Original Message -
>> From: "Jayanth Reddy" 
>> To: "users" 
>> Sent: Sunday, 17 December, 2023 15:35:16
>> Subject: Re: KVM + Ceph volume attach problem
> 
>> Hello Andrei,
>> Please share the value set for "default.page.size" & "default.ui.page.size"
>> in your global settings. I've got the same setup v4.18.1.0 with Ceph
>> v18.2.0 and I don't seem to experience this issue. Is this a new setup or
>> upgraded one from earlier versions? How are your management servers set up,
>> are they behind a Load Balancer, and if yes, have you tried testing with
>> individual management server endpoints? Also please help me in
>> understanding whether there was any infrastructure change done recently.
>>
>> Regards,
>> Jayanth Reddy
>>
>> On Sun, Dec 17, 2023 at 6:15 PM Andrei Mikhailovsky
>>  wrote:
>>
>>> Hi Stephan,
>>>
>>> I've checked the browser cache and it's not what is causing the problem.
>>> Done a few private sessions and the issue persists. I've done some more
>>> testing and I think it might relate to some sort of timeout / delay when
>>> the web gui retrieves the list of vms when I try to attach a volume.
>>>
>>> So, I click on the attach icon and I have a new modal window with a
>>> spinning circle. The circle spins for about 5 seconds or so and goes away.
>>> I click on the VM ID and it shows me about 5-6 vms. I am expecting at least
>>> about 15 in the list. Now, when I type the name of the vm (which is not in
>>> the list that is shown up to begin with) in the field and wait about 15-20
>>> seconds, the vm all of a sudden is shown. After that I delete the name that
>>> I've just typed and now the list is fully populated with about 15 or so vms.
>>>
>>> This happens every time. I've restarted the ACS, the database service,
>>> I've even restarted the ACS server itself and the DB servers. The behaviour
>>> is repeatable. It happens in Safari and Chrome.
>>>
>>> Very very strange.
>>>
>>> Andrei
>>>
>>> - Original Message -
>>> > From: "Andrei Mikhailovsky" 
>>> > To: "users" 
>>> > Sent: Friday, 15 December, 2023 12:44:58
>>> > Subject: Re: KVM + Ceph volume attach problem
>>>
>>> > Stephan, thanks for the quick reply. I doubt the browser cache cleaning was
>>> > done. Let me check using Private window to see if the problem is with the
>>> > caches.
>>> >
>>> > Cheers
>>> >
>>> > - Original Message -
>>> >> From: "Stephan Bienek" 

Re: KVM + Ceph volume attach problem

2023-12-17 Thread Andrei Mikhailovsky

Sure, here are the values for: 

default.page.size: 5000
default.ui.page.size: 20
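As I understand the two settings above from their Global Settings descriptions, `default.page.size` is the default page size for `list*` API commands and `default.ui.page.size` is the page size the UI requests. A client that wants the complete list therefore has to walk successive pages using the `page` and `pagesize` parameters. A minimal sketch of that loop; `fetch_page` is a hypothetical stand-in for a call such as `listVirtualMachines`, and the fake backend only exists to make the example runnable without a server.

```python
from typing import Callable, Dict, List

# fetch_page(page, pagesize) is a hypothetical stand-in for a list* API call.
FetchPage = Callable[[int, int], List[Dict]]


def list_all(fetch_page: FetchPage, page_size: int) -> List[Dict]:
    """Request successive pages until a short (or empty) page comes back."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    items: List[Dict] = []
    page = 1
    while True:
        batch = fetch_page(page, page_size)
        items.extend(batch)
        if len(batch) < page_size:
            return items
        page += 1


# Fake backend with 45 VMs, so the loop can run without a server.
VMS = [{"name": f"vm-{i}", "state": "Running"} for i in range(45)]
calls: List[int] = []


def fake_fetch(page: int, pagesize: int) -> List[Dict]:
    calls.append(page)
    start = (page - 1) * pagesize
    return VMS[start:start + pagesize]


result = list_all(fake_fetch, page_size=20)
print(len(result), calls)  # 45 [1, 2, 3]
```

A picker that stops after the first page, or whose later requests time out, would show only part of the list, which is one possible reading of the symptom described in this thread.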

My setup is: ACS: 4.18.1.0 ; Ceph: 17.2.7

The cluster was set up around 10 years ago and has been upgraded through every 
major version since. I've done most of the minor version upgrades too, but may 
have missed one or two over the 10-year period.

I have two management servers which are running active/active. I've tested with 
one of the management servers offline and still have this issue. There have been 
no infrastructure changes for a good few years. The only changes made were the 
standard OS-level security updates and minor version updates for both ACS and 
Ceph.

Thanks for trying to help.

Andrei

- Original Message -
> From: "Jayanth Reddy" 
> To: "users" 
> Sent: Sunday, 17 December, 2023 15:35:16
> Subject: Re: KVM + Ceph volume attach problem

> Hello Andrei,
> Please share the value set for "default.page.size" & "default.ui.page.size"
> in your global settings. I've got the same setup v4.18.1.0 with Ceph
> v18.2.0 and I don't seem to experience this issue. Is this a new setup or
> upgraded one from earlier versions? How are your management servers set up,
> are they behind a Load Balancer, and if yes, have you tried testing with
> individual management server endpoints? Also please help me in
> understanding whether there was any infrastructure change done recently.
> 
> Regards,
> Jayanth Reddy
> 
> On Sun, Dec 17, 2023 at 6:15 PM Andrei Mikhailovsky
>  wrote:
> 
>> Hi Stephan,
>>
>> I've checked the browser cache and it's not what is causing the problem.
>> Done a few private sessions and the issue persists. I've done some more
>> testing and I think it might relate to some sort of timeout / delay when
>> the web gui retrieves the list of vms when I try to attach a volume.
>>
>> So, I click on the attach icon and I have a new modal window with a
>> spinning circle. The circle spins for about 5 seconds or so and goes away.
>> I click on the VM ID and it shows me about 5-6 vms. I am expecting at least
>> about 15 in the list. Now, when I type the name of the vm (which is not in
>> the list that is shown up to begin with) in the field and wait about 15-20
>> seconds, the vm all of a sudden is shown. After that I delete the name that
>> I've just typed and now the list is fully populated with about 15 or so vms.
>>
>> This happens every time. I've restarted the ACS, the database service,
>> I've even restarted the ACS server itself and the DB servers. The behaviour
>> is repeatable. It happens in Safari and Chrome.
>>
>> Very very strange.
>>
>> Andrei
>>
>> - Original Message -
>> > From: "Andrei Mikhailovsky" 
>> > To: "users" 
>> > Sent: Friday, 15 December, 2023 12:44:58
>> > Subject: Re: KVM + Ceph volume attach problem
>>
>> > Stephan, thanks for the quick reply. I doubt the browser cache cleaning was
>> > done. Let me check using Private window to see if the problem is with the
>> > caches.
>> >
>> > Cheers
>> >
>> > - Original Message -
>> >> From: "Stephan Bienek" 
>> >> To: "users" 
>> >> Sent: Friday, 15 December, 2023 11:14:25
>> >> Subject: Re: KVM + Ceph volume attach problem
>> >
>> >> Hello Andrei,
>> >>
>> >> we are using Ceph with RBD images on CloudStack 4.18.1.0 as well but I
>> >> cannot reproduce the issue you are observing.
>> >>
>> >> Using the WebUI I created a new Volume, selected "Attach Disk" and all
>> >> VMs are shown, running and stopped.
>> >> Attaching to a running VM within the WebUI works as expected in my setup.
>> >>
>> >> Probably a simple thing, but did you/the users try cleaning your browser
>> >> cache after the CloudStack Update?
>> >>
>> >> Best regards,
>> >> Stephan
>> >>
>> >>> Andrei Mikhailovsky wrote on 15.12.2023 at 11:49 CET:
>> >>>
>> >>>
>> >>> Hello guys,
>> >>>
>> >>> Any updates or thoughts on what's causing the issue with attaching volumes in
>> >>> the latest ACS? I still can't attach any volumes to a running VM from the ACS
>> >>> web gui. This is a bit of a problem. As I've mentioned earlier, no
>> e

Re: AW: KVM + Ceph volume attach problem

2023-12-17 Thread Andrei Mikhailovsky

Hi Sven,

Thanks for sharing your experience. Our setup is fairly small and I always 
update every piece of the cluster (acs or ceph) when I start the upgrade 
process. All acs related services are running 4.18.1.0 and all ceph services 
are on 17.2.7.
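Sven's advice below is to keep every OSD and Mon on the same release. The `ceph versions` command reports, per daemon type, how many daemons run each version string. A minimal Python sketch that flags any daemon type reporting more than one version. The embedded sample is made up and abbreviated, and the exact JSON layout is an assumption to verify on your own cluster (for example by feeding it the real `ceph versions` output instead of the sample).

```python
import json

# Made-up, abbreviated sample shaped like `ceph versions` JSON output
# (layout assumed; real version strings also contain a git hash).
SAMPLE = """
{
  "mon": {"ceph version 17.2.7 quincy (stable)": 3},
  "mgr": {"ceph version 17.2.7 quincy (stable)": 2},
  "osd": {"ceph version 17.2.7 quincy (stable)": 10,
          "ceph version 17.2.6 quincy (stable)": 2},
  "overall": {"ceph version 17.2.7 quincy (stable)": 15,
              "ceph version 17.2.6 quincy (stable)": 2}
}
"""


def mixed_daemon_types(versions: dict) -> dict:
    """Map each daemon type running more than one version to its sorted versions."""
    return {
        daemon: sorted(counts)
        for daemon, counts in versions.items()
        if daemon != "overall" and len(counts) > 1
    }


mixed = mixed_daemon_types(json.loads(SAMPLE))
print(mixed)
```

For this sample only `osd` is flagged, listing the 17.2.6 and 17.2.7 version strings; an empty result means each daemon type reports a single version.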

Cheers

- Original Message -
> From: "Sven Barczyk" 
> To: "users" 
> Sent: Sunday, 17 December, 2023 15:13:54
> Subject: AW: KVM + Ceph volume attach problem

> Hi Guys,
> not a solution, but maybe a hint.
> 
> We had a strange experience with Ceph and ACS: after some time of usage, ACS
> became slower at starting and stopping instances, which took longer than usual.
> Everything involving attaching and detaching a compute node with Ceph became
> sluggish.
> Our assumption was that, after some time and with more VMs running, ACS had
> just become slower.
> 
> The thing is, I had started an update spree on the OSDs without finishing it,
> because the hiccups were too severe and some customers went crazy over them.
> I never gave the slightest thought to the different versions in our Ceph
> cluster, because of its backward compatibility.
> 
> But then we did a "let's update our Ceph to a newer version", with all OSDs.
> The problem fixed itself after the last OSD was updated. With all OSDs and
> Mons on the same version, the problem with slow ACS and Ceph was fixed.
> 
> My advice: keep your OSD and Mon versions consistent.
> 
> KR
> Sven
> 
> 
> -Original Message-
> From: Andrei Mikhailovsky 
> Sent: Sunday, 17 December 2023 13:45
> To: users 
> Subject: Re: KVM + Ceph volume attach problem
> 
> Hi Stephan,
> 
> I've checked the browser cache and it's not what is causing the problem. Done a
> few private sessions and the issue persists. I've done some more testing and I
> think it might relate to some sort of timeout / delay when the web gui
> retrieves the list of vms when I try to attach a volume.
> 
> So, I click on the attach icon and I have a new modal window with a spinning
> circle. The circle spins for about 5 seconds or so and goes away. I click on
> the VM ID and it shows me about 5-6 vms. I am expecting at least about 15 in
> the list. Now, when I type the name of the vm (which is not in the list that is
> shown up to begin with) in the field and wait about 15-20 seconds, the vm all
> of a sudden is shown. After that I delete the name that I've just typed and now
> the list is fully populated with about 15 or so vms.
> 
> This happens every time. I've restarted the ACS, the database service, I've even
> restarted the ACS server itself and the DB servers. The behaviour is
> repeatable. It happens in Safari and Chrome.
> 
> Very very strange.
> 
> Andrei
> 
> - Original Message -
>> From: "Andrei Mikhailovsky" 
>> To: "users" 
>> Sent: Friday, 15 December, 2023 12:44:58
>> Subject: Re: KVM + Ceph volume attach problem
> 
>> Stephan, thanks for the quick reply. I doubt the browser cache
>> cleaning was done. Let me check using Private window to see if the
>> problem is with the caches.
>> 
>> Cheers
>> 
>> - Original Message -
>>> From: "Stephan Bienek" 
>>> To: "users" 
>>> Sent: Friday, 15 December, 2023 11:14:25
>>> Subject: Re: KVM + Ceph volume attach problem
>> 
>>> Hello Andrei,
>>> 
>>> we are using Ceph with RBD images on CloudStack 4.18.1.0 as well but
>>> I cannot reproduce the issue you are observing.
>>> 
>>> Using the WebUI I created a new Volume, selected "Attach Disk" and
>>> all VMs are shown, running and stopped.
>>> Attaching to a running VM within the WebUI works as expected in my setup.
>>> 
>>> Probably a simple thing, but did you/the users try cleaning your
>>> browser cache after the CloudStack Update?
>>> 
>>> Best regards,
>>> Stephan
>>> 
>>>> Andrei Mikhailovsky wrote on 15.12.2023 at 11:49 CET:
>>>> 
>>>>  
>>>> Hello guys,
>>>> 
>>>> Any updates or thoughts on what's causing the issue with attaching
>>>> volumes in the latest ACS? I still can't attach any volumes to a
>>>> running VM from the ACS web gui. This is a bit of a problem. As
>>>> I've mentioned earlier, no errors in the management.log. The
>>>> attaching works okay from the cmk. However, most of the users that we have are
>>>> using web gui to manage their vms.

Re: KVM + Ceph volume attach problem

2023-12-17 Thread Andrei Mikhailovsky

Hi Stephan,

I've checked the browser cache and it's not what is causing the problem. Done a 
few private sessions and the issue persists. I've done some more testing and I 
think it might relate to some sort of timeout / delay when the web gui 
retrieves the list of vms when I try to attach a volume.

So, I click on the attach icon and I have a new modal window with a spinning 
circle. The circle spins for about 5 seconds or so and goes away. I click on 
the VM ID and it shows me about 5-6 vms. I am expecting at least about 15 in 
the list. Now, when I type the name of the vm (which is not in the list that is 
shown up to begin with) in the field and wait about 15-20 seconds, the vm all 
of a sudden is shown. After that I delete the name that I've just typed and now 
the list is fully populated with about 15 or so vms.

This happens every time. I've restarted the ACS, the database service, I've 
even restarted the ACS server itself and the DB servers. The behaviour is 
repeatable. It happens in Safari and Chrome.

Very very strange.

Andrei 

- Original Message -
> From: "Andrei Mikhailovsky" 
> To: "users" 
> Sent: Friday, 15 December, 2023 12:44:58
> Subject: Re: KVM + Ceph volume attach problem

> Stephan, thanks for the quick reply. I doubt the browser cache cleaning was
> done. Let me check using Private window to see if the problem is with the
> caches.
> 
> Cheers
> 
> - Original Message -
>> From: "Stephan Bienek" 
>> To: "users" 
>> Sent: Friday, 15 December, 2023 11:14:25
>> Subject: Re: KVM + Ceph volume attach problem
> 
>> Hello Andrei,
>> 
>> we are using Ceph with RBD images on CloudStack 4.18.1.0 as well but I cannot
>> reproduce the issue you are observing.
>> 
>> Using the WebUI I created a new Volume, selected "Attach Disk" and all VMs are
>> shown, running and stopped.
>> Attaching to a running VM within the WebUI works as expected in my setup.
>> 
>> Probably a simple thing, but did you/the users try cleaning your browser cache
>> after the CloudStack Update?
>> 
>> Best regards,
>> Stephan
>> 
>>> Andrei Mikhailovsky wrote on 15.12.2023 at 11:49 CET:
>>> 
>>>  
>>> Hello guys,
>>> 
>>> Any updates or thoughts on what's causing the issue with attaching volumes in
>>> the latest ACS? I still can't attach any volumes to a running VM from the ACS
>>> web gui. This is a bit of a problem. As I've mentioned earlier, no errors in
>>> the management.log. The attaching works okay from the cmk. However, most of the
>>> users that we have are using web gui to manage their vms.
>>> 
>>> Any help would be appreciated.
>>> 
>>> Cheers
>>> 
>>> Andrei
>>> 
>>> - Original Message -
>>> > From: "Andrei Mikhailovsky" 
>>> > To: "users" 
>>> > Sent: Monday, 11 December, 2023 11:39:27
>>> > Subject: Re: KVM + Ceph volume attach problem
>>> 
>>> > Hey Wei,
>>> > 
>>> > I tested and I can attach it from the cloudmonkey just fine.
>>> > 
>>> > Do you know what could be causing the issue in the GUI?
>>> > 
>>> > Cheers
>>> > 
>>> > Andrei
>>> > 
>>> > - Original Message -
>>> >> From: "Wei ZHOU" 
>>> >> To: "users" 
>>> >> Sent: Monday, 11 December, 2023 10:15:20
>>> >> Subject: Re: KVM + Ceph volume attach problem
>>> > 
>>> >> Hi,
>>> >> 
>>> >> Do the volumes/vms belong to the same account ?
>>> >> 
>>> >> Can you try with cmk/cloudmonkey ?
>>> >> 
>>> >> -Wei
>>> >> 
>>> >> On Mon, 11 Dec 2023 at 11:05, Andrei Mikhailovsky 
>>> >> 
>>> >> wrote:
>>> >> 
>>> >>> Hello guys,
>>> >>>
>>> >>> I am having a strange issue which I've not noticed before. I am running
>>> >>> ACS version 4.18.1.0 with ceph using rbd for images. I've created a new
>>> >>> volume and am trying to attach it. However, the list of VMs that I can attach
>>> >>> a volume to is only showing STOPPED VMs. I can't seem to attach any volumes
>>> >>> to a running vm. I've done the usual, management server logs checking,
>>> >>> restarting the acs management server, etc. It didn't help. I've also
>>> >>> noticed that the nfs storage pool volumes are experiencing the same issue.
>>> >>>
>>> >>> I am sure that I was able to attach volumes to a running vm in the past.
>>> >>> What could be causing the issue?
>>> >>>
>>> >>> Cheers
>>> >>>
>>> >>> >>> Andrei

Re: KVM + Ceph volume attach problem

2023-12-15 Thread Andrei Mikhailovsky

Stephan, thanks for the quick reply. I doubt the browser cache cleaning was 
done. Let me check using Private window to see if the problem is with the 
caches.

Cheers

- Original Message -
> From: "Stephan Bienek" 
> To: "users" 
> Sent: Friday, 15 December, 2023 11:14:25
> Subject: Re: KVM + Ceph volume attach problem

> Hello Andrei,
> 
> we are using Ceph with RBD images on CloudStack 4.18.1.0 as well but I cannot
> reproduce the issue you are observing.
> 
> Using the WebUI I created a new Volume, selected "Attach Disk" and all VMs are
> shown, running and stopped.
> Attaching to a running VM within the WebUI works as expected in my setup.
> 
> Probably a simple thing, but did you/the users try cleaning your browser cache
> after the CloudStack Update?
> 
> Best regards,
> Stephan
> 
>> Andrei Mikhailovsky wrote on 15.12.2023 at 11:49 CET:
>> 
>>  
>> Hello guys,
>> 
>> Any updates or thoughts on what's causing the issue with attaching volumes in
>> the latest ACS? I still can't attach any volumes to a running VM from the ACS
>> web gui. This is a bit of a problem. As I've mentioned earlier, no errors in
>> the management.log. The attaching works okay from the cmk. However, most of the
>> users that we have are using web gui to manage their vms.
>> 
>> Any help would be appreciated.
>> 
>> Cheers
>> 
>> Andrei
>> 
>> - Original Message -
>> > From: "Andrei Mikhailovsky" 
>> > To: "users" 
>> > Sent: Monday, 11 December, 2023 11:39:27
>> > Subject: Re: KVM + Ceph volume attach problem
>> 
>> > Hey Wei,
>> > 
>> > I tested and I can attach it from the cloudmonkey just fine.
>> > 
>> > Do you know what could be causing the issue in the GUI?
>> > 
>> > Cheers
>> > 
>> > Andrei
>> > 
>> > - Original Message -
>> >> From: "Wei ZHOU" 
>> >> To: "users" 
>> >> Sent: Monday, 11 December, 2023 10:15:20
>> >> Subject: Re: KVM + Ceph volume attach problem
>> > 
>> >> Hi,
>> >> 
>> >> Do the volumes/vms belong to the same account ?
>> >> 
>> >> Can you try with cmk/cloudmonkey ?
>> >> 
>> >> -Wei
>> >> 
>> >> On Mon, 11 Dec 2023 at 11:05, Andrei Mikhailovsky 
>> >> 
>> >> wrote:
>> >> 
>> >>> Hello guys,
>> >>>
>> >>> I am having a strange issue which I've not noticed before. I am running
>> >>> ACS version 4.18.1.0 with ceph using rbd for images. I've created a new
>> >>> volume and am trying to attach it. However, the list of VMs that I can attach
>> >>> a volume to is only showing STOPPED VMs. I can't seem to attach any volumes
>> >>> to a running vm. I've done the usual, management server logs checking,
>> >>> restarting the acs management server, etc. It didn't help. I've also
>> >>> noticed that the nfs storage pool volumes are experiencing the same issue.
>> >>>
>> >>> I am sure that I was able to attach volumes to a running vm in the past.
>> >>> What could be causing the issue?
>> >>>
>> >>> Cheers
>> >>>
>> >> >>> Andrei

Re: KVM + Ceph volume attach problem

2023-12-15 Thread Andrei Mikhailovsky

Hello guys,

Any updates or thoughts on what's causing the issue with attaching volumes in 
the latest ACS? I still can't attach any volumes to a running VM from the ACS 
web gui. This is a bit of a problem. As I've mentioned earlier, no errors in 
the management.log. The attaching works okay from the cmk. However, most of the 
users that we have are using web gui to manage their vms.

Any help would be appreciated.

Cheers

Andrei

- Original Message -
> From: "Andrei Mikhailovsky" 
> To: "users" 
> Sent: Monday, 11 December, 2023 11:39:27
> Subject: Re: KVM + Ceph volume attach problem

> Hey Wei,
> 
> I tested and I can attach it from the cloudmonkey just fine.
> 
> Do you know what could be causing the issue in the GUI?
> 
> Cheers
> 
> Andrei
> 
> - Original Message -
>> From: "Wei ZHOU" 
>> To: "users" 
>> Sent: Monday, 11 December, 2023 10:15:20
>> Subject: Re: KVM + Ceph volume attach problem
> 
>> Hi,
>> 
>> Do the volumes/vms belong to the same account ?
>> 
>> Can you try with cmk/cloudmonkey ?
>> 
>> -Wei
>> 
>> On Mon, 11 Dec 2023 at 11:05, Andrei Mikhailovsky 
>> wrote:
>> 
>>> Hello guys,
>>>
>>> I am having a strange issue which I've not noticed before. I am running
>>> ACS version 4.18.1.0 with ceph using rbd for images. I've created a new
>>> volume and am trying to attach it. However, the list of VMs that I can attach
>>> a volume to is only showing STOPPED VMs. I can't seem to attach any volumes
>>> to a running vm. I've done the usual, management server logs checking,
>>> restarting the acs management server, etc. It didn't help. I've also
>>> noticed that the nfs storage pool volumes are experiencing the same issue.
>>>
>>> I am sure that I was able to attach volumes to a running vm in the past.
>>> What could be causing the issue?
>>>
>>> Cheers
>>>
>>> Andrei

Re: KVM + Ceph volume attach problem

2023-12-11 Thread Andrei Mikhailovsky

Hey Wei,

I tested and I can attach it from the cloudmonkey just fine.

Do you know what could be causing the issue in the GUI?

Cheers

Andrei

- Original Message -
> From: "Wei ZHOU" 
> To: "users" 
> Sent: Monday, 11 December, 2023 10:15:20
> Subject: Re: KVM + Ceph volume attach problem

> Hi,
> 
> Do the volumes/vms belong to the same account ?
> 
> Can you try with cmk/cloudmonkey ?
> 
> -Wei
> 
> On Mon, 11 Dec 2023 at 11:05, Andrei Mikhailovsky 
> wrote:
> 
>> Hello guys,
>>
>> I am having a strange issue which I've not noticed before. I am running
>> ACS version 4.18.1.0 with ceph using rbd for images. I've created a new
>> volume and am trying to attach it. However, the list of VMs that I can attach
>> a volume to is only showing STOPPED VMs. I can't seem to attach any volumes
>> to a running vm. I've done the usual, management server logs checking,
>> restarting the acs management server, etc. It didn't help. I've also
>> noticed that the nfs storage pool volumes are experiencing the same issue.
>>
>> I am sure that I was able to attach volumes to a running vm in the past.
>> What could be causing the issue?
>>
>> Cheers
>>
>> Andrei

Re: KVM + Ceph volume attach problem

2023-12-11 Thread Andrei Mikhailovsky

Hi Wei, 

yeah, they are on the same account. I've tested with a VM in both the started 
and stopped states. When it's stopped, I can attach the volume; once I start the 
VM, it no longer shows in the list of available VMs.
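To confirm ownership and state outside the GUI, a read-only query against the `cloud` database along these lines could help. It is a minimal sketch that assumes the standard `volumes` and `vm_instance` tables; `<volume-uuid>` and `<vm-uuid>` are hypothetical placeholders for the UUIDs shown in the UI.

```sql
-- Read-only check: which account owns the volume and the VM, and what
-- state is each in? Replace the placeholder UUIDs before running.
SELECT 'volume' AS kind, v.name, v.account_id, v.state
FROM cloud.volumes AS v
WHERE v.uuid = '<volume-uuid>'
UNION ALL
SELECT 'vm' AS kind, vm.name, vm.account_id, vm.state
FROM cloud.vm_instance AS vm
WHERE vm.uuid = '<vm-uuid>';
```

Matching `account_id` values, with the VM in the `Running` state, would rule out an ownership mismatch as the cause.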

I will try it with cmk, which I've not used for ages tbh.

Cheers

Andrei
- Original Message -
> From: "Wei ZHOU" 
> To: "users" 
> Sent: Monday, 11 December, 2023 10:15:20
> Subject: Re: KVM + Ceph volume attach problem

> Hi,
> 
> Do the volumes/vms belong to the same account ?
> 
> Can you try with cmk/cloudmonkey ?
> 
> -Wei
> 
> On Mon, 11 Dec 2023 at 11:05, Andrei Mikhailovsky 
> wrote:
> 
>> Hello guys,
>>
>> I am having a strange issue which I've not noticed before. I am running
>> ACS version 4.18.1.0 with ceph using rbd for images. I've created a new
>> volume and trying to attach it. However, the list of VMs that I can attach
>> a volume to is only showing STOPPED VMs. I can't seem to attach any volumes
>> to a running vm. I've done the usual, management server logs checking,
>> restarting the acs management server, etc. It didn't help. I've also
>> noticed that the nfs storage pool volumes are experiencing the same issue.
>>
>> I am sure that I was able to attach volumes to a running vm in the past.
>> What could be causing the issue?
>>
>> Cheers
>>
>> Andrei

KVM + Ceph volume attach problem

2023-12-11 Thread Andrei Mikhailovsky

Hello guys,

I am having a strange issue which I've not noticed before. I am running ACS 
version 4.18.1.0 with Ceph using RBD for images. I've created a new volume and 
am trying to attach it. However, the list of VMs that I can attach a volume to 
only shows STOPPED VMs; I can't seem to attach any volumes to a running VM. 
I've done the usual: checking the management server logs, restarting the ACS 
management server, etc. It didn't help. I've also noticed that volumes on the 
NFS storage pool have the same issue.

I am sure that I was able to attach volumes to a running VM in the past. What 
could be causing the issue?

Cheers

Andrei

Re: Kubernetes persistent volumes howto/doc

2023-06-08 Thread Andrei Mikhailovsky

Thank you, Wei.

Andrei
- Original Message -
> From: "Wei ZHOU" 
> To: "users" 
> Sent: Wednesday, 7 June, 2023 21:25:58
> Subject: Re: Kubernetes persistent volumes howto/doc

> Hi,
> 
> You can refer to the description of github issue
> https://github.com/apache/cloudstack/issues/7316
> 
> -Wei
> 
> On Wednesday, 7 June 2023, Andrei Mikhailovsky 
> wrote:
> 
>> Hello everyone,
>>
>> Could someone point me in the right direction on how to create and add
>> persistent volumes to cloud stack Kubernetes cluster. I've successfully
>> created cluster, but I am not able to add persistent volumes.
>>
>> Many thanks

Kubernetes persistent volumes howto/doc

2023-06-07 Thread Andrei Mikhailovsky

Hello everyone, 

Could someone point me in the right direction on how to create and add 
persistent volumes to a CloudStack Kubernetes cluster? I've successfully created 
a cluster, but I am not able to add persistent volumes.

Many thanks

Re: ACS upgrade SQL script error 4.17.2 > 4.18.0

2023-04-17 Thread Andrei Mikhailovsky

Wei, thanks a lot. That has resolved the problem, and I have managed to finish 
the upgrade to 4.18.0 and start the ACS management server.
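Steps 2 to 4 in Wei's quoted reply are meant to apply only when the table or columns are missing, but the statements as written fail if the object already exists. A minimal idempotent sketch of the same changes follows, assuming MariaDB 10.0.2 or later (this setup runs 10.3.38), which accepts `IF NOT EXISTS` on both `CREATE TABLE` and `ALTER TABLE ... ADD COLUMN`. Run it only against a restored backup, as in step 1.

```sql
-- Idempotent variant of steps 2-4. MariaDB syntax: MySQL does not accept
-- IF NOT EXISTS on ADD COLUMN. USE makes the unqualified REFERENCES
-- targets resolve to the cloud schema.
USE cloud;

CREATE TABLE IF NOT EXISTS `autoscale_vmgroup_vm_map` (
  `id` bigint unsigned NOT NULL auto_increment,
  `vmgroup_id` bigint unsigned NOT NULL,
  `instance_id` bigint unsigned NOT NULL,
  PRIMARY KEY (`id`),
  CONSTRAINT `fk_autoscale_vmgroup_vm_map__vmgroup_id`
    FOREIGN KEY `fk_autoscale_vmgroup_vm_map__vmgroup_id` (`vmgroup_id`)
    REFERENCES `autoscale_vmgroups` (`id`) ON DELETE CASCADE,
  CONSTRAINT `fk_autoscale_vmgroup_vm_map__instance_id`
    FOREIGN KEY `fk_autoscale_vmgroup_vm_map__instance_id` (`instance_id`)
    REFERENCES `vm_instance` (`id`),
  INDEX `i_autoscale_vmgroup_vm_map__vmgroup_id` (`vmgroup_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- `interval` is a reserved word, so it must stay backtick-quoted.
ALTER TABLE `autoscale_policies`
  ADD COLUMN IF NOT EXISTS `last_quiet_time` datetime DEFAULT NULL AFTER `quiet_time`;

ALTER TABLE `autoscale_vmgroups`
  ADD COLUMN IF NOT EXISTS `last_interval` datetime DEFAULT NULL AFTER `interval`;
```

Because every statement is guarded, re-running the script after a partial failure should be harmless.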

Cheers

Andrei

- Original Message -
> From: "Wei ZHOU" 
> To: "users" 
> Sent: Monday, 17 April, 2023 18:57:19
> Subject: Re: ACS upgrade SQL script error 4.17.2 > 4.18.0

> This seems to be a bug with the original commit:
> https://github.com/apache/cloudstack/commit/dc151115be3e922933ea26ab1507eb6469a91e11
> It was committed to 4.4.0, but the SQL changes were added
> to setup/db/db/schema-40to410.sql, so users who were already on
> 4.1.0/4.2.x/4.3.x never got the SQL changes.
> 
> I think the steps are:
> 
> 1. restore the old database
> 
> 2. create table `autoscale_vmgroup_vm_map` if it does not exist
> CREATE TABLE `cloud`.`autoscale_vmgroup_vm_map` (
> `id` bigint unsigned NOT NULL auto_increment,
> `vmgroup_id` bigint unsigned NOT NULL,
> `instance_id` bigint unsigned NOT NULL,
> PRIMARY KEY (`id`),
> CONSTRAINT `fk_autoscale_vmgroup_vm_map__vmgroup_id` FOREIGN KEY
> `fk_autoscale_vmgroup_vm_map__vmgroup_id` (`vmgroup_id`) REFERENCES
> `autoscale_vmgroups` (`id`) ON DELETE CASCADE,
> CONSTRAINT `fk_autoscale_vmgroup_vm_map__instance_id` FOREIGN KEY
> `fk_autoscale_vmgroup_vm_map__instance_id` (`instance_id`) REFERENCES
> `vm_instance` (`id`),
> INDEX `i_autoscale_vmgroup_vm_map__vmgroup_id`(`vmgroup_id`)
> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
> 
> 3. add column last_quiet_time to table autoscale_policies if it does not exist
> ALTER TABLE `cloud`.`autoscale_policies` ADD COLUMN `last_quiet_time`
> datetime DEFAULT NULL AFTER `quiet_time`;
> 
> 4. add column last_interval to table autoscale_vmgroups if it does not exist
> ALTER TABLE `cloud`.`autoscale_vmgroups` ADD COLUMN `last_interval`
> datetime DEFAULT NULL AFTER `interval`;
> 
> 5. Upgrade and restart mgmt server
> 
> 
> -Wei
> 
> On Mon, 17 Apr 2023 at 19:39, Andrei Mikhailovsky 
> wrote:
> 
>> Rohit,
>>
>> Done some more checks and I don't have this table in db backups dating
>> back early 2021. I don't have older backups than that. So it seems that
>> this table didn't exist in my setup for ages, if ever at all.
>>
>> Andrei
>>
>> - Original Message -
>> > From: "Rohit Yadav" 
>> > To: "users" 
>> > Sent: Monday, 17 April, 2023 18:14:02
>> > Subject: Re: ACS upgrade SQL script error 4.17.2 > 4.18.0
>>
>> > Hi Andrei,
>> >
>> > It appears your database schema isn't in right order, the
>> > cloud.autoscale_vmgroup_vm_map table is created when we install/setup
>> > cloudstack for the first time and created by
>> >
>> > https://github.com/apache/cloudstack/blob/main/engine/schema/src/main/resources/META-INF/db/schema-40to410.sql#L405
>> >
>> > Did you perhaps run the cloudstack-setup-databases or anything similar on
>> > your database? If this is prod. DB you can try reverting to your backup
>> > and try again.
>> >
>> >
>> > Regards.
>> >
>> >
>> > From: Andrei Mikhailovsky
>> > Sent: Monday, April 17, 2023 18:09
>> > To: users
>> > Subject: ACS upgrade SQL script error 4.17.2 > 4.18.0
>> >
>> > Hello everyone,
>> >
>> > I've done an upgrade of ACS from 4.17.2 to 4.18.0 and faced a problem. The
>> > management service didn't start. Log investigation showed an error during
>> > the database upgrade script:
>> >
>> > 2023-04-17 13:23:26,342 ERROR [c.c.u.d.ScriptRunner] (main:null) (logid:)
>> > java.sql.SQLSyntaxErrorException: Table 'cloud.autoscale_vmgroup_vm_map'
>> > doesn't exist
>> > 2023-04-17 13:23:26,342 ERROR [c.c.u.DatabaseUpgradeChecker] (main:null)
>> > (logid:) Unable to execute upgrade script
>> > java.sql.SQLSyntaxErrorException: Table 'cloud.autoscale_vmgroup_vm_map'
>> > doesn't exist
>> > at com.cloud.utils.db.ScriptRunner.runScript(ScriptRunner.java:185)
>> > at com.cloud.utils.db.ScriptRunner.runScript(ScriptRunner.java:87)
>> >
>> > []
>> >
>> > 2023-04-17 13:23:26,344 ERROR [c.c.u.DatabaseUpgradeChecker] (main:null)
>> > (logid:) Unable to upgrade the database
>> > com.cloud.utils.exception.CloudRuntimeException: Unable to execute upgrade
>> > script
>> > at com.cloud.upgrade.DatabaseUpgradeChecker.runScript(DatabaseUpgradeChecker.java:232)
>> > a

Re: ACS upgrade SQL script error 4.17.2 > 4.18.0

2023-04-17 Thread Andrei Mikhailovsky

Wei, thanks. I will try it and let you know how it goes.

Cheers

- Original Message -
> From: "Wei ZHOU" 
> To: "users" 
> Sent: Monday, 17 April, 2023 18:57:19
> Subject: Re: ACS upgrade SQL script error 4.17.2 > 4.18.0

> This seems to be a bug with the original commit:
> https://github.com/apache/cloudstack/commit/dc151115be3e922933ea26ab1507eb6469a91e11
> It was committed to 4.4.0, but the SQL changes were added
> to setup/db/db/schema-40to410.sql, so users who were already on
> 4.1.0/4.2.x/4.3.x never got the SQL changes.
> 
> I think the steps are:
> 
> 1. restore the old database
> 
> 2. create table `autoscale_vmgroup_vm_map` if it does not exist
> CREATE TABLE `cloud`.`autoscale_vmgroup_vm_map` (
> `id` bigint unsigned NOT NULL auto_increment,
> `vmgroup_id` bigint unsigned NOT NULL,
> `instance_id` bigint unsigned NOT NULL,
> PRIMARY KEY (`id`),
> CONSTRAINT `fk_autoscale_vmgroup_vm_map__vmgroup_id` FOREIGN KEY
> `fk_autoscale_vmgroup_vm_map__vmgroup_id` (`vmgroup_id`) REFERENCES
> `autoscale_vmgroups` (`id`) ON DELETE CASCADE,
> CONSTRAINT `fk_autoscale_vmgroup_vm_map__instance_id` FOREIGN KEY
> `fk_autoscale_vmgroup_vm_map__instance_id` (`instance_id`) REFERENCES
> `vm_instance` (`id`),
> INDEX `i_autoscale_vmgroup_vm_map__vmgroup_id`(`vmgroup_id`)
> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
> 
> 3. add column last_quiet_time to table autoscale_policies if it does not exist
> ALTER TABLE `cloud`.`autoscale_policies` ADD COLUMN `last_quiet_time`
> datetime DEFAULT NULL AFTER `quiet_time`;
> 
> 4. add column last_interval to table autoscale_vmgroups if it does not exist
> ALTER TABLE `cloud`.`autoscale_vmgroups` ADD COLUMN `last_interval`
> datetime DEFAULT NULL AFTER `interval`;
> 
> 5. Upgrade and restart mgmt server
> 
> 
> -Wei
> 
> On Mon, 17 Apr 2023 at 19:39, Andrei Mikhailovsky 
> wrote:
> 
>> Rohit,
>>
>> Done some more checks and I don't have this table in db backups dating
>> back early 2021. I don't have older backups than that. So it seems that
>> this table didn't exist in my setup for ages, if ever at all.
>>
>> Andrei
>>
>> - Original Message -
>> > From: "Rohit Yadav" 
>> > To: "users" 
>> > Sent: Monday, 17 April, 2023 18:14:02
>> > Subject: Re: ACS upgrade SQL script error 4.17.2 > 4.18.0
>>
>> > Hi Andrei,
>> >
>> > It appears your database schema isn't in right order, the
>> > cloud.autoscale_vmgroup_vm_map table is created when we install/setup
>> > cloudstack for the first time and created by
>> >
>> > https://github.com/apache/cloudstack/blob/main/engine/schema/src/main/resources/META-INF/db/schema-40to410.sql#L405
>> >
>> > Did you perhaps run the cloudstack-setup-databases or anything similar on
>> > your database? If this is prod. DB you can try reverting to your backup
>> > and try again.
>> >
>> >
>> > Regards.
>> >
>> >
>> > From: Andrei Mikhailovsky
>> > Sent: Monday, April 17, 2023 18:09
>> > To: users
>> > Subject: ACS upgrade SQL script error 4.17.2 > 4.18.0
>> >
>> > Hello everyone,
>> >
>> > I've done an upgrade of ACS from 4.17.2 to 4.18.0 and faced a problem. The
>> > management service didn't start. Log investigation showed an error during
>> > the database upgrade script:
>> >
>> > 2023-04-17 13:23:26,342 ERROR [c.c.u.d.ScriptRunner] (main:null) (logid:)
>> > java.sql.SQLSyntaxErrorException: Table 'cloud.autoscale_vmgroup_vm_map'
>> > doesn't exist
>> > 2023-04-17 13:23:26,342 ERROR [c.c.u.DatabaseUpgradeChecker] (main:null)
>> > (logid:) Unable to execute upgrade script
>> > java.sql.SQLSyntaxErrorException: Table 'cloud.autoscale_vmgroup_vm_map'
>> > doesn't exist
>> > at com.cloud.utils.db.ScriptRunner.runScript(ScriptRunner.java:185)
>> > at com.cloud.utils.db.ScriptRunner.runScript(ScriptRunner.java:87)
>> >
>> > []
>> >
>> > 2023-04-17 13:23:26,344 ERROR [c.c.u.DatabaseUpgradeChecker] (main:null)
>> > (logid:) Unable to upgrade the database
>> > com.cloud.utils.exception.CloudRuntimeException: Unable to execute upgrade
>> > script
>> > at com.cloud.upgrade.DatabaseUpgradeChecker.runScript(DatabaseUpgradeChecker.java:232)
>> > at com.cloud.upgrade.DatabaseUpgradeChecker.upgrade(Databas

Re: ACS upgrade SQL script error 4.17.2 > 4.18.0

2023-04-17 Thread Andrei Mikhailovsky

Rohit,

I've done some more checks, and this table isn't in any of my DB backups going 
back to early 2021. I don't have backups older than that, so it seems that this 
table has been missing from my setup for ages, if it ever existed at all.

Andrei

- Original Message -
> From: "Rohit Yadav" 
> To: "users" 
> Sent: Monday, 17 April, 2023 18:14:02
> Subject: Re: ACS upgrade SQL script error 4.17.2 > 4.18.0

> Hi Andrei,
> 
> It appears your database schema isn't in right order, the
> cloud.autoscale_vmgroup_vm_map table is created when we install/setup
> cloudstack for the first time and created by
> https://github.com/apache/cloudstack/blob/main/engine/schema/src/main/resources/META-INF/db/schema-40to410.sql#L405
> 
> Did you perhaps run the cloudstack-setup-databases or anything similar on your
> database? If this is prod. DB you can try reverting to your backup and try
> again.
> 
> 
> Regards.
> 
> 
> From: Andrei Mikhailovsky 
> Sent: Monday, April 17, 2023 18:09
> To: users 
> Subject: ACS upgrade SQL script error 4.17.2 > 4.18.0
> 
> Hello everyone,
> 
> I've done an upgrade of ACS from 4.17.2 to 4.18.0 and faced a problem. The
> management service didn't start. Log investigation showed an error during the
> database upgrade script:
> 
> 2023-04-17 13:23:26,342 ERROR [c.c.u.d.ScriptRunner] (main:null) (logid:)
> java.sql.SQLSyntaxErrorException: Table 'cloud.autoscale_vmgroup_vm_map'
> doesn't exist
> 2023-04-17 13:23:26,342 ERROR [c.c.u.DatabaseUpgradeChecker] (main:null)
> (logid:) Unable to execute upgrade script
> java.sql.SQLSyntaxErrorException: Table 'cloud.autoscale_vmgroup_vm_map'
> doesn't exist
> at com.cloud.utils.db.ScriptRunner.runScript(ScriptRunner.java:185)
> at com.cloud.utils.db.ScriptRunner.runScript(ScriptRunner.java:87)
> 
> []
> 
> 2023-04-17 13:23:26,344 ERROR [c.c.u.DatabaseUpgradeChecker] (main:null)
> (logid:) Unable to upgrade the database
> com.cloud.utils.exception.CloudRuntimeException: Unable to execute upgrade
> script
> at com.cloud.upgrade.DatabaseUpgradeChecker.runScript(DatabaseUpgradeChecker.java:232)
> at com.cloud.upgrade.DatabaseUpgradeChecker.upgrade(DatabaseUpgradeChecker.java:310)
> at com.cloud.upgrade.DatabaseUpgradeChecker.check(DatabaseUpgradeChecker.java:401)
> at org.apache.cloudstack.spring.lifecycle.CloudStackExtendedLifeCycle.checkIntegrity(CloudStackExtendedLifeCycle.java:64)
> at org.apache.cloudstack.spring.lifecycle.CloudStackExtendedLifeCycle.start(CloudStackExtendedLifeCycle.java:54)
> at org.springframework.context.support.DefaultLifecycleProcessor.doStart(DefaultLifecycleProcessor.java:178)
> at org.springframework.context.support.DefaultLifecycleProcessor.access$200(DefaultLifecycleProcessor.java:54)
> at org.springframework.context.support.DefaultLifecycleProcessor$LifecycleGroup.start(DefaultLifecycleProcessor.java:356)
> at java.base/java.lang.Iterable.forEach(Iterable.java:75)
> at org.springframework.context.support.DefaultLifecycleProcessor.startBeans(DefaultLifecycleProcessor.java:155)
> at org.springframework.context.support.DefaultLifecycleProcessor.onRefresh(DefaultLifecycleProcessor.java:123)
> at org.springframework.context.support.AbstractApplicationContext.finishRefresh(AbstractApplicationContext.java:935)
> at java.base/java.lang.Iterable.forEach(Iterable.java:75)
> at org.springframework.context.support.DefaultLifecycleProcessor.startBeans(DefaultLifecycleProcessor.java:155)
> at org.springframework.context.support.DefaultLifecycleProcessor.onRefresh(DefaultLifecycleProcessor.java:123)
> at org.springframework.context.support.AbstractApplicationContext.finishRefresh(AbstractApplicationContext.java:935)
> at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:586)
> at org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet.loadContext(DefaultModuleDefinitionSet.java:144)
> at org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet$2.with(DefaultModuleDefinitionSet.java:121)
> at org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet.withModule(DefaultModuleDefinitionSet.java:244)
> at org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet.withModule(DefaultModuleDefinitionSet.java:249)
> at org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet.withModule(DefaultModuleDefinitionSet.java:232)
> at org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet.loadContexts(DefaultModuleDefinitionSet.java:116)
> at org.apache.cloudstack.spring.modu

Re: ACS upgrade SQL script error 4.17.2 > 4.18.0

2023-04-17 Thread Andrei Mikhailovsky

Hi Rohit,

Many thanks for your reply. I've been using CloudStack since around 2012, and 
this instance has been upgraded at least a dozen times. I don't remember 
skipping any major releases. Today was the first time I got a DB error of this 
kind.
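One way to check the upgrade history is the `version` table in the `cloud` database, which the management server's upgrade checker uses to record the schema versions it has applied. This is a read-only sketch that assumes that table is present in this install; `SELECT *` avoids relying on exact column names, which may differ between releases.

```sql
-- List every schema version recorded by past upgrades, oldest first.
-- A missing release in the sequence could point to a skipped upgrade step.
SELECT * FROM cloud.version ORDER BY id;
```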

I followed the documentation guides when setting it up back in 2012, and I am 
sure I ran the database setup tool. I'm not sure how reverting to the DB backup 
would help, since the backup doesn't seem to contain the table either, and the 
error I get is about that table not existing.


$ cat cloud-backup_2023-04-17-131612 |grep -i autoscale_vmgroup_vm_map
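The grep above only inspects a dump file. To check the live database for the table and for the two autoscale columns discussed in Wei's reply in this thread (`autoscale_policies.last_quiet_time` and `autoscale_vmgroups.last_interval`), a read-only query against `information_schema` should work on both MariaDB and MySQL. A minimal sketch, assuming the schema is named `cloud`:

```sql
-- Returns one row per object that already exists; anything not listed
-- is absent from the live cloud schema.
SELECT 'table' AS kind, TABLE_NAME AS name
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'cloud'
  AND TABLE_NAME = 'autoscale_vmgroup_vm_map'
UNION ALL
SELECT 'column', CONCAT(TABLE_NAME, '.', COLUMN_NAME)
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'cloud'
  AND ((TABLE_NAME = 'autoscale_policies' AND COLUMN_NAME = 'last_quiet_time')
    OR (TABLE_NAME = 'autoscale_vmgroups' AND COLUMN_NAME = 'last_interval'));
```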


Please advise.

Andrei

- Original Message -
> From: "Rohit Yadav" 
> To: "users" 
> Sent: Monday, 17 April, 2023 18:14:02
> Subject: Re: ACS upgrade SQL script error 4.17.2 > 4.18.0

> Hi Andrei,
> 
> It appears your database schema isn't in right order, the
> cloud.autoscale_vmgroup_vm_map table is created when we install/setup
> cloudstack for the first time and created by
> https://github.com/apache/cloudstack/blob/main/engine/schema/src/main/resources/META-INF/db/schema-40to410.sql#L405
> 
> Did you perhaps run the cloudstack-setup-databases or anything similar on your
> database? If this is prod. DB you can try reverting to your backup and try
> again.
> 
> 
> Regards.
> 
> 
> From: Andrei Mikhailovsky 
> Sent: Monday, April 17, 2023 18:09
> To: users 
> Subject: ACS upgrade SQL script error 4.17.2 > 4.18.0
> 
> Hello everyone,
> 
> I've done an upgrade of ACS from 4.17.2 to 4.18.0 and faced a problem. The
> management service didn't start. Log investigation showed an error during the
> database upgrade script:
> 
> 2023-04-17 13:23:26,342 ERROR [c.c.u.d.ScriptRunner] (main:null) (logid:)
> java.sql.SQLSyntaxErrorException: Table 'cloud.autoscale_vmgroup_vm_map'
> doesn't exist
> 2023-04-17 13:23:26,342 ERROR [c.c.u.DatabaseUpgradeChecker] (main:null)
> (logid:) Unable to execute upgrade script
> java.sql.SQLSyntaxErrorException: Table 'cloud.autoscale_vmgroup_vm_map'
> doesn't exist
> at com.cloud.utils.db.ScriptRunner.runScript(ScriptRunner.java:185)
> at com.cloud.utils.db.ScriptRunner.runScript(ScriptRunner.java:87)
> 
> []
> 
> 2023-04-17 13:23:26,344 ERROR [c.c.u.DatabaseUpgradeChecker] (main:null)
> (logid:) Unable to upgrade the database
> com.cloud.utils.exception.CloudRuntimeException: Unable to execute upgrade
> script
> at com.cloud.upgrade.DatabaseUpgradeChecker.runScript(DatabaseUpgradeChecker.java:232)
> at com.cloud.upgrade.DatabaseUpgradeChecker.upgrade(DatabaseUpgradeChecker.java:310)
> at com.cloud.upgrade.DatabaseUpgradeChecker.check(DatabaseUpgradeChecker.java:401)
> at org.apache.cloudstack.spring.lifecycle.CloudStackExtendedLifeCycle.checkIntegrity(CloudStackExtendedLifeCycle.java:64)
> at org.apache.cloudstack.spring.lifecycle.CloudStackExtendedLifeCycle.start(CloudStackExtendedLifeCycle.java:54)
> at org.springframework.context.support.DefaultLifecycleProcessor.doStart(DefaultLifecycleProcessor.java:178)
> at org.springframework.context.support.DefaultLifecycleProcessor.access$200(DefaultLifecycleProcessor.java:54)
> at org.springframework.context.support.DefaultLifecycleProcessor$LifecycleGroup.start(DefaultLifecycleProcessor.java:356)
> at java.base/java.lang.Iterable.forEach(Iterable.java:75)
> at org.springframework.context.support.DefaultLifecycleProcessor.startBeans(DefaultLifecycleProcessor.java:155)
> at org.springframework.context.support.DefaultLifecycleProcessor.onRefresh(DefaultLifecycleProcessor.java:123)
> at org.springframework.context.support.AbstractApplicationContext.finishRefresh(AbstractApplicationContext.java:935)
> at java.base/java.lang.Iterable.forEach(Iterable.java:75)
> at org.springframework.context.support.DefaultLifecycleProcessor.startBeans(DefaultLifecycleProcessor.java:155)
> at org.springframework.context.support.DefaultLifecycleProcessor.onRefresh(DefaultLifecycleProcessor.java:123)
> at org.springframework.context.support.AbstractApplicationContext.finishRefresh(AbstractApplicationContext.java:935)
> at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:586)
> at org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet.loadContext(DefaultModuleDefinitionSet.java:144)
> at org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet$2.with(DefaultModuleDefinitionSet.java:121)
> at org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet.withModule(DefaultModuleDefinitionSet.java:244)
> at org.apache.cloudstack.spring

ACS upgrade SQL script error 4.17.2 > 4.18.0

2023-04-17 Thread Andrei Mikhailovsky

Hello everyone,

I've upgraded ACS from 4.17.2 to 4.18.0 and hit a problem: the management 
service didn't start. Looking through the logs showed an error in the database 
upgrade script:

2023-04-17 13:23:26,342 ERROR [c.c.u.d.ScriptRunner] (main:null) (logid:) java.sql.SQLSyntaxErrorException: Table 'cloud.autoscale_vmgroup_vm_map' doesn't exist
2023-04-17 13:23:26,342 ERROR [c.c.u.DatabaseUpgradeChecker] (main:null) (logid:) Unable to execute upgrade script
java.sql.SQLSyntaxErrorException: Table 'cloud.autoscale_vmgroup_vm_map' doesn't exist
at com.cloud.utils.db.ScriptRunner.runScript(ScriptRunner.java:185)
at com.cloud.utils.db.ScriptRunner.runScript(ScriptRunner.java:87)

[]

2023-04-17 13:23:26,344 ERROR [c.c.u.DatabaseUpgradeChecker] (main:null) (logid:) Unable to upgrade the database
com.cloud.utils.exception.CloudRuntimeException: Unable to execute upgrade script
at com.cloud.upgrade.DatabaseUpgradeChecker.runScript(DatabaseUpgradeChecker.java:232)
at com.cloud.upgrade.DatabaseUpgradeChecker.upgrade(DatabaseUpgradeChecker.java:310)
at com.cloud.upgrade.DatabaseUpgradeChecker.check(DatabaseUpgradeChecker.java:401)
at org.apache.cloudstack.spring.lifecycle.CloudStackExtendedLifeCycle.checkIntegrity(CloudStackExtendedLifeCycle.java:64)
at org.apache.cloudstack.spring.lifecycle.CloudStackExtendedLifeCycle.start(CloudStackExtendedLifeCycle.java:54)
at org.springframework.context.support.DefaultLifecycleProcessor.doStart(DefaultLifecycleProcessor.java:178)
at org.springframework.context.support.DefaultLifecycleProcessor.access$200(DefaultLifecycleProcessor.java:54)
at org.springframework.context.support.DefaultLifecycleProcessor$LifecycleGroup.start(DefaultLifecycleProcessor.java:356)
at java.base/java.lang.Iterable.forEach(Iterable.java:75)
at org.springframework.context.support.DefaultLifecycleProcessor.startBeans(DefaultLifecycleProcessor.java:155)
at org.springframework.context.support.DefaultLifecycleProcessor.onRefresh(DefaultLifecycleProcessor.java:123)
at org.springframework.context.support.AbstractApplicationContext.finishRefresh(AbstractApplicationContext.java:935)
at java.base/java.lang.Iterable.forEach(Iterable.java:75)
at org.springframework.context.support.DefaultLifecycleProcessor.startBeans(DefaultLifecycleProcessor.java:155)
at org.springframework.context.support.DefaultLifecycleProcessor.onRefresh(DefaultLifecycleProcessor.java:123)
at org.springframework.context.support.AbstractApplicationContext.finishRefresh(AbstractApplicationContext.java:935)
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:586)
at org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet.loadContext(DefaultModuleDefinitionSet.java:144)
at org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet$2.with(DefaultModuleDefinitionSet.java:121)
at org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet.withModule(DefaultModuleDefinitionSet.java:244)
at org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet.withModule(DefaultModuleDefinitionSet.java:249)
at org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet.withModule(DefaultModuleDefinitionSet.java:232)
at org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet.loadContexts(DefaultModuleDefinitionSet.java:116)
at org.apache.cloudstack.spring.module.model.impl.DefaultModuleDefinitionSet.load(DefaultModuleDefinitionSet.java:78)
at org.apache.cloudstack.spring.module.factory.ModuleBasedContextFactory.loadModules(ModuleBasedContextFactory.java:37)
at org.apache.cloudstack.spring.module.factory.CloudStackSpringContext.init(Clou

[.] 


2023-04-17 13:23:26,349 DEBUG [c.c.u.d.T.Transaction] (main:null) (logid:) 
Rolling back the transaction: Time = 631 Name = Upgrade; called by 
-TransactionLegacy.rollback:888-TransactionLegacy.removeUpTo:831-TransactionLegacy.close:655-DatabaseUpgradeChecker.upgrade:325-DatabaseUpgradeChecker.check:401-CloudStackExtendedLifeCycle.checkIntegrity:64-CloudStackExtendedLifeCycle.start:54-DefaultLifecycleProcessor.doStart:178-DefaultLifecycleProcessor.access$200:54-DefaultLifecycleProcessor$LifecycleGroup.start:356-Iterable.forEach:75-DefaultLifecycleProcessor.startBeans:155
 


My setup:

Ubuntu 20.04.x with the latest updates on the management, agent and usage 
servers. DB: mariadb-server 1:10.3.38-0ubuntu0.20.04.1

Has anyone faced this issue? How do I solve it?

Many thanks

Andrei

Re: Unable to login to GUI onto second management server

2022-08-02 Thread Andrei Mikhailovsky

I have followed the instructions from 
https://www.shapeblue.com/dynamic-roles-in-cloudstack/ and can confirm that 
after running the migration Python script I was able to log in to the new 
management server.

Perhaps the installation manuals for the multiple management server setup 
should mention the above instructions for ACS installations that were created 
before 4.9.x. In my case, I installed ACS back in 2012 and never updated the 
permissions structure.
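To see whether an older installation has already been switched to dynamic roles, the relevant global setting can be read straight from the database. This is a read-only sketch assuming the setting is named `dynamic.apichecker.enabled`, as in CloudStack 4.9 and later; a value of `false` would suggest the migration described in the article has not been run yet.

```sql
-- Read-only: is the dynamic role-based API checker enabled?
SELECT name, value
FROM cloud.configuration
WHERE name = 'dynamic.apichecker.enabled';
```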

Thanks everyone for looking into this issue.

Andrei

- Original Message -
> From: "Andrei Mikhailovsky" 
> To: "users" 
> Cc: "Harikrishna Patnala" 
> Sent: Tuesday, 2 August, 2022 15:49:55
> Subject: Re: Unable to login to GUI onto second management server

> It seems that my issue is closely related to the issue discussed here:
> https://github.com/apache/cloudstack/issues/3200
> 
> I will investigate this further
> 
> Andrei
> 
> - Original Message -
>> From: "Andrei Mikhailovsky" 
>> To: "Harikrishna Patnala" 
>> Cc: "users" 
>> Sent: Sunday, 31 July, 2022 22:56:31
>> Subject: Re: Unable to login to GUI onto second management server
> 
>> Hi Harikrishna,
>> 
>> Tried the below, but still have the same issue.
>> 
>> Also, after trying what you've suggested, I started the old management server
>> and I was still able to log in. I'm not sure the host setting has anything to
>> do with login...
>> 
>> Andrei
>> 
>>> From: "Harikrishna Patnala" 
>>> To: "Andrei Mikhailovsky" 
>>> Cc: "users" 
>>> Sent: Thursday, 28 July, 2022 09:54:39
>>> Subject: Re: Unable to login to GUI onto second management server
>> 
>>> Hi Andrei,
>> 
>>> Can you please also try the below steps? I'm just making sure all pointers are
>>> to the new management server only.
>> 
>>> 1. Keep only the new management server IP in the host configuration.
>>> 2. Stop the old management server
>>> 3. Restart the new management server
>>> Thanks,
>>> Harikrishna
>> 
>>> From: Andrei Mikhailovsky 
>>> Sent: Wednesday, July 27, 2022 6:45 PM
>>> To: Harikrishna Patnala 
>>> Cc: users 
>>> Subject: Re: Unable to login to GUI onto second management server
>>> Hi Harikrishna,
>> 
>>> I have added the new management server IP address into the host configuration
>>> from the gui. It now shows:
>> 
>>> host | The ip address of management server. This can also accept comma separated addresses. | Advanced | 192.168.169.13,192.168.169.21
>> 
>>> After that I've started the new management server and unfortunately, I still
>>> have the same issue.
>> 
>>> I have also noticed that after starting the new management server, the table
>>> mshost has been updated to reflect the server status as Up:
>> 
>>> | 4 | 115129173025114 | 1658099918669 | ais-cloudhost13.csprdc.arhont.com | 98405826-0861-11ea-a1da-8003fe80 | Up | 4.16.1.0 | 127.0.0.1 | 9090 | 2022-07-27 13:10:05 | NULL | 0 |
>>> | 5 | 165004275141402 | 1658927302926 | ais-compute1.cloud.arhont.com | 0d1522a5-5d08-46af-b59c-b577aa22e9bb | Up | 4.16.1.0 | 192.168.169.21 | 9090 | 2022-07-27 13:08:32 | NULL | 0 |
>> 
>>> Anything else I should try?
>> 
>>> Thanks
>> 
>>> Andrei
>> 
>>>> From: "Harikrishna Patnala" 
>>>> To: "Andrei Mikhailovsky" , "users"
>>>> Sent: Wednesday, 27 July, 2022 07:21:24
>>>> Subject: Re: Unable to login to GUI onto second management server
>> 
>>>> Hi Andrei,
>> 
>>>> If the purpose of the second management server is migration, please ignore
>>>> the previous reply.
>> 
>>>> You have the right pointer to the procedure and I hope you have followed it.
>> 
>>>> Please try to provide the following information.
>> 
>>>> 1. Is the old management server also in the 4.16.1 version?
>>>> 2. Which database.properties file have you changed to point to the new database?
>>>> 3. Can you check the database table "configuration"? What is the value for the
>>>> configuration with the name "

Re: Unable to login to GUI onto second management server

2022-08-02 Thread Andrei Mikhailovsky

It seems that my issue is closely related to the issue discussed here: 
https://github.com/apache/cloudstack/issues/3200

I will investigate this further

Andrei

- Original Message -
> From: "Andrei Mikhailovsky" 
> To: "Harikrishna Patnala" 
> Cc: "users" 
> Sent: Sunday, 31 July, 2022 22:56:31
> Subject: Re: Unable to login to GUI onto second management server

> Hi Harikrishna,
> 
> Tried the below, but still have the same issue.
> 
> Also, after trying what you've suggested, I started the old management server
> and I was still able to log in. I'm not sure the host setting has anything to
> do with login...
> 
> Andrei
> 
>> From: "Harikrishna Patnala" 
>> To: "Andrei Mikhailovsky" 
>> Cc: "users" 
>> Sent: Thursday, 28 July, 2022 09:54:39
>> Subject: Re: Unable to login to GUI onto second management server
> 
>> Hi Andrei,
> 
>> Can you please also try the below steps? I'm just making sure all pointers are
>> to the new management server only.
> 
>> 1. Keep only the new management server IP in the host configuration.
>> 2. Stop the old management server
>> 3. Restart the new management server
>> Thanks,
>> Harikrishna
> 
>> From: Andrei Mikhailovsky 
>> Sent: Wednesday, July 27, 2022 6:45 PM
>> To: Harikrishna Patnala 
>> Cc: users 
>> Subject: Re: Unable to login to GUI onto second management server
>> Hi Harikrishna,
> 
>> I have added the new management server IP address into the host configuration
>> from the gui. It now shows:
> 
>> host | The ip address of management server. This can also accept comma separated addresses. | Advanced | 192.168.169.13,192.168.169.21
> 
>> After that I've started the new management server and unfortunately, I still
>> have the same issue.
> 
>> I have also noticed that after starting the new management server, the table
>> mshost has been updated to reflect the server status as Up:
> 
>> | 4 | 115129173025114 | 1658099918669 | ais-cloudhost13.csprdc.arhont.com | 98405826-0861-11ea-a1da-8003fe80 | Up | 4.16.1.0 | 127.0.0.1 | 9090 | 2022-07-27 13:10:05 | NULL | 0 |
>> | 5 | 165004275141402 | 1658927302926 | ais-compute1.cloud.arhont.com | 0d1522a5-5d08-46af-b59c-b577aa22e9bb | Up | 4.16.1.0 | 192.168.169.21 | 9090 | 2022-07-27 13:08:32 | NULL | 0 |
> 
>> Anything else I should try?
> 
>> Thanks
> 
>> Andrei
> 
>>> From: "Harikrishna Patnala" 
>>> To: "Andrei Mikhailovsky" , "users"
>>> Sent: Wednesday, 27 July, 2022 07:21:24
>>> Subject: Re: Unable to login to GUI onto second management server
> 
>>> Hi Andrei,
> 
>>> If the purpose of the second management server is migration, please ignore
>>> the previous reply.
> 
>>> You have the right pointer to the procedure and I hope you have followed it.
> 
>>> Please try to provide the following information.
> 
>>> 1. Is the old management server also in the 4.16.1 version?
>>> 2. Which database.properties file have you changed to point to the new database?
>>> 3. Can you check the database table "configuration"? What is the value for the
>>> configuration with the name "host"? Is it your new MS host address?
>>> 4. Also, check whether the "mshost" table in the database is pointing to the new
>>> management server.
>>> Regards,
>>> Harikrishna
> 
>>> From: Andrei Mikhailovsky 
>>> Sent: Monday, July 25, 2022 7:46 PM
>>> To: users 
>>> Cc: Harikrishna Patnala 
>>> Subject: Re: Unable to login to GUI onto second management server
> 
>>> Hi Harikrishna,
> 
>>> Having read the links that you've sent, I am not sure that my issues are related.
>>> Perhaps I should have explained my current setup / intentions a bit more. My
>>> main reason for adding multiple management servers is not to provide HA / load
>>> balancing, but rather to migrate the current management server from old
>>> hardware to new hardware. I was referring to the post sent by Andrija Panic
>>> ( https://www.mail-archive.com/users@cloudstack.apache.org/msg32889.html )

Re: Unable to login to GUI onto second management server

2022-07-31 Thread Andrei Mikhailovsky

Hi Harikrishna, 

Tried the below, but still have the same issue. 

Also, after trying what you've suggested, I started the old management 
server and I was still able to log in. I'm not sure the host setting has 
anything to do with login...

Andrei 

> From: "Harikrishna Patnala" 
> To: "Andrei Mikhailovsky" 
> Cc: "users" 
> Sent: Thursday, 28 July, 2022 09:54:39
> Subject: Re: Unable to login to GUI onto second management server

> Hi Andrei,

> Can you please also try the below steps? I'm just making sure all pointers are
> to the new management server only.

> 1. Keep only the new management server IP in the host configuration.
> 2. Stop the old management server
> 3. Restart the new management server
> Thanks,
> Harikrishna

> From: Andrei Mikhailovsky 
> Sent: Wednesday, July 27, 2022 6:45 PM
> To: Harikrishna Patnala 
> Cc: users 
> Subject: Re: Unable to login to GUI onto second management server
> Hi Harikrishna,

> I have added the new management server IP address into the host configuration
> from the gui. It now shows:

> host | The ip address of management server. This can also accept comma separated addresses. | Advanced | 192.168.169.13,192.168.169.21

> After that I've started the new management server and unfortunately, I still
> have the same issue.

> I have also noticed that after starting the new management server, the table
> mshost has been updated to reflect the server status as Up:

> | 4 | 115129173025114 | 1658099918669 | ais-cloudhost13.csprdc.arhont.com | 98405826-0861-11ea-a1da-8003fe80 | Up | 4.16.1.0 | 127.0.0.1 | 9090 | 2022-07-27 13:10:05 | NULL | 0 |
> | 5 | 165004275141402 | 1658927302926 | ais-compute1.cloud.arhont.com | 0d1522a5-5d08-46af-b59c-b577aa22e9bb | Up | 4.16.1.0 | 192.168.169.21 | 9090 | 2022-07-27 13:08:32 | NULL | 0 |

> Anything else I should try?

> Thanks

> Andrei

>> From: "Harikrishna Patnala" 
>> To: "Andrei Mikhailovsky" , "users"
>> Sent: Wednesday, 27 July, 2022 07:21:24
>> Subject: Re: Unable to login to GUI onto second management server

>> Hi Andrei,

>> If the purpose of the second management server is migration, please ignore
>> the previous reply.

>> You have the right pointer to the procedure and I hope you have followed it.

>> Please try to provide the following information.

>> 1. Is the old management server also in the 4.16.1 version?
>> 2. Which database.properties file have you changed to point to the new database?
>> 3. Can you check the database table "configuration"? What is the value for the
>> configuration with the name "host"? Is it your new MS host address?
>> 4. Also, check whether the "mshost" table in the database is pointing to the new
>> management server.
>> Regards,
>> Harikrishna

>> From: Andrei Mikhailovsky 
>> Sent: Monday, July 25, 2022 7:46 PM
>> To: users 
>> Cc: Harikrishna Patnala 
>> Subject: Re: Unable to login to GUI onto second management server

>> Hi Harikrishna,

>> Having read the links that you've sent, I am not sure that my issues are related.
>> Perhaps I should have explained my current setup / intentions a bit more. My
>> main reason for adding multiple management servers is not to provide HA / load
>> balancing, but rather to migrate the current management server from old
>> hardware to new hardware. I was referring to the post sent by Andrija Panic
>> ( https://www.mail-archive.com/users@cloudstack.apache.org/msg32889.html )
>> where Andrija suggested that one should install the second management
>> server, connect it to the database, move the database to a new server and
>> change the database properties to point the new management server to the new
>> db.

>> In my tests, I installed the second management server without any
>> proxy/load balancing and tried to connect and authenticate directly to the IP
>> address of the second management server. I've tried it with the primary
>> management server switched on and off, but I still have the same issues. If I
>> am connecting directly to the new management server IP, I don't see how
>> changing the nginx proxy settings would fix my issue. Also, I have not seen
>> anything in the documentation that explicitly requires a proxy if you install
>> the
>> sec

Re: Unable to login to GUI onto second management server

2022-07-27 Thread Andrei Mikhailovsky

Hi Harikrishna, 

I have added the new management server IP address into the host configuration 
from the gui. It now shows: 

host | The ip address of management server. This can also accept comma separated addresses. | Advanced | 192.168.169.13,192.168.169.21

After that I've started the new management server and unfortunately, I still 
have the same issue. 

I have also noticed that after starting the new management server, the table 
mshost has been updated to reflect the server status as Up:

| 4 | 115129173025114 | 1658099918669 | ais-cloudhost13.csprdc.arhont.com | 98405826-0861-11ea-a1da-8003fe80 | Up | 4.16.1.0 | 127.0.0.1 | 9090 | 2022-07-27 13:10:05 | NULL | 0 |
| 5 | 165004275141402 | 1658927302926 | ais-compute1.cloud.arhont.com | 0d1522a5-5d08-46af-b59c-b577aa22e9bb | Up | 4.16.1.0 | 192.168.169.21 | 9090 | 2022-07-27 13:08:32 | NULL | 0 |

Anything else I should try? 

Thanks 

Andrei 
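Harikrishna's checks 3 and 4 (the "host" setting and the mshost table) can be scripted once the values are at hand. The Python sketch below parses mshost rows as printed by the mysql client in this thread and cross-checks them against the comma-separated host value. The column order is an assumption inferred from the rows above, fetching the data from the database is left out, and the function names are hypothetical.

```python
# Sketch: cross-check the "host" global setting against cloud.mshost rows.
# Assumption: the column order below matches the mshost rows pasted in this
# thread; adjust it if your schema differs.
import ipaddress
from typing import Dict, List

MSHOST_COLUMNS = [
    "id", "msid", "runid", "name", "uuid", "state", "version",
    "service_ip", "service_port", "last_update", "removed", "alert_count",
]


def parse_mshost_row(line: str) -> Dict[str, str]:
    """Split one '| a | b | ... |' row printed by the mysql client into a dict."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    if len(cells) != len(MSHOST_COLUMNS):
        raise ValueError(f"expected {len(MSHOST_COLUMNS)} columns, got {len(cells)}")
    return dict(zip(MSHOST_COLUMNS, cells))


def parse_host_setting(value: str) -> List[str]:
    """Normalise the comma-separated 'host' value; raises ValueError on a bad IP."""
    return [str(ipaddress.ip_address(part.strip())) for part in value.split(",") if part.strip()]


def check_new_ms(host_value: str, rows: List[Dict[str, str]], new_ip: str) -> List[str]:
    """Return human-readable problems; an empty list means both checks pass."""
    problems = []
    if str(ipaddress.ip_address(new_ip)) not in parse_host_setting(host_value):
        problems.append(f"'host' setting does not list {new_ip}")
    # Consider only rows for the new MS that have not been marked as removed.
    active = [r for r in rows if r["service_ip"] == new_ip and r["removed"] == "NULL"]
    if not active:
        problems.append(f"no non-removed mshost row with service_ip {new_ip}")
    elif any(r["state"] != "Up" for r in active):
        problems.append(f"mshost row for {new_ip} is not Up")
    return problems


if __name__ == "__main__":
    rows = [parse_mshost_row(line) for line in (
        "| 4 | 115129173025114 | 1658099918669 | ais-cloudhost13.csprdc.arhont.com | "
        "98405826-0861-11ea-a1da-8003fe80 | Up | 4.16.1.0 | 127.0.0.1 | 9090 | "
        "2022-07-27 13:10:05 | NULL | 0 |",
        "| 5 | 165004275141402 | 1658927302926 | ais-compute1.cloud.arhont.com | "
        "0d1522a5-5d08-46af-b59c-b577aa22e9bb | Up | 4.16.1.0 | 192.168.169.21 | 9090 | "
        "2022-07-27 13:08:32 | NULL | 0 |",
    )]
    print(check_new_ms("192.168.169.13,192.168.169.21", rows, "192.168.169.21"))
```

With the values posted in this message both checks pass, which fits the later report in this thread that running the dynamic-roles migration script is what made login work.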

> From: "Harikrishna Patnala" 
> To: "Andrei Mikhailovsky" , "users"
> Sent: Wednesday, 27 July, 2022 07:21:24
> Subject: Re: Unable to login to GUI onto second management server

> Hi Andrei,

> If the purpose of the second management server is migration, please ignore
> the previous reply.

> You have the right pointer to the procedure and I hope you have followed it.

> Please try to provide the following information.

> 1. Is the old management server also in the 4.16.1 version?
> 2. Which database.properties file have you changed to point to the new database?
> 3. Can you check the database table "configuration"? What is the value for the
> configuration with the name "host"? Is it your new MS host address?
> 4. Also, check whether the "mshost" table in the database is pointing to the new
> management server.
> Regards,
> Harikrishna

> From: Andrei Mikhailovsky 
> Sent: Monday, July 25, 2022 7:46 PM
> To: users 
> Cc: Harikrishna Patnala 
> Subject: Re: Unable to login to GUI onto second management server

> Hi Harikrishna,

> Having read the links that you've sent, I am not sure that my issues are related.
> Perhaps I should have explained my current setup / intentions a bit more. My
> main reason for adding multiple management servers is not to provide HA / load
> balancing, but rather to migrate the current management server from old
> hardware to new hardware. I was referring to the post sent by Andrija Panic
> ( https://www.mail-archive.com/users@cloudstack.apache.org/msg32889.html )
> where Andrija suggested that one should install the second management
> server, connect it to the database, move the database to a new server and
> change the database properties to point the new management server to the new
> db.

> In my tests, I installed the second management server without any
> proxy/load balancing and tried to connect and authenticate directly to the IP
> address of the second management server. I've tried it with the primary
> management server switched on and off, but I still have the same issues. If I
> am connecting directly to the new management server IP, I don't see how
> changing the nginx proxy settings would fix my issue. Also, I have not seen
> anything in the documentation that explicitly requires a proxy if you install
> the second management server.

> Why do you think my issue relates to CORS?

> Andrei

> - Original Message -
> > From: "Harikrishna Patnala" 
> > To: "users" 
> > Sent: Wednesday, 20 July, 2022 05:10:13
> > Subject: Re: Unable to login to GUI onto second management server

> > Hi Andrei,

> > This looks to me like a CORS issue.

> > Have you set up a load balancer for these management servers? There is a
> > section
> > http://docs.cloudstack.apache.org/en/4.16.1.0/adminguide/reliability.html#management-server-load-balancing
> > which you need to configure so that you will not face issues with HA and agents
> > later on.


> > You may need to consider setting cookies like below.

> > If you are using nginx, try with "proxy_cookie_path / "/; Secure; SameSite=None;";"
> > and a similar thing should work with haproxy too.

> > I got this reference from a previous discussion on a PR
> > https://github.com/apache/cloudstack-primate/pull/898#issuecomment-760227366

Re: Unable to login to GUI onto second management server

2022-07-27 Thread Andrei Mikhailovsky

Hi Harikrishna, 

Thank you for your prompt reply. Below are the comments/answers to your 
questions. 

> Hi Andrei,

> If the purpose of the second management server is migration, please ignore
> the previous reply.

> You have the right pointer to the procedure and I hope you have followed it.

> Please try to provide the following information.

> 1. Is the old management server also in the 4.16.1 version?

Yes, the old management server is running the same ACS version, 4.16.1. Both 
old and new management servers are running on Ubuntu servers. 

> 2. Which database.properties file have you changed to point to the new database?

I have not reached this phase of the migration plan yet, as I wanted to check 
first that the new management server is working. Both the new and old 
management servers are connecting to the same database.

> 3. Can you check the database table "configuration"? What is the value for the
> configuration with the name "host"? Is it your new MS host address?

The value of the host in configuration is the IP address of the old management 
server: 

host | The ip address of management server. This can also accept comma separated addresses. | Advanced | 192.168.169.13

> 4. Also, check whether the "mshost" table in the database is pointing to the new
> management server.

The mshost table has entries for both the old and the new management server 
(id 4 is the old MS and id 5 is the new one). The new management server is 
currently stopped, which I assume is why its state is Down:

| 4 | 115129173025114 | 1658099918669 | ais-cloudhost13.csprdc.arhont.com | 98405826-0861-11ea-a1da-8003fe80 | Up | 4.16.1.0 | 127.0.0.1 | 9090 | 2022-07-27 09:55:20 | NULL | 0 |
| 5 | 165004275141402 | 1658141860852 | ais-compute1.cloud.arhont.com | 0d1522a5-5d08-46af-b59c-b577aa22e9bb | Down | 4.16.1.0 | 192.168.169.21 | 9090 | 2022-07-18 11:18:51 | NULL | 1 |

> Regards,
> Harikrishna

> From: Andrei Mikhailovsky 
> Sent: Monday, July 25, 2022 7:46 PM
> To: users 
> Cc: Harikrishna Patnala 
> Subject: Re: Unable to login to GUI onto second management server

> Hi Harikrishna,

> Having read the links that you've sent, I am not sure that my issues are related.
> Perhaps I should have explained my current setup / intentions a bit more. My
> main reason for adding multiple management servers is not to provide HA / load
> balancing, but rather to migrate the current management server from old
> hardware to new hardware. I was referring to the post sent by Andrija Panic
> ( https://www.mail-archive.com/users@cloudstack.apache.org/msg32889.html )
> where Andrija suggested that one should install the second management
> server, connect it to the database, move the database to a new server and
> change the database properties to point the new management server to the new
> db.

> In my tests, I installed the second management server without any
> proxy/load balancing and tried to connect and authenticate directly to the IP
> address of the second management server. I've tried it with the primary
> management server switched on and off, but I still have the same issues. If I
> am connecting directly to the new management server IP, I don't see how
> changing the nginx proxy settings would fix my issue. Also, I have not seen
> anything in the documentation that explicitly requires a proxy if you install
> the second management server.

> Why do you think my issue relates to CORS?

> Andrei

> - Original Message -
> > From: "Harikrishna Patnala" 
> > To: "users" 
> > Sent: Wednesday, 20 July, 2022 05:10:13
> > Subject: Re: Unable to login to GUI onto second management server

> > Hi Andrei,

> > This looks to me like a CORS issue.

> > Have you set up a load balancer for these management servers? There is a
> > section
> > http://docs.cloudstack.apache.org/en/4.16.1.0/adminguide/reliability.html#management-server-load-balancing
> > which you need to configure so that you will not face issues with HA and agents
> > later on.


> > You may need to consider setting cookies like below.

> > If you are using nginx, try with "proxy_cookie_path / "/; Secure; SameSite=None;";"
> > and a similar thing should work with haproxy too.

> > I got this reference from a previous discussion on a PR
> > https://github.com/apache/cloudstack-primate/pull/898#issuecomment-760227366

Re: Unable to login to GUI onto second management server

2022-07-25 Thread Andrei Mikhailovsky



Hi Harikrishna,

Having read the links that you've sent, I am not sure that my issues are 
related. Perhaps I should have explained my current setup / intentions a bit 
more. My main reason for adding multiple management servers is not to 
provide HA / load balancing, but rather to migrate the current management 
server from old hardware to new hardware. I was referring to the post sent by 
Andrija Panic 
(https://www.mail-archive.com/users@cloudstack.apache.org/msg32889.html) where 
Andrija suggested that one should install the second management server, 
connect it to the database, move the database to a new server and change the 
database properties to point the new management server to the new db.

In my tests, I installed the second management server without any 
proxy/load balancing and tried to connect and authenticate directly to the IP 
address of the second management server. I've tried it with the primary 
management server switched on and off, but I still have the same issues. If I 
am connecting directly to the new management server IP, I don't see how 
changing the nginx proxy settings would fix my issue. Also, I have not seen 
anything in the documentation that explicitly requires a proxy if you install 
the second management server.

Why do you think my issue relates to CORS?

Andrei
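The final step of the plan described above, pointing the new management server at the relocated database, comes down to changing the database host entries in the management server's properties file. A minimal Python sketch, assuming the stock /etc/cloudstack/management/db.properties layout with db.cloud.host and db.usage.host keys (verify against your own file and keep a backup); the function name and the example hostname are hypothetical, and cloudstack-management has to be restarted for the change to take effect.

```python
# Sketch: rewrite the DB host entries in a CloudStack-style db.properties text.
# Assumed keys: db.cloud.host and db.usage.host; every other line is kept verbatim.
from typing import Iterable


def repoint_db_hosts(props_text: str, new_host: str,
                     keys: Iterable[str] = ("db.cloud.host", "db.usage.host")) -> str:
    wanted = set(keys)
    out = []
    for line in props_text.splitlines(keepends=True):
        key, sep, _ = line.partition("=")
        # Only touch real "key=value" lines; commented-out lines start with "#".
        if sep and key.strip() in wanted:
            ending = line[len(line.rstrip("\r\n")):]  # preserve \n or \r\n
            line = f"{key.strip()}={new_host}{ending}"
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    # Illustrative input; a real file contains many more settings.
    before = (
        "db.cloud.host=192.168.169.13\n"
        "db.cloud.port=3306\n"
        "db.usage.host=192.168.169.13\n"
    )
    print(repoint_db_hosts(before, "new-db.example.internal"), end="")
```

Rewriting only those two keys leaves ports, credentials and any encrypted values untouched.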



- Original Message -
> From: "Harikrishna Patnala" 
> To: "users" 
> Sent: Wednesday, 20 July, 2022 05:10:13
> Subject: Re: Unable to login to GUI onto second management server

> Hi Andrei,
> 
> This looks to me like a CORS issue.
> 
> Have you set up a load balancer for these management servers? There is a
> section
> http://docs.cloudstack.apache.org/en/4.16.1.0/adminguide/reliability.html#management-server-load-balancing
> which you need to configure so that you will not face issues with HA and agents
> later on.
> 
> 
> You may need to consider setting cookies like below.
> 
> If you are using nginx, try with "proxy_cookie_path / "/; Secure; SameSite=None;";"
> and a similar thing should work with haproxy too.
> 
> I got this reference from a previous discussion on a PR
> https://github.com/apache/cloudstack-primate/pull/898#issuecomment-760227366,
> please refer to it if it helps solve your problem.
> 
> 
> Regards,
> Harikrishna
> 
> From: Andrei Mikhailovsky 
> Sent: Tuesday, July 19, 2022 4:06 PM
> To: users 
> Subject: Re: Unable to login to GUI onto second management server
> 
> Bump please
> 
> - Original Message -
>> From: "Andrei Mikhailovsky" 
>> To: "users" 
>> Sent: Monday, 18 July, 2022 11:45:05
>> Subject: Unable to login to GUI onto second management server
> 
>> Hello,
>>
>> I've recently installed a second management server (ACS 4.16.1) following the
>> instructions in the Additional Management Servers section of the official
>> documentation
>> ( http://docs.cloudstack.apache.org/en/4.16.1.0/installguide/management-server/index.html ).
>> I installed the Ubuntu package of the same version as the primary management
>> server on the second server, configured the database with the
>> cloudstack-setup-databases command and then ran cloudstack-setup-management as
>> per the documentation. There were no errors in the process and the
>> cloudstack-management.service seems to have started just fine. The second ACS
>> management service connected to the same database as the primary one and the
>> login web GUI loaded just fine. The management server logs seem to show no
>> apparent errors at startup. The only exceptions I was getting in the logs were
>> from the host agents showing status Disconnected.
>>
>> So, I tried to log in (using domain and ROOT login accounts) to the web GUI of
>> the second management server, and the page just hangs after I enter the
>> credentials and press the Login button. I've tried several different browsers
>> to no avail. Supplying incorrect login credentials does produce an error,
>> though. The management server logs do not show any errors during the login
>> process. In fact, it seems that all commands produce the "is allowed to perform
>> API calls: 0.0.0.0/0,::/0" message in the logs. There are no exceptions that I
>> can see either:
>>
>> --
>>
>>
>> 2022-07-18 01:17:33,743 DEBUG [c.c.a.ApiServlet] (qtp681094281-285:ctx-0cf08734) (logid:94b27

Re: Unable to login to GUI onto second management server

2022-07-25 Thread Andrei Mikhailovsky

Thanks, I will look at the links.



- Original Message -
> From: "Harikrishna Patnala" 
> To: "users" 
> Sent: Wednesday, 20 July, 2022 05:10:13
> Subject: Re: Unable to login to GUI onto second management server

> Hi Andrei,
> 
> This looks to me like a CORS issue.
> 
> Have you set up a load balancer for these management servers? There is a
> section
> http://docs.cloudstack.apache.org/en/4.16.1.0/adminguide/reliability.html#management-server-load-balancing
> which you need to configure so that you will not face issues with HA and agents
> later on.
> 
> 
> You may need to consider setting cookies like below.
> 
> If you are using nginx, try with "proxy_cookie_path / "/; Secure; SameSite=None;";"
> and a similar thing should work with haproxy too.
> 
> I got this reference from a previous discussion on a PR
> https://github.com/apache/cloudstack-primate/pull/898#issuecomment-760227366,
> please refer to it if it helps solve your problem.
> 
> 
> Regards,
> Harikrishna
> 
> From: Andrei Mikhailovsky 
> Sent: Tuesday, July 19, 2022 4:06 PM
> To: users 
> Subject: Re: Unable to login to GUI onto second management server
> 
> Bump please
> 
> - Original Message -
>> From: "Andrei Mikhailovsky" 
>> To: "users" 
>> Sent: Monday, 18 July, 2022 11:45:05
>> Subject: Unable to login to GUI onto second management server
> 
>> Hello,
>>
>> I've recently installed a second management server (ACS 4.16.1) following the
>> instructions in the Additional Management Servers section of the official
>> documentation
>> ( http://docs.cloudstack.apache.org/en/4.16.1.0/installguide/management-server/index.html ).
>> I installed the Ubuntu package of the same version as the primary management
>> server on the second server, configured the database with the
>> cloudstack-setup-databases command and then ran cloudstack-setup-management as
>> per the documentation. There were no errors in the process and the
>> cloudstack-management.service seems to have started just fine. The second ACS
>> management service connected to the same database as the primary one and the
>> login web GUI loaded just fine. The management server logs seem to show no
>> apparent errors at startup. The only exceptions I was getting in the logs were
>> from the host agents showing status Disconnected.
>>
>> So, I tried to log in (using domain and ROOT login accounts) to the web GUI of
>> the second management server, and the page just hangs after I enter the
>> credentials and press the Login button. I've tried several different browsers
>> to no avail. Supplying incorrect login credentials does produce an error,
>> though. The management server logs do not show any errors during the login
>> process. In fact, it seems that all commands produce the "is allowed to perform
>> API calls: 0.0.0.0/0,::/0" message in the logs. There are no exceptions that I
>> can see either:
>>
>> --
>>
>>
>> 2022-07-18 01:17:33,743 DEBUG [c.c.a.ApiServlet] (qtp681094281-285:ctx-0cf08734) (logid:94b277ba) ===START=== 192.168.169.251 -- POST
>> 2022-07-18 01:17:33,750 DEBUG [c.c.u.AccountManagerImpl] (qtp681094281-285:ctx-0cf08734) (logid:94b277ba) Attempting to log in user: andrei in domain 1
>> 2022-07-18 01:17:33,752 DEBUG [o.a.c.s.a.PBKDF2UserAuthenticator] (qtp681094281-285:ctx-0cf08734) (logid:94b277ba) Retrieving user: andrei
>> 2022-07-18 01:17:33,969 DEBUG [c.c.u.AccountManagerImpl] (qtp681094281-285:ctx-0cf08734) (logid:94b277ba) CIDRs from which account 'Acct[06eedc2c-65f2-11e3-9bd1-d8d38559b2d0-admin_group] -- Account {"id": 2, "name": "admin_group", "uuid": "06eedc2c-65f2-11e3-9bd1-d8d38559b2d0"}' is allowed to perform API calls: 0.0.0.0/0,::/0
>> 2022-07-18 01:17:33,969 DEBUG [c.c.u.AccountManagerImpl] (qtp681094281-285:ctx-0cf08734) (logid:94b277ba) User: andrei in domain 1 has successfully logged in
>> 2022-07-18 01:17:34,011 INFO [c.c.a.ApiServer] (qtp681094281-285:ctx-0cf08734) (logid:94b277ba) Current user logged in under Etc/UTC timezone
>> 2022-07-18 01:17:34,011 INFO [c.c.a.ApiServer] (qtp681094281-285:ctx-0cf08734) (logid:94b277ba) Timezone offset from UTC is: 0.0
>> 202

Re: Unable to login to GUI onto second management server

2022-07-19 Thread Andrei Mikhailovsky

Bump please



- Original Message -
> From: "Andrei Mikhailovsky" 
> To: "users" 
> Sent: Monday, 18 July, 2022 11:45:05
> Subject: Unable to login to GUI onto second management server

> Hello,
> 
> I've recently installed a second management server (ACS 4.16.1) following the
> instructions in the Additional Management Servers section of the official
> documentation
> ( http://docs.cloudstack.apache.org/en/4.16.1.0/installguide/management-server/index.html ).
> I installed the Ubuntu package of the same version as the primary management
> server on the second server, configured the database with the
> cloudstack-setup-databases command and then ran cloudstack-setup-management as
> per the documentation. There were no errors in the process and the
> cloudstack-management.service seems to have started just fine. The second ACS
> management service connected to the same database as the primary one and the
> login web GUI loaded just fine. The management server logs seem to show no
> apparent errors at startup. The only exceptions I was getting in the logs were
> from the host agents showing status Disconnected.
>
> So, I tried to log in (using domain and ROOT login accounts) to the web GUI of
> the second management server, and the page just hangs after I enter the
> credentials and press the Login button. I've tried several different browsers
> to no avail. Supplying incorrect login credentials does produce an error,
> though. The management server logs do not show any errors during the login
> process. In fact, it seems that all commands produce the "is allowed to perform
> API calls: 0.0.0.0/0,::/0" message in the logs. There are no exceptions that I
> can see either:
> 
> --
> 
> 
> 2022-07-18 01:17:33,743 DEBUG [c.c.a.ApiServlet] (qtp681094281-285:ctx-0cf08734) (logid:94b277ba) ===START=== 192.168.169.251 -- POST
> 2022-07-18 01:17:33,750 DEBUG [c.c.u.AccountManagerImpl] (qtp681094281-285:ctx-0cf08734) (logid:94b277ba) Attempting to log in user: andrei in domain 1
> 2022-07-18 01:17:33,752 DEBUG [o.a.c.s.a.PBKDF2UserAuthenticator] (qtp681094281-285:ctx-0cf08734) (logid:94b277ba) Retrieving user: andrei
> 2022-07-18 01:17:33,969 DEBUG [c.c.u.AccountManagerImpl] (qtp681094281-285:ctx-0cf08734) (logid:94b277ba) CIDRs from which account 'Acct[06eedc2c-65f2-11e3-9bd1-d8d38559b2d0-admin_group] -- Account {"id": 2, "name": "admin_group", "uuid": "06eedc2c-65f2-11e3-9bd1-d8d38559b2d0"}' is allowed to perform API calls: 0.0.0.0/0,::/0
> 2022-07-18 01:17:33,969 DEBUG [c.c.u.AccountManagerImpl] (qtp681094281-285:ctx-0cf08734) (logid:94b277ba) User: andrei in domain 1 has successfully logged in
> 2022-07-18 01:17:34,011 INFO [c.c.a.ApiServer] (qtp681094281-285:ctx-0cf08734) (logid:94b277ba) Current user logged in under Etc/UTC timezone
> 2022-07-18 01:17:34,011 INFO [c.c.a.ApiServer] (qtp681094281-285:ctx-0cf08734) (logid:94b277ba) Timezone offset from UTC is: 0.0
> 2022-07-18 01:17:34,015 DEBUG [c.c.a.ApiServlet] (qtp681094281-285:ctx-0cf08734) (logid:94b277ba) ===END=== 192.168.169.251 -- POST
> 2022-07-18 01:17:34,123 DEBUG [c.c.a.ApiServlet] (qtp681094281-280:ctx-fafe166c) (logid:41d7b4d5) ===START=== 192.168.169.251 -- GET listall=true&command=listZones&response=json
> 2022-07-18 01:17:34,133 DEBUG [c.c.a.ApiServer] (qtp681094281-280:ctx-fafe166c ctx-2269cc31) (logid:41d7b4d5) CIDRs from which account 'Acct[06eedc2c-65f2-11e3-9bd1-d8d38559b2d0-admin_group] -- Account {"id": 2, "name": "admin_group", "uuid": "06eedc2c-65f2-11e3-9bd1-d8d38559b2d0"}' is allowed to perform API calls: 0.0.0.0/0,::/0
> 2022-07-18 01:17:34,133 DEBUG [c.c.a.ApiServlet] (qtp681094281-28:ctx-0906d03f) (logid:56b10f23) ===START=== 192.168.169.251 -- GET command=listApis&response=json
> 2022-07-18 01:17:34,137 DEBUG [c.c.a.ApiServlet] (qtp681094281-280:ctx-fafe166c ctx-2269cc31) (logid:41d7b4d5) ===END=== 192.168.169.251 -- GET listall=true&command=listZones&response=json
> 2022-07-18 01:17:34,144 DEBUG [c.c.a.ApiServer] (qtp681094281-28:ctx-0906d03f ctx-5a2a7dde) (logid:56b10f23) CIDRs from which account 'Acct[06eedc2c-65f2-11e3-9bd1-d8d38559b2d0-admin_group] -- Account {"id": 2,
> "name": "admin_group", "uuid": "06eedc2c-65f2-11e3-9bd1-d8d38559b2d0"}' is
> allowed to perform API calls: 0.0.0.0/0,::/0
> 2022-07-18 01:17:34,153 DEBUG [c.c.a.Api

Unable to login to GUI onto second management server

2022-07-18 Thread Andrei Mikhailovsky

Hello, 

I've recently installed a second ACS 4.16.1 management server, following the
installation instructions in the Additional Management Servers section of the
official documentation
( http://docs.cloudstack.apache.org/en/4.16.1.0/installguide/management-server/index.html ).
I installed the Ubuntu package on the second server, the same version as on the
primary management server, configured the database with the
cloudstack-setup-databases command and then ran cloudstack-setup-management, as
per the documentation. There were no errors in the process, and
cloudstack-management.service seems to have started just fine. The second ACS
management service connected to the same database as the primary one, and the
login web GUI loaded just fine. The management server logs seem to show no
errors at startup. The only exceptions I was getting in the logs were from the
host agents showing status Disconnected.

I then tried to log in (using both domain and ROOT accounts) to the web GUI of
the second management server, and the page just hangs after I enter the
credentials and press the Login button. I've tried several different browsers,
to no avail. Supplying incorrect login credentials does produce an error,
though. The management server logs do not show any errors during the login
process. In fact, every command seems to produce an "is allowed to perform API
calls: 0.0.0.0/0,::/0" message in the logs. There are no exceptions that I can
see either:

-- 


2022-07-18 01:17:33,743 DEBUG [c.c.a.ApiServlet] 
(qtp681094281-285:ctx-0cf08734) (logid:94b277ba) ===START=== 192.168.169.251 -- 
POST 
2022-07-18 01:17:33,750 DEBUG [c.c.u.AccountManagerImpl] 
(qtp681094281-285:ctx-0cf08734) (logid:94b277ba) Attempting to log in user: 
andrei in domain 1 
2022-07-18 01:17:33,752 DEBUG [o.a.c.s.a.PBKDF2UserAuthenticator] 
(qtp681094281-285:ctx-0cf08734) (logid:94b277ba) Retrieving user: andrei 
2022-07-18 01:17:33,969 DEBUG [c.c.u.AccountManagerImpl] 
(qtp681094281-285:ctx-0cf08734) (logid:94b277ba) CIDRs from which account 
'Acct[06eedc2c-65f2-11e3-9bd1-d8d38559b2d0-admin_group] -- Account {"id": 2, 
"name": "admin_group", "uuid": "06eedc2c-65f2-11e3-9bd1-d8d38559b2d0"}' is 
allowed to perform API calls: 0.0.0.0/0,::/0 
2022-07-18 01:17:33,969 DEBUG [c.c.u.AccountManagerImpl] 
(qtp681094281-285:ctx-0cf08734) (logid:94b277ba) User: andrei in domain 1 has 
successfully logged in 
2022-07-18 01:17:34,011 INFO [c.c.a.ApiServer] (qtp681094281-285:ctx-0cf08734) 
(logid:94b277ba) Current user logged in under Etc/UTC timezone 
2022-07-18 01:17:34,011 INFO [c.c.a.ApiServer] (qtp681094281-285:ctx-0cf08734) 
(logid:94b277ba) Timezone offset from UTC is: 0.0 
2022-07-18 01:17:34,015 DEBUG [c.c.a.ApiServlet] 
(qtp681094281-285:ctx-0cf08734) (logid:94b277ba) ===END=== 192.168.169.251 -- 
POST 
2022-07-18 01:17:34,123 DEBUG [c.c.a.ApiServlet] 
(qtp681094281-280:ctx-fafe166c) (logid:41d7b4d5) ===START=== 192.168.169.251 -- 
GET listall=true&command=listZones&response=json 
2022-07-18 01:17:34,133 DEBUG [c.c.a.ApiServer] (qtp681094281-280:ctx-fafe166c 
ctx-2269cc31) (logid:41d7b4d5) CIDRs from which account 
'Acct[06eedc2c-65f2-11e3-9bd1-d8d38559b2d0-admin_group] -- Account {"id": 2, 
"name": "admin_group", "uuid": "06eedc2c-65f2-11e3-9bd1-d8d38559b2d0"}' is 
allowed to perform API calls: 0.0.0.0/0,::/0 
2022-07-18 01:17:34,133 DEBUG [c.c.a.ApiServlet] (qtp681094281-28:ctx-0906d03f) 
(logid:56b10f23) ===START=== 192.168.169.251 -- GET 
command=listApis&response=json 
2022-07-18 01:17:34,137 DEBUG [c.c.a.ApiServlet] (qtp681094281-280:ctx-fafe166c 
ctx-2269cc31) (logid:41d7b4d5) ===END=== 192.168.169.251 -- GET 
listall=true&command=listZones&response=json 
2022-07-18 01:17:34,144 DEBUG [c.c.a.ApiServer] (qtp681094281-28:ctx-0906d03f 
ctx-5a2a7dde) (logid:56b10f23) CIDRs from which account 
'Acct[06eedc2c-65f2-11e3-9bd1-d8d38559b2d0-admin_group] -- Account {"id": 2, 
"name": "admin_group", "uuid": "06eedc2c-65f2-11e3-9bd1-d8d38559b2d0"}' is 
allowed to perform API calls: 0.0.0.0/0,::/0 
2022-07-18 01:17:34,153 DEBUG [c.c.a.ApiServlet] 
(qtp681094281-318:ctx-fc79b118) (logid:8a349f6d) ===START=== 192.168.169.251 -- 
GET command=cloudianIsEnabled&response=json 
2022-07-18 01:17:34,163 DEBUG [c.c.a.ApiServer] (qtp681094281-318:ctx-fc79b118 
ctx-40fd8f3a) (logid:8a349f6d) CIDRs from which account 
'Acct[06eedc2c-65f2-11e3-9bd1-d8d38559b2d0-admin_group] -- Account {"id": 2, 
"name": "admin_group", "uuid": "06eedc2c-65f2-11e3-9bd1-d8d38559b2d0"}' is 
allowed to perform API calls: 0.0.0.0/0,::/0 
2022-07-18 01:17:34,168 DEBUG [c.c.a.ApiServlet] (qtp681094281-318:ctx-fc79b118 
ctx-40fd8f3a) (logid:8a349f6d) ===END=== 192.168.169.251 -- GET 
command=cloudianIsEnabled&response=json 
2022-07-18 01:17:34,176 DEBUG [c.c.a.ApiServlet] (qtp681094281-34:ctx-20a51695) 
(logid:2436a576) ===START=== 192.168.12022-07-1
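One way to narrow down where the GUI hangs after a successful login is to look for API requests that logged ===START=== but never a matching ===END===, grouped by logid. A minimal sketch, assuming each log entry sits on a single line in the real management-server.log (the wrapping above comes from the mail client); the script and function names are hypothetical, not part of CloudStack:

```python
import re
import sys
from typing import Dict, Iterable, List

# Matches the request markers logged by ApiServlet, e.g.
# "(logid:94b277ba) ===START=== 192.168.169.251 -- POST"
MARKER = re.compile(r"\(logid:([0-9a-f]+)\) ===(START|END)===")


def unfinished_requests(lines: Iterable[str]) -> List[str]:
    """Return logids that logged ===START=== but no later ===END===, oldest first."""
    pending: Dict[str, None] = {}  # dict keeps insertion order
    for line in lines:
        match = MARKER.search(line)
        if match is None:
            continue
        logid, marker = match.groups()
        if marker == "START":
            pending[logid] = None
        else:
            pending.pop(logid, None)
    return list(pending)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 unfinished_requests.py <path to management-server.log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for logid in unfinished_requests(log):
            print(logid)
```

Any logid it prints belongs to a request that was still open when the log was read, which may be the call the GUI is waiting on; grepping the log for that logid then shows everything that request did.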

Management Server - migrating or adding additional servers

2022-05-13 Thread Andrei Mikhailovsky

Hello CloudStack users, 

I am currently running a single management server (4.16.1.0) on Ubuntu. Both the
management server and the database (mysql-server 5.7.35-0ubuntu0.18.04.1) run
on the same physical server. We've had this CloudStack setup for around 10
years. We are planning to decommission the physical server, so I am looking at
one of the following scenarios:

1. Adding a second management server with a second database server. Once this
is done, we would decommission the first management and database server.
2. Migrating the management server and the database to a new server and adding
a second management and database server for HA/redundancy.

Ideally, I would like to decommission the current server and end up with two
management servers and two database servers.

I have been looking at the ACS 4.16 documentation and I can't find any notes or
information on how to migrate the management server, or how to add a second
management server with a second database server to an existing CloudStack setup
with running networks, VMs, etc. The guides mainly focus on setting up new
management server(s).

Could someone from the community share the correct process for adding
additional database and management servers to an existing setup and/or
migrating the management/database servers to a new physical server?

Many thanks 

Andrei

Re: snapshot compression

2021-07-13 Thread Andrei Mikhailovsky

Andrija, I think qemu-img supports compressing the image while it's being
exported. I do not believe this is done during the CloudStack snapshot process,
as I am seeing very high compression ratios on our secondary storage backend,
where we've enabled btrfs's native compression. Our snapshots have been
averaging around 60% compression. If a similar level of compression could be
achieved at the export phase, it should free up network resources and possibly
speed up the snapshot process.
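For reference, qemu-img convert has a -c option that writes a compressed image when the output format is qcow2 (or the older qcow). The sketch below only assembles such a command for an RBD snapshot; the pool, image, snapshot and destination names are made up, and this is not how the CloudStack KVM agent currently calls qemu-img:

```python
import shlex
from typing import List


def build_convert_cmd(pool: str, image: str, snapshot: str, dest: str,
                      compress: bool = True) -> List[str]:
    """Build a qemu-img command that exports an RBD snapshot to a qcow2 file."""
    # qemu addresses Ceph images as rbd:<pool>/<image>@<snapshot>
    source = f"rbd:{pool}/{image}@{snapshot}"
    cmd = ["qemu-img", "convert", "-f", "raw", "-O", "qcow2"]
    if compress:
        # -c: write compressed clusters (qcow/qcow2 output formats only)
        cmd.append("-c")
    cmd += [source, dest]
    return cmd


if __name__ == "__main__":
    # Hypothetical names; prints the command instead of running it.
    print(shlex.join(build_convert_cmd(
        "cloudstack", "vol-1234", "snap-1", "/mnt/secondary/snapshots/snap-1.qcow2")))
```

Compression moves work onto the host CPU during export in exchange for less network traffic and less secondary-storage space; whether it actually shortens snapshot times would need measuring.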

Andrei

- Original Message -
> From: "Andrija Panic" 
> To: "users" 
> Sent: Monday, 12 July, 2021 17:45:25
> Subject: Re: snapshot compression

> What did you have in mind specifically?
> 
> Ceph volume snapshots export works by using qemu-img to convert raw/RBD
> volume to a qcow2 (if not mistaken, or raw) format on the Secondary Storage.
> 
> Best,
> 
> On Mon, 12 Jul 2021 at 09:35, Daan Hoogland  wrote:
> 
>> Andrei, good feature request. I don't think it is implemented (never heard
>> of it)
>>
>> On Thu, Jul 8, 2021 at 6:06 PM Andrei Mikhailovsky
>>  wrote:
>>
>> > Hello everyone,
>> >
>> > Is there a way to enable compression on the KVM+Ceph snapshot volumes when
>> > they are being copied to the secondary storage? As far as I can see, this
>> > useful feature is not enabled by default, which could unnecessarily waste
>> > both network and storage resources. It could save tons of space.
>> >
>> > Any idea on how to enable it?
>> >
>> > Cheers
>> >
>> > Andrei
>> >
>>
>>
>> --
>> Daan
>>
> 
> 
> --
> 
> Andrija Panić

snapshot compression

2021-07-08 Thread Andrei Mikhailovsky

Hello everyone, 

Is there a way to enable compression on the KVM+Ceph snapshot volumes when they
are being copied to the secondary storage? As far as I can see, this useful
feature is not enabled by default, which unnecessarily wastes both network and
storage resources. Enabling it could save tons of space.
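A rough way to put a number on the potential savings is to compress a sample of an existing snapshot file. zlib stands in here because qcow2's default compression is deflate-based, but the ratio will not match qemu-img or btrfs exactly; the helper name and its defaults are illustrative, and it only reads the first max_chunks chunks of the file:

```python
import io
import sys
import zlib


def estimate_savings(stream: io.BufferedIOBase, chunk_size: int = 4 * 1024 * 1024,
                     max_chunks: int = 64, level: int = 6) -> float:
    """Compress up to max_chunks chunks with zlib; return the fraction of bytes saved."""
    raw_bytes = 0
    compressed_bytes = 0
    for _ in range(max_chunks):
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        raw_bytes += len(chunk)
        compressed_bytes += len(zlib.compress(chunk, level))
    if raw_bytes == 0:
        return 0.0
    return 1.0 - compressed_bytes / raw_bytes


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 estimate_savings.py <snapshot file on secondary storage>
    with open(sys.argv[1], "rb") as f:
        print(f"estimated savings: {estimate_savings(f):.0%}")
```

Because only the start of the file is sampled, a snapshot whose data is unevenly distributed may compress better or worse overall than the estimate suggests.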

Any idea on how to enable it? 

Cheers 

Andrei

Re: Snapshots are not working after upgrading to 4.15.0

2021-06-17 Thread Andrei Mikhailovsky

Hi Suresh,

Here is what I answered regarding the DB tables:

The snapshots table has NULL in the removed column for all the snapshots I've
removed. The snapshot_store_ref table has no such column, but the state is
shown as Destroyed.
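The mismatch described above (removed still NULL in snapshots while the snapshot_store_ref row is Destroyed) can be listed with a join. To keep the sketch self-contained, it runs against an in-memory SQLite database with a heavily reduced, hypothetical version of the two tables and invented sample rows; the SELECT would need checking against the real MySQL cloud schema before use:

```python
import sqlite3

# Hypothetical, heavily reduced schema: only columns mentioned in this thread.
SCHEMA = """
CREATE TABLE snapshots (id INTEGER PRIMARY KEY, status TEXT, removed TEXT);
CREATE TABLE snapshot_store_ref (
    id INTEGER PRIMARY KEY,
    snapshot_id INTEGER,
    store_role TEXT,
    state TEXT
);
"""

# Snapshots whose secondary-storage entry is Destroyed but whose row was never marked removed.
STUCK_QUERY = """
SELECT s.id, s.status, r.store_role, r.state
FROM snapshots AS s
JOIN snapshot_store_ref AS r ON r.snapshot_id = s.id
WHERE s.removed IS NULL
  AND r.store_role = 'Image'
  AND r.state = 'Destroyed'
ORDER BY s.id
"""


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory database with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO snapshots (id, status, removed) VALUES (?, ?, ?)",
        [
            (1, "Destroyed", None),                   # stuck: removed never set
            (2, "Destroyed", None),                   # stuck
            (3, "BackedUp", None),                    # still in use
            (4, "Destroyed", "2021-06-16 10:00:00"),  # cleaned up properly
        ],
    )
    conn.executemany(
        "INSERT INTO snapshot_store_ref (snapshot_id, store_role, state) VALUES (?, ?, ?)",
        [(1, "Image", "Destroyed"), (2, "Image", "Destroyed"),
         (3, "Image", "Ready"), (4, "Image", "Destroyed")],
    )
    return conn


def find_stuck(conn: sqlite3.Connection) -> list:
    """Return (id, status, store_role, state) rows matching STUCK_QUERY."""
    return conn.execute(STUCK_QUERY).fetchall()


if __name__ == "__main__":
    for row in find_stuck(demo_connection()):
        print(row)
```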


I've also done some more checking inside the SSVM itself, and it looks OK:


root@s-2536-VM:/usr/local/cloud/systemvm# /usr/local/cloud/systemvm/ssvm-check.sh

First DNS server is  192.168.169.254
PING 192.168.169.254 (192.168.169.254): 56 data bytes
64 bytes from 192.168.169.254: icmp_seq=0 ttl=64 time=0.520 ms
64 bytes from 192.168.169.254: icmp_seq=1 ttl=64 time=0.294 ms
--- 192.168.169.254 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.294/0.407/0.520/0.113 ms
Good: Can ping DNS server

Good: DNS resolves cloudstack.apache.org

nfs is currently mounted
Mount point is /mnt/SecStorage/ceb27169-9a58-32ef-81b4-33b0b12e9aa2
Good: Can write to mount point

Management server is 192.168.169.13. Checking connectivity.
Good: Can connect to management server 192.168.169.13 port 8250

Good: Java process is running

Tests Complete. Look for ERROR or WARNING above.


The management server does show errors like these, without any further details:

2021-06-17 10:31:06,197 DEBUG [c.c.s.StorageManagerImpl] 
(StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to delete 
snapshot: 55183 from storage
2021-06-17 10:31:06,280 DEBUG [o.a.c.s.s.SnapshotObject] 
(StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to update 
state:com.cloud.utils.fsm.NoTransitionException: Unable to transition to a new 
state from Destroyed via DestroyRequested
2021-06-17 10:31:06,281 DEBUG [c.c.s.StorageManagerImpl] 
(StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to delete 
snapshot: 84059 from storage
2021-06-17 10:31:06,363 DEBUG [o.a.c.s.s.SnapshotObject] 
(StorageManager-Scavenger-1:ctx-b9b038de) (logid:d96d09c4) Failed to update 
state:com.cloud.utils.fsm.NoTransitionException: Unable to transition to a new 
state from Destroyed via DestroyRequested
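The NoTransitionException above comes from a state machine (com.cloud.utils.fsm, per the exception class in the log) rejecting an event that has no transition out of the current state. The toy model below only illustrates that mechanism; its states, events and transition table are simplified assumptions and do not reproduce CloudStack's actual tables:

```python
from typing import Dict, Tuple


class NoTransitionError(Exception):
    """Raised when an event has no transition out of the current state."""


# Hypothetical, simplified transition table: (current state, event) -> next state.
# CloudStack's real state machines have more states and events than this.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("Ready", "DestroyRequested"): "Destroying",
    ("Destroying", "OperationSucceeded"): "Destroyed",
    ("Destroying", "OperationFailed"): "Destroying",
}


def next_state(state: str, event: str) -> str:
    """Return the state reached by applying event to state, or raise NoTransitionError."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise NoTransitionError(
            f"Unable to transition to a new state from {state} via {event}"
        ) from None


if __name__ == "__main__":
    print(next_state("Ready", "DestroyRequested"))  # Destroying
    try:
        next_state("Destroyed", "DestroyRequested")  # Destroyed has no outgoing edges here
    except NoTransitionError as err:
        print(err)
```

Under this reading, the scavenger keeps requesting a destroy for store entries that are already Destroyed, so every pass fails the same way, which would match the repeated "Failed to delete snapshot" lines. That is an interpretation of the log, not a confirmed diagnosis.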


Regarding bug 4797: I can't really comment, as it contains very few technical
details (no management server log errors, etc.). But essentially, at a high
level, the snapshots are not deleted from the backend in my case, just as in
bug 4797.


TBH, I am very surprised that a bug in such an important ACS function slipped
through testing for the 4.15.0 release and that, despite being discovered over
3 months ago, it hasn't been scheduled for a fix in the 4.15.1 bug-fix release.
Does that sound right to you? I think this issue should be revisited and fixed,
as it will fill up the secondary storage and ultimately cause all sorts of
issues with creating snapshots.

Andrei


- Original Message -
> From: "Suresh Anaparti" 
> To: "users" 
> Sent: Thursday, 17 June, 2021 11:16:59
> Subject: Re: Snapshots are not working after upgrading to 4.15.0

> Hi Andrei,
> 
> Have you checked the 'status' and 'removed' timestamp in snapshots table, and
> 'state' in snapshot_store_ref table for these snapshots.
> 
> Similar issue logged (by Ed, as mentioned in his email) here:
> https://github.com/apache/cloudstack/issues/4797. Is it the same issue?
> 
> Regards,
> Suresh
> 
>On 17/06/21, 2:18 PM, "Andrei Mikhailovsky"  wrote:
> 
>Hi Suresh, please see my answers to your questions below.
> 
>
> 
> 
> - Original Message -
>> From: "Suresh Anaparti" 
>> To: "users" 
>> Sent: Thursday, 17 June, 2021 06:36:27
>> Subject: Re: Snapshots are not working after upgrading to 4.15.0
> 
>> Hi Andrei,
>> 
>> Can you check if the storage garbage collector is enabled or not in your env
>> (specified using the global setting 'storage.cleanup.enabled'). If it is
>> enabled, check the interval & delay setting: 'storage.cleanup.interval' and
>> 'storage.cleanup.delay', and see the logs to confirm cleanup is performed or
>> not.
> 
>storage.cleanup.enabled is true
>storage.cleanup.interval is 3600
>storage.cleanup.delay is 360086400
> 
>> 
>> Also, check the snapshot status / state in snapshots & snapshot_store_ref tables
>> for the snapshots that are not deleted during the cleanup. Is 'removed'
>    > timesta

Re: Snapshots are not working after upgrading to 4.15.0

2021-06-17 Thread Andrei Mikhailovsky

Hi Suresh, please see my answers to your questions below.

- Original Message -
> From: "Suresh Anaparti" 
> To: "users" 
> Sent: Thursday, 17 June, 2021 06:36:27
> Subject: Re: Snapshots are not working after upgrading to 4.15.0

> Hi Andrei,
> 
> Can you check if the storage garbage collector is enabled or not in your env
> (specified using the global setting 'storage.cleanup.enabled'). If it is
> enabled, check the interval & delay setting: 'storage.cleanup.interval' and
> 'storage.cleanup.delay', and see the logs to confirm cleanup is performed or
> not.

storage.cleanup.enabled is true
storage.cleanup.interval is 3600
storage.cleanup.delay is 360086400

> 
> Also, check the snapshot status / state in snapshots & snapshot_store_ref tables
> for the snapshots that are not deleted during the cleanup. Is 'removed'
> timestamp set for them in snapshots table?
> 


The snapshots table has NULL in the removed column for all the snapshots I've
removed. The snapshot_store_ref table has no such column, but the state is
shown as Destroyed.




> Regards,
> Suresh
> 
>On 16/06/21, 9:46 PM, "Andrei Mikhailovsky"  wrote:
> 
>Hello,
> 
>I've done some more investigation and, indeed, the snapshots were not taken
>because the secondary storage was over 90% used. I started cleaning up some
>of the older volumes and noticed another problem: after removing snapshots,
>they do not seem to be removed from the secondary storage. I removed all of
>the snapshots more than 24 hours ago and it looks like the disk space hasn't
>been freed up at all.
> 
>It looks like there are issues with the snapshot function after all.
> 
>Andrei
> 
> 
> 
>
> 
> 
> - Original Message -
>> From: "Harikrishna Patnala" 
>> To: "users" 
>> Sent: Tuesday, 8 June, 2021 03:33:57
>> Subject: Re: Snapshots are not working after upgrading to 4.15.0
> 
>> Hi Andrei,
>> 
>> Can you check the following things and let us know?
>> 
>> 
>>  1.  Can you try creating a new volume and then create snapshot of that, to check
>>  if this an issue with old entries
>>  2.  For the snapshots which are failing can you check if you are seeing any
>>  error messages like this "Can't find an image storage in zone with less than".
>>  This is to check if secondary storage free space check failed.
>>  3.  For the snapshots which are failing and if it is delta snapshot can you
>>  check if its parent's snapshot entry exists in "snapshot_store_ref" table with
>>  'parent_snapshot_id' of the current snapshot with 'store_role' "Image". This is
>>  to find the secondary storage where the parent snapshot backup is located.
>> 
>> Regards,
>> Harikrishna
>> 
>> From: Andrei Mikhailovsky 
>> Sent: Monday, June 7, 2021 7:00 PM
>> To: users 
>> Subject: Snapshots are not working after upgrading to 4.15.0
>> 
>> Hello everyone,
>> 
>> I have been having an issue with volume snapshots since I upgraded to 4.15.0.
>> None of the volumes are being snapshotted, regardless of whether the snapshot
>> is initiated manually or by the schedule. The strange thing is that if I take
>> a snapshot manually, the GUI shows a Success status, but Storage > Snapshots
>> shows an Error status. Here is what I see in the management server logs:
>> 
>> 2021-06-07 13:55:20,022 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
>> (Work-Job-Executor-81:ctx-08dd4222 job-86141/job-86143) (logid:be34ce01) Done
>> executing com.cloud.vm.VmWorkTakeVolumeSnapshot for job-86143
>> 2021-06-07 13:55:20,024 INFO [o.a.c.f.j.i.AsyncJobMonitor]
>> (Work-Job-Executor-81:ctx-08dd4222 job-86141/job-86143) (logid:be34ce01) Remove
>> job-86143 from job monitoring
>> 2021-06-07 13:55:20,094 DEBUG [o.a.c.s.s.SnapshotServiceImpl]
>> (BackupSnapshotTask-3:ctx-744796da) (logid:607dbb0e) Failed to copy snapshot
>> com.cloud.utils.exception.CloudRuntimeException: can not find an image stores
>> at
>> org.apache.cloudstack.storage.snapshot.SnapshotServiceImpl.backupSnapshot(SnapshotServiceImpl.java:271)
>> at
> org.apache.cloudstack.storage.snapshot.DefaultSnapshotStrat

Re: Snapshots are not working after upgrading to 4.15.0

2021-06-16 Thread Andrei Mikhailovsky

Hello,

I've done some more investigation and, indeed, the snapshots were not taken
because the secondary storage was over 90% used. I started cleaning up some
of the older volumes and noticed another problem: after removing snapshots,
they do not seem to be removed from the secondary storage. I removed all of
the snapshots more than 24 hours ago and it looks like the disk space hasn't
been freed up at all.

It looks like there are issues with the snapshot function after all.
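The 90% figure can be checked directly against the mounted secondary storage with shutil.disk_usage. A minimal sketch; the threshold constant comes from this thread rather than from CloudStack's configuration, and the helper names are hypothetical:

```python
import shutil
import sys

# Threshold quoted in this thread; not read from CloudStack's configuration.
THRESHOLD = 0.90


def used_fraction(total: int, used: int) -> float:
    """Fraction of the filesystem in use (0.0 if total is zero)."""
    return used / total if total else 0.0


def below_threshold(path: str, threshold: float = THRESHOLD) -> bool:
    """True if the filesystem holding path is used below threshold."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return used_fraction(usage.total, usage.used) < threshold


if __name__ == "__main__" and len(sys.argv) > 1:
    mount = sys.argv[1]  # e.g. the secondary storage mount point
    usage = shutil.disk_usage(mount)
    frac = used_fraction(usage.total, usage.used)
    print(f"{mount}: {frac:.1%} used, below {THRESHOLD:.0%}: {frac < THRESHOLD}")
```

With the 90.6% reported later in this thread, used_fraction comes out at about 0.906 and the check fails. Note that df's Use% is computed as used / (used + available), which can read slightly higher than used / total on filesystems with reserved blocks, and CloudStack's own capacity figures may differ from both.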

Andrei



- Original Message -
> From: "Harikrishna Patnala" 
> To: "users" 
> Sent: Tuesday, 8 June, 2021 03:33:57
> Subject: Re: Snapshots are not working after upgrading to 4.15.0

> Hi Andrei,
> 
> Can you check the following things and let us know?
> 
> 
>  1.  Can you try creating a new volume and then create snapshot of that, to check
>  if this an issue with old entries
>  2.  For the snapshots which are failing can you check if you are seeing any
>  error messages like this "Can't find an image storage in zone with less than".
>  This is to check if secondary storage free space check failed.
>  3.  For the snapshots which are failing and if it is delta snapshot can you
>  check if its parent's snapshot entry exists in "snapshot_store_ref" table with
>  'parent_snapshot_id' of the current snapshot with 'store_role' "Image". This is
>  to find the secondary storage where the parent snapshot backup is located.
> 
> Regards,
> Harikrishna
> 
> From: Andrei Mikhailovsky 
> Sent: Monday, June 7, 2021 7:00 PM
> To: users 
> Subject: Snapshots are not working after upgrading to 4.15.0
> 
> Hello everyone,
> 
> I have been having an issue with volume snapshots since I upgraded to 4.15.0.
> None of the volumes are being snapshotted, regardless of whether the snapshot
> is initiated manually or by the schedule. The strange thing is that if I take
> a snapshot manually, the GUI shows a Success status, but Storage > Snapshots
> shows an Error status. Here is what I see in the management server logs:
> 
> 2021-06-07 13:55:20,022 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
> (Work-Job-Executor-81:ctx-08dd4222 job-86141/job-86143) (logid:be34ce01) Done
> executing com.cloud.vm.VmWorkTakeVolumeSnapshot for job-86143
> 2021-06-07 13:55:20,024 INFO [o.a.c.f.j.i.AsyncJobMonitor]
> (Work-Job-Executor-81:ctx-08dd4222 job-86141/job-86143) (logid:be34ce01) Remove
> job-86143 from job monitoring
> 2021-06-07 13:55:20,094 DEBUG [o.a.c.s.s.SnapshotServiceImpl]
> (BackupSnapshotTask-3:ctx-744796da) (logid:607dbb0e) Failed to copy snapshot
> com.cloud.utils.exception.CloudRuntimeException: can not find an image stores
> at
> org.apache.cloudstack.storage.snapshot.SnapshotServiceImpl.backupSnapshot(SnapshotServiceImpl.java:271)
> at
> org.apache.cloudstack.storage.snapshot.DefaultSnapshotStrategy.backupSnapshot(DefaultSnapshotStrategy.java:171)
> at
> com.cloud.storage.snapshot.SnapshotManagerImpl$BackupSnapshotTask.runInContext(SnapshotManagerImpl.java:1238)
> at
> org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48)
> at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)
> at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102)
> at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52)
> at
> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:45)
> at
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at
> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:829)
> 2021-06-07 13:55:20,152 DEBUG [c.c.s.s.SnapshotManagerImpl]
> (BackupSnapshotTask-3:ctx-744796da) (logid:607dbb0e) Backing up of snapshot
> failed, for snapshot with ID 53531, left with 2 more attempts
> 
> 
> I've checked, and the secondary storage is configured and visible in the GUI. I
> can also mount it manually from the management server and from a couple of host
> servers that I've tested. In addition, I can successfully upload an ISO image,
> which registers just fine, and I can create new VMs using the newly uploaded
> ISO image.
> 
> I've had no such problems with ACS 4.13.x, so the issue seems to have been
> introduced by the upgrade to 4.15.0.
> 
> Could you please let me know how I can fix the issue?
> 
> Cheers
> 
> andrei

Re: Snapshots are not working after upgrading to 4.15.0

2021-06-14 Thread Andrei Mikhailovsky

Thanks Slavka,

I will attempt to clean up some old snaps and see if the backups start working 
again.

Andrei

- Original Message -
> From: "Slavka Peleva" 
> To: "users" 
> Sent: Monday, 14 June, 2021 13:28:01
> Subject: Re: Snapshots are not working after upgrading to 4.15.0

> Hi, Andrei,
> 
> After a quick check of the code, I guess that the backup of the snapshot
> fails because the secondary storage space should be less than 90%, and you
> pointed that yours is 90.6% full.
> 
> Regards,
> Slavka
> 
> On Mon, Jun 14, 2021 at 1:10 PM Andrei Mikhailovsky
>  wrote:
> 
>> Hi Harikrishna,
>>
>> I've done some more testing just now. Please see my answers to your
>> questions/comments below:
>>
>>
>>
>> - Original Message -
>> > From: "Harikrishna Patnala" 
>> > To: "users" 
>> > Sent: Tuesday, 8 June, 2021 03:33:57
>> > Subject: Re: Snapshots are not working after upgrading to 4.15.0
>>
>> > Hi Andrei,
>> >
>> > Can you check the following things and let us know?
>> >
>> >
>> >  1.  Can you try creating a new volume and then create snapshot of that, to check
>> >  if this an issue with old entries
>>
>> AM: I've tested with a newly created image attached to a VM. The same
>> problem happens with new images as well as old ones. Please see the link
>> below for the management server log.
>>
>>
>> https://zerobin.net/?5781e4b65d9e3605#+GtIC7JBtp70Q0cw65cypJDiSyba/r/JldRsAyOI8l4=
>>
>>
>> >  2.  For the snapshots which are failing can you check if you are seeing any
>> >  error messages like this "Can't find an image storage in zone with less than".
>> >  This is to check if secondary storage free space check failed.
>>
>>
>> AM: I do not see any such message in the logs. I tried grepping a couple of
>> weeks' worth of logs and nothing comes up. Having said this, the secondary
>> storage is about 90.6% full.
>>
>>
>> >  3.  For the snapshots which are failing and if it is delta snapshot can you
>> >  check if its parent's snapshot entry exists in "snapshot_store_ref" table with
>> >  'parent_snapshot_id' of the current snapshot with 'store_role' "Image". This is
>> >  to find the secondary storage where the parent snapshot backup is located.
>> >
>>
>> AM: all snapshots are failing, not just a select few. Some volumes are
>> brand new, as I've indicated above; others do have previous snapshots. I
>> only have a single secondary storage, so all snapshots should be in one place.
>>
>>
>>
>>
>> > Regards,
>> > Harikrishna
>> > 
>> > From: Andrei Mikhailovsky 
>> > Sent: Monday, June 7, 2021 7:00 PM
>> > To: users 
>> > Subject: Snapshots are not working after upgrading to 4.15.0
>> >
>> > Hello everyone,
>> >
>> > I have been having an issue with volume snapshots since I upgraded to 4.15.0.
>> > None of the volumes are being snapshotted, regardless of whether the snapshot
>> > is initiated manually or by the schedule. The strange thing is that if I take
>> > a snapshot manually, the GUI shows a Success status, but Storage > Snapshots
>> > shows an Error status. Here is what I see in the management server logs:
>> >
>> > 2021-06-07 13:55:20,022 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
>> > (Work-Job-Executor-81:ctx-08dd4222 job-86141/job-86143) (logid:be34ce01) Done
>> > executing com.cloud.vm.VmWorkTakeVolumeSnapshot for job-86143
>> > 2021-06-07 13:55:20,024 INFO [o.a.c.f.j.i.AsyncJobMonitor]
>> > (Work-Job-Executor-81:ctx-08dd4222 job-86141/job-86143) (logid:be34ce01) Remove
>> > job-86143 from job monitoring
>> > 2021-06-07 13:55:20,094 DEBUG [o.a.c.s.s.SnapshotServiceImpl]
>> > (BackupSnapshotTask-3:ctx-744796da) (logid:607dbb0e) Failed to copy snapshot
>> > com.cloud.utils.exception.CloudRuntimeException: can not find an image stores
>> > at
>> > org.apache.cloudstack.storage.snapshot.SnapshotServiceImpl.backupSnapshot(SnapshotServiceImpl.java:271)
>> > at
>> > org.apache.cloudstack.storage.snapshot.DefaultSnapshotStrategy.backupSnapshot(DefaultSnapshotStrategy.java:171)
>> > at
>>

Re: Snapshots are not working after upgrading to 4.15.0

2021-06-14 Thread Andrei Mikhailovsky

Hi Harikrishna,

I've done some more testing just now. Please see my answers to your
questions/comments below:



- Original Message -
> From: "Harikrishna Patnala" 
> To: "users" 
> Sent: Tuesday, 8 June, 2021 03:33:57
> Subject: Re: Snapshots are not working after upgrading to 4.15.0

> Hi Andrei,
> 
> Can you check the following things and let us know?
> 
> 
>  1.  Can you try creating a new volume and then create snapshot of that, to check
>  if this an issue with old entries

AM: I've tested with a newly created image attached to a VM. The same problem
happens with new images as well as old ones. Please see the link below for the
management server log.

https://zerobin.net/?5781e4b65d9e3605#+GtIC7JBtp70Q0cw65cypJDiSyba/r/JldRsAyOI8l4=


>  2.  For the snapshots which are failing can you check if you are seeing any
>  error messages like this "Can't find an image storage in zone with less than".
>  This is to check if secondary storage free space check failed.


AM: I do not see any such message in the logs. I tried grepping a couple of
weeks' worth of logs and nothing comes up. Having said this, the secondary
storage is about 90.6% full.
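Rotated management server logs are often gzip-compressed, which a plain grep skips. A small sketch that searches both plain and .gz files for the message Harikrishna quoted; the function names are hypothetical, and the file pattern in the usage comment assumes the default package log location, which may differ on your install:

```python
import glob
import gzip
import sys
from typing import Iterable, Iterator, Tuple

NEEDLE = "Can't find an image storage in zone with less than"


def matching_lines(lines: Iterable[str], needle: str = NEEDLE) -> Iterator[Tuple[int, str]]:
    """Yield (line number, line) for each line containing needle."""
    for lineno, line in enumerate(lines, start=1):
        if needle in line:
            yield lineno, line.rstrip("\n")


def search_logs(paths: Iterable[str], needle: str = NEEDLE) -> Iterator[Tuple[str, int, str]]:
    """Search plain and gzip-compressed log files, in sorted path order."""
    for path in sorted(paths):
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt", encoding="utf-8", errors="replace") as log:
            for lineno, line in matching_lines(log, needle):
                yield path, lineno, line


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python3 search_logs.py '/var/log/cloudstack/management/management-server.log*'
    for path, lineno, line in search_logs(glob.glob(sys.argv[1])):
        print(f"{path}:{lineno}:{line}")
```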


>  3.  For the snapshots which are failing and if it is delta snapshot can you
>  check if its parent's snapshot entry exists in "snapshot_store_ref" table with
>  'parent_snapshot_id' of the current snapshot with 'store_role' "Image". This is
>  to find the secondary storage where the parent snapshot backup is located.
> 

AM: all snapshots are failing, not just a select few. Some volumes are brand
new, as I've indicated above; others do have previous snapshots. I only have a
single secondary storage, so all snapshots should be in one place.




> Regards,
> Harikrishna
> 
> From: Andrei Mikhailovsky 
> Sent: Monday, June 7, 2021 7:00 PM
> To: users 
> Subject: Snapshots are not working after upgrading to 4.15.0
> 
> Hello everyone,
> 
> I have been having an issue with volume snapshots since I upgraded to 4.15.0.
> None of the volumes are being snapshotted, regardless of whether the snapshot
> is initiated manually or by the schedule. The strange thing is that if I take
> a snapshot manually, the GUI shows a Success status, but Storage > Snapshots
> shows an Error status. Here is what I see in the management server logs:
> 
> 2021-06-07 13:55:20,022 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
> (Work-Job-Executor-81:ctx-08dd4222 job-86141/job-86143) (logid:be34ce01) Done
> executing com.cloud.vm.VmWorkTakeVolumeSnapshot for job-86143
> 2021-06-07 13:55:20,024 INFO [o.a.c.f.j.i.AsyncJobMonitor]
> (Work-Job-Executor-81:ctx-08dd4222 job-86141/job-86143) (logid:be34ce01) Remove
> job-86143 from job monitoring
> 2021-06-07 13:55:20,094 DEBUG [o.a.c.s.s.SnapshotServiceImpl]
> (BackupSnapshotTask-3:ctx-744796da) (logid:607dbb0e) Failed to copy snapshot
> com.cloud.utils.exception.CloudRuntimeException: can not find an image stores
> at
> org.apache.cloudstack.storage.snapshot.SnapshotServiceImpl.backupSnapshot(SnapshotServiceImpl.java:271)
> at
> org.apache.cloudstack.storage.snapshot.DefaultSnapshotStrategy.backupSnapshot(DefaultSnapshotStrategy.java:171)
> at
> com.cloud.storage.snapshot.SnapshotManagerImpl$BackupSnapshotTask.runInContext(SnapshotManagerImpl.java:1238)
> at
> org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48)
> at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)
> at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102)
> at
> org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52)
> at
> org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:45)
> at
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at
> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:829)
> 2021-06-07 13:55:20,152 DEBUG [c.c.s.s.SnapshotManagerImpl]
> (BackupSnapshotTask-3:ctx-744796da) (logid:607dbb0e) Backing up of snapshot
> failed, for snapshot with ID 53531, left with 2 more attempts
> 
> 
>

Re: Snapshots are not working after upgrading to 4.15.0

2021-06-14 Thread Andrei Mikhailovsky

Oh, I should have mentioned that in the original post. I am using the KVM
hypervisor with Ceph/RBD as the primary storage and NFS as the secondary storage.

Andrei 

> From: "Andrija Panic" 
> To: "users" , "andrei" 
> Sent: Tuesday, 8 June, 2021 21:38:28
> Subject: Re: Snapshots are not working after upgrading to 4.15.0

> @Andrei Mikhailovsky can you advise which hypervisor (and version) you are
> using and what primary storage? Let's see if the same is true in 4.15.1
> (voting is happening right now; feel free to test and vote as well, please)

> Best,

> On Tue, 8 Jun 2021 at 14:25, Andrei Mikhailovsky wrote:

>> Thanks for the suggestions, Harikrishna. I will check it and get back to you.

>> Andrei

>> - Original Message -
>> > From: "Harikrishna Patnala" <harikrishna.patn...@shapeblue.com>
>> > To: "users" <users@cloudstack.apache.org>
>> > Sent: Tuesday, 8 June, 2021 03:33:57
>> > Subject: Re: Snapshots are not working after upgrading to 4.15.0

>> > Hi Andrei,

>> > Can you check the following things and let us know?


>> > 1. Can you try creating a new volume and then taking a snapshot of it, to
>> > check whether this is an issue with old entries?
>> > 2. For the snapshots that are failing, can you check whether you see any
>> > error messages like "Can't find an image storage in zone with less than"?
>> > This is to check whether the secondary storage free-space check failed.
>> > 3. For the snapshots that are failing, if it is a delta snapshot, can you
>> > check whether an entry for its parent snapshot exists in the
>> > "snapshot_store_ref" table, i.e. a row for the current snapshot's
>> > 'parent_snapshot_id' with 'store_role' "Image"? This is to find the
>> > secondary storage where the parent snapshot backup is located.

>> > Regards,
>> > Harikrishna
>> > 
>> > From: Andrei Mikhailovsky 
>> > Sent: Monday, June 7, 2021 7:00 PM
>> > To: users <users@cloudstack.apache.org>
>> > Subject: Snapshots are not working after upgrading to 4.15.0

>> > Hello everyone,

>> > I am having an issue with volume snapshots since I upgraded to 4.15.0. None
>> > of the volumes are being snapshotted, regardless of whether the snapshot is
>> > initiated manually or from the schedule. The strange thing is that if I take
>> > the snapshot manually, the GUI shows a Success status, but Storage > Snapshots
>> > shows an Error status. Here is what I see in the management server logs:

>> > 2021-06-07 13:55:20,022 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-81:ctx-08dd4222 job-86141/job-86143) (logid:be34ce01) Done executing com.cloud.vm.VmWorkTakeVolumeSnapshot for job-86143
>> > 2021-06-07 13:55:20,024 INFO [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-81:ctx-08dd4222 job-86141/job-86143) (logid:be34ce01) Remove job-86143 from job monitoring
>> > 2021-06-07 13:55:20,094 DEBUG [o.a.c.s.s.SnapshotServiceImpl] (BackupSnapshotTask-3:ctx-744796da) (logid:607dbb0e) Failed to copy snapshot
>> > com.cloud.utils.exception.CloudRuntimeException: can not find an image stores
>> > at org.apache.cloudstack.storage.snapshot.SnapshotServiceImpl.backupSnapshot(SnapshotServiceImpl.java:271)
>> > at org.apache.cloudstack.storage.snapshot.DefaultSnapshotStrategy.backupSnapshot(DefaultSnapshotStrategy.java:171)
>> > at com.cloud.storage.snapshot.SnapshotManagerImpl$BackupSnapshotTask.runInContext(SnapshotManagerImpl.java:1238)
>> > at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48)
>> > at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)
>> > at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102)
>> > at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52)
>> > at org.ap

Re: Snapshots are not working after upgrading to 4.15.0

2021-06-08 Thread Andrei Mikhailovsky

Thanks for the suggestions, Harikrishna. I will check it and get back to you.

Andrei



- Original Message -
> From: "Harikrishna Patnala" 
> To: "users" 
> Sent: Tuesday, 8 June, 2021 03:33:57
> Subject: Re: Snapshots are not working after upgrading to 4.15.0

> Hi Andrei,
> 
> Can you check the following things and let us know?
> 
> 
>  1.  Can you try creating a new volume and then taking a snapshot of it, to
>  check whether this is an issue with old entries?
>  2.  For the snapshots that are failing, can you check whether you see any
>  error messages like "Can't find an image storage in zone with less than"?
>  This is to check whether the secondary storage free-space check failed.
>  3.  For the snapshots that are failing, if it is a delta snapshot, can you
>  check whether an entry for its parent snapshot exists in the
>  "snapshot_store_ref" table, i.e. a row for the current snapshot's
>  'parent_snapshot_id' with 'store_role' "Image"? This is to find the
>  secondary storage where the parent snapshot backup is located.
> 
> Regards,
> Harikrishna
> 
> From: Andrei Mikhailovsky 
> Sent: Monday, June 7, 2021 7:00 PM
> To: users 
> Subject: Snapshots are not working after upgrading to 4.15.0
> 
> Hello everyone,
> 
> I am having an issue with volume snapshots since I upgraded to 4.15.0. None
> of the volumes are being snapshotted, regardless of whether the snapshot is
> initiated manually or from the schedule. The strange thing is that if I take
> the snapshot manually, the GUI shows a Success status, but Storage > Snapshots
> shows an Error status. Here is what I see in the management server logs:
> 
> 2021-06-07 13:55:20,022 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-81:ctx-08dd4222 job-86141/job-86143) (logid:be34ce01) Done executing com.cloud.vm.VmWorkTakeVolumeSnapshot for job-86143
> 2021-06-07 13:55:20,024 INFO [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-81:ctx-08dd4222 job-86141/job-86143) (logid:be34ce01) Remove job-86143 from job monitoring
> 2021-06-07 13:55:20,094 DEBUG [o.a.c.s.s.SnapshotServiceImpl] (BackupSnapshotTask-3:ctx-744796da) (logid:607dbb0e) Failed to copy snapshot
> com.cloud.utils.exception.CloudRuntimeException: can not find an image stores
> at org.apache.cloudstack.storage.snapshot.SnapshotServiceImpl.backupSnapshot(SnapshotServiceImpl.java:271)
> at org.apache.cloudstack.storage.snapshot.DefaultSnapshotStrategy.backupSnapshot(DefaultSnapshotStrategy.java:171)
> at com.cloud.storage.snapshot.SnapshotManagerImpl$BackupSnapshotTask.runInContext(SnapshotManagerImpl.java:1238)
> at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48)
> at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)
> at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102)
> at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52)
> at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:45)
> at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:829)
> 2021-06-07 13:55:20,152 DEBUG [c.c.s.s.SnapshotManagerImpl] (BackupSnapshotTask-3:ctx-744796da) (logid:607dbb0e) Backing up of snapshot failed, for snapshot with ID 53531, left with 2 more attempts
> 
> 
> I've checked, and the secondary storage is configured and visible in the GUI.
> I can also mount it manually from the management server and from a couple of
> host servers that I've tested. In addition, I can successfully upload an ISO
> image; it registers just fine and I can create new VMs from the newly uploaded
> ISO image.
> 
> I had no such problems with ACS 4.13.x, so the issue seems to have been
> introduced by the upgrade to 4.15.0.
> 
> Could you please let me know how I can fix the issue?
> 
> Cheers
> 
> andrei
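Harikrishna's third check can be expressed as two lookups against snapshot_store_ref: first read the failing snapshot's parent_snapshot_id, then look for a row for that parent with store_role 'Image'. The sketch below is a hedged illustration only. It runs against an in-memory SQLite table reduced to the columns named in this thread plus id, store_id and snapshot_id (an assumption; the real MySQL table in the cloud database has more columns), and it treats a parent_snapshot_id of NULL or 0 as "no parent", which is also an assumption. Against the real database, the same SELECTs would need the %s placeholder style of a MySQL driver instead of ?.

```python
import sqlite3

# Hypothetical, reduced copy of CloudStack's snapshot_store_ref table: only the
# columns mentioned in this thread plus id/store_id/snapshot_id. The real
# MySQL table in the `cloud` database has more columns.
SCHEMA = """
CREATE TABLE snapshot_store_ref (
    id INTEGER PRIMARY KEY,
    store_id INTEGER,
    snapshot_id INTEGER,
    parent_snapshot_id INTEGER,
    store_role TEXT
)
"""


def parent_has_image_copy(conn, snapshot_id):
    """Check whether a delta snapshot's parent has a secondary-storage row.

    Returns (parent_id, found). parent_id is None when no row of the snapshot
    names a parent (assumed here to mean NULL or 0, i.e. a full snapshot);
    found is True when the parent has a row with store_role 'Image'.
    """
    row = conn.execute(
        "SELECT parent_snapshot_id FROM snapshot_store_ref "
        "WHERE snapshot_id = ? "
        "AND parent_snapshot_id IS NOT NULL AND parent_snapshot_id <> 0",
        (snapshot_id,),
    ).fetchone()
    if row is None:
        return None, False
    parent_id = row[0]
    # Look for the parent's backup on secondary (Image) storage.
    hit = conn.execute(
        "SELECT store_id FROM snapshot_store_ref "
        "WHERE snapshot_id = ? AND store_role = 'Image'",
        (parent_id,),
    ).fetchone()
    return parent_id, hit is not None
```

For the failure in the log, the call would presumably be parent_has_image_copy(conn, 53531), assuming "ID 53531" is the internal snapshot ID; a result of (parent_id, False) would point at the missing parent backup that Harikrishna describes.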

Snapshots are not working after upgrading to 4.15.0

2021-06-07 Thread Andrei Mikhailovsky

Hello everyone, 

I am having an issue with volume snapshots since I upgraded to 4.15.0. None of
the volumes are being snapshotted, regardless of whether the snapshot is
initiated manually or from the schedule. The strange thing is that if I take the
snapshot manually, the GUI shows a Success status, but Storage > Snapshots shows
an Error status. Here is what I see in the management server logs:

2021-06-07 13:55:20,022 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-81:ctx-08dd4222 job-86141/job-86143) (logid:be34ce01) Done executing com.cloud.vm.VmWorkTakeVolumeSnapshot for job-86143
2021-06-07 13:55:20,024 INFO [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-81:ctx-08dd4222 job-86141/job-86143) (logid:be34ce01) Remove job-86143 from job monitoring
2021-06-07 13:55:20,094 DEBUG [o.a.c.s.s.SnapshotServiceImpl] (BackupSnapshotTask-3:ctx-744796da) (logid:607dbb0e) Failed to copy snapshot
com.cloud.utils.exception.CloudRuntimeException: can not find an image stores
at org.apache.cloudstack.storage.snapshot.SnapshotServiceImpl.backupSnapshot(SnapshotServiceImpl.java:271)
at org.apache.cloudstack.storage.snapshot.DefaultSnapshotStrategy.backupSnapshot(DefaultSnapshotStrategy.java:171)
at com.cloud.storage.snapshot.SnapshotManagerImpl$BackupSnapshotTask.runInContext(SnapshotManagerImpl.java:1238)
at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52)
at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:45)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
2021-06-07 13:55:20,152 DEBUG [c.c.s.s.SnapshotManagerImpl] (BackupSnapshotTask-3:ctx-744796da) (logid:607dbb0e) Backing up of snapshot failed, for snapshot with ID 53531, left with 2 more attempts


I've checked, and the secondary storage is configured and visible in the GUI. I
can also mount it manually from the management server and from a couple of host
servers that I've tested. In addition, I can successfully upload an ISO image;
it registers just fine and I can create new VMs from the newly uploaded ISO
image.

I had no such problems with ACS 4.13.x, so the issue seems to have been
introduced by the upgrade to 4.15.0.

Could you please let me know how I can fix the issue?

Cheers 

andrei
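Harikrishna's second check (in his 8 June reply) amounts to searching the management-server log for specific messages. The following is a minimal Python sketch that tallies the failure pattern seen in the log excerpt above. The message strings are copied from this thread (the free-space text is only the fragment Harikrishna quoted, so the real message may be longer), and the sketch assumes each log record sits on a single line, as in the actual log file rather than in this wrapped e-mail.

```python
import re

# Patterns taken from the management-server log excerpts in this thread.
FAILED_BACKUP = re.compile(
    r"Backing up of snapshot failed, for snapshot with ID (\d+), "
    r"left with (\d+) more attempts"
)
NO_IMAGE_STORE = "can not find an image stores"
# Fragment quoted by Harikrishna; matched as a substring only.
LOW_SPACE = "Can't find an image storage in zone with less than"


def scan(lines):
    """Summarise snapshot-backup failures found in an iterable of log lines."""
    summary = {"failed_ids": {}, "no_image_store": 0, "low_space": 0}
    for line in lines:
        m = FAILED_BACKUP.search(line)
        if m:
            snap_id, left = int(m.group(1)), int(m.group(2))
            # Keep the smallest retry count seen, i.e. the latest attempt.
            prev = summary["failed_ids"].get(snap_id)
            summary["failed_ids"][snap_id] = left if prev is None else min(prev, left)
        if NO_IMAGE_STORE in line:
            summary["no_image_store"] += 1
        if LOW_SPACE in line:
            summary["low_space"] += 1
    return summary
```

Opening the log file and passing the file object to scan() works because a text file iterates line by line. A non-zero low_space count would support the free-space explanation; with only no_image_store hits, as in the excerpt above, that explanation looks less likely.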

Re: Problems after upgrade from 4.13.1 to 4.15.0

2021-03-29 Thread Andrei Mikhailovsky

Hello,

So, the connector version which came with CloudStack is indeed 8.0.19.

I have been doing some more testing and troubleshooting over the weekend. I
managed to resolve the problem by reinstalling the cloudstack-* packages on the
management server and restarting the server. The SQL-related error is gone.


I did have tons of issues with the system VMs and virtual routers. They were
simply not starting properly. Neither updating the router template nor simply
deleting the old router/system VM and creating a new one worked. I could see the
VM being created on the host, but it did not respond to virsh console commands,
nor did it respond to pings on any of its IP addresses. CloudStack showed the
routers / system VMs in the Starting state for a very long time. There were no
errors in the management log. A very strange thing indeed. I tried clearing the
entries in the sync_queue, async_job and vm_work_job tables and restarting the
management server, but that didn't help.

After a bunch of experimentation, I found a workaround that seems to have
worked: I had to update the cloudstack-agent on all host servers in the cluster
before the management server started creating the system VMs and virtual router
VMs properly. It took me a while to figure that out. I think it would greatly
help other people if something like this were mentioned in the upgrade guide.

Anyway, I will keep you updated on any other issues I discover after upgrading
to 4.15.

Thanks for your help

andrei
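Harikrishna's question in the quoted reply below (which MySQL Connector/J version is in use) can be checked from jar file names. This sketch assumes the connector is present as a separately named mysql-connector-java-X.Y.Z.jar, for example one pulled in by a distribution package; a connector bundled inside a larger jar is invisible to it, and the directory to list is not given in this thread. 8.0.19 is the version named in this exchange.

```python
import re

# Version named in this thread as the one expected with CloudStack 4.15.
EXPECTED = (8, 0, 19)

# Matches jar names such as "mysql-connector-java-8.0.19.jar" (any path prefix).
JAR_RE = re.compile(r"mysql-connector-java-(\d+)\.(\d+)\.(\d+)\.jar$")


def connector_versions(filenames):
    """Return the sorted Connector/J versions found in a list of file names."""
    found = []
    for name in filenames:
        m = JAR_RE.search(name)
        if m:
            found.append(tuple(int(part) for part in m.groups()))
    return sorted(found)


def report(filenames):
    """Describe what was found: one matching version, a mismatch, or several."""
    versions = connector_versions(filenames)
    if not versions:
        return "no mysql-connector-java jar found"
    text = ", ".join(".".join(map(str, v)) for v in versions)
    if len(versions) > 1:
        return f"multiple connectors on the path: {text}"
    return text if versions[0] == EXPECTED else f"unexpected version {text}"
```

Passing the names from os.listdir() of a candidate directory to report() returns either the single version found or a short description of the mismatch; more than one connector jar on the classpath is worth ruling out too.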

- Original Message -
> From: "Harikrishna Patnala" 
> To: "users" 
> Sent: Monday, 29 March, 2021 08:16:44
> Subject: Re: Problems after upgrade from 4.13.1 to 4.15.0

> Hi Andrei,
> 
> Since this is an upgrade, I assume you did not face this issue on 4.13.1.
> 
> Quick googling on this specific error shows some problems with the
> mysql-connector-java version. It is supposed to be 8.0.19. Can you please check
> which versions of MySQL and the MySQL Java connector are in use after upgrading
> to 4.15?
> 
> Thanks,
> Harikrishna
> 
> From: Andrei Mikhailovsky 
> Sent: Sunday, March 28, 2021 7:20 AM
> To: users 
> Subject: Problems after upgrade from 4.13.1 to 4.15.0
> 
> Hello everyone,
> 
> I've updated my CloudStack management server and an agent from 4.13.1 to
> 4.15.0. I am running Ubuntu 18.04 server. Following the upgrade steps in the
> documentation, the management server and the agent started OK. I logged in to
> the new GUI and at first things seemed OK. However, I noticed that I can't
> perform any VM / system VM related operations. Actions like
> start/stop/migrate/shutdown of VMs produce a 503 error. Also, I wasn't able to
> add a host running the 4.15.0 agent. Inspecting the management server logs, I
> get the following exception, which happens with pretty much any VM-related
> action.
> 
> 
> --
> 
> 
> 2021-03-28 02:28:46,811 DEBUG [c.c.v.UserVmManagerImpl] (API-Job-Executor-6:ctx-cfe07062 job-81025 ctx-2387e198) (logid:c5396488) Found no ongoing snapshots on volumes associated with the vm with id 695
> 2021-03-28 02:28:46,813 DEBUG [c.c.v.UserVmManagerImpl] (API-Job-Executor-6:ctx-cfe07062 job-81025 ctx-2387e198) (logid:c5396488) Collect vm disk statistics from host before stopping VM
> 2021-03-28 02:28:46,879 DEBUG [c.c.a.t.Request] (AgentManager-Handler-14:null) (logid:) Seq 121-4330211041716731948: Processing: { Ans: , MgmtId: 115129173025114, via: 121, Ver: v1, Flags: 10, [{"com.cloud.agent.api.GetVmDiskStatsAnswer":{"hostName":"ais-cloudhost13","vmDiskStatsMap":{"i-2-695-VM":[]},"result":"true","details":"","wait":"0"}}] }
> 2021-03-28 02:28:46,879 DEBUG [c.c.a.t.Request] (API-Job-Executor-6:ctx-cfe07062 job-81025 ctx-2387e198) (logid:c5396488) Seq 121-4330211041716731948: Received: { Ans: , MgmtId: 115129173025114, via: 121(ais-cloudhost13), Ver: v1, Flags: 10, { GetVmDiskStatsAnswer } }
> 2021-03-28 02:28:46,879 DEBUG [c.c.a.m.AgentManagerImpl] (API-Job-Executor-6:ctx-cfe07062 job-81025 ctx-2387e198) (logid:c5396488) Details from executing class com.cloud.agent.api.GetVmDiskStatsCommand:
> 2021-03-28 02:28:46,880 DEBUG [c.c.v.UserVmManagerImpl] (API-Job-Executor-6:ctx-cfe07062 job-81025 ctx-2387e198) (logid:c5396488) Collect vm network statistics from host before stopping Vm
> 2021-03-28 02:28:46,897 DEBUG [c.c.a.t.Request] (AgentManager-Handler-1:null) (logid:) Seq 121-4330211041716731949: Processing: { Ans: , MgmtId: 1151291

Problems after upgrade from 4.13.1 to 4.15.0

2021-03-27 Thread Andrei Mikhailovsky

Hello everyone,

I've updated my CloudStack management server and an agent from 4.13.1 to
4.15.0. I am running Ubuntu 18.04 server. Following the upgrade steps in the
documentation, the management server and the agent started OK. I logged in to
the new GUI and at first things seemed OK. However, I noticed that I can't
perform any VM / system VM related operations. Actions like
start/stop/migrate/shutdown of VMs produce a 503 error. Also, I wasn't able to
add a host running the 4.15.0 agent. Inspecting the management server logs, I
get the following exception, which happens with pretty much any VM-related
action.


-- 


2021-03-28 02:28:46,811 DEBUG [c.c.v.UserVmManagerImpl] (API-Job-Executor-6:ctx-cfe07062 job-81025 ctx-2387e198) (logid:c5396488) Found no ongoing snapshots on volumes associated with the vm with id 695
2021-03-28 02:28:46,813 DEBUG [c.c.v.UserVmManagerImpl] (API-Job-Executor-6:ctx-cfe07062 job-81025 ctx-2387e198) (logid:c5396488) Collect vm disk statistics from host before stopping VM
2021-03-28 02:28:46,879 DEBUG [c.c.a.t.Request] (AgentManager-Handler-14:null) (logid:) Seq 121-4330211041716731948: Processing: { Ans: , MgmtId: 115129173025114, via: 121, Ver: v1, Flags: 10, [{"com.cloud.agent.api.GetVmDiskStatsAnswer":{"hostName":"ais-cloudhost13","vmDiskStatsMap":{"i-2-695-VM":[]},"result":"true","details":"","wait":"0"}}] }
2021-03-28 02:28:46,879 DEBUG [c.c.a.t.Request] (API-Job-Executor-6:ctx-cfe07062 job-81025 ctx-2387e198) (logid:c5396488) Seq 121-4330211041716731948: Received: { Ans: , MgmtId: 115129173025114, via: 121(ais-cloudhost13), Ver: v1, Flags: 10, { GetVmDiskStatsAnswer } }
2021-03-28 02:28:46,879 DEBUG [c.c.a.m.AgentManagerImpl] (API-Job-Executor-6:ctx-cfe07062 job-81025 ctx-2387e198) (logid:c5396488) Details from executing class com.cloud.agent.api.GetVmDiskStatsCommand:
2021-03-28 02:28:46,880 DEBUG [c.c.v.UserVmManagerImpl] (API-Job-Executor-6:ctx-cfe07062 job-81025 ctx-2387e198) (logid:c5396488) Collect vm network statistics from host before stopping Vm
2021-03-28 02:28:46,897 DEBUG [c.c.a.t.Request] (AgentManager-Handler-1:null) (logid:) Seq 121-4330211041716731949: Processing: { Ans: , MgmtId: 115129173025114, via: 121, Ver: v1, Flags: 10, [{"com.cloud.agent.api.GetVmNetworkStatsAnswer":{"hostName":"ais-cloudhost13","vmNetworkStatsMap":{"i-2-695-VM":[{"vmName":"i-2-695-VM","macAddress":"02:00:20:a5:00:01","bytesSent":"(335.09 MB) 351364549","bytesReceived":"(294.63 MB) 308940852"},{"vmName":"i-2-695-VM","macAddress":"06:c7:fe:00:01:0b","bytesSent":"(74.57 KB) 76358","bytesReceived":"(585.85 MB) 614310467"}]},"result":"true","details":"","wait":"0"}}] }
2021-03-28 02:28:46,897 DEBUG [c.c.a.t.Request] (API-Job-Executor-6:ctx-cfe07062 job-81025 ctx-2387e198) (logid:c5396488) Seq 121-4330211041716731949: Received: { Ans: , MgmtId: 115129173025114, via: 121(ais-cloudhost13), Ver: v1, Flags: 10, { GetVmNetworkStatsAnswer } }
2021-03-28 02:28:46,897 DEBUG [c.c.a.m.AgentManagerImpl] (API-Job-Executor-6:ctx-cfe07062 job-81025 ctx-2387e198) (logid:c5396488) Details from executing class com.cloud.agent.api.GetVmNetworkStatsCommand:
2021-03-28 02:28:46,909 WARN [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-6:ctx-cfe07062 job-81025 ctx-2387e198) (logid:c5396488) Unable to schedule async job for command com.cloud.vm.VmWorkMigrate, unexpected exception.
com.cloud.utils.exception.CloudRuntimeException: Unable to lock vm_instance695. Waited 0
at com.cloud.utils.db.Merovingian2.doAcquire(Merovingian2.java:197)
at com.cloud.utils.db.Merovingian2.acquire(Merovingian2.java:137)
at com.cloud.utils.db.TransactionLegacy.lock(TransactionLegacy.java:384)
at com.cloud.utils.db.GenericDaoBase.lockInLockTable(GenericDaoBase.java:1075)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
at com.cloud.utils.db.TransactionContextInterceptor.invoke(TransactionContextInterceptor.java:34)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:175)
at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:95)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
at org.springframework.aop.framework.JdkDynam

Re: [RESULT][VOTE] Primate as modern UI for CloudStack

2019-10-23 Thread Andrei Mikhailovsky

Rohit, when do you plan to include this interface in an ACS release?

thanks
Andrei

- Original Message -
> From: "Rohit Yadav" 
> To: "users" 
> Cc: "dev" , priv...@cloudstack.apache.org
> Sent: Tuesday, 22 October, 2019 09:24:07
> Subject: Re: [RESULT][VOTE] Primate as modern UI for CloudStack

> All,
> 
> The repository is live - https://github.com/apache/cloudstack-primate and can
> accept pull requests now.
> 
> Updates and pending items in this regard:
> 
>  *   Get the Github repo's issues, wiki, projects etc. enabled. I've pinged ASF
>  INFRA in that regard - https://issues.apache.org/jira/browse/INFRA-19274
>  *   I've added a contributing document:
>  https://github.com/apache/cloudstack-primate/blob/master/CONTRIBUTING.md
>  (kindly review)
>  *   Basic (work in progress) documentation section added:
>  https://github.com/apache/cloudstack-primate#documentation
>  *   Once repository functions are fully enabled, I'll share a proper project
>  progress/status update and details of the bi-weekly meeting on the Primate
>  SIG thread.
> 
> Thanks.
> 
> 
> Regards,
> 
> Rohit Yadav
> 
> Software Architect, ShapeBlue
> 
> https://www.shapeblue.com
> 
> 
> From: Andrija Panic 
> Sent: Monday, October 21, 2019 14:37
> To: users 
> Cc: d...@cloudstack.apache.org ;
> priv...@cloudstack.apache.org 
> Subject: Re: [RESULT][VOTE] Primate as modern UI for CloudStack
> 
> (that seems like more +1s than for last few ACS releases altogether :) )
> 
> Great work Rohit - thx!
> 
> 
> rohit.ya...@shapeblue.com
> www.shapeblue.com
> Amadeus House, Floral Street, London  WC2E 9DPUK
> @shapeblue
>  
> 
> 
> On Mon, 21 Oct 2019 at 08:57, Rohit Yadav  wrote:
> 
>> All,
>>
>> After 2 weeks, the vote for accepting Primate as a CloudStack project [1]
>> *passes* with
>> 10 PMC + 11 non-PMC votes.
>>
>> +1 (PMC / binding)
>> 10 people (Mike, Simon, Andrija, Sven, Wido, Will, Syed, Gabriel, Giles,
>> Bruno)
>>
>> +1 (committer, non-binding/users)
>> 11 people (Nicolas, Nitin, Lucian, Ezequiel, Haijiao, Alex, Alessandro,
>> Marco, Anurag, Leonardo, KB Shiv)
>>
>> 0
>> none
>>
>> -1
>> none
>>
>>
>> I'll now request ASF INFRA [2] to help enable issues, pull requests, wiki,
>> projects for the new repository:
>>
>> https://github.com/apache/cloudstack-primate
>>
>>
>> The code will be donated and pushed in the next 24-48 hours from the old
>> repository [3] to the new repository under the Apache CloudStack project.
>>
>>
>> Thanks to everyone participating.
>>
>> [1] https://markmail.org/message/tblrbrtew6cvrusr
>>
>> [2] https://issues.apache.org/jira/browse/INFRA-19274
>>
>> [3] https://github.com/shapeblue/primate
>>
>> Regards,
>>
>> Rohit Yadav
>>
>> Software Architect, ShapeBlue
>>
>> https://www.shapeblue.com
>>
>>
>>
>>
>> From: Rohit Yadav
>> Sent: Monday, October 7, 2019 17:01
>> To: d...@cloudstack.apache.org ;
>> users@cloudstack.apache.org ;
>> priv...@cloudstack.apache.org 
>> Subject: [VOTE] Primate as modern UI for CloudStack
>>
>> All,
>>
>> The feedback and response have been positive on the proposal to use Primate
>> as the modern UI for CloudStack [1] [2]. Thank you all.
>>
>> I'm starting this vote (to):
>>
>>   *   Accept Primate codebase [3] as a project under Apache CloudStack
>> project
>>   *   Create and host a new repository (cloudstack-primate) and follow
>> Github based development workflow (issues, pull requests etc) as we do with
>> CloudStack
>>   *   Given this is a new project, to encourage cadence until it is feature
>> complete, the merge criteria are proposed as:
>>  *   Manual testing against each PR and/or with screenshots from the
>> author or testing contributor, integration with Travis is possible once we
>> get JS/UI tests
>>  *   At least 1 LGTM from any of the active contributors, we'll move
>> this to 2 LGTMs when the codebase reaches feature parity wrt the
>> existing/old CloudStack UI
>>  *   Squash and merge PRs
>>   *   Accept the proposed timeline [1][2] (subject to achievement of goals
>> wrt Primate technical release and GA)
>>  *   the first technical preview targeted with the winter 2019 LTS
>> release (~Q1 2020) and release to serve a deprecation notice wrt the older
>> UI
>>  *   define a release approach before winter LTS
>>  *   stop taking feature FRs for old/existing UI after winter 2019 LTS
>> release, work on upgrade path/documentation from old UI to Primate
>>  *   the first Primate GA targeted wrt summer LTS 2020 (~H2 2019),
>> but still ship old UI with a final deprecation notice
>>  *   old UI codebase removed from codebase in winter 2020 LTS release
>>
>> The vote will be up for the next two weeks to give enough time for PMC and
>> the community to gather consensus and still have room for questions,
>> feedbac

Re: [ANNOUNCE] Apache CloudStack 4.13.0.0 GA

2019-09-24 Thread Andrei Mikhailovsky

Great work guys and girls!!!

- Original Message -
> From: "Paul Angus" 
> To: annou...@cloudstack.apache.org, "Apache CloudStack Marketing" 
> , "dev"
> , "users" , 
> users...@cloudstack.apache.org
> Sent: Tuesday, 24 September, 2019 11:06:28
> Subject: [ANNOUNCE] Apache CloudStack 4.13.0.0 GA

> *The Apache Software Foundation Announces Apache® CloudStack® v4.13*
> 
> 
> Apache CloudStack v4.13 includes nearly 200 new features, enhancements and
> fixes since 4.12, such as enhanced hypervisor support, performance
> increases and more user-configurable controls. Highlights include:
> 
> 
> 
>   - Supporting configuration of virtualised appliances
>   - VMware 6.7 support
>   - Increased granularity & control of instance deployment
>   - Improvements in system VM performance
>   - Allow live migration of DPDK enabled instances
>   - More flexible UI branding
>   - Allowing users to create layer 2 network offerings
> 
> 
> The full list of new features can be found in the project release notes at
> http://docs.cloudstack.apache.org/en/4.13.0.0/releasenotes/changes.html
> 
> 
> 
> Apache CloudStack powers numerous elastic Cloud computing services,
> including solutions that have ranked as Gartner Magic Quadrant leaders.
> Highlighted in the Forrester Q4 2017 Enterprise Open Source Cloud Adoption
> report, Apache CloudStack "sits beneath hundreds of service provider
> clouds", including Fortune 5 multinational corporations. A list of known
> Apache CloudStack users is available at
> http://cloudstack.apache.org/users.html

Re: 4.13 rbd snapshot delete failed

2019-09-09 Thread Andrei Mikhailovsky

A quick bit of feedback from my side: I've never had snapshot deletion working
properly with Ceph. Every week or so I have to manually delete all Ceph
snapshots. However, the NFS secondary storage snapshots are deleted just fine.
I've been using CloudStack for 5+ years and it has always been the case. I am
currently running 4.11.2 with Ceph 13.2.6-1xenial.

Andrei
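The manual weekly cleanup described above, together with Li Jerry's diagnosis further down the thread (deleteSnapshot only updates the database when the RBD snapshot was never copied to secondary storage), suggests looking for RBD snapshots that CloudStack no longer references. What follows is a hedged sketch, not a CloudStack or Ceph tool: how to obtain the set of snapshot names CloudStack still tracks is left as an assumption, and the image argument only needs the list_snaps(), is_protected_snap() and remove_snap() methods that the python3-rbd rbd.Image class provides. It defaults to a dry run, and anything it would remove should be cross-checked against the database first.

```python
def find_orphans(rbd_snap_names, known_snap_names):
    """RBD snapshot names that CloudStack no longer references.

    rbd_snap_names: names present on the image (e.g. from list_snaps()).
    known_snap_names: names CloudStack still tracks; deriving these from the
    cloud database is an assumption left to the caller.
    """
    return sorted(set(rbd_snap_names) - set(known_snap_names))


def prune(image, known_snap_names, dry_run=True):
    """Remove orphaned snapshots from one RBD image; dry run by default.

    Returns the names that were (or, in a dry run, would be) removed.
    """
    names = [snap["name"] for snap in image.list_snaps()]
    removed = []
    for name in find_orphans(names, known_snap_names):
        if image.is_protected_snap(name):
            # Protected snapshots can have clones depending on them; skip.
            continue
        if not dry_run:
            image.remove_snap(name)
        removed.append(name)
    return removed
```

With the Ceph bindings, image would come from rbd.Image(ioctx, volume_name) after connecting a rados.Rados cluster handle and calling open_ioctx(pool); prune(image, known) with the default dry_run=True only reports what it would delete.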

- Original Message -
> From: "Andrija Panic" 
> To: "Gabriel Beims Bräscher" 
> Cc: "users" , "dev" 
> Sent: Sunday, 8 September, 2019 19:17:59
> Subject: Re: 4.13 rbd snapshot delete failed

> Thx Gabriel for the extensive feedback.
> Actually, my ex-company added the code to really delete an RBD snap back in
> 2016 or so; it was part of 4.9 if I'm not mistaken. So I expect the code is
> there, but probably some exception is happening, or a regression...
> 
> Cheers
> 
> On Sun, Sep 8, 2019, 09:31 Gabriel Beims Bräscher wrote:
> 
>> Thanks for the feedback, Andrija. It looks like delete was not fully
>> supported then (am I missing something?). I will take a look into this and
>> open a PR adding proper support for RBD snapshot deletion if necessary.
>>
>> Regarding the rollback, I have tested it several times and it worked;
>> however, I see a weak point in the Ceph rollback implementation.
>>
>> It looks like Li Jerry was able to execute the rollback without any
>> problem. Li, could you please post here the log output: "Attempting to
>> rollback RBD snapshot [name:%s], [pool:%s], [volumeid:%s],
>> [snapshotid:%s]"? Andrija will not be able to see that log, as the exception
>> happens prior to it; the only way for you to check those values is via
>> remote debugging. If you are able to post those values, it would also help
>> in sorting out what is wrong.
>>
>> I am checking the code base, running a few tests, and evaluating the log
>> that you (Andrija) sent. What I can say for now is that it looks like the
>> parameter "snapshotRelPath = snapshot.getPath()" [1] is a critical piece of
>> code that can definitely break the rollback execution flow. My tests had
>> pointed to a pattern, but now I see other possibilities. I will probably
>> add a few parameters to the rollback/revert command instead of using the
>> path, or review the path life-cycle and the different execution flows, in
>> order to make it safer to use.
>> [1]
>> https://github.com/apache/cloudstack/blob/50fc045f366bd9769eba85c4bc3ecdc0b7035c11/plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper
>>
>> A few details on the test environments and Ceph/RBD versions:
>> CloudStack, KVM, and Ceph nodes are running Ubuntu 18.04.
>> Ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic
>> (stable).
>> RADOS Block Device (RBD) has supported snapshot rollback since Ceph v10.0.2
>> [https://github.com/ceph/ceph/pull/6878].
>> Rados-java [https://github.com/ceph/rados-java] has supported snapshot
>> rollback since 0.5.0; rados-java 0.5.0 is the version used by CloudStack
>> 4.13.0.0.
>>
>> I will be updating here soon.
>>
>> On Sun, Sep 8, 2019 at 12:28, Wido den Hollander
>> wrote:
>>
>>>
>>>
>>> On 9/8/19 5:26 AM, Andrija Panic wrote:
>>> > Many releases ago, deleting a Ceph volume snapshot was also only deleting it
>>> > in the DB, so RBD performance became terrible with many tens of (i.e.
>>> > hourly) snapshots. I'll try to verify this on 4.13 myself, but Wido and the
>>> > guys will know better...
>>>
>>> I pinged Gabriel and he's looking into it. He'll get back to you.
>>>
>>> Wido
>>>
>>> >
>>> >
>>> > On Sat, Sep 7, 2019, 08:34 li jerry  wrote:
>>> >
>>> >> I found it had nothing to do with storage.cleanup.delay and
>>> >> storage.cleanup.interval.
>>> >>
>>> >> The reason is that when DeleteSnapshotCmd is executed, because the RBD
>>> >> snapshot has not been copied to secondary storage, it only updates the
>>> >> database and never goes to primary storage to delete the snapshot.
>>> >>
>>> >> Log:
>>> >>
>>> >> 2019-09-07 23:27:00,118 DEBUG [c.c.a.ApiServlet] (qtp504527234-17:ctx-2e407b61) (logid:445cbea8) ===START=== 192.168.254.3 -- GET command=deleteSnapshot&id=0b50eb7e-4f42-4de7-96c2-1fae137c8c9f&response=json&_=1567869534480
>>> >>
>>> >> 2019-09-07 23:27:00,139 DEBUG [c.c.a.ApiServer] (qtp504527234-17:ctx-2e407b61 ctx-679fd276) (logid:445cbea8) CIDRs from which account 'Acct[2f96c108-9408-11e9-a820-0200582b001a-admin]' is allowed to perform API calls: 0.0.0.0/0,::/0
>>> >>
>>> >> 2019-09-07 23:27:00,204 DEBUG [c.c.a.ApiServer] (qtp504527234-17:ctx-2e407b61 ctx-679fd276) (logid:445cbea8) Retrieved cmdEventType from job info: SNAPSHOT.DELETE
>>> >>
>>> >> 2019-09-07 23:27:00,217 INFO  [o.a.c.f.j.i.AsyncJobMonitor] (API-Job-Executor-2:ctx-f0843047 job-1378) (logid:c34a368a) Add job-1378 into job monitoring
>>> >>
>>> >> 2019-09-07 23:27:00,219 DEBUG
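Li Jerry's finding above is that deleteSnapshot only changes the database when the RBD snapshot was never copied to secondary storage, so the snapshot stays on the Ceph pool. Below is a minimal sketch for spotting such leftovers. It assumes you have the JSON output of `rbd snap ls --format json <pool>/<image>`. It also assumes you have separately collected the snapshot names CloudStack still tracks; how you gather that set is left open here. The sketch only lists candidates and deletes nothing.

```python
import json


def orphaned_rbd_snapshots(rbd_snap_ls_json: str, live_snapshot_names: set) -> list:
    """Return RBD snapshot names that exist on the image but are unknown to CloudStack.

    rbd_snap_ls_json: output of `rbd snap ls --format json <pool>/<image>`,
        a JSON list of objects that each carry a "name" key.
    live_snapshot_names: snapshot names CloudStack still tracks (assumed input;
        collecting it is outside this sketch).
    """
    on_pool = [entry["name"] for entry in json.loads(rbd_snap_ls_json)]
    # Anything on the pool that CloudStack no longer knows about is only a
    # candidate for manual review, not something to delete blindly.
    return sorted(name for name in on_pool if name not in live_snapshot_names)


if __name__ == "__main__":
    sample = json.dumps([
        {"id": 4, "name": "snap-a", "size": 10737418240},
        {"id": 5, "name": "snap-b", "size": 10737418240},
    ])
    print(orphaned_rbd_snapshots(sample, {"snap-a"}))
```

Review each candidate before removing it with `rbd snap rm <pool>/<image>@<snap>`. A protected snapshot has to be unprotected first.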

Concurrent Volume Snapshots

2019-06-12 Thread Andrei Mikhailovsky

Hello everyone 

I am having issues running snapshots of large volumes. The hypervisor is KVM and
the storage backend is Ceph (rbd). Here is my issue:

I've got several VMs with 3-6 volumes of 2TB each. I have a recurring schedule
set up to take a snapshot of each volume once a month. It takes a long time for
a volume to be snapshotted (on the order of 20 hours). As a result, when the
schedule kicks in, it only manages to snapshot the first volume, and the
snapshots of the other volumes fail due to the async job timeout. From what I
have discovered, ACS only snapshots a single volume at a time, and I can't find
a setting to enable concurrent snapshotting. So it can't snapshot all of a VM's
volumes at the same time. This is problematic for many reasons, but the main
one is that, upon recovery of multiple volumes, the data on them will not be
consistent.

Is there a way around it? Perhaps there is an option in the settings that I
can't find that disables this odd behaviour of volume snapshots?

Cheers 

Andrei
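One client-side workaround worth testing is to submit one createSnapshot API call per volume at the same time, instead of relying on the recurring schedule. The sketch below uses API keys and signs requests in the usual CloudStack way: parameters sorted and URL-encoded, the string lower-cased, then HMAC-SHA1 and Base64. The endpoint URL and the keys are placeholders.

The management server may still serialise work per VM, so this is not guaranteed to help. Snapshots started a few seconds apart are also not crash-consistent across volumes, so this narrows the gap rather than closing it.

```python
import base64
import hashlib
import hmac
import json
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor

API_URL = "http://mgmt.example.com:8080/client/api"  # placeholder endpoint


def signed_query(params: dict, api_key: str, secret_key: str) -> str:
    """Return a CloudStack API query string with its HMAC-SHA1 signature appended."""
    query = dict(params, apikey=api_key, response="json")
    # Sort by key, URL-encode each value, then sign the lower-cased string.
    canonical = "&".join(
        f"{key}={urllib.parse.quote(str(value), safe='')}"
        for key, value in sorted(query.items(), key=lambda kv: kv[0].lower())
    )
    digest = hmac.new(secret_key.encode(), canonical.lower().encode(), hashlib.sha1).digest()
    signature = urllib.parse.quote(base64.b64encode(digest).decode(), safe="")
    return f"{canonical}&signature={signature}"


def http_get(query: str) -> dict:
    """Send one signed request to the management server and decode the JSON reply."""
    with urllib.request.urlopen(f"{API_URL}?{query}", timeout=60) as resp:
        return json.load(resp)


def snapshot_volumes(volume_ids, api_key, secret_key, send=http_get, workers=6):
    """Submit createSnapshot for every volume concurrently; returns {volume_id: reply}."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            vid: pool.submit(
                send,
                signed_query({"command": "createSnapshot", "volumeid": vid}, api_key, secret_key),
            )
            for vid in volume_ids
        }
        return {vid: fut.result() for vid, fut in futures.items()}
```

createSnapshot is an asynchronous API, so each reply should contain a job id that can then be polled with queryAsyncJobResult. With `workers` at or above the number of volumes, all requests go out at roughly the same moment.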

Re: Automating creation of ACLs

2019-05-03 Thread Andrei Mikhailovsky

Hi Andrija,

I've set up CloudMonkey on my local host and done some experimentation. It
turns out that the API does support specifying multiple IPs/networks in a single
ACL rule. The GUI reflects this and shows a comma-separated list. So, it looks like
I can do everything I want from CloudMonkey.

What I've not tested yet is whether it actually works and creates the firewall rule
on the virtual router. I will test that later on and report back.
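Driven from a script instead of the GUI, this boils down to one createNetworkACL call per service, with all addresses in a comma-separated cidrlist. The minimal sketch below only builds the parameter sets. The ACL list id and the (protocol, port) services are placeholders. The parameter names are those of the createNetworkACL API. Each dict would still need to be sent through CloudMonkey or a signed API request. Whether the virtual router then programs the rule correctly is the part still to be tested.

```python
def build_acl_rules(acl_id, cidrs, services, action="allow", traffic_type="ingress"):
    """Return one createNetworkACL parameter dict per (protocol, port) service.

    acl_id and services are placeholders for your own values; all CIDRs go
    into a single comma-separated cidrlist, as the GUI displays them.
    """
    cidrlist = ",".join(cidrs)
    rules = []
    for protocol, port in services:
        rules.append({
            "command": "createNetworkACL",
            "aclid": acl_id,
            "protocol": protocol,
            "startport": port,
            "endport": port,
            "cidrlist": cidrlist,
            "traffictype": traffic_type,
            "action": action,
        })
    return rules


if __name__ == "__main__":
    for rule in build_acl_rules("ACL-LIST-UUID", ["203.0.113.10/32", "198.51.100.0/24"],
                                [("tcp", 22), ("tcp", 443)]):
        print(rule)
```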

Cheers

- Original Message -
> From: "Andrija Panic" 
> To: "users" 
> Sent: Friday, 3 May, 2019 17:04:24
> Subject: Re: Automating creation of ACLs

> Hi Andrei,
> 
> I didn't claim that it actually works - did you test it? Does it actually
> work? (If I understand correctly, you want to specify multiple CIDR ranges
> in a single rule instead of creating a rule for each CIDR range in
> question.)
> 
> Best,
> 
> 
> --
> 
> Andrija Panić

Re: Automating creation of ACLs

2019-05-03 Thread Andrei Mikhailovsky

Actually, I was wrong. The ACS GUI does allow specifying
multiple networks/IPs on the same ACL. I had a typo when I was testing it. All
jolly good!

Cheers


Re: Automating creation of ACLs

2019-05-03 Thread Andrei Mikhailovsky

Hi Andrija,

I wasn't aware the API supports creating ACLs with multiple networks / IP 
addresses. 

Andrei

- Original Message -
> From: "Andrija Panic" 
> To: "users" 
> Sent: Friday, 3 May, 2019 16:11:37
> Subject: Re: Automating creation of ACLs

> Hi Andrei,
> 
> Perhaps I got something wrong, but why don't you use the API to create the
> needed ACL rules?
> 
> Andrija
> 
> 
> --
> 
> Andrija Panić

Automating creation of ACLs

2019-05-03 Thread Andrei Mikhailovsky

Hello everyone, 

I have come across a need to create an ACL that includes around 100 different
IP addresses and network ranges for several services. Now, looking at the ACS
GUI, there is currently no way that I could find to create an ACL with multiple
IP addresses / network ranges. Not sure why this hasn't been implemented.

I am looking for a way to automate the creation of ACLs with CloudStack where
ideally I could feed it a list of IP addresses and it would create the ACLs
for me. Otherwise it will take a day (and my sanity) to do it manually.

I am sure I am not the only one in the ACS community that requires a large set
of ACLs. Could someone share their scripts / methods of achieving this?

Thanks 

Andrei
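A first step for such automation is turning a hand-maintained list of around 100 addresses and ranges into clean CIDR notation. The minimal sketch below uses Python's standard ipaddress module and, as an assumption, handles IPv4 only. It accepts bare IPs or networks, one per line, and ignores blank lines and # comments. It rejects malformed entries with ValueError and merges overlapping or adjacent ranges, so fewer entries end up in the ACL.

```python
import ipaddress


def normalise_cidrs(lines):
    """Parse IPs/networks (one per line) into a minimal, sorted list of IPv4 CIDRs."""
    networks = []
    for raw in lines:
        entry = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not entry:
            continue
        # strict=False accepts host bits set, e.g. 10.0.0.5/24 -> 10.0.0.0/24;
        # malformed input such as 10.0.0.300 raises ValueError.
        net = ipaddress.ip_network(entry, strict=False)
        if net.version != 4:
            raise ValueError(f"IPv6 is not handled in this sketch: {entry}")
        networks.append(net)
    # collapse_addresses merges overlapping/adjacent networks and sorts them.
    return [str(n) for n in ipaddress.collapse_addresses(networks)]
```

Joining the result with commas gives the value for a rule's cidrlist. Collapsing first keeps that list as short as the input allows.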

Re: Host HA vs transient NFS problems on KVM

2018-10-23 Thread Andrei Mikhailovsky

Hi Jean,

I have previously done some HA testing and have pretty much come to similar
conclusions as you have. My testing showed that HA is very unreliable at
best and causes data loss at worst. I have had the following outcomes from
various testing scenarios:

1. Works as expected (very rarely)
2. Starts the same VM on 2 different hosts (data loss / corruption)
3. Reboots ALL KVM hosts (even hosts that do not have a single VM with
NFS volumes)

Now, I cannot justify enabling HA with even a slim chance of 2 or 3
above. Honestly, I do not know a single business that is happy to accept those
scenarios. Frankly speaking, for me the CloudStack HA options create more
problems than they solve, and thus I've not enabled them. I have decided that ACS
with KVM is not HA friendly, full stop. Having said this, I've not tested the
latest couple of releases, so I will give it the benefit of the doubt and wait
for users' reports to prove my conclusion wrong. I've wasted enough of my
own time on KVM HA.

My HA approach to ACS is more of a manual nature, which is far more reliable 
and is less prone to issues in my experience. I have a monitoring system 
sending me alerts when VMs, host servers and storage become unreachable. It is 
not as convenient as a fully working automatic HA, I agree, but it is far 
better to be woken up at 3am to deal with restarting a handful of VMs and
perhaps force-rebooting a KVM host than to deal with mass KVM host reboots and/or
trying to find duplicate VMs lurking somewhere on the host servers. Been there,
done that - NO THANKS!
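The probing side of such a monitoring setup can be very small. The minimal sketch below treats a plain TCP connect as an acceptable liveness signal, which is an assumption. Typical targets would be port 22 on a KVM host or port 2049 on an NFS server; the hosts and ports are placeholders. Real monitoring would add retries, scheduling and alert delivery; this shows only the check.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, unresolvable name, etc.
        return False


def unreachable(targets, timeout: float = 3.0):
    """Return the (host, port) pairs that failed the probe, i.e. what to alert on."""
    return [(host, port) for host, port in targets if not is_reachable(host, port, timeout)]
```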

Cheers

Andrei

- Original Message -
> From: "Jean-Francois Nadeau" 
> To: "users" 
> Sent: Monday, 22 October, 2018 22:13:35
> Subject: Host HA vs transient NFS problems on KVM

> Dear community,
> 
> I want to share my concern about upgrading from 4.9 to 4.11 with regard to how the
> host HA framework works and handles various failure conditions.
> 
> Since we have been running CS 4.9.3 with NFS on KVM, VM HA has been
> working as expected when a hypervisor crashed, and I agree we might have
> been lucky, knowing the limitations of the KVM investigator; the
> possibility of starting the same VM on 2 KVM hosts is real when you know the
> recipe for it.
> 
> Still, on 4.9.3 we were tolerant of transient primary NFS storage access
> issues, typical of a network problem (and we've seen one lately: a 22-minute
> disconnection). Although these events are quite rare, when they
> do happen their blast radius can have a huge impact on the business.
> 
> So when we initially tested CS 4.9.3 we purposely blocked access to NFS
> and observed the results. Changing the kvmheartbeat.sh script so it
> doesn't reboot the node after 5 minutes has been essential to defuse the
> potential for a massive KVM host reboot. In the end, it's far less
> damaging to let NFS recover than to have all those VMs rebooted. On 4.9.3
> the cloudstack-agent will remain "Up" and not start any VM twice if the NFS
> storage becomes available again within 30 minutes.
> 
> Now, testing the upgrade from 4.9 to 4.11 in our lab under the same failure
> conditions, we quickly saw different behavior, although not perfectly
> consistent. On 4.11.2 without host HA enabled, we see the agent
> "try" to disconnect after 5 minutes, though sometimes the KVM host goes into
> Disconnected state and sometimes it goes straight to Down state. In the latter
> case we see a duplicate VM created in no time, and once the NFS issue is
> resolved, we have 2 copies of that VM and CloudStack only knows about
> the last copy. This is obviously a disaster, forcing us to look at how
> host HA can help.
> 
> Now with host HA enabled and simulating the same NFS hiccup, we don't get
> duplicate VMs but we do get a KVM host reset. The problem here is that,
> yes, host HA does ensure we don't have duplicate VMs, but at scale this would
> also provoke a lot of KVM host resets (if not all of them). If with host HA we
> are at risk of massive KVM host resets, then I might prefer to
> disable host/VM HA entirely and just handle KVM host failures manually.
> This is super annoying for the ops team, but far less risky for the
> business.
> 
> I'm trying to find out if there's a middle ground here between the 4.9 behavior
> with NFS hiccups and the reliability of the new host HA framework.
> 
> best,
> 
> Jean-Francois
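The trade-off described above can be written down as an explicit policy: fence after about 5 minutes, or ride out an NFS outage of up to around 30 minutes. The sketch below is a hypothetical decision function for illustration. It is not the logic of kvmheartbeat.sh or of the host HA framework. The thresholds are illustrative, and `fence_after=None` means "never fence automatically", mirroring the modified script described in the message.

```python
from typing import Optional


def heartbeat_action(seconds_since_last_write: float,
                     alert_after: float = 60,
                     fence_after: Optional[float] = None) -> str:
    """Decide what to do when the NFS heartbeat file has not been updated.

    Returns "ok", "alert" (page a human) or "fence" (reboot/reset the host).
    fence_after=None disables automatic fencing entirely.
    """
    if fence_after is not None and seconds_since_last_write >= fence_after:
        return "fence"
    if seconds_since_last_write >= alert_after:
        return "alert"
    return "ok"
```

Setting `fence_after=300` roughly corresponds to the 5-minute reboot the thread describes. Leaving it at `None` trades automatic recovery for manual handling, which is the direction both posters lean towards.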

Re: ACS 4.11.1.0 - agent.properties file became empty on a KVM host

2018-10-22 Thread Andrei Mikhailovsky

Right, I've managed to fix the issue. Here is how I did it.

1. Copied the contents of the agent.properties file from another KVM host.
2. Removed the values in the fields like: guid and keystore.passphrase
3. Set the ca.plugin.root.auth.strictness to false in Global Settings. 
Restarted the management server
4. Started the agent. This created a new host entry with Unsecure status. The
old host entry was still showing Disconnected.
5. Provisioned the new keys
6. Removed the old host entry with the Force option ticked
7. Reverted the setting in 3 above
8. Restarted the management server
9. Job done
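Steps 1 and 2 can be scripted. The minimal sketch below assumes the donor file is a plain key=value properties file. The keys it blanks are the two named in step 2 (guid and keystore.passphrase). Other host-specific keys may need clearing on your setup; comparing the files from two hosts is a good way to check.

```python
def blank_host_specific(text: str, keys=("guid", "keystore.passphrase")) -> str:
    """Copy a donor agent.properties, emptying the values of host-specific keys.

    Comments, blank lines and all other entries are kept unchanged.
    """
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in keys:
                line = f"{key}="  # keep the key, drop the donor host's value
        out.append(line)
    return "\n".join(out) + "\n"
```

The result would be written to /etc/cloudstack/agent/agent.properties on the affected host before starting the agent in step 4.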

Cheers


P.S. I think I've figured out what caused the agent.properties file to be
modified. After doing step 3 above, I noticed that the agent.properties
files on ALL my host servers had been modified (at least according to their
modification timestamps). So it seems that the workflow for performing this
action is broken - very silly indeed!!! I will send a message to the mailing
list. Perhaps the developers responsible for this code will fix the issue and
add a few more sanity checks.
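A small check run on each KVM host makes an emptied or unexpectedly rewritten agent.properties easy to spot, by reporting its size and modification time. This is a minimal sketch. The path is the one used in this thread, and how you run it across hosts (an SSH loop, configuration management) is up to you.

```python
import os
import time

AGENT_PROPERTIES = "/etc/cloudstack/agent/agent.properties"


def check_properties(path: str = AGENT_PROPERTIES) -> dict:
    """Report size, last-modified time and an 'empty' flag for agent.properties."""
    st = os.stat(path)
    return {
        "path": path,
        "size": st.st_size,
        "empty": st.st_size == 0,
        "modified": time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(st.st_mtime)),
    }
```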




Re: ACS 4.11.1.0 - agent.properties file became empty on a KVM host

2018-10-22 Thread Andrei Mikhailovsky

Hi Gabriel,

Thanks for your reply. What you've suggested will create a default
agent.properties file, which is no good. The agent will not connect to the
server with the default agent.properties file, for many reasons.

I think I can recreate most of the file content by looking at the other
agent.properties files. However, one thing that I am missing is the:

keystore.passphrase=

Where do I get the passphrase for the keystore file? Is it stored somewhere in
the management server DB or on the KVM host itself?

Thanks

- Original Message -
> From: "Gabriel Beims Bräscher" 
> To: "users" 
> Sent: Monday, 22 October, 2018 13:54:32
> Subject: Re: ACS 4.11.1.0 - agent.properties file became empty on a KVM host

> Hi Andrei,
> 
> When upgrading the CloudStack agent you can accept or refuse to change the
> agent.properties. The default operation is to not change configuration
> files.
> 
> The agent service does not directly impact the running VMs; thus, I
> would suggest you remove the /etc/cloudstack/agent/agent.properties file,
> uninstall the CloudStack agent service and reinstall it. I would also
> suggest you keep a copy of /etc/cloudstack/ just to have a saved state of
> your agent configurations before reinstalling and compare them if needed.

ACS 4.11.1.0 - agent.properties file became empty on a KVM host

2018-10-22 Thread Andrei Mikhailovsky

Hi 

I have an issue with one of the host servers. This issue is rather strange.
Perhaps someone can help me understand how this happened and how to fix
it.

About 3 days ago one of the KVM host servers ran out of disk space on its root
partition. I have fixed the issue and reconnected the agent by running 'service
cloudstack-agent restart'.

I've noticed that the host server is still showing Disconnected status in the
web GUI. The agent log files repeat the following 3 lines every 10 or so
seconds:

2018-10-22 12:50:54,339 INFO [cloud.agent.AgentShell] (main:null) (logid:) Agent started
2018-10-22 12:50:54,343 INFO [cloud.agent.AgentShell] (main:null) (logid:) Implementation Version is 4.11.1.0
2018-10-22 12:50:54,345 INFO [cloud.agent.AgentShell] (main:null) (logid:) agent.properties found at /etc/cloudstack/agent/agent.properties


Looking further revealed that the file is 0 bytes:

-rw--- 1 root root 0 Oct 20 06:39 agent.properties
-rwxr-xr-x 1 root root 8890 Jul 6 14:01 agent.properties.dpkg-dist

Something has replaced the original agent.properties file. The
creation/modification dates of the agent.properties file on other KVM host
servers are all different (times and dates). As I always upgrade the host
servers at the same time, this leads me to believe that the agent.properties
file is automatically generated or modified by some script or service that is
running on the host server, or perhaps the modification is pushed from the
management server to the agent.

As the server is in the Disconnected state, I can't migrate VMs and virtual
routers off that host server, and I can't set it to Maintenance either.

How do I manually force the creation / update of the agent.properties file on
that host server? The challenge is that the VMs/VRs running on that host
server are production servers and they must keep running without being shut
down.

Thanks for any tips/help. 

Andrei

Re: VM Snapshot not removed from primary SR

2018-10-10 Thread Andrei Mikhailovsky

Hi

I can confirm that I am also having this issue on 4.11.1.0. To be honest, this
issue has always been present for me, as far as I remember.

This is a fairly urgent issue to fix, to stop people from running out of space.

Cheers

- Original Message -
> From: "Rafael Weingärtner" 
> To: "users" 
> Sent: Wednesday, 10 October, 2018 16:24:54
> Subject: Re: VM Snapshot not removed from primary SR

> Well, I am almost sure I have seen a PR fixing something like what you describe;
> I thought that it went into 4.11.1.0. However, only with more in-depth
> debugging would I be able to confirm your problem.
> 
> Are you seeing any unexpected exception in your log files?
> 
> On Wed, Oct 10, 2018 at 11:54 AM Sami Rajala (FAPPS) 
> wrote:
> 
>> Hi,
>>
>> CS is running 4.11.1.0 and it does not work.
>>
>> BR
>> -sami
>>
>>
>> 10.10.2018 16.49, "Rafael Weingärtner" :
>>
>> >If I am not mistaken, this has already been fixed in 4.11.1.0
>> >
>> >On Wed, Oct 10, 2018 at 10:31 AM Sami Rajala 
>> >wrote:
>> >
>> >> Hi,
>> >>
>> >>
>> >> I have received no advice on this so far.
>> >>
>> >> "I have had a weird issue since I updated CS from 4.9 to 4.10.
>> >> CS does not delete/remove the temporary snapshot from the primary SR, and I
>> >> have had to remove it manually from time to time, before the 30-snapshot
>> >> limit is reached.
>> >> Also, CS does not update the secondary_storage count in the snapshot_count
>> >> table."
>> >>
>> >> Now I have a situation where XenServer has 19 snapshots on the primary SR
>> >> for each VM, and the per-account resource count for "secondary_storage" in
>> >> the database keeps increasing.
>> >> The ACS storage cleaner removes old snapshots from secondary storage when
>> >> they expire, but that's all: it does not decrease the "secondary_storage"
>> >> resource count by the size of the deleted snapshot (nor does it remove the
>> >> snapshot from the primary SR after the snapshot has been created and copied
>> >> to secondary).
>> >>
>> >> I still have to delete primary snapshots manually and set a new value for
>> >> "secondary_storage" before they reach the limit.
>> >>
>> >> Has anyone had this kind of issue? Any fix?
>> >>
>> >> BR
>> >> -sami
>> >>
>> >>
>> >>
>> >> 14.9.2018 12.26, "Sami Rajala (FAPPS)" :
>> >>
>> >> >Continuing:
>> >> >
>> >> >I created a template from the VM's latest snapshot after 3 days (one parent
>> >> >+ 2 child snapshots) - without errors/warnings.
>> >> >CS created a VM from this template OK, but Xen could not start the VM.
>> >> >
>> >> >So, I went back to the situation where every snapshot is full and I remove
>> >> >snapshots from primary manually.
>> >> >
>> >> >Has anyone any advice?
>> >> >
>> >> >BR
>> >> >-sami
>> >> >
>> >> >
>> >> >
>> >> >13.9.2018 7.56, "Sami Rajala (FAPPS)" :
>> >> >
>> >> >>Hello
>> >> >>
>> >> >>I have had a weird issue since I updated CS from 4.9 to 4.10.
>> >> >>CS does not delete/remove the temporary snapshot from the primary SR, and I
>> >> >>have had to remove it manually from time to time, before the 30-snapshot
>> >> >>limit is reached.
>> >> >>Also, CS does not update the secondary_storage count in the snapshot_count
>> >> >>table.
>> >> >>
>> >> >>It worked fine on 4.9 but stopped working on 4.10, and I have been waiting
>> >> >>for a chance to update to 4.11.1.
>> >> >>The update to 4.11.1 is now done, and it looks like this still does not work.
>> >> >>
>> >> >>I removed all snapshots and started over from a clean table. The snapshot
>> >> >>policy is: DAILY, keep 2, delta = 5.
>> >> >>There are now 3 snapshots on the primary SR and 3 on the secondary SR.
>> >> >>
>> >> >>The environment is Xen 7 + CS 4.11.1 + NFS storage.
>> >> >>
>> >> >>Is there some other parameter I should look at, or any other workaround I
>> >> >>have to apply to get it working?
>> >> >>
>> >> >>Any hints?
>> >> >>
>> >> >>BR
>> >> >>-sami
>> >> >>
>> >> >>VM snapshot Log for last round:
>> >> >>
>> >> >>2018-09-13 06:02:15,983 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-83:ctx-cfb1cd95 job-28020) (logid:4c7a1c38) Executing AsyncJobVO {id:28020, userId: 1, accountId: 7, instanceType: Snapshot, instanceId: 2656, cmd: org.apache.cloudstack.api.command.user.snapshot.CreateSnapshotCmd, cmdInfo: {"policyid":"17","ctxUserId":"1","volumeid":"238","ctxStartEventId":"1","id":"2656","ctxAccountId":"7"}, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 19873467853209, completeMsid: null, lastUpdated: null, lastPolled: null, created: null}
>> >> >>2018-09-13 06:02:15,991 DEBUG [c.c.u.AccountManagerImpl] (API-Job-Executor-83:ctx-cfb1cd95 job-28020 ctx-5b8156a6) (logid:4c7a1c38) Access to Acct[479c643e-9c84-41fb-9f0a-9bb999893a25-juha] granted to Acct[479c643e-9c84-41fb-9f0a-9bb999893a25-juha] by DomainChecker
>> >> >>2018-09-13 06:02:16,045 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-83:ctx-cfb1cd95 job-28020 ctx-5b8156a6) (logid:4c7a1c38) Sync job-28021 execution on object VmWork

Re: VR DHCP issue

2018-09-27 Thread Andrei Mikhailovsky

Hi Marcelo,

Could you please elaborate on the fix? I am having this issue with a
couple of my VRs (not all).

Running '/etc/init.d/cloud-early-config start' doesn't work, as there is no such
script in the init.d directory. Running 'service cloud-early-config start'
doesn't fix the DHCP issue.

Thanks

- Original Message -
> From: "Lotic Lists" 
> To: "users" 
> Sent: Friday, 21 September, 2018 03:50:12
> Subject: RE: VR DHCP issue

> Workaround is run "/etc/init.d/cloud-early-config start"
> 
> Att.
> Marcelo
> 
> -Original Message-
> From: Ivan Kudryavtsev 
> Sent: Thursday, 20 September 2018 13:46
> To: users 
> Subject: Re: VR DHCP issue
> 
> Hello, I have run into a bug like this with DHCP in the past, and in 4.9 too:
> https://github.com/fgrehm/vagrant-lxc/issues/153
> 
> I have to apply that fix to our VRs to make them always work as expected.
> 
> Thu, Sep 20, 2018, 22:52 Dag Sonstebo:
> 
>> Hi Alessandro,
>>
>> First of all have you tried to restart the networks "with cleanup"
>> (alternatively just destroyed the VRs and let them recreate)?
>>
>> Can you check the content of the following files on your problem VRs:
>>
>> /etc/dhcphosts.txt
>> /etc/cloudstack/dhcpentry.json
>>
>> Also look through the VR /var/log/cloud.log for any hints why DHCP
>> entries are not passed or parsed.
>>
>> Regards,
>> Dag Sonstebo
>> Cloud Architect
>> ShapeBlue
>>
>> On 20/09/2018, 16:39, "Alessandro Caviglione"
>> wrote:
>>
>> Hi guys,
>> I'm experiencing an issue in our CS 4.9.
>> In fact, for about a week now, some instances have randomly become unreachable.
>> After investigating, we saw that the instances do not have an IP address.
>> We tried restarting the VR, restarting the instance, and migrating both to
>> another host, but we got the same issue.
>> So we configured the instance with a fixed IP address and it came back
>> online.
>> We have had this issue on about 15 VRs...
>> Any ideas?
>>
>>
>>
>> dag.sonst...@shapeblue.com
>> www.shapeblue.com
>> Amadeus House, Floral Street, London WC2E 9DP, UK @shapeblue
>>
>>
>>
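Dag's first check boils down to confirming that the problem VM's MAC and IP actually made it into /etc/dhcphosts.txt on the VR. A minimal Python sketch of that check, assuming the file uses dnsmasq's dhcp-hostsfile style of one comma-separated "MAC,IP,hostname[,lease]" entry per line (an assumption; look at your VR's file before relying on it). The function name find_dhcp_entry and the sample entries are hypothetical.

```python
from typing import Optional, Tuple


def find_dhcp_entry(hosts_text: str, mac: str) -> Optional[Tuple[str, str, str]]:
    """Return (mac, ip, hostname) for the first entry matching `mac`, else None.

    Assumes dnsmasq dhcp-hostsfile style lines: "mac,ip,hostname[,lease]".
    """
    wanted = mac.strip().lower()
    for raw in hosts_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = [f.strip() for f in line.split(",")]
        if len(fields) >= 3 and fields[0].lower() == wanted:
            return fields[0], fields[1], fields[2]
    return None


if __name__ == "__main__":
    # Illustrative sample only; on a real VR read /etc/dhcphosts.txt instead.
    sample = (
        "02:00:4c:5f:00:01,10.1.1.10,vm-one,infinite\n"
        "02:00:4c:5f:00:02,10.1.1.11,vm-two,infinite\n"
    )
    print(find_dhcp_entry(sample, "02:00:4C:5F:00:02"))
    # A missing entry would be consistent with the VM never getting an IP address.
    print(find_dhcp_entry(sample, "02:00:4c:5f:00:99"))
```

The MAC comparison is case-insensitive because the UI and the VR files do not always agree on case.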

Re: Broken guest vm consoles after upgrading to 4.11.1.0

2018-07-11 Thread Andrei Mikhailovsky

Hi Ivan,

You are right, changing the enable SSL option in the global settings does fix 
the issue. I had been testing from two different browsers, and one of them 
didn't have the right certificate installed on the client side, so the consoles 
didn't work there. However, the correctly set-up browser started to work once 
the enable SSL option was set to True.

Thanks for your help

Andrei
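For anyone who would rather script the change than click through the UI, a global setting can also be changed with the updateConfiguration API call. Below is a minimal Python sketch of building a signed CloudStack API query string, following the documented signing scheme (sort the parameters, URL-encode the values, lowercase the whole string, HMAC-SHA1 it with the secret key, Base64-encode the result). The keys, the management server URL, and the setting name consoleproxy.sslEnabled are assumptions here; check the exact name in your Global Settings first.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def signed_query(params: dict, api_key: str, secret_key: str) -> str:
    """Return a signed CloudStack API query string (sketch of the documented scheme)."""
    all_params = dict(params, apiKey=api_key, response="json")
    # Sort by parameter name and URL-encode each value; '*' is left as-is and
    # spaces become %20 (not '+').
    pairs = sorted((k, quote(str(v), safe="*")) for k, v in all_params.items())
    query = "&".join(f"{k}={v}" for k, v in pairs)
    # HMAC-SHA1 over the lowercased query string, then Base64 and URL-encode it.
    digest = hmac.new(secret_key.encode(), query.lower().encode(), hashlib.sha1).digest()
    signature = quote(base64.b64encode(digest).decode(), safe="")
    return f"{query}&signature={signature}"


if __name__ == "__main__":
    # Hypothetical keys; the setting name is an assumption, check Global Settings.
    q = signed_query(
        {"command": "updateConfiguration", "name": "consoleproxy.sslEnabled", "value": "true"},
        api_key="YOUR_API_KEY",
        secret_key="YOUR_SECRET_KEY",
    )
    print("http://<management-server>:8080/client/api?" + q)
```

As in this thread, the management server and the CPVM still need restarting afterwards for the change to take effect.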


- Original Message -
> From: "Ivan Kudryavtsev" 
> To: "users" 
> Sent: Monday, 9 July, 2018 17:13:42
> Subject: Re: Broken guest vm consoles after upgrading to 4.11.1.0

> Try recreating the CPVM; it worked for me. I haven't run into this problem with
> wrong ports... Have you uploaded the SSL chain to ACS?
> 
> On Mon, 9 Jul 2018, 23:05 Andrei Mikhailovsky wrote:
> 
>> Ivan, thanks.
>>
>> I have found this option and changed it from the default value of False to True.
>> I restarted the management server and the CPVM. I can now see that the
>> generated link has changed to the IP address + domain (in the form of
>> x-x-x-x.domain.com). However, this did not solve the problem, as it is
>> trying to connect over port 443. The CPVM is not listening on that port,
>> only on port 80. So, it is not really helping me.
>>
>> Andrei
>>
>> - Original Message -
>> > From: "Ivan Kudryavtsev" 
>> > To: "users" 
>> > Sent: Monday, 9 July, 2018 11:40:07
>> > Subject: Re: Broken guest vm consoles after upgrading to 4.11.1.0
>>
>> > Hey, Andrei. There is a parameter in the global settings about SSL and the
>> > CPVM which fixes it. I don't remember the name, but I ran into it as well.
>> > I suppose it's a bug.
>> >
>> > On Mon, 9 Jul 2018, 17:35 Andrei Mikhailovsky wrote:
>> >
>> >> Hello everyone,
>> >>
>> >> I have upgraded ACS from 4.11.0.0 to 4.11.1.0 over the weekend and have
>> >> noticed that after performing all the usual steps, like upgrading the
>> >> virtual routers and recreating the console proxy / SSVM, I have lost access
>> >> to the VM consoles (both guest VMs and system VMs). I have created the host
>> >> keys by clicking the button in the ACS GUI. All hosts seem to have done
>> >> this successfully, with the Status changing from Unsecure to Up. Console
>> >> access worked just fine prior to the 4.11.1.0 upgrade.
>> >>
>> >> When I click on the Console button, a new browser window pops up. The page
>> >> is empty. Inspecting the source, I get the following (modified a bit to
>> >> save space, with the domain name replaced):
>> >>
>> >>
>> >> VM-Name
>> >> ... src="http://*.DOMAIN.com/ajax?token=qxXZQlpCi7xa-o8XgJM6Z_fb<... STUFF HERE>" ...
>> >>
>> >> Looking at the above, it is obvious that *.DOMAIN.com is not valid. If
>> >> I copy the URL and change *.DOMAIN.com to the public IP address of the
>> >> console proxy, I get access to the console just fine.
>> >>
>> >> Cheers
>> >>
>> >>
>> >>

Re: Broken guest vm consoles after upgrading to 4.11.1.0

2018-07-09 Thread Andrei Mikhailovsky

Thanks Andrija, I will look into this tomorrow.

Cheers



- Original Message -
> From: "Andrija Panic" 
> To: "users" 
> Sent: Monday, 9 July, 2018 22:58:09
> Subject: Re: Broken guest vm consoles after upgrading to 4.11.1.0

> In 4.8, to make sure you are NOT hitting the improper SSL chain build,
> you could grep the MGMT logs for the following line after the MGMT server
> restart:
> 
> "Could not find and construct a valid SSL certificate"
> 
> but in 4.11 (master) I can't find this string when searching the
> repo... strange...
> 
> 
> On Mon, 9 Jul 2018 at 23:35, Andrija Panic  wrote:
> 
>> HI Andrei,
>>
>> I will share my setup, ACS 4.8 though. We also had a "similar" issue going
>> from 4.5 to 4.8; there were some settings that needed to be on
>> (for whatever reason). I hope this will help:
>>
>> consoleproxy.url.domain      *.consoleproxy.net (yes, we did buy that one :D)
>> secstorage.ssl.cert.domain   *.consoleproxy.net
>> secstorage.encrypt.copy      true (I believe this was the one change required!)
>>
>> (Sorry if this is not helpful, I know you are fighting with 4.11.)
>>
>> Anyhow, I would suggest examining the keystore DB records to see whether
>> they are still correct and in the correct sequence. Since you say that the
>> CPVM is not listening on 443, it may well be an SSL chain issue.
>>
>> Cheers
>>
>>
>>
>>
>>
>> On Mon, 9 Jul 2018 at 18:23, Andrei Mikhailovsky wrote:
>>
>>> Hi Ivan,
>>>
>>> I have recreated the CPVM, but that didn't help. The SSL cert + chain was
>>> uploaded a few years ago and was working just fine up to the upgrade
>>> to 4.11.1.0.
>>>
>>> So, the issue must be somewhere else, I guess.
>>>
>>> Andrei
>>>
>>> - Original Message -
>>> > From: "Ivan Kudryavtsev" 
>>> > To: "users" 
>>> > Sent: Monday, 9 July, 2018 17:13:42
>>> > Subject: Re: Broken guest vm consoles after upgrading to 4.11.1.0
>>>
>>> > Try recreating the CPVM; it worked for me. I haven't run into this problem
>>> > with wrong ports... Have you uploaded the SSL chain to ACS?
>>> >
>>> > On Mon, 9 Jul 2018, 23:05 Andrei Mikhailovsky wrote:
>>> >
>>> >> Ivan, thanks.
>>> >>
>>> >> I have found this option and changed it from the default value of False
>>> >> to True. I restarted the management server and the CPVM. I can now see
>>> >> that the generated link has changed to the IP address + domain (in the
>>> >> form of x-x-x-x.domain.com). However, this did not solve the problem, as
>>> >> it is trying to connect over port 443. The CPVM is not listening on that
>>> >> port, only on port 80. So, it is not really helping me.
>>> >>
>>> >> Andrei
>>> >>
>>> >> - Original Message -
>>> >> > From: "Ivan Kudryavtsev" 
>>> >> > To: "users" 
>>> >> > Sent: Monday, 9 July, 2018 11:40:07
>>> >> > Subject: Re: Broken guest vm consoles after upgrading to 4.11.1.0
>>> >>
>>> >> > Hey, Andrei. There is a parameter in the global settings about SSL and
>>> >> > the CPVM which fixes it. I don't remember the name, but I ran into it as
>>> >> > well. I suppose it's a bug.
>>> >> >
>>> >> > On Mon, 9 Jul 2018, 17:35 Andrei Mikhailovsky wrote:
>>> >> >
>>> >> >> Hello everyone,
>>> >> >>
>>> >> >> I have upgraded ACS from 4.11.0.0 to 4.11.1.0 over the weekend and
>>> >> >> have noticed that after performing all the usual steps, like upgrading
>>> >> >> the virtual routers and recreating the console proxy / SSVM, I have
>>> >> >> lost access to the VM consoles (both guest VMs and system VMs). I have
>>> >> >> created the host keys by clicking the button in the ACS GUI. All hosts
>>> >> >> seem to have done this successfully, with the Status changing from
>>> >> >> Unsecure to Up. Console access worked just fine prior to the 4.11.1.0
>>> >> >> upgrade.
>>> >> >>
>>> >> >> When I click on the Console button, a new browser window pops up. The
>>> >> >> page is empty. Inspecting the source, I get the following (modified a
>>> >> >> bit to save space, with the domain name replaced):
>>> >> >>
>>> >> >>
>>> >> >> VM-Name
>>> >> >> ... src="http://*.DOMAIN.com/ajax?token=qxXZQlpCi7xa-o8XgJM6Z_fb<... STUFF HERE>" ...
>>> >> >>
>>> >> >> Looking at the above, it is obvious that *.DOMAIN.com is not valid.
>>> >> >> If I copy the URL and change *.DOMAIN.com to the public IP address of
>>> >> >> the console proxy, I get access to the console just fine.
>>> >> >>
>>> >> >> Cheers
>>> >> >>
>>> >> >>
>>> >> >>
>>>
>>
>>
>> --
>>
>> Andrija Panić
>>
> 
> 
> --
> 
> Andrija Panić
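Andrija's check (grepping the management server log for the SSL chain error after a restart) is easy to script. A minimal Python sketch follows; the log path in the usage line, /var/log/cloudstack/management/management-server.log, is an assumption for a typical package install, and as Andrija notes, the message may not exist at all in 4.11, so an empty result there proves little.

```python
from pathlib import Path
from typing import Iterable, List

MARKER = "Could not find and construct a valid SSL certificate"


def find_ssl_chain_errors(lines: Iterable[str], marker: str = MARKER) -> List[str]:
    """Return the log lines that contain the SSL chain error marker."""
    return [line.rstrip("\n") for line in lines if marker in line]


def scan_log(path: str) -> List[str]:
    # errors="replace" keeps the scan going if the log contains stray non-UTF-8 bytes.
    with Path(path).open(encoding="utf-8", errors="replace") as fh:
        return find_ssl_chain_errors(fh)


if __name__ == "__main__":
    # Assumed default location; adjust for your installation.
    for hit in scan_log("/var/log/cloudstack/management/management-server.log"):
        print(hit)
```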

Re: Broken guest vm consoles after upgrading to 4.11.1.0

2018-07-09 Thread Andrei Mikhailovsky

Hi Ivan,

I have recreated the CPVM, but that didn't help. The SSL cert + chain was 
uploaded a few years ago and was working just fine up to the upgrade to 
4.11.1.0.

So, the issue must be somewhere else, I guess.

Andrei

- Original Message -
> From: "Ivan Kudryavtsev" 
> To: "users" 
> Sent: Monday, 9 July, 2018 17:13:42
> Subject: Re: Broken guest vm consoles after upgrading to 4.11.1.0

> Try recreating the CPVM; it worked for me. I haven't run into this problem with
> wrong ports... Have you uploaded the SSL chain to ACS?
> 
> On Mon, 9 Jul 2018, 23:05 Andrei Mikhailovsky wrote:
> 
>> Ivan, thanks.
>>
>> I have found this option and changed it from the default value of False to True.
>> I restarted the management server and the CPVM. I can now see that the
>> generated link has changed to the IP address + domain (in the form of
>> x-x-x-x.domain.com). However, this did not solve the problem, as it is
>> trying to connect over port 443. The CPVM is not listening on that port,
>> only on port 80. So, it is not really helping me.
>>
>> Andrei
>>
>> - Original Message -
>> > From: "Ivan Kudryavtsev" 
>> > To: "users" 
>> > Sent: Monday, 9 July, 2018 11:40:07
>> > Subject: Re: Broken guest vm consoles after upgrading to 4.11.1.0
>>
>> > Hey, Andrei. There is a parameter in the global settings about SSL and the
>> > CPVM which fixes it. I don't remember the name, but I ran into it as well.
>> > I suppose it's a bug.
>> >
>> > On Mon, 9 Jul 2018, 17:35 Andrei Mikhailovsky wrote:
>> >
>> >> Hello everyone,
>> >>
>> >> I have upgraded ACS from 4.11.0.0 to 4.11.1.0 over the weekend and have
>> >> noticed that after performing all the usual steps, like upgrading the
>> >> virtual routers and recreating the console proxy / SSVM, I have lost access
>> >> to the VM consoles (both guest VMs and system VMs). I have created the host
>> >> keys by clicking the button in the ACS GUI. All hosts seem to have done
>> >> this successfully, with the Status changing from Unsecure to Up. Console
>> >> access worked just fine prior to the 4.11.1.0 upgrade.
>> >>
>> >> When I click on the Console button, a new browser window pops up. The page
>> >> is empty. Inspecting the source, I get the following (modified a bit to
>> >> save space, with the domain name replaced):
>> >>
>> >>
>> >> VM-Name
>> >> ... src="http://*.DOMAIN.com/ajax?token=qxXZQlpCi7xa-o8XgJM6Z_fb<... STUFF HERE>" ...
>> >>
>> >> Looking at the above, it is obvious that *.DOMAIN.com is not valid. If
>> >> I copy the URL and change *.DOMAIN.com to the public IP address of the
>> >> console proxy, I get access to the console just fine.
>> >>
>> >> Cheers
>> >>
>> >>
>> >>

Re: Broken guest vm consoles after upgrading to 4.11.1.0

2018-07-09 Thread Andrei Mikhailovsky

Ivan, thanks.

I have found this option and changed it from the default value of False to 
True. I restarted the management server and the CPVM. I can now see that the 
generated link has changed to the IP address + domain (in the form of 
x-x-x-x.domain.com). However, this did not solve the problem, as it is trying 
to connect over port 443. The CPVM is not listening on that port, only on port 
80. So, it is not really helping me.

Andrei
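The x-x-x-x.domain.com form described above is the console proxy's public IP with its dots replaced by dashes, substituted for the "*" label of the wildcard domain. A minimal Python sketch of that mapping, useful for sanity-checking what the generated link should look like; the helper name is hypothetical, 192.0.2.10 is a placeholder address, and this mirrors the behaviour observed in this thread rather than CloudStack's own code.

```python
import ipaddress


def console_proxy_host(public_ip: str, wildcard_domain: str) -> str:
    """Map ("192.0.2.10", "*.domain.com") to "192-0-2-10.domain.com"."""
    ip = ipaddress.IPv4Address(public_ip)  # raises ValueError for a malformed address
    if not wildcard_domain.startswith("*."):
        raise ValueError("expected a wildcard domain such as *.domain.com")
    # Replace the '*' label with the dotted quad written with dashes.
    return str(ip).replace(".", "-") + wildcard_domain[1:]


if __name__ == "__main__":
    print(console_proxy_host("192.0.2.10", "*.domain.com"))
```

Note that a correct hostname only helps if DNS resolves the dashed names to the matching IPs and the CPVM is actually serving HTTPS on port 443, which is exactly the part that was failing here.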

- Original Message -
> From: "Ivan Kudryavtsev" 
> To: "users" 
> Sent: Monday, 9 July, 2018 11:40:07
> Subject: Re: Broken guest vm consoles after upgrading to 4.11.1.0

> Hey, Andrei. There is a parameter in the global settings about SSL and the CPVM
> which fixes it. I don't remember the name, but I ran into it as well. I suppose
> it's a bug.
> 
> On Mon, 9 Jul 2018, 17:35 Andrei Mikhailovsky wrote:
> 
>> Hello everyone,
>>
>> I have upgraded ACS from 4.11.0.0 to 4.11.1.0 over the weekend and have
>> noticed that after performing all the usual steps, like upgrading the virtual
>> routers and recreating the console proxy / SSVM, I have lost access to the VM
>> consoles (both guest VMs and system VMs). I have created the host keys by
>> clicking the button in the ACS GUI. All hosts seem to have done this
>> successfully, with the Status changing from Unsecure to Up. Console access
>> worked just fine prior to the 4.11.1.0 upgrade.
>>
>> When I click on the Console button, a new browser window pops up. The page
>> is empty. Inspecting the source, I get the following (modified a bit to save
>> space, with the domain name replaced):
>>
>>
>> VM-Name
>> ... src="http://*.DOMAIN.com/ajax?token=qxXZQlpCi7xa-o8XgJM6Z_fb<... STUFF HERE>" ...
>>
>> Looking at the above, it is obvious that *.DOMAIN.com is not valid. If
>> I copy the URL and change *.DOMAIN.com to the public IP address of the
>> console proxy, I get access to the console just fine.
>>
>> Cheers
>>
>>
>>

Agent logs having Exceptions after upgrading to 4.11.1.0

2018-07-09 Thread Andrei Mikhailovsky

Hello everyone, 

I have started seeing a number of Exception entries similar to the one below: 

2018-07-09 11:28:41,582 INFO [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) Connected to the host: 192.168.169.13
2018-07-09 11:33:35,631 WARN [cloud.agent.Agent] (agentRequest-Handler-2:null) (logid:3d56b608) Caught:
java.lang.NumberFormatException: For input string: "iptables"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Long.parseLong(Long.java:589)
    at java.lang.Long.parseLong(Long.java:631)
    at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.getNetworkStats(LibvirtComputingResource.java:1926)
    at com.cloud.hypervisor.kvm.resource.wrapper.LibvirtNetworkUsageCommandWrapper.execute(LibvirtNetworkUsageCommandWrapper.java:54)
    at com.cloud.hypervisor.kvm.resource.wrapper.LibvirtNetworkUsageCommandWrapper.execute(LibvirtNetworkUsageCommandWrapper.java:29)
    at com.cloud.hypervisor.kvm.resource.wrapper.LibvirtRequestWrapper.execute(LibvirtRequestWrapper.java:78)
    at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.executeRequest(LibvirtComputingResource.java:1449)
    at com.cloud.agent.Agent.processRequest(Agent.java:644)
    at com.cloud.agent.Agent$AgentRequestHandler.doTask(Agent.java:1082)
    at com.cloud.utils.nio.Task.call(Task.java:83)
    at com.cloud.utils.nio.Task.call(Task.java:29)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)


I have not seen these in agent.log before; they only started appearing after the 
4.11.1.0 upgrade. The hosts are KVM on Ubuntu 16.04 with the latest updates. 

iptables --version 
iptables v1.6.0 

Cheers 

Andrei
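The trace shows Long.parseLong() being handed the word "iptables" inside getNetworkStats(), which suggests that some non-numeric text (for example a message printed by iptables) ended up where a byte counter was expected. That is an inference from the stack trace, not a confirmed diagnosis. The Python sketch below only reproduces the failure mode and a defensive alternative; the whitespace-separated output it parses is a hypothetical format for illustration, not the agent's real one.

```python
from typing import List


def parse_counters_strict(output: str) -> List[int]:
    # Like calling Long.parseLong() on every token: one stray word raises.
    return [int(tok) for tok in output.split()]


def parse_counters_lenient(output: str) -> List[int]:
    # Keep only purely numeric tokens, ignoring stray text such as warnings.
    return [int(tok) for tok in output.split() if tok.isdecimal()]


if __name__ == "__main__":
    noisy = "iptables\n1024 2048"  # hypothetical output with a stray word
    try:
        parse_counters_strict(noisy)
    except ValueError as exc:
        print("strict parse failed:", exc)
    print(parse_counters_lenient(noisy))
```

The lenient variant trades correctness for robustness: silently dropping tokens can hide a real problem, so the better fix is finding out why that text shows up in the command output at all.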

Broken guest vm consoles after upgrading to 4.11.1.0

2018-07-09 Thread Andrei Mikhailovsky

Hello everyone, 

I have upgraded ACS from 4.11.0.0 to 4.11.1.0 over the weekend and have noticed 
that after performing all the usual steps, like upgrading the virtual routers and 
recreating the console proxy / SSVM, I have lost access to the VM consoles (both 
guest VMs and system VMs). I have created the host keys by clicking the button 
in the ACS GUI. All hosts seem to have done this successfully, with the Status 
changing from Unsecure to Up. Console access worked just fine prior to the 
4.11.1.0 upgrade. 

When I click on the Console button, a new browser window pops up. The page is 
empty. Inspecting the source, I get the following (modified a bit to save space, 
with the domain name replaced): 



VM-Name
... src="http://*.DOMAIN.com/ajax?token=qxXZQlpCi7xa-o8XgJM6Z_fb<... STUFF HERE>" ...

Looking at the above, it is obvious that *.DOMAIN.com is not valid. If I 
copy the URL and change *.DOMAIN.com to the public IP address of the 
console proxy, I get access to the console just fine. 

Cheers
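The manual workaround above (replacing the invalid *.DOMAIN.com host with the console proxy's public IP) amounts to a simple URL rewrite. A minimal Python sketch using only the standard library; 192.0.2.10 is a placeholder for the CPVM's public IP, and this is a debugging aid, not a fix for the link generation itself.

```python
from urllib.parse import urlsplit, urlunsplit


def replace_host(url: str, new_host: str) -> str:
    """Swap the host in `url` for `new_host`, keeping scheme, port, path and query.

    Any user:password part of the URL is dropped.
    """
    parts = urlsplit(url)
    netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    broken = "http://*.DOMAIN.com/ajax?token=qxXZQlpCi7xa-o8XgJM6Z_fb"
    print(replace_host(broken, "192.0.2.10"))  # placeholder for the CPVM public IP
```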

Re: [RESULT][VOTE] Apache CloudStack 4.11.1.0

2018-06-27 Thread Andrei Mikhailovsky

Congratulations everyone on making this happen! Well done guys!

Andrei

- Original Message -
> From: "Paul Angus" 
> To: "dev" , "users" 
> Sent: Tuesday, 26 June, 2018 17:09:52
> Subject: [RESULT][VOTE] Apache CloudStack 4.11.1.0

> Hi All,
> 
> After 72 hours, the vote for CloudStack 4.11.1.0 *passes* with
> 3 PMC + 2 non-PMC votes.
> 
> +1 (PMC / binding)
> 
> Rohit Yadav
> 
> Paul Angus
> 
> Mike Tutkowski
> 
> +1 (non binding)
> 
> Nicolas Vazquez
> 
> Boris Stoyanov
> 
> 0
> Rene Moser
> 
> -1
> none
> 
> Thanks to everyone participating.
> 
> I will now prepare the release announcement to go out after 24 hours, to give
> the mirrors time to catch up.
> 
> 
> Kind regards,
> 
> Paul Angus
> 
> 
> 
> paul.an...@shapeblue.com
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London WC2N 4HS, UK
> @shapeblue
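Paul's tally follows the Apache release vote rule: a release passes with at least three binding (PMC) +1 votes and more binding +1 than binding -1 votes; non-binding votes are recorded but do not count toward the threshold. A small Python sketch of that check applied to the votes listed above. Rene's binding status is not stated in the result mail, so it is recorded as non-binding here; a 0 vote does not affect the outcome either way.

```python
from typing import List, NamedTuple


class Vote(NamedTuple):
    voter: str
    value: int      # +1, 0 or -1
    binding: bool   # True for PMC members


def release_vote_passes(votes: List[Vote]) -> bool:
    """At least three binding +1 votes and more binding +1 than binding -1."""
    plus = sum(1 for v in votes if v.binding and v.value == 1)
    minus = sum(1 for v in votes if v.binding and v.value == -1)
    return plus >= 3 and plus > minus


VOTES_4_11_1_0 = [
    Vote("Rohit Yadav", 1, True),
    Vote("Paul Angus", 1, True),
    Vote("Mike Tutkowski", 1, True),
    Vote("Nicolas Vazquez", 1, False),
    Vote("Boris Stoyanov", 1, False),
    Vote("Rene Moser", 0, False),  # binding status not stated; a 0 counts either way
]

if __name__ == "__main__":
    print(release_vote_passes(VOTES_4_11_1_0))
```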

Re: Upgrade from ACS 4.9.3 to 4.11.0

2018-04-20 Thread Andrei Mikhailovsky

Hello everyone,

Further investigation pinned the problem down to the rules below.
All traffic from the internal IPs of the static NATed connections was forced to 
go out of the outside interface (eth1), by setting the mark 0x1 and then using 
the matching ip rule to direct it. 

# iptables -t mangle -L PREROUTING -vn
Chain PREROUTING (policy ACCEPT 97 packets, 11395 bytes)
 pkts bytes target     prot opt in     out     source               destination
   49  3644 CONNMARK   all  --  *      *       10.1.10.100          0.0.0.0/0            state NEW CONNMARK save
   37  2720 MARK       all  --  *      *       10.1.20.100          0.0.0.0/0            state NEW MARK set 0x1
   37  2720 CONNMARK   all  --  *      *       10.1.20.100          0.0.0.0/0            state NEW CONNMARK save
  114  8472 MARK       all  --  *      *       10.1.10.100          0.0.0.0/0            state NEW MARK set 0x1
  114  8472 CONNMARK   all  --  *      *       10.1.10.100          0.0.0.0/0            state NEW CONNMARK save


# ip rule
0:  from all lookup local 
32761:  from all fwmark 0x3 lookup Table_eth3 
32762:  from all fwmark 0x2 lookup Table_eth2 
32763:  from all fwmark 0x1 lookup Table_eth1 
32764:  from 10.1.0.0/16 lookup static_route_back 
32765:  from 10.1.0.0/16 lookup static_route 
32766:  from all lookup main 
32767:  from all lookup default 
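The ip rule list above is evaluated in priority order: the kernel tries the table of each matching rule in turn and stops at the first one that has a route. A simplified Python model of that selection (it only handles the fwmark and source-prefix selectors that appear above, and the table names are copied from the output) shows why any packet carrying mark 0x1 is looked up in Table_eth1, and so sent out of eth1, before the main table is ever consulted. The "local" table only holds the router's own addresses, so forwarded traffic falls through it.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    priority: int
    table: str
    fwmark: Optional[int] = None   # None means "match any mark"
    src: Optional[str] = None      # None means "from all"


RULES: List[Rule] = [
    Rule(0, "local"),
    Rule(32761, "Table_eth3", fwmark=0x3),
    Rule(32762, "Table_eth2", fwmark=0x2),
    Rule(32763, "Table_eth1", fwmark=0x1),
    Rule(32764, "static_route_back", src="10.1.0.0/16"),
    Rule(32765, "static_route", src="10.1.0.0/16"),
    Rule(32766, "main"),
    Rule(32767, "default"),
]


def tables_to_try(src_ip: str, mark: int, rules: List[Rule] = RULES) -> List[str]:
    """Return the tables of all matching rules, in the order they are consulted."""
    src = ipaddress.ip_address(src_ip)
    matched = []
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.fwmark is not None and rule.fwmark != mark:
            continue  # mark selector does not match
        if rule.src is not None and src not in ipaddress.ip_network(rule.src):
            continue  # source selector does not match
        matched.append(rule.table)
    return matched


if __name__ == "__main__":
    print(tables_to_try("10.1.10.100", 0x1))  # marked traffic from virt1
    print(tables_to_try("10.1.10.100", 0x0))  # the same traffic without the mark
```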


The acceptable solution is to delete those rules altogether.

The problem with this approach is that traffic between the VPC tiers will then 
use the internal IP addresses, 
so the packets going from 178.248.108.77 to 178.248.108.113
would be seen as communication between 10.1.10.100 and 10.1.20.100.

Thus we need to apply two further rules
# iptables -t nat -I POSTROUTING -o eth3 -s 10.1.10.0/24 -d 10.1.20.0/24 -j SNAT --to-source 178.248.108.77
# iptables -t nat -I POSTROUTING -o eth2 -s 10.1.20.0/24 -d 10.1.10.0/24 -j SNAT --to-source 178.248.108.113

in order to make sure that the packets leaving the router have the correct 
source IP.

This way it is possible to have static NAT on all of the IPs within the VPC and 
ensure successful communication between them.
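The two SNAT rules follow a pattern: traffic from tier A leaving on tier B's interface gets A's public IP as its source. A minimal Python sketch that generates those commands from a tier description; the Tier structure is hypothetical, and giving each tier a single static NAT IP only matches this example, where one VM per tier sits behind static NAT.

```python
from itertools import permutations
from typing import List, NamedTuple


class Tier(NamedTuple):
    cidr: str        # tier network, e.g. "10.1.10.0/24"
    iface: str       # router interface facing that tier
    public_ip: str   # static NAT address used as the SNAT source


def snat_commands(tiers: List[Tier]) -> List[str]:
    """One SNAT rule per ordered pair of tiers (n * (n - 1) rules in total)."""
    cmds = []
    for src, dst in permutations(tiers, 2):
        cmds.append(
            f"iptables -t nat -I POSTROUTING -o {dst.iface} -s {src.cidr} "
            f"-d {dst.cidr} -j SNAT --to-source {src.public_ip}"
        )
    return cmds


if __name__ == "__main__":
    tier1 = Tier("10.1.10.0/24", "eth2", "178.248.108.77")
    tier2 = Tier("10.1.20.0/24", "eth3", "178.248.108.113")
    for cmd in snat_commands([tier1, tier2]):
        print(cmd)
```

For the two tiers in this thread it prints exactly the two commands above; with three tiers it would produce six.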


So, as a quick and dirty fix, we ran this command on the VR:

for i in $(iptables -t mangle -L PREROUTING -vn | awk '/0x1/ && !/eth1/ {print $8}'); do iptables -t mangle -D PREROUTING -s "$i" -m state --state NEW -j MARK --set-mark 0x1; done

Obviously, this command has to be re-run every time the router is restarted / 
recreated.
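The awk filter in that loop picks the source column ($8) of every mangle PREROUTING rule whose line contains 0x1 but not eth1. The same selection in Python is handy for dry-running which rules would be deleted before touching a live router; it assumes the standard `iptables -L -vn` column layout (pkts, bytes, target, prot, opt, in, out, source, destination), and the function name is hypothetical.

```python
from typing import List


def marked_sources(listing: str, mark: str = "0x1", skip_iface: str = "eth1") -> List[str]:
    """Return the source column of rules mentioning `mark` but not `skip_iface`.

    Substring match, like awk's /0x1/ pattern (so it would also match 0x10).
    """
    sources = []
    for line in listing.splitlines():
        if mark in line and skip_iface not in line:
            fields = line.split()
            if len(fields) >= 8:
                sources.append(fields[7])  # awk's $8 is index 7
    return sources


if __name__ == "__main__":
    sample = """Chain PREROUTING (policy ACCEPT 97 packets, 11395 bytes)
 pkts bytes target     prot opt in     out     source               destination
   49  3644 CONNMARK   all  --  *      *       10.1.10.100          0.0.0.0/0            state NEW CONNMARK save
   37  2720 MARK       all  --  *      *       10.1.20.100          0.0.0.0/0            state NEW MARK set 0x1
  114  8472 MARK       all  --  *      *       10.1.10.100          0.0.0.0/0            state NEW MARK set 0x1
"""
    for ip in marked_sources(sample):
        print(f"iptables -t mangle -D PREROUTING -s {ip} -m state --state NEW -j MARK --set-mark 0x1")
```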

I will post our findings to the dev list, so that the person(s) responsible for 
the Source NAT code in the VPC can test/fix as needed.

Cheers

- Original Message -
> From: "Andrei Mikhailovsky" 
> To: "users" 
> Sent: Tuesday, 17 April, 2018 22:42:48
> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0

> Hello
> 
> The VPC networking issue has been investigated further. Here are our findings
> with 4.11.0 VPC and Static NAT public IPs:
> 
> 
> 
> Problem: no connectivity between virtual machines behind two Static NAT
> networks.
> 
> Situation: when one virtual machine sends a packet to the external address of
> another virtual machine handled by the same router, and both are behind
> Static NAT, the traffic does not get through.
> 
> 
> 
>   10.1.10.100   10.1.10.1:eth2  eth3:10.1.20.1   10.1.20.100
> virt1 <--->  router  <--->   virt2
>   178.248.108.77:eth1:178.248.108.113
> 
> 
> A single packet is sent from virt1 to virt2.
> 
> 
> stage1: it arrives at the router on eth2 and enters "nat_PREROUTING"
> IN=eth2 OUT= SRC=10.1.10.100 DST=178.248.108.113
> 
> It goes through the "10  1K DNAT  all  --  *  *  0.0.0.0/0  178.248.108.113  to:10.1.20.100"
> rule and has its DST DNATed to the internal IP of virt2.
> 
> 
> stage2: it enters the FORWARD chain and is DROPPED by the default policy.
> DROPPED:IN=eth2 OUT=eth1 SRC=10.1.10.100 DST=10.1.20.100
> 
> The reason is that the OUT interface is not correctly changed from eth1 to
> eth3 during nat_PREROUTING,
> so the packet is not matched by the FORWARD rule and thus not accepted:
> "24  14K  ACL_INBOUND_eth3  all  --  *  eth3  0.0.0.0/0  10.1.20.0/24"
> 
> 
> stage3: with a manually inserted rule accepting this packet for forwarding,
> the packet enters the "nat_POSTROUTING" chain
> IN= OUT=eth1 SRC=10.1.10.100 DST=10.1.20.100
> 
> and has its SRC changed to the external IP
>  16  1320 SNAT   all  --  *  eth1  10.1.10.100  0.0.0.0/0
>

Re: Upgrade from ACS 4.9.3 to 4.11.0

2018-04-17 Thread Andrei Mikhailovsky

Hello

The VPC networking issue has been investigated further. Here are our findings 
with 4.11.0 VPC and Static NAT public IPs:



Problem: no connectivity between virtual machines behind two Static NAT 
networks.

Situation: when one virtual machine sends a packet to the external address of 
another virtual machine handled by the same router, and both are behind 
Static NAT, the traffic does not get through.



   10.1.10.100   10.1.10.1:eth2  eth3:10.1.20.1   10.1.20.100
virt1 <--->  router  <--->   virt2 
   178.248.108.77:eth1:178.248.108.113


A single packet is sent from virt1 to virt2. 


stage1: it arrives at the router on eth2 and enters "nat_PREROUTING"
IN=eth2 OUT= SRC=10.1.10.100 DST=178.248.108.113

It goes through the "10  1K DNAT  all  --  *  *  0.0.0.0/0  178.248.108.113  to:10.1.20.100"
rule and has its DST DNATed to the internal IP of virt2. 


stage2: it enters the FORWARD chain and is DROPPED by the default policy.
DROPPED:IN=eth2 OUT=eth1 SRC=10.1.10.100 DST=10.1.20.100 

The reason is that the OUT interface is not correctly changed from eth1 
to eth3 during nat_PREROUTING, 
so the packet is not matched by the FORWARD rule and thus not accepted:
"24  14K  ACL_INBOUND_eth3  all  --  *  eth3  0.0.0.0/0  10.1.20.0/24"


stage3: with a manually inserted rule accepting this packet for forwarding,
the packet enters the "nat_POSTROUTING" chain
IN= OUT=eth1 SRC=10.1.10.100 DST=10.1.20.100

and has its SRC changed to the external IP 
  16  1320 SNAT   all  --  *  eth1  10.1.10.100  0.0.0.0/0  to:178.248.108.77

and is sent to the external network on eth1.
13:37:44.834341 IP 178.248.108.77 > 10.1.20.100: ICMP echo request, id 2644, seq 2, length 64


For some reason, during the nat_PREROUTING stage the DST_IP is changed, but the 
OUT interface still reflects the interface associated with the old DST_IP.

Here is the routing table
# ip route list
default via 178.248.108.1 dev eth1 
10.1.10.0/24 dev eth2 proto kernel scope link src 10.1.10.1 
10.1.20.0/24 dev eth3 proto kernel scope link src 10.1.20.1 
169.254.0.0/16 dev eth0 proto kernel scope link src 169.254.0.5 
178.248.108.0/25 dev eth1 proto kernel scope link src 178.248.108.101 

# ip rule list 
0:  from all lookup local 
32761:  from all fwmark 0x3 lookup Table_eth3 
32762:  from all fwmark 0x2 lookup Table_eth2 
32763:  from all fwmark 0x1 lookup Table_eth1 
32764:  from 10.1.0.0/16 lookup static_route_back 
32765:  from 10.1.0.0/16 lookup static_route 
32766:  from all lookup main 
32767:  from all lookup default 
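A quick way to see that the eth1 egress does not come from the main routing table: a longest-prefix-match lookup over the routes listed above sends the DNATed destination 10.1.20.100 to eth3, and only the pre-DNAT address 178.248.108.113 resolves to eth1. The Python sketch below models just that lookup over the main table (it ignores the ip rule / fwmark step, which the later follow-up in this thread identified as the source of the eth1 choice).

```python
import ipaddress
from typing import List, Tuple

# (prefix, device) pairs copied from the "ip route list" output above.
MAIN_TABLE: List[Tuple[str, str]] = [
    ("0.0.0.0/0", "eth1"),          # default via 178.248.108.1
    ("10.1.10.0/24", "eth2"),
    ("10.1.20.0/24", "eth3"),
    ("169.254.0.0/16", "eth0"),
    ("178.248.108.0/25", "eth1"),
]


def out_device(dst_ip: str, table: List[Tuple[str, str]] = MAIN_TABLE) -> str:
    """Return the device of the most specific route containing dst_ip."""
    dst = ipaddress.ip_address(dst_ip)
    candidates = [
        (ipaddress.ip_network(prefix), dev)
        for prefix, dev in table
        if dst in ipaddress.ip_network(prefix)
    ]
    if not candidates:
        raise LookupError(f"no route to {dst_ip}")
    # Longest prefix wins.
    return max(candidates, key=lambda c: c[0].prefixlen)[1]


if __name__ == "__main__":
    print(out_device("10.1.20.100"))      # after DNAT
    print(out_device("178.248.108.113"))  # before DNAT
```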


The issue hasn't been seen on 4.9.3 prior to the upgrade.

Could someone please comment on how this issue could be fixed?

Thanks

Andrei





- Original Message -
> From: "Andrei Mikhailovsky" 
> To: "users" 
> Sent: Monday, 16 April, 2018 22:32:25
> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0

> Hello,
> 
> I have done some more testing with the VPC network tiers and it seems that the
> Static NAT is indeed causing connectivity issues. Here is what I've done:
> 
> 
> Setup 1. I have created two test network tiers with one guest VM in each tier.
> Static NAT is NOT enabled. Each VM has a port forwarding rule (port 22) from
> its dedicated public IP address. ACLs have been set up to allow traffic on port
> 22 from the private IP addresses on each network tier.
> 
> 1. ACLs seem to work just fine. Traffic between the networks flows according
> to the rules. Both VMs can see each other's private IPs and can ping/ssh/etc.
> 
> 2. From the Internet hosts can access vms on port 22
> 
> 4. The VMs can also access each other and themselves on their public IPs. I
> don't think this worked before, but I could be wrong.
> 
> 
> 
> Setup 2. Everything is the same as in Setup 1, but one public IP address has
> been set up as Static NAT to one guest VM. The second guest VM and the second
> public IP remained unchanged.
> 
> 1. ACLs stopped working correctly (see below)
> 
> 2. From the Internet, hosts can access the VMs on port 22, including the Static
> NAT VM
> 
> 3. Other guest VMs can access the Static NAT VM using private & public IP
> addresses
> 
> 4. The Static NAT VM can NOT access other VMs using either public or private IPs
> 
> 5. The Static NAT VM can access Internet hosts (apart from the public IP range
> belonging to the CloudStack setup)
> 
> 
> The above behaviour in the Setup 2 scenarios is very strange, especially points
> 4 & 5.
> 
> Any thoughts anyone?
> 
> Cheers
> 
> - Original Message -
>> From: "Rohit Yadav" 
>> To: "users" 
>> Sent: Thursday, 12 Apr

Re: Upgrade from ACS 4.9.3 to 4.11.0

2018-04-16 Thread Andrei Mikhailovsky

Hello,

I have done some more testing with the VPC network tiers and it seems that the 
Static NAT is indeed causing connectivity issues. Here is what I've done:


Setup 1. I have created two test network tiers with one guest VM in each tier. 
Static NAT is NOT enabled. Each VM has a port forwarding rule (port 22) from 
its dedicated public IP address. ACLs have been set up to allow traffic on port 
22 from the private IP addresses on each network tier.

1. ACLs seem to work just fine. Traffic between the networks flows according 
to the rules. Both VMs can see each other's private IPs and can ping/ssh/etc.

2. From the Internet hosts can access vms on port 22

4. The VMs can also access each other and themselves on their public IPs. I 
don't think this worked before, but I could be wrong.



Setup 2. Everything is the same as in Setup 1, but one public IP address has 
been set up as Static NAT to one guest VM. The second guest VM and the second 
public IP remained unchanged.

1. ACLs stopped working correctly (see below)

2. From the Internet, hosts can access the VMs on port 22, including the Static 
NAT VM

3. Other guest VMs can access the Static NAT VM using private & public IP 
addresses

4. The Static NAT VM can NOT access other VMs using either public or private IPs

5. The Static NAT VM can access Internet hosts (apart from the public IP range 
belonging to the CloudStack setup)


The above behaviour in the Setup 2 scenarios is very strange, especially points 
4 & 5.

Any thoughts anyone?

Cheers

- Original Message -
> From: "Rohit Yadav" 
> To: "users" 
> Sent: Thursday, 12 April, 2018 12:06:54
> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0

> Hi Andrei,
> 
> 
> Thanks for sharing. Yes, the egress issue is a known one; it is caused by a
> failure to create the egress table during VR setup. By restarting the network
> (without the cleanup option selected), the egress table gets created and the
> rules are successfully applied.
> 
> 
> The issue has been fixed in the VR downtime PR:
> 
> https://github.com/apache/cloudstack/pull/2508
> 
> 
> - Rohit
> 
> <https://cloudstack.apache.org>
> 
> 
> 
> 
> From: Andrei Mikhailovsky 
> Sent: Tuesday, April 3, 2018 3:33:43 PM
> To: users
> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0
> 
> Rohit,
> 
> Following the update from 4.9.3 to 4.11.0, I would like to comment on a few
> things:
> 
> 1. The upgrade went well, apart from the cloudstack-management server startup
> issue that I've described in my previous email.
> 2. There was an issue with the virtual router template upgrade. The issue is
> described below:
> 
> VR template upgrade issue:
> 
> After updating the systemvm template, I went to Infrastructure > Virtual
> Routers and selected the Update template option for each virtual router. The
> virtual routers were updated successfully using the new templates. However,
> this broke ALL Egress rules on all networks, for all of the guest VMs.
> Port forwarding / incoming rules were working just fine. Removing and re-adding
> Egress rules did not fix the issue. To fix it, I had to restart each of the
> networks with the Clean up option ticked.
> 
> 
> Cheers
> 
> Andrei
> 
> rohit.ya...@shapeblue.com
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London WC2N 4HS, UK
> @shapeblue
>  
> 
> 
> - Original Message -
>> From: "Andrei Mikhailovsky" 
>> To: "users" 
>> Sent: Monday, 2 April, 2018 21:44:27
>> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0
> 
>> Hi Rohit,
>>
>> Following some further investigation it seems that the installation packages
>> replaced the following file:
>>
>> /etc/default/cloudstack-management
>>
>> with
>>
>> /etc/default/cloudstack-management.dpkg-dist
>>
>>
>> As a result, the management server couldn't load the env variables and was
>> unable to start.
>>
>> I've put the file back and the management server is able to start.
>>
>> I will let you know if there are any other issues/problems.
>>
>> Cheers
>>
>> Andrei
>>
>>
>>
>> - Original Message -
>>> From: "Andrei Mikhailovsky" 
>>> To: "users" 
>>> Sent: Monday, 2 April, 2018 20:58:59
>>> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0
>>
>>> Hi Rohit,
>>>
>>> I have just upgraded and am having issues starting the service, with the
>>> following error:
>>>
>>>
>>> Apr 02 20:56:37 ais-cloudhost13 systemd[1]: cloudstack-managem
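The failure described in this thread (systemd refusing to start the service because /etc/default/cloudstack-management was missing while a cloudstack-management.dpkg-dist copy sat next to it) can be checked for before restarting the service. A minimal Python sketch; the function name is hypothetical, and restoring from the .dpkg-dist file assumes its contents suit your setup, so review it (and compare it with any backup of your old file) before using apply=True.

```python
import shutil
from pathlib import Path


def restore_env_file(path: str, apply: bool = False) -> str:
    """Report (and optionally fix) a missing env file that has a .dpkg-dist sibling.

    Returns "ok", "missing", "restorable" or "restored".
    """
    target = Path(path)
    dist = target.with_name(target.name + ".dpkg-dist")
    if target.exists():
        return "ok"
    if not dist.exists():
        return "missing"  # nothing to restore from
    if apply:
        shutil.copy2(dist, target)  # copy, so the packaged .dpkg-dist stays in place
        return "restored"
    return "restorable"


if __name__ == "__main__":
    print(restore_env_file("/etc/default/cloudstack-management"))
```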

Re: Upgrade from ACS 4.9.3 to 4.11.0

2018-04-16 Thread Andrei Mikhailovsky

Hi Rohit,

thanks for the information.

I have been doing some more testing and found an issue with creating VMs. I am 
not sure if it has been documented previously. It seems that if you create a 
new VM from a template on KVM + Ceph RBD and choose to have an additional disk, 
you get an Insufficient Capacity error. The management server logs state that 
it was trying to create the data disk in QCOW2 format, but the storage only 
supports RAW (as Ceph RBD only supports RAW). Creating a new VM without an 
extra data disk works just fine.

Also, I am trying to figure out what is causing VPC network ACL issues in some 
of the VPCs that we have. At the moment I can't get my head around it, but I 
will get to the bottom of it. It seems that under some circumstances the VPCs 
are unable to route traffic between hosts in different network tiers despite 
having appropriate ACLs. My thought is that it relates to enabling the Static 
NAT option on those VMs, but this needs to be tested and confirmed. I will keep 
you posted on this issue, unless it has been reported previously.

Cheers

Andrei

- Original Message -
> From: "Rohit Yadav" 
> To: "users" 
> Sent: Thursday, 12 April, 2018 12:06:54
> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0

> Hi Andrei,
> 
> 
> Thanks for sharing. Yes, the egress issue is a known one; it is caused by a
> failure to create the egress table during VR setup. By restarting the network
> (without the cleanup option selected), the egress table gets created and the
> rules are successfully applied.
> 
> 
> The issue has been fixed in the VR downtime PR:
> 
> https://github.com/apache/cloudstack/pull/2508
> 
> 
> - Rohit
> 
> <https://cloudstack.apache.org>
> 
> 
> 
> 
> From: Andrei Mikhailovsky 
> Sent: Tuesday, April 3, 2018 3:33:43 PM
> To: users
> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0
> 
> Rohit,
> 
> Following the update from 4.9.3 to 4.11.0, I would like to comment on a few
> things:
> 
> 1. The upgrade went well, apart from the cloudstack-management server startup
> issue that I've described in my previous email.
> 2. There was an issue with the virtual router template upgrade. The issue is
> described below:
> 
> VR template upgrade issue:
> 
> After updating the systemvm template I went onto the Infrastructure > Virtual
> Routers and selected the Update template option for each virtual router. The
> virtual routers were updated successfully using the new templates. However,
> this has broken ALL Egress rules on all networks and none of the guest vms.
> Port forwarding / incoming rules were working just fine. Removal and addition
> of Egress rules did not fix the issue. To fix the issue I had to restart each
> of the networks with the Clean up option ticked.
> 
> 
> Cheers
> 
> Andrei
> 
> rohit.ya...@shapeblue.com
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
> @shapeblue
>  
> 
> 
> - Original Message -
>> From: "Andrei Mikhailovsky" 
>> To: "users" 
>> Sent: Monday, 2 April, 2018 21:44:27
>> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0
> 
>> Hi Rohit,
>>
>> Following some further investigation it seems that the installation packages
>> replaced the following file:
>>
>> /etc/default/cloudstack-management
>>
>> with
>>
>> /etc/default/cloudstack-management.dpkg-dist
>>
>>
>> As a result, the management server couldn't load its environment variables and
>> was unable to start.
>>
>> I've put the file back and the management server is able to start.
>>
>> I will let you know if there are any other issues/problems.
>>
>> Cheers
>>
>> Andrei
>>
>>
>>
>> - Original Message -
>>> From: "Andrei Mikhailovsky" 
>>> To: "users" 
>>> Sent: Monday, 2 April, 2018 20:58:59
>>> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0
>>
>>> Hi Rohit,
>>>
>>> I have just upgraded and am having issues starting the service, with the
>>> following error:
>>>
>>>
>>> Apr 02 20:56:37 ais-cloudhost13 systemd[1]: cloudstack-management.service:
>>> Failed to load environment files: No such file or directory
>>> Apr 02 20:56:37 ais-cloudhost13 systemd[1]: cloudstack-management.service:
>>> Failed to run 'start-pre' task: No such file or directory
>>> Apr 02 20:56:37 ais-cloudhost13 systemd[1]: Failed to start CloudStack
>>> Management Server.
>>> -- Subject: Unit cloudstack-m

VPC issues after upgrading from 4.9.3 to 4.11.0

2018-04-05 Thread Andrei Mikhailovsky

Hello, 

I have identified a critical VPC issue after we upgraded to 4.11.0 on KVM
hypervisors. The problem is that connectivity between network tiers within the
VPC stopped working after the upgrade. Restarting the VPC with the Clean Up
option doesn't help.


It seems that the VPC's iptables rules are all messed up and reference the
wrong interfaces. The iptables rules are all created using the eth0 interface
rather than the tier's corresponding network interface. For example:


0 0 SNAT all -- * eth0 10.1.60.0/24 10.1.60.30 to:10.1.70.1
0 0 SNAT all -- * eth1 10.1.60.30 0.0.0.0/0 to:178.248.108.109
0 0 SNAT all -- * eth0 10.1.60.0/24 10.1.60.4 to:10.1.70.1
0 0 SNAT all -- * eth1 10.1.60.4 0.0.0.0/0 to:178.248.108.104
0 0 SNAT all -- * eth0 10.1.60.0/24 10.1.60.146 to:10.1.70.1
4 304 SNAT all -- * eth1 10.1.60.146 0.0.0.0/0 to:178.248.108.44

The network interface that corresponds to the 10.1.60.0/24 is on eth6. The same 
happens with 
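To see which rules are affected, the listing can be filtered mechanically. A minimal Python sketch, assuming the rules were captured with something like `iptables -t nat -nvL` (whose columns are pkts, bytes, target, prot, opt, in, out, source, destination, followed by the target options) and assuming, as the report suggests, that intra-tier SNAT rules for 10.1.60.0/24 should leave via eth6. The function name is hypothetical.

```python
import ipaddress

# Rule lines from the report above; "--" is the opt column.
RULES = """\
0 0 SNAT all -- * eth0 10.1.60.0/24 10.1.60.30 to:10.1.70.1
0 0 SNAT all -- * eth1 10.1.60.30 0.0.0.0/0 to:178.248.108.109
0 0 SNAT all -- * eth0 10.1.60.0/24 10.1.60.4 to:10.1.70.1
0 0 SNAT all -- * eth1 10.1.60.4 0.0.0.0/0 to:178.248.108.104
0 0 SNAT all -- * eth0 10.1.60.0/24 10.1.60.146 to:10.1.70.1
4 304 SNAT all -- * eth1 10.1.60.146 0.0.0.0/0 to:178.248.108.44
"""


def intra_tier_snat_on_wrong_iface(lines, tier_cidr, tier_iface):
    """Flag SNAT rules whose source and destination both lie in the tier
    but whose outgoing interface is not the tier's interface (hypothetical helper)."""
    tier = ipaddress.ip_network(tier_cidr)
    flagged = []
    for line in lines:
        fields = line.split()
        # Skip header lines and anything that is not an SNAT rule.
        if len(fields) < 9 or fields[2] != "SNAT":
            continue
        out_iface = fields[6]
        src = ipaddress.ip_network(fields[7], strict=False)
        dst = ipaddress.ip_network(fields[8], strict=False)
        if src.subnet_of(tier) and dst.subnet_of(tier) and out_iface != tier_iface:
            flagged.append(line)
    return flagged


if __name__ == "__main__":
    for rule in intra_tier_snat_on_wrong_iface(RULES.splitlines(), "10.1.60.0/24", "eth6"):
        print(rule)
```

The public SNAT rules (destination 0.0.0.0/0 via eth1) are deliberately not flagged, since they are expected to leave through the public interface.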

Could anyone suggest the fix for this? 

Thanks 

Andrei

Re: Upgrade from ACS 4.9.3 to 4.11.0

2018-04-03 Thread Andrei Mikhailovsky

Rohit,

Following the upgrade from 4.9.3 to 4.11.0, I would like to comment on a few
things:

1. The upgrade went well, apart from the cloudstack-management server startup
issue that I described in my previous email.
2. There was an issue with the virtual router template upgrade, described
below:

VR template upgrade issue:

After updating the systemvm template I went to Infrastructure > Virtual
Routers and selected the Update template option for each virtual router. The
virtual routers were updated successfully using the new templates. However,
this broke ALL Egress rules on all networks and none of the guest vms.
Port forwarding / incoming rules were working just fine. Removing and re-adding
Egress rules did not fix the issue. To fix it, I had to restart each
of the networks with the Clean up option ticked.


Cheers

Andrei
- Original Message -
> From: "Andrei Mikhailovsky" 
> To: "users" 
> Sent: Monday, 2 April, 2018 21:44:27
> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0

> Hi Rohit,
> 
> Following some further investigation it seems that the installation packages
> replaced the following file:
> 
> /etc/default/cloudstack-management
> 
> with
> 
> /etc/default/cloudstack-management.dpkg-dist
> 
> 
> As a result, the management server couldn't load its environment variables and
> was unable to start.
> 
> I've put the file back and the management server is able to start.
> 
> I will let you know if there are any other issues/problems.
> 
> Cheers
> 
> Andrei
> 
> 
> 
> - Original Message -
>> From: "Andrei Mikhailovsky" 
>> To: "users" 
>> Sent: Monday, 2 April, 2018 20:58:59
>> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0
> 
>> Hi Rohit,
>> 
>> I have just upgraded and am having issues starting the service, with the
>> following error:
>> 
>> 
>> Apr 02 20:56:37 ais-cloudhost13 systemd[1]: cloudstack-management.service:
>> Failed to load environment files: No such file or directory
>> Apr 02 20:56:37 ais-cloudhost13 systemd[1]: cloudstack-management.service:
>> Failed to run 'start-pre' task: No such file or directory
>> Apr 02 20:56:37 ais-cloudhost13 systemd[1]: Failed to start CloudStack
>> Management Server.
>> -- Subject: Unit cloudstack-management.service has failed
>> -- Defined-By: systemd
>> 
>> Cheers
>> 
>> Andrei
>> 
>> - Original Message -
>>> From: "Rohit Yadav" 
>>> To: "users" 
>>> Sent: Friday, 30 March, 2018 19:17:48
>>> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0
>> 
>>> Some of the upgrade and minor issues have been fixed and will make their way
>>> into 4.11.1.0. You're welcome to upgrade and share your feedback, but bear in
>>> mind that, due to some changes, a new/updated systemvmtemplate needs to be
>>> issued for 4.11.1.0 (it will be compatible with both the 4.11.0.0 and 4.11.1.0
>>> releases, but 4.11.0.0 users will have to register that new template).
>>> 
>>> 
>>> 
>>> - Rohit
>>> 
>>> <https://cloudstack.apache.org>
>>> 
>>> 
>>> 
>>> 
>>> From: Andrei Mikhailovsky 
>>> Sent: Friday, March 30, 2018 11:00:34 PM
>>> To: users
>>> Subject: Upgrade from ACS 4.9.3 to 4.11.0
>>> 
>>> Hello,
>>> 
>>> My current infrastructure is ACS 4.9.3 with KVM based on Ubuntu 16.04 servers
>>> for the KVM hosts and the management server.
>>> 
>>> I am planning to perform an upgrade from ACS 4.9.3 to 4.11.0 and was wondering
>>> whether anyone had any issues during the upgrade. Anything to watch out for?
>>> 
>>> From what I recall, I have previously seen issues with upgrading to 4.10 which
>>> required some manual DB updates. Has this issue been fixed in the 4.11 upgrade
>>> process?
>>> 
>>> thanks
>>> 
>>> Andrei

Re: Upgrade from ACS 4.9.3 to 4.11.0

2018-04-02 Thread Andrei Mikhailovsky

Hi Rohit,

Following some further investigation it seems that the installation packages 
replaced the following file:

/etc/default/cloudstack-management

with

/etc/default/cloudstack-management.dpkg-dist


As a result, the management server couldn't load its environment variables and
was unable to start.

I've put the file back and the management server is able to start. 
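A minimal sketch of the "put the file back" step, assuming the missing file's contents can be taken from the `.dpkg-dist` copy. That copy holds the packaged defaults, so any local customisations would need to be re-applied by hand. The helper name `restore_from_dpkg_dist` is hypothetical.

```python
import shutil
from pathlib import Path


def restore_from_dpkg_dist(path: Path) -> bool:
    """Copy `<path>.dpkg-dist` back to `path` if `path` is missing.

    Returns True if a copy was made, False if there was nothing to do.
    """
    dist = path.with_name(path.name + ".dpkg-dist")
    if path.exists() or not dist.exists():
        return False
    shutil.copy2(dist, path)  # copies contents and preserves mode bits/timestamps
    return True


if __name__ == "__main__":
    target = Path("/etc/default/cloudstack-management")
    if restore_from_dpkg_dist(target):
        print(f"restored {target}; review it before starting the service")
    else:
        print("nothing to restore")
```

The function refuses to overwrite an existing file, so running it on a healthy host is a no-op.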

I will let you know if there are any other issues/problems.

Cheers

Andrei



- Original Message -
> From: "Andrei Mikhailovsky" 
> To: "users" 
> Sent: Monday, 2 April, 2018 20:58:59
> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0

> Hi Rohit,
> 
> I have just upgraded and am having issues starting the service, with the
> following error:
> 
> 
> Apr 02 20:56:37 ais-cloudhost13 systemd[1]: cloudstack-management.service:
> Failed to load environment files: No such file or directory
> Apr 02 20:56:37 ais-cloudhost13 systemd[1]: cloudstack-management.service:
> Failed to run 'start-pre' task: No such file or directory
> Apr 02 20:56:37 ais-cloudhost13 systemd[1]: Failed to start CloudStack
> Management Server.
> -- Subject: Unit cloudstack-management.service has failed
> -- Defined-By: systemd
> 
> Cheers
> 
> Andrei
> 
> - Original Message -
>> From: "Rohit Yadav" 
>> To: "users" 
>> Sent: Friday, 30 March, 2018 19:17:48
>> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0
> 
>> Some of the upgrade and minor issues have been fixed and will make their way
>> into 4.11.1.0. You're welcome to upgrade and share your feedback, but bear in
>> mind that, due to some changes, a new/updated systemvmtemplate needs to be
>> issued for 4.11.1.0 (it will be compatible with both the 4.11.0.0 and 4.11.1.0
>> releases, but 4.11.0.0 users will have to register that new template).
>> 
>> 
>> 
>> - Rohit
>> 
>> <https://cloudstack.apache.org>
>> 
>> 
>> 
>> 
>> From: Andrei Mikhailovsky 
>> Sent: Friday, March 30, 2018 11:00:34 PM
>> To: users
>> Subject: Upgrade from ACS 4.9.3 to 4.11.0
>> 
>> Hello,
>> 
>> My current infrastructure is ACS 4.9.3 with KVM based on Ubuntu 16.04 servers
>> for the KVM hosts and the management server.
>> 
>> I am planning to perform an upgrade from ACS 4.9.3 to 4.11.0 and was wondering
>> whether anyone had any issues during the upgrade. Anything to watch out for?
>> 
>> From what I recall, I have previously seen issues with upgrading to 4.10 which
>> required some manual DB updates. Has this issue been fixed in the 4.11 upgrade
>> process?
>> 
>> thanks
>> 
>> Andrei

Re: Upgrade from ACS 4.9.3 to 4.11.0

2018-04-02 Thread Andrei Mikhailovsky

Hi Rohit,

I have just upgraded and am having issues starting the service, with the
following error:


Apr 02 20:56:37 ais-cloudhost13 systemd[1]: cloudstack-management.service: 
Failed to load environment files: No such file or directory
Apr 02 20:56:37 ais-cloudhost13 systemd[1]: cloudstack-management.service: 
Failed to run 'start-pre' task: No such file or directory
Apr 02 20:56:37 ais-cloudhost13 systemd[1]: Failed to start CloudStack 
Management Server.
-- Subject: Unit cloudstack-management.service has failed
-- Defined-By: systemd
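systemd prints "Failed to load environment files" when a unit's `EnvironmentFile=` points to a file that does not exist and the path is not prefixed with `-` (the dash marks the file as optional). A minimal sketch that checks a unit's text for that condition; it assumes the unit references `/etc/default/cloudstack-management` (an assumption here, consistent with the follow-up message), and the helper name is hypothetical. The real unit can be inspected with `systemctl cat cloudstack-management`.

```python
import os


def missing_environment_files(unit_text, exists=os.path.exists):
    """Return EnvironmentFile= paths that are required but absent (hypothetical helper).

    A leading "-" on the value tells systemd to ignore a missing file,
    so only unprefixed paths are reported.
    """
    missing = []
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line.startswith("EnvironmentFile="):
            continue
        value = line.split("=", 1)[1].strip()
        if value.startswith("-"):
            continue  # optional: systemd skips it when absent
        if not exists(value):
            missing.append(value)
    return missing


if __name__ == "__main__":
    # Assumed unit excerpt for illustration only.
    unit = """[Service]
EnvironmentFile=/etc/default/cloudstack-management
"""
    print(missing_environment_files(unit))
```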

Cheers

Andrei

- Original Message -
> From: "Rohit Yadav" 
> To: "users" 
> Sent: Friday, 30 March, 2018 19:17:48
> Subject: Re: Upgrade from ACS 4.9.3 to 4.11.0

> Some of the upgrade and minor issues have been fixed and will make their way
> into 4.11.1.0. You're welcome to upgrade and share your feedback, but bear in
> mind that, due to some changes, a new/updated systemvmtemplate needs to be
> issued for 4.11.1.0 (it will be compatible with both the 4.11.0.0 and 4.11.1.0
> releases, but 4.11.0.0 users will have to register that new template).
> 
> 
> 
> - Rohit
> 
> <https://cloudstack.apache.org>
> 
> 
> 
> 
> From: Andrei Mikhailovsky 
> Sent: Friday, March 30, 2018 11:00:34 PM
> To: users
> Subject: Upgrade from ACS 4.9.3 to 4.11.0
> 
> Hello,
> 
> My current infrastructure is ACS 4.9.3 with KVM based on Ubuntu 16.04 servers
> for the KVM hosts and the management server.
> 
> I am planning to perform an upgrade from ACS 4.9.3 to 4.11.0 and was wondering
> whether anyone had any issues during the upgrade. Anything to watch out for?
> 
> From what I recall, I have previously seen issues with upgrading to 4.10 which
> required some manual DB updates. Has this issue been fixed in the 4.11 upgrade
> process?
> 
> thanks
> 
> Andrei

Upgrade from ACS 4.9.3 to 4.11.0

2018-03-30 Thread Andrei Mikhailovsky

Hello, 

My current infrastructure is ACS 4.9.3 with KVM based on Ubuntu 16.04 servers 
for the KVM hosts and the management server. 

I am planning to perform an upgrade from ACS 4.9.3 to 4.11.0 and was wondering
whether anyone had any issues during the upgrade. Anything to watch out for?

From what I recall, I have previously seen issues with upgrading to 4.10 which
required some manual DB updates. Has this issue been fixed in the 4.11
upgrade process?

thanks 

Andrei

Re: VR routing issues in Advanced Mode

2018-03-05 Thread Andrei Mikhailovsky

Hi Dag,

Sorry for not being clear.

ICMP works just fine, so VPC network to VPC network ICMP works.
However, from what I can see, the ping is answered by the VR itself without the
traffic being forwarded to the virtual machine. So the VPC virtual routers can
see each other's pings. The problem is with port forwarding to the VMs.

I will investigate on the VR side to see whether they are actually forwarding TCP
packets to the VM.
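Following Dag's tcpdump suggestion, the same captures can be repeated for TCP port 80 to see where the forwarded traffic stops, alongside the ICMP ones. A small sketch that only builds the command lines (it does not run them); the interface names eth0/eth2 come from Dag's private/public NIC example and are an assumption for any particular VR, and `tcpdump_commands` is a hypothetical helper.

```python
import shlex


def tcpdump_commands(ifaces, capture_filter, count=50):
    """Build one tcpdump argv per interface (hypothetical helper).

    -n skips name resolution, -i picks the interface, -c stops after
    `count` packets; the capture filter is passed as the final argument.
    """
    return [
        ["tcpdump", "-n", "-i", iface, "-c", str(count), capture_filter]
        for iface in ifaces
    ]


if __name__ == "__main__":
    # eth0/eth2 follow Dag's example; on a VPC VR the tier NICs may differ.
    for flt in ("icmp", "tcp port 80"):
        for argv in tcpdump_commands(["eth0", "eth2"], flt):
            print(shlex.join(argv))
```

Comparing where ICMP appears but TCP port 80 does not should show whether the VR is dropping the forwarded connection or never receiving it.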

Cheers

- Original Message -
> From: "Dag Sonstebo" 
> To: "users" 
> Sent: Tuesday, 27 February, 2018 18:58:44
> Subject: Re: VR routing issues in Advanced Mode

> Hi Andrei,
> 
> Sorry, I've lost you. Are you saying it's now all working when you allow ICMP
> in your ACLs?
> 
> If not, can you look at the tcpdumps on the source and destination VRs as per my
> previous post? You may obviously have to run these on different interfaces
> depending on which VPC tier you are pinging from.
> 
> Regards,
> Dag Sonstebo
> Cloud Architect
> ShapeBlue
> S: +44 20 3603 0540  | dag.sonst...@shapeblue.com | http://www.shapeblue.com
> <http://www.shapeblue.com/> | Twitter:@ShapeBlue
> <https://twitter.com/#!/shapeblue>
> 
> 
>On 27/02/2018, 10:27, "Andrei Mikhailovsky"  wrote:
> 
>Hi Dag,
>
>Thanks for your reply, which I missed earlier.
>
>I have done some more digging around and would like to make some corrections to
>the problem at hand.
>
>1. It seems that the problem only affects VPC networks within CloudStack.
>2. Networks which use non-VPC networking can talk to each other.
>3. VPC to VPC traffic is not working.
>4. VPC traffic can reach non-VPC networks.
>5. Non-VPC traffic can't reach VPC networks.
>
>In all cases, if ICMP is allowed in the ACLs, all networks can ping each
>other. So the traffic is being routed and reaches the virtual router.
>
>Any advice?
>
>Thanks
>
>
>
>
> dag.sonst...@shapeblue.com
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
> @shapeblue
>  
> 
> 
> - Original Message -
>> From: "Dag Sonstebo" 
>> To: "users" 
>> Sent: Friday, 23 February, 2018 16:45:04
>> Subject: Re: VR routing issues in Advanced Mode
>
>> Hi Andrei,
>> 
>> Next step is to do some tcpdumping on the two VRs. Set some pings going and
>> check:
>> 
>> Private NIC: tcpdump -i eth0 icmp
>> Public NIC: tcpdump -i eth2 icmp
>> 
>> That way you should be able to see how far your traffic reaches.
>> 
>> 
>> Regards,
>> Dag Sonstebo
>> Cloud Architect
>> ShapeBlue
>> 
>> On 23/02/2018, 16:17, "Andrei Mikhailovsky"  wrote:
>> 
>>Bump.
>>
>>Any ideas anyone? This issue is really annoying.
>>
>>Thanks
>>
>>
>> 
>> 
>> - Original Message -
>>> From: "Andrei Mikhailovsky" 
>>> To: "users" , "users" 
> 
>>> Sent: Wednesday, 21 February, 2018 22:10:25
>>> Subject: Re: VR routing issues in Advanced Mode
>>
>>> Dag,
>>> 
>>> Yeah, we have egress traffic enabled. We also use VPCs on some of the networks,
>>> and they are also affected by this issue, along with the non-VPC networks.
>>> 
>>> Any thoughts?
>>> 
>>> Andrei Mikhailovsky
>>> 
>>> From: Dag Sonstebo
>>> Sent: Wednesday 21 February, 18:30
>>> Subject: Re: VR routing issues in Advanced Mode
>>> To: users@cloudstack.apache.org
>

Re: VR routing issues in Advanced Mode

2018-02-27 Thread Andrei Mikhailovsky

Hi Dag,

Thanks for your reply, which I missed earlier.

I have done some more digging around and would like to make some corrections to
the problem at hand.

1. It seems that the problem only affects VPC networks within CloudStack.
2. Networks which use non-VPC networking can talk to each other.
3. VPC to VPC traffic is not working.
4. VPC traffic can reach non-VPC networks.
5. Non-VPC traffic can't reach VPC networks.

In all cases, if ICMP is allowed in the ACLs, all networks can ping each
other. So the traffic is being routed and reaches the virtual router.

Any advice? 

Thanks



- Original Message -
> From: "Dag Sonstebo" 
> To: "users" 
> Sent: Friday, 23 February, 2018 16:45:04
> Subject: Re: VR routing issues in Advanced Mode

> Hi Andrei,
> 
> Next step is to do some tcpdumping on the two VRs. Set some pings going and
> check:
> 
> Private NIC: tcpdump -i eth0 icmp
> Public NIC: tcpdump -i eth2 icmp
> 
> That way you should be able to see how far your traffic reaches.
> 
> 
> Regards,
> Dag Sonstebo
> Cloud Architect
> ShapeBlue
> 
> On 23/02/2018, 16:17, "Andrei Mikhailovsky"  wrote:
> 
>Bump.
>
>Any ideas anyone? This issue is really annoying.
>
>Thanks
>
>
> 
> 
> - Original Message -
>> From: "Andrei Mikhailovsky" 
>> To: "users" , "users" 
> 
>> Sent: Wednesday, 21 February, 2018 22:10:25
>> Subject: Re: VR routing issues in Advanced Mode
>
>> Dag,
>> 
>> Yeah, we have egress traffic enabled. We also use VPCs on some of the networks,
>> and they are also affected by this issue, along with the non-VPC networks.
>> 
>> Any thoughts?
>> 
>> Andrei Mikhailovsky
>> 
>> From: Dag Sonstebo
>> Sent: Wednesday 21 February, 18:30
>> Subject: Re: VR routing issues in Advanced Mode
>> To: users@cloudstack.apache.org
>> 
>> Hi Andrei,
>> 
>> Understand. To get all the obvious things out of the way – have you allowed
>> egress traffic on the two networks (you mention ACLs, which we only use on VPCs
>> and basic networks)?
>> 
>> Regards,
>> Dag Sonstebo
>> Cloud Architect
>> ShapeBlue
>> 
>> On 21/02/2018, 14:51, "Andrei Mikhailovsky"  wrote:
>> 
>> Hi Dag,
>> 
>> Please see my comments below:
>> 
>>> Hi Andrei,
>>> 
>>> You’re confusing the matter with your masking of public IP ranges. You said
>>> you have “2 x Public IP ranges with /26 netmask” – but since you are masking
>>> them out with X’s your email doesn’t make sense. If all the X’s are the same
>>> then a .10 and a .20 IP address would be on the same /26 network.
>>> 
>>> I will assume that you do in fact have 2 x 26-bit networks, e.g.:
>>> 
>>> 192.168.0.0/26 – with default gateway 192.168.0.1
>>> 192.168.0.64/26 – with default gateway 192.168.0.65
>>> 
>> 
>> That is correct. I do have two separate /26 networks similar to what you've
>> described above. However, one /26 is used for direct public IP service
>> offering, where VRs are not involved in networking at all, and the second /26
>> is used for the service offer

Re: VR routing issues in Advanced Mode

2018-02-23 Thread Andrei Mikhailovsky

Bump.

Any ideas anyone? This issue is really annoying.

Thanks

- Original Message -
> From: "Andrei Mikhailovsky" 
> To: "users" , "users" 
> 
> Sent: Wednesday, 21 February, 2018 22:10:25
> Subject: Re: VR routing issues in Advanced Mode

> Dag,
> 
> Yeah, we have egress traffic enabled. We also use VPCs on some of the networks,
> and they are also affected by this issue, along with the non-VPC networks.
> 
> Any thoughts?
> 
> Andrei Mikhailovsky
> 
> From: Dag Sonstebo
> Sent: Wednesday 21 February, 18:30
> Subject: Re: VR routing issues in Advanced Mode
> To: users@cloudstack.apache.org
> 
> Hi Andrei,
> 
> Understand. To get all the obvious things out of the way – have you allowed
> egress traffic on the two networks (you mention ACLs, which we only use on VPCs
> and basic networks)?
> 
> Regards,
> Dag Sonstebo
> Cloud Architect
> ShapeBlue
> 
> On 21/02/2018, 14:51, "Andrei Mikhailovsky"  wrote:
> 
> Hi Dag,
> 
> Please see my comments below:
> 
>> Hi Andrei,
>> 
>> You’re confusing the matter with your masking of public IP ranges. You said
>> you have “2 x Public IP ranges with /26 netmask” – but since you are masking
>> them out with X’s your email doesn’t make sense. If all the X’s are the same
>> then a .10 and a .20 IP address would be on the same /26 network.
>> 
>> I will assume that you do in fact have 2 x 26-bit networks, e.g.:
>> 
>> 192.168.0.0/26 – with default gateway 192.168.0.1
>> 192.168.0.64/26 – with default gateway 192.168.0.65
>> 
> 
> That is correct. I do have two separate /26 networks similar to what you've
> described above. However, one /26 is used for direct public IP service
> offering, where VRs are not involved in networking at all, and the second /26
> is used for the service offering where VRs are used to provide the networking
> function.
> 
>> If your two guest networks have VRs on separate public IP ranges you will
>> have e.g.
>> 
>> VR1: public IP 192.168.0.10
>> VR2: public IP 192.168.0.70
>> 
> 
> Nope, the guest VMs with VRs that can't talk to each other are on the same /26
> network (in your example, that would be the same 192.168.0.0/26).
> 
>> For a VM hosted behind VR1 to reach a service NAT’ed on VR2 you need to set up
>> routing and possibly firewalling on the data centre device which handles the
>> default gateway for the two networks – i.e. the top of rack switch or router
>> which hosts default gateways 192.168.0.1 and 192.168.0.65. The fact that you
>> can reach services on both networks from outside this range makes sense.
>> 
> 
> These have been set up, and VMs on separate /26 networks CAN talk to each
> other. The VMs on the same /26 network that don't use the VR service can also
> talk to each other. The problem is that VMs on the same /26 that use VRs can't
> talk to each other using their public IP addresses.
> 
>> So once you have fixed this you will have VM1 > VR1 > DC_SWITCH_OR_ROUTER >
>> VR2 > VM2.
>> 
>> Regards,
>> Dag Sonstebo
>> Cloud Architect
>> ShapeBlue
>> 
>> On 21/02/2018, 12:27, "Andrei Mikhailovsky"  wrote:
>> 
>> Hello
>> 
>> Could someone help me identify the routing issues that we have? The problem
>> is that traffic from different guest networks cannot reach each other via the
>> public IPs.
>> 
>> Here is my ACS setup:
>> ACS 4.9.3.0 (both management and agents)
>> KVM Hypervisor based on Ubuntu 16.04
>> Ceph as primary storage. NFS as secondary storage
>> Advance

Re: VR routing issues in Advanced Mode

2018-02-21 Thread Andrei Mikhailovsky

Dag,

Yeah, we have egress traffic enabled. We also use VPCs on some of the networks,
and they are also affected by this issue, along with the non-VPC networks.

Any thoughts?

Andrei Mikhailovsky

From: Dag Sonstebo
Sent: Wednesday 21 February, 18:30
Subject: Re: VR routing issues in Advanced Mode
To: users@cloudstack.apache.org

Hi Andrei,

Understand. To get all the obvious things out of the way – have you allowed egress
traffic on the two networks (you mention ACLs, which we only use on VPCs and
basic networks)?

Regards,
Dag Sonstebo
Cloud Architect
ShapeBlue

On 21/02/2018, 14:51, "Andrei Mikhailovsky"  wrote:

Hi Dag,

Please see my comments below:

> Hi Andrei,
> 
> You’re confusing the matter with your masking of public IP ranges. You said you
> have “2 x Public IP ranges with /26 netmask” – but since you are masking them
> out with X’s your email doesn’t make sense. If all the X’s are the same then a
> .10 and a .20 IP address would be on the same /26 network.
> 
> I will assume that you do in fact have 2 x 26-bit networks, e.g.:
> 
> 192.168.0.0/26 – with default gateway 192.168.0.1
> 192.168.0.64/26 – with default gateway 192.168.0.65
> 

That is correct. I do have two separate /26 networks similar to what you've
described above. However, one /26 is used for direct public IP service
offering, where VRs are not involved in networking at all, and the second /26 is
used for the service offering where VRs are used to provide the networking
function.

> If your two guest networks have VRs on separate public IP ranges you will have
> e.g.
> 
> VR1: public IP 192.168.0.10
> VR2: public IP 192.168.0.70
> 

Nope, the guest VMs with VRs that can't talk to each other are on the same /26
network (in your example, that would be the same 192.168.0.0/26).

> For a VM hosted behind VR1 to reach a service NAT’ed on VR2 you need to set up
> routing and possibly firewalling on the data centre device which handles the
> default gateway for the two networks – i.e. the top of rack switch or router
> which hosts default gateways 192.168.0.1 and 192.168.0.65. The fact that you
> can reach services on both networks from outside this range makes sense.
> 

These have been set up, and VMs on separate /26 networks CAN talk to each
other. The VMs on the same /26 network that don't use the VR service can also
talk to each other. The problem is that VMs on the same /26 that use VRs can't
talk to each other using their public IP addresses.

> So once you have fixed this you will have VM1 > VR1 > DC_SWITCH_OR_ROUTER >
> VR2 > VM2.
> 
> Regards,
> Dag Sonstebo
> Cloud Architect
> ShapeBlue
> 
> On 21/02/2018, 12:27, "Andrei Mikhailovsky"  wrote:
> 
> Hello
> 
> Could someone help me identify the routing issues that we have? The problem
> is that traffic from different guest networks cannot reach each other via the
> public IPs.
> 
> Here is my ACS setup:
> ACS 4.9.3.0 (both management and agents)
> KVM Hypervisor based on Ubuntu 16.04
> Ceph as primary storage. NFS as secondary storage
> Advanced Networking with vlan separation
> 2 x Public IP ranges with /26 netmask.
> 
> Here is an example when routing DOES NOT work:
> 
> Case 1 - Advanced Networking, vlan separation, VRs route all traffic and provide
> all networking services (dhcp, fw, port forwarding, load balancing, etc)
> 
> Guest Network 1:
> 
> Public IP: XXX.XXX.XXX.10/26
> Private IP range: 10.1.1.0/24
> guest vm1 IP: 10.1.1.100/24
> 
> Guest Network 2:
> Public IP: XXX.XXX.XXX.20/26
> Private IP range: 10.1.1.0/24
> guest vm2 IP: 10.1.1.200/24
> 
> I've created ACLs on both guest networks to allow traffic from 0.0.0.0/0 on port
> 80. I've created the port forwarding rules to forward port 80 from public
> XXX.XXX.XXX.10 and XXX.XXX.XXX.20 onto 10.1.1.100 and 10.1.1.200
> respectively.
> 
> This setup works perfectly well when I am initiating the connections from
> outside of our CloudStack. However, vm2 can't reach vm1 on port 80 using the
> public IP XXX.XXX.XXX.10 and, vice versa, vm1 can't reach vm2 on public IP
> XXX.XXX.XXX.20.
> 
> Here is an example when the routing DOES work:
> 
> Case 2 - Advanced Networking, vlan separation, VRs are not used. Public IPs are
> given directly to a guest vm
> 
> Guest Network 1:
> 
> guest vm1 Public IP: XXX.XXX.XXX.100/26
> 
> Guest Network 2:
> 
> guest vm2 Public IP:

Re: VR routing issues in Advanced Mode

2018-02-21 Thread Andrei Mikhailovsky

Andrija,

The VMs are trying to reach each other using the public IP addresses, not the
private addresses.

Cheers

Andrei
- Original Message -
> From: "Andrija Panic" 
> To: "users" 
> Sent: Wednesday, 21 February, 2018 12:48:57
> Subject: Re: VR routing issues in Advanced Mode

> Hi Andrei,
> 
> You don't have a typo in your input, right?
> 
> If I read this correctly, the case that doesn't work for you is the following:
> 
> VR1 (XXX.XXX.XXX.10/26) --> Guest1 Network / VM 10.1.1.100/24
> 
> VR2 (XXX.XXX.XXX.20/26) --> Guest2 Network / VM 10.1.1.200/24
> 
> Is this correct?
> 
> If so, it's normal that VM1 can't reach VM2 via the path VM1 --> VR1 -->
> VR2 --> VM2:80, because both VM1 and VM2 are on the "same" subnet
> (10.1.1.0/24), so VM1 decides to BROADCAST traffic over the "switch" to reach
> an IP in the same network (VM2 IP 10.1.1.200). If this IP were in, e.g., the
> 10.2.1.0 network, then the VM would decide to send the packet to its default
> gateway (the VR) and then things would work fine.
> 
> Otherwise, if this is a single VR, you actually cannot even create 2
> networks with the same subnet, since both are (per your input, if not a typo)
> 10.1.1.0/24 subnets.
> 
> ?
> 
> Cheers
> Andrija
> 
> On 21 February 2018 at 13:27, Andrei Mikhailovsky > wrote:
> 
>> Hello
>>
>> Could someone help me identify the routing issues that we have? The
>> problem is that traffic from different guest networks cannot reach each
>> other via the public IPs.
>>
>> Here is my ACS setup:
>> ACS 4.9.3.0 (both management and agents)
>> KVM Hypervisor based on Ubuntu 16.04
>> Ceph as primary storage. NFS as secondary storage
>> Advanced Networking with vlan separation
>> 2 x Public IP ranges with /26 netmask.
>>
>>
>>
>> Here is an example when routing DOES NOT work:
>>
>> Case 1 - Advanced Networking, vlan separation, VRs route all traffic and
>> provide all networking services (dhcp, fw, port forwarding, load balancing,
>> etc)
>>
>> Guest Network 1:
>>
>> Public IP: XXX.XXX.XXX.10/26
>> Private IP range: 10.1.1.0/24
>> guest vm1 IP: 10.1.1.100/24
>>
>> Guest Network 2:
>> Public IP: XXX.XXX.XXX.20/26
>> Private IP range: 10.1.1.0/24
>> guest vm2 IP: 10.1.1.200/24
>>
>>
>> I've created ACLs on both guest networks to allow traffic from 0.0.0.0/0
>> on port 80. I've created the port forwarding rules to forward port 80 from
>> public XXX.XXX.XXX.10 and XXX.XXX.XXX.20 onto 10.1.1.100 and 10.1.1.200
>> respectively.
>>
>> This setup works perfectly well when I am initiating the connections from
>> outside of our CloudStack. However, vm2 can't reach vm1 on port 80 using
>> the public IP XXX.XXX.XXX.10 and vice versa, vm1 can't reach vm2 on public
>> IP XXX.XXX.XXX.20.
>>
>>
>>
>>
>> Here is an example when the routing DOES work:
>>
>> Case 2 - Advanced Networking, vlan separation, VRs are not used. Public
>> IPs are given directly to a guest vm
>>
>> Guest Network 1:
>>
>> guest vm1 Public IP: XXX.XXX.XXX.100/26
>>
>> Guest Network 2:
>>
>> guest vm2 Public IP: XXX.XXX.XXX.110/26
>>
>> In Case 2, the guest vm has a public IP address directly assigned to
>> its network interface. VRs are not used for this networking. Each guest has
>> a fw rule to allow incoming traffic on port 80 from 0.0.0.0/0. Both vm1
>> and vm2 can access each other on port 80. Also, vms from Case 1 above can
>> access port 80 on vms from Case 2, similarly, vms from Case 2 can access
>> port 80 on vms from Case 1.
>>
>>
>>
>> So, it seems that the rules on the VR in Case 1 do not allow traffic that
>> originates from other VRs within the same public network range. The
>> traceroute shows the last hop being the VR's private IP address. How do I change
>> that behaviour and fix the networking issue?
>>
>> Thanks
>>
>> Andrei
>>
> 
> 
> 
> --
> 
> Andrija Panić
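Andrija's argument rests on the ordinary on-link versus gateway decision a host makes for each destination. A minimal sketch: VM1's address comes from the thread, while the 10.1.1.1 gateway is a hypothetical stand-in for the VR's guest address and 192.168.0.20 stands in for a masked public IP. A private destination inside 10.1.1.0/24 stays on-link, whereas a public IP is handed to the VR, which is consistent with Andrei's reply that the VMs use public addresses.

```python
import ipaddress


def next_hop(host_iface: str, gateway: str, dst: str) -> str:
    """Decide how a host forwards a packet to `dst`.

    If `dst` is inside the host's own subnet it is delivered on-link
    (the host ARPs for it directly and the gateway never sees it);
    otherwise it is handed to the default gateway.
    """
    iface = ipaddress.ip_interface(host_iface)
    if ipaddress.ip_address(dst) in iface.network:
        return "on-link"
    return f"via {gateway}"


if __name__ == "__main__":
    vm1 = "10.1.1.100/24"
    gw = "10.1.1.1"  # hypothetical VR guest address
    print(next_hop(vm1, gw, "10.1.1.200"))    # on-link
    print(next_hop(vm1, gw, "10.2.1.5"))      # via 10.1.1.1
    print(next_hop(vm1, gw, "192.168.0.20"))  # via 10.1.1.1
```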

Re: VR routing issues in Advanced Mode

2018-02-21 Thread Andrei Mikhailovsky

Hi Dag,

Please see my comments below:

> Hi Andrei,
> 
> You’re confusing the matter with your masking of public IP ranges. You said you
> have “2 x Public IP ranges with /26 netmask” – but since you are masking them
> out with X’s your email doesn’t make sense. If all the X’s are the same then a
> .10 and a .20 IP address would be on the same /26 network.
> 
> I will assume that you do in fact have 2 x 26-bit networks, e.g.:
> 
> 192.168.0.0/26 – with default gateway 192.168.0.1
> 192.168.0.64/26 – with default gateway 192.168.0.65
> 


That is correct. I do have two separate /26 networks similar to what you've 
described above. However, one /26 is used for direct public IP service 
offering, where VRs are not involved in networking at all and the second /26 is 
used for the service offering where VRs are used to provide the networking 
function.


> If your two guest networks have VRs on separate public IP ranges you will have, e.g.:
> 
> VR1: public IP 192.168.0.10
> VR2: public IP 192.168.0.70
> 


Nope, the guest VMs with VRs that can't talk to each other are on the same /26
network (in your example, that would be 192.168.0.0/26).


> For a VM hosted behind VR1 to reach a service NAT’ed on VR2 you need to set up
> routing and possibly firewalling on the data centre device which handles the
> default gateway for the two networks – i.e. the top of rack switch or router
> which hosts default gateways 192.168.0.1 and 192.168.0.65. The fact that you
> can reach services on both networks from outside this range makes sense.
> 

These have been set up, and VMs on separate /26 networks CAN talk to each
other. VMs on the same /26 network that don't use the VR service can also
talk to each other. The problem is that VMs on the same /26 that do use VRs
can't talk to each other using their public IP addresses.


> So once you have fixed this you will have VM1 > VR1 > DC_SWITCH_OR_ROUTER > VR2 > VM2.
> 
> 
> Regards,
> Dag Sonstebo
> Cloud Architect
> ShapeBlue
> 
> 
> 
> dag.sonst...@shapeblue.com
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
> @shapeblue
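Dag's point about the masked addresses comes down to subnet arithmetic. A minimal sketch using Python's standard `ipaddress` module, with Dag's illustrative 192.168.0.x ranges standing in for the real (masked) public ranges, confirms that a .10 and a .20 address fall in the same /26, while .70 lands in the second /26. The names `NET_A`, `NET_B` and `same_subnet` are hypothetical, introduced only for this example.

```python
import ipaddress

# Dag's example ranges; the real public /26 ranges were masked in the thread.
NET_A = ipaddress.ip_network("192.168.0.0/26")
NET_B = ipaddress.ip_network("192.168.0.64/26")


def same_subnet(ip1: str, ip2: str, prefix: int = 26) -> bool:
    """Return True if both addresses fall in the same network of this prefix length."""
    net1 = ipaddress.ip_interface(f"{ip1}/{prefix}").network
    net2 = ipaddress.ip_interface(f"{ip2}/{prefix}").network
    return net1 == net2


if __name__ == "__main__":
    # .10 and .20 share 192.168.0.0/26, which is the case Andrei confirms.
    print(same_subnet("192.168.0.10", "192.168.0.20"))  # True
    # A VR at .70 would sit in the second range, behind a different gateway.
    print(same_subnet("192.168.0.10", "192.168.0.70"))  # False
    # Network and broadcast addresses of each /26.
    print(NET_A.network_address, NET_A.broadcast_address)  # 192.168.0.0 192.168.0.63
    print(NET_B.network_address, NET_B.broadcast_address)  # 192.168.0.64 192.168.0.127
    # First usable host in each range, used as the default gateway in Dag's example.
    print(next(NET_A.hosts()), next(NET_B.hosts()))  # 192.168.0.1 192.168.0.65
```

This only checks addressing; it says nothing about why the VRs on the same /26 fail to reach each other's NATed services, which is the open question in the thread.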

VR routing issues in Advanced Mode

2018-02-21 Thread Andrei Mikhailovsky

Hello 

Could someone help me identify the routing issues that we have? The problem
is that traffic from different guest networks cannot reach each other via the
public IPs.

Here is my ACS setup: 
ACS 4.9.3.0 (both management and agents) 
KVM Hypervisor based on Ubuntu 16.04 
Ceph as primary storage. NFS as secondary storage 
Advanced Networking with vlan separation 
2 x Public IP ranges with /26 netmask. 



Here is an example when routing DOES NOT work: 

Case 1 - Advanced Networking, vlan separation, VRs route all traffic and 
provide all networking services (dhcp, fw, port forwarding, load balancing, 
etc) 

Guest Network 1: 

Public IP: XXX.XXX.XXX.10/26 
Private IP range: 10.1.1.0/24 
guest vm1 IP: 10.1.1.100/24 

Guest Network 2: 
Public IP: XXX.XXX.XXX.20/26 
Private IP range: 10.1.1.0/24 
guest vm2 IP: 10.1.1.200/24 


I've created ACLs on both guest networks to allow traffic from 0.0.0.0/0 on 
port 80. I've created the port forwarding rules to forward port 80 from public 
XXX.XXX.XXX.10 and XXX.XXX.XXX.20 onto 10.1.1.100 and 10.1.1.200
respectively. 

This setup works perfectly well when I am initiating the connections from 
outside of our CloudStack. However, vm2 can't reach vm1 on port 80 using the 
public IP XXX.XXX.XXX.10 and vice versa, vm1 can't reach vm2 on public IP 
XXX.XXX.XXX.20. 




Here is an example when the routing DOES work: 

Case 2 - Advanced Networking, vlan separation, VRs are not used. Public IPs are 
given directly to a guest vm 

Guest Network 1: 

guest vm1 Public IP: XXX.XXX.XXX.100/26 

Guest Network 2: 

guest vm2 Public IP: XXX.XXX.XXX.110/26 

In Case 2, each guest VM has a public IP address directly assigned to its
network interface. VRs are not used for this networking. Each guest has a fw
rule to allow incoming traffic on port 80 from 0.0.0.0/0. Both vm1 and vm2 can
access each other on port 80. Also, VMs from Case 1 above can access port 80 on
VMs from Case 2, and similarly, VMs from Case 2 can access port 80 on VMs from
Case 1.



So, it seems that the rules on the VR in Case 1 do not allow traffic that 
originates from other VRs within the same public network range. The trace route 
shows the last hop being the VR's private IP address. How do I change that 
behaviour and fix the networking issue? 

Thanks 

Andrei

Re: kvm/ceph volume snapshots cause other jobs to fail

2017-12-24 Thread Andrei Mikhailovsky

Hi Rohit,

The issue that I am facing is with every single volume. I have only noticed it
on 4.9.3.0 and I don't think it was present in the previous releases; at least,
I've not seen it before.

It would be challenging to downgrade a live environment at the moment. Perhaps
I can upgrade to 4.10.x later, when the next point release is out. By the way,
any idea when the next point release of 4.10 is going out?

Thanks

Andrei

- Original Message -
> From: "Rohit Yadav" 
> To: "users" 
> Sent: Friday, 22 December, 2017 11:11:45
> Subject: Re: kvm/ceph volume snapshots cause other jobs to fail

> Hi Andrei,
> 
> 
> I think it's because snapshot jobs block the job-queue for other items for the
> KVM agent (host), so other jobs don't get the opportunity to finish. Are you
> facing this with a particular VM/volume or in general with any VM/host?
> 
> 
> If you think the issue is related to the CloudStack version, you may downgrade
> to 4.9.2.0 and retry. Alternatively, compare against a test 4.9.2.0 and 4.9.3.0
> environment and help report a ticket/bug with more details. Thanks.
> 
> 
> Regards,
> 
> Rohit Yadav
> 
> Software Architect, ShapeBlue
> 
> http://rohityadav.cloud | @rhtyd
> 
> 
>  __?.o/  Apache CloudStack
> (    )#     The best IaaS cloud platform
> (___(_)   https://cloudstack.apache.org
> 
> 
> 
> rohit.ya...@shapeblue.com
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London  WC2N 4HSUK
> @shapeblue
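Rohit's explanation describes a form of head-of-line blocking: if the agent works through its queue one job at a time, a long-running snapshot copy delays everything queued behind it until those requests time out. The toy model below illustrates only that idea, under stated assumptions (a single FIFO worker, all jobs submitted at once, a fixed wait timeout). It is not how the CloudStack KVM agent is actually implemented, and the `Job` class, `run_serial_queue` function, job names and durations are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    duration: int  # illustrative time units, not measured values


def run_serial_queue(jobs, timeout):
    """Toy model of a single worker draining a FIFO job queue.

    All jobs are assumed to be submitted at time 0. A job that has waited
    longer than `timeout` before the worker reaches it is reported as
    "timed out" and skipped; otherwise it runs to completion.
    """
    queue = deque(jobs)
    clock = 0  # time at which the worker picks up the next job
    results = {}
    while queue:
        job = queue.popleft()
        if clock > timeout:
            results[job.name] = "timed out"
            continue
        clock += job.duration
        results[job.name] = "done"
    return results


if __name__ == "__main__":
    # A long snapshot copy at the head of the queue starves short VM operations.
    print(run_serial_queue(
        [Job("volume-snapshot", 600), Job("stop-vm", 5), Job("start-vm", 5)],
        timeout=60,
    ))
    # {'volume-snapshot': 'done', 'stop-vm': 'timed out', 'start-vm': 'timed out'}

    # Without the snapshot in front of them, the same operations complete.
    print(run_serial_queue([Job("stop-vm", 5), Job("start-vm", 5)], timeout=60))
    # {'stop-vm': 'done', 'start-vm': 'done'}
```

In this model, killing the long job (as Andrei did with qemu-img) amounts to removing it from the head of the queue, after which the remaining jobs drain normally. Whether the real agent behaves this way would need confirming against the agent and management server logs.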

kvm/ceph volume snapshots cause other jobs to fail

2017-12-21 Thread Andrei Mikhailovsky

Hello everyone, 

Since the recent upgrade to 4.9.3.0, I have started having a problem.
While the volume snapshots (KVM with Ceph primary storage) take place, I am
unable to do most things within ACS. For example, stopping / starting /
migrating VMs simply time out. I have done some testing and this seems to be
related to the volume snapshots. If I wait for the snapshot to finish, or if I
manually kill the qemu-img process on the host server, operations return to
normal and VM operations work just as before. However, as soon as the
snapshot schedule kicks off the next snapshot job, ACS becomes non-functional
again.

Could you please let me know if there is a workaround for this bug? 

thanks 

Andrei

Account notifications for snapshot job failures and other

2017-11-02 Thread Andrei Mikhailovsky

Hello, 

I recently had a case where the users of one of the CloudStack domains ran out
of secondary storage quota and were not notified of this. The more serious
issue is that their snapshot schedules were failing without them knowing about
it. As the CloudStack admin, I was not notified of either issue myself, even
though I do receive notifications of host server failures, etc. from
CloudStack.

My question is: how do I enable notifications of serious failures, such as
snapshot failures or quota overruns? This is a serious issue, as one can
imagine: a client thinks their snapshots are being executed, only to find out
that they were not at the moment the data has to be recovered.

Thanks 
Andrei

Re: Unable to remove host server

2017-03-20 Thread Andrei Mikhailovsky

Hi Dag,

I understand that now, but a GUI message stating this fact would be very
helpful, similar to the one I received when I tried to remove the host via
CloudMonkey. The GUI didn't give me an error message; instead it showed me
the spinning circle indefinitely, giving false hope that the job was being
processed. This part of the GUI is definitely a bug (or a broken workflow), not
a feature.

Cheers

Andrei

- Original Message -
> From: "Dag Sonstebo" 
> To: "users" 
> Sent: Monday, 20 March, 2017 12:08:13
> Subject: Re: Unable to remove host server

> Hi Andrei,
> 
> This isn’t a bug – this is how CloudStack works. You can’t delete a host unless
> it’s in maintenance mode (even though the delete button shows). The disable
> function is different – this simply removes the host from possible hosts when
> starting a new VM etc., i.e. you would use this function when you don’t want
> any more workload assigned to that host. As a result putting the host into
> maintenance mode evacuates all VMs, putting into disabled state leaves all VMs
> running.
> 
> Regards,
> Dag Sonstebo
> Cloud Architect
> ShapeBlue
> 

Re: Unable to remove host server

2017-03-20 Thread Andrei Mikhailovsky

Hi Gabriel,

I have found out what the problem was and managed to delete the host from the
GUI, with some help from CloudMonkey. There seems to be a bug in the GUI of
version 4.9.2.0 and perhaps earlier.

The way I tried to delete the host before was to disable the host first and
then click on the Delete (X) icon. This presented me with a message window
asking me to confirm the deletion, with an option to force delete. I tried
deleting both with and without the force delete option checked, and neither
worked.

However, when I tried to delete it via CloudMonkey, it gave me an error
message stating that I can only delete a host which is in Maintenance Mode.
There was no such message in the GUI, and the Delete option becomes available
as soon as you switch the host into the Disabled mode.

So I deleted the host by first putting it into Maintenance Mode and then
pressing the Delete button.

It seems that there is a logic bug in the ACS GUI: it offers the Delete option
after switching the host to the Disabled state, even though, according to the
backend, you can't delete the host in that state.

Andrei
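The mismatch described here is between what the GUI offered (Delete on a Disabled host) and what the backend enforced (Delete only from Maintenance, as Dag explained). A minimal sketch of that rule follows, assuming a simplified three-state model. `HostState`, `HostDeleteError`, `check_host_deletable` and `delete_button_enabled` are hypothetical names for illustration, not CloudStack code or API.

```python
from enum import Enum


class HostState(Enum):
    # Simplified, hypothetical set of host states for illustration only.
    ENABLED = "Enabled"
    DISABLED = "Disabled"        # no new VMs placed; running VMs stay put
    MAINTENANCE = "Maintenance"  # VMs evacuated; host may be deleted


class HostDeleteError(Exception):
    """Raised when a delete is attempted from a state that cannot succeed."""


def check_host_deletable(state: HostState) -> None:
    """Fail fast with an explicit message unless the host is in Maintenance."""
    if state is not HostState.MAINTENANCE:
        raise HostDeleteError(
            f"Host is {state.value}; put it into Maintenance mode before deleting it."
        )


def delete_button_enabled(state: HostState) -> bool:
    """Only offer Delete in the UI when the backend would accept it."""
    return state is HostState.MAINTENANCE


if __name__ == "__main__":
    for state in HostState:
        try:
            check_host_deletable(state)
            print(f"{state.value}: delete allowed")
        except HostDeleteError as err:
            print(f"{state.value}: {err}")
```

Driving both the button state and the error message from the same check is one way to avoid the silent spinner: the user either cannot click Delete or gets the same explicit error that CloudMonkey returned.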


Re: Unable to remove host server

2017-03-20 Thread Andrei Mikhailovsky

Hi Gabriel,

I am using KVM and all servers are running on Ubuntu 14.04 LTS

I will try the cloudmonkey route and let you know.

Cheers

- Original Message -
> From: "Rafael Weingärtner" 
> To: "users" 
> Sent: Friday, 17 March, 2017 15:50:59
> Subject: Re: Unable to remove host server

> Could you try to remove the host using cloudmonkey?
> What hypervisor are you using?
> 
> On Fri, Mar 17, 2017 at 11:49 AM, Andrei Mikhailovsky <
> and...@arhont.com.invalid> wrote:
> 
>> Hi,
>>
>> There is absolutely nothing related to host removal on the management
>> server end. So, not even a job registration for the removal. I have tried
>> to use several browsers to execute this. Nothing helps.
>>
>> Thus I thought to change db to indicate that the host have been removed.
>> But what tables do I need to modify to remove the host properly and cleanly?
>>
>> It used to work for me prior to upgrading to 4.9.x.x.
>>
>> Andrei
> 
> 
> 
> --
> Rafael Weingärtner

Re: Unable to remove host server

2017-03-17 Thread Andrei Mikhailovsky

Hi,

There is absolutely nothing related to host removal on the management server
end; not even a job registration for the removal. I have tried several
browsers to execute this. Nothing helps.

So I thought of changing the DB to indicate that the host has been removed. But
which tables do I need to modify to remove the host properly and cleanly?

It used to work for me prior to upgrading to 4.9.x.x.

Andrei

- Original Message -
> From: "Gabriel Beims Bräscher" 
> To: "users" 
> Sent: Wednesday, 8 March, 2017 13:29:13
> Subject: Re: Unable to remove host server

> Hi Andrei,
> 
> I would avoid manual changes in the DB.
> Have you checked the Apache CloudStack log
> (/var/log/cloudstack/management/management-server.log)?
> 
> Please, share your log so we can help you with the troubleshooting.
> 
> Cheers,
> Gabriel.
> 
> 2017-03-08 9:48 GMT-03:00 Andrei Mikhailovsky :
> 
>> Hello everyone,
>>
>> I am running ACS 4.9.0.2 on Ubuntu 14.04 server. I have tried to remove
>> one of the host servers from the cluster, but I am not able to do so. After
>> pressing the Remove button, I can see the spinning circle, but nothing is
>> happening. I've also tried to do it with the Force remove option ticked
>> without much luck either. Tried to do it on a few browsers, still the same
>> issue. Nothing is happening in the management server logs. No errors,
>> exceptions ,etc.
>>
>> How can I remove the host server? Can I simply marked it as removed in the
>> db? If so, apart from the host table, do I need to make changes to any
>> other tables?
>>
>> Thanks

Unable to remove host server

2017-03-08 Thread Andrei Mikhailovsky

Hello everyone, 

I am running ACS 4.9.0.2 on Ubuntu 14.04 server. I have tried to remove one of
the host servers from the cluster, but I am not able to do so. After pressing
the Remove button, I can see the spinning circle, but nothing happens.
I've also tried it with the Force remove option ticked, without much luck
either. I tried it in a few browsers, still the same issue. Nothing is
happening in the management server logs: no errors, exceptions, etc.

How can I remove the host server? Can I simply mark it as removed in the DB?
If so, apart from the host table, do I need to make changes to any other
tables?

Thanks

Re: KVM Live VM Snapshots

2016-12-27 Thread Andrei Mikhailovsky

Hi Marco,

While I totally agree with you on the design and resilience to failures of
newly developed apps, I disagree with you on the necessity of having a VM
snapshot feature with KVM, and let me explain why.

At the moment, I am running a small ACS + KVM + Ceph cluster with about a
hundred or so VMs. Currently, it is a huge problem for me to take VM
snapshots in a reasonably good state, especially if a VM has multiple volumes.
The problem with the ACS implementation (which is not present in OpenStack, for
example) is that there is no way I can keep the snapshots on the primary
storage (using a fast interconnect). The snapshots are copied over a slow link to
the NFS secondary storage. It takes ages to perform any operation with the
secondary storage due to its silly design (even though I see the reason for its
existence in some cases). Now, consider that I have to take a daily snapshot of
every volume and some volumes have to be snapped on an hourly basis. This totally
overloads the network + NFS storage. Imagine that you have to recover a few
volumes from snapshots or create new volumes or templates from them while the
daily/hourly snapshot cycle is taking place. You will have to wait hours before
you get anywhere, even on smaller volumes.

However, when I was testing the XenServer setup with the VM snapshot
capability, snapshot creation and rollback were pretty quick, and it worked
for all disk volumes in that VM. I don't remember seeing much network traffic
or load on the NFS server either. This feature is a must IMHO, regardless of the
application design. Besides, as far as I remember, KVM has that
capability, so why not implement it within ACS, just like it's done for VMware
and XenServer?

Andrei

- Original Message -
> From: "Marc-Aurèle Brothier" 
> To: "users" 
> Sent: Monday, 19 December, 2016 08:31:26
> Subject: Re: KVM Live VM Snapshots

> Hi Asai,
> 
> In my opinion, doing a VM snapshot is making a step in the wrong direction.
> Your applications/system running inside your VMs should be designed to
> handle an OS crash. Then a new VM, freshly installed, should be able to get
> back into your application setup so that you have again an appropriate
> number of healthy nodes.
> 
> Marco
> 
> On Mon, Dec 19, 2016 at 4:34 AM, Asai  wrote:
> 
>> Greetings,
>>
>> Is it correct that currently there is no support in Cloudstack for KVM
>> live VM snapshots? I see that Volume snapshots are available for running
>> VMs, but that makes me wonder what everyone is doing to get a disaster
>> recovery backup of a KVM based VM?  I did ask this question a few weeks
>> back, but only one person responded with one solution, and I am really
>> trying to figure out what the best solutions are here.
>>
>> Has anybody seen this script? https://gist.github.com/ringe/334ee88ba5451c8f5732
>>
>> What is the community's opinion of scripts like this?  And also, big
>> question, if this script is good, why isn't it integrated into Cloudstack?
>>
>> Thanks,
>> Asai
>>
