JoaoJandre commented on PR #8737: URL: https://github.com/apache/cloudstack/pull/8737#issuecomment-1999668050
> @JoaoJandre, could you make some tests listing and removing data with the proposed index? @GutoVeronezi, @vishesh92, @rohityadavcloud, @sureshanaparti, @mlsorensen, @kohrar I've made some tests listing and deleting data from the vm_stats table using the proposed index, I've also taken the liberty to reproduce the tests using a index on (`vm_id`) only. I've used the same assumptions from the tests that I've done here: https://github.com/apache/cloudstack/pull/8737#issuecomment-1992241242. First, I tested two listings: Select all the data on a VM; select the data on a VM between two specific dates. Without any indexes: ``` MariaDB [teste]> analyze select * from vm_stats where vm_id = 368; +------+-------------+----------+------+---------------+------+---------+------+----------+-------------+----------+------------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra | +------+-------------+----------+------+---------------+------+---------+------+----------+-------------+----------+------------+-------------+ | 1 | SIMPLE | vm_stats | ALL | NULL | NULL | NULL | NULL | 39285187 | 40198741.00 | 100.00 | 0.10 | Using where | +------+-------------+----------+------+---------------+------+---------+------+----------+-------------+----------+------------+-------------+ 1 row in set (24.905 sec) ``` ``` MariaDB [teste]> analyze select * from vm_stats where vm_id = 368 AND (timestamp between '2024-03-02' AND '2024-03-04'); +------+-------------+----------+------+---------------+------+---------+------+----------+-------------+----------+------------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra | +------+-------------+----------+------+---------------+------+---------+------+----------+-------------+----------+------------+-------------+ | 1 | SIMPLE | vm_stats | ALL | NULL | NULL | NULL | NULL | 39285187 | 40198741.00 | 100.00 | 0.03 | Using where | +------+-------------+----------+------+---------------+------+---------+------+----------+-------------+----------+------------+-------------+ 1 row in set (24.528 sec) ``` Using the (`vm_id`,`timestamp`) index: ``` MariaDB [teste]> analyze select * from vm_stats where vm_id = 368; +------+-------------+----------+------+---------------+--------+---------+-------+-------+----------+----------+------------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra | +------+-------------+----------+------+---------------+--------+---------+-------+-------+----------+----------+------------+-------+ | 1 | SIMPLE | vm_stats | ref | index2 | index2 | 8 | const | 87578 | 40866.00 | 100.00 | 100.00 | | +------+-------------+----------+------+---------------+--------+---------+-------+-------+----------+----------+------------+-------+ 1 row in set (3.687 sec) ``` ``` MariaDB [teste]> analyze select * from vm_stats where vm_id = 368 AND (timestamp between '2024-03-02' AND '2024-03-04'); +------+-------------+----------+-------+---------------+--------+---------+------+-------+----------+----------+------------+-----------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra | +------+-------------+----------+-------+---------------+--------+---------+------+-------+----------+----------+------------+-----------------------+ | 1 | SIMPLE | vm_stats | range | index2 | index2 | 13 | NULL | 23986 | 11581.00 | 0.06 | 100.00 | Using index condition | +------+-------------+----------+-------+---------------+--------+---------+------+-------+----------+----------+------------+-----------------------+ 1 row in set (1.025 sec) ``` Using the (`vm_id`) index: ``` MariaDB [teste]> analyze select * from vm_stats where vm_id = 368; +------+-------------+----------+------+---------------+--------+---------+-------+-------+----------+----------+------------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra | +------+-------------+----------+------+---------------+--------+---------+-------+-------+----------+----------+------------+-------+ | 1 | SIMPLE | vm_stats | ref | index1 | index1 | 8 | const | 87528 | 40866.00 | 100.00 | 100.00 | | +------+-------------+----------+------+---------------+--------+---------+-------+-------+----------+----------+------------+-------+ 1 row in set (3.381 sec) ``` ``` MariaDB [teste]> analyze select * from vm_stats where vm_id = 368 AND (timestamp between '2024-03-02' AND '2024-03-04'); +------+-------------+----------+------+---------------+--------+---------+-------+-------+----------+----------+------------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra | +------+-------------+----------+------+---------------+--------+---------+-------+-------+----------+----------+------------+-------------+ | 1 | SIMPLE | vm_stats | ref | index1 | index1 | 8 | const | 87528 | 40866.00 | 100.00 | 28.34 | Using where | +------+-------------+----------+------+---------------+--------+---------+-------+-------+----------+----------+------------+-------------+ 1 row in set (3.365 sec) ``` Again, listing with indexes is much faster, we can see that the select with indexes are an order of magnitude faster then the select without an index. Furthermore, the (`vm_id`,`timestamp`) index is slighty faster than the (`vm_id`) index on the select with a condition on `timestamp`. Then, I did three delete tests: deleting one minute of data, deleting a whole day of data, and deleting a whole day of data with limit 100000. Without any indexes: ``` MariaDB [teste]> ANALYZE delete from vm_stats where `timestamp` <= '2024-03-01 00:03:00'; +------+-------------+----------+------+---------------+------+---------+------+----------+-------------+----------+------------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra | +------+-------------+----------+------+---------------+------+---------+------+----------+-------------+----------+------------+-------------+ | 1 | SIMPLE | vm_stats | ALL | NULL | NULL | NULL | NULL | 39285187 | 40190453.00 | 100.00 | 0.01 | Using where | +------+-------------+----------+------+---------------+------+---------+------+----------+-------------+----------+------------+-------------+ 1 row in set (30.399 sec) ``` ``` MariaDB [teste]> ANALYZE delete from vm_stats where `timestamp` <= '2024-03-02 00:00:00'; +------+-------------+----------+------+---------------+------+---------+------+----------+-------------+----------+------------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra | +------+-------------+----------+------+---------------+------+---------+------+----------+-------------+----------+------------+-------------+ | 1 | SIMPLE | vm_stats | ALL | NULL | NULL | NULL | NULL | 39281076 | 40186342.00 | 100.00 | 14.26 | Using where | +------+-------------+----------+------+---------------+------+---------+------+----------+-------------+----------+------------+-------------+ 1 row in set (2 min 50.000 sec) ``` ``` MariaDB [teste]> ANALYZE delete from vm_stats where `timestamp` <= '2024-03-02 00:00:00' limit 100000; +------+-------------+----------+------+---------------+------+---------+------+----------+-----------+----------+------------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra | +------+-------------+----------+------+---------------+------+---------+------+----------+-----------+----------+------------+-------------+ | 1 | SIMPLE | vm_stats | ALL | NULL | NULL | NULL | NULL | 39285187 | 697806.00 | 100.00 | 14.33 | Using where | +------+-------------+----------+------+---------------+------+---------+------+----------+-----------+----------+------------+-------------+ 1 row in set (2.023 sec) ``` Using the (`vm_id`,`timestamp`) index: ``` MariaDB [teste]> ANALYZE delete from vm_stats where `timestamp` <= '2024-03-01 00:01:00'; +------+-------------+----------+------+---------------+------+---------+------+----------+-------------+----------+------------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra | +------+-------------+----------+------+---------------+------+---------+------+----------+-------------+----------+------------+-------------+ | 1 | SIMPLE | vm_stats | ALL | NULL | NULL | NULL | NULL | 39285187 | 40198741.00 | 100.00 | 0.01 | Using where | +------+-------------+----------+------+---------------+------+---------+------+----------+-------------+----------+------------+-------------+ 1 row in set (30.866 sec) ``` ``` MariaDB [teste]> ANALYZE delete from vm_stats where `timestamp` <= '2024-03-02 00:00:00'; +------+-------------+----------+------+---------------+------+---------+------+----------+-------------+----------+------------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra | +------+-------------+----------+------+---------------+------+---------+------+----------+-------------+----------+------------+-------------+ | 1 | SIMPLE | vm_stats | ALL | NULL | NULL | NULL | NULL | 39285187 | 40198741.00 | 100.00 | 14.29 | Using where | +------+-------------+----------+------+---------------+------+---------+------+----------+-------------+----------+------------+-------------+ 1 row in set (22 min 6.165 sec) ``` ``` MariaDB [teste]> ANALYZE delete from vm_stats where `timestamp` <= '2024-03-02 00:00:00' limit 100000; +------+-------------+----------+------+---------------+------+---------+------+----------+------------+----------+------------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra | +------+-------------+----------+------+---------------+------+---------+------+----------+------------+----------+------------+-------------+ | 1 | SIMPLE | vm_stats | ALL | NULL | NULL | NULL | NULL | 39185187 | 1900366.00 | 100.00 | 5.26 | Using where | +------+-------------+----------+------+---------------+------+---------+------+----------+------------+----------+------------+-------------+ 1 row in set (14.068 sec) ``` Using the (`vm_id`) index: ``` MariaDB [teste]> ANALYZE delete from vm_stats where `timestamp` <= '2024-03-01 00:02:00'; +------+-------------+----------+------+---------------+------+---------+------+----------+-------------+----------+------------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra | +------+-------------+----------+------+---------------+------+---------+------+----------+-------------+----------+------------+-------------+ | 1 | SIMPLE | vm_stats | ALL | NULL | NULL | NULL | NULL | 39285187 | 40194609.00 | 100.00 | 0.01 | Using where | +------+-------------+----------+------+---------------+------+---------+------+----------+-------------+----------+------------+-------------+ 1 row in set (31.899 sec) ``` ``` MariaDB [teste]> ANALYZE delete from vm_stats where `timestamp` <= '2024-03-02 00:00:00'; +------+-------------+----------+------+---------------+------+---------+------+----------+-------------+----------+------------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra | +------+-------------+----------+------+---------------+------+---------+------+----------+-------------+----------+------------+-------------+ | 1 | SIMPLE | vm_stats | ALL | NULL | NULL | NULL | NULL | 39285187 | 40198741.00 | 100.00 | 14.29 | Using where | +------+-------------+----------+------+---------------+------+---------+------+----------+-------------+----------+------------+-------------+ 1 row in set (7 min 28.040 sec) ``` ``` MariaDB [teste]> ANALYZE delete from vm_stats where `timestamp` <= '2024-03-02 00:00:00' limit 100000; +------+-------------+----------+------+---------------+------+---------+------+----------+------------+----------+------------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra | +------+-------------+----------+------+---------------+------+---------+------+----------+------------+----------+------------+-------------+ | 1 | SIMPLE | vm_stats | ALL | NULL | NULL | NULL | NULL | 39185187 | 1299394.00 | 100.00 | 7.70 | Using where | +------+-------------+----------+------+---------------+------+---------+------+----------+------------+----------+------------+-------------+ 1 row in set (5.265 sec) ``` From the delete tests, we can once again see that, when deleting small amounts of data, the index overhead is unnoticeable. However, when deleting larger parts of the table, the indexes make a large difference, with the deletion using (`vm_id`,`timestamp`) being an order of magnitude slower the deletion without any indexes. However, the deletion with the (`vm_id`) index is in the same order of magnitude as the no index deletion, only being about two times slower. Looking at the test results, we can make the following conclusions: 1. Selects without any indexes take too long (who would've thought?); 2. While the (`vm_id`,`timestamp`) index makes the select faster when filtering for both `vm_id` and `timestamp`, the deletion is much slower; 3. The (`vm_id`) index lowers the select time considerably, being on the same order of magnitude of the (`vm_id`,`timestamp`) speed up, while being only a bit slower when selecting using timestamps. Moreover, the deletion is slower than the no index deletion; however, it's still on the same order of magnitude of the no index deletion. Therefore, based on these results, I believe that we could change the proposed (`vm_id`,`timestamp`) index to a (`vm_id`) index, as the (`vm_id`) offers a good speed boost for listing, and minimal slowdown when deleting, especially when used with the feature proposed in #8740. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@cloudstack.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org