The problem with many single-statement parallel updates/inserts is that  each 
statement creates an individual implicit transaction, and this transaction will 
stuck on commit, waiting for its changes to be persistent, i.e  written into 
Innodb redo log (by default also followed with fsync). There is a group commit 
feature n Innodb, which aims to fix that, but I’m not too familiar with it, not 
sure how efficient it is. Innodb_flush_log_at_trx_commit=2 mentioned by 
Guillaume can do wonders, because it omits the slowest part of commit, the 
fsync. 

On the other hand, large multivalued updates , or transactions will do a single 
write on commit for many updates, instead of multiple tiny writes. That’s the 
whole trick, which for example makes mysqldump-made logical backups fast to 
restore , without doing any work in parallel on the client side (Alright, 
mysqldump has couple more tricks, such as disabling indexes, but multivalued 
updates are essential).

You can experiment with the number of connections, and check whether increasing 
them brings much more throughput, but my guess more than say number_of_cores *2 
would not give improvement (once you start combining the updates). Definitely 
not thousands of connections. Perhaps a single client connection will suffice.

From: JCA
Sent: Monday, 7 October 2019 20:52
To: Vladislav Vaintroub
Cc: [email protected]
Subject: Re: [Maria-discuss] Performance tuning sought for MariaDB



On Mon, Oct 7, 2019 at 11:06 AM Vladislav Vaintroub <[email protected]> 
wrote:
 
You do not share many details how exactly your application exactly interacts 
with the server.
 
Do you work with large batches, I.e generate big (say 1MB) multi-valued 
statements like
 
INSERT INTO t(a,b)  VALUES(a1,b1),(a2,b2)......,(aN, bN)
ON DUPLICATE KEY UPDATE counter=counter+1

    It will be mostly individual insertions, amounting to a small amount of 
data every time.

 
The mass-deletion is straightforward
DELETE FROM t WHERE id in (id1,.......idN)

    This I do every so often, but far less frequently than insertions and 
modifications.
 
 
Make sure your multi-valued inserts/deletes do not exceed the value of  
“max_allowed_packet” session variable.
And work preferably with a single connection or small amount of connections.

      Thanks. I do the latter, actually -- if I have have several 
insertions/modifications to do, I strive to submit them in a single connection. 
I do have a separate connection per thread though. Do you think  it might be 
worth the while using a single connection, or a small connection pool, 
throughout?
 
 
If this sounds complicated, you can combine multiple updates in large 
transactions instead, though this could be slightly less efficient, since there 
is more interaction between the application and DB.

      That, in fact, has been my observation.
 
 
Rather than parallelizing single updates, it is  usually better to combine 
updates in large-ish transactions. Most of the update-related work will happen 
in background anyway, at least for innodb.
 
There is some info in the documentation that mentions multi-value inserts 
https://mariadb.com/kb/en/library/how-to-quickly-insert-data-into-mariadb/

    Thanks; I'll check that out.
 
 
 
From: JCA
Sent: Monday, 7 October 2019 18:17
To: [email protected]
Subject: [Maria-discuss] Performance tuning sought for MariaDB
 
I am running MariaDB 10.0.34 on a Slackware 14.2 system. I have a C application 
that interacts with MariaDB in the following way:
 
1. Read data from a FIFO.
2. Insert  the data into a table in a MariaDB database, if absent, or modify it 
according to certain specific criteria otherwise.
 
This works as expected.
 
The problem that I  have is that data are being written to the FIFO at a fast 
rate. In order to be able to keep up, at any given time my application reads 
the data available at the FIFO, and spawns a thread to process the chunk of 
data just read. It is in this thread that all the database interaction takes 
place.  In order to deal with this, I have the following entries in my 
/etc/my.cnf file:
 
# this is read by the standalone daemon and embedded servers
[server]

# this is only for the mysqld standalone daemon
[mysqld]
# thread_handling=pool-of-threads
# log=/var/log/mysqld.log

# this is only for embedded server
[embedded]

# This group is only read by MariaDB servers, not by MySQL.
# If you use the same .cnf file for MySQL and MariaDB,
# you can put MariaDB-only options here
[mariadb]
# log=/var/log/mysqld.log
general_log_file        = /var/lib/mysql/mysql.log
# general_log             = 1

# transaction-isolation = READ-COMMITTED 
# key_buffer = 1280M                              # 128MB for every 1GB of RAM 
# sort_buffer_size =  1M                          # 1MB for every 1GB of RAM 
# read_buffer_size = 1M                           # 1MB for every 1GB of RAM 
# read_rnd_buffer_size = 1M                       # 1MB for every 1GB of RAM 
# thread_concurrency = 24                         # Based on the number of CPUs 
                                                  # so make it CPU*2 
# thread-handling=pool-of-threads 
# innodb_flush_log_at_trx_commit != 1 
# open_files_limit = 50000

thread-handling=pool-of-threads 
max_connections = 1000
table_open_cache = 800
query_cache_type = 0
innodb_buffer_pool_size = 512M
innodb_buffer_pool_instances = 10
innodb_adaptive_hash_index_partitions = 20
innodb_lock_wait_timeout = 5000
With this, my application can keep up with the FIFO writer, but - depending on 
the circumstances - my database can't. As I am writing this, there are over 
1300 threads connected to my database; any command that I issue at the mysql 
CLI takes over one minute to return. I am keeping track on how long each thread 
takes to complete, and that is of the order of hundreds of seconds - sometimes 
thousands. Each thread is itself simple, in that it just issues a couple of 
simple MariaDB commands.  Currently my table consists of 1.6 million entries, 
and growing - on this basis, I expect that things will get only worse. Each 
entry,however, will never require more than a couple of hundred bytes of 
storage. The operations that can be undertaken on entries are insertion, 
deletion and modification, the latter being straightforward - like e.g. 
incrementing a counter or replacing a short string.
 
My system has 24 GB of  RAM and 12 cores. Occasionally all the cores are fully 
busy with MariaDB activity, but most of the time barely one or two are.
 
I am a newbie when it comes to interacting with MariaDB - please,  bear with 
me. I know I must use a single database and a single table. I also know - 
because of the nature of the data that are being written to the FIFO - that the 
 probability for two different threads to be operating on the same entry in the 
table at the same time is negligible - i.e. for all practical purposes, that 
will not happen.
 
What I need is advice on how to configure my instance of MariaDB to perform 
optimally in the scenario above. In particular, I would like for it to make 
better use of all the cores available - in essence, to parallelize the database 
operations as much as possible.
 
Feedback from the experts will be much appreciated.
 
 

_______________________________________________
Mailing list: https://launchpad.net/~maria-discuss
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Reply via email to