from:"Marko Mäkelä"

Re: [Maria-discuss] Innodb_buffer_pool_read_requests is zero

2023-05-19 Thread Marko Mäkelä

Hi Daniel,

I filed https://jira.mariadb.org/browse/MDEV-31308 for the
accidentally disabled-by-default trx_rseg_history_len.

The counter innodb_buffer_pool_read_requests was refactored in
MDEV-26827 (MariaDB Server 10.6.13). The monitor counter and the
status variable are supposed to get the data straight from the global
variable buf_pool.stat.n_page_gets. For the monitor counter, a
function is called to copy the value. The status variable is directly
pointing to the variable. However, this is a sharded variable, so the
status variable would only reflect the first shard. Based on
https://jira.mariadb.org/browse/MDEV-21212 we must retain this
variable as sharded and the reporting of the status variable must be
corrected. I filed https://jira.mariadb.org/browse/MDEV-31309 for
this.

Thank you for your input!

Marko

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] Undo logs aren't truncating

2023-05-14 Thread Marko Mäkelä

Hi,

The latest quarterly releases of MariaDB Server 10.5 and later include
a fix of https://jira.mariadb.org/browse/MDEV-30671 which fixes
corruption due to premature purging of history.

It has turned out that this fix causes another bug where some undo log
pages will never be freed
(https://jira.mariadb.org/browse/MDEV-31234). I am working on a fix.

Marko

On Sat, May 13, 2023 at 9:37 PM  wrote:
>
> On a recently updated MariaDB 10.6.13 server (Debian Bullseye) I noticed
> that my ibdata1 file was continuously growing; it had gotten a little
> over 3G after running a couple days. I would rather keep this growth in
> check so used it as an excuse to setup undo tablespaces that could be
> truncated (I had wanted to try this for some time anyway). So: Dump the
> whole DB, delete all the InnoDB tables, stop the server, delete the
> innodb system tables, reset the appropriate undo variables (with
> innodb_max_undo_log_size set to 750M, innodb_undo_tablespaces set to 4,
> and innodb_undo_log_truncate set to ON), restart the server and reload
> the data.
>
> I now have FOUR undo files that appear to be continuously growing; as of
> this writing they are over 1G and still increasing.
>
> This is an active read-only replica (processing updates very frequently;
> at times continuously but not constantly as such). How do I get it to
> actually reclaim the space? Or am I misunderstanding how this is
> supposed to work?
>
> Below are my current relevant settings (or at least what I think are
> relevant). Normally I run with innodb_purge_rseg_truncate_frequency =
> 128; but for testing I have it now set to 1 (but with no noticeable
> difference).
>
> Things I have tried (no luck on any of them):
>
> Set innodb_purge_rseg_truncate_frequency = 1.
>
> Set innodb_max_purge_lag_wait=0 (it returned nearly immediately).
>
> Stop both slave threads and wait for several minutes.
>
> Set innodb_max_purge_lag and innodb_max_purge_lag_delay (see below).
>
> MariaDB [(none)]> show variables like '%trunc%';
> +--+---+
> | Variable_name| Value |
> +--+---+
> | innodb_purge_rseg_truncate_frequency | 1 |
> | innodb_undo_log_truncate | ON|
> +--+---+
> 2 rows in set (0.000 sec)
>
> MariaDB [(none)]> show variables like '%purge%';
> +--++
> | Variable_name| Value  |
> +--++
> | aria_log_purge_type  | immediate  |
> | innodb_max_purge_lag | 1000   |
> | innodb_max_purge_lag_delay   | 5000   |
> | innodb_max_purge_lag_wait| 4294967295 |
> | innodb_purge_batch_size  | 300|
> | innodb_purge_rseg_truncate_frequency | 1  |
> | innodb_purge_threads | 4  |
> | relay_log_purge  | ON |
> +--++
> 8 rows in set (0.000 sec)
>
> MariaDB [(none)]> show variables like '%undo%';
> +--+---+
> | Variable_name| Value |
> +--+---+
> | innodb_max_undo_log_size | 786432000 |
> | innodb_undo_directory| ./|
> | innodb_undo_log_truncate | ON|
> | innodb_undo_tablespaces  | 4 |
> +--+---+
> 4 rows in set (0.000 sec)
>
> _______
> Mailing list: https://launchpad.net/~maria-discuss
> Post to : maria-discuss@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~maria-discuss
> More help   : https://help.launchpad.net/ListHelp



-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB plc

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] Innodb_buffer_pool_read_requests is zero

2023-05-12 Thread Marko Mäkelä

 Innodb_buffer_pool_read_ahead_evicted   | 0 |
> >> | Innodb_buffer_pool_read_requests| 0 |
> >> | Innodb_buffer_pool_reads| 604346 |
> >> | Innodb_buffer_pool_wait_free| 0 |
> >> | Innodb_buffer_pool_write_requests   | 3639341 |
> >> +-+--+
> >> 25 rows in set (0.000 sec)
> >>
> >> MariaDB [(none)]> SHOW VARIABLES LIKE 'innodb_buffer_pool%';
> >> +-++
> >> | Variable_name   | Value  |
> >> +-++
> >> | innodb_buffer_pool_chunk_size   | 134217728  |
> >> | innodb_buffer_pool_dump_at_shutdown | ON |
> >> | innodb_buffer_pool_dump_now | OFF|
> >> | innodb_buffer_pool_dump_pct | 80 |
> >> | innodb_buffer_pool_filename | ib_buffer_pool |
> >> | innodb_buffer_pool_load_abort   | OFF|
> >> | innodb_buffer_pool_load_at_startup  | ON |
> >> | innodb_buffer_pool_load_now | OFF|
> >> | innodb_buffer_pool_size | 15032385536|
> >> +-++
> >> 9 rows in set (0.000 sec)
> >>
> >> ___
> >> Mailing list: https://launchpad.net/~maria-discuss
> >> Post to : maria-discuss@lists.launchpad.net
> >> Unsubscribe : https://launchpad.net/~maria-discuss
> >> More help   : https://help.launchpad.net/ListHelp
> >
> > ___
> > Mailing list: https://launchpad.net/~maria-discuss
> > Post to : maria-discuss@lists.launchpad.net
> > Unsubscribe : https://launchpad.net/~maria-discuss
> > More help   : https://help.launchpad.net/ListHelp
> >
>
> ___
> Mailing list: https://launchpad.net/~maria-discuss
> Post to : maria-discuss@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~maria-discuss
> More help   : https://help.launchpad.net/ListHelp



-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB plc

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] mysql.innodb-table-stats-and mysql.innodb-index-stats are not updating properly

2023-04-19 Thread Marko Mäkelä

On Wed, Apr 19, 2023 at 8:04 AM ragul rangarajan
 wrote:
> I am facing a weird issue with my DB table where its Index and table stats 
> not getting updated with values automatically in MariaDB 10.6.7

Hi Ragul!
The same question had been posted to
https://stackoverflow.com/questions/76047397/mysql-innodb-table-stats-and-mysql-innodb-index-stats-are-not-updating-properly
where I posted a reply. I think that this is most likely due to
https://jira.mariadb.org/browse/MDEV-27805 and less likely
https://jira.mariadb.org/browse/MDEV-28327. Both bugs have been fixed
after the release of MariaDB Server 10.6.7.

I recommend an upgrade to MariaDB Server 10.6.12 or the upcoming
10.6.13 that should be out within a couple of weeks.

Best regards,
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB plc

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] DELETE query hanging on Statistics (10.6.12)

2023-04-13 Thread Marko Mäkelä

Hi Dan,

I see one thread that is doing a re-entrant call to
btr_cur_pessimistic_delete() on a secondary index tree when purging
the history of a committed transaction (such as a DELETE or an UPDATE
of an indexed column). It matches the hang
https://jira.mariadb.org/browse/MDEV-29835 that was actually
introduced in MySQL 5.7 already and has been present in MariaDB Server
starting with 10.2.2.

The thread right below that is executing
btr_estimate_n_rows_in_range(), which was improved in
https://jira.mariadb.org/browse/MDEV-21136 in MariaDB 10.6.9. After
that change, we started to see much more InnoDB hangs. For the 10.6.12
release, I tried to fix MDEV-29835. Because I ran out of time, I fixed
only part of it, in https://jira.mariadb.org/browse/MDEV-30400. I have
the feeling that this partial fix made the hangs in the remaining
cases much more likely.

Yasufumi Kinoshita introduced in MySQL 5.7 a latch mode that sits
between exclusive (X) and shared (S), called SX by him, and called U
(Update) by me in https://jira.mariadb.org/browse/MDEV-24142. At most
one X or SX lock can be granted on an object at a time. While X locks
conflict with S locks, the SX lock allows any number of S locks to be
acquired concurrently.

The problem is lock order inversion because Yasufumi’s implementation
violates his own design constraints
https://dev.mysql.com/worklog/task/?id=6326 High Level Architecture. I
helped formulate those rules back then, but I was not otherwise
involved with the design, implementation or review of the change.
Unfortunately, one section heading "(2) When holding index->lock
SX-latch:" is missing.

The purge thread that is doing the re-entrant call to
btr_cur_pessimistic_delete() is holding an index tree SX-latch and
some leaf page latches. As part of the page merge, it has to access
some non-leaf pages on which it did not acquire latches upfront.
According to the design rules, this is the wrong order of acquiring
latches. The btr_estimate_n_rows_in_range() thread is holding an index
S-latch and following the correct order for that case.

Without having the output of "thread apply all backtrace full", I
cannot say for sure that this is a case of MDEV-29835, but I think
that it is extremely likely. Based on other cases that I have
analyzed, I expect that the btr_cur_pessimistic_delete() is holding a
page latch that btr_estimate_n_rows_in_range() is waiting for, and it
is waiting for a higher-level page latch that
btr_estimate_n_rows_in_range() is holding.

The simple fix to this would be to never use the index SX-lock mode,
and always escalate to exclusive locking. We actually tried that years
back in https://jira.mariadb.org/browse/MDEV-14637 but it would have
caused a significant performance regression. The upcoming quarterly
releases (within a month or so) includes a fix of MDEV-29835 that only
escalates to exclusively locking the index tree when it is really
needed. In debug builds, we have assertions that would fire if index
page latches are being acquired in the wrong order while not holding
an exclusive index latch. This fix was tested both for correctness
(lack of debug assertions) and performance.

This is not the only bug that is related to SX-locks.
https://jira.mariadb.org/browse/MDEV-29883 is another example.

Some users are successfully using a development snapshot that includes
the fix of MDEV-29835. In https://jira.mariadb.org/browse/MDEV-30481
you can find one example.

With best regards,
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB plc

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] MariaDB Online-DDL

2023-03-17 Thread Marko Mäkelä

Hi Rhys,

On Fri, Mar 17, 2023 at 1:04 PM  wrote:
> I'm back working with MariaDB again after a break of some years. While InnoDB 
> Online-DDL had appeared when I was working with it, it was still sometimes a 
> challenge to manage schema changes for very large tables.

One pet hate already back in the MySQL 5.6.8 times when I had
implemented WL#5526 (online ADD INDEX) and WL#6255 (online table
rebuild) was something that was finally fixed in MariaDB 10.8 as
https://jira.mariadb.org/browse/MDEV-11675 (Lag free ALTER). I think
that this was the main reason to have tools like
pt-online-schema-table, to reduce the replication lag. With
MDEV-11675, the replicas will start ALTER operation simultaneously
with the primary server, and eventually apply a "commit" or "rollback"
event based on the final outcome.

Another old problem was "fake duplicates", fixed in
https://jira.mariadb.org/browse/MDEV-15250 (MariaDB 10.6.8), by
replicating concurrent DML during InnoDB transaction commit and not
earlier. The implementation was great work by my colleague Thiru.

You should also be aware of the instant ADD column (MDEV-11369,
MariaDB 10.3) and instant drop/reorder column (MDEV-15562, MariaDB
10.4). Something similar to this was implemented in MySQL 8.0 a few
years later. I would not necessarily make extensive use of these
(especially instant DROP), because of additional storage and access
overhead. In a perfect world, there would be resources to develop an
InnoDB-like storage engine with a saner file format that would allow
the data file to be eventually converted into a canonical format with
no "instantness overhead".

Another motivation for online schema change tools might be that many
operations were not supported natively. In MariaDB 10.4, much of this
was fixed (there is a table in
https://jira.mariadb.org/browse/MDEV-11424). For some changes, such as
changing column data types in a way that requires some data
conversion, the non-native ALGORITHM=COPY is still needed. We do have
an online variant of that under development; see
https://jira.mariadb.org/browse/MDEV-16329.

> Is it comparable to MSSQL (which I found to be pretty good for tables in the 
> many tens of GB)?

I hope that others can say something about that.

Best regards,
Marko
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] can not kill process

2023-03-17 Thread Marko Mäkelä

On Thu, Mar 16, 2023 at 11:10 PM Radisson  wrote:
> if it happens again and if possible i will try gdb to see where its
> stuck. Its a production machine, so do not hold your breath :)

You can enable core dumps, so if the built-in InnoDB watchdog triggers
an abort of the server process (a lock wait time exceeds
innodb_fatal_semaphore_wait_threshold), you should get a core dump of
the crash. Starting with 10.3, the core dump does not include some
large buffers, such as the InnoDB buffer pool, so hopefully it should
not be unacceptably large.

Yesterday, I pushed the fix of
https://jira.mariadb.org/browse/MDEV-29835 (a hang that was introduced
in MySQL 5.7 and MariaDB 10.2.2) to the 10.6 branch. Today, I have
pushed a merge of it to all 10.x branches. The 11.0 and 11.1 branches
should follow within an hour.

The next scheduled quarterly releases should be around April 27.

Marko
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] can not kill process

2023-03-13 Thread Marko Mäkelä

Hi,

On Sun, Mar 12, 2023 at 12:45 PM Radisson  wrote:
> Am 11.03.23 um 10:53 schrieb radisso...@gmx.de:
> > hi i have a strange effect
> > a simple insert got stuck (why ?)

I am afraid that the question could only be answered if you had
attached a debugger to the mariadbd process and produced a stack trace
of all threads.
In https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/
you can read how this is done. Installing the -debuginfo or -dbgsym
package for the server makes the stack traces much more useful.

> > any ideas what may the problem ?

We have observed the InnoDB hang
https://jira.mariadb.org/browse/MDEV-29835 on MariaDB 10.6 and later
versions in our internal testing. I believe that this type of a hang
is possible starting with MariaDB 10.2 and MySQL 5.7.

Furthermore, I suspect that the InnoDB "os_event" that was removed in
MariaDB Server 10.6 could occasionally lose signals. We occasionally
observe test failures on our CI systems due to that.

It is hard to tell which type of hang would be more likely, and your
hang could also have occurred due to something completely different.

Marko
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] MySQLTuner-perl 2.0.10 with MariaDB 10.3.35 + Plesk

2023-01-18 Thread Marko Mäkelä

On Wed, Jan 18, 2023 at 3:10 PM Gordan Bobic  wrote:
>
> A lot of MySQL Tuner's advice is based on extensive misconceptions and
> questionable understanding of MySQL internals. Realistically - the
> best option is to just pretend it doesn't exist.

Ignorance is bliss; I did not even know that it existed.

> innodb_log_file_size should be (=16M) if possible, so InnoDB total
> log files size equals 25% of buffer pool size.
>
> I have no idea where this notion that innodb_log_file_size is in any
> way related to the buffer pool size, but it is completely
> non-sensical.

It might be motivated by the fact that MySQL as well as MariaDB Server
before 10.5 before MDEV-21351 and some follow-up fixes could easily
run out of memory on recovery.

Yes, there is some logic to run recovery in multiple batches, but that
does not work reliably. For MariaDB before 10.5, we have a known bug
MDEV-22512 that I am not going to fix.

I think that the two by far most important InnoDB parameters are
innodb_buffer_pool_size and innodb_log_file_size. If you can afford
potentially longer crash recovery times as well as the space usage, it
could make sense to set innodb_log_file_size even multiple times
innodb_buffer_pool_size. That can help avoid write amplification if
the same data pages are being overwritten many times. Page writes only
have to take place as part of a log checkpoint flush, or when the
buffer pool is full and the least recently used pages are dirty.

It should be worth noting that starting with MariaDB 10.5, there will
no longer be writes of garbage pages that were freed from the
underlying file, for example, as part of DROP INDEX, TRUNCATE TABLE,
or a massive DELETE.

In https://jira.mariadb.org/browse/MDEV-19895 I would welcome
constructive feedback on how to set sane default values of some
parameters, based on a small number of parameters, say, the available
size of memory, CPU cores, storage, and the size of the database.

With best regards,

Marko
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] DBMS vs Storage Engine Alone

2022-12-20 Thread Marko Mäkelä

On Fri, Dec 16, 2022 at 3:46 AM Jonathan M. Wilbur
 wrote:
> Sorry to ask this question here, since it is not strictly MariaDB-related, 
> but I need the knowledge of people that develop with databases: is there 
> significant overhead to using a DBMS versus a storage engine directly?

I was thinking about this recently. For some predetermined access
patterns, it might pay off to translate the SQL statements directly
into executable code that would directly access the buffer pool (page
cache) of the storage engine. In terms of InnoDB, the generated code
could work on the level of mini-transactions (atomic access to
multiple buffer pages). The "class handler" interface that
HandlerSocket uses still involves conversions between row formats, for
example.

I did not look into details, but
https://github.com/KowalskiThomas/LLVMSQLite seemed interesting to me.
Of course, we have to keep in mind that SQLite is a much simpler
architecture, maybe not at all suitable for high-concurrency
applications, but it could make sense in the embedded space or in some
read-mostly settings.

In a general purpose database, some just-in-time compilation could
help with more complex queries. It would be a trade-off between memory
management, time to compile, and time to run the compiled vs.
interpreted code path. For a stream of simple arbitrary queries,
interpreting might win.

Best regards,

Marko
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] Maria-db refuses to start

2022-12-09 Thread Marko Mäkelä

On Thu, Dec 8, 2022 at 6:42 PM Reindl Harald  wrote:
> WTF - but that don't change anything - you just can't dump+restore multi
> GB large databases in production environments and i expect from a new
> software version that it simply can read it's older data

Yes, I fully agree with that sentiment. Being able to upgrade without
logical dump and restore has always been part of the design philosophy.

> and when you are at it get rid of "ib_logfile0" and "ib_logfile1" which
> hold data even in file-per-table mode you can't cleanup from crap caused
> by a crash 13 years ago

In MySQL 5.6.8 I fixed the bug that you could not change
innodb_log_file_size or innodb_log_files_in_group afterwards. With that
fix, we would safely create new logs on server startup.

MariaDB 10.2 changed the minimum value of innodb_log_files_in_group to 1,
and 10.5 hard-wired it to that value.

Starting with MariaDB 10.9, you can SET GLOBAL innodb_log_file_size while
the server is running. A second log file will be created, to replace the
ib_logfile0 at the earliest opportunity (completion of a log checkpoint).

On upgrades from earlier versions to 5.7 or 10.2 or later, to 10.5 or later
(MDEV-12353), or 10.8 or later (MDEV-14425), the log files will be rebuilt.

> and what happens over the long with data that has no checksums at all
> because it was possible to completly disable them in the past?

It is anyone's guess. For added benefit, one might also disable the
doublewrite buffer without testing that torn writes actually are
impossible, and then complain that InnoDB crashes on a corrupted page.
MDEV-13542 and some related changes recently fixed (most of) that in 10.6.

Files that were created while innodb_checksum_algorithm=full_crc32 was in
effect will always use that checksum format, even if you try to disable
checksums afterwards. OK, you can only do that in 10.4 or 10.5; MDEV-25105
in 10.6 removed the innodb_checksum_algorithm values "none" and "innodb".

Regarding the dbmail/#sql2-704-271.ibd in your other message, in MariaDB
10.5 or later you should be able to drop the table even if a .frm file does
not exist, with
DROP TABLE dbmail.`#mysql50##sql2-704-271`;
If you are using an older version, you could try to copy the .frm file of
another InnoDB table to that name, and then remove the file.

DDL operations should be crash-safe and most of them are atomic starting
with MariaDB 10.6.

Marko
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation
___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] Maria-db refuses to start

2022-12-08 Thread Marko Mäkelä

On Thu, Dec 8, 2022 at 5:41 PM Reindl Harald  wrote:
> > 
> > I think you mean mysql, not postgresql...
>
> no, i mean postgresql, that piece of crap breaking for years after
> dist-upgrades because you need to do dump+restore at every version
> change - real fun because you need to remember about the dump *before*

The other side of the coin is a questionable InnoDB "optimization"
that was finally fixed in MySQL 5.1.48 (in 2010) due to an external
bug report. Heikki Tuuri firmly believed that zeroing out pages when
they are initialized burns too many CPU cycles, so it is better to
just write whatever garbage to any unused bytes in data pages. He
actually forbade me to fix it back in 2004 or 2005, and forbade me to
spend any time to prove what kind of bad things could happen. (Back
then, even the FIL_PAGE_TYPE was written uninitialized to anything
other than B-tree index pages. So, if the page type field said
FIL_PAGE_INDEX, you could not say if it actually was an index page.)

I have tried to keep the implications in mind, but there has been at
least one serious bug regarding this:
https://jira.mariadb.org/browse/MDEV-27800

If a database with InnoDB had been originally initialized before MySQL
5.1.48 and you kept upgrading the binary files, you could have a
similar ticking time-bomb in some other data structure, which might be
forgotten when a future minor change to the data file format is made.
This would not be caught in normal upgrade tests, which would
initialize a database using a previous server version (say, 10.2) and
then upgrade to 10.3. To catch something like this, you would have to
initialize and populate the database in MySQL 5.1.47 or earlier, with
a suitable usage pattern that actually causes some unused to contain
suitably unlucky garbage bytes. (On a freshly initialized database
they could easily be 0.)

MariaDB Server 10.4 introduced a new file format
innodb_checksum_algorithm=full_crc32, and MariaDB Server 10.5 made it
the default. Any files that were created when that setting is active
are guaranteed to write any unused bytes as zeroes. It also fixes a
peculiar design decision that some bytes of the page are not covered
by any checksum, and that a page is considered valid if any of the
non-full_crc32 checksums happen to produce a match. This includes the
magic 0xdeadbeef for innodb_checksum_algorithm=none.

Maybe we should consider eventually deprecating write support for the
non-full_crc32 format, to force a fresh start.

Note: We still spend effort to allow in-place binary upgrades. My last
example of that was https://jira.mariadb.org/browse/MDEV-29694,
removing one of the innovative InnoDB features: the change buffer
a.k.a. insert buffer, because it has been the source of many
hard-to-debug index corruption bugs.

With best regards,

Marko
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] Maria-db refuses to start

2022-11-08 Thread Marko Mäkelä

Hi Jogchum,

I hope that someone else on this list can help you further with this.

One thing that you could try is to install the newer (10.8+) MariaDB
server package and see if InnoDB in the 10.6.10 happened to perform a
clean shutdown anyway. I do not see any messages about InnoDB crash
recovery, so it could be the case. If the upgrade still fails, you
will have to try to get the 10.6 server to start up and shut down.

Best regards,

Marko

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] Maria-db refuses to start

2022-11-08 Thread Marko Mäkelä

Hi Jogchum,

Sorry, I did not notice that you had a question for me.

On Sun, Oct 16, 2022 at 11:02 AM Jogchum Reitsma
 wrote:
>> Op 14-10-2022 om 12:45 schreef Marko Mäkelä:
>>
>>Hi Jogchum,
>>
>>On Wed, Oct 12, 2022 at 4:30 PM Jogchum Reitsma
>> wrote:
>>[...]
>>
>>When trying to start it manually, I got the message
>
>>[ERROR] InnoDB: Upgrade after a crash is not supported. The redo log was
>>created with MariaDB 10.5.8.
>
>>Searching that message, I saw a post on
>>https://bbs.archlinux.org/viewtopic.php?id=259364
>
> <...>
>
>>The correct advice is to install MariaDB Server (any version between
>>10.5 and 10.7 should do), shut down the database, and then upgrade to
>>10.8.
>>
>>Before 10.5 (MDEV-12353), the last time the redo log format was
>>changed was in MariaDB Server 10.2. If you got that error message when
>>attempting to upgrade from something older than 10.5, then you should
>>use the exact same major version of MariaDB to perform the shutdown,
>>because there were some changes to the redo log format in each major
>>release between 10.1 and 10.5.
>>
>>I hope that someone can post a correction to that forum.
[...]
> I uninstalled the mariadb version from opensuse Tumbleweed,
> and downloaded and installed mariadb-10.6.10-linux-systemd-x86_64 (init 
> system systemd)in /usr/local, according to the steps explained in the 
> INSTALL_BINARY document.
>
> My assumption of the next steps to take is
>
> - create /usr/local/mysql/data
> - chown:chgrp it to mysql:mysql
> - in /etc/my.cnf, set /usr/local/mysql/data being the data directory (with 
> opensuse, that is normally /var/lib/mysql, by the way)
> - do a mount --bind from my real data directory to /usr/local/mysql/data
> - then start the database with /usr/local/bin/mysqld_safe --user=mysql &
>
> Are the steps correct?
>
> If so, I would think the next thing to do is to stop the server with kill -15 
> , remove the /usr/local/mariadb-10.6.10-linux-systemd-x86_64 
> directory,
> and install the current TW-version.

Yes, the steps look roughly correct, possibly except for the part of
shutting down the 10.6.10 server.

As far as the server is concerned, SIGTERM (signal 15) or SIGQUIT
should initiate a shutdown.

I would think that on systemd, the preferred way to start and stop
services would be the following:

systemctl start mariadb
systemctl stop mariadb

Systemd could actually take the role of mysqld_safe.

It could be simplest to start the server directly as "mariadbd" so
that nothing will keep restarting it after you initiated the shutdown.
If you run the process in the foreground, then ctrl-\ should be
SIGQUIT and it should initiate a shutdown.

You'd better check with "pgrep mariadbd" or "pgrep mysqld" that
nothing is running, both before starting and after shutting down the
server.

I will clarify the error message. For 10.3 and 10.4, it will look like this:
https://github.com/MariaDB/server/commit/9ac8be4e2980aa995117147e39ae5b7ad79fc980
Starting with 10.5, it may additionally suggest to use a version not
later than MariaDB 10.4.
Starting with 10.8, it may additionally suggest to use a version not
later than MariaDB 10.7.

With best regards,

Marko Mäkelä
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] Maria-db refuses to start

2022-10-14 Thread Marko Mäkelä

Hi Jogchum,

On Wed, Oct 12, 2022 at 4:30 PM Jogchum Reitsma
 wrote:
[...]
> When trying to start it manually, I got the message
>
> [ERROR] InnoDB: Upgrade after a crash is not supported. The redo log was
> created with MariaDB 10.5.8.
>
> Searching that message, I saw a post on
> https://bbs.archlinux.org/viewtopic.php?id=259364

The advice given there is plain wrong, because it will cause the
contents of the write-ahead log to be ignored. The 10.8 startup
failure already proves that the log is nonempty, that is, some changes
must be applied. It will result in a database whose state does not
correspond to a single given point of time. Some pages are older, some
newer. That approach could cause various crashes, potentially a long
time afterwards. Many of the crashes on corrupted InnoDB data were
recently fixed in MDEV-13542.

I changed the InnoDB redo log format in MariaDB Server 10.8 in
MDEV-14425. At the same time, in
https://jira.mariadb.org/browse/MDEV-27199 I implemented a check that
prevents corruption if such incorrect and dangerous "advice" is being
followed.

The correct advice is to install MariaDB Server (any version between
10.5 and 10.7 should do), shut down the database, and then upgrade to
10.8.

Before 10.5 (MDEV-12353), the last time the redo log format was
changed was in MariaDB Server 10.2. If you got that error message when
attempting to upgrade from something older than 10.5, then you should
use the exact same major version of MariaDB to perform the shutdown,
because there were some changes to the redo log format in each major
release between 10.1 and 10.5.

I hope that someone can post a correction to that forum.

With best regards,

Marko
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] Increase in IOPS between 5.5 and 10.4

2020-11-13 Thread Marko Mäkelä

If you are using the InnoDB storage engine, the explanation could be
MDEV-12288. A fix is underway in MDEV-515, for speeding up inserts
into empty tables or partitions.

The recent MariaDB 10.5.8 release includes significant improvements to
write performance (MDEV-23399 and MDEV-23855). If you do not mind some
variance in the throughput and latency, you might want to configure
the database so that InnoDB will only perform 'emergency flushing',
and basically only perform sequential writes to ib_logfile0 most of
the time.

With best regards,

Marko
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] InnoDB: background jobs & fsyncs

2020-09-21 Thread Marko Mäkelä

Hi Julien,

On Mon, Sep 21, 2020 at 6:13 PM Julien Muchembled  wrote:
> I hoped to get an answer about how to tell InnoDB to complete its background 
> tasks as soon as possible (even if background activity has no impact on perf, 
> not having to wait so long to make a clean backup of the datadir remains 
> useful) but I guess there's no way. Should I open a task issue on JIRA ?

We already have https://jira.mariadb.org/browse/MDEV-22958 filed for
adding a SET GLOBAL statement that would wait for the purge to be
finished. (Admittedly, the title of the bug is misleading, referring
to a test timeout on our CI system.)

> After further analysis, we found that the deadlocks are due to a problem in 
> our application. We have a scheduler that uses MariaDB to store tasks and the 
> performance profile of the new hardware is so unusual that some kinds of 
> tasks accumulated to a point that the scheduler couldn't handle.

So, is there anything that you would still want to diagnose? I would
recommend reading
https://mysqlentomologist.blogspot.com/2020/09/dynamic-tracing-of-pthreadmutexlock-in.html
and watching the MariaDB Fest 2020 recording of Valerii’s talk. There
is also the command "offcputime" which should be similar to "perf" but
counting waits. http://www.brendangregg.com/offcpuanalysis.html shows
how to combine it with flame graphs. I do not have any hands-on
experience on these techniques yet.

Marko
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] InnoDB: background jobs & fsyncs

2020-09-15 Thread Marko Mäkelä

Hi Julien,

First of all, is the version really 10.2.23, which was released in
March 2019? If you are actually using the latest 10.2.33 release, you
might be affected by MDEV-20638, which surprisingly introduced a
performance regression. Based on benchmarks, we reverted that change
from the upcoming 10.2, 10.3, and 10.4 releases (but not 10.5).

A significant source of background activity is the purge of version
history of old transactions. If SHOW ENGINE INNODB STATUS is reporting
a nonzero "History list length", some purge activity will be needed.
Before MariaDB 10.5, there was also a background merge of buffered
changes to secondary indexes.

I would suggest checking with http://poormansprofiler.org/ or "sudo
perf top" what is causing the I/O, and posting the stack traces. You
might even try attaching GDB to the server and setting a breakpoint on
fsync(), with breakpoint commands "finish" and "continue" so that you
can collect interesting stack traces.

Also, I would recommend you to try a later major version. The InnoDB
version in MariaDB 10.3 should scale better thanks to a lock-free hash
table for maintaining the set of active transactions. MariaDB 10.5
included further improvements. But, be aware that we are working on a
10.5 performance regression in page flushing, in MDEV-23399.

With best regards,

Marko Mäkelä

On Tue, Sep 15, 2020 at 1:25 PM Julien Muchembled  wrote:
>
> Hello,
>
> We are evaluating new hardware by reproducing real-life workload on
> real-life data which works fine on an existing server.
> We copied over the software used to ensure an apple-to-apple
> comparison, which is based on MariaDB 10.2.23 and uses primarily
> InnoDB tables, with a few Mroonga tables which do not seem involved in
> the problem.
>
> On the new hardware, we are seeing catastrophically bad performance,
> especially chain-deadlocks happening and being resolved but producing
> virtually no useful work at our usual parallelism level of 64 active
> connections.
>
> If we reduce parallelism to 2 active connections, queries start to
> succeed in any meaningful proportion.
>
> The database was freshly restored from a mysqldump (as we cannot
> interrupt the original database), could this have any effect on
> deadlocks ?
> The main tables involved in the deadlocking queries (job queues) are
> initially (in the test, and periodically in reality) empty, so it would
> seem surprising.
>
> What should I check to debug further ?
>
> Note that these deadlocks happen in production but they're negligible.
>
> See attachment for mariadb configuration file.
>
> Some more details:
> When we restore from mysqldump, we use the following extra options to
> speed things up:
>   innodb_flush_log_at_trx_commit = 0
>   innodb_flush_method = nosync
>   innodb_doublewrite = 0
>   sync_frm = 0
> as during that period nothing of value can be lost (worst case we
> restart the restore from scratch).
> BTW, despite these settings, we are still noticing a lot of fsyncs. Is
> this expected ? Are we missing some other option ?
>
> Also, after the import and without any connection, MariaDB was still
> producing a non-trivial amount of activity on the machine: 5% CPU, read
> <1MB/s, write 10MB/s, 60 fsync/s.
> I could not identify what is causing these, where should I look ? How
> can I tell when it will stabilise back to idle ?
> When I interrupt the benchmark workload, I see a similar resource usage.
>
> I saw https://jira.mariadb.org/browse/MDEV-18698
> And InnoDB seems to throttle its background activity: is there a way to
> tell InnoDB to perform its background tasks at maximum speed ?
> (the goal is that after the import, we can make a clean tarball that we
> extract when we want to launch the test case again)
>
> Regards,
> Julien
> _______
> Mailing list: https://launchpad.net/~maria-discuss
> Post to : maria-discuss@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~maria-discuss
> More help   : https://help.launchpad.net/ListHelp

-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] 回复: Some questions about the Aarch64 CI

2019-12-10 Thread Marko Mäkelä

Hi Kevin,

On Wed, Dec 11, 2019 at 3:45 AM Zheng Zhenyu  wrote:
>
> Hi Marko,
>
> First off, sorry for the delay of reply of your previous question, I was on 
> travel and I saw few people already replied so I didn't do it.
>
> As for your suggestion, yes, it will be no problem, I think Vlad take cares 
> of the Docker images. The resource are donated to the community, and the 
> community can do whatever is required.

Thank you. I have asked Vlad to diagnose this problem. I am not
completely aware how the Buildbot integration works and who is
responsible for what, but now I have a slightly better idea of it.

Gordan Bobic: I apologize for my brain-fart. Obviously we cannot
simply abandon less widely used platforms for stable release branches.
If we did that, GNU/Linux distributions could hit the same problems
and could be unable to update their packages for some architectures.
My personal preference would be to *additionally* have some
bleeding-edge compilers and environments running on the continuous
integration environment.
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] 回复: Some questions about the Aarch64 CI

2019-12-10 Thread Marko Mäkelä

Hi Kevin,

On Mon, Nov 25, 2019 at 8:23 AM Zheng Zhenyu  wrote:
> Thanks alot for re quick response, the results looks cool. And Also, I 
> noticed that the ARM job sometimes are slower than others in the 
> fetch_tarball phase, this might due to that our machine is in China and the 
> network connection is a little bit slow, I just got the info that our machine 
> will be available in Singapore latter this month or ealier next month, maybe 
> then we can then provide a machine with faster network which can speed up the 
> jobs.

Related to this, I wonder if it would be possible to install a newer
operating system (or Docker image), such as CentOS 8 or Debian 10 or
the most recent Fedora.

What prompts me to ask is that I just noticed a compilation failure of
MariaDB 10.2 that might be addressed by upgrading to a newer compiler:

/buildbot/aarch64-centos-7/build/storage/innobase/row/row0log.cc: In
function 'dberr_t
_ZL17row_log_apply_opsPK5trx_tP12dict_index_tP15row_merge_dup_tP16ut_stage_alter_t.isra.94(const
trx_t*, dict_index_t*, row_merge_dup_t*)':
/buildbot/aarch64-centos-7/build/storage/innobase/row/row0log.cc:3734:1:
error: could not split insn

More information is available in
https://buildbot.mariadb.org/#/builders/33/builds/260 (including a
request to submit a compiler bug report).

I remember seeing that kind of an error for some 64-bit atomic
operation on a very old GCC targeting x86 (on CentOS 5 maybe?). While
we have older compilers than GCC 4.8.5 for other instruction set
architectures, I do not think that we run into internal compiler
errors very often.

On my AMD64 desktop, I am currently using GCC 9.2.1 and clang 9.0.0.
As a developer, I prefer to have the most recent versions of tools
whenever it is possible, for better diagnostics and possibly better
optimizations.

Best regards,

Marko
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] Performance lags with MariaDB built from source

2019-12-04 Thread Marko Mäkelä

Hi Nisarg,

You mentioned that you compiled a debug build. Based on the
configuration parameters that you mention, I assume that your tables
are in the InnoDB storage engine.
InnoDB contains very many debug assertions that make it notoriously
slow. To make matters worse, by default, the debug builds are
unoptimized.

If you are only interested in debugging symbols and not debugging
instrumentation, then I suggest using
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo

In that way, you should see meaningful output in tools like 'perf
top'. The release build should be compiling with something like -O2 by
default, but it could be improved by setting -march=native
-mtune=native.

For what it is worth, for developing InnoDB, I normally use the
following build options, so that tests would run faster:

cmake -DCMAKE_{C_COMPILER=clang,CXX_COMPILER=clang++}-9 \
-DCMAKE_C{,XX}_FLAGS='-O2 -march=native -mtune=native
-Wno-unused-command-line-argument -fdebug-macro' \
-DCONC_WITH_{UNITTEST,SSL}=OFF -DWITH_EMBEDDED_SERVER=OFF
-DWITH_UNIT_TESTS=OFF \
-DCMAKE_BUILD_TYPE=Debug \
-DPLUGIN_{ARCHIVE,TOKUDB,MROONGA,OQGRAPH,ROCKSDB,CONNECT,SPIDER}=NO \
-DWITH_SAFEMALLOC=OFF -DWITH_{ZLIB,SSL}=system \
/path/to/source

For gcc, my spell is similar, except for the first two lines, which
are replaced by the following:

cmake -DCMAKE_C{,XX}_FLAGS='-O2 -m'{arch,tune}=native \

Above, I am disabling most other components than InnoDB, because I am
working on InnoDB internals. It is important to set
WITH_SAFEMALLOC=OFF if you are using WITH_ASAN=ON, because it would
only add unnecessary overhead and could hide some errors from ASAN. If
you do not know what ASAN is, I suggest visiting
https://github.com/google/sanitizers/wiki/.

Best regards,

Marko

On Wed, Dec 4, 2019 at 11:21 AM Nisarg Shah  wrote:
>
> System config:
> OS: Ubuntu 18.04
> CPU: 2x Intel Xeon E5-2660 (40 logical cores)
> RAM: 160GB
>
> I built a debug build of MariaDB server 10.5 from the Github source
> and tried to load and execute the TPC-H benchmarks on it with a
> database size of 1GB. While loading the data and executing queries, I
> observed that MariaDB takes an unusually large amount of time (more
> than 10x compared to other databases like QuickStep or MonetDB) to
> load the data and execute queries (more than 10 minutes per query,
> compared to less than a minute for other databases). I'm running the
> server with following options -
>
> ./bin/mysqld --user=nisargs
> --datadir=/fastdisk/mariadb-server-install-2/data/
> --innodb-read-io-threads=40 --innodb-write-io-threads=40
> --innodb-buffer-pool-size=16G --innodb-buffer-pool-instances=40
>
> I just wanted to get some eyes on this and check if there is something
> wrong with the command line options (or whether I can tweak some
> options to improve performance). I'm just trying to conduct some
> benchmarking experiments and not very familiar with databases.
>
> Thanks
> Nisarg
>
> ___
> Mailing list: https://launchpad.net/~maria-discuss
> Post to : maria-discuss@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~maria-discuss
> More help   : https://help.launchpad.net/ListHelp



-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] Some questions about the Aarch64 CI

2019-11-29 Thread Marko Mäkelä

On Thu, Nov 28, 2019 at 3:24 PM Gordan Bobic  wrote:
> For what it's worth, unaligned access does come with a performance penalty, 
> typically somewhere in the 1-10% range on x86, depending on the generation of 
> chip used. It has been _mostly_ mitigated on recent x86 chips, and IIRC 
> Intel's C compiler does have an option to align all structs and arrays to a 
> 16 byte boundary.

Yes, there is overhead, and there are some unfortunate design choices
(or problems) with the InnoDB page format. Luckily, most page header
and footer fields are reasonably aligned.

> I would be very interested to see some tests data on unalighed access cost on 
> various aarch64 chips. On various 32-bit ARM chips (including those >= ARMv6) 
> the unaligned access performance hit was quite dramatic.

I wonder if the unaligned access could ever end up costing more than
the instruction decoding overhead for implementing multi-byte access
via single-byte operations. (In the past, when unaligned access could
have been supported by an interrupt to the operating system, like
Digital UNIX on the Alpha, I could easily believe it. But, now we are
talking about hardware-supported unaligned access.)

Marko
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] Some questions about the Aarch64 CI

2019-11-27 Thread Marko Mäkelä

Hi Daniel,

You seem to be right that the compilers are already mostly doing the
right thing. Here is a notable exception where GCC lags behind clang
(unnecessary use of stack):
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89804

I created another test program, checking how mach_read_from_4() gets
compiled. It turns out that on Aarch64 and POWER, unaligned reads are
being used by default:
https://godbolt.org/z/ZcavM4

For 32-bit ARM, -march=armv6 seems to enable unaligned reads. For
RISC-V and WebAssembly, the code is rather ugly. :-)

So, indeed, there does not appear to be much to micro-optimize here.

Marko
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] Some questions about the Aarch64 CI

2019-11-26 Thread Marko Mäkelä

Hi Daniel,

On Tue, Nov 26, 2019 at 2:02 AM Daniel Black  wrote:
>
> On Mon, 25 Nov 2019 11:32:07 +0200
> Marko Mäkelä  wrote:
>
>
> > I also found a claim that POWER8 supports unaligned access,
>
> This is correct (for the normal cacheable memory (i.e. not device IO mapped - 
> so not applicable to mariadb))
>
> > and I seem
> > to remember that the latest version of the SPARC introduced support
> > for that as well. (IA-32 and AMD64 have always supported unaligned
> > access, except for some SIMD operations.)
> >
> > Last, I believe that we could get some performance benefits if
> > include/byte_order_generic.h was rewritten in a suitable way. Ideally,
> > include/byte_order_generic_x86_64.h would be replaced with a portable
> > version of both, and compilers could simply perform the optimizations.
> > I have been told that replacing the + in the macros with | could
> > already be a good start. I would welcome patches in this area.
>
> I've never managed to get the time to look at these however a non-aligned 
> version for non-common arches seems a better way to model this.

I pushed my micro-optimization to 10.5:
https://github.com/MariaDB/server/commit/25e2a556de2e125784d52a0c7ccda4fa6595df50

If there really is no compiler flag that would allow any memcpy(),
memset(), memcmp() of 2,4,8 bytes to be translated into simple
(possibly unaligned) multi-byte instructions, then we might add
further MY_ASSUME_ALIGNED() assertions here and there, to allow gcc
and clang to generate better code for POWER and ARM.

If the compiler is smart enough, it might suffice to implement an
accessor for buf_block_t or buf_block_t::frame that would
MY_ASSUME_ALIGNED(frame, 4096). Then the compiler might correctly
infer the alignment of (block->frame + some_compile_time_constant) and
enable the optimization. I would be unwilling to pepper such hints all
over the code.

Marko
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] Some questions about the Aarch64 CI

2019-11-25 Thread Marko Mäkelä

Hello Kevin,

It was my pleasure to meet you in Shanghai.

On my flight back, I worked on a micro-optimization, trying to make
sure that native loads or stores are being used instead of memcpy(),
memset(), memcmp(), when the data is known to be aligned. I filed a
ticket for it:
https://jira.mariadb.org/browse/MDEV-21133 Optimize access to InnoDB
page header fields

My colleague Eugene Kosov pointed out that such loads or stores are
undefined behaviour (and cmake -DWITH_UBSAN=ON would likely agree).
But, he showed that wrapping the arguments of  functions
with __builtin_assume_aligned() actually works:
https://godbolt.org/z/jCF_6q
Eugene also pointed out to some related work:
http://open-std.org/JTC1/SC22/WG21/docs/papers/2019/p1774r1.pdf

I found a claim that Aarch64 does support unaligned access in practice:
https://stackoverflow.com/questions/38535738/does-aarch64-support-unaligned-access
Can you provide a more authoritative answer? Is there some flag that
should be passed to gcc or clang to enable it to generate simpler
code?
I also found a claim that POWER8 supports unaligned access, and I seem
to remember that the latest version of the SPARC introduced support
for that as well. (IA-32 and AMD64 have always supported unaligned
access, except for some SIMD operations.)

Last, I believe that we could get some performance benefits if
include/byte_order_generic.h was rewritten in a suitable way. Ideally,
include/byte_order_generic_x86_64.h would be replaced with a portable
version of both, and compilers could simply perform the optimizations.
I have been told that replacing the + in the macros with | could
already be a good start. I would welcome patches in this area.

Related note: Maybe a year ago, I was positively surprised to learn
that the InnoDB monster function mach_read_from_4() is being
translated into a single 80486 BSWAP instruction, or an AMD64 MOVBE
instruction.

With best regards,

Marko

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] Cannot open datafile for read-only: './dbmail/#sql2-704-271.ibd' OS error: 71

2019-09-16 Thread Marko Mäkelä

On Mon, Sep 16, 2019 at 5:48 PM Reindl Harald  wrote:
> Am 16.09.19 um 16:43 schrieb Marko Mäkelä:
> > Ultimately, for new DDL statements, this should be fixed when
> > MDEV-17567 implements Atomic DDL. I do not think that we can even then
> > safely remove #sql2 tables from old installations, because we must
> > think of upgrade scenarios.
>
> that all don't justify why the drop table simply can't remove the
> reference to something that literally don#t exist for a whole decade
>
> either that idiotic table is known or it's unknown

Like I wrote, we have to think of upgrade scenarios.

Specifically, we must keep in mind that before MDEV-14717 (in MariaDB
Server 10.3 and 10.2.19+), the InnoDB-internal RENAME operations were
not crash-safe at all. So, even if the InnoDB data dictionary thinks
that the table name should start with #sql2, it is perfectly possible
that the data file is known to the file system as something else. The
file is not necessarily entirely missing. And we cannot afford to read
the first page of each data file at startup, to find such orphan
files. That is simply too slow. In fact, I removed such code (which I
believed to be unnecessary) in 10.2.24 and 10.3.15: MDEV-18733 MariaDB
slow start after crash recovery

> if it's unknow get rid of it or at least allow the admin to do so with
> doing some crazy dance introducing way more room for troubles than just
> forget about something which isn't there anyways

You are welcome to submit a well designed, implemented and tested
patch to address this.
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] Cannot open datafile for read-only: './dbmail/#sql2-704-271.ibd' OS error: 71

2019-09-16 Thread Marko Mäkelä

On Mon, Sep 16, 2019 at 5:26 PM Reindl Harald  wrote:
> Am 16.09.19 um 15:58 schrieb Sergei Golubchik:
> > It looks like the only way to get rid of this warning would be to drop
> > this table manually. Like, `create table t1 ... engine=innodb`
> > then copy t1.idb and t1.frm to #sql2-704-271.* and then drop t1 and
> > `#mysql50#sql2-704-271` tables.

Actually, you should not copy the .ibd file, because that could
trigger an assertion failure in InnoDB. (Heikki's opinion was that the
best thing to do when encountering corruption is to take down the
whole server. I disagree, but it would be a lot of work to fix all the
code.)

InnoDB should allow DROP TABLE just fine even if the .ibd file does not exist.

> it's a shame that you simply can't get rid of garbage from the global
> tablespace

Starting with MDEV-14585 InnoDB actually does drop #sql- tables during
startup. The #sql2 tables are intentionally preserved, because during
ALTER TABLE…ALGORIHTM=COPY there are multiple internal commits, and if
the server is killed at the right moment, then the user table will be
known only by #sql- and #sql2 names. We do not want to remove the only
copy of the table.

Ultimately, for new DDL statements, this should be fixed when
MDEV-17567 implements Atomic DDL. I do not think that we can even then
safely remove #sql2 tables from old installations, because we must
think of upgrade scenarios.

Best regards,

Marko

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] fsync alternative

2019-09-12 Thread Marko Mäkelä

Linux distributions; with luck, something will happen soon and be
backported to the kernels in stable or long-term-support
distributions.

With best regards,

Marko
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] 10.3.x: unknown variable 'innodb_support_xa=1'

2019-09-02 Thread Marko Mäkelä

On Mon, Sep 2, 2019 at 8:48 PM Reindl Harald  wrote:
> and yes, i assumed that a core developer other than you would look at
> that startup log, came to the conslusion "indeed, no warning there"
> and while read the few lines "indeed, the latest 10.2 version"

You are right that a warning should be issued for deprecated variables
if any value was specified in the configuration file or by a command
line option.

Unfortunately, I am not aware how a storage engine could distinguish
the compile-time default value from a user-specified identical value.
That is why InnoDB did not issue the deprecation warning for you.
Maybe you can submit a fix that would address this deficiency?

Best regards,

Marko
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] 10.3.x: unknown variable 'innodb_support_xa=1'

2019-09-02 Thread Marko Mäkelä

Hi Faustin,

On Mon, Sep 2, 2019 at 7:05 PM Faustin Lammler  wrote:
[snip]
> Anyway, there maybe some useful information here:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=920608
>
> Marko, I believe we can expect these kinds of deprecation with people
> upgrading from Debian Stretch to Buster.

Yes, it is possible. That Debian bug was special in that the options
used to be part of the Debian-shipped configuration file. The user who
reported the bug had edited the configuration file that was shipped
with the older version, so the Debian upgrade would notice a conflict
and ask the user what to do. Apparently the user chose to preserve the
original configuration (with the modifications), and as a result,
MariaDB Server 10.3 would fail to start up. I addressed this by adding
back these as string parameters, and issuing a deprecation warning
whenever any value is specified.

Marko
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] 10.3.x: unknown variable 'innodb_support_xa=1'

2019-09-02 Thread Marko Mäkelä

On Mon, Sep 2, 2019 at 5:43 PM Reindl Harald  wrote:
> unknown variable 'innodb_support_xa=1'
>
> would you funny guys consider things like deperectaion warnings in
> previous releases

A deprecation warning was added in the MariaDB Server 10.2.2, well
before it was Generally Available.
That said, maybe there is a nontrivial amount of users who skipped the
10.2 release and upgraded straight from 10.1 to 10.3.

> or at least offer an option to change all these
> fucking "i don't know a option because it was removed or you did not
> compile something in and hence i refuse to start the server" to warnings?

That we do if you specify the loose_ prefix to the options.

With best regards,

Marko Mäkelä
Lead Developer InnoDB
MariaDB Corporation

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] Issues cross-compiling MariaDB for ARM

2019-06-28 Thread Marko Mäkelä

Hamish,

The error comes from the Connect storage engine, which is a virtual
storage engine that connects to other servers. If you do not need
that, you can disable it. Similar storage engines are Federated,
FederatedX, Spider.

cmake -DPLUGIN_CONNECT=NO

(which was part of my initial reply).

Best regards,
Marko

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] Issues cross-compiling MariaDB for ARM

2019-06-28 Thread Marko Mäkelä

I think that the earliest GCC that supports enough of C++11 is 4.8.
So, you should be out of luck compiling 10.4 with GCC 4.7.

You might also try clang, if that is available for your platform.

Marko

On Fri, Jun 28, 2019 at 5:23 PM Hamish MB  wrote:
>
> Hi,
>
> Maybe I should try 10.3 then - the newest cross compiler I can build for
> this platform is GCC 4.7, which has only experimental support for C++11.
> I could also try the bundled Zlib, seeing as I'm trying to use the
> system one at the moment. It gets fairly far, so I imagine it should be
> possible.
>
> Hamish
>
> On 28/06/2019 15:19, Marko Mäkelä wrote:
> > Hi Harnish,
> >
> > It can be WITH_SSL=bundled or WITH_SSL=system. Possibly the bundled
> > zlib is missing some recent change.
> > For WITH_SSL=system to work, you should have installed a package like
> > zlib1g-dev.
> >
> > I think that there are good chances that MariaDB 10.4 can be compiled
> > for ARMv5 (or at least ARMv7), because 10.3 is available in the Debian
> > GNU/Linux repository:
> > https://packages.debian.org/buster/mariadb-server-core-10.3
> >
> > A main difference between 10.3 and 10.4 is that 10.4 requires a
> > compiler that supports C++11.
> >
> > Marko
> >
> > On Fri, Jun 28, 2019 at 5:06 PM Hamish MB  wrote:
> >> Hi Marko,
> >>
> >> I found that option, but eventually ran into a different issue. It turns
> >> out that zlib isn't being found by the linker, even if I explicitly set
> >> the path. Are there any examples of working toolchain.cmake files
> >> you/anyone else knows of?
> >>
> >> I think it's mostly because I'm misconfiguring that file somehow, but
> >> I'm not sure how.
> >>
> >> Hamish
> >>
> >>
> >> On 28/06/2019 15:04, Marko Mäkelä wrote:
> >>> Hi Harnish,
> >>>
> >>> Yes, you can disable storage engines that you do not need, like this:
> >>> cmake -DPLUGIN_MROONGA=NO
> >>>
> >>> This syntax works starting with MariaDB 10.1.
> >>>
> >>> For example, I am developing InnoDB, and to cut the time for
> >>> compilation and running tests, I disable a bunch of storage engines:
> >>>
> >>> cmake -DPLUGIN_{ARCHIVE,TOKUDB,MROONGA,OQGRAPH,ROCKSDB,CONNECT,SPIDER}=NO
> >>>
> >>> Side note: I would not be surprised if you ran into problems with
> >>> TokuDB. I hope that we can finally remove it in MariaDB 10.5:
> >>> https://jira.mariadb.org/browse/MDEV-19780
> >>>
> >>> With best regards,
> >>>
> >>> Marko Mäkelä
> >>> Lead Developer InnoDB
> >>> MariaDB Corporation
> >> ___
> >> Mailing list: https://launchpad.net/~maria-discuss
> >> Post to : maria-discuss@lists.launchpad.net
> >> Unsubscribe : https://launchpad.net/~maria-discuss
> >> More help   : https://help.launchpad.net/ListHelp
> >
> >
> ___
> Mailing list: https://launchpad.net/~maria-discuss
> Post to : maria-discuss@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~maria-discuss
> More help   : https://help.launchpad.net/ListHelp



-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] Issues cross-compiling MariaDB for ARM

2019-06-28 Thread Marko Mäkelä

Hi Harnish,

It can be WITH_SSL=bundled or WITH_SSL=system. Possibly the bundled
zlib is missing some recent change.
For WITH_SSL=system to work, you should have installed a package like
zlib1g-dev.

I think that there are good chances that MariaDB 10.4 can be compiled
for ARMv5 (or at least ARMv7), because 10.3 is available in the Debian
GNU/Linux repository:
https://packages.debian.org/buster/mariadb-server-core-10.3

A main difference between 10.3 and 10.4 is that 10.4 requires a
compiler that supports C++11.

Marko

On Fri, Jun 28, 2019 at 5:06 PM Hamish MB  wrote:
>
> Hi Marko,
>
> I found that option, but eventually ran into a different issue. It turns
> out that zlib isn't being found by the linker, even if I explicitly set
> the path. Are there any examples of working toolchain.cmake files
> you/anyone else knows of?
>
> I think it's mostly because I'm misconfiguring that file somehow, but
> I'm not sure how.
>
> Hamish
>
>
> On 28/06/2019 15:04, Marko Mäkelä wrote:
> > Hi Harnish,
> >
> > Yes, you can disable storage engines that you do not need, like this:
> > cmake -DPLUGIN_MROONGA=NO
> >
> > This syntax works starting with MariaDB 10.1.
> >
> > For example, I am developing InnoDB, and to cut the time for
> > compilation and running tests, I disable a bunch of storage engines:
> >
> > cmake -DPLUGIN_{ARCHIVE,TOKUDB,MROONGA,OQGRAPH,ROCKSDB,CONNECT,SPIDER}=NO
> >
> > Side note: I would not be surprised if you ran into problems with
> > TokuDB. I hope that we can finally remove it in MariaDB 10.5:
> > https://jira.mariadb.org/browse/MDEV-19780
> >
> > With best regards,
> >
> > Marko Mäkelä
> > Lead Developer InnoDB
> > MariaDB Corporation
> ___
> Mailing list: https://launchpad.net/~maria-discuss
> Post to : maria-discuss@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~maria-discuss
> More help   : https://help.launchpad.net/ListHelp



-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] Issues cross-compiling MariaDB for ARM

2019-06-28 Thread Marko Mäkelä

Hi Harnish,

Yes, you can disable storage engines that you do not need, like this:
cmake -DPLUGIN_MROONGA=NO

This syntax works starting with MariaDB 10.1.

For example, I am developing InnoDB, and to cut the time for
compilation and running tests, I disable a bunch of storage engines:

cmake -DPLUGIN_{ARCHIVE,TOKUDB,MROONGA,OQGRAPH,ROCKSDB,CONNECT,SPIDER}=NO

Side note: I would not be surprised if you ran into problems with
TokuDB. I hope that we can finally remove it in MariaDB 10.5:
https://jira.mariadb.org/browse/MDEV-19780

With best regards,

Marko Mäkelä
Lead Developer InnoDB
MariaDB Corporation

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] MariaDB-10.2.11 Long Time To Start

2019-03-15 Thread Marko Mäkelä

Hi Michael,
> Hope it is okay to resurrect this thread.  Restart times where great with 
> 10.2.14, until I recently enabled FTS on thousands of tables.  Shutdown and 
> startup both take over an hour.  This, coupled with MariaDB eating up far 
> greater sums of allocated memory in my.cnf, I need to restart every few few 
> days or risk hitting the OOM Killer.
>
> Any ideas?

I am sorry, but there have been several problems with InnoDB fulltext
indexes. My personal opinion is that the fulltext search in InnoDB was
badly designed and is lacking essential features. These observations
still apply to it:
https://mariadb.com/resources/blog/initial-impressions-innodb-fulltext
I hope that some day, MariaDB will gain a redesigned fulltext search
that works with any storage engine.

Some synchronization problems or hangs have been fixed since 10.2.14,
but I am not convinced that the code is free of race conditions or
hangs. Like Daniel Black pointed out, stack traces of all threads
would be very helpful.

MDEV-16557 (10.2.17) might help on shutdown, because it would allow
some fulltext search operations to be interrupted inside InnoDB. (All
currently running queries are supposed to be aborted on shutdown.)

> 2019-03-07 20:20:35 140185550460672 [ERROR] InnoDB: (Duplicate key) writing 
> word node to FTS auxiliary index table.
This basically reports a bug in InnoDB fulltext indexing. We could
find this one by running a server (in a testing environment) inside
gdb, and setting a breakpoint to where this message is reported.

Because of this bug, I suppose that your fulltext searches can miss
some results that should have matched the search pattern. (Please file
this bug separately with some more details, preferably with a
repeatable test case.)

While searching our bug database for the above message, I noticed that
you had filed MDEV-18867 for these slow start/shutdown issues. I would
suggest to continue the diagnosis there.

> Startup (close to 1 hour):
[snip]
> 2019-03-07 20:38:38 140573848283328 [Note] InnoDB: Highest supported file 
> format is Barracuda.
> 2019-03-07 21:31:01 140573848283328 [Note] InnoDB: 128 out of 128 rollback 
> segments are active.

According to the gdb-startup.txt that you attached to MDEV-18867 the
problem is fts_check_and_drop_orphaned_tables(), which we can remove
once MDEV-18518 implements atomic multi-table transactions in InnoDB,
possibly in MariaDB 10.5.

When it comes to the slow shutdown, I would recommend to try with the
newest MariaDB 10.2 release, and report details if the problem remains
with that.

One more source of "slow start" remains in the case when some redo log
was applied in recovery (which is not the case in the log that you
posted). In this case, InnoDB is reading every data file, to validate
that all the .ibd files have correct names and tablespace IDs. I
believe that this check is redundant and can be removed, now that
MariaDB 10.2.19+ (and 10.3 or 10.4) have crash-safe RENAME operations
inside InnoDB. I have filed that as
https://jira.mariadb.org/browse/MDEV-18733 MariaDB slow start after
crash recovery

With best regards,

Marko

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] MariaDB 10.3 and largepages on Linux

2018-10-11 Thread Marko Mäkelä

On Tuesday, September 25, 2018 at 11:48 Kenneth Penza (kpe...@gmail.com) wrote:
> Startup messages:
>
> Sep 25 10:40:53 srv1 mysqld: 2018-09-25 10:40:52 0 [Note] InnoDB: Opened 8 
> undo tablespaces
> Sep 25 10:40:53 srv1 mysqld: 2018-09-25 10:40:53 0 [Warning] InnoDB: Failed 
> to set memory to DODUMP: Invalid argument ptr 0x2aaac540 size 2097152
> Sep 25 10:40:53 srv1 mysqld: 2018-09-25 10:40:53 0 [Note] InnoDB: 128 out of 
> 128 rollback segments in 8 undo tablespaces are active.
>
> Shutdown messages:
>
> Sep 25 10:41:18 srv1 mysqld: 2018-09-25 10:41:18 0 [Note] InnoDB: Buffer 
> pool(s) dump completed at 180925 10:41:18
> Sep 25 10:41:19 srv1 mysqld: 2018-09-25 10:41:19 0 [Warning] InnoDB: Failed 
> to set memory to DODUMP: Invalid argument ptr 0x2aaac340 size 33554432
> Sep 25 10:41:19 srv1 mysqld: 2018-09-25 10:41:19 0 [Note] InnoDB: Shutdown 
> completed; log sequence number 1522725; transaction id 22
>
> Do you have ideas on what is causing "InnoDB: Failed to set memory to DODUMP: 
> Invalid argument ptr .. " warning?

Thanks to your report, Daniel Black (the author of a contribution that
excludes large InnoDB buffers from core dumps) found out that it is a
bug in the Linux kernel. His fix will be in the 4.19 kernel, and it
was also included in a backport queue for the maintenance updates of
older kernels:

https://lwn.net/Articles/767784/
https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/commit/?id=c386d043e8e7718b218731b2868453335245ceeb

For future reference, I have updated
https://jira.mariadb.org/browse/MDEV-10814 with this information.
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] MariaDB 10.3 and largepages on Linux

2018-09-25 Thread Marko Mäkelä

Hi Kenneth,

> Do you have ideas on what is causing "InnoDB: Failed to set memory to DODUMP: 
> Invalid argument ptr .. " warning?

The message that you are seeing is due to
MDEV-10814: Innodb large allocations - madvise - Don't dump
https://jira.mariadb.org/browse/MDEV-10814

I have informed the contributor about this.

Marko
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] How does InnoDB delete rows?

2018-09-13 Thread Marko Mäkelä

Hi Arlindo,

2018-07-27 22:06 GMT+03:00 Arlindo Neto :
> Hi everyone.
>
> I have some questions regarding InnoDB deletion process.
>
> As far as I know, when we issue a DELETE command to a row stored in a InnoDB
> table, the row contents are stored in the UNDO Log, and the contents of the
> row are only deleted when the last transaction after the DELETE transaction
> has finished.

That is true. History will be preserved in undo logs and index trees,
so that any active read views associated with transactions that
started before the transaction (which made changes) committed, will
have a consistent read view, as if the DELETE did not take place. So,
DELETE is a special form of UPDATE, which will only update the
delete-mark flag in the index records.

The oldest active transaction read view is also called the "purge
view". Anything older than that may be purged, but purge might also be
lagging a little bit behind.

The only case when records can actually be deleted outside purge is
transaction rollback. A "fresh insert" can be rolled back by
immediately deleting the index records. But, an insert may also be
performed by updating a delete-marked purgeable record. In that case,
the rollback will have to do a purge-like check: delete the record if
it is not visible in any purge view. (This is a bit tricky for
secondary indexes, because secondary index records do not contain a
per-record transaction identifier.)

I explained some of this in the "Deep Dive" talk at M18:
https://vimeo.com/258533150
https://docs.google.com/presentation/d/1bP3yh57B58yfGDd-34TZ5MeiacIdaseaTLpd_Ov5NI8/

> But what actually happens to the row that was deleted? Is any> disk space 
> ever released to the OS, or are the contents simply erased?

InnoDB keeps a PAGE_FREE stack of purged records. An insert would
allocate space from the top of that stack unless deleted record was
smaller than the one that is about to be inserted. It would fall back
to allocating new space from the "page heap". If the page would get
too full, and if the record would fit after removing the garbage from
the page, then there would be a btr_page_reorganize() operation that
would rebuild the page, followed by an insert. If the insert does not
fit, then the page would be split.

Page merges will typically be done lazily. MySQL 5.7 (and MariaDB
10.2) introduced the table comment keyword MERGE_THRESHOLD for
controlling it.

> Defragmenting a table is only done through OPTIMIZE TABLE, or are there any 
> automatic threads responsible for rearranging the rows?

In MariaDB 10.1 and later, there is an option. After
innodb_defragment=ON, OPTIMIZE TABLE would not rebuild the table, but
instead invoke a defragmentation operation. This would not shrink the
data file, however.
By default, OPTIMIZE TABLE does rebuild the table. It is supported as
an online operation, but the log of concurrent DML operations would
increase the storage requirements

> And if the are no such threads, how do we maintain tables which suffer from 
> many deletions?

Generally, it could be good to avoid mass deletions with InnoDB when
possible. Alternatives could be to partition the table in a suitable
way if possible, and then use DROP PARTITION for the mass deletions.
Or, you could copy the "surviving" records to a new table and then
drop the old one.

I hope that this helps.

Best regards,

Marko
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] Is disabling doublewrite safe on ZFS?

2018-08-21 Thread Marko Mäkelä

2018-08-21 11:16 GMT+03:00 Gionatan Danti :
> On 20/08/2018 21:40, Reinis Rozitis wrote:
>> Those atomic write hardware bits the same as ZFS [1] just ensure that any
>> write either lands on the metal (which also tend to lie - write caches on
>> drives/controllers, silent errors, bit rot etc) or not. If the application
>> itself issues a 4KB write when it actually needed to write 8KB.
>
> What puzzles me it that ext4 with data=journal *should* guarantee atomic
> writes also, but disabling doublewrites led to InnoDB corruption in case of
> mysqld crash/kill

I believe that the Linux kernel can interrupt any write at 4096-byte
boundaries when a signal is delivered to the process.
I am curious: Where was it claimed that data=journal guarantees atomic
writes (other than [1])?
I would expect it to only guarantee that anything that was written to
the journal will be durable.
Whether the actual write request was honored in full is a separate matter.

>> It has been covered in some recent Percona blog comments [2] (by some
>> authorities in mysql world)
>>
>> [2]
>> https://www.percona.com/blog/2017/12/07/hands-look-zfs-with-mysql/#comment-10968768
>
>
> Thanks for the link. The problem I have with such blog is that, in the past,
> it gave incorrect information on doublewrite safety guarantees [1]
> Even in the comments section of the link you posted there are users warning
> against disabling checksum (which is another can of worms, for sure).
>
> Well, it seems I just need to do some in-house testing... :p
> I'll report back any interesting findings.

> [1]
> https://www.percona.com/blog/2015/06/17/update-on-the-innodb-double-write-buffer-and-ext4-transactions/

Please report back any findings, whether or not you consider them to
be interesting.

I believe that it is technically possible for a copy-on-write
filesystem like ZFS
to support atomic writes, but for that to be possible in practice, the
interfaces
inside the kernel must be implemented in an appropriate way.
Disclaimer: I have no knowledge of the implementation details of any kernel.
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] Upgrade paths through more major versions

2018-06-06 Thread Marko Mäkelä

2018-06-06 15:20 GMT+03:00 Guillaume Lefranc :
> Hi Michal,
>
> Le mer. 6 juin 2018 à 12:02, Michal Schorm  a écrit :
>> I see a raise of users (of RH products - RHEL and RHSCL mainly) that would
>> like to jump several versions.
>> They had an application build on, let's say, 5.5 (RHEL7 default) and would
>> like to get the features from 10.2 (available in RH software collections -
>> we provide plenty of versions this way).
>> So far, they have to upgrade one by one 5.5->10.0->10.1->10.2; and solve
>> the conflicts at each stage.
>
> That's a waste of time. Such an upgrade is perfectly possible through the
> means of mysqldump and reload. In-place upgrades might cause some issues
> because of innodb version changes, but for example, from 10.0 to 10.1, they
> are rarely an issue because the version of InnoDB hasn't changed. The
> recommended upgrade path from any version is always the same e.g. dump and
> reload.

The InnoDB redo log format changed in an incompatible way between
MariaDB 10.1 and 10.2, reflecting the change between MySQL 5.6 and
5.7. Due to this change, you cannot kill the old database server and
upgrade to a newer one. The new server will refuse to start up on old
log file if recovery would be needed.

(While you would probably not normally kill the database, there have
been some shutdown hang bugs, which could cause an upgrade script to
kill the server.)

For MariaDB 10.3, the redo and undo log formats were changed again,
but we do test in-place upgrades. From MariaDB 10.2 to 10.3, even
crash-upgrades have been tested. But I would not guarantee the ability
of future versions to upgrade after a crash, because we have plans to
change the InnoDB redo log format. We can check that the old log is in
a clean state, but I would not want to keep all the code for parsing
and applying old logs.
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] Why MyISAM as default engine for system tables ?

2018-06-06 Thread Marko Mäkelä

2018-06-06 18:48 GMT+03:00 Reindl Harald :
> 2018-06-06 17:45:27 140034181385856 [ERROR] InnoDB: Could not find a
> valid tablespace file for `dbmail/#sql2-704-271`. Please refer to
> http://dev.mysql.com/doc/refman/5.7/en/innodb-troubleshooting-datadict.html
> for how to resolve the issue.

MariaDB 10.3 would drop #sql tables inside InnoDB at startup, so you
should not need to do that manually.
That was fixed in MDEV-14585. I felt that to do this safely, we also
need MDEV-14717 (making RENAME TABLE crash-safe inside InnoDB), which
required an undo log format change. That one was not possible in a GA
version, so I did both in 10.3.

I don't think that having system tablespaces in InnoDB would be a good
option. Some users might run with MyRocks instead of InnoDB, for
example.
The Aria engine should always be there, because it can be used for
temporary storage during query execution.
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] Why MyISAM as default engine for system tables ?

2018-06-06 Thread Marko Mäkelä

2018-06-06 17:21 GMT+03:00 Roberto Spadim :
>
> Last time i had a innodb problem i changed all innodb to one file per table

Unfortunately, there still is the single point of failure for InnoDB:
the system tablespace. It contains the InnoDB data dictionary and the
state of the transaction system. If it becomes corrupted, you can be
really out of luck. You can try to import your .ibd files to an empty
server instance and hope for the best.

I hope that some day we will be able to eliminate it in MariaDB, and
have a transactional, crash-safe data dictionary implemented above the
storage engine layer. At the minimum, create/drop/rename operations of
.frm files would be logged like transactions.

I have written down some plans regarding this:
https://jira.mariadb.org/browse/MDEV-11633
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] ALTER TABLE ENIGNE=InnoDB performance

2018-01-17 Thread Marko Mäkelä

Hi Conor,

Did you try the following?
ALTER TABLE t1 EXCHANGE PARTITION p1 WITH TABLE t2;

It might allow you to work around the problem. You'd first need to
convert the partitions to tables, then export and import the .ibd
files, and finally exchange with partitions again.

Disclaimer: I only know InnoDB, not partitioning. I hope that this
will actually only rename the tables inside InnoDB. (In InnoDB, each
partition is treated as a separate table.)

Marko
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

DON’T MISS

M|18

MariaDB User Conference

February 26 - 27, 2018

New York City

https://m18.mariadb.com/

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] ALTER TABLE ENIGNE=InnoDB performance

2018-01-17 Thread Marko Mäkelä

Hi Conor,

There is a bug report for the error messages now:

https://jira.mariadb.org/browse/MDEV-14941
MDEV-14941 Timeouts on persistent statistics tables after upgrade

On Sat, Jan 13, 2018 at 1:30 AM, Conor Murphy
 wrote:
> However, the whole exercise has reached a dead-end as we need to be able to
> replicate the database and as far as I can tell we need "ALTER TABLE t
> DISCARD PARTITION ALL TABLESPACE" which isn't available.

MariaDB is indeed missing the ALTER TABLE t DISCARD|IMPORT
PARTITION…TABLESPACE syntax.
MariaDB is also missing the "MySQL 5.7 native InnoDB partitioning"
which probably means that their code change cannot be easily ported.

The following commit in MySQL 5.7 added the the syntax:

commit 9c4d999381bb8bfcc48921c0ece378f877e47bb3
Author: Mattias Jonsson 
Date:   Thu Feb 6 14:02:06 2014 +0100

WL#6868: Support transportable tablespaces for single innodb partition.

(Based on WL#6867.)

Made alter_info.partition_names a List instead of List,
so it is the same as table_list.partition_names.

Added syntax support for:
ALTER TABLE t DISCARD PARTITION  TABLESPACE
ALTER TABLE t IMPORT PARTITION  TABLESPACE
which will only DISCARD/IMPORT the listed partitions.

Note that FLUSH FOR EXPORT still works on table level!

The following is the merge commit to MySQL 5.7 (which was called
mysql-trunk at that time).

commit 6be6f3e3820183f315c153dafa12465e6832851d
Merge: d168602867c e7fc4dce3ef
Author: Mattias Jonsson 
Date:   Wed Feb 26 10:32:57 2014 +0100

WL#6867: Support transportable tablespaces for partitioned innodb tables.

Merge into mysql-trunk.

I would not be keen to update the DISCARD/IMPORT TABLESPACE in
MariaDB, unless a customer is asking for it. I would rather have
faster ALTER TABLE…ALGORITHM=COPY (MDEV-11415) and bulk inserts
(MDEV-515) and more easily transportable InnoDB files (just FLUSH
TABLES…FOR EXPORT, copy the files, and let the server discover them,
similar to how it works with MyISAM). The DISCARD/IMPORT feels like a
crude hack to me. DISCARD is breaking referential integrity of the
data dictionary, and the adjustments during IMPORT (which could be
avoided with some file format changes) are neither atomic nor
crash-safe.
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

DON’T MISS

M|18

MariaDB User Conference

February 26 - 27, 2018

New York City

https://m18.mariadb.com/

___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] MariaDB-10.2.11 Long Time To Start

2018-01-16 Thread Marko Mäkelä

Hi Mike,

I will use MDEV-13869 MariaDB slow start
 for tracking this fix. The fix
is not as trivial as I thought; I got some of my corruption-injection tests
failing.
Furthermore, I think that in MariaDB 10.2+, we could remove the extended
checks also when crash recovery is needed, to avoid unnecessarily long
downtime. That one is the already mentioned MDEV-9843
 InnoDB hangs on startup…
. Related to that, there is
also MDEV-14481
Execute InnoDB crash recovery in the background
 which we could have in MariaDB
10.4 or maybe even earlier.

Best regards,

Marko
___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] MariaDB-10.2.11 Long Time To Start

2018-01-16 Thread Marko Mäkelä

Hi Mike,

Sorry, I was busy with other development tasks.

In the error log that you attached, the first startup upgrades the redo log
from the pre-MySQL 5.7.9/MariaDB 10.2.2 format:

> 2018-01-08  5:39:50 140390495905984 [Note] InnoDB: Upgrading redo log:
> 2*536870912 bytes; LSN=1395657325222
> 2018-01-08  5:39:50 140390495905984 [Note] InnoDB: Upgrading redo log:
> 2*536870912 bytes; LSN=1395657325222
> 2018-01-08  5:39:50 140390495905984 [Note] InnoDB: Starting to delete and
> rewrite log files.
> 2018-01-08  5:39:50 140390495905984 [Note] InnoDB: Setting log file
> ./ib_logfile101 size to 536870912 bytes
> 2018-01-08  5:39:50 140390495905984 [Note] InnoDB: Setting log file
> ./ib_logfile1 size to 536870912 bytes
> 2018-01-08  5:39:50 140390495905984 [Note] InnoDB: Renaming log file
> ./ib_logfile101 to ./ib_logfile0
> 2018-01-08  5:39:50 140390495905984 [Note] InnoDB: New log files created,
> LSN=1395657325222
> 2018-01-08  6:01:49 140390495905984 [Note] InnoDB: 128 out of 128 rollback
> segments are active.


This should not count as a crash recovery, because we would refuse startup
unless the redo log was logically empty (crash-upgrade). So, something is
wrong already here. There is one more strange thing from that startup:

> 2018-01-08  6:01:49 140390495905984 [Note] InnoDB: Waiting for purge to
> start
> 2018-01-08  6:01:49 140390495905984 [Note] InnoDB: 5.7.20 started; log
> sequence number 0
>
Why is it reporting 0 instead of 1395657325222 or more? We are missing an
assignment srv_start_lsn = log_sys->lsn when upgrading the redo log.

Then, there is an apparently clean shutdown and subsequent startup with no
sign of any crash recovery taking place. But still we have the following
delay:

> 2018-01-08  8:23:03 140364886096064 [Note] InnoDB: Highest supported file
> format is Barracuda.
> 2018-01-08  8:45:14 140364886096064 [Note] InnoDB: 128 out of 128 rollback
> segments are active.
>

There are only two assignments recv_needed_recovery=true. Together with the
assignments, we should output one of the following messages:
>
> [Note] InnoDB: Starting crash recovery from checkpoint LSN=_
> [Note] InnoDB: The log sequence number _ in the system tablespace does not
> match the log sequence number _ in the ib_logfiles!
>
But, I do not see either message in the log excerpt.

Oh, now I realize that I was misled by the following in the gdb trace that
you previously attached:

> #11 0x558a37dd1c24 in fil_ibd_open (validate=, 
> validate@entry=true, fix_dict=, 
> purpose=purpose@entry=FIL_TYPE_TABLESPACE, id=1233316, flags=,
> space_name=, path_in=0x558a8d79af60 
> "./ddx_lab_801002/pickups.ibd") at 
> /home/buildbot/buildbot/build/mariadb-10.2.12/storage/innobase/fil/fil0fil.cc:4211
> #12 0x558a37db15c9 in dict_check_sys_tables (validate=true) at 
> /home/buildbot/buildbot/build/mariadb-10.2.12/storage/innobase/dict/dict0load.cc:1482
> #13 dict_check_tablespaces_and_store_max_id (validate=) at 
> /home/buildbot/buildbot/build/mariadb-10.2.12/storage/innobase/dict/dict0load.cc:1537
>
> Actually, we do not know what the value of the parameter was in stack
frame 13. It might very well have been passed as 'false'.
In dict_check_sys_tables() I see this code which is not present in MySQL
5.7:

/* Check that the .ibd file exists. */
validate = true; /* Encryption */

dberr_terr = fil_ibd_open(
validate,

Thie above assignment 'validate = true' is the smoking gun! It was added in
MariaDB 10.2.2
<https://github.com/MariaDB/server/commit/2e814d4702d71a04388386a9f591d14a35980bfe>,
apparently by mistake. I just checked the differences in the function
dict_check_sys_tables() between MariaDB 10.2 and MySQL 5.7, and this
assignment is the only one that does not make any sense whatsoever.

I will shortly push a fix.

Thank you for your patience (and sorry for the pun)!

Marko
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

DON’T MISS

M|18

MariaDB User Conference

February 26 - 27, 2018

New York City

https://m18.mariadb.com/
___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] ALTER TABLE ENIGNE=InnoDB performance

2018-01-11 Thread Marko Mäkelä

Hi Conor,

The error message for persistent statistics is due to something that I
recently changed in MariaDB 10.2.12 and 10.3.3:

commit 7dc6066dead562e70a68e6727fe4ee65d0bd0c72
> <https://github.com/MariaDB/server/commit/7dc6066dead562e70a68e6727fe4ee65d0bd0c72>
> Author: Marko Mäkelä 
> Date:   Fri Dec 1 16:51:24 2017 +0200
>
> MDEV-14511 <https://jira.mariadb.org/browse/MDEV-14511> Use fewer
> transactions for updating InnoDB persistent statistics
>

This patch tries to avoid some consistency problems related to InnoDB
persistent statistics. It is a long story, but the persistent statistics
are being written by an InnoDB internal SQL interpreter that requires the
InnoDB data dictionary cache to be locked.
Before the above change, the statistics were written during DDL in separate
transactions, which could unnecessarily reduce performance (each commit
would require a redo log flush) and break atomicity, because the statistics
would be updated separately from the dictionary transaction.

However, because it is unacceptable to hold the data dictionary cache
locked while suspending the execution for waiting for a record lock (in the
mysql.innodb_index_stats or mysql.innodb_table_stats tables) to be
released, any lock conflict will immediately be reported as "lock wait
timeout".

As far as I understand, such failures to update statistics did occur
earlier as well, but maybe less frequently, because we would really suspend
the thread while holding the dictionary lock. It is not the end of the
world if the statistics are missing or a bit off. As a matter of fact, I
noticed a change in MySQL 5.7.11
<https://github.com/mysql/mysql-server/commit/8bae549264f174a926cd3e52561cad4f31e90f07>
(and MariaDB 10.2) that will purposely drop the statistics in a certain
case of ALTER TABLE, and rely on a subsequent ha_innobase::open() to update
them. It could actually make sense to do this on all ALTER TABLE
operations. If we did this, then we would also rename tables in the
statistics tables less often (only on RENAME TABLE), and would be less
prone to get these errors. I do not think that it makes sense to ever
update or rename statistics for the #sql table names.

I think that the likelihood of the problem is increased by executing
multiple ALTER TABLE statements in parallel. It also "helps" to use
partitioning.

So, in summary, MariaDB 10.2.12 should not be more broken than it was
before, and I would like you to submit an issue at https://jira.mariadb.org/
so that this can be fixed in an upcoming release.

On Tue, Jan 9, 2018 at 3:00 PM, Conor Murphy 
wrote:

> Now when we run the "ALTER TABLE .. ENIGNE=InnoDB", it takes hours from
> the ALTERs to complete and we're getting the following log entries
>
>
> 2018-01-09 12:36:03 140132036818688 [ERROR] InnoDB: Cannot save table
> statistics for table `statsdb`.`#sql-6d1a_12#P#P20160101` /* Partition
> `P20160101` */: Lock wait timeout
>

On a related note, the ALTER TABLE…ALGORITHM=COPY performance for InnoDB
should be improved by MDEV-11415
<https://jira.mariadb.org/browse/MDEV-11415> and MDEV-515
<https://jira.mariadb.org/browse/MDEV-515> in later major MariaDB releases.
One thing that you could try would be to:

ALTER TABLE … DROP INDEX i, DROP INDEX j, …, ENGINE=InnoDB;
ALTER TABLE … ADD INDEX (i), ADD INDEX(i), ALGORITHM=INPLACE;

Until we have MDEV-515, the ENGINE-converting ALTER will be very slow for
tables that contain secondary indexes. So, you should copy the table to
InnoDB without those indexes, and then create them separately, using a much
faster algorithm. MySQL 5.7 (and MariaDB 10.2) got a more efficient
algorithm for creating the indexes page by page. In earlier versions,
starting with the InnoDB Plugin for MySQL 5.1, the ADD INDEX would only
pre-sort the records of each index, and insert them one by one. Even that
is much faster than what the engine-conversion does: insert each row one by
one, to each index, without any pre-sorting.

With best regards,

Marko
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

DON’T MISS

M|18

MariaDB User Conference

February 26 - 27, 2018

New York City

https://m18.mariadb.com/
___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] MariaDB-10.2.11 Long Time To Start

2018-01-10 Thread Marko Mäkelä

Hi Michael,

Thanks! In the stack trace, I see fsp_flags_try_adjust(), a function that I
had to introduce to fix a file format compatibility problem that had been
introduced in MariaDB 10.1.0:
MDEV-11623 <https://jira.mariadb.org/browse/MDEV-11623>, commit ab1e6fefd
<https://github.com/MariaDB/server/commit/ab1e6fefd869242d962cb91a006f37bb9ad534a7>
in MariaDB 10.1.21, 10.2.4. The format compatibility problem mostly affects
a non-default innodb_page_size.

However, the "root cause" for this slowdown is the validate=true in the
following code in the function innobase_start_or_create_for_mysql():

bool validate = recv_needed_recovery
&& srv_force_recovery == 0;

dict_check_tablespaces_and_store_max_id(validate);

Normally, we should have validate=false, because no crash recovery should
be needed. If the server was not shut down cleanly, then there should be a
startup message "Starting crash recovery", and recv_needed_recovery should
hold.
The variable srv_force_recovery is tied to the parameter
innodb_force_recovery, which is 0 by default.

To me, this looks like it could be 'working as designed'. But, you did not
share the server error log from the very start. Was there a crash recovery
message? If not, then it is an interesting bug.

If you copied the data files from a file system snapshot from a running
server, then it is very likely that crash recovery was needed. You could
work around this by setting --innodb-force-recovery=1 when starting up. But
beware that this will cause InnoDB to ignore certain errors.
Maybe a better idea would be to invoke "mariabackup --prepare" and
"mariabackup --copy-back" in order to apply the redo log before starting up
the server on the copied files. And while you are at it, you could also use
"mariabackup --backup" and skip the file system snapshot.

Best regards,

Marko

On Tue, Jan 9, 2018 at 4:32 PM, Michael Caplan  wrote:

> Hi Marko,
>
> Thanks for your patience with me.  I think I have better material for you
> to look at now.
> Best,
>
> Mike
>
>
>
> On 2018-01-09 03:17 AM, Marko Mäkelä wrote:
>
> Michael,
>
> Sorry, I now realize that you must likely have used a stripped executable,
> and therefore the symbols for the InnoDB functions are missing (displayed
> as ?? in the stack trace output from gdb). As far as I understand, gdb
> should be able to resolve the symbols if you install the separate debug
> symbol package:
>
> sudo apt install mariadb-server-core-10.2-dbgsym
>
> I tried to find the installation package corresponding to
> '10.2.11-MariaDB-10.2.11+maria~xenial-log' so that I would be able to
> resolve the stack trace manually. As a developer, normally I work with code
> that I build myself from the source code, and therefore I am a bit bad at
> giving advice that applies to the packaged code.
>
> I downloaded and extracted the files from the following packages:
> mariadb-server-10.2-dbgsym_10.2.11+maria~xenial_amd64.deb
> <http://mirror.netinch.com/pub/mariadb//mariadb-10.2.11/repo/ubuntu/pool/main/m/mariadb-10.2/mariadb-server-10.2-dbgsym_10.2.11+maria%7Exenial_amd64.deb>
> mariadb-server-10.2_10.2.11+maria~xenial_amd64.deb
> <http://mirror.netinch.com/pub/mariadb//mariadb-10.2.11/repo/ubuntu/pool/main/m/mariadb-10.2/mariadb-server-10.2_10.2.11+maria%7Exenial_amd64.deb>
> mariadb-server-core-10.2-dbgsym_10.2.11+maria~xenial_amd64.deb
> <http://mirror.netinch.com/pub/mariadb//mariadb-10.2.11/repo/ubuntu/pool/main/m/mariadb-10.2/mariadb-server-core-10.2-dbgsym_10.2.11+maria%7Exenial_amd64.deb>
> mariadb-server-core-10.2_10.2.11+maria~xenial_amd64.deb
> <http://mirror.netinch.com/pub/mariadb//mariadb-10.2.11/repo/ubuntu/pool/main/m/mariadb-10.2/mariadb-server-core-10.2_10.2.11+maria%7Exenial_amd64.deb>
>
> The server executable /usr/sbin/mysqld is included in the
> mysql-server-core package. Unfortunately I was unable to match that
> executable with the addresses from your gdb output. I did the following:
>
> file mysqld
> mysqld: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux),
> dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux
> 2.6.32, BuildID[sha1]=7649cc717c0ee820185a633fd27692f3b9160bed, stripped
>
> gdb mysqld
> symbol-file ../lib/debug/.build-id/76/49cc717c0ee820185a633fd27692f3
> b9160bed.debug
>
> followed by a number of gdb commands. I expected the least significant 12
> bits of the addresses to match in the disassembly of
> ha_initialize_handlerton() and plugin_init(), but that was not the case.
> Maybe I did something wrong.
>
> So, can you please install the debug symbol package and try again?
>
> Best regards,
>
> Marko
>
>
>

-- 
Marko Mäkelä, Lead Devel

Re: [Maria-discuss] MariaDB-10.2.11 Long Time To Start

2018-01-08 Thread Marko Mäkelä

Michael,

Sorry, I now realize that you must likely have used a stripped executable,
and therefore the symbols for the InnoDB functions are missing (displayed
as ?? in the stack trace output from gdb). As far as I understand, gdb
should be able to resolve the symbols if you install the separate debug
symbol package:

sudo apt install mariadb-server-core-10.2-dbgsym

I tried to find the installation package corresponding to
'10.2.11-MariaDB-10.2.11+maria~xenial-log' so that I would be able to
resolve the stack trace manually. As a developer, normally I work with code
that I build myself from the source code, and therefore I am a bit bad at
giving advice that applies to the packaged code.

I downloaded and extracted the files from the following packages:
mariadb-server-10.2-dbgsym_10.2.11+maria~xenial_amd64.deb

mariadb-server-10.2_10.2.11+maria~xenial_amd64.deb

mariadb-server-core-10.2-dbgsym_10.2.11+maria~xenial_amd64.deb

mariadb-server-core-10.2_10.2.11+maria~xenial_amd64.deb


The server executable /usr/sbin/mysqld is included in the mysql-server-core
package. Unfortunately I was unable to match that executable with the
addresses from your gdb output. I did the following:

file mysqld
mysqld: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux),
dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux
2.6.32, BuildID[sha1]=7649cc717c0ee820185a633fd27692f3b9160bed, stripped

gdb mysqld
symbol-file
../lib/debug/.build-id/76/49cc717c0ee820185a633fd27692f3b9160bed.debug

followed by a number of gdb commands. I expected the least significant 12
bits of the addresses to match in the disassembly of
ha_initialize_handlerton() and plugin_init(), but that was not the case.
Maybe I did something wrong.

So, can you please install the debug symbol package and try again?

Best regards,

Marko
___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] MariaDB-10.2.11 Long Time To Start

2018-01-08 Thread Marko Mäkelä

Hi Michael,

OK, all the interesting functions (InnoDB functions) are displayed as ??.

In the "perf top" it would have helped to drill into the mysqld executable,
and only show its top functions. Apparently a lot of pages are being read;
that is all that I can tell from the screenshots.

Can you try the following:

1. Find the proces ID, e.g., ps ax|grep mysqld
2. gdb /usr/sbin/mysqld
set log file /tmp/gdb.txt
set log on
attach [pid]
thr a a bt
detach
quit

With this we should get detailed stack traces. I am interested in the
thread whose stack trace includes plugin_init and mysqld_main.

Could you please try again?

Marko
___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] MariaDB-10.2.11 Long Time To Start

2018-01-08 Thread Marko Mäkelä

Hi Michael,

> 2018-01-07  9:38:18 139980432365760 [Note] InnoDB: Highest supported file
> format is Barracuda.
> 2018-01-07 10:11:48 139980432365760 [Note] InnoDB: 128 out of 128 rollback
> segments are active.
>
This is really unacceptable behaviour. It looks like you have encountered
https://jira.mariadb.org/browse/MDEV-9843 where this exactly same behaviour
has been documented.

Can you reproduce the delay on every startup? Stack traces (from
http://poormansprofiler.org/ or "perf top -g" or similar) during this
startup delay would be very useful.

With best regards,

Marko
___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] innodb and corrupted tables

2017-11-20 Thread Marko Mäkelä

Hi all,

On Mon, Nov 13, 2017 at 7:16 PM, Roberto Spadim 
wrote:

> 2017-11-12 22:37 GMT-02:00 Federico Razzoli :
>
>> I just noticed this old MySQL bug:
>> https://bugs.mysql.com/bug.php?id=10132
>>
>> In the comments (2005 to 2014) everyone seems to agree that current
>> behaviour should be treated as a bug.
>> The main commenter now works for MariaDB.
>>
>
That is right. And I recently filed a corresponding MariaDB bug, even
copying the title:
https://jira.mariadb.org/browse/MDEV-13542 Crashing on a corrupted page is
unhelpful

> What is currently your opinion?
>>
>
> if it execute
>
> 1: highly redundant systems. These places will tend to want an immediate 
> total failure because their strategy tends to be to replace or reclone a 
> slave.
>
>
> ok... it take some time to recover innodb,  but
>
> 2: less highly redundant or shared hosting. These places will tend to want to 
> continue limited service to the maximum extent possible.
>
>
> is more interesting, we don't need to stop database, just stop table
> space/table and return an error at engine level
>

Right. I have noticed these two schools of thought, also among Linux kernel
developers. Some want to kill the system ASAP, others want to fail
gracefully.

On a redundant system, you might even want to disable redo logging to speed
up some operations. If the system crashes, you just start over.

Will this behaviour change at some point? If so, what will be the
>> consequences for Galera?
>>
>
> no idea, but it's a critical issue
>

Yes, I hope that it will be possible to fix this at some point, but
unfortunately I cannot say when. It involves extensive changes to the
InnoDB code base, to properly propagate errors up the call stack. It is not
an easy task. Maybe it is feasible to improve the reliability piece by
piece. I am afraid that it can only be done in a development branch before
it reaches beta or GA status.

At Oracle there were no resources allocated on this. The MySQL bug 10132
was closed for some time until I reopened it (in 2014, according to a
comment timestamp). If I remember correctly, the motivation to close the
bug was that MySQL 5.5 introduced the ability of marking an index corrupted
(in CHECK TABLE only, and potentially with more devastating results:
Bug#19584379 Reporting corruption may corrupt the innodb data dictionary,
fixed by me in October 2014). IIRC, also CHECK TABLE could crash InnoDB due
to a corrupted page.

When it comes to Galera, I have understood that there are other ways to
cause inconsistency between the nodes. One example ought to be enabling the
auto-recalculation of persistent statistics and then updating the
mysql.innodb_index_stats or mysql.innodb_table_stats tables from SQL.

With best regards,

Marko
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation

DON’T MISS

M|18

MariaDB User Conference

February 26 - 27, 2018

New York City

https://m18.mariadb.com/
___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] Contributing my ideas with Mariadb

2017-11-11 Thread Marko Mäkelä

Hi Rajat,

Foreign keys in MySQL or MariaDB are a bit of a second-class citizen, only
implemented inside InnoDB. Even the foreign key metadata is only stored
inside InnoDB. Also, foreign keys only work between InnoDB tables.

I think that once those issues have been fixed (once we have foreign keys
implemented avobe the storage engine layer, working on any transactional
tables), it should be more feasible to implement a multi-table truncate
operation. We certainly want the operation to be atomic, so it would have
to work something like this:

1. Lock all the tables, and check that all criteria are met.
2. Atomically write transaction log records to TRUNCATE every table.
3. Truncate the tables.
4. Remove the transaction log records.
5. Unlock the tables and return to the client.

Even if we ignore other storage engines than InnoDB, step 2 is a problem.
The logging of TRUNCATE was rewritten in WL#6501 in MySQL 5.7 in a way that
makes it challenging for tools like Mariabackup or xtrabackup to work. That
logging would have to be rewritten. I think that we need a transactional
"ddl_fixup" table that could also be used for implementing transactional,
fully crash-safe DDL operations with .frm files.

As you can see, the task is rather complex. There are many things that
would have to be refactored first. I do not know how much of this we can do
within the 10.4 release. Soon we will release MariaDB 10.3 as beta, to be
released as GA some time next year. This definitely cannot be done in 10.3.

Best regards,

Marko

On Sat, Nov 11, 2017 at 4:38 PM, Rajat Rawat 
wrote:

> Respected Sir/Mam
>
>
>
> Hello, Myself Rajat from India. I wanted to contribute/work with Mariadb.
>
>
>
> As we know that Mariadb dosen’t have cascade truncate function available
> yet, so I want to contribute to Mariadb towards this topic.
>
> My topic is “*Cascade Truncate*”.
>
>
>
> Since we can’t truncate the parent table unless the referential integrity
> constraints are present in the child table also. Hence, we can do a way
> that we can provide a command that truncate the child table as well.
>
>
> This enhancement is already provided in the Oracle Database in their 12c
> Version, by using the *truncate cascade* command.
>
>
>
> It is better to truncate the child table at the time when we want to
> truncate our parent table.
>
>
> As we know that we have already on Delete Cascade command available, but
> that command works only row by row and not for all rows at one time.
>
>   So, at one time if we want to truncate our whole table then *Cascade
> Truncate* is a better Option.
>
>
>
> That’s all about my proposal. If you like my proposal, then let me know
> about about it.
>
>
>
> I am very eager to contribute for you. I also forked and downloaded on my
> local machine, the file of Maraidb which is available on Github,  and
> currently I am reading/understanding all the patches, but I am confused
> with only 1 thing that from where should I start.
>
>
>
> So, if you like proposal then let me know about it and please guide me
> that from where should I start contributing for it.
>
>
>
> My skills: C, C++, Java, SQL, PL/SQL
>
> Achievement: Oracle Certified Associate Java Programmer SE 7.
>
>
>
> By  contributing for you I will definitely learn many new things and
> implement my ideas, while working with you.
>
>
>
> Hope I could be of any help for you.
>
>
>
> Your sincerely
>
> Rajat
>
> _______
> Mailing list: https://launchpad.net/~maria-discuss
> Post to : maria-discuss@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~maria-discuss
> More help   : https://help.launchpad.net/ListHelp
>
>

-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation
___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-discuss] legacy InnoDB compressed tables using row_format=compressed

2017-07-05 Thread Marko Mäkelä

But I think that
LSM-trees such as MyRocks are difficult to beat when it comes to disk space
usage. This of course depends on the workload.

Best regards,

Marko
-- 
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation
___
Mailing list: https://launchpad.net/~maria-discuss
Post to : maria-discuss@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-discuss
More help   : https://help.launchpad.net/ListHelp

55 matches

Mail list logo