> 4. MyRocks seems to be very slow compared to compressed InnoDB. Was also
> reading docs related to it. For its use case, is it more suitable for
> storing large rows rather than many small rows?

I cannot confirm this. Compression can be fine-tuned in RocksDB to meet your
requirements; for example, you can decide which algorithm and which
compression level should be used for each compaction level.
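
For illustration, a minimal sketch of what such tuning could look like in
my.cnf, assuming the MyRocks plugin is loaded; the available algorithms and
the exact option string depend on your RocksDB build, so treat the values
below as placeholders rather than a recommendation:

  [mysqld]
  # No compression on the hot upper levels, LZ4 in the middle,
  # ZSTD on the bottommost (coldest) levels.
  rocksdb_default_cf_options=compression_per_level=kNoCompression:kNoCompression:kLZ4Compression:kLZ4Compression:kLZ4Compression:kZSTD:kZSTD;bottommost_compression=kZSTD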

The main use case for RocksDB is when your indexes no longer fit in the
InnoDB buffer pool, because at that point InnoDB insert performance becomes
a bottleneck. LSM-tree insert performance is almost unaffected by index size
and will outperform InnoDB in this scenario.
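
As a rough, illustrative way to check whether you are approaching that point
(the information_schema figures are only estimates), you can compare the
total InnoDB index size with the buffer pool size:

  SELECT SUM(INDEX_LENGTH) / 1024 / 1024 AS index_mb,
         @@innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mb
  FROM information_schema.TABLES
  WHERE ENGINE = 'InnoDB';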

For our application, the RocksDB query performance has always been
sufficient and comparable to that of InnoDB, but we have a highly
customized disk setup with ZFS and L2ARC on SSDs etc.

Regards

Jonas

On Mon, 21 Aug 2023 at 07:19, Marko Mäkelä via discuss <
[email protected]> wrote:

> On Sun, Aug 20, 2023 at 6:12 PM pslawek83 via discuss
> <[email protected]> wrote:
> >
> > Thanks everyone for the answers. Tested this PAGE_COMPRESSION a bit and
> did some reading. Not sure if I fully understand how it works... is the
> following correct?
> >
> >
> > 1. It seems impossible without hole punching, since without it the server
> will just write partially used pages which take the full page size anyway.
> But the docs seem to describe its use with hole punching disabled. What's
> the reason to still have page compression without hole punching?
>
> Can you please point to the misleading part of the documentation so
> that it can be corrected?
>
> > 2. After enabling compression, each page needs to be trimmed using a
> punch hole, so if we have a 1 TB database and 64k blocks, the database
> files will have over 15 million fragments (1000000000000/64000).
>
> That is correct.
>
> > So if we get about 50% compression and can then put another 1 TB of data
> into these holes, we'll have close to 31 million file fragments. I'm not
> sure whether, under such conditions, the FS will even be able to allocate
> space efficiently anymore, copy these files, etc. Or am I not getting
> something?
>
> For what it is worth, around the same time that the page_compressed format
> was being developed for MariaDB Server 10.1, something similar was being
> worked on at Oracle, I think for MySQL 5.7. To my understanding,
> the same hardware partner worked with both companies. The obvious
> answer to
> https://smalldatum.blogspot.com/2015/10/wanted-file-system-on-which-innodb.html
> would have been https://en.wikipedia.org/wiki/Fusion-io_NVMFS if it
> had worked without kernel panics or existed in a mainline kernel. I
> don’t even know if it ever was open sourced.
>
> > 3. I was testing compression of a ~1.5 GB table with nice results on the
> default 16 KiB page size. One thing I noticed is that I was always getting
> exactly the same compressed size using `du`. However, the table read/copy
> times varied from 40 seconds for uncompressed to 80-400 seconds depending
> on the algorithm. Is it possible that this is because the page size is too
> low?
>
> As far as I remember, most compression algorithms that are derivatives
> of the LZ77 family have an input buffer size of 32 KiB. From that
> point of view, it would seem to make sense to use innodb_page_size=32k
> or innodb_page_size=64k instead of the default. Furthermore, I think
> that it was a mistake to implement bzip2 support at all. That
> algorithm would work with much larger buffer sizes.
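>
> For example, a minimal sketch of such a setup (assumptions: the data
> directory is initialized with this page size, since innodb_page_size cannot
> be changed afterwards, and the chosen algorithm is available in your build):
>
>   # my.cnf: larger pages give the compressor bigger input buffers
>   [mysqld]
>   innodb_page_size=64k
>   innodb_compression_algorithm=zlib
>
>   -- per-table page compression in MariaDB
>   CREATE TABLE t (id BIGINT PRIMARY KEY, payload BLOB)
>     ENGINE=InnoDB PAGE_COMPRESSED=1 PAGE_COMPRESSION_LEVEL=6;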
>
> > 4. MyRocks seems to be very slow compared to compressed InnoDB. Was also
> reading docs related to it. For its use case, is it more suitable for
> storing large rows rather than many small rows?
>
> To my understanding, the LSM trees that MyRocks is built on work best
> in an insert-only workload. On top of LSM trees, MyRocks implements
> some caches and Bloom filters. If you have few updates or deletes and
> most reads are accessing recently inserted data, it could work.
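>
> As an illustrative sketch (assuming the MyRocks plugin has been installed
> with INSTALL SONAME 'ha_rocksdb'), such an insert-mostly table would simply
> be declared with the RocksDB engine:
>
>   CREATE TABLE events (
>     id BIGINT NOT NULL AUTO_INCREMENT,
>     created_at DATETIME NOT NULL,
>     payload BLOB,
>     PRIMARY KEY (id)
>   ) ENGINE=ROCKSDB;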
>
> > 5. I remember reading that some InnoDB compression is going to be made
> obsolete. Is that row compression? Or are there no plans to obsolete any
> compression model?
>
> I wanted to deprecate and remove the ROW_FORMAT=COMPRESSED that I
> designed and implemented in the InnoDB Plugin for MySQL 5.1 based on
> some high level ideas of Heikki Tuuri between 2005 and 2007.
> https://jira.mariadb.org/browse/MDEV-22367 was the first step towards
> that. Based on community feedback, this won’t be happening. Even after
> some data structure cleanup that took place in MariaDB Server 10.6,
> supporting the format incurs an overhead of some 32 to 40 bytes per
> buffer page descriptor, which is not insignificant. Removing both
> ROW_FORMAT=COMPRESSED and the adaptive hash index would shrink the
> block descriptor to 80 bytes from its current 10.6 size of 160 bytes.
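>
> For reference, a sketch contrasting the two table-level options discussed
> here (the table names and the KEY_BLOCK_SIZE value are only illustrative):
>
>   -- the older zlib-based format (KEY_BLOCK_SIZE in KiB)
>   CREATE TABLE t_rowcomp (id INT PRIMARY KEY, doc TEXT)
>     ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
>
>   -- the MariaDB page_compressed alternative
>   CREATE TABLE t_pagecomp (id INT PRIMARY KEY, doc TEXT)
>     ENGINE=InnoDB PAGE_COMPRESSED=1;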
>
> The write performance with ROW_FORMAT=COMPRESSED is likely to have degraded
> when I implemented the easier-to-parse log record format in MDEV-12353
> (MariaDB Server 10.5). There are open bugs about
> ROW_FORMAT=COMPRESSED, such as
> https://jira.mariadb.org/browse/MDEV-31574. I would say that the
> InnoDB ROW_FORMAT=COMPRESSED seemed like a good idea back when it was
> implemented, when HDD storage was the only practical option. Just like
> the InnoDB change buffer, which I removed in MariaDB Server 11.0 after
> having to deal with too many cases of corruption:
> https://fosdem.org/2023/schedule/event/innodb_change_buffer/
>
> With best regards,
>
> Marko
> --
> Marko Mäkelä, Lead Developer InnoDB
> MariaDB plc
_______________________________________________
discuss mailing list -- [email protected]
To unsubscribe send an email to [email protected]
