On 2017-07-05 23:19, Paul Jones wrote:
While reading the thread about adding zstd compression, it occurred
to me that there is potentially another thing affecting performance -
Compressed extent size. (correct my terminology if it's incorrect). I
have two near identical RAID1 filesystems (used for backups) on near
identical discs (HGST 3T), one compressed and one not. The
filesystems have about 40 snapshots and are about 50% full. The
uncompressed filesystem runs at about 60 MB/s, the compressed
filesystem about 5-10 MB/s. There is noticeably more "noise" from the
compressed filesystem from all the head thrashing that happens while
rsync is running.
Which brings me to my point - In terms of performance for
compression, is there some low hanging fruit in adjusting the extent
size to be more like uncompressed extents so there is not so much
seeking happening? With spinning discs with large data sets it seems
pointless making the numerical calculations faster if the discs can't
keep up. Obviously this is assuming optimisation for speed over
compression ratio.
Thoughts?

That really depends on too much to be certain. In all likelihood, your
CPU or memory is your bottleneck, not your storage devices. The data
itself gets compressed in memory and then sent to the storage device;
it's not streamed directly there from the compression thread. So if the
CPU was compressing data faster than the storage devices could transfer
it, you would (or at least, should) be seeing better performance on the
compressed filesystem than the uncompressed one (because you transfer
less data on the compressed filesystem), assuming the datasets are
functionally identical.
That in turn brings up a few other questions:
* What are the other hardware components involved (namely, CPU, RAM, and
storage controller)? If you're using some dinky little Atom or
Cortex-A7 CPU (or almost anything else 32-bit running at less than 2GHz
peak), then that's probably your bottleneck. Similarly, if you've got a
cheap storage controller that needs a lot of attention from the CPU,
then that's probably your bottleneck. You can check this by seeing how
much processing power is being used when just writing to the
uncompressed array: check how much processing power rsync uses copying
between two tmpfs mounts, then subtract that from the total for copying
the same data to the uncompressed filesystem (see the sketch after this
list).
* Which compression algorithm are you using, lzo or zlib? If the answer
is zlib, then what you're seeing is generally expected behavior except
on systems with reasonably high-end CPUs and fast memory, because zlib
is _slow_.
* Are you storing the same data on both arrays? If not, then that
immediately makes the comparison suspect (if one array is storing lots
of small files and the other is mostly storing small numbers of large
files, then I would expect the one with lots of small files to get worse
performance, and compression on that one will just make things worse).
This is even more important when using rsync, because the size of the
files involved has a pretty big impact on its hashing performance and
even data transfer rate (lots of small files == more time spent in
syscalls other than read() and write()).
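
Regarding the tmpfs comparison above, here's a rough sketch of how you
might do that measurement. The mount points are made up for the example,
and it only captures rsync's own CPU time (not time spent in kernel I/O
threads), so treat the numbers as a rough indicator rather than a precise
accounting:

#!/usr/bin/env python3
# Rough sketch: measure rsync's CPU time for a tmpfs-to-tmpfs copy
# (baseline) versus a copy onto the uncompressed array, so you can see
# roughly how much CPU the storage path itself is eating.
# The paths below are hypothetical.
import resource
import subprocess
import time

def timed_rsync(src, dst):
    # Returns (wall-clock seconds, CPU seconds used by the rsync child).
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(["rsync", "-a", src, dst], check=True)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = ((after.ru_utime - before.ru_utime) +
           (after.ru_stime - before.ru_stime))
    return wall, cpu

# Baseline: same data between two tmpfs mounts (no real storage involved).
base_wall, base_cpu = timed_rsync("/mnt/tmpfs-src/data/", "/mnt/tmpfs-dst/")
# Same data onto the uncompressed array.
disk_wall, disk_cpu = timed_rsync("/mnt/tmpfs-src/data/", "/mnt/uncompressed/")

print("tmpfs -> tmpfs: %.1fs wall, %.1fs CPU" % (base_wall, base_cpu))
print("tmpfs -> disk : %.1fs wall, %.1fs CPU" % (disk_wall, disk_cpu))
print("extra CPU spent on the storage path: %.1fs" % (disk_cpu - base_cpu))

If the "extra CPU" figure is large relative to the baseline, the storage
controller (or the CPU servicing it) is a likely suspect.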
Additionally, when you're referring to extent size, I assume you mean
the huge number of 128k extents that the FIEMAP ioctl (and at least
older versions of `filefrag`) shows for compressed files? If that's the
case, then it's important to understand that that's due to an issue with
FIEMAP: it doesn't understand compressed extents in BTRFS correctly, so
it shows one extent per compressed _block_ instead, even if those blocks
are internally a single extent in BTRFS. You can verify the actual number
of extents by checking how many runs of contiguous 128k 'extents' there are.
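
If it helps, here's a rough sketch of that counting, assuming the usual
`filefrag -v` column layout (logical range, then physical range, per line);
it just counts how many times the physical offset jumps between successive
128k 'extents':

#!/usr/bin/env python3
# Sketch: run `filefrag -v` on a compressed file and count how many runs
# of physically contiguous 128k "extents" there are. FIEMAP reports each
# 128k compressed block as a separate extent, so the run count is much
# closer to the real on-disk extent count.
import re
import subprocess
import sys

def extent_runs(path):
    out = subprocess.run(["filefrag", "-v", path],
                         capture_output=True, text=True, check=True).stdout
    extents = []  # (physical_start, physical_end) in filesystem blocks
    for line in out.splitlines():
        # Data lines look like:
        #   0:        0..      31:    3461120..   3461151:     32: encoded
        m = re.match(r"\s*\d+:\s*\d+\.\.\s*\d+:\s*(\d+)\.\.\s*(\d+):", line)
        if m:
            extents.append((int(m.group(1)), int(m.group(2))))
    runs = 0
    prev_end = None
    for start, end in extents:
        if prev_end is None or start != prev_end + 1:
            runs += 1  # physical gap, so a new on-disk extent starts here
        prev_end = end
    return len(extents), runs

if __name__ == "__main__":
    reported, runs = extent_runs(sys.argv[1])
    print("%s: FIEMAP reports %d extents, %d contiguous runs"
          % (sys.argv[1], reported, runs))

On a compressed but unfragmented file, the run count should come out far
lower than the extent count FIEMAP reports.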