Re: Give up on bcache?

2017-09-27 Thread Austin S. Hemmelgarn

On 2017-09-26 18:46, Ferry Toth wrote:

> On Tue, 26 Sep 2017 15:52:44 -0400, Austin S. Hemmelgarn wrote:
>
>> On 2017-09-26 12:50, Ferry Toth wrote:
>>> Looking at the Phoronix benchmark here:
>>>
>>> https://www.phoronix.com/scan.php?page=article=linux414-bcache-raid=2
>>>
>>> I think it might be an idle hope to think bcache can be used as an SSD
>>> cache for btrfs to significantly improve performance. True, the
>>> benchmark is using ext.
>> It's a benchmark.  They're inherently synthetic and workload specific,
>> and therefore should not be trusted to represent things accurately for
>> arbitrary use cases.


> So what? A decent benchmark tries to measure a specific aspect of the fs.
Yes, and it usually measures it using a ridiculously unrealistic
workload.  Some of the benchmarks in iozone are a good example of this,
like the backwards read one (there is almost nothing it provides useful
data for).  For a benchmark to be meaningful, you have to test what you
actually intend to use, and from a practical perspective, that article
is primarily testing throughput, which is not something you should be
using SSD caching for.


> I think you agree that applications doing lots of fsyncs (databases,
> dpkg) are slow on btrfs, especially on HDDs, whatever way you measure
> that (it feels slow, it measures slow, it really is slow).
Yes, but they're also slow on _everything_.  fsync() is slow.  Period.
It's just more of an issue on BTRFS because it's a CoW filesystem _and_
it's slower than ext4 even with that CoW layer bypassed.
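One rough way to see this on a given machine is to time a loop of small appends, each followed by fsync(), on the filesystem in question. This is a sketch under assumptions (4 KiB appends to a scratch file, a single thread); `mean_fsync_latency` is a hypothetical helper, and like any microbenchmark it measures only this one synthetic pattern, not a real database or dpkg workload.

```python
import os
import tempfile
import time


def mean_fsync_latency(directory: str, writes: int = 50, size: int = 4096) -> float:
    """Append `size` bytes `writes` times, calling fsync() after each append,
    and return the mean wall-clock seconds per write+fsync pair."""
    if writes <= 0 or size <= 0:
        raise ValueError("writes and size must be positive")
    payload = b"x" * size
    # Create the scratch file on the filesystem under test.
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(writes):
            os.write(fd, payload)
            os.fsync(fd)  # force the data (and needed metadata) to stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return elapsed / writes


if __name__ == "__main__":
    print(f"{mean_fsync_latency('.') * 1000:.2f} ms per write+fsync")
```

Running it once on an HDD-backed and once on an SSD-backed btrfs mount gives a crude comparison of the same pattern on both.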


> On an SSD the problem is smaller.
And most of that is a result of the significantly higher bulk throughput
on the SSD, which is not something that SSD caching replicates.


> So if you can fix that by using an SSD cache or a hybrid solution, how
> would you like to compare that? It _feels_ faster?
That depends.  If it's on a desktop, then that actually is one of the
best ways to test it, since user perception is your primary quality
metric (you can make the fastest system in the world, but if the user
can't tell, you've gained nothing).  If you're on anything else, you
test the actual workload if possible, and a benchmark that tries to
replicate the workload if not.  Put another way, if you're building a
PGSQL server, you should be benchmarking things with a PGSQL
benchmarking tool, not some arbitrary benchmark that likely won't
replicate a PGSQL workload.



>>> But the most important one (where btrfs always turns out a little
>>> slow) would be the SQLite test. And with ext at least, performance
>>> _degrades_ except in writeback mode, and even there it is nowhere near
>>> what the SSD is capable of.
>>
>> And what makes you think it will be?  You're using it as a hot-data
>> cache, not a dedicated write-back cache, and you have the overhead from
>> bcache itself too.  Just some simple math based on examining the bcache
>> code suggests you can't get better than about 98% of the SSD's
>> performance if you're lucky, and I'd guess it's more like 80% most of
>> the time.


>>> I think with btrfs it will be even worse and that it is a fundamental
>>> problem: caching is complex and the cache cannot know how the data on
>>> the fs is used.
>>
>> Actually, relative to the baseline of not using it, the improvement
>> from using bcache is slightly larger with BTRFS than with ext4.  BTRFS
>> does a lot more with the disk, so you have a lot more time spent
>> accessing the disk, and thus more time that can be reduced by
>> improving disk performance.  While the CoW nature of BTRFS does
>> somewhat mitigate the performance improvement from using bcache, it
>> does not completely negate it.


> I would like to reverse this: how much degradation do you suffer from
> btrfs on an SSD as the baseline, compared to btrfs on a mixed SSD/HDD
> system?
Performance-wise?  It's workload dependent, but in most cases it's a hit
regardless of whether you're using BTRFS or some other filesystem.


If instead you're asking about the difference in device longevity, you
can probably expect the SSD to wear out faster in the second case.
Unless you have a reasonably big SSD and are using write-around caching,
every write will hit the SSD too, and you'll end up with lots of
rewrites on the SSD.
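A simplified model of the point about write-around caching: where a single write that bcache does not bypass lands immediately in each of bcache's cache modes (writethrough, writeback, writearound, none). It deliberately ignores metadata, garbage collection and the sequential-bypass logic; the enum and function names are hypothetical, not part of bcache.

```python
from __future__ import annotations

from enum import Enum


class CacheMode(Enum):
    WRITETHROUGH = "writethrough"
    WRITEBACK = "writeback"
    WRITEAROUND = "writearound"
    NONE = "none"


def devices_written(mode: CacheMode) -> tuple[str, ...]:
    """Devices a (non-bypassed) write lands on immediately, in this toy model."""
    if mode is CacheMode.WRITETHROUGH:
        # Cache copy updated and backing HDD written before completion.
        return ("ssd", "hdd")
    if mode is CacheMode.WRITEBACK:
        # Written to the SSD now; the HDD is updated later when dirty data is flushed.
        return ("ssd",)
    # WRITEAROUND and NONE: the write skips the SSD entirely.
    return ("hdd",)


if __name__ == "__main__":
    for mode in CacheMode:
        print(f"{mode.value:>12}: {', '.join(devices_written(mode))}")
```

Only the last two modes spare the SSD the write, which is why write-around (with a large enough SSD to still cache reads usefully) is the longevity-friendly choice.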


> IMHO you are hoping to get SSD performance at HDD cost.
Then you're looking at the wrong tool.  The primary use cases for SSD
caching are smoothing latency and improving interactivity by reducing
head movement.  Any other measure of performance is pretty much
guaranteed to be worse with SSD caching than just using an SSD, and bulk
throughput is often just as bad as, if not worse than, using a regular
HDD by itself.


If you are that desperate for SSD-like performance, quit whining
about cost and just buy an SSD.  Decent ones are down to less than 0.40
USD per GB depending on the brand (search 'Crucial MX300' on Amazon if
you want an example), so the cost isn't nearly as bad as people make it
out to be, especially considering that most of the time a normal person
who isn't doing multimedia work or

Re: Give up on bcache?

2017-09-26 Thread Ferry Toth
On Tue, 26 Sep 2017 15:52:44 -0400, Austin S. Hemmelgarn wrote:

> On 2017-09-26 12:50, Ferry Toth wrote:
>> Looking at the Phoronix benchmark here:
>> 
>> https://www.phoronix.com/scan.php?page=article=linux414-bcache-raid=2
>> 
>> I think it might be an idle hope to think bcache can be used as an SSD
>> cache for btrfs to significantly improve performance. True, the
>> benchmark is using ext.
> It's a benchmark.  They're inherently synthetic and workload specific,
> and therefore should not be trusted to represent things accurately for
> arbitrary use cases.

So what? A decent benchmark tries to measure a specific aspect of the fs.

I think you agree that applications doing lots of fsyncs (databases,
dpkg) are slow on btrfs, especially on HDDs, whatever way you measure
that (it feels slow, it measures slow, it really is slow).

On an SSD the problem is smaller.

So if you can fix that by using an SSD cache or a hybrid solution, how
would you like to compare that? It _feels_ faster?

>> But the most important one (where btrfs always turns out a little
>> slow) would be the SQLite test. And with ext at least, performance
>> _degrades_ except in writeback mode, and even there it is nowhere near
>> what the SSD is capable of.
> And what makes you think it will be?  You're using it as a hot-data
> cache, not a dedicated write-back cache, and you have the overhead from
> bcache itself too.  Just some simple math based on examining the bcache
> code suggests you can't get better than about 98% of the SSD's
> performance if you're lucky, and I'd guess it's more like 80% most of
> the time.
>> 
>> I think with btrfs it will be even worse and that it is a fundamental
>> problem: caching is complex and the cache cannot know how the data on
>> the fs is used.
> Actually, relative to the baseline of not using it, the improvement
> from using bcache is slightly larger with BTRFS than with ext4.  BTRFS
> does a lot more with the disk, so you have a lot more time spent
> accessing the disk, and thus more time that can be reduced by
> improving disk performance.  While the CoW nature of BTRFS does
> somewhat mitigate the performance improvement from using bcache, it
> does not completely negate it.

I would like to reverse this: how much degradation do you suffer from
btrfs on an SSD as the baseline, compared to btrfs on a mixed SSD/HDD
system?

IMHO you are hoping to get SSD performance at HDD cost.

>> I think the original idea of hot data tracking has a much better chance
>> to significantly improve performance. This is of course because the SSDs
>> and HDDs would then be equal citizens, and btrfs itself would get to
>> decide on which drive the data is best stored.
> First, the user needs to decide, not BTRFS (at least, by default, BTRFS
> should not be involved in the decision).  Second, tiered storage (that's
> what that's properly called) is mostly orthogonal to caching (though
> bcache and dm-cache behave like tiered storage once the cache is
> warmed).

So, on your desktop you really are going to search for all SQLite, MySQL
and PostgreSQL files, dpkg files etc. and move them to the SSD? You can
already do that. Go ahead!

The big win would be if the file system does that automatically for you.

>> With this implemented right, it would also finally silence the
>> never-ending discussion of why not btrfs and why zfs, ext, xfs etc.,
>> which would be a plus in its own right.
> Even with this, there would still be plenty of reasons to pick one of
> those filesystems over BTRFS.  There would however be one more reason to
> pick BTRFS over ext or XFS (but not necessarily over ZFS, which already
> has caching built in).

Exactly, one more advantage of btrfs and one less of zfs.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Give up on bcache?

2017-09-26 Thread Adam Borowski
On Tue, Sep 26, 2017 at 11:33:19PM +0500, Roman Mamedov wrote:
> On Tue, 26 Sep 2017 16:50:00 + (UTC)
> Ferry Toth wrote:
> 
> > https://www.phoronix.com/scan.php?page=article=linux414-bcache-raid=2
> > 
> > I think it might be an idle hope to think bcache can be used as an SSD
> > cache for btrfs to significantly improve performance.
> 
> My personal real-world experience shows that SSD caching -- with lvmcache --
> does indeed significantly improve performance of a large Btrfs filesystem with
> slowish base storage.
> 
> And that article, sadly, only demonstrates once again the general mediocre
> quality of Phoronix content: it is an astonishing oversight not to check out
> lvmcache in the same setup, to at least try to draw some useful conclusion:
> is it bcache that is strangely deficient, or does SSD caching as a general
> concept not work well on the hardware used?

Also, it looks as if Phoronix's tests don't stress metadata at all.  Btrfs is
all about metadata; speeding it up greatly helps most workloads.

A pipe-dream wishlist would be:
* store and access master copy of metadata on SSD only
* pin all data blocks referenced by generations not yet mirrored
* slowly copy over metadata to HDD

-- 
⢀⣴⠾⠻⢶⣦⠀ We domesticated dogs 36000 years ago; together we chased
⣾⠁⢰⠒⠀⣿⡁ animals, hung out and licked or scratched our private parts.
⢿⡄⠘⠷⠚⠋⠀ Cats domesticated us 9500 years ago, and immediately we got
⠈⠳⣄ agriculture, towns then cities. -- whitroth on /.


Re: Give up on bcache?

2017-09-26 Thread Austin S. Hemmelgarn

On 2017-09-26 12:50, Ferry Toth wrote:

> Looking at the Phoronix benchmark here:
>
> https://www.phoronix.com/scan.php?page=article=linux414-bcache-raid=2
>
> I think it might be an idle hope to think bcache can be used as an SSD
> cache for btrfs to significantly improve performance. True, the
> benchmark is using ext.
It's a benchmark.  They're inherently synthetic and workload specific, 
and therefore should not be trusted to represent things accurately for 
arbitrary use cases.


> But the most important one (where btrfs always turns out a little slow)
> would be the SQLite test. And with ext at least, performance _degrades_
> except in writeback mode, and even there it is nowhere near what the
> SSD is capable of.
And what makes you think it will be?  You're using it as a hot-data 
cache, not a dedicated write-back cache, and you have the overhead from 
bcache itself too.  Just some simple math based on examining the bcache 
code suggests you can't get better than about 98% of the SSD's 
performance if you're lucky, and I'd guess it's more like 80% most of 
the time.
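How per-hit overhead combines with cache misses can be sketched with the usual hit-ratio-weighted average access time. This is an assumption-laden model, not something derived from the bcache code: `hit_efficiency=0.98` stands in for the best-case figure above, the 0.1 ms / 10 ms latencies are hypothetical random-I/O numbers, and the cost of populating the cache on a miss is ignored.

```python
def effective_latency(hit_ratio: float, ssd_ms: float, hdd_ms: float,
                      hit_efficiency: float = 0.98) -> float:
    """Average per-request latency (ms) of an SSD cache in front of an HDD.

    A hit delivers `hit_efficiency` of raw SSD speed, so it costs
    ssd_ms / hit_efficiency; a miss costs one HDD access.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be in [0, 1]")
    if not 0.0 < hit_efficiency <= 1.0:
        raise ValueError("hit_efficiency must be in (0, 1]")
    hit_cost = ssd_ms / hit_efficiency
    return hit_ratio * hit_cost + (1.0 - hit_ratio) * hdd_ms


if __name__ == "__main__":
    # Hypothetical latencies: 0.1 ms per SSD access, 10 ms per HDD seek.
    for h in (0.5, 0.9, 0.99, 1.0):
        lat = effective_latency(h, ssd_ms=0.1, hdd_ms=10.0)
        print(f"hit ratio {h:.2f}: {lat:.3f} ms ({0.1 / lat:.0%} of raw SSD)")
```

Under these assumptions the bcache overhead only caps the best case; in practice the miss term dominates, so even a high hit ratio stays well short of the raw SSD.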


> I think with btrfs it will be even worse and that it is a fundamental
> problem: caching is complex and the cache cannot know how the data on the
> fs is used.
Actually, relative to the baseline of not using it, the improvement from
using bcache is slightly larger with BTRFS than with ext4.  BTRFS does a
lot more with the disk, so you have a lot more time spent accessing the
disk, and thus more time that can be reduced by improving disk
performance.  While the CoW nature of BTRFS does somewhat mitigate the
performance improvement from using bcache, it does not completely negate
it.
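The reasoning here is essentially Amdahl's law: if a fraction f of runtime is spent waiting on the disk and caching makes that part s times faster, the overall speedup is 1 / ((1 - f) + f / s), so a filesystem that spends more of its time on the disk gains more from the same cache. The fractions below are hypothetical, not measurements of ext4 or BTRFS.

```python
def amdahl_speedup(io_fraction: float, io_speedup: float) -> float:
    """Overall speedup when only the disk-bound fraction of runtime gets faster."""
    if not 0.0 <= io_fraction <= 1.0:
        raise ValueError("io_fraction must be in [0, 1]")
    if io_speedup <= 0:
        raise ValueError("io_speedup must be positive")
    return 1.0 / ((1.0 - io_fraction) + io_fraction / io_speedup)


if __name__ == "__main__":
    # Hypothetical: the cache makes disk time 5x shorter; filesystem "B"
    # spends more of its runtime waiting on the disk than filesystem "A".
    for name, f in (("A", 0.6), ("B", 0.7)):
        print(f"{name}: {amdahl_speedup(f, 5.0):.2f}x overall")
```

The same model also shows why CoW mitigates rather than negates the gain: anything that lowers the hit rate lowers the effective s, but f stays large.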


> I think the original idea of hot data tracking has a much better chance
> to significantly improve performance. This is of course because the SSDs
> and HDDs would then be equal citizens, and btrfs itself would get to
> decide on which drive the data is best stored.
First, the user needs to decide, not BTRFS (at least, by default, BTRFS 
should not be involved in the decision).  Second, tiered storage (that's 
what that's properly called) is mostly orthogonal to caching (though 
bcache and dm-cache behave like tiered storage once the cache is warmed).
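A minimal sketch of the distinction, assuming per-extent access counts are already known (the extent names and both helper functions are hypothetical): tiering places each extent on exactly one device, while a cache keeps every extent on the HDD and adds SSD copies of the hottest ones. Once a cache is warm, the set of extents on the SSD is the same in both, which is why they behave alike; the difference is that with a cache the HDD still holds every extent (ignoring dirty write-back data).

```python
from __future__ import annotations

from collections import Counter


def tier_placement(access_counts: Counter, fast_slots: int) -> dict[str, str]:
    """Tiering: every extent lives on exactly one device; the hottest go to the SSD."""
    hottest = {ext for ext, _ in access_counts.most_common(fast_slots)}
    return {ext: ("ssd" if ext in hottest else "hdd") for ext in access_counts}


def cache_placement(access_counts: Counter, fast_slots: int) -> dict[str, set[str]]:
    """Caching: every extent stays on the HDD; the hottest also get an SSD copy."""
    hottest = {ext for ext, _ in access_counts.most_common(fast_slots)}
    return {ext: ({"hdd", "ssd"} if ext in hottest else {"hdd"}) for ext in access_counts}


if __name__ == "__main__":
    counts = Counter({"db.sqlite": 120, "dpkg-status": 40, "movie.mkv": 2})
    print(tier_placement(counts, fast_slots=2))
    print(cache_placement(counts, fast_slots=2))
```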


> With this implemented right, it would also finally silence the
> never-ending discussion of why not btrfs and why zfs, ext, xfs etc.,
> which would be a plus in its own right.
Even with this, there would still be plenty of reasons to pick one of
those filesystems over BTRFS.  There would however be one more reason to
pick BTRFS over ext or XFS (but not necessarily over ZFS, which already
has caching built in).




Re: Give up on bcache?

2017-09-26 Thread Kai Krakow
On Tue, 26 Sep 2017 23:33:19 +0500,
Roman Mamedov wrote:

> On Tue, 26 Sep 2017 16:50:00 + (UTC)
> Ferry Toth wrote:
> 
> > https://www.phoronix.com/scan.php?page=article=linux414-bcache-raid=2
> > 
> > I think it might be an idle hope to think bcache can be used as an SSD
> > cache for btrfs to significantly improve performance.
> 
> My personal real-world experience shows that SSD caching -- with
> lvmcache -- does indeed significantly improve performance of a large
> Btrfs filesystem with slowish base storage.
> 
> And that article, sadly, only demonstrates once again the general
> mediocre quality of Phoronix content: it is an astonishing oversight
> not to check out lvmcache in the same setup, to at least try to draw
> some useful conclusion: is it bcache that is strangely deficient, or
> does SSD caching as a general concept not work well on the hardware
> used?

Bcache is actually not meant to increase benchmark performance except
in very few corner cases. It is designed to improve interactivity and
perceived performance by reducing head movements. The bcache homepage
actually has tips on how to benchmark bcache correctly, including a
warm-up phase and turning on sequential caching. Phoronix doesn't do
that; they test default settings, which is IMHO a good thing, but you
should know the consequences and research how to turn the knobs.

Depending on the caching mode and cache size, the SQLite test may not
show real-world numbers. Also, you should tune some btrfs options to
work correctly with bcache, e.g. force it to mount with "nossd", because
it detects the bcache device as an SSD, which is wrong for some
workloads (I think especially desktop workloads and most server
workloads).

Also, you may want to tune udev to correct some attributes so other
applications can do their detection and behavior correctly, too:

$ cat /etc/udev/rules.d/00-ssd-scheduler.rules
ACTION=="add|change", KERNEL=="bcache*", ATTR{queue/rotational}="1"
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/iosched/slice_idle}="0"
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="kyber"
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="bfq"

Take note: on a non-mq system you may want to use noop/deadline/cfq
instead of kyber/bfq.
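To spell out what these rules do, here is a hypothetical Python function that mirrors their matching logic for one device name. It models only the KERNEL glob (via fnmatch, since udev uses shell-style patterns) and the queue/rotational match, and returns the sysfs attributes the rules would write; it is not how udev itself evaluates rules, and it ignores ACTION.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def udev_settings(kernel: str, rotational: int) -> dict[str, str]:
    """Sysfs attributes the rules above would write for one block device."""
    settings: dict[str, str] = {}
    if fnmatchcase(kernel, "bcache*"):
        # Assignment, not a match: mark the bcache device as rotational so
        # btrfs and other tools stop treating it as an SSD.
        settings["queue/rotational"] = "1"
    elif fnmatchcase(kernel, "sd[a-z]"):
        if rotational == 0:  # the kernel reports this disk as non-rotational (SSD)
            settings["queue/iosched/slice_idle"] = "0"
            settings["queue/scheduler"] = "kyber"
        else:  # spinning disk
            settings["queue/scheduler"] = "bfq"
    # Anything else (partitions such as sda1, NVMe devices, ...) is untouched.
    return settings
```

Note that `sd[a-z]` matches whole disks only, so partitions and NVMe devices keep their defaults.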


I've been running bcache for over two years now, and the performance
improvement is very, very high: boot times went down to 30-40s from
3+ minutes previously, apps start faster (almost instantly, like on an
SSD), there is less noise from fewer head movements, etc. Also, it is
easy to set up (no split metadata/data cache, and you can attach more
than one device to a single cache), and it is rock-solid even when the
system crashes.

Bcache learns by using LRU for caching: what you don't need will be
pushed out of the cache over time, and what you use stays. This is
actually a lot like "hot data caching". Given a big enough cache,
everything you need daily would stay in cache, easily achieving hit
ratios around 90%. Since sequential access is bypassed, you don't have
to worry about large copy operations flushing the cache.
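The behaviour described above (LRU replacement plus bypassing sequential I/O) can be sketched as a toy block cache. This is not bcache's implementation: capacity is counted in blocks, and "sequential" is assumed to mean a run of consecutive block numbers longer than a hypothetical `sequential_cutoff`, whereas the real bcache setting is a size in bytes.

```python
from collections import OrderedDict
from typing import Optional


class LruCache:
    """Toy read cache: LRU replacement plus a bypass for long sequential runs."""

    def __init__(self, capacity: int, sequential_cutoff: int) -> None:
        self.capacity = capacity
        self.sequential_cutoff = sequential_cutoff
        self.blocks: "OrderedDict[int, None]" = OrderedDict()  # oldest first
        self.last_block: Optional[int] = None
        self.run_length = 0
        self.hits = self.misses = self.bypassed = 0

    def read(self, block: int) -> None:
        # Track how long the current run of consecutive block numbers is.
        if self.last_block is not None and block == self.last_block + 1:
            self.run_length += 1
        else:
            self.run_length = 1
        self.last_block = block

        if self.run_length > self.sequential_cutoff:
            self.bypassed += 1  # long sequential I/O goes straight to the HDD
            return
        if block in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block)  # mark as most recently used
            return
        self.misses += 1
        self.blocks[block] = None
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block


if __name__ == "__main__":
    cache = LruCache(capacity=3, sequential_cutoff=4)
    for b in [7, 42, 7, 7, *range(100, 120), 42]:  # hot blocks plus one big copy
        cache.read(b)
    print(cache.hits, cache.misses, cache.bypassed, list(cache.blocks))
```

The big copy in the usage example only pushes its first few blocks into the cache; the rest is bypassed, so the hot blocks are not flushed wholesale.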

My system uses a 512G SSD with 400G dedicated to bcache, attached to 3x
1TB HDDs in a btrfs filesystem with raid0 data and raid1 metadata,
filled with 2TB of net data, with daily backups using borgbackup. Bcache
runs in writeback mode; the backup takes around 15 minutes each night to
dig through all the data and stores it to an internal intermediate
backup, also on bcache (XFS, write-around mode). Not yet implemented:
this intermediate backup will later be mirrored to an external,
off-site location.

The rest of the SSD holds the EFI system partition (ESP), some swap
space, and an over-provisioned area to keep bcache performance high.

$ uptime && bcache-status
 21:28:44 up 3 days, 20:38,  3 users,  load average: 1,18, 1,44, 2,14
--- bcache ---
UUID                aacfbcd9-dae5-4377-92d1-6808831a4885
Block Size          4.00 KiB
Bucket Size         512.00 KiB
Congested?          False
Read Congestion     2.0ms
Write Congestion    20.0ms
Total Cache Size    400 GiB
Total Cache Used    400 GiB (100%)
Total Cache Unused  0 B (0%)
Evictable Cache     396 GiB (99%)
Replacement Policy  [lru] fifo random
Cache Mode          (Various)
Total Hits          2364518 (89%)
Total Misses        290764
Total Bypass Hits   4284468 (100%)
Total Bypass Misses 0
Total Bypassed      215 GiB
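The Total Hits percentage in this output is consistent with hits / (hits + misses); recomputing it from the counters above is a quick sanity check (`hit_ratio` is a hypothetical helper, not part of bcache-status).

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Fraction of cache lookups served from the SSD (0.0 if there were none)."""
    total = hits + misses
    return hits / total if total else 0.0


# Counters copied from the bcache-status output above.
print(f"cache hit ratio:  {hit_ratio(2364518, 290764):.1%}")
print(f"bypass hit ratio: {hit_ratio(4284468, 0):.1%}")
```

The bypass line's 100% follows from the same formula applied to the bypass counters, since there were no bypass misses.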


The bucket size and block size were chosen to best fit the Samsung TLC
arrangement. But this is pure theory; I never benchmarked the benefits.
I just feel more comfortable that way. ;-)
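For scale, this is how the chosen sizes divide up the 400 GiB cache. The usual rationale for matching bcache's bucket size to the flash erase-block size is that buckets are the unit bcache allocates and reclaims; treat the SSD-geometry side of that as an assumption, since the erase-block size of this particular drive is not stated.

```python
KIB = 1024
GIB = 1024 ** 3

cache_size = 400 * GIB   # "Total Cache Size" from bcache-status
bucket_size = 512 * KIB  # "Bucket Size"
block_size = 4 * KIB     # "Block Size"

buckets = cache_size // bucket_size            # number of buckets in the cache
blocks_per_bucket = bucket_size // block_size  # 4 KiB blocks per bucket
print(f"{buckets} buckets of {blocks_per_bucket} blocks each")
```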


One should also keep in mind that, because of the way btrfs works, it
cannot make optimal use of bcache: CoW will obviously invalidate data in
bcache, but bcache has no knowledge of this. Of course, such 
Re: Give up on bcache?

2017-09-26 Thread Roman Mamedov
On Tue, 26 Sep 2017 16:50:00 + (UTC)
Ferry Toth wrote:

> https://www.phoronix.com/scan.php?page=article=linux414-bcache-raid=2
> 
> I think it might be an idle hope to think bcache can be used as an SSD
> cache for btrfs to significantly improve performance.

My personal real-world experience shows that SSD caching -- with lvmcache --
does indeed significantly improve performance of a large Btrfs filesystem with
slowish base storage.

And that article, sadly, only demonstrates once again the general mediocre
quality of Phoronix content: it is an astonishing oversight not to check out
lvmcache in the same setup, to at least try to draw some useful conclusion:
is it bcache that is strangely deficient, or does SSD caching as a general
concept not work well on the hardware used?

-- 
With respect,
Roman


Give up on bcache?

2017-09-26 Thread Ferry Toth
Looking at the Phoronix benchmark here:

https://www.phoronix.com/scan.php?page=article=linux414-bcache-raid=2

I think it might be an idle hope to think bcache can be used as an SSD
cache for btrfs to significantly improve performance. True, the
benchmark is using ext.

But the most important one (where btrfs always turns out a little slow)
would be the SQLite test. And with ext at least, performance _degrades_
except in writeback mode, and even there it is nowhere near what the
SSD is capable of.

I think with btrfs it will be even worse and that it is a fundamental
problem: caching is complex and the cache cannot know how the data on the
fs is used.

I think the original idea of hot data tracking has a much better chance
to significantly improve performance. This is of course because the SSDs
and HDDs would then be equal citizens, and btrfs itself would get to
decide on which drive the data is best stored.

With this implemented right, it would also finally silence the
never-ending discussion of why not btrfs and why zfs, ext, xfs etc.,
which would be a plus in its own right.
