Re: Why does btrfs benchmark so badly in this case?

2013-08-12 Thread Josef Bacik
On Fri, Aug 09, 2013 at 11:35:33PM +0200, Kai Krakow wrote:
 Josef Bacik jba...@fusionio.com wrote:
 
  So I guess the reason that ZFS does well with that workload is that
  ZFS is using smaller blocks, maybe just 512B ?
  
  Yeah I'm not sure what ZFS does, but if you are writing over a block and
  the size/offset isn't aligned then you'd see similar issues with ZFS since
  it would have to read+modify+write.  It is likely that ZFS is just using a
  smaller blocksize.
 
 From what I remember, ZFS uses dynamic block sizes. However, block size can 
 be forced and thus tuned for workloads that require it:
 
 http://www.joyent.com/blog/bruning-questions-zfs-record-size
 
 Maybe that's the reason...
 
 It would be interesting to see how the benchmarks performed with forced 
 block size.
 

When I set bs=4k in the fio job to force it to use 4k blocksizes, we performed
the same as ext4 and xfs.  Thanks,

Josef
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Why does btrfs benchmark so badly in this case?

2013-08-08 Thread John Williams
Phoronix periodically runs benchmarks on filesystems, and one thing I
have noticed is that btrfs always does terribly on their fio Intel
IOMeter fileserver access pattern benchmark:

http://www.phoronix.com/scan.php?page=article&item=linux_310_10fs&num=2

Here, btrfs is more than 6 times slower than ext4, and about 3 times
slower than XFS.

Lest we attribute it to an unavoidable downside of COW filesystems and
move on...no, we cannot do that, because ZFS does well here -- btrfs
is about 6 times slower than ZFS!

Note that btrfs does quite well in the other Phoronix benchmarks. It
is just the fio fileserver benchmark that btrfs has problems with.

What is going on here? Why is btrfs doing so poorly?


Re: Why does btrfs benchmark so badly in this case?

2013-08-08 Thread Josef Bacik
On Thu, Aug 08, 2013 at 09:13:04AM -0700, John Williams wrote:
 What is going on here? Why is btrfs doing so poorly?

Excellent question, I'll get back to you on that.  Thanks,

Josef


Re: Why does btrfs benchmark so badly in this case?

2013-08-08 Thread Clemens Eisserer
 What is going on here? Why is btrfs doing so poorly?

Funny thing, I was thinking exactly the same when reading the article ;)

Regards


Re: Why does btrfs benchmark so badly in this case?

2013-08-08 Thread Josef Bacik
On Thu, Aug 08, 2013 at 09:13:04AM -0700, John Williams wrote:
 What is going on here? Why is btrfs doing so poorly?

So the reason this workload sucks for btrfs is that we fall back to buffered
IO, because fio does not do block-size-aligned writes for this workload.  If you
add

ba=4k

to the iometer fio file then we go the same speed as xfs and ext4.  There's not a
whole lot we can do about this, since unaligned writes mean we have to read in
pages to cow the block properly, which is why we fall back to buffered.  Once we
do that we end up with a lot of page locking stuff that gets in the way and makes
us twice as slow.  Thanks,

Josef
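The alignment rule Josef describes comes down to simple arithmetic. The sketch below is a minimal user-space model, not btrfs code: the names `plan_write` and `struct rmw_plan` are hypothetical. It only answers which filesystem blocks a write touches and whether its first or last block is only partially covered, which is the case where the old block contents would have to be read in before a COW copy of the block can be written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of what a single write needs (not a btrfs type). */
struct rmw_plan {
    uint64_t first_blk; /* index of the first fs block the write touches */
    uint64_t last_blk;  /* index of the last fs block the write touches */
    bool partial_head;  /* write starts in the middle of first_blk */
    bool partial_tail;  /* write ends in the middle of last_blk */
    bool aligned;       /* no partially covered blocks: no read-in needed */
};

/* Classify a write of `len` bytes at byte offset `off` for block size `blksz`. */
static struct rmw_plan plan_write(uint64_t off, uint64_t len, uint64_t blksz)
{
    assert(len > 0 && blksz > 0);
    uint64_t end = off + len; /* exclusive end offset */
    struct rmw_plan p;
    p.first_blk = off / blksz;
    p.last_blk = (end - 1) / blksz;
    p.partial_head = (off % blksz) != 0;
    p.partial_tail = (end % blksz) != 0;
    p.aligned = !p.partial_head && !p.partial_tail;
    return p;
}
```

For example, with 4k blocks a 4096-byte write at offset 1000 spans blocks 0 and 1 and covers both only partially, while the same write at offset 8192 lands exactly on block 2 and needs no read-in.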


Re: Why does btrfs benchmark so badly in this case?

2013-08-08 Thread John Williams
On Thu, Aug 8, 2013 at 12:40 PM, Josef Bacik jba...@fusionio.com wrote:
 On Thu, Aug 08, 2013 at 09:13:04AM -0700, John Williams wrote:
 Phoronix periodically runs benchmarks on filesystems, and one thing I
 have noticed is that btrfs always does terribly on their fio Intel
 IOMeter fileserver access pattern benchmark:

 http://www.phoronix.com/scan.php?page=article&item=linux_310_10fs&num=2

 So the reason this workload sucks for btrfs is that we fall back to buffered
 IO, because fio does not do block-size-aligned writes for this workload.  If you
 add

 ba=4k

 to the iometer fio file then we go the same speed as xfs and ext4.  There's not a
 whole lot we can do about this, since unaligned writes mean we have to read in
 pages to cow the block properly, which is why we fall back to buffered.  Once we
 do that we end up with a lot of page locking stuff that gets in the way and makes
 us twice as slow.  Thanks,

Thanks for looking into it.

So I guess the reason that ZFS does well with that workload is that
ZFS is using smaller blocks, maybe just 512B ?

I wonder how common these types of non-4K aligned workloads are.
Apparently, people with such workloads should avoid btrfs, but maybe
these types of workloads are very rare?


Re: Why does btrfs benchmark so badly in this case?

2013-08-08 Thread Josef Bacik
On Thu, Aug 08, 2013 at 01:23:22PM -0700, John Williams wrote:
 On Thu, Aug 8, 2013 at 12:40 PM, Josef Bacik jba...@fusionio.com wrote:
  On Thu, Aug 08, 2013 at 09:13:04AM -0700, John Williams wrote:
  Phoronix periodically runs benchmarks on filesystems, and one thing I
  have noticed is that btrfs always does terribly on their fio Intel
  IOMeter fileserver access pattern benchmark:
 
  http://www.phoronix.com/scan.php?page=article&item=linux_310_10fs&num=2
 
  So the reason this workload sucks for btrfs is that we fall back to buffered
  IO, because fio does not do block-size-aligned writes for this workload.  If
  you add
 
  ba=4k
 
  to the iometer fio file then we go the same speed as xfs and ext4.  There's
  not a whole lot we can do about this, since unaligned writes mean we have to
  read in pages to cow the block properly, which is why we fall back to
  buffered.  Once we do that we end up with a lot of page locking stuff that
  gets in the way and makes us twice as slow.  Thanks,
 
 Thanks for looking into it.
 
 So I guess the reason that ZFS does well with that workload is that
 ZFS is using smaller blocks, maybe just 512B ?
 

Yeah I'm not sure what ZFS does, but if you are writing over a block and the
size/offset isn't aligned then you'd see similar issues with ZFS since it would
have to read+modify+write.  It is likely that ZFS is just using a smaller
blocksize.
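To put a number on why a smaller block size would help, the sketch below counts how many bytes must be read in before an unaligned write can be merged into whole blocks, for a given block size. It is a simplified model that assumes only partially covered blocks are read; it says nothing about how ZFS actually chooses or writes its record sizes, and `rmw_read_bytes` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes that must be read in so a write of `len` bytes at `off` can be
 * merged into whole blocks of size `blksz` (simplified model). */
static uint64_t rmw_read_bytes(uint64_t off, uint64_t len, uint64_t blksz)
{
    assert(len > 0 && blksz > 0);
    uint64_t end = off + len;          /* exclusive end offset */
    uint64_t first = off / blksz;      /* first block touched */
    uint64_t last = (end - 1) / blksz; /* last block touched */
    int head = (off % blksz) != 0;     /* first block partially covered */
    int tail = (end % blksz) != 0;     /* last block partially covered */

    if (first == last)                 /* one block: read it at most once */
        return (head || tail) ? blksz : 0;
    return (uint64_t)(head + tail) * blksz;
}
```

With 512-byte blocks, a 512-byte write at offset 512 needs no read-in at all, whereas with 4096-byte blocks the same write forces a 4096-byte read. A 1000-byte write at offset 100 costs 1024 bytes of read-in with 512-byte blocks and 4096 bytes with 4096-byte blocks.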

 I wonder how common these types of non-4K aligned workloads are.
 Apparently, people with such workloads should avoid btrfs, but maybe
 these types of workloads are very rare?

So most people who use AIO/O_DIRECT have really specific setups which generally
can adjust how they align stuff (for databases this would be the db page, and
those are usually large, like 16k-32k), or virtual images, which will hopefully
be doing block-aligned IOs, but this depends on the host OS.  Like I said, there
isn't a whole lot we can do about this: you can use NOCOW if you want to get
around it without changing your application, or you can change the app to do
blocksize-aligned IO.  Thanks,

Josef
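The NOCOW workaround is normally applied with `chattr +C`. From C, a minimal sketch uses the generic FS_IOC_GETFLAGS/FS_IOC_SETFLAGS ioctls with FS_NOCOW_FL from <linux/fs.h>. Assumptions: on btrfs the flag should be set on a new or empty file (or on a directory, so new files inherit it) before any data is written, and other filesystems may reject it with an error; `set_nocow`, `has_nocow` and `create_nocow_file` are hypothetical helper names.

```c
#include <fcntl.h>
#include <linux/fs.h>  /* FS_IOC_GETFLAGS, FS_IOC_SETFLAGS, FS_NOCOW_FL */
#include <sys/ioctl.h>
#include <unistd.h>    /* close() for callers */

/* Set the No_COW attribute on an open file, like `chattr +C`.
 * Returns 0 on success, -1 with errno set on failure. */
static int set_nocow(int fd)
{
    int flags = 0; /* these ioctls take an int * in practice, as chattr does */
    if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0)
        return -1;
    if (flags & FS_NOCOW_FL)
        return 0; /* already set */
    flags |= FS_NOCOW_FL;
    return ioctl(fd, FS_IOC_SETFLAGS, &flags);
}

/* Returns 1 if No_COW is set, 0 if not, -1 on error. */
static int has_nocow(int fd)
{
    int flags = 0;
    if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0)
        return -1;
    return (flags & FS_NOCOW_FL) ? 1 : 0;
}

/* Create (or truncate to) an empty file and try to mark it No_COW before
 * any data is written. Returns the fd even if the flag could not be set
 * (*nocow_ok reports that), or -1 if the file could not be opened. */
static int create_nocow_file(const char *path, int *nocow_ok)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    *nocow_ok = (set_nocow(fd) == 0);
    return fd;
}
```

Setting the flag on the file before the first write matters here, since the file must still be empty for btrfs to treat it reliably as NOCOW.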


Re: Why does btrfs benchmark so badly in this case?

2013-08-08 Thread Chris Murphy

On Aug 8, 2013, at 2:23 PM, John Williams jwilliams4...@gmail.com wrote:
 
 So I guess the reason that ZFS does well with that workload is that
 ZFS is using smaller blocks, maybe just 512B ?

Likely. It uses a variable block size.


 I wonder how common these types of non-4K aligned workloads are.
 Apparently, people with such workloads should avoid btrfs, but maybe
 these types of workloads are very rare?

I can't directly answer the question, but all of the typical file systems on OS
X, Linux, and Windows have defaulted to 4KB block sizes for many years now, baked
in at creation time. On OS X, the block size varies automatically with respect to
volume size at fs creation time (it goes to 8KB block sizes above 2TB, and
scales up to 1MB block sizes), but it is never less than 4KB unless the volume is
manually created that way. So I'd think such workloads are rare.

I also don't know if any commonly used fs has an optimization whereby just the
modified sector(s) are overwritten, rather than all the sectors making up the
file system block being rewritten.

Chris Murphy


Re: Why does btrfs benchmark so badly in this case?

2013-08-08 Thread Zach Brown
 I also don't know if any commonly used fs has an optimization whereby
 just the modified sector(s) are overwritten, rather than all the sectors
 making up the file system block being rewritten.

Most of them do.  The generic direct io path allows sector-sized dio.
The very first bit of do_blockdev_direct_IO() tests first for file
system block size alignment, then for block device sector size alignment.
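The two-step check described above can be modelled in user space as follows. This is a simplified sketch, not the kernel source: it assumes a request is rejected only when the offset, length or buffer address is misaligned even for the device's logical sector size, and the function name and parameters are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Simplified model of the alignment checks (not the kernel code).
 * Returns the block-size shift the direct I/O can be done at, or -EINVAL
 * if offset, length or buffer address is not even sector aligned. */
static int dio_alignment_bits(uint64_t offset, uint64_t len, uintptr_t buf,
                              unsigned fs_blkbits, unsigned sector_bits)
{
    assert(sector_bits <= fs_blkbits && fs_blkbits < 64);
    uint64_t all = offset | len | (uint64_t)buf; /* any low bit set = misaligned */

    if ((all & ((UINT64_C(1) << fs_blkbits) - 1)) == 0)
        return (int)fs_blkbits;  /* 1st check: file system block aligned */
    if ((all & ((UINT64_C(1) << sector_bits) - 1)) == 0)
        return (int)sector_bits; /* 2nd check: device sector aligned, allowed */
    return -EINVAL;              /* misaligned even for the sector size */
}
```

With 4096-byte file system blocks (shift 12) and 512-byte sectors (shift 9), a 4096-byte write at offset 0 passes the first check, a 512-byte write at offset 0 or 512 passes the second, and a write at offset 100 is rejected.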

You can see this easily with dd conv=notrunc oflag=direct and blktrace.

# blockdev --getss /dev/sda
512
# blockdev --getbsz /dev/sda
4096

# blktrace -d /dev/sda -a issue -o - | blkparse -i - 

$ dd if=/dev/zero of=file bs=4096 count=1 oflag=direct conv=notrunc
  8,03   1435.957320002 17941  D  WS 137297704 + 8 [dd]

$ dd if=/dev/zero of=file bs=512 count=1 oflag=direct conv=notrunc
  8,01431.405641362 17940  D  WS 137297704 + 1 [dd]
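The same sector-sized direct write can be issued from a program. A minimal sketch of what `dd if=/dev/zero of=file bs=512 count=1 oflag=direct conv=notrunc` asks for, assuming Linux and glibc: open without O_TRUNC (conv=notrunc), with O_DIRECT (oflag=direct), and write one zeroed 512-byte buffer at offset 0. The 4096-byte buffer alignment is a conservative assumption; whether a 512-byte O_DIRECT write is accepted depends on the filesystem and the device's logical sector size, and some filesystems reject O_DIRECT outright. `direct_zero_write` is a hypothetical name.

```c
#define _GNU_SOURCE /* for O_DIRECT */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write `len` zero bytes at offset 0 of `path` using O_DIRECT, creating the
 * file if needed but never truncating it. Returns bytes written or -1. */
static ssize_t direct_zero_write(const char *path, size_t len)
{
    const size_t align = 4096; /* conservative buffer alignment (assumption) */
    void *buf = NULL;
    assert(len > 0);
    int rc = posix_memalign(&buf, align, len);
    if (rc != 0) {
        errno = rc;
        return -1;
    }
    memset(buf, 0, len);

    int fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0644); /* no O_TRUNC */
    if (fd < 0) {
        int open_err = errno;
        free(buf);
        errno = open_err;
        return -1;
    }
    ssize_t n = pwrite(fd, buf, len, 0);
    int write_err = errno;
    close(fd);
    free(buf);
    errno = write_err;
    return n;
}
```

Run under the blktrace command above on a filesystem that takes the sector-aligned direct path, this should show a single one-sector ("+ 1") write like the second dd.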

- z