[one-users] File system performance testing suite tailored to OpenNebula

2013-09-11 Thread Gerry O'Brien

Hi,

Are there any recommendations for a file system performance testing 
suite tailored to typical OpenNebula workloads? I would like to compare 
the performance of ZFS vs. ext4. One of the reasons for considering ZFS 
is that it allows replication to a remote site using snapshot streaming. 
Normal nightly backups, using something like rsync, are not suitable for 
virtual machine images, where a single block change means the whole image 
has to be copied. The amount of change is too great.
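
A minimal sketch of that kind of nightly snapshot streaming, driven from Python; 
the dataset, snapshot names and remote host below are placeholders, not anything 
from this thread:

    #!/usr/bin/env python
    # Rough sketch of incremental ZFS replication via snapshot streaming.
    # Dataset, snapshot names and the remote host are assumptions.
    import subprocess

    DATASET = "tank/one/datastores"   # hypothetical local dataset
    REMOTE  = "backup.example.org"    # hypothetical remote ZFS host

    def replicate(prev_snap, new_snap):
        # Take tonight's snapshot.
        subprocess.check_call(["zfs", "snapshot", "%s@%s" % (DATASET, new_snap)])
        # Stream only the blocks changed since the previous snapshot; unlike
        # rsync, an image with a single changed block contributes almost
        # nothing to the transfer.
        send = subprocess.Popen(
            ["zfs", "send", "-i", "%s@%s" % (DATASET, prev_snap),
             "%s@%s" % (DATASET, new_snap)],
            stdout=subprocess.PIPE)
        subprocess.check_call(["ssh", REMOTE, "zfs", "receive", "-F", DATASET],
                              stdin=send.stdout)
        send.stdout.close()
        send.wait()

    replicate("nightly-20130910", "nightly-20130911")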


On a related issue, does it make sense to have datastores 0 and 1 
in a single file system, so that the instantiation of non-persistent 
images does not require a copy from one file system to another? I have 
in mind the case where the original image is a qcow2 image.
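
For the qcow2 case, a thin copy-on-write overlay is one way a same-filesystem 
instantiation can avoid a bulk copy; a sketch only, with assumed OpenNebula 
datastore paths:

    # Sketch only; the image and VM disk paths are assumptions.
    import subprocess

    base  = "/var/lib/one/datastores/1/0123456789abcdef"  # source image (datastore 1)
    clone = "/var/lib/one/datastores/0/42/disk.0"          # VM disk (datastore 0)

    # Create a qcow2 overlay that references the original as a backing file:
    # only blocks the VM writes end up in the new file, so no bulk data is
    # copied between datastores. (Recent qemu-img versions also want -F to
    # name the backing format explicitly.)
    subprocess.check_call(["qemu-img", "create", "-f", "qcow2",
                           "-b", base, clone])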


Regards,
Gerry

--
Gerry O'Brien

Systems Manager
School of Computer Science and Statistics
Trinity College Dublin
Dublin 2
IRELAND

00 353 1 896 1341

___
Users mailing list
Users@lists.opennebula.org
http://lists.opennebula.org/listinfo.cgi/users-opennebula.org


Re: [one-users] File system performance testing suite tailored to OpenNebula

2013-09-11 Thread Carlo Daffara
It's difficult to provide an indication of what a typical workload may be, as 
it depends greatly on the
I/O properties of the VMs that run inside (we found the internal load of 
OpenNebula itself to be basically negligible).
For example, if you have lots of sequential I/O heavy VMs you may get benefits 
from one kind, while transactional and random I/O VMs may be more suitably 
served by other file systems.
We tend to use fio for benchmarks (http://freecode.com/projects/fio), which is 
included in most Linux distributions; it provides flexible selection of 
read-vs-write patterns, can select different probability distributions and 
includes a few common presets (file server, mail server, etc.).
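
For instance, a couple of fio runs covering the two extremes (streaming sequential 
reads vs. small random writes) might be scripted as below; the target file and job 
parameters are only illustrative:

    import json, subprocess

    def run_fio(name, rw, bs):
        # Requires a fio recent enough to support JSON output.
        out = subprocess.check_output([
            "fio", "--name=%s" % name,
            "--filename=/var/lib/one/datastores/fio.test",  # assumed test file
            "--size=4G", "--runtime=60", "--time_based",
            "--direct=1", "--ioengine=libaio", "--iodepth=16",
            "--rw=%s" % rw, "--bs=%s" % bs,
            "--output-format=json"])
        return json.loads(out)["jobs"][0]

    for name, rw, bs in [("seq-read", "read", "1M"),
                         ("rand-write", "randwrite", "4k")]:
        job = run_fio(name, rw, bs)
        print(name, "read IOPS:", job["read"]["iops"],
              "write IOPS:", job["write"]["iops"])
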
Selecting the underlying file system for the store is thus extremely dependent on 
application, features and load. For example, in some configurations we use BTRFS 
with compression (slow rotational devices, especially when there are several of 
them in parallel), in others ext4 (a good, all-around balanced choice) and in 
others XFS. XFS, for example, supports filesystem replication in a way similar to 
that of ZFS (not as sophisticated, though) and gives excellent performance for 
multiple parallel I/O operations.
ZFS in our tests tends to be extremely slow outside of a few sweet spots, a 
fact confirmed by external benchmarks like this one:
http://www.phoronix.com/scan.php?page=article&item=zfs_linux_062&num=3 We have tried 
it (and continue to do so, both the FUSE and the native kernel version), but 
for the moment the performance hit is excessive despite the nice feature set. 
BTRFS continues to improve nicely, and a set of patches to implement 
ZFS-like send/receive is here: 
https://btrfs.wiki.kernel.org/index.php/Design_notes_on_Send/Receive but it is 
still marked as experimental.

I personally *love* ZFS, and the feature set is unparalleled. Unfortunately, 
the poor license choice means that it never got the kind of hammering and 
tuning that other Linux kernel filesystems get.
regards,
carlo daffara
cloudweavers



Re: [one-users] File system performance testing suite tailored to OpenNebula

2013-09-11 Thread Gerry O'Brien

Hi Carlo,

  Thanks for the reply. I should really look at XFS for the replication 
and performance.


  Do you have any thoughts on my second question about qcow2 copies 
from /datastores/1 to /datastores/0 in a single filesystem?


Regards,
  Gerry




Re: [one-users] File system performance testing suite tailored to OpenNebula

2013-09-11 Thread João Pagaime

Hello all,

The topic is very interesting.

I wonder if anyone could answer this:

What is the penalty of using a file system on top of a file system? That 
is what happens when the VM disk is a regular file on the hypervisor's 
filesystem: the VM has its own file system, and the hypervisor maps that 
VM disk onto a regular file on another filesystem (the hypervisor's). 
Hence the file-system-on-top-of-a-file-system issue.


Putting the question the other way around: what is the benefit of using a 
raw disk device (local disk, LVM, iSCSI, ...) as an OpenNebula datastore?


I haven't tested this, but I feel the benefit should be substantial.

Anyway, simple bonnie++ tests within a VM show heavy penalties when comparing 
the same test running in the VM and outside (directly on the hypervisor). That 
is of course not an OpenNebula-related performance issue, but a more 
general technology challenge.
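
A rough way to quantify that penalty, assuming a guest reachable over ssh and fio 
installed in both places (fio is used here instead of bonnie++ only because its 
JSON output is easy to compare):

    import json, subprocess

    JOB = ["fio", "--name=randwrite", "--rw=randwrite", "--bs=4k",
           "--size=1G", "--direct=1", "--ioengine=libaio", "--iodepth=16",
           "--filename=fio.test", "--output-format=json"]

    def write_iops(cmd):
        return json.loads(subprocess.check_output(cmd))["jobs"][0]["write"]["iops"]

    host  = write_iops(JOB)                             # directly on the hypervisor
    guest = write_iops(["ssh", "root@test-vm"] + JOB)   # same job inside the VM
    print("virtualization penalty: %.0f%%" % (100.0 * (1.0 - guest / host)))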


best regards,
João






Re: [one-users] File system performance testing suite tailored to OpenNebula

2013-09-11 Thread Liu, Guang Jun (Gene)
It's something I am looking for too. I am considering ZFS because large 
image clones happen concurrently.

Regards,
Gene


Re: [one-users] File system performance testing suite tailored to OpenNebula

2013-09-11 Thread Carlo Daffara
Not a simple answer; however, this article by Le and Huang provides quite some 
detail:
https://www.usenix.org/legacy/event/fast12/tech/full_papers/Le.pdf
We ended up using ext4 and XFS mainly, with btrfs for mirrored disks or for 
very slow rotational media.
Raw is good if you are able to map disks directly and you don't change them, 
but our results show that the difference is not that great, while the 
inconvenience is major :-)
When using KVM and virtio, the actual loss in I/O performance is not very high 
for the majority of workloads. Windows is a separate issue: NTFS has very poor 
performance on small blocks for sparse writes, and this tends to increase the 
apparent inefficiency of KVM.
Actually, using the virtio device drivers the penalty is very small for most 
workloads; we tested a Windows 7 machine both native (physical) and 
virtualized using a simple CrystalMark test, and found that with virtio the 
4k random write test is just 15% slower, while the sequential ones are much 
faster virtualized (thanks to the Linux page cache).
For intensive I/O workloads we use a combination of a single SSD plus one or 
more rotational disks, combined using enhanceio.
We observed an eight-fold increase in available IOPS for random writes 
(especially important for database servers, AD machines...) using 
consumer-grade SSDs.
cheers,
Carlo Daffara
cloudweavers


Re: [one-users] File system performance testing suite tailored to OpenNebula

2013-09-11 Thread Carlo Daffara
As for the second part of the question, having a single filesystem helps in 
reducing the copy cost.
We have moved from the underlying FS to a distributed FS that does r/w 
snapshots, and changed the TM scripts to convert
copies into snapshot operations, so we have a bit more flexibility in 
managing the filesystems and stores.
cheers
carlo daffara
cloudweavers
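
As a sketch of the general idea only (not Carlo's actual TM driver): a clone step 
that tries a copy-on-write reflink when source and destination live on the same 
snapshot-capable filesystem, and falls back to a full copy otherwise. Paths and 
the reflink strategy are assumptions:

    import os, subprocess, sys

    def clone(src, dst):
        same_fs = os.stat(src).st_dev == os.stat(os.path.dirname(dst)).st_dev
        if same_fs:
            try:
                # btrfs (and, later, XFS) support reflink copies: nearly
                # instant and space-efficient, since no data is duplicated.
                subprocess.check_call(["cp", "--reflink=always", src, dst])
                return
            except subprocess.CalledProcessError:
                pass
        subprocess.check_call(["cp", src, dst])   # ordinary full copy

    if __name__ == "__main__":
        clone(sys.argv[1], sys.argv[2])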



Re: [one-users] File system performance testing suite tailored to OpenNebula

2013-09-11 Thread Gerry O'Brien

I presume this uses the XFS snapshot facility?



Re: [one-users] File system performance testing suite tailored to OpenNebula

2013-09-11 Thread Carlo Daffara
No (XFS on Linux does not perform snapshots); it uses xfsdump. It allows for 
progressive dumps, with differential backups to a remote XFS server. It uses a 
concept of levels (0 to 9), where 0 is a full backup, and you can perform 
differential backups at different levels. Some pointers are here:
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/xfsbackuprestore.html
cheers
carlo daffara
cloudweavers
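
A sketch of that level scheme, assuming the datastores live on an XFS mount and 
dumps go to a local backup path; the mount point, destination and schedule are 
placeholders (xfsdump must run as root on an XFS filesystem):

    import datetime, subprocess

    MOUNT = "/var/lib/one/datastores"   # assumed XFS-backed datastore mount
    DEST  = "/backup/datastores-%s-l%d.xfsdump"

    def nightly_dump():
        today = datetime.date.today()
        # Level 0 (full) on Sunday, levels 1-6 Monday through Saturday;
        # each level dumps what changed since the last lower-level dump.
        level = 0 if today.weekday() == 6 else today.weekday() + 1
        subprocess.check_call([
            "xfsdump",
            "-l", str(level),                        # dump level 0-9
            "-L", "nightly-%s" % today.isoformat(),  # session label
            "-M", "media0",                          # media label
            "-f", DEST % (today.isoformat(), level),
            MOUNT])

    nightly_dump()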



Re: [one-users] File system performance testing suite tailored to OpenNebula

2013-09-11 Thread João Pagaime

Thanks for pointing out the paper.

I've glanced at it, and it somewhat confirmed my impressions on write 
operations (which are very relevant in transactional environments): the 
penalty on write operations doesn't seem to be negligible.


best regards,
João


Re: [one-users] File system performance testing suite tailored to OpenNebula

2013-09-11 Thread Carlo Daffara
Actually the point is that *it is* possible to get near-native performance 
when appropriate tuning or precautions are taken.
Take as an example the graphs on page 5:
the throughput is *higher* with XFS as the host filesystem than with the raw 
device (BD in the graph) for the filesystem workload, and using XFS it's within 
10% (apart from ext3, which has a higher performance hit); for the database 
workload it's JFS that is on a par or slightly faster.
Another important factor is latency (the latency added by multiple stacked FS), 
and again the graph on page 6 shows that specific combinations of 
guest/host FS have very small added latencies due to filesystem stacking.
It is also clear that the default ext4 used in many guest VMs is absolutely 
sub-optimal for write workloads, where JFS is twice as fast.
Other aspects to consider:
the default I/O scheduler in Linux is *abysmal* for VM workloads; deadline is 
the clear winner, along with noop for SSD disks. Other small touches are 
tuning the default readahead for rotational media (and removing it for SSDs), 
increasing the retention of read-cache pages, and increasing (a little) the 
flush time of the write cache, which even with a 5-second sweep time raises the 
IOPS rate for write workloads by increasing the opportunities for optimizing 
the disk head path, and so on...
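
A sketch of applying that kind of host tuning through sysfs and procfs; the 
device selection and the exact values are assumptions, not recommendations 
from this thread (run as root):

    import glob

    def tune(dev, rotational):
        q = "/sys/block/%s/queue/" % dev
        with open(q + "scheduler", "w") as f:
            # deadline for spinning disks, noop for SSDs
            f.write("deadline" if rotational else "noop")
        with open(q + "read_ahead_kb", "w") as f:
            # generous readahead for rotational media, minimal for SSDs
            f.write("1024" if rotational else "8")

    for path in glob.glob("/sys/block/sd*"):
        dev = path.rsplit("/", 1)[-1]
        with open(path + "/queue/rotational") as f:
            tune(dev, f.read().strip() == "1")

    # Write-cache sweep interval (centiseconds): 500 = the ~5-second sweep
    # mentioned above.
    with open("/proc/sys/vm/dirty_writeback_centisecs", "w") as f:
        f.write("500")
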
So, my point is that it is possible, with relatively small effort, to get 
near-disk performance from KVM with libvirt (the same concept, with different 
details, applies to Xen).
It's a fascinating area of work; we had one of our people spend two weeks 
doing nothing but tests with a Windows VM running a benchmark application 
inside, over a large number of different FS/KVM parameter combinations. We 
found a lot of interesting cases :-)
cheers
carlo daffara
cloudweavers
