[Qemu-devel] Re: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
On 2009/10/21 14:13, MORITA Kazutaka wrote:

Hi everyone,

Sheepdog is a distributed storage system for KVM/QEMU. It provides highly available block level storage volumes to VMs like Amazon EBS. Sheepdog supports advanced volume management features such as snapshot, cloning, and thin provisioning. Sheepdog runs on several tens or hundreds of nodes, and the architecture is fully symmetric; there is no central node such as a meta-data server.

We added some pages to the Sheepdog website:

Design: http://www.osrg.net/sheepdog/design.html
FAQ: http://www.osrg.net/sheepdog/faq.html

The Sheepdog mailing list is also ready to use (thanks to Tomasz):

Subscribe/Unsubscribe/Preferences: http://lists.wpkg.org/mailman/listinfo/sheepdog
Archive: http://lists.wpkg.org/pipermail/sheepdog/

We are always looking for developers and users interested in participating in the Sheepdog project!

Thanks.

MORITA Kazutaka
[Qemu-devel] Re: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
On 2009/10/25 17:51, Dietmar Maurer wrote:
>>> Do you support multiple guests accessing the same image?
>>
>> A VM image can be attached to any VM, but only to one VM at a time;
>> multiple running VMs cannot access the same VM image.
>
> I guess this is a problem when you want to do live migrations?

Yes, because Sheepdog locks a VM image when it is opened. To avoid this problem, locking must be delayed until migration has completed. This is also a TODO item.

-- MORITA Kazutaka
Re: [Qemu-devel] Re: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
Dietmar Maurer wrote:
>> Also, on _loaded_ systems, I noticed creating/removing logical volumes
>> can take really long (several minutes); whereas allocating a file of a
>> given size would take just a fraction of that.
>
> Allocating a file takes much longer, unless you use a 'sparse' file.

If you mean "allocating" like with:

dd if=/dev/zero of=image bs=1G count=50

then of course, that's a lot of IO. As you mentioned, you can create a sparse file (but then you'll end up with a lot of fragmentation). A better way would be to use persistent preallocation (fallocate) instead of "traditional" dd or a sparse file.

-- Tomasz Chmielewski http://wpkg.org
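[Editorial note: a minimal sketch of the persistent preallocation Tomasz suggests, assuming Linux and Python >= 3.3, where os.posix_fallocate wraps the fallocate mechanism. The blocks are reserved without writing zeroes, so it avoids both the IO of dd and the fragmentation of a sparse file. The file name is illustrative.]

```python
import os

def preallocate(path, size_bytes):
    """Reserve size_bytes of disk space for path via fallocate,
    without writing any data (unlike dd if=/dev/zero)."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
    try:
        os.posix_fallocate(fd, 0, size_bytes)
    finally:
        os.close(fd)

# Reserve 64 MB for a chunk file; st_size reflects the full size immediately.
preallocate("image.chunk", 64 * 1024 * 1024)
print(os.stat("image.chunk").st_size)  # 67108864
```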
[Qemu-devel] RE: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
> >> Do you support multiple guests accessing the same image? > > > > A VM image can be attached to any VMs but one VM at a time; multiple > > running VMs cannot access to the same VM image. I guess this is a problem when you want to do live migrations? - Dietmar
RE: [Qemu-devel] Re: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
> Also, on _loaded_ systems, I noticed creating/removing logical volumes > can take really long (several minutes); where allocating a file of a > given size would just take a fraction of that. Allocating a file takes much longer, unless you use a 'sparse' file. - Dietmar
[Qemu-devel] Re: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
On 10/23/2009 05:40 PM, FUJITA Tomonori wrote:
> On Fri, 23 Oct 2009 09:14:29 -0500 Javier Guerra wrote:
>>> I think that the major difference between sheepdog and cluster file
>>> systems such as Google File system, pNFS, etc is the interface between
>>> clients and a storage system.
>>
>> note that GFS is "Global File System" (written by Sistina (the same
>> folks from LVM) and bought by RedHat). Google Filesystem is a different
>> thing, and ironically the client/storage interface is a little more
>> like sheepdog and unlike a regular cluster filesystem.
>
> Hmm, Avi referred to Global File System? I wasn't sure. 'GFS' is
> ambiguous. Anyway, Global File System is a SAN file system. It's a
> completely different architecture from Sheepdog.

I did, and yes, it is completely different since you don't require central storage.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [Qemu-devel] Re: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
Chris Webb wrote:
> Javier Guerra writes:
>> i'd just want to add my '+1 votes' on both getting rid of JVM
>> dependency and using block devices (usually LVM) instead of ext3/btrfs
>
> If the chunks into which the virtual drives are split are quite small (say
> the 64MB used by Hadoop), LVM may be a less appropriate choice. It doesn't
> support very large numbers of very small logical volumes very well.

Also, on _loaded_ systems, I noticed creating/removing logical volumes can take really long (several minutes), whereas allocating a file of a given size would take just a fraction of that. Not sure how much it would matter here, but probably it would.

-- Tomasz Chmielewski http://wpkg.org
[Qemu-devel] Re: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
On Fri, 23 Oct 2009 09:14:29 -0500 Javier Guerra wrote:
>> I think that the major difference between sheepdog and cluster file
>> systems such as Google File system, pNFS, etc is the interface between
>> clients and a storage system.
>
> note that GFS is "Global File System" (written by Sistina (the same
> folks from LVM) and bought by RedHat). Google Filesystem is a
> different thing, and ironically the client/storage interface is a
> little more like sheepdog and unlike a regular cluster filesystem.

Hmm, Avi referred to Global File System? I wasn't sure. 'GFS' is ambiguous. Anyway, Global File System is a SAN file system. It's a completely different architecture from Sheepdog.

>> Sheepdog uses consistent hashing to decide where to store objects; I/O
>> load is balanced across the nodes. When a new node is added or an
>> existing node is removed, the hash table changes and the data is moved
>> across nodes automatically and transparently.
>>
>> We plan to implement a mechanism to distribute the data not randomly
>> but intelligently; we could use machine load, the locations of VMs, etc.
>
> i don't have much hands-on experience on consistent hashing; but it
> sounds reasonable to make each node's ring segment proportional to its
> storage capacity.

Yeah, that's one of the techniques, I think.

> dynamic load balancing seems a tougher nut to
> crack, especially while keeping all clients mapping consistent.

There are some existing techniques to distribute data intelligently; we just have not analyzed the options yet.

> i'd just want to add my '+1 votes' on both getting rid of JVM
> dependency and using block devices (usually LVM) instead of ext3/btrfs

LVM doesn't fit our requirements nicely. What we need is to update some objects atomically. We could implement that ourselves, but we prefer to keep our code simple by using an existing mechanism.
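[Editorial note: the capacity-proportional ring idea discussed above can be illustrated with a generic weighted consistent-hash sketch — this is not Sheepdog's actual code; the node names and the 40-points-per-weight-unit constant are made up. Each node gets ring points proportional to its weight, and adding a node only remaps the objects that now fall on the new node's points.]

```python
import bisect
import hashlib

def _hash(key):
    # Stable 64-bit hash of a string key (first 8 bytes of SHA-1).
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")

class Ring:
    """Consistent-hash ring; a node's share is proportional to its weight."""
    def __init__(self):
        self._points = []                 # sorted list of (hash, node)

    def add(self, node, weight):
        # 40 virtual points per unit of weight (arbitrary illustrative constant).
        for i in range(weight * 40):
            bisect.insort(self._points, (_hash(f"{node}:{i}"), node))

    def remove(self, node):
        self._points = [p for p in self._points if p[1] != node]

    def lookup(self, obj_id):
        # Find the first ring point clockwise from the object's hash.
        keys = [p[0] for p in self._points]     # O(n) per lookup; fine for a sketch
        i = bisect.bisect(keys, _hash(obj_id)) % len(self._points)
        return self._points[i][1]

ring = Ring()
ring.add("node-a", 1)
ring.add("node-b", 2)                     # twice the capacity -> roughly twice the objects
before = {o: ring.lookup(o) for o in (f"obj{i}" for i in range(1000))}
ring.add("node-c", 1)
moved = sum(1 for o, n in before.items() if ring.lookup(o) != n)
print(moved)                              # only a fraction of the 1000 objects move
```

The key property: when node-c joins, an object's placement either stays the same or moves to node-c; mappings onto node-a and node-b are otherwise untouched, which is what keeps all clients consistent without a central table.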
Re: [Qemu-devel] Re: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
On Fri, Oct 23, 2009 at 8:10 PM, Alexander Graf wrote:
> On 23.10.2009, at 12:41, MORITA Kazutaka wrote:
>> On Fri, Oct 23, 2009 at 12:30 AM, Avi Kivity wrote:
>>> How is load balancing implemented? Can you move an image transparently
>>> while a guest is running? Will an image be moved closer to its guest?
>>
>> Sheepdog uses consistent hashing to decide where to store objects; I/O
>> load is balanced across the nodes. When a new node is added or an
>> existing node is removed, the hash table changes and the data is moved
>> across nodes automatically and transparently.
>>
>> We plan to implement a mechanism to distribute the data not randomly
>> but intelligently; we could use machine load, the locations of VMs, etc.
>
> What exactly does balanced mean? Can it cope with individual nodes having
> more disk space than others?

I mean that objects are uniformly distributed over the nodes by the hash function. Distribution using free disk space information is one of our TODOs.

>>> Do you support multiple guests accessing the same image?
>>
>> A VM image can be attached to any VM, but only to one VM at a time;
>> multiple running VMs cannot access the same VM image.
>
> What about read-only access? Imagine you'd have 5 kvm instances each
> accessing it using -snapshot.

By creating new clone images from an existing snapshot image, you can do a similar thing. Sheepdog can create a clone image instantly.

--
MORITA, Kazutaka

NTT Cyber Space Labs
OSS Computing Project
Kernel Group
E-mail: morita.kazut...@lab.ntt.co.jp
Re: [Qemu-devel] Re: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
On Fri, Oct 23, 2009 at 9:58 AM, Chris Webb wrote:
> If the chunks into which the virtual drives are split are quite small (say
> the 64MB used by Hadoop), LVM may be a less appropriate choice. It doesn't
> support very large numbers of very small logical volumes very well.

absolutely. the 'nicest' way to do it would be to use a single block device per sheep process, and do the splitting there. it's an extra layer of code, and once you add non-naïve behavior for deleting and fragmentation, you quickly approach filesystem-like complexity. unless you can do some very clever mapping that reuses the consistent hash algorithms to find not only which server(s) you want, but also which chunk to hit. the kind of things i'd love to code, but never found the use for it.

i'll definitely dig deeper in the code.

--
Javier
Re: [Qemu-devel] Re: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
Javier Guerra writes: > i'd just want to add my '+1 votes' on both getting rid of JVM > dependency and using block devices (usually LVM) instead of ext3/btrfs If the chunks into which the virtual drives are split are quite small (say the 64MB used by Hadoop), LVM may be a less appropriate choice. It doesn't support very large numbers of very small logical volumes very well. Best wishes, Chris.
[Qemu-devel] Re: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
This looks very interesting - how does this compare with Exanodes/Seanodes? Thanks, Avishay
[Qemu-devel] Re: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
On Fri, Oct 23, 2009 at 5:41 AM, MORITA Kazutaka wrote:
> On Fri, Oct 23, 2009 at 12:30 AM, Avi Kivity wrote:
>> If so, is it reasonable to compare this to a cluster file system setup (like
>> GFS) with images as files on this filesystem? The difference would be that
>> clustering is implemented in userspace in sheepdog, but in the kernel for a
>> clustering filesystem.
>
> I think that the major difference between sheepdog and cluster file
> systems such as Google File system, pNFS, etc is the interface between
> clients and a storage system.

note that GFS is "Global File System" (written by Sistina (the same folks from LVM) and bought by RedHat). Google Filesystem is a different thing, and ironically the client/storage interface is a little more like sheepdog and unlike a regular cluster filesystem.

>> How is load balancing implemented? Can you move an image transparently
>> while a guest is running? Will an image be moved closer to its guest?
>
> Sheepdog uses consistent hashing to decide where objects store; I/O
> load is balanced across the nodes. When a new node is added or the
> existing node is removed, the hash table changes and the data
> automatically and transparently are moved over nodes.
>
> We plan to implement a mechanism to distribute the data not randomly
> but intelligently; we could use machine load, the locations of VMs, etc.

i don't have much hands-on experience on consistent hashing; but it sounds reasonable to make each node's ring segment proportional to its storage capacity. dynamic load balancing seems a tougher nut to crack, especially while keeping all clients mapping consistent.

>> Do you support multiple guests accessing the same image?
>
> A VM image can be attached to any VMs but one VM at a time; multiple
> running VMs cannot access to the same VM image.

this is a must-have safety measure; but a 'manual override' is quite useful for those that know how to manage a cluster-aware filesystem inside a VM image, maybe like Xen's "w!" flag does. just be sure to avoid distributed caching for a shared image!

in all, great project, and with such a clean patch into KVM/Qemu, high hopes of making it into regular use. i'd just want to add my '+1 votes' on both getting rid of JVM dependency and using block devices (usually LVM) instead of ext3/btrfs

--
Javier
[Qemu-devel] RE: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
> Anyways, I do not know JGroups - maybe that 'reliable multicast' solves > all network problems somehow - Is there any documentation about how > they do it? OK, found the papers on their web site - quite interesting too.
[Qemu-devel] RE: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
Another suggestion: use LVM instead of btrfs (to get better performance)
[Qemu-devel] RE: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
> We use JGroups (Java library) for reliable multicast communication in
> our cluster manager daemon.

I doubt that there is something like 'reliable multicast' - you will run into many problems when you try to handle errors.

> We don't worry about the performance much
> since the cluster manager daemon is not involved in the I/O path. We
> might think about moving to corosync if it is more stable than
> JGroups.

corosync is already quite stable. And it supports virtual synchrony:

http://en.wikipedia.org/wiki/Virtual_synchrony

Anyways, I do not know JGroups - maybe that 'reliable multicast' solves all network problems somehow - Is there any documentation about how they do it?

- Dietmar
[Qemu-devel] Re: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
On 23.10.2009, at 12:41, MORITA Kazutaka wrote:
> On Fri, Oct 23, 2009 at 12:30 AM, Avi Kivity wrote:
>> How is load balancing implemented? Can you move an image transparently
>> while a guest is running? Will an image be moved closer to its guest?
>
> Sheepdog uses consistent hashing to decide where objects store; I/O
> load is balanced across the nodes. When a new node is added or the
> existing node is removed, the hash table changes and the data
> automatically and transparently are moved over nodes.
>
> We plan to implement a mechanism to distribute the data not randomly
> but intelligently; we could use machine load, the locations of VMs, etc.

What exactly does balanced mean? Can it cope with individual nodes having more disk space than others?

>> Do you support multiple guests accessing the same image?
>
> A VM image can be attached to any VMs but one VM at a time; multiple
> running VMs cannot access to the same VM image.

What about read-only access? Imagine you'd have 5 kvm instances each accessing it using -snapshot.

Great project btw!

Alex
[Qemu-devel] Re: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
On Fri, Oct 23, 2009 at 12:30 AM, Avi Kivity wrote:
> On 10/21/2009 07:13 AM, MORITA Kazutaka wrote:
>>
>> Hi everyone,
>>
>> Sheepdog is a distributed storage system for KVM/QEMU. It provides
>> highly available block level storage volumes to VMs like Amazon EBS.
>> Sheepdog supports advanced volume management features such as snapshot,
>> cloning, and thin provisioning. Sheepdog runs on several tens or hundreds
>> of nodes, and the architecture is fully symmetric; there is no central
>> node such as a meta-data server.
>
> Very interesting! From a very brief look at the code, it looks like the
> sheepdog block format driver is a network client that is able to access
> highly available images, yes?

Yes. Sheepdog is a simple key-value storage system that consists of multiple nodes (a bit similar to Amazon Dynamo, I guess). The qemu Sheepdog driver (client) divides a VM image into fixed-size objects and stores them on the key-value storage system.

> If so, is it reasonable to compare this to a cluster file system setup (like
> GFS) with images as files on this filesystem? The difference would be that
> clustering is implemented in userspace in sheepdog, but in the kernel for a
> clustering filesystem.

I think that the major difference between sheepdog and cluster file systems such as Google File system, pNFS, etc is the interface between clients and a storage system.

> How is load balancing implemented? Can you move an image transparently
> while a guest is running? Will an image be moved closer to its guest?

Sheepdog uses consistent hashing to decide where to store objects; I/O load is balanced across the nodes. When a new node is added or an existing node is removed, the hash table changes and the data is moved across nodes automatically and transparently.

We plan to implement a mechanism to distribute the data not randomly but intelligently; we could use machine load, the locations of VMs, etc.

> Can you stripe an image across nodes?

Yes, a VM image is divided into multiple objects, and they are stored across nodes.

> Do you support multiple guests accessing the same image?

A VM image can be attached to any VM, but only to one VM at a time; multiple running VMs cannot access the same VM image.

> What about fault tolerance - storing an image redundantly on multiple nodes?

Yes, all objects are replicated to multiple nodes.

--
MORITA, Kazutaka

NTT Cyber Space Labs
OSS Computing Project
Kernel Group
E-mail: morita.kazut...@lab.ntt.co.jp
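[Editorial note: the division of an image into fixed-size objects described above can be sketched as follows. The 4 MB object size and the key format are hypothetical illustrations, not Sheepdog's real constants: the client maps a byte offset in the image to an object key plus an offset inside that object, and the key-value store places the object via the hash ring.]

```python
OBJECT_SIZE = 4 * 1024 * 1024   # hypothetical fixed object size (4 MB)

def locate(vdi_id, offset):
    """Map a byte offset in a VM image to (object key, offset inside object)."""
    index = offset // OBJECT_SIZE            # which fixed-size object
    key = f"vdi{vdi_id}/obj{index}"          # illustrative key format
    return key, offset % OBJECT_SIZE         # where inside that object

# A read/write at offset 6 MB of image 4 lands 2 MB into its second object.
key, off = locate(4, 6 * 1024 * 1024)
print(key, off)   # vdi4/obj1 2097152
```

Striping falls out of this for free: consecutive objects hash to different points on the ring, so a large sequential I/O is spread across nodes.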
[Qemu-devel] Re: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
Chris Webb writes: > MORITA Kazutaka writes: > > > We use JGroups (Java library) for reliable multicast communication in > > our cluster manager daemon. We don't worry about the performance much > > since the cluster manager daemon is not involved in the I/O path. We > > might think about moving to corosync if it is more stable than > > JGroups. > > I'd love to see this running on top of corosync too. Corosync is a well > tested, stable cluster manager, and doesn't have the JVM dependency of > jgroups so feels more suitable for building 'thin virtualisation fabrics'. Very exciting project, by the way! Best wishes, Chris.
[Qemu-devel] Re: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
MORITA Kazutaka writes: > We use JGroups (Java library) for reliable multicast communication in > our cluster manager daemon. We don't worry about the performance much > since the cluster manager daemon is not involved in the I/O path. We > might think about moving to corosync if it is more stable than > JGroups. I'd love to see this running on top of corosync too. Corosync is a well tested, stable cluster manager, and doesn't have the JVM dependency of jgroups so feels more suitable for building 'thin virtualisation fabrics'. Cheers, Chris.
[Qemu-devel] Re: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
We use JGroups (Java library) for reliable multicast communication in our cluster manager daemon. We don't worry about the performance much since the cluster manager daemon is not involved in the I/O path. We might think about moving to corosync if it is more stable than JGroups.

On Wed, Oct 21, 2009 at 6:08 PM, Dietmar Maurer wrote:
> Quite interesting. But would it be possible to use corosync for the cluster
> communication? The point is that we need corosync anyway for pacemaker; it
> is written in C (high performance) and seems to implement the features you need.
>
>> -----Original Message-----
>> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
>> Behalf Of MORITA Kazutaka
>> Sent: Mittwoch, 21. Oktober 2009 07:14
>> To: k...@vger.kernel.org; qemu-devel@nongnu.org; linux-fsde...@vger.kernel.org
>> Subject: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
>>
>> Hi everyone,
>>
>> Sheepdog is a distributed storage system for KVM/QEMU. It provides
>> highly available block level storage volumes to VMs like Amazon EBS.
>> Sheepdog supports advanced volume management features such as snapshot,
>> cloning, and thin provisioning. Sheepdog runs on several tens or hundreds
>> of nodes, and the architecture is fully symmetric; there is no central
>> node such as a meta-data server.
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
MORITA, Kazutaka

NTT Cyber Space Labs
OSS Computing Project
Kernel Group
E-mail: morita.kazut...@lab.ntt.co.jp
[Qemu-devel] Re: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
Hello,

Does the following patch work for you?

diff --git a/sheep/work.c b/sheep/work.c
index 4df8dc0..45f362d 100644
--- a/sheep/work.c
+++ b/sheep/work.c
@@ -28,6 +28,7 @@
 #include
 #include
 #include
+#define _LINUX_FCNTL_H
 #include
 #include "list.h"

On Wed, Oct 21, 2009 at 5:45 PM, Nikolai K. Bochev wrote:
> Hello,
>
> I am getting the following error trying to compile sheepdog on Ubuntu 9.10
> ( 2.6.31-14 x64 ) :
>
> cd shepherd; make
> make[1]: Entering directory `/home/shiny/Packages/sheepdog-2009102101/shepherd'
> cc -c -g -O2 -Wall -Wstrict-prototypes -I../include -D_GNU_SOURCE shepherd.c -o shepherd.o
> shepherd.c: In function ‘main’:
> shepherd.c:300: warning: dereferencing pointer ‘hdr.55’ does break strict-aliasing rules
> shepherd.c:300: note: initialized from here
> cc -c -g -O2 -Wall -Wstrict-prototypes -I../include -D_GNU_SOURCE treeview.c -o treeview.o
> cc -c -g -O2 -Wall -Wstrict-prototypes -I../include -D_GNU_SOURCE ../lib/event.c -o ../lib/event.o
> cc -c -g -O2 -Wall -Wstrict-prototypes -I../include -D_GNU_SOURCE ../lib/net.c -o ../lib/net.o
> ../lib/net.c: In function ‘write_object’:
> ../lib/net.c:358: warning: ‘vosts’ may be used uninitialized in this function
> cc -c -g -O2 -Wall -Wstrict-prototypes -I../include -D_GNU_SOURCE ../lib/logger.c -o ../lib/logger.o
> cc shepherd.o treeview.o ../lib/event.o ../lib/net.o ../lib/logger.o -o shepherd -lncurses -lcrypto
> make[1]: Leaving directory `/home/shiny/Packages/sheepdog-2009102101/shepherd'
> cd sheep; make
> make[1]: Entering directory `/home/shiny/Packages/sheepdog-2009102101/sheep'
> cc -c -g -O2 -Wall -Wstrict-prototypes -I../include -D_GNU_SOURCE sheep.c -o sheep.o
> cc -c -g -O2 -Wall -Wstrict-prototypes -I../include -D_GNU_SOURCE store.c -o store.o
> cc -c -g -O2 -Wall -Wstrict-prototypes -I../include -D_GNU_SOURCE net.c -o net.o
> cc -c -g -O2 -Wall -Wstrict-prototypes -I../include -D_GNU_SOURCE work.c -o work.o
> In file included from /usr/include/asm/fcntl.h:1,
>                  from /usr/include/linux/fcntl.h:4,
>                  from /usr/include/linux/signalfd.h:13,
>                  from work.c:31:
> /usr/include/asm-generic/fcntl.h:117: error: redefinition of ‘struct flock’
> /usr/include/asm-generic/fcntl.h:140: error: redefinition of ‘struct flock64’
> make[1]: *** [work.o] Error 1
> make[1]: Leaving directory `/home/shiny/Packages/sheepdog-2009102101/sheep'
> make: *** [all] Error 2
>
> I have all the required libs installed. Patching and compiling qemu-kvm went
> flawless.
>
> ----- Original Message -----
> From: "MORITA Kazutaka"
> To: k...@vger.kernel.org, qemu-devel@nongnu.org, linux-fsde...@vger.kernel.org
> Sent: Wednesday, October 21, 2009 8:13:47 AM
> Subject: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
>
> Hi everyone,
>
> Sheepdog is a distributed storage system for KVM/QEMU. It provides
> highly available block level storage volumes to VMs like Amazon EBS.
> Sheepdog supports advanced volume management features such as snapshot,
> cloning, and thin provisioning. Sheepdog runs on several tens or hundreds
> of nodes, and the architecture is fully symmetric; there is no central
> node such as a meta-data server.
>
> The following list describes the features of Sheepdog.
>
> * Linear scalability in performance and capacity
> * No single point of failure
> * Redundant architecture (data is written to multiple nodes)
>   - Tolerance against network failure
> * Zero configuration (newly added machines will join the cluster automatically)
>   - Autonomous load balancing
> * Snapshot
>   - Online snapshot from qemu-monitor
> * Clone from a snapshot volume
> * Thin provisioning
>   - Amazon EBS API support (to use from a Eucalyptus instance)
>
> (* = current features, - = on our todo list)
>
> More details and download links are here:
>
> http://www.osrg.net/sheepdog/
>
> Note that the code is still in an early stage.
> There are some critical TODO items:
>
> - VM image deletion support
> - Support architectures other than X86_64
> - Data recovery
> - Free space management
> - Guarantee reliability and availability under heavy load
> - Performance improvement
> - Reclaim unused blocks
> - More documentation
>
> We hope to find people interested in working together.
> Enjoy!
>
>
> Here are examples:
>
> - create images
>
> $ kvm-img create -f sheepdog "Alice's Disk" 256G
> $ kvm-img create -f sheepdog "Bob's Disk" 256G
>
> - list images
>
> $ shepherd info -t vdi
> 4 : Alice's Disk 256 GB (allocated: 0 MB, shared: 0 MB), 2009-10-15 16:17:18, tag: 0, current
> 8 : Bob's Disk 256 GB (allocated: 0 MB, shared: 0 MB), 2009-10-15 16:29:20, tag: 0, current
>
> - start up a virtual machine
>
> $ kvm --drive format=sheepdog,file="Alice's Disk"
>
> - create a snapshot
>
> $ kvm-img snapshot -c name sheepdog:"Alice's Disk"
>
> - clone from a snapshot
>
> $ kvm-img create -b sheepdog:"Alice's Disk":0 -f sheepdog "Charlie's Disk"
[Qemu-devel] Re: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
Am 22.10.2009 um 18:28 schrieb Anthony Liguori:
> Avi Kivity wrote:
>> On 10/21/2009 07:13 AM, MORITA Kazutaka wrote:
>>> Hi everyone,
>>>
>>> Sheepdog is a distributed storage system for KVM/QEMU. It provides
>>> highly available block level storage volumes to VMs like Amazon EBS.
>>> Sheepdog supports advanced volume management features such as snapshot,
>>> cloning, and thin provisioning. Sheepdog runs on several tens or hundreds
>>> of nodes, and the architecture is fully symmetric; there is no central
>>> node such as a meta-data server.
>>
>> Very interesting! From a very brief look at the code, it looks like the
>> sheepdog block format driver is a network client that is able to access
>> highly available images, yes?
>>
>> If so, is it reasonable to compare this to a cluster file system setup (like
>> GFS) with images as files on this filesystem? The difference would be that
>> clustering is implemented in userspace in sheepdog, but in the kernel for a
>> clustering filesystem.
>
> I'm still in the process of reading the code, but that's the impression I
> got too. It made me think that the protocol for qemu to communicate with
> sheepdog could be a filesystem protocol (like 9p)

Speaking about 9p, what's the status there?

Alex
[Qemu-devel] Re: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
Avi Kivity wrote:
> On 10/21/2009 07:13 AM, MORITA Kazutaka wrote:
>> Hi everyone,
>>
>> Sheepdog is a distributed storage system for KVM/QEMU. It provides
>> highly available block level storage volumes to VMs like Amazon EBS.
>> Sheepdog supports advanced volume management features such as snapshot,
>> cloning, and thin provisioning. Sheepdog runs on several tens or hundreds
>> of nodes, and the architecture is fully symmetric; there is no central
>> node such as a meta-data server.
>
> Very interesting! From a very brief look at the code, it looks like the
> sheepdog block format driver is a network client that is able to access
> highly available images, yes?
>
> If so, is it reasonable to compare this to a cluster file system setup (like
> GFS) with images as files on this filesystem? The difference would be that
> clustering is implemented in userspace in sheepdog, but in the kernel for a
> clustering filesystem.

I'm still in the process of reading the code, but that's the impression I got too. It made me think that the protocol for qemu to communicate with sheepdog could be a filesystem protocol (like 9p) and sheepdog could expose itself as a synthetic filesystem. There are some interesting ramifications to something like that--namely that you could mount sheepdog on localhost and interact with it through the vfs.

Very interesting stuff, I'm looking forward to examining it more closely.

Regards,

Anthony Liguori
[Qemu-devel] Re: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
On 10/21/2009 07:13 AM, MORITA Kazutaka wrote:
> Hi everyone,
>
> Sheepdog is a distributed storage system for KVM/QEMU. It provides
> highly available block level storage volumes to VMs like Amazon EBS.
> Sheepdog supports advanced volume management features such as snapshot,
> cloning, and thin provisioning. Sheepdog runs on several tens or hundreds
> of nodes, and the architecture is fully symmetric; there is no central
> node such as a meta-data server.

Very interesting! From a very brief look at the code, it looks like the sheepdog block format driver is a network client that is able to access highly available images, yes?

If so, is it reasonable to compare this to a cluster file system setup (like GFS) with images as files on this filesystem? The difference would be that clustering is implemented in userspace in sheepdog, but in the kernel for a clustering filesystem.

How is load balancing implemented? Can you move an image transparently while a guest is running? Will an image be moved closer to its guest?

Can you stripe an image across nodes?

Do you support multiple guests accessing the same image?

What about fault tolerance - storing an image redundantly on multiple nodes?

--
error compiling committee.c: too many arguments to function
[Qemu-devel] Re: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
Hello,

When I try to compile, I'm getting the following error (using Ubuntu 9.10, x64):

cd shepherd; make
make[1]: Entering directory `/home/shiny/Packages/sheepdog-2009102101/shepherd'
cc -c -g -O2 -Wall -Wstrict-prototypes -I../include -D_GNU_SOURCE shepherd.c -o shepherd.o
shepherd.c: In function ‘main’:
shepherd.c:300: warning: dereferencing pointer ‘hdr.55’ does break strict-aliasing rules
shepherd.c:300: note: initialized from here
cc -c -g -O2 -Wall -Wstrict-prototypes -I../include -D_GNU_SOURCE treeview.c -o treeview.o
cc -c -g -O2 -Wall -Wstrict-prototypes -I../include -D_GNU_SOURCE ../lib/event.c -o ../lib/event.o
cc -c -g -O2 -Wall -Wstrict-prototypes -I../include -D_GNU_SOURCE ../lib/net.c -o ../lib/net.o
../lib/net.c: In function ‘write_object’:
../lib/net.c:358: warning: ‘vosts’ may be used uninitialized in this function
cc -c -g -O2 -Wall -Wstrict-prototypes -I../include -D_GNU_SOURCE ../lib/logger.c -o ../lib/logger.o
cc shepherd.o treeview.o ../lib/event.o ../lib/net.o ../lib/logger.o -o shepherd -lncurses -lcrypto
make[1]: Leaving directory `/home/shiny/Packages/sheepdog-2009102101/shepherd'
cd sheep; make
make[1]: Entering directory `/home/shiny/Packages/sheepdog-2009102101/sheep'
cc -c -g -O2 -Wall -Wstrict-prototypes -I../include -D_GNU_SOURCE sheep.c -o sheep.o
cc -c -g -O2 -Wall -Wstrict-prototypes -I../include -D_GNU_SOURCE store.c -o store.o
cc -c -g -O2 -Wall -Wstrict-prototypes -I../include -D_GNU_SOURCE net.c -o net.o
cc -c -g -O2 -Wall -Wstrict-prototypes -I../include -D_GNU_SOURCE work.c -o work.o
In file included from /usr/include/asm/fcntl.h:1,
                 from /usr/include/linux/fcntl.h:4,
                 from /usr/include/linux/signalfd.h:13,
                 from work.c:31:
/usr/include/asm-generic/fcntl.h:117: error: redefinition of ‘struct flock’
/usr/include/asm-generic/fcntl.h:140: error: redefinition of ‘struct flock64’
make[1]: *** [work.o] Error 1
make[1]: Leaving directory `/home/shiny/Packages/sheepdog-2009102101/sheep'
make: *** [all] Error 2

The qemu-kvm source with patched support for sheepdog compiles fine.

----- Original Message -----
From: "MORITA Kazutaka"
To: k...@vger.kernel.org, qemu-devel@nongnu.org, linux-fsde...@vger.kernel.org
Sent: Wednesday, October 21, 2009 8:13:47 AM
Subject: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM

Hi everyone,

Sheepdog is a distributed storage system for KVM/QEMU. It provides highly available block level storage volumes to VMs like Amazon EBS. Sheepdog supports advanced volume management features such as snapshot, cloning, and thin provisioning. Sheepdog runs on several tens or hundreds of nodes, and the architecture is fully symmetric; there is no central node such as a meta-data server.

The following list describes the features of Sheepdog.

* Linear scalability in performance and capacity
* No single point of failure
* Redundant architecture (data is written to multiple nodes)
  - Tolerance against network failure
* Zero configuration (newly added machines will join the cluster automatically)
  - Autonomous load balancing
* Snapshot
  - Online snapshot from qemu-monitor
* Clone from a snapshot volume
* Thin provisioning
  - Amazon EBS API support (to use from a Eucalyptus instance)

(* = current features, - = on our todo list)

More details and download links are here:

http://www.osrg.net/sheepdog/

Note that the code is still in an early stage. There are some critical TODO items:

- VM image deletion support
- Support architectures other than X86_64
- Data recovery
- Free space management
- Guarantee reliability and availability under heavy load
- Performance improvement
- Reclaim unused blocks
- More documentation

We hope to find people interested in working together. Enjoy!
Here are examples:

- create images

$ kvm-img create -f sheepdog "Alice's Disk" 256G
$ kvm-img create -f sheepdog "Bob's Disk" 256G

- list images

$ shepherd info -t vdi
4 : Alice's Disk  256 GB (allocated: 0 MB, shared: 0 MB), 2009-10-15 16:17:18, tag: 0, current
8 : Bob's Disk    256 GB (allocated: 0 MB, shared: 0 MB), 2009-10-15 16:29:20, tag: 0, current

- start up a virtual machine

$ kvm --drive format=sheepdog,file="Alice's Disk"

- create a snapshot

$ kvm-img snapshot -c name sheepdog:"Alice's Disk"

- clone from a snapshot

$ kvm-img create -b sheepdog:"Alice's Disk":0 -f sheepdog "Charlie's Disk"

Thanks.

--
MORITA, Kazutaka

NTT Cyber Space Labs
OSS Computing Project
Kernel Group
E-mail: morita.kazut...@lab.ntt.co.jp
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[Qemu-devel] RE: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
Quite interesting. But would it be possible to use corosync for the cluster communication? The point is that we need corosync anyway for pacemaker, it is written in C (high performance), and it seems to implement the features you need.

> -----Original Message-----
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
> Behalf Of MORITA Kazutaka
> Sent: Wednesday, October 21, 2009 07:14
> To: k...@vger.kernel.org; qemu-devel@nongnu.org;
> linux-fsde...@vger.kernel.org
> Subject: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
>
> Hi everyone,
>
> Sheepdog is a distributed storage system for KVM/QEMU. It provides
> highly available block level storage volumes to VMs like Amazon EBS.
> Sheepdog supports advanced volume management features such as
> snapshot, cloning, and thin provisioning. Sheepdog runs on several
> tens or hundreds of nodes, and the architecture is fully symmetric;
> there is no central node such as a meta-data server.