Re: [ceph-users] Proper Ceph network configuration
Yes, that's correct. We use the public/cluster networks exclusively, so in the configuration we specify the MON addresses on the public network and define both the public and cluster network subnets. I haven't tested it, but I wonder whether it's possible to put the MON addresses on a 1GbE network, define the public/cluster networks in the config, and still have things operate.

----- Original Message -----
From: "Jon Heese"
To: "Bill Campbell"
Cc: ceph-users@lists.ceph.com
Sent: Friday, October 23, 2015 10:03:46 AM
Subject: RE: [ceph-users] Proper Ceph network configuration

Bill,

Thanks for the explanation - that helps a lot. In that case, I actually want the 10.174.1.0/24 network to be both my cluster and my public network, because I want all "heavy" data traffic to be on that network. And by "heavy", I mean large volumes of data, both normal Ceph client traffic and OSD-to-OSD communication. Contrast this with the more "control plane" connections between the MONs and the OSDs, which we intend to run over the lighter-weight management network.

The documentation seems to indicate that the MONs also communicate on the "public" network, but our MONs aren't currently on that network (we were treating it as an OSD/client network). I guess I need to put them on that network...?

Thanks.

Jon Heese
Systems Engineer
INetU Managed Hosting
P: 610.266.7441 x 261 | F: 610.266.7434 | www.inetu.net

From: Campbell, Bill [mailto:bcampb...@axcess-financial.com]
Sent: Friday, October 23, 2015 9:11 AM
To: Jon Heese
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Proper Ceph network configuration

The "public" network is where all storage access from other systems or clients will occur. When you map RBDs on other hosts, access object storage through the RGW, or use CephFS, you access the data over the "public" network. The "cluster" network is where all internal replication between OSD processes will occur. As an example, in our setup we have a 10GbE public network for the hypervisor nodes to access, along with a 10GbE cluster network for back-end replication/communication. Our 1GbE network is used for monitoring integration and system administration.

From: "Jon Heese" <jhe...@inetu.net>
To: ceph-users@lists.ceph.com
Sent: Friday, October 23, 2015 8:58:28 AM
Subject: [ceph-users] Proper Ceph network configuration

Hello,

We have two separate networks in our Ceph cluster design:

10.197.5.0/24 - The "front end" network, "skinny pipe", all 1GbE, intended to be a management or control-plane network
10.174.1.0/24 - The "back end" network, "fat pipe", all OSD nodes use 2x bonded 10GbE, intended to be the data network

So we want all of the OSD traffic to go over the "back end" and the MON traffic to go over the "front end". We thought the following would do that:

public network = 10.197.5.0/24   # skinny pipe, mgmt & MON traffic
cluster network = 10.174.1.0/24  # fat pipe, OSD traffic

But that doesn't seem to be the case -- iftop and netstat show that little/no OSD communication is happening over the 10.174.1 network, and it's all happening over the 10.197.5 network.
What configuration should we be running to enforce the networks per our design? Thanks!

Jon Heese
Systems Engineer
INetU Managed Hosting
P: 610.266.7441 x 261 | F: 610.266.7434 | www.inetu.net
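For reference, a minimal ceph.conf sketch of the layout Bill describes (MON addresses inside the public network, with both subnets defined) might look like the following. The hostname and MON address are hypothetical; if the public and cluster subnets are identical, the "cluster network" line can simply be omitted and OSD replication falls back to the public network.

[global]
# All client/MON traffic and OSD replication on the 10GbE "fat pipe"
public network  = 10.174.1.0/24
cluster network = 10.174.1.0/24

[mon.a]
host = mon01                     # hypothetical hostname
mon addr = 10.174.1.11:6789      # must fall inside the public network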
Re: [ceph-users] Proper Ceph network configuration
The "public" network is where all storage accesses from other systems or clients will occur. When you map RBD's to other hosts, access object storage through the RGW, or CephFS access, you will access the data through the "public" network. The "cluster" network is where all internal replication between OSD processes will occur. As an example in our set up, we have a 10GbE public network for hypervisor nodes to access, along with a 10GbE cluster network for back-end replication/communication. Our 1GbE network is used for monitoring integration and system administration. - Original Message - From: "Jon Heese" To: ceph-users@lists.ceph.com Sent: Friday, October 23, 2015 8:58:28 AM Subject: [ceph-users] Proper Ceph network configuration Hello, We have two separate networks in our Ceph cluster design: 10.197.5.0/24 - The "front end" network, "skinny pipe", all 1Gbe, intended to be a management or control plane network 10.174.1.0/24 - The "back end" network, "fat pipe", all OSD nodes use 2x bonded 10Gbe, intended to be the data network So we want all of the OSD traffic to go over the "back end", and the MON traffic to go over the "front end". We thought the following would do that: public network = 10.197.5.0/24 # skinny pipe, mgmt & MON traffic cluster network = 10.174.1.0/24 # fat pipe, OSD traffic But that doesn't seem to be the case -- iftop and netstat show that little/no OSD communication is happening over the 10.174.1 network and it's all happening over the 10.197.5 network. What configuration should we be running to enforce the networks per our design? Thanks! Jon Heese Systems Engineer INetU Managed Hosting P: 610.266.7441 x 261 F: 610.266.7434 www.inetu.net ** This message contains confidential information, which also may be privileged, and is intended only for the person(s) addressed above. Any unauthorized use, distribution, copying or disclosure of confidential and/or privileged information is strictly prohibited. If you have received this communication in error, please erase all copies of the message and its attachments and notify the sender immediately via reply e-mail. ** ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com NOTICE: Protect the information in this message in accordance with the company's security policies. If you received this message in error, immediately notify the sender and destroy all copies.___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] any recommendation of using EnhanceIO?
Hey Stefan,

Are you using your Ceph cluster for virtualization storage? Is dm-writeboost configured on the OSD nodes themselves?

----- Original Message -----
From: "Stefan Priebe - Profihost AG"
To: "Mark Nelson", ceph-users@lists.ceph.com
Sent: Tuesday, August 18, 2015 7:36:10 AM
Subject: Re: [ceph-users] any recommendation of using EnhanceIO?

We've been using an extra caching layer for Ceph since the beginning for our older Ceph deployments. All new deployments go with full SSDs.

I've tested so far:
- EnhanceIO
- Flashcache
- Bcache
- dm-cache
- dm-writeboost

The best working solution was and is bcache, except for its buggy code. The current code in the 4.2-rc7 vanilla kernel still contains bugs, e.g. discards result in a crashed FS after reboots, and so on. But it's still the fastest for Ceph. The second-best solution, which we already use in production, is dm-writeboost (https://github.com/akiradeveloper/dm-writeboost). Everything else is too slow.

Stefan

On 18.08.2015 at 13:33, Mark Nelson wrote:
> Hi Jan,
>
> Out of curiosity, did you ever try dm-cache? I've been meaning to give
> it a spin but haven't had the spare cycles.
>
> Mark
>
> On 08/18/2015 04:00 AM, Jan Schermer wrote:
>> I already evaluated EnhanceIO in combination with CentOS 6 (and
>> backported 3.10 and 4.0 kernel-lt, if I remember correctly).
>> It worked fine during benchmarks and stress tests, but once we ran DB2
>> on it, it panicked within minutes and took all the data with it (almost
>> literally - files that weren't touched, like OS binaries, were b0rked
>> and the filesystem was unsalvageable).
>> If you disregard this warning - the performance gains weren't that
>> great either, at least in a VM. It had problems when flushing to disk
>> after reaching the dirty watermark, and the block size has some
>> not-well-documented implications (not sure now, but I think it only
>> cached IO _larger_ than the block size, so if your database keeps
>> incrementing an XX-byte counter it will go straight to disk).
>>
>> Flashcache doesn't respect barriers (or does it now?) - if that's OK
>> for you then go for it; it should be stable, and I used it in the past
>> in production without problems.
>>
>> bcache seemed to work fine, but I needed to
>> a) use it for root
>> b) disable and enable it on the fly (doh)
>> c) make it non-persistent (flush it) before reboot - not sure if that
>> was possible either
>> d) do all that in a customer's VM, and that customer didn't have a strong
>> technical background to be able to fiddle with it...
>> So I haven't tested it heavily.
>>
>> Bcache should be the obvious choice if you are in control of the
>> environment. At least you can cry on LKML's shoulder when you lose
>> data :-)
>>
>> Jan
>>
>>> On 18 Aug 2015, at 01:49, Alex Gorbachev wrote:
>>>
>>> What about https://github.com/Frontier314/EnhanceIO? Last commit 2
>>> months ago, but no external contributors :(
>>>
>>> The nice thing about EnhanceIO is there is no need to change the device
>>> name, unlike bcache, flashcache, etc.
>>>
>>> Best regards,
>>> Alex
>>>
>>> On Thu, Jul 23, 2015 at 11:02 AM, Daniel Gryniewicz wrote:
>>>
>>> I did some (non-Ceph) work on these, and concluded that bcache was
>>> the best supported, most stable, and fastest. This was ~1 year ago,
>>> so take it with a grain of salt, but that's what I would recommend.
>>>
>>> Daniel

From: "Dominik Zalewski"
To: "German Anders"
Cc: "ceph-users"
Sent: Wednesday, July 1, 2015 5:28:10 PM
Subject: Re: [ceph-users] any recommendation of using EnhanceIO?
Hi,

I asked the same question a week or so ago (just search the mailing list archives for EnhanceIO :)) and got some interesting answers. It looks like the project is pretty much dead since it was bought out by HGST. Even their website has some broken links with regard to EnhanceIO.

I'm keen to try flashcache or bcache (it's been in the mainline kernel for some time).

Dominik

On 1 Jul 2015, at 21:13, German Anders wrote:

Hi cephers,

Has anyone out there implemented EnhanceIO in a production environment? Any recommendations? Any performance output to share showing the difference between using it and not?

Thanks in advance,

German
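For anyone wanting to try the bcache route discussed above, a rough sketch of fronting an OSD data disk with an SSD cache follows. The device names are hypothetical, the package name may differ by distro, and the thread's warnings about bcache bugs still apply; treat this as an outline, not a recipe.

# /dev/sdb = spinning backing disk, /dev/sdc = SSD cache (hypothetical devices)
apt-get install bcache-tools
make-bcache -B /dev/sdb                  # format the backing device
make-bcache -C /dev/sdc                  # format the SSD as a cache set
echo /dev/sdb > /sys/fs/bcache/register  # usually handled automatically by udev
echo /dev/sdc > /sys/fs/bcache/register
# Attach the backing device to the cache set and enable writeback caching
CSET=$(bcache-super-show /dev/sdc | awk '/cset.uuid/ {print $2}')
echo "$CSET" > /sys/block/bcache0/bcache/attach
echo writeback > /sys/block/bcache0/bcache/cache_mode
# The OSD filesystem then lives on /dev/bcache0 instead of /dev/sdb
mkfs.xfs /dev/bcache0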
Re: [ceph-users] CEPH RBD with ESXi
I don't have many of the details (our engineering group handled most of the testing); however, we currently have 10 Dell PowerEdge R720xd systems, each with 24 600GB 10k SAS OSDs (the system has a RAID controller with 2GB NVRAM; in testing, performance was better with this than with 6 SSD drives for journals). The cluster is configured with public/cluster networks, both on 10GbE. The NAS systems (there are 2, in active/passive mode) are connected to the 10GbE public network, along with the VMware hypervisor nodes.

Performance is acceptable (nothing earth-shattering; latency can be a concern during peak I/O periods, particularly backups), but we have a relatively small VMware environment, primarily for legacy application systems that either aren't supported on, or we're afraid to move to, our larger private cloud infrastructure (which also uses Ceph, but with direct access via QEMU+KVM).

The iSCSI testing was about 2 years ago; I believe testing was done against Cuttlefish, and we were using tgtd for the target. I'm sure there have been enhancements in both stability and performance since then; we've just not gotten around to evaluating or changing it, as what we have is working well for us (we have mixed workloads, but generally hover around 500-800 active IOPS during the day, with peaks to 2-3k during off-hours maintenance windows). We've been running this setup for about 1.5 years with no major issues.

----- Original Message -----
From: "Nikhil Mitra (nikmitra)"
To: "Bill Campbell"
Cc: ceph-users@lists.ceph.com
Sent: Monday, July 20, 2015 3:05:25 PM
Subject: Re: [ceph-users] CEPH RBD with ESXi

Hi Bill,

Would you be kind enough to share what your setup looks like today, as we are planning to use ESXi backed by Ceph storage? When you tested iSCSI, what were the issues you noticed? What version of Ceph were you running then? What iSCSI software did you use for the setup?

Regards,
Nikhil Mitra

From: "Campbell, Bill" <bcampb...@axcess-financial.com>
Reply-To: "Campbell, Bill" <bcampb...@axcess-financial.com>
Date: Monday, July 20, 2015 at 11:52 AM
To: Nikhil Mitra <nikmi...@cisco.com>
Cc: "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] CEPH RBD with ESXi

We use VMware with Ceph; however, we don't use RBD directly (we have an NFS server which has RBD volumes exported as datastores to VMware). We did attempt iSCSI with RBD to connect to VMware but ran into stability issues (could have been the target software we were using), and have found NFS to be pretty reliable.

----- Original Message -----
From: "Nikhil Mitra (nikmitra)" <nikmi...@cisco.com>
To: ceph-users@lists.ceph.com
Sent: Monday, July 20, 2015 2:07:13 PM
Subject: [ceph-users] CEPH RBD with ESXi

Hi,

Has anyone implemented Ceph RBD with the VMware ESXi hypervisor? Just looking to use it as a native VMFS datastore to host VMDKs. Please let me know if there are any documents out there that might point me in the right direction to get started on this. Thank you.

Regards,
Nikhil Mitra
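A rough sketch of the NFS-gateway arrangement Bill describes (an RBD mapped on the NAS head, formatted, and exported to the ESXi hosts over NFS) is below. The pool/image names, size, mount point and hypervisor subnet are all hypothetical.

rbd create vmware-ds01 --size 2048000 --pool rbd   # ~2 TB image (hypothetical name/size)
rbd map rbd/vmware-ds01                            # kernel RBD on the NAS head
mkfs.xfs /dev/rbd0
mkdir -p /export/vmware-ds01
mount /dev/rbd0 /export/vmware-ds01
# Export it so the hypervisors (10.0.0.0/24 is a placeholder subnet) can mount it as a datastore
echo '/export/vmware-ds01 10.0.0.0/24(rw,no_root_squash,sync)' >> /etc/exports
exportfs -ra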
Re: [ceph-users] CEPH RBD with ESXi
We use VMware with Ceph; however, we don't use RBD directly (we have an NFS server which has RBD volumes exported as datastores to VMware). We did attempt iSCSI with RBD to connect to VMware but ran into stability issues (could have been the target software we were using), and have found NFS to be pretty reliable.

----- Original Message -----
From: "Nikhil Mitra (nikmitra)"
To: ceph-users@lists.ceph.com
Sent: Monday, July 20, 2015 2:07:13 PM
Subject: [ceph-users] CEPH RBD with ESXi

Hi,

Has anyone implemented Ceph RBD with the VMware ESXi hypervisor? Just looking to use it as a native VMFS datastore to host VMDKs. Please let me know if there are any documents out there that might point me in the right direction to get started on this. Thank you.

Regards,
Nikhil Mitra
Re: [ceph-users] Concurrency in ceph
I can't speak for OpenStack, but OpenNebula uses libvirt/QEMU/KVM to access an RBD directly for each virtual instance deployed, live migration included (as each RBD is in and of itself a separate block device, not a filesystem). I would imagine OpenStack works in a similar fashion.

----- Original Message -----
From: "hp cre"
To: "Gregory Farnum"
Cc: ceph-users@lists.ceph.com
Sent: Tuesday, November 18, 2014 4:43:07 PM
Subject: Re: [ceph-users] Concurrency in ceph

OK, thanks Greg. But what OpenStack does, AFAIU, is use RBD devices directly, one for each VM instance, right? And that's how it supports live migration on KVM, etc., right? OpenStack and similar cloud frameworks don't need to create VM instances on shared filesystems, am I correct?

On 18 Nov 2014 23:33, "Gregory Farnum" <g...@gregs42.com> wrote:

On Tue, Nov 18, 2014 at 1:26 PM, hp cre <hpc...@gmail.com> wrote:
> Hello everyone,
>
> I'm new to Ceph but have been working with proprietary clustered filesystems for
> quite some time.
>
> I almost understand how Ceph works, but have a couple of questions which
> have been asked here before, though I didn't understand the answers.
>
> In the closed-source world, we use clustered filesystems like Veritas
> clustered filesystem to mount a shared block device (over a SAN) on more than
> one compute node concurrently for shared read/write.
>
> What I can't seem to get a solid and clear answer for is this:
> How can I use Ceph to do the same thing? Can RADOS guarantee coherency and
> integrity of my data if I use an RBD device with any filesystem on top of
> it? Or must I still use a cluster-aware filesystem such as VxFS or OCFS?

RBD behaves just like a regular disk if you mount it on multiple nodes at once (although you need to disable the client caching). This means that the disk accesses will be coherent, but using ext4 on top of it won't work, because ext4 assumes it is the only accessor — you have to use a cluster-aware FS like ocfs2. A SAN would have the same problem here, so I'm not sure why you think it works with them...

> And is CephFS going to solve this problem? Or does it not have support for
> concurrent read/write access among all nodes mounting it?

CephFS definitely does support concurrent access to the same data.

> And do iSCSI targets over RBD devices behave the same?

Uh, yes, iSCSI over RBD will be the same as regular RBD in this regard, modulo anything the iSCSI gateway might be set up to do.
-Greg
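As a rough sketch of Greg's point, sharing one RBD between two nodes looks roughly like the following; the image name, size and mount point are hypothetical, and the ocfs2 cluster (o2cb) configuration itself is omitted. Client caching is disabled in ceph.conf on the clients with "rbd cache = false" under [client].

rbd create shared-vol --size 102400 --pool rbd   # create once (hypothetical name/size)
rbd map rbd/shared-vol                           # run on node A and on node B
mkfs.ocfs2 -N 2 /dev/rbd0                        # format once, from one node, with two node slots
mount -t ocfs2 /dev/rbd0 /mnt/shared             # mount on both nodes (o2cb must already be configured)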
Re: [ceph-users] RBD import format 1 & 2
When you run qemu-img, you are essentially converting the qcow2 image to the appropriate raw format as part of the conversion and import process into the cluster. When you use rbd import you are not doing a conversion, so the image is imported AS IS (you can validate this by looking at the size of the image after importing). In order to get to format 2 you may need to convert the qcow2 to raw first, then import. Unfortunately I don't think qemu-img supports outputting to stdout, so this will have to be a two-step process.

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of NEVEU Stephane
Sent: Friday, July 25, 2014 8:57 AM
To: NEVEU Stephane; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] RBD import format 1 & 2

I finally reconverted my only format 1 image into format 2, so now everything is format 2, but I'm still confused: my VM disks are still read-only (I've tried different images, CentOS 6.5 with kernel 2.6.32 and Ubuntu with 3.13). Do I have to modprobe rbd on the host?

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of NEVEU Stephane
Sent: Friday, 25 July 2014 13:45
To: ceph-users@lists.ceph.com
Subject: [ceph-users] RBD import format 1 & 2

Hi all,

One quick question about image formats 1 & 2: I've got an img.qcow2 and I want to convert it.

The first solution is:
qemu-img convert -f qcow2 -O rbd img.qcow2 rbd:mypool/myimage
As far as I understand, it will be converted into format 1, which is the default, so I won't be able to clone my image.

The second solution is to import it directly as format 2:
rbd import --image-format 2 img.qcow2 mypool/myimage
But in this case, when I start my VM, the VM's filesystem turns read-only with many buffer I/O errors on dm-0. I'm running Ubuntu 14.04 for both the KVM host and the VMs, so the kernel version is 3.13.0-30.

Any idea? Thanks
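The two-step process Bill describes would look roughly like this, reusing the names from the original post; whether --image-format is available depends on the rbd CLI version, so treat it as a sketch.

qemu-img convert -f qcow2 -O raw img.qcow2 img.raw   # qcow2 -> raw on local disk
rbd import --image-format 2 img.raw mypool/myimage   # import the raw file as a format 2 image
rbd info mypool/myimage                              # sanity check: should report format: 2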
Re: [ceph-users] centos6.4 + libvirt + qemu + rbd/ceph
I think the version of libvirt included with RHEL/CentOS supports RBD storage (but not pools), so outside of compiling a newer version, I'm not sure there is much else to be done aside from waiting for repo additions or newer versions of the distro.

Not sure what your scenario is, but this is the exact reason we switched our underlying virtualization infrastructure to Ubuntu. Their Cloud Archive PPA has updated packages for QEMU/KVM, libvirt, Open vSwitch, etc. that are backported for LTS releases, which is something I personally think RHEL is WAY behind the curve on (getting better with their RDO initiative, though). We didn't like consuming resources validating that updated builds of QEMU/libvirt weren't going to cause problems, and just allocated those resources to learning the Ubuntu environment.

As far as streamlining management on top of that, you have some options (outside of virt-manager, which has no native support for RBD, IIRC) like Proxmox (which is an entire solution, like ESXi/Hyper-V, using KVM) or something like OpenStack or OpenNebula (we use OpenNebula). Beats having to edit domains by hand. ;-)

----- Original Message -----
From: "Chris C"
To: "Dan van der Ster"
Cc: ceph-users@lists.ceph.com
Sent: Friday, December 6, 2013 10:37:03 AM
Subject: Re: [ceph-users] centos6.4 + libvirt + qemu + rbd/ceph

Dan, I found the thread, but it looks like another dead end :(

/Chris C

On Fri, Dec 6, 2013 at 4:46 AM, Dan van der Ster <d...@vanderster.com> wrote:

See the thread from a couple of days ago, "[ceph-users] qemu-kvm packages for centos".

On Thu, Dec 5, 2013 at 10:44 PM, Chris C <mazzy...@gmail.com> wrote:
> I've been working on getting this setup working. I have virtual machines
> working using RBD-based images by editing the domain directly.
>
> Is there any way to make the creation process better? We are hoping to be
> able to use a virsh pool using the RBD driver, but it appears that Red Hat has
> not compiled libvirt with RBD support.
>
> Thoughts?
>
> Thanks,
> /Chris C
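For anyone stuck "editing the domain directly", a rough sketch of attaching an RBD-backed disk to an existing domain with virsh follows. The pool/image, monitor address and domain name are hypothetical, and if cephx is enabled you also need a libvirt secret plus an <auth> element referencing it.

cat > rbd-disk.xml <<'EOF'
<disk type='network' device='disk'>
  <driver name='qemu' type='raw'/>
  <source protocol='rbd' name='libvirt-pool/vm01-disk'>
    <host name='10.0.0.1' port='6789'/>
  </source>
  <target dev='vdb' bus='virtio'/>
</disk>
EOF
virsh attach-device vm01 rbd-disk.xml --persistent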
Re: [ceph-users] Ceph performance with 8K blocks.
As Gregory mentioned, your 'dd' test looks to be reading from the cache (you are writing 8GB in and then reading that 8GB out, so the reads are all cached reads), so the performance is going to seem good. You can add 'oflag=direct' to your dd test to try to get a more accurate reading from it. RADOS performance, from what I've seen, is largely going to hinge on replica size and journal location. Are your journals on separate disks or on the same disk as the OSD? What is the replica size of your pool?

From: "Jason Villalta"
To: "Bill Campbell"
Cc: "Gregory Farnum", "ceph-users"
Sent: Tuesday, September 17, 2013 11:31:43 AM
Subject: Re: [ceph-users] Ceph performance with 8K blocks.

Thanks for your feedback, it is helpful. I may have been wrong about the default Windows block size. What would be the best tests to compare native performance of the SSD disks at 4K blocks vs Ceph performance with 4K blocks? It just seems there is a huge difference in the results.

On Tue, Sep 17, 2013 at 10:56 AM, Campbell, Bill <bcampb...@axcess-financial.com> wrote:

Windows' default (NTFS) is a 4K block. Are you changing the allocation unit to 8K as a default for your configuration?

From: "Gregory Farnum" <g...@inktank.com>
To: "Jason Villalta" <ja...@rubixnet.com>
Cc: ceph-users@lists.ceph.com
Sent: Tuesday, September 17, 2013 10:40:09 AM
Subject: Re: [ceph-users] Ceph performance with 8K blocks.

Your 8K-block dd test is not nearly the same as your 8K-block rados bench or SQL tests. Both rados bench and SQL require the write to be committed to disk before moving on to the next one; dd is simply writing into the page cache. So you're not going to get 460 or even 273 MB/s with sync 8K writes regardless of your settings.

However, I think you should be able to tune your OSDs into somewhat better numbers -- that rados bench is giving you ~300 IOPS on every OSD (with a small pipeline!), and an SSD-based daemon should be going faster. What kind of logging are you running with, and what configs have you set? Hopefully you can get Mark or Sam or somebody who's done some performance tuning to offer some tips as well. :)
-Greg

On Tuesday, September 17, 2013, Jason Villalta wrote:

Hello all, I am new to the list.

I have a single machine set up for testing Ceph. It has dual 6-core processors (12 cores total) and 128GB of RAM. I also have 3 Intel 520 240GB SSDs, with an OSD set up on each disk and the OSD and journal in separate partitions formatted with ext4.

My goal here is to prove just how fast Ceph can go and what kind of performance to expect when using it as back-end storage for virtual machines, mostly Windows. I would also like to try to understand how it will scale IO by removing one disk of the three and doing the benchmark tests, but that is secondary.

So far here are my results. I am aware this is all sequential; I just want to know how fast it can go.

DD IO test of SSD disks (I am testing 8K blocks since that is the default block size of Windows):
dd of=ddbenchfile if=/dev/zero bs=8K count=100
819200 bytes (8.2 GB) copied, 17.7953 s, 460 MB/s
dd if=ddbenchfile of=/dev/null bs=8K
819200 bytes (8.2 GB) copied, 2.94287 s, 2.8 GB/s

RADOS bench test with 3 SSD disks and 4MB object size (default):

rados --no-cleanup bench -p pbench 30 write
Total writes made: 2061
Write size: 4194304
Bandwidth (MB/sec): 273.004
Stddev Bandwidth: 67.5237
Max bandwidth (MB/sec): 352
Min bandwidth (MB/sec): 0
Average Latency: 0.234199
Stddev Latency: 0.130874
Max latency: 0.867119
Min latency: 0.039318

rados bench -p pbench 30 seq
Total reads made: 2061
Read size: 4194304
Bandwidth (MB/sec): 956.466
Average Latency: 0.0666347
Max latency: 0.208986
Min latency: 0.011625

This all looks like what I would expect from using three disks. The problems appear to come with the 8K blocks/object size.

RADOS bench test with 3 SSD disks and 8K object size (8K blocks):

rados --no-cleanup bench -b 8192 -p pbench 30 write
Total writes made: 13770
Write size: 8192
Bandwidth (MB/sec): 3.581
Stddev Bandwidth: 1.04405
Max bandwidth (MB/sec): 6.19531
Min bandwidth (MB/sec): 0
Average Latency: 0.0348977
Stddev Latency: 0.0349212
Max latency: 0.326429
Min latency: 0.0019

rados bench -b 8192 -p pbench 30 seq
Total reads made: 13770
Read size: 8192
Bandwidth (MB/sec): 52.573
Average Latency: 0.00237483
Max latency: 0.006783
Min latency: 0.000521

So are these performance numbers correct, or is there something I missed in the testing procedure? The RADOS bench numbers with 8K block size are the same as we see when testing performance in a VM with SQLIO. Does anyone know of any configuration changes that are needed to get Ceph performance closer to native performance with 8K blocks?
Re: [ceph-users] Ceph performance with 8K blocks.
Windows' default (NTFS) is a 4K block. Are you changing the allocation unit to 8K as a default for your configuration?

----- Original Message -----
From: "Gregory Farnum"
To: "Jason Villalta"
Cc: ceph-users@lists.ceph.com
Sent: Tuesday, September 17, 2013 10:40:09 AM
Subject: Re: [ceph-users] Ceph performance with 8K blocks.

Your 8K-block dd test is not nearly the same as your 8K-block rados bench or SQL tests. Both rados bench and SQL require the write to be committed to disk before moving on to the next one; dd is simply writing into the page cache. So you're not going to get 460 or even 273 MB/s with sync 8K writes regardless of your settings.

However, I think you should be able to tune your OSDs into somewhat better numbers -- that rados bench is giving you ~300 IOPS on every OSD (with a small pipeline!), and an SSD-based daemon should be going faster. What kind of logging are you running with, and what configs have you set? Hopefully you can get Mark or Sam or somebody who's done some performance tuning to offer some tips as well. :)
-Greg

On Tuesday, September 17, 2013, Jason Villalta wrote:

Hello all, I am new to the list.

I have a single machine set up for testing Ceph. It has dual 6-core processors (12 cores total) and 128GB of RAM. I also have 3 Intel 520 240GB SSDs, with an OSD set up on each disk and the OSD and journal in separate partitions formatted with ext4.

My goal here is to prove just how fast Ceph can go and what kind of performance to expect when using it as back-end storage for virtual machines, mostly Windows. I would also like to try to understand how it will scale IO by removing one disk of the three and doing the benchmark tests, but that is secondary.

So far here are my results. I am aware this is all sequential; I just want to know how fast it can go.

DD IO test of SSD disks (I am testing 8K blocks since that is the default block size of Windows):

dd of=ddbenchfile if=/dev/zero bs=8K count=100
819200 bytes (8.2 GB) copied, 17.7953 s, 460 MB/s
dd if=ddbenchfile of=/dev/null bs=8K
819200 bytes (8.2 GB) copied, 2.94287 s, 2.8 GB/s

RADOS bench test with 3 SSD disks and 4MB object size (default):

rados --no-cleanup bench -p pbench 30 write
Total writes made: 2061
Write size: 4194304
Bandwidth (MB/sec): 273.004
Stddev Bandwidth: 67.5237
Max bandwidth (MB/sec): 352
Min bandwidth (MB/sec): 0
Average Latency: 0.234199
Stddev Latency: 0.130874
Max latency: 0.867119
Min latency: 0.039318

rados bench -p pbench 30 seq
Total reads made: 2061
Read size: 4194304
Bandwidth (MB/sec): 956.466
Average Latency: 0.0666347
Max latency: 0.208986
Min latency: 0.011625

This all looks like what I would expect from using three disks. The problems appear to come with the 8K blocks/object size.

RADOS bench test with 3 SSD disks and 8K object size (8K blocks):

rados --no-cleanup bench -b 8192 -p pbench 30 write
Total writes made: 13770
Write size: 8192
Bandwidth (MB/sec): 3.581
Stddev Bandwidth: 1.04405
Max bandwidth (MB/sec): 6.19531
Min bandwidth (MB/sec): 0
Average Latency: 0.0348977
Stddev Latency: 0.0349212
Max latency: 0.326429
Min latency: 0.0019

rados bench -b 8192 -p pbench 30 seq
Total reads made: 13770
Read size: 8192
Bandwidth (MB/sec): 52.573
Average Latency: 0.00237483
Max latency: 0.006783
Min latency: 0.000521

So are these performance numbers correct, or is there something I missed in the testing procedure? The RADOS bench numbers with 8K block size are the same as we see when testing performance in a VM with SQLIO.
Does anyone know of any configuration changes that are needed to get Ceph performance closer to native performance with 8K blocks? Thanks in advance.

--
Jason Villalta
Co-founder
800.799.4407x1230 | www.RubixTechnology.com

--
Software Engineer #42 @ http://inktank.com | http://ceph.com
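To make Bill's and Greg's point concrete, the difference shows up as soon as dd is forced to commit each write instead of dumping into the page cache; something along these lines (the file name and count are arbitrary):

dd if=/dev/zero of=ddbenchfile bs=8K count=100000                      # buffered: mostly measures the page cache
dd if=/dev/zero of=ddbenchfile bs=8K count=100000 oflag=direct,dsync   # direct, per-write commit: closer to rados bench / SQL behavior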
[ceph-users] Web Management Interface
Hello,

I was wondering if there were any plans in the near future for some sort of web-based management interface for Ceph clusters?

Bill Campbell
Infrastructure Architect
Axcess Financial Services, Inc.
7755 Montgomery Rd., Suite 400
Cincinnati, OH 45236
Re: [ceph-users] Space available reported on Ceph file system
Yes, that is the TOTAL amount in the cluster. For example, if you have a replica size of '3', 81489 GB available, and you write 1 GB of data, then that data is written to the cluster 3 times, so your total available will drop to 81486 GB. It definitely threw me off at first, but seeing as you can have multiple pools with different replica sizes, it makes sense to report the TOTAL cluster availability rather than trying to calculate how much is available based on replica size.

-----Original Message-----
From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Marco Aroldi
Sent: Friday, March 15, 2013 3:49 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Space available reported on Ceph file system

Hi,

I have a test cluster of 80TB raw. My pools are using rep size = 2, so the real storage capacity is 40TB, but I see in the pgmap a total of 80TB available, and the CephFS mounted on a client reports 80TB available too. I would expect to see a "40TB available" somewhere. Is this behavior correct?

Thanks

pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 2880 pgp_num 2880 last_change 1 owner 0 crash_replay_interval 45

pgmap v796: 8640 pgs: 8640 active+clean; 8913 bytes data, 1770 MB used, 81489 GB / 81491 GB avail; 229B/s wr, 0op/s

root@client1 ~ $ df -h
Filesystem            Size  Used  Avail  Use%  Mounted on
192.168.21.12:6789:/  80T   1,8G  80T    1%    /mnt/ceph

--
Marco Aroldi
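As a rough rule of thumb, usable capacity is the raw figure divided by the pool's replica size (ignoring filesystem overhead and pools with different sizes). A quick sanity check with the numbers from this post:

echo "scale=1; 81491 / 2 / 1024" | bc   # roughly 40 TB usable out of ~80 TB raw with rep size = 2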