Re: [ceph-users] Proper Ceph network configuration

2015-10-23 Thread Campbell, Bill
Yes, that's correct. 

We use the public/cluster networks exclusively, so in the configuration we 
specify the MON addresses on the public network and define both the public 
and cluster network subnets. I've not tested it, but I wonder whether it's 
possible to have the MON addresses on a 1GbE network, define the 
public/cluster networks in the config, and still have things operate? 
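
Something along these lines is what I'm wondering about (an untested sketch, 
with placeholder addresses and subnets rather than anything from our actual 
config): 

[global]
    # MON addresses on the 1GbE management network (hypothetical)
    mon host = 192.168.1.11, 192.168.1.12, 192.168.1.13
    # client-facing and replication traffic on 10GbE (hypothetical subnets)
    public network = 10.0.10.0/24
    cluster network = 10.0.20.0/24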

- Original Message -

From: "Jon Heese"  
To: "Bill Campbell"  
Cc: ceph-users@lists.ceph.com 
Sent: Friday, October 23, 2015 10:03:46 AM 
Subject: RE: [ceph-users] Proper Ceph network configuration 



Bill, 



Thanks for the explanation – that helps a lot. In that case, I actually want 
the 10.174.1.0/24 network to be both my cluster and my public network, because 
I want all “heavy” data traffic to be on that network. And by “heavy”, I mean 
large volumes of data, both normal Ceph client traffic and OSD-to-OSD 
communication. Contrast this with the more “control plane” connections between 
the MONs and the OSDs, which we intend to route over the lighter-weight 
management network. 



The documentation seems to indicate that the MONs also communicate on the 
“public” network, but our MONs aren’t currently on that network (we were 
treating it as an OSD/Client network). I guess I need to put them on that 
network…? 
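
Roughly like this, I think (a sketch only; the MON addresses below are 
placeholders, not our real ones): 

public network = 10.174.1.0/24     # fat pipe: clients, MONs, and OSD front-side traffic
cluster network = 10.174.1.0/24    # fat pipe: OSD replication and heartbeats
mon host = 10.174.1.11, 10.174.1.12, 10.174.1.13   # MONs moved onto the fat pipe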



Thanks. 




Jon Heese 
Systems Engineer 
INetU Managed Hosting 
P: 610.266.7441 x 261 
F: 610.266.7434 
www.inetu.net 


** This message contains confidential information, which also may be 
privileged, and is intended only for the person(s) addressed above. Any 
unauthorized use, distribution, copying or disclosure of confidential and/or 
privileged information is strictly prohibited. If you have received this 
communication in error, please erase all copies of the message and its 
attachments and notify the sender immediately via reply e-mail. ** 


From: Campbell, Bill [mailto:bcampb...@axcess-financial.com] 
Sent: Friday, October 23, 2015 9:11 AM 
To: Jon Heese  
Cc: ceph-users@lists.ceph.com 
Subject: Re: [ceph-users] Proper Ceph network configuration 





The "public" network is where all storage accesses from other systems or 
clients will occur. When you map RBD's to other hosts, access object storage 
through the RGW, or CephFS access, you will access the data through the 
"public" network. The "cluster" network is where all internal replication 
between OSD processes will occur. As an example in our set up, we have a 10GbE 
public network for hypervisor nodes to access, along with a 10GbE cluster 
network for back-end replication/communication. Our 1GbE network is used for 
monitoring integration and system administration. 






From: "Jon Heese" < jhe...@inetu.net > 
To: ceph-users@lists.ceph.com 
Sent: Friday, October 23, 2015 8:58:28 AM 
Subject: [ceph-users] Proper Ceph network configuration 





Hello, 



We have two separate networks in our Ceph cluster design: 



10.197.5.0/24 - The "front end" network, "skinny pipe", all 1GbE, intended to 
be a management or control plane network 

10.174.1.0/24 - The "back end" network, "fat pipe", all OSD nodes use 2x bonded 
10GbE, intended to be the data network 



So we want all of the OSD traffic to go over the "back end", and the MON 
traffic to go over the "front end". We thought the following would do that: 



public network = 10.197.5.0/24 # skinny pipe, mgmt & MON traffic 

cluster network = 10.174.1.0/24 # fat pipe, OSD traffic 



But that doesn't seem to be the case -- iftop and netstat show that little/no 
OSD communication is happening over the 10.174.1 network and it's all happening 
over the 10.197.5 network. 



What configuration should we be running to enforce the networks per our design? 
Thanks! 



Jon Heese 
Systems Engineer 
INetU Managed Hosting 
P: 610.266.7441 x 261 
F: 610.266.7434 
www.inetu.net 

** This message contains confidential information, which also may be 
privileged, and is intended only for the person(s) addressed above. Any 
unauthorized use, distribution, copying or disclosure of confidential and/or 
privileged information is strictly prohibited. If you have received this 
communication in error, please erase all copies of the message and its 
attachments and notify the sender immediately via reply e-mail. ** 


___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 







NOTICE: Protect the information in this message in accordance with the 
company's security policies. If you received this message in error, immediately 
notify the sender and destroy all copies. 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Proper Ceph network configuration

2015-10-23 Thread Campbell, Bill
The "public" network is where all storage accesses from other systems or 
clients will occur. When you map RBD's to other hosts, access object storage 
through the RGW, or CephFS access, you will access the data through the 
"public" network. The "cluster" network is where all internal replication 
between OSD processes will occur. As an example in our set up, we have a 10GbE 
public network for hypervisor nodes to access, along with a 10GbE cluster 
network for back-end replication/communication. Our 1GbE network is used for 
monitoring integration and system administration. 

- Original Message -

From: "Jon Heese"  
To: ceph-users@lists.ceph.com 
Sent: Friday, October 23, 2015 8:58:28 AM 
Subject: [ceph-users] Proper Ceph network configuration 



Hello, 



We have two separate networks in our Ceph cluster design: 



10.197.5.0/24 - The "front end" network, "skinny pipe", all 1GbE, intended to 
be a management or control plane network 

10.174.1.0/24 - The "back end" network, "fat pipe", all OSD nodes use 2x bonded 
10GbE, intended to be the data network 



So we want all of the OSD traffic to go over the "back end", and the MON 
traffic to go over the "front end". We thought the following would do that: 



public network = 10.197.5.0/24 # skinny pipe, mgmt & MON traffic 

cluster network = 10.174.1.0/24 # fat pipe, OSD traffic 



But that doesn't seem to be the case -- iftop and netstat show that little/no 
OSD communication is happening over the 10.174.1 network and it's all happening 
over the 10.197.5 network. 



What configuration should we be running to enforce the networks per our design? 
Thanks! 



Jon Heese 
Systems Engineer 
INetU Managed Hosting 
P: 610.266.7441 x 261 
F: 610.266.7434 
www.inetu.net 

** This message contains confidential information, which also may be 
privileged, and is intended only for the person(s) addressed above. Any 
unauthorized use, distribution, copying or disclosure of confidential and/or 
privileged information is strictly prohibited. If you have received this 
communication in error, please erase all copies of the message and its 
attachments and notify the sender immediately via reply e-mail. ** 

___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 


NOTICE: Protect the information in this message in accordance with the 
company's security policies. If you received this message in error, immediately 
notify the sender and destroy all copies.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] any recommendation of using EnhanceIO?

2015-08-18 Thread Campbell, Bill
Hey Stefan, 
Are you using your Ceph cluster for virtualization storage? Is dm-writeboost 
configured on the OSD nodes themselves? 

- Original Message -

From: "Stefan Priebe - Profihost AG"  
To: "Mark Nelson" , ceph-users@lists.ceph.com 
Sent: Tuesday, August 18, 2015 7:36:10 AM 
Subject: Re: [ceph-users] any recommendation of using EnhanceIO? 

We're using an extra caching layer for ceph since the beginning for our 
older ceph deployments. All new deployments go with full SSDs. 

I've tested so far: 
- EnhanceIO 
- Flashcache 
- Bcache 
- dm-cache 
- dm-writeboost 

The best working solution was, and is, bcache, except for its buggy code. 
The current code in the 4.2-rc7 vanilla kernel still contains bugs, e.g. 
discards result in a crashed FS after reboots, and so on. But it's still 
the fastest for Ceph. 

The 2nd best solution which we already use in production is 
dm-writeboost (https://github.com/akiradeveloper/dm-writeboost). 

Everything else is too slow. 

Stefan 
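
For anyone who wants to try it, a rough sketch of how bcache typically gets 
wired in front of an OSD data disk (device names here are placeholders, not 
our production layout): 

# SSD as the cache device, spinner as the backing device
make-bcache -C /dev/sdc
make-bcache -B /dev/sdb
# attach the backing device to the cache set (UUID comes from bcache-super-show)
bcache-super-show /dev/sdc | grep cset.uuid
echo <cset-uuid> > /sys/block/bcache0/bcache/attach
# the OSD filesystem then goes on the combined device
mkfs.xfs /dev/bcache0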
On 18.08.2015 at 13:33, Mark Nelson wrote: 
> Hi Jan, 
> 
> Out of curiosity did you ever try dm-cache? I've been meaning to give 
> it a spin but haven't had the spare cycles. 
> 
> Mark 
> 
> On 08/18/2015 04:00 AM, Jan Schermer wrote: 
>> I already evaluated EnhanceIO in combination with CentOS 6 (and 
>> backported 3.10 and 4.0 kernel-lt if I remember correctly). 
>> It worked fine during benchmarks and stress tests, but once we ran DB2 
>> on it, it panicked within minutes and took all the data with it (almost 
>> literally - files that weren't touched, like OS binaries, were b0rked, 
>> and the filesystem was unsalvageable). 
>> If you disregard this warning - the performance gains weren't that 
>> great either, at least in a VM. It had problems when flushing to disk 
>> after reaching the dirty watermark, and the block size has some 
>> not-well-documented implications (not sure now, but I think it only 
>> cached IO _larger_ than the block size, so if your database keeps 
>> incrementing an XX-byte counter it will go straight to disk). 
>> 
>> Flashcache doesn't respect barriers (or does it now?) - if that's OK 
>> for you then go for it; it should be stable, and I used it in the past 
>> in production without problems. 
>> 
>> bcache seemed to work fine, but I needed to 
>> a) use it for root 
>> b) disable and enable it on the fly (doh) 
>> c) make it non-persistent (flush it) before reboot - not sure if that 
>> was possible either. 
>> d) all that in a customer's VM, and that customer didn't have a strong 
>> technical background to be able to fiddle with it... 
>> So I haven't tested it heavily. 
>> 
>> Bcache should be the obvious choice if you are in control of the 
>> environment. At least you can cry on LKML's shoulder when you lose 
>> data :-) 
>> 
>> Jan 
>> 
>> 
>>> On 18 Aug 2015, at 01:49, Alex Gorbachev  wrote: 
>>> 
>>> What about https://github.com/Frontier314/EnhanceIO? Last commit 2 
>>> months ago, but no external contributors :( 
>>> 
>>> The nice thing about EnhanceIO is there is no need to change device 
>>> name, unlike bcache, flashcache etc. 
>>> 
>>> Best regards, 
>>> Alex 
>>> 
>>> On Thu, Jul 23, 2015 at 11:02 AM, Daniel Gryniewicz  
>>> wrote: 
 I did some (non-ceph) work on these, and concluded that bcache was 
 the best 
 supported, most stable, and fastest. This was ~1 year ago, so take it 
 with a grain of salt, but that's what I would recommend. 
 
 Daniel 
 
 
  
 From: "Dominik Zalewski"  
 To: "German Anders"  
 Cc: "ceph-users"  
 Sent: Wednesday, July 1, 2015 5:28:10 PM 
 Subject: Re: [ceph-users] any recommendation of using EnhanceIO? 
 
 
 Hi, 
 
 I’ve asked the same question in the last week or so (just search the mailing 
 list archives for EnhanceIO :) and got some interesting answers. 
 
 Looks like the project is pretty much dead since it was bought out 
 by HGST. 
 Even their website has some broken links with regard to EnhanceIO. 
 
 I’m keen to try flashcache or bcache (it’s been in the mainline 
 kernel for some time). 
 
 Dominik 
 
 On 1 Jul 2015, at 21:13, German Anders  wrote: 
 
 Hi cephers, 
 
 Is anyone out there that implement enhanceIO in a production 
 environment? 
 any recommendation? any perf output to share with the diff between 
 using it 
 and not? 
 
 Thanks in advance, 
 
 German 
 ___ 
 ceph-users mailing list 
 ceph-users@lists.ceph.com 
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
 
 
 
 ___ 
 ceph-users mailing list 
 ceph-users@lists.ceph.com 
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
 
 
 ___ 
 ceph-users mailing list 
 ceph-users@lists.ceph.com 

Re: [ceph-users] CEPH RBD with ESXi

2015-07-20 Thread Campbell, Bill
I don't have much of the details (our engineering group handled most of the 
testing), however we currently have 10 Dell PowerEdge R720xd systems, each with 
24x 600GB 10k SAS OSDs (each system has a RAID controller with 2GB NVRAM; in 
testing, performance was better with this than with 6 SSD drives for journals). 
The cluster is configured with public/private networks, both on 10GbE networks. 
The NAS systems (there are 2 in Active/Passive mode) are connected to the 10GbE 
public network, along with the VMware hypervisor nodes. Performance is 
acceptable (nothing earth shattering, latency can be a concern during peak I/O 
periods, particularly backups) but we have a relatively small VMware 
environment, primarily for legacy application systems that either aren't 
supported on, or that we're afraid to move to, our larger private cloud 
infrastructure (which also uses Ceph, but with direct access via QEMU+KVM). The iSCSI testing was 
about 2 years ago, I believe testing was done against Cuttlefish and we were 
using tgtd for the target. I'm sure there have been enhancements in both 
stability and performance since then, we've just not gotten around to 
evaluating or changing it, as what we have is working well for us (we have 
mixed workloads, but generally hover around 500-800 active IOPS during the day, 
with peaks to 2-3k during off-hour maintenance times). We've been running for 
about 1.5 years with this setup, and no major issues. 
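
For reference, a rough sketch (from memory, with placeholder names) of the 
kind of tgt export we were testing back then; it assumes a tgt build with the 
rbd backing store compiled in: 

# create an iSCSI target and attach an RBD image as LUN 1
tgtadm --lld iscsi --mode target --op new --tid 1 --targetname iqn.2013-01.com.example:rbd-test
tgtadm --lld iscsi --mode logicalunit --op new --tid 1 --lun 1 --bstype rbd --backing-store rbd/test-image
tgtadm --lld iscsi --mode target --op bind --tid 1 --initiator-address ALL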

- Original Message -

From: "Nikhil Mitra (nikmitra)"  
To: "Bill Campbell"  
Cc: ceph-users@lists.ceph.com 
Sent: Monday, July 20, 2015 3:05:25 PM 
Subject: Re: [ceph-users] CEPH RBD with ESXi 

Hi Bill, 

Would you be kind enough to share what your setup looks like today, as we are 
planning to use ESXi backed by Ceph storage? When you tested iSCSI, what 
were the issues you noticed? What version of Ceph were you running then? What 
iSCSI software did you use for the setup? 

Regards, 
Nikhil Mitra 


From: "Campbell, Bill" < bcampb...@axcess-financial.com > 
Reply-To: "Campbell, Bill" < bcampb...@axcess-financial.com > 
Date: Monday, July 20, 2015 at 11:52 AM 
To: Nikhil Mitra < nikmi...@cisco.com > 
Cc: " ceph-users@lists.ceph.com " < ceph-users@lists.ceph.com > 
Subject: Re: [ceph-users] CEPH RBD with ESXi 

We use VMware with Ceph, however we don't use RBD directly (we have an NFS 
server which has RBD volumes exported as datastores in VMware). We did attempt 
iSCSI with RBD to connect to VMware but ran into stability issues (could have 
been the target software we were using) but have found NFS to be pretty 
reliable. 

- Original Message -

From: "Nikhil Mitra (nikmitra)" < nikmi...@cisco.com > 
To: ceph-users@lists.ceph.com 
Sent: Monday, July 20, 2015 2:07:13 PM 
Subject: [ceph-users] CEPH RBD with ESXi 

Hi, 

Has anyone implemented Ceph RBD with the VMware ESXi hypervisor? Just looking 
to use it as a native VMFS datastore to host VMDKs. Please let me know if 
there are any documents out there that might point me in the right direction to 
get started on this. Thank you. 

Regards, 
Nikhil Mitra 


___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 


NOTICE: Protect the information in this message in accordance with the 
company's security policies. If you received this message in error, immediately 
notify the sender and destroy all copies.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH RBD with ESXi

2015-07-20 Thread Campbell, Bill
We use VMware with Ceph, however we don't use RBD directly (we have an NFS 
server which has RBD volumes exported as datastores in VMware). We did attempt 
iSCSI with RBD to connect to VMware but ran into stability issues (could have 
been the target software we were using) but have found NFS to be pretty 
reliable. 
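
For what it's worth, a rough sketch of how an RBD-backed NFS export can be 
wired up on the NFS server (image, paths, and subnet below are placeholders, 
not our actual configuration): 

rbd map mypool/esx-datastore01        # shows up as e.g. /dev/rbd0
mkfs.xfs /dev/rbd0                    # first time only
mkdir -p /exports/esx-datastore01
mount /dev/rbd0 /exports/esx-datastore01
echo '/exports/esx-datastore01 10.0.0.0/24(rw,no_root_squash,sync)' >> /etc/exports
exportfs -ra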

- Original Message -

From: "Nikhil Mitra (nikmitra)"  
To: ceph-users@lists.ceph.com 
Sent: Monday, July 20, 2015 2:07:13 PM 
Subject: [ceph-users] CEPH RBD with ESXi 

Hi, 

Has anyone implemented Ceph RBD with the VMware ESXi hypervisor? Just looking 
to use it as a native VMFS datastore to host VMDKs. Please let me know if 
there are any documents out there that might point me in the right direction to 
get started on this. Thank you. 

Regards, 
Nikhil Mitra 


___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 


NOTICE: Protect the information in this message in accordance with the 
company's security policies. If you received this message in error, immediately 
notify the sender and destroy all copies.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Concurrency in ceph

2014-11-18 Thread Campbell, Bill
I can't speak for OpenStack, but OpenNebula uses Libvirt/QEMU/KVM to access an 
RBD directly for each virtual instance deployed, live-migration included (as 
each RBD is in and of itself a separate block device, not file system). I would 
imagine OpenStack works in a similar fashion. 

- Original Message -

From: "hp cre"  
To: "Gregory Farnum"  
Cc: ceph-users@lists.ceph.com 
Sent: Tuesday, November 18, 2014 4:43:07 PM 
Subject: Re: [ceph-users] Concurrency in ceph 



Ok thanks Greg. 
But what OpenStack does, AFAIU, is use RBD devices directly, one for each VM 
instance, right? And that's how it supports live migrations on KVM, etc., 
right? OpenStack and similar cloud frameworks don't need to create VM instances 
on filesystems, am I correct? 
On 18 Nov 2014 23:33, "Gregory Farnum" < g...@gregs42.com > wrote: 


On Tue, Nov 18, 2014 at 1:26 PM, hp cre < hpc...@gmail.com > wrote: 
> Hello everyone, 
> 
> I'm new to Ceph but have been working with proprietary clustered filesystems 
> for quite some time. 
> 
> I almost understand how ceph works, but have a couple of questions which 
> have been asked before here, but i didn't understand the answer. 
> 
> In the closed source world, we use clustered filesystems like Veritas 
> clustered filesystem to mount a shared block device (using San) to more than 
> one compute node concurrently for shared read/write. 
> 
> What I can't seem to get a solid and clear answer for is this: 
> How can I use ceph to do the same thing? Can RADOS guarantee coherency and 
> integrity of my data if I use an rbd device with any filesystem on top of 
> it? Or must I still use a cluster aware filesystem such as vxfs or ocfs? 

RBD behaves just like a regular disk if you mount it to multiple nodes 
at once (although you need to disable the client caching). This means 
that the disk accesses will be coherent, but using ext4 on top of it 
won't work because ext4 assumes it is the only accessor — you have to 
use a cluster-aware FS like ocfs2. A SAN would have the same problem 
here, so I'm not sure why you think it works with them... 
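
To make that concrete, a minimal sketch (image name and mount point are made 
up, and it assumes the ocfs2/o2cb cluster stack is already configured across 
the nodes): 

# librbd clients (e.g. qemu) need caching off for coherent shared access;
# the kernel client used by 'rbd map' does not use the librbd cache
[client]
    rbd cache = false

# on each node that will share the image
rbd map mypool/shared-vol             # appears as e.g. /dev/rbd0
mkfs.ocfs2 -N 4 /dev/rbd0             # run once, from a single node
mount -t ocfs2 /dev/rbd0 /mnt/shared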


> And is CephFS going to solve this problem? Or does it not have support for 
> concurrent read/write access among all nodes mounting it? 

CephFS definitely does support concurrent access to the same data. 

> And, do iSCSI targets over RBD devices behave the same? 

Uh, yes, iSCSI over rbd will be the same as regular RBD in this 
regard, modulo anything the iSCSI gateway might be set up to do. 
-Greg 




___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RBD import format 1 & 2

2014-07-25 Thread Campbell, Bill
When you run qemu-img you are essentially converting the qcow2 image to
the appropriate raw format during the conversion and import process to the
cluster.  When you use rbd import you are not doing a conversion, so the
image is being imported AS IS (you can validate this by looking at the
size of the image after importing).  In order to get to format 2 initially
you may need to convert the qcow2 to raw first, then import.
Unfortunately I don’t think qemu-img supports outputting to stdout, so
this will have to be a two-step process.
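
A hedged sketch of that two-step route (file and pool names are just examples):

# step 1: convert the qcow2 image to a raw file
qemu-img convert -f qcow2 -O raw img.qcow2 img.raw
# step 2: import the raw file as a format 2 RBD image
rbd import --image-format 2 img.raw mypool/myimage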



From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
NEVEU Stephane
Sent: Friday, July 25, 2014 8:57 AM
To: NEVEU Stephane; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] RBD import format 1 & 2



I finally reconverted my only “format 1” image into format 2, so now
everything is in format 2, but I’m still confused: my VM disks are still
read-only (I’ve tried different images, CentOS 6.5 with kernel 2.6.32 and
Ubuntu with 3.13). Do I have to modprobe rbd on the host?







From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
NEVEU Stephane
Sent: Friday, July 25, 2014 1:45 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] RBD import format 1 & 2





Hi all,



One quick question about image format 1 & 2 :



I’ve got a img.qcow2 and I want to convert it :



The first solution is: qemu-img convert -f qcow2 -O rbd img.qcow2
rbd:/mypool/myimage



As far as I understood, it will be converted into format 1, which is the
default one, so I won’t be able to clone my image.



Second solution is to import it directly into format 2 :

rbd import --image-format 2 img.qcow2 mypool/myimage



But in this case, when I start my VM, my VM's filesystem turns read-only
with many buffer I/O errors on dm-0.



I’m running Ubuntu 14.04 for both kvm host and VMs so kernel version is
3.13.0-30



Any idea ?

Thx


NOTICE: Protect the information in this message in accordance with the 
company's security policies. If you received this message in error, immediately 
notify the sender and destroy all copies.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] centos6.4 + libvirt + qemu + rbd/ceph

2013-12-06 Thread Campbell, Bill
I think the version of Libvirt included with RHEL/CentOS supports RBD storage 
(but not pools), so outside of compiling a newer version, I'm not sure there's 
anything else that can be done aside from waiting for repo additions/newer 
versions of the distro. 

Not sure what your scenario is, but this is the exact reason we switched our 
underlying virtualization infrastructure to Ubuntu. Their cloud archive PPA has 
updated packages for QEMU/KVM, Libvirt, Open vSwitch, etc. that are backported 
for LTS releases, and is something I personally think RHEL is WAY behind the 
curve on (getting better with their RDO initiative though). We didn't like 
consuming resources validating that updated builds of QEMU/Libvirt weren't 
going to cause problems, and just allocated those resources to learning the 
Ubuntu environment. 

As far as streamlining management on top of that, you have some options 
(outside of virt-manager, which has no native support for RBD IIRC) like 
Proxmox (which is an entire solution like ESXi/Hyper-V using KVM) or something 
like OpenStack or OpenNebula (we use OpenNebula). Beats having to edit domains 
by hand. ;-) 
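
For anyone who does want to wire it up by hand, a libvirt disk stanza for an 
RBD-backed image looks roughly like this (pool, image, monitor address, and 
secret UUID are all placeholders; drop the auth block if cephx is disabled): 

cat > rbd-disk.xml <<'EOF'
<disk type='network' device='disk'>
  <driver name='qemu' type='raw'/>
  <source protocol='rbd' name='mypool/myimage'>
    <host name='192.168.1.11' port='6789'/>
  </source>
  <auth username='libvirt'>
    <secret type='ceph' uuid='00000000-0000-0000-0000-000000000000'/>
  </auth>
  <target dev='vdb' bus='virtio'/>
</disk>
EOF
virsh attach-device myvm rbd-disk.xml --persistent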

- Original Message -

From: "Chris C"  
To: "Dan van der Ster"  
Cc: ceph-users@lists.ceph.com 
Sent: Friday, December 6, 2013 10:37:03 AM 
Subject: Re: [ceph-users] centos6.4 + libvirt + qemu + rbd/ceph 

Dan, 
I found the thread but it looks like another dead end :( 

/Chris C 


On Fri, Dec 6, 2013 at 4:46 AM, Dan van der Ster < d...@vanderster.com > wrote: 


See thread a couple days ago "[ceph-users] qemu-kvm packages for centos" 

On Thu, Dec 5, 2013 at 10:44 PM, Chris C < mazzy...@gmail.com > wrote: 
> I've been working on getting this setup working. I have virtual machines 
> working using rbd based images by editing the domain directly. 
> 
> Is there any way to make the creation process better? We are hoping to be 
> able to use a virsh pool using the rbd driver but it appears that Redhat has 
> not compiled libvirt with rbd support. 
> 
> Thought? 
> 
> Thanks, 
> /Chris C 
> 
> ___ 
> ceph-users mailing list 
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> 





___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 


NOTICE: Protect the information in this message in accordance with the 
company's security policies. If you received this message in error, immediately 
notify the sender and destroy all copies.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph performance with 8K blocks.

2013-09-17 Thread Campbell, Bill
As Gregory mentioned, your 'dd' test looks to be reading from the cache (you are 
writing 8 GB in and then reading that 8 GB out, so the reads are all cached 
reads), so the performance is going to seem good. You can add 'oflag=direct' to 
your dd test to try to get a more accurate reading from that. RADOS performance, 
from what I've seen, is largely going to hinge on replica size and journal 
location. Are your journals on separate disks or on the same disk as the OSD? 
What is the replica size of your pool? 

- Original Message -

From: "Jason Villalta" 
To: "Bill Campbell" 
Cc: "Gregory Farnum" , "ceph-users" 
Sent: Tuesday, September 17, 2013 11:31:43 AM 
Subject: Re: [ceph-users] Ceph performance with 8K blocks. 

Thanks for your feedback, it is helpful. I may have been wrong about the default 
Windows block size. What would be the best tests to compare native performance 
of the SSD disks at 4K blocks vs. Ceph performance with 4K blocks? It just seems 
there is a huge difference in the results. 

On Tue, Sep 17, 2013 at 10:56 AM, Campbell, Bill <bcampb...@axcess-financial.com> wrote: 

Windows default (NTFS) is a 4k block. Are you changing the allocation unit to 
8k as a default for your configuration? 

[The rest of the quoted thread is identical to the message below and has been trimmed.] 
Re: [ceph-users] Ceph performance with 8K blocks.

2013-09-17 Thread Campbell, Bill
Windows default (NTFS) is a 4k block. Are you changing the allocation unit to 
8k as a default for your configuration? 

- Original Message -

From: "Gregory Farnum"  
To: "Jason Villalta"  
Cc: ceph-users@lists.ceph.com 
Sent: Tuesday, September 17, 2013 10:40:09 AM 
Subject: Re: [ceph-users] Ceph performance with 8K blocks. 

Your 8k-block dd test is not nearly the same as your 8k-block rados bench or 
SQL tests. Both rados bench and SQL require the write to be committed to disk 
before moving on to the next one; dd is simply writing into the page cache. So 
you're not going to get 460 or even 273MB/s with sync 8k writes regardless of 
your settings. 
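
A sketch of dd invocations that get closer to sync behavior (file name and 
counts are arbitrary; flag support depends on your coreutils/kernel): 

# write with direct I/O, bypassing the page cache
dd if=/dev/zero of=ddbenchfile bs=8K count=1000000 oflag=direct
# or force each write to be committed before the next one
dd if=/dev/zero of=ddbenchfile bs=8K count=1000000 oflag=dsync
# drop caches before re-reading so the read test isn't served from RAM
echo 3 > /proc/sys/vm/drop_caches
dd if=ddbenchfile of=/dev/null bs=8K iflag=direct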

However, I think you should be able to tune your OSDs into somewhat better 
numbers -- that rados bench is giving you ~300IOPs on every OSD (with a small 
pipeline!), and an SSD-based daemon should be going faster. What kind of 
logging are you running with and what configs have you set? 

Hopefully you can get Mark or Sam or somebody who's done some performance 
tuning to offer some tips as well. :) 
-Greg 

On Tuesday, September 17, 2013, Jason Villalta wrote: 



Hello all, 
I am new to the list. 

I have a single machine set up for testing Ceph. It has dual 6-core 
processors (12 cores total) and 128GB of RAM. I also have 3 Intel 520 240GB 
SSDs and an OSD setup on each disk with the OSD and Journal in separate 
partitions formatted with ext4. 

My goal here is to prove just how fast Ceph can go and what kind of performance 
to expect when using it as back-end storage for virtual machines, mostly 
Windows. I would also like to try to understand how it will scale IO by 
removing one disk of the three and doing the benchmark tests. But that is 
secondary. So far here are my results. I am aware this is all sequential, I 
just want to know how fast it can go. 

DD IO test of SSD disks: I am testing 8K blocks since that is the default block 
size of windows. 
dd of=ddbenchfile if=/dev/zero bs=8K count=1000000 
8192000000 bytes (8.2 GB) copied, 17.7953 s, 460 MB/s 

dd if=ddbenchfile of=/dev/null bs=8K 
8192000000 bytes (8.2 GB) copied, 2.94287 s, 2.8 GB/s 

RADOS bench test with 3 SSD disks and 4MB object size(Default): 
rados --no-cleanup bench -p pbench 30 write 
Total writes made: 2061 
Write size: 4194304 
Bandwidth (MB/sec): 273.004 

Stddev Bandwidth: 67.5237 
Max bandwidth (MB/sec): 352 
Min bandwidth (MB/sec): 0 
Average Latency: 0.234199 
Stddev Latency: 0.130874 
Max latency: 0.867119 
Min latency: 0.039318 
- 
rados bench -p pbench 30 seq 
Total reads made: 2061 
Read size: 4194304 
Bandwidth (MB/sec): 956.466 

Average Latency: 0.0666347 
Max latency: 0.208986 
Min latency: 0.011625 

This all looks like I would expect from using three disks. The problems appear 
to come with the 8K blocks/object size. 

RADOS bench test with 3 SSD disks and 8K object size(8K blocks): 
rados --no-cleanup bench -b 8192 -p pbench 30 write 
Total writes made: 13770 
Write size: 8192 
Bandwidth (MB/sec): 3.581 

Stddev Bandwidth: 1.04405 
Max bandwidth (MB/sec): 6.19531 
Min bandwidth (MB/sec): 0 
Average Latency: 0.0348977 
Stddev Latency: 0.0349212 
Max latency: 0.326429 
Min latency: 0.0019 
-- 
rados bench -b 8192 -p pbench 30 seq 
Total reads made: 13770 
Read size: 8192 
Bandwidth (MB/sec): 52.573 

Average Latency: 0.00237483 
Max latency: 0.006783 
Min latency: 0.000521 

So are these performance correct or is this something I missed with the testing 
procedure? The RADOS bench number with 8K block size are the same we see when 
testing performance in an VM with SQLIO. Does anyone know of any configure 
changes that are needed to get the Ceph performance closer to native 
performance with 8K blocks? 

Thanks in advance. 



-- 
-- 
Jason Villalta 
Co-founder 
800.799.4407x1230 | www.RubixTechnology.com 





-- 
Software Engineer #42 @ http://inktank.com | http://ceph.com 

___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 


NOTICE: Protect the information in this message in accordance with the 
company's security policies. If you received this message in error, immediately 
notify the sender and destroy all copies.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Web Management Interface

2013-05-14 Thread Campbell, Bill
Hello,
I was wondering if there were any plans in the near future for some sort of 
Web-based management interface for Ceph clusters?


Bill Campbell 
Infrastructure Architect 

Axcess Financial Services, Inc. 
7755 Montgomery Rd., Suite 400 
Cincinnati, OH 45236 

NOTICE: Protect the information in this message in accordance with the 
company's security policies. If you received this message in error, immediately 
notify the sender and destroy all copies.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Space available reported on Ceph file system

2013-03-15 Thread Campbell, Bill
Yes, that is the TOTAL amount in the cluster.

For example, if you have a replica size of '3', 81489 GB available, and
you write 1 GB of data, then that data is written to the cluster 3 times,
so your total available will be 81486 GB.  It definitely threw me off at
first, but seeing as you can have multiple pools with different replica
sizes it makes sense to report the TOTAL cluster availability, rather than
trying to calculate how much is available based on replica size.
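
A quick way to sanity-check this against a running cluster (a sketch; the pool 
line format matches the 'ceph osd dump' output quoted below, and the numbers 
are the original poster's): 

ceph osd dump | grep 'rep size'
# -> pool 0 'data' rep size 2 ...
# usable capacity for unique data ~= raw total / replica size,
# e.g. 81491 GB raw / 2 replicas ~= 40745 GB of unique data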

-Original Message-
From: ceph-users-boun...@lists.ceph.com
[mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Marco Aroldi
Sent: Friday, March 15, 2013 3:49 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Space available reported on Ceph file system

Hi,
I have a test cluster of 80Tb raw.
My pools are using rep size = 2, so the real storage capacity is 40Tb, but
I see in pgmap a total of 80Tb available, and the cephfs mounted on a
client reports 80Tb available too. I would expect to see a "40Tb
available" somewhere.

Is this behavior correct?
Thanks

pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num
2880 pgp_num 2880 last_change 1 owner 0 crash_replay_interval 45

pgmap v796: 8640 pgs: 8640 active+clean; 8913 bytes data, 1770 MB used,
81489 GB / 81491 GB avail; 229B/s wr, 0op/s

root@client1 ~ $ df -h
Filesystem  Size  Used Avail Use% Mounted on
192.168.21.12:6789:/   80T  1,8G 80T   1% /mnt/ceph

--
Marco Aroldi
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
NOTICE: Protect the information in this message in accordance with the 
company's security policies. If you received this message in error, immediately 
notify the sender and destroy all copies.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com