OSD Weights
Hi Everyone,

I just wanted to confirm my understanding of the Ceph OSD weights. As I understand it, they are a relative statistical distribution factor, not a capacity in bytes. My current setup has 3TB hard drives, and they all have the default weight of 1. I was thinking that if I mixed in 4TB hard drives in the future at the same weight, CRUSH would only put about 3TB of data on them. I thought that if I changed the weight to 3 for the 3TB hard drives and 4 for the 4TB hard drives, it would correctly use the larger storage disks. Is that correct?

Thanks,
Chris

NOTICE: This e-mail and any attachments is intended only for use by the addressee(s) named herein and may contain legally privileged, proprietary or confidential information. If you are not the intended recipient of this e-mail, you are hereby notified that any dissemination, distribution or copying of this email, and any attachments thereto, is strictly prohibited. If you receive this email in error please immediately notify me via reply email or at (800) 927-9800 and permanently delete the original copy and any copy of any e-mail, and any printout.

-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
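For reference, CRUSH placement is proportional to weight, and the common convention is to set each OSD's weight to its capacity in TB (3.0 for 3TB, 4.0 for 4TB). A minimal sketch of the resulting data shares, assuming a hypothetical four-OSD cluster (the osd id in the comment is made up):

```shell
# CRUSH spreads data in proportion to weight. Expected shares for a
# hypothetical cluster of three 3 TB OSDs (weight 3) and one 4 TB OSD (weight 4):
weights="3 3 3 4"
total=0
for w in $weights; do total=$((total + w)); done
for w in $weights; do
    # each OSD's expected fraction of the cluster's data
    awk -v w="$w" -v t="$total" \
        'BEGIN { printf "weight %s -> %.1f%% of the data\n", w, 100 * w / t }'
done
# On a live cluster the weight would be changed with, e.g.:
#   ceph osd crush reweight osd.3 4.0
```

So the 4TB drive ends up with roughly 4/3 as much data as each 3TB drive, which is exactly the behavior Chris is after.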
RadosGW Quota
Does the RadosGW have the ability to limit how much data clients (users) can upload to it? I'm looking for a way to implement quotas, in case someone decides to upload the world onto my cluster and break it.

-Chris
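For anyone finding this thread later: early radosgw had no quota enforcement, but later releases added `quota` subcommands to radosgw-admin. A hedged sketch, assuming a release with quota support and a hypothetical user id `someuser`:

```shell
# Cap a single user at 1 TB (sizes are given in KB); the uid is hypothetical.
radosgw-admin quota set --quota-scope=user --uid=someuser --max-size-kb=1073741824

# Setting and enabling are separate steps:
radosgw-admin quota enable --quota-scope=user --uid=someuser

# Verify in the user's metadata:
radosgw-admin user info --uid=someuser
```

These commands need a live gateway and admin keyring, so they are shown here only as a pointer, not a tested recipe.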
RE: questions on networks and hardware
Hi John,

I have the public/cluster network options set up in my config file. You do not need to also specify an addr for each OSD individually. Here's an example of my working config:

[global]
auth cluster required = none
auth service required = none
auth client required = none
public network = 172.20.41.0/25
cluster network = 172.20.41.128/25
osd mkfs type = xfs

[osd]
osd journal size = 1000
filestore max sync interval = 30

[mon.a]
host = plcephd01
mon addr = 172.20.41.4:6789

[mon.b]
host = plcephd03
mon addr = 172.20.41.6:6789

[mon.c]
host = plcephd05
mon addr = 172.20.41.8:6789

[osd.0]
host = plcephd01
devs = /dev/sda3

[osd.X]
and so on...

-Chris

-----Original Message-----
From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of John Nielsen
Sent: Friday, January 18, 2013 6:35 PM
To: ceph-devel@vger.kernel.org
Subject: questions on networks and hardware

I'm planning a Ceph deployment which will include:

- 10Gbit/s public/client network
- 10Gbit/s cluster network
- dedicated mon hosts (3 to start)
- dedicated storage hosts (multiple disks, one XFS filesystem and OSD per disk, 3-5 to start)
- dedicated RADOS gateway host (1 to start)

I've done some initial testing and read through most of the docs, but I still have a few questions. Please respond even if you just have a suggestion or response for one of them.

If I have "cluster network" and "public network" entries under [global] or [osd], do I still need to specify "public addr" and "cluster addr" for each OSD individually?

Which network(s) should the monitor hosts be on? If both, is it valid to have more than one "mon addr" entry per mon host, or is there a different way to do it? Is it worthwhile to have 10G NICs on the monitor hosts? (The storage hosts will each have 2x 10Gbit/s NICs.)

I'd like to have 2x 10Gbit/s NICs on the gateway host and maximize throughput. Any suggestions on how to best do that?
I'm assuming it will talk to the OSDs on the Ceph public/client network, so does that imply a third, even-more-public network for the gateway's clients?

I think this has come up before, but has anyone written up something with more details on setting up gateways? Hardware recommendations, strategies to improve caching and performance, multiple gateway setups with and without a load balancer, etc.

Thanks!

JN
RE: Slow requests
I heard the solution for this was to restart the OSDs. That fixed it for me.

-Chris

-----Original Message-----
From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Jens Kristian Søgaard
Sent: Sunday, December 16, 2012 9:00 AM
To: ceph-devel@vger.kernel.org
Subject: Slow requests

Hi,

My log is filling up with warnings about a single slow request that has been around for a very long time:

osd.1 10.0.0.2:6800/900 162926 : [WRN] 1 slow requests, 1 included below; oldest blocked for > 84446.312051 secs
osd.1 10.0.0.2:6800/900 162927 : [WRN] slow request 84446.312051 seconds old, received at 2012-12-15 15:27:56.891437: osd_sub_op(client.4528.0:19602219 0.fe 3807b5fe/rb.0.11b7.4a933baa.0008629e/head//0 [] v 53'185888 snapset=0=[]:[] snapc=0=[]) v7 currently started

How can I identify the cause of this, and how can I cancel this request? I'm running Ceph on Fedora 17 using the latest RPMs available from ceph.com (0.52-6).

Thanks in advance,

-- 
Jens Kristian Søgaard, Mermaid Consulting ApS, j...@mermaidconsulting.dk, http://www.mermaidconsulting.com/
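Before resorting to a restart, the stuck op can usually be inspected through the OSD's admin socket on reasonably recent releases. A hedged sketch; the socket path below is the default-install assumption and may differ per distro:

```shell
# List in-flight ops on osd.1 via its admin socket (path is an assumption
# for a default install; check /var/run/ceph/ on your nodes).
ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok dump_ops_in_flight

# If the op never completes, restarting that OSD clears it; with sysvinit:
service ceph restart osd.1
```

Both commands need a running cluster node, so this is a pointer rather than a tested recipe.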
RE: New Project with Ceph
iSCSI won't be used in this project; I'll be using fibre channel. I would think that the VMware initiator is the one that needs to be concerned about multipathing. As long as it is aware there are two paths, it won't write to both of them at the same time. I'll find out shortly when I rack the second proxy machine and see how it performs when I kill the network to one of them. I'm reading through Pacemaker like Sebastien suggested, and I agree that is the way to go for getting rbd images to persist past reboot.

Thanks,
Chris

-----Original Message-----
From: Dennis Jacobfeuerborn [mailto:denni...@conversis.de]
Sent: Friday, November 23, 2012 7:41 PM
To: Holcombe, Christopher
Cc: Sebastien HAN; ceph-devel@vger.kernel.org
Subject: Re: New Project with Ceph

While LIO knows about multipathing, this is only usable on a single machine. iSCSI is a stateful protocol, so the target needs to explicitly support clustering, and that is not the case for any of the available open source target daemons.

Regards,
Dennis

On 11/23/2012 07:06 PM, Holcombe, Christopher wrote:
> [...]
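On the Pacemaker suggestion above: the ceph package ships an OCF resource agent for mapping RBD devices, which is the "RA" Sebastien mentions. A hedged crm sketch only; the resource name, pool/image, and parameter names below are assumptions, so check the agent's metadata (`crm ra info ocf:ceph:rbd`) on your own install before using anything like this:

```shell
# Hypothetical Pacemaker primitive that maps pool "vmware", image "lun01"
# on whichever node currently owns the resource (all names are assumptions).
crm configure primitive p_rbd_lun01 ocf:ceph:rbd \
    params pool="vmware" name="lun01" user="admin" \
           cephconf="/etc/ceph/ceph.conf" \
    op monitor interval="10s"

# The LIO export would then be grouped/ordered after the rbd resource so the
# target only starts once the block device exists on that node.
```

This is a configuration fragment for a live Pacemaker cluster, not something runnable standalone.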
RE: New Project with Ceph
Hi Sebastien,

Yes, LIO knows about multipathing and shouldn't have a problem with 2 machines. I'm going to test the heck out of it just to be safe! So Pacemaker is what I should research next to remount rbds after a reboot or a proxy crash, you're saying? I haven't done anything with Pacemaker, so that would be new territory for me.

I didn't know about RBD devices scaling better at small sizes. Thanks for the tip! I think our group would have no problem with smaller devices of 250GB or 500GB. Are you talking much smaller than that for iops?

Thanks!

-----Original Message-----
From: Sebastien HAN [mailto:han.sebast...@gmail.com]
Sent: Friday, November 23, 2012 12:53 PM
To: Holcombe, Christopher
Cc: ceph-devel@vger.kernel.org
Subject: Re: New Project with Ceph

Hi,

Your project seems nice, nothing really new in terms of integration, but quite promising. I also think it's a good idea that people start to speak about their projects; you can get input from the community.

It's fairly easy to make an RBD device survive a reboot. I assume that your iSCSI export will be handled by at least 2 machines for HA purposes. Thus you will use Pacemaker; there is already a RA for that, and it's part of the ceph package as well ;-). If a server crashes, the device is re-mapped on the other server and can possibly fail back when the first node comes back online. On top of the stack you could use the RA for LIO and even do multipathing.

In terms of RBD size: if you can, use smaller devices. Thanks to this you will get more IOPS, since RBD devices are striped over objects. My benchmarks (and I'm not the only one) showed me that multiple RBDs scale better.

I wish you the best for your project. ;-)

Cheers!

On 23 nov. 2012, at 18:31, "Holcombe, Christopher" wrote:
> [...]
New Project with Ceph
Hi Everyone,

First email here to the developer Ceph mailing list. Some of you may know me from the irc channel under the handle 'noob2'. I hang out there every once in a while to ask questions and share knowledge. Last week I discussed a project I am working on in the irc channel. Scuttlemonkey suggested I send an email off to this list with the possibility of a guest entry on the Ceph blog! Let me describe what I am trying to accomplish:

Background: VMware storage using Ceph. After discovering Ceph I thought of several uses for it. Storage is really expensive for enterprise customers, and it doesn't need to be. Going back to first principles results in the conclusion that storage hardware is very cheap now, about 5% to 10% of what enterprise customers are paying. With that in mind I realized there is great room for improvement. Most of the storage we use is carried over a Brocade fibre network, and I think Ceph is perfect for this task. What is needed is a proxy to merge the RADOS back end to the fibre network. I used LIO on a previous project and had a theory that I could use it to meet our storage needs with Ceph. At some point in the future we will direct-mount rbd over the network, but we are not ready for that yet.

Design: Ceph already did most of the heavy lifting for me: triple replication, self-healing, interaction through the kernel as a block device, and the ability to scale easily with commodity servers. My production Ceph cluster, which I'm still in the process of getting quotes for, will be HP DL180 G6 servers. Each of these will house 12 3TB data drives connected to an HP P410 1GB flash-backed write controller. In building some previous clusters I learned that spending a little extra on the raid controller is usually worth it. Our network contains 2 48-port gigabit switches in each rack for redundancy. My plan is to use a 4-port gigabit network card and split the replication traffic off from the client traffic. I plan on setting up 2 802.3ad aggregated links. That should give the server about 2x 1.9Gb/s of bandwidth.

We are currently short on 10Gb network ports, but from what I'm seeing in testing the HP raid cards can't handle enough data to make it worth it. If that changes after tuning I can always upgrade. We are an HP shop, so my hands are a little tied.

Next is the proxy machines. I'm going to reuse 2 older HP DL380 G5 servers that we took out of service. One will be part of the A fabric for the fibre and the other will be on the B fabric. This is needed for redundancy so the fibre initiator can fail back and forth should it need to. I plan on creating rbd blocks of 1TB each on the Ceph cluster, mounting them on both of the proxy machines, and exporting them using LIO. LIO has both block mode, which can export any block device the kernel knows about, and file mode, which can export a file as a block device. My testing has shown that VMware can mount this storage, vmotion vm's onto it, and use it like any other SAN storage.

The only challenge I have at this point is getting the rbd devices to survive a reboot on the proxy machines. I also will have to train the other admins on how to use it. It is certainly more complicated than the SAN storage we are used to, but that shouldn't stop me. I can build a web interface on top of this using django. If I can achieve this without too much difficulty, then Ceph is truly an enterprise storage replacement.

That's my project at a high level. Ceph has many uses, but I'm finding this use the most interesting at the moment. When it is all finished it should save us over 90% on storage costs going forward. If anyone knows how I could go about getting Ubuntu to save rbd mappings after a reboot, that would be really helpful. Thank you guys for your hard work!
Chris Holcombe
Unix Administrator
Corporation Service Company
cholc...@cscinfo.com
302-636-8667
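On the "save rbd mappings after a reboot" question: ceph-common ships an rbdmap init script that maps every image listed in /etc/ceph/rbdmap at boot. A hedged sketch; the pool/image names and keyring path below are hypothetical, and the exact line format may vary between releases, so check the rbdmap script shipped with your package:

```shell
# /etc/ceph/rbdmap takes one "pool/image" per line plus map options.
# The pool, image, and keyring path here are made-up examples:
echo "vmware/lun01 id=admin,keyring=/etc/ceph/ceph.client.admin.keyring" \
    >> /etc/ceph/rbdmap

# Register the init script so mappings are restored at boot (Ubuntu sysvinit):
update-rc.d rbdmap defaults

# Map now without rebooting; the device should appear under /dev/rbd/.
service rbdmap start
```

This needs root and a reachable cluster, so treat it as a starting point rather than a verified procedure.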