OSD Weights

2013-02-11 Thread Holcombe, Christopher
Hi Everyone,

I just wanted to confirm my understanding of the Ceph OSD weights. My understanding is that they control the statistical distribution of data across the OSDs. My current setup has 3 TB hard drives and they all have the default weight of 1. I was thinking that if I mixed in 4 TB hard drives in the future, they would still only receive about 3 TB of data each. I thought that if I changed the weight to 3 for the 3 TB drives and 4 for the 4 TB drives, it would correctly use the larger disks. Is that correct?
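
For illustration, capacity-proportional weights could be set with the CRUSH reweight command along these lines (a sketch only: the OSD IDs are hypothetical, and the common convention is weight = capacity in TB):

ceph osd crush reweight osd.0 3.0   # 3 TB drive
ceph osd crush reweight osd.1 4.0   # 4 TB drive
ceph osd tree                       # confirm the weights in the CRUSH map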

Thanks,
Chris





RadosGW Quota

2013-02-07 Thread Holcombe, Christopher
Does the RadosGW have the ability to limit how much data clients (users) can 
upload to it? I'm looking for a way to implement quotas in case someone 
decides to upload the world onto my cluster and breaks it.
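
For reference, later radosgw releases expose per-user quotas through radosgw-admin; a rough sketch, assuming a version that ships the quota subcommand (the user ID and limits are placeholders):

# cap a user at roughly 1 TB and 500k objects
radosgw-admin quota set --quota-scope=user --uid=someuser --max-size=1099511627776 --max-objects=500000
radosgw-admin quota enable --quota-scope=user --uid=someuser
# review what is currently configured for the user
radosgw-admin user info --uid=someuser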

-Chris






RE: questions on networks and hardware

2013-01-19 Thread Holcombe, Christopher
Hi John,

I have the public/cluster network options set up in my config file. You do not 
need to also specify an addr for each OSD individually. Here's an example of 
my working config:
[global]
auth cluster required = none
auth service required = none
auth client required = none
# OSDs pick their public and cluster addresses from these subnets,
# so no per-OSD "public addr"/"cluster addr" entries are required
public network = 172.20.41.0/25
cluster network = 172.20.41.128/25
osd mkfs type = xfs

[osd]
osd journal size = 1000
filestore max sync interval = 30

[mon.a]
host = plcephd01
mon addr = 172.20.41.4:6789
[mon.b]
host = plcephd03
mon addr = 172.20.41.6:6789
[mon.c]
host = plcephd05
mon addr = 172.20.41.8:6789

[osd.0]
host = plcephd01
devs = /dev/sda3
[osd.X]
# ... and so on for the remaining OSDs
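
For contrast, the explicit per-OSD form that the global subnets make unnecessary would look something like this (the addresses are hypothetical):

[osd.0]
host = plcephd01
devs = /dev/sda3
# only needed if you want to pin the addresses by hand
public addr = 172.20.41.10
cluster addr = 172.20.41.140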


-Chris
-Original Message-
From: ceph-devel-ow...@vger.kernel.org 
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of John Nielsen
Sent: Friday, January 18, 2013 6:35 PM
To: ceph-devel@vger.kernel.org
Subject: questions on networks and hardware

I'm planning a Ceph deployment which will include:
10Gbit/s public/client network
10Gbit/s cluster network
dedicated mon hosts (3 to start)
dedicated storage hosts (multiple disks, one XFS and OSD per disk, 3-5 
to start)
dedicated RADOS gateway host (1 to start)

I've done some initial testing and read through most of the docs but I still 
have a few questions. Please respond even if you just have a suggestion or 
response for one of them.

If I have "cluster network" and "public network" entries under [global] or 
[osd], do I still need to specify "public addr" and "cluster addr" for each OSD 
individually?

Which network(s) should the monitor hosts be on? If both, is it valid to have 
more than one "mon addr" entry per mon host or is there a different way to do 
it?

Is it worthwhile to have 10G NICs on the monitor hosts? (The storage hosts 
will each have 2x 10 Gbit/s NICs.)

I'd like to have 2x 10 Gbit/s NICs on the gateway host and maximize throughput. 
Any suggestions on how best to do that? I'm assuming it will talk to the OSDs 
on the Ceph public/client network, so does that imply a third, even-more-public 
network for the gateway's clients?

I think this has come up before, but has anyone written up something with more 
details on setting up gateways? Hardware recommendations, strategies to improve 
caching and performance, multiple gateway setups with and without a load 
balancer, etc.

Thanks!

JN






RE: Slow requests

2012-12-16 Thread Holcombe, Christopher
I heard the solution for this was to restart the OSDs. That fixed it for me.
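
For example, on a sysvinit-based install that could look like the following (the OSD id is a placeholder, and the admin-socket command is only available in newer releases):

# optional: see which requests are stuck on that OSD
ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok dump_ops_in_flight
# restart just the affected OSD
sudo service ceph restart osd.1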

-Chris

-Original Message-
From: ceph-devel-ow...@vger.kernel.org 
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Jens Kristian Søgaard
Sent: Sunday, December 16, 2012 9:00 AM
To: ceph-devel@vger.kernel.org
Subject: Slow requests

Hi,

My log is filling up with warnings about a single slow request that has been 
around for a very long time:

osd.1 10.0.0.2:6800/900 162926 : [WRN] 1 slow requests, 1 included below; 
oldest blocked for > 84446.312051 secs

osd.1 10.0.0.2:6800/900 162927 : [WRN] slow request 84446.312051 seconds old, 
received at 2012-12-15 15:27:56.891437:
osd_sub_op(client.4528.0:19602219 0.fe
3807b5fe/rb.0.11b7.4a933baa.0008629e/head//0 [] v 53'185888 snapset=0=[]:[] 
snapc=0=[]) v7 currently started


How can I identify the cause of this and how can I cancel this request?

I'm running Ceph on Fedora 17 using the latest RPMs available from ceph.com 
(0.52-6).


Thanks in advance,
--
Jens Kristian Søgaard, Mermaid Consulting ApS,
j...@mermaidconsulting.dk,
http://www.mermaidconsulting.com/





RE: New Project with Ceph

2012-11-23 Thread Holcombe, Christopher
iSCSI won't be used in this project; I'll be using Fibre Channel. I would 
think that the VMware initiator is the one that needs to be concerned about 
multipathing: as long as it is aware there are two paths, it won't write to 
both of them at the same time. I'll find out shortly when I rack the second 
proxy machine and see how it performs when I kill the network to one of them. 
I'm reading through Pacemaker like Sebastien suggested, and I agree that is 
the way to go for getting rbd images to persist past a reboot.

Thanks,
Chris

-Original Message-
From: Dennis Jacobfeuerborn [mailto:denni...@conversis.de] 
Sent: Friday, November 23, 2012 7:41 PM
To: Holcombe, Christopher
Cc: Sebastien HAN; ceph-devel@vger.kernel.org
Subject: Re: New Project with Ceph

While LIO knows about multipathing, this is only usable on a single machine.
iSCSI is a stateful protocol, so the target needs to explicitly support 
clustering, and that is not the case for any of the available open source 
target daemons.

Regards,
  Dennis


RE: New Project with Ceph

2012-11-23 Thread Holcombe, Christopher
Hi Sebastien,

Yes, LIO knows about multipathing and shouldn't have a problem with 2 machines. 
I'm going to test the heck out of it just to be safe! So Pacemaker is what I 
should research next to remount RBDs after a reboot or a proxy crash, you're 
saying? I haven't done anything with Pacemaker, so that would be new territory 
for me.

I didn't know about RBD devices scaling better at small sizes. Thanks for the 
tip! I think our group would have no problem with smaller devices of 250 GB or 
500 GB. Are you talking much smaller than that for IOPS?

Thanks!

-Original Message-
From: Sebastien HAN [mailto:han.sebast...@gmail.com] 
Sent: Friday, November 23, 2012 12:53 PM
To: Holcombe, Christopher
Cc: ceph-devel@vger.kernel.org
Subject: Re: New Project with Ceph

Hi,

Your project seems nice; nothing really new in terms of integration, but quite 
promising. I also think it's a good idea that people start to talk about their 
projects, since you can get input from the community.

It's fairly easy to make an RBD device survive a reboot. I assume that your 
iSCSI export will be handled by at least 2 machines for HA purposes, so you 
will use Pacemaker: there is already an RA for that, and it's part of the ceph 
package as well ;-). If a server crashes, the device is re-mapped on the other 
server and can fail back when the first node comes back online. On top of the 
stack you could use the RA for LIO and even do multipathing.
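
As a rough sketch of that Pacemaker setup (the agent and parameter names, ocf:ceph:rbd with pool/name/user/cephconf, are from memory and should be checked against the RA actually shipped; the image name is a placeholder):

# define a primitive that maps pool "rbd", image "vmware01" on whichever node Pacemaker chooses
crm configure primitive p_rbd_vmware01 ocf:ceph:rbd \
    params pool="rbd" name="vmware01" user="admin" cephconf="/etc/ceph/ceph.conf" \
    op monitor interval="20s" timeout="30s"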

In terms of RBD size: use smaller devices if you can. You will get more IOPS 
that way, since RBD devices are striped over objects. My benchmarks (and I'm 
not the only one) showed me that multiple RBDs scale better.

I wish you the best for your project. ;-)

Cheers! 


New Project with Ceph

2012-11-23 Thread Holcombe, Christopher
Hi Everyone,

First email here to the Ceph developer mailing list. Some of you may know me 
from the IRC channel under the handle 'noob2'. I hang out there every once in 
a while to ask questions and share knowledge. Last week I discussed a project I 
am working on in the IRC channel, and Scuttlemonkey suggested I send an email 
to this list, with the possibility of a guest entry on the Ceph blog! Let me 
describe what I am trying to accomplish:

Background: VMware storage using Ceph. After discovering Ceph I thought of 
several uses for it. Storage is really expensive for enterprise customers, and 
it doesn't need to be. Going back to first principles leads to the conclusion 
that storage hardware is now very cheap: roughly 5% to 10% of what enterprise 
customers are paying. With that in mind I realized there is great room for 
improvement. Most of the storage we use is carried over a Brocade fibre 
network, and I think Ceph is perfect for this task. What is needed is a proxy 
to bridge the RADOS back end to the fibre network. I used LIO on a previous 
project and had a theory that I could use it to meet our storage needs with 
Ceph. At some point in the future we will mount rbd directly over the network, 
but we are not ready for that yet.

Design: Ceph already did most of the heavy lifting for me: triple replication, 
self-healing, interaction with the kernel as a block device, and the ability to 
scale easily on commodity servers. My production Ceph cluster, which I'm still 
in the process of getting quotes for, will be HP DL180 G6 servers. Each of 
these will house 12 3 TB data drives connected to an HP P410 controller with 
1 GB of flash-backed write cache. In building some previous clusters I learned 
that spending a little extra on the RAID controller is usually worth it.

Our network contains two 48-port gigabit switches in each rack for redundancy. 
My plan is to use a 4-port gigabit network card and split the replication 
traffic off from the client traffic. I plan on setting up two 802.3ad 
aggregated links, which should give each server about 2x 1.9 Gb/s of bandwidth. 
We are currently short on 10Gb network ports, but from what I'm seeing in 
testing the HP RAID cards can't handle enough data to make it worth it; if that 
changes after tuning I can always upgrade. We are an HP shop, so my hands are a 
little tied.

Next are the proxy machines. I'm going to reuse two older HP DL380 G5 servers 
that we took out of service. One will be part of the A fabric for the fibre and 
the other will be on the B fabric. This is needed for redundancy, so the fibre 
initiator can fail back and forth should it need to. I plan on creating rbd 
images of 1 TB each on the Ceph cluster, mapping them on both of the proxy 
machines, and exporting them using LIO. LIO has both a block mode, which can 
export any block device the kernel knows about, and a file mode, which can 
export a file as a block device. My testing has shown that VMware can mount 
this storage, vMotion VMs onto it, and use it like any other SAN storage.

The only challenge I have at this point is getting the rbd devices to survive a 
reboot on the proxy machines. I will also have to train the other admins on how 
to use it. It is certainly more complicated than the SAN storage we are used 
to, but that shouldn't stop me; I can build a web interface on top of this 
using Django. If I can achieve all this without too much difficulty, then Ceph 
is truly an enterprise storage replacement.
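
As a sketch of the LIO export step, using targetcli (syntax follows a recent targetcli, where the block backstore is called "block" rather than the older "iblock"; the device path and WWNs are placeholders for the mapped rbd device, the local HBA port, and the ESXi initiator):

# expose an already-mapped rbd device as a block backstore
targetcli /backstores/block create name=vmware_lun0 dev=/dev/rbd0
# create the FC target on the local QLogic HBA port and attach the LUN
targetcli /qla2xxx create naa.21000024ff000001
targetcli /qla2xxx/naa.21000024ff000001/luns create /backstores/block/vmware_lun0
# allow the initiator's WWPN
targetcli /qla2xxx/naa.21000024ff000001/acls create naa.21000024ff0000aa
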
That's my project at a high level. Ceph has many uses, but I'm finding this one 
the most interesting at the moment. When it is all finished it should save us 
over 90% on storage costs going forward. If anyone knows how I could get Ubuntu 
to re-establish rbd mappings after a reboot, that would be really helpful. 
Thank you guys for your hard work!
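
One possible approach, sketched under the assumption that the installed ceph-common package ships the rbdmap init script (otherwise a small upstart job or rc.local entry running the equivalent rbd map commands would do; the pool and image names are placeholders):

# /etc/ceph/rbdmap -- one image per line, mapped at boot by the rbdmap init script
rbd/vmware01    id=admin,keyring=/etc/ceph/ceph.client.admin.keyring
rbd/vmware02    id=admin,keyring=/etc/ceph/ceph.client.admin.keyring

# enable at boot and map everything listed above now
sudo update-rc.d rbdmap defaults
sudo service rbdmap start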


Chris Holcombe
Unix Administrator
Corporation Service Company
cholc...@cscinfo.com
302-636-8667



