Re: [ceph-users] Openstack Instances and RBDs

2013-11-02 Thread Haomai Wang
OpenStack Havana really needs a new RBD installation guide, and there are a
few points that may confuse newcomers:

1. OpenStack Havana can use RBD as the backend for Nova
(/var/lib/nova/instances).
2. A critical patch that integrates Glance and Nova is still missing
(https://review.openstack.org/#/c/46879/).
3. Even if you do all of the above (use RBD as the Nova backend and backport
the patch), live migration still needs a compatible patch to work. Once
JoshDurgin restores and merges the commit from point 2, I will submit that
compatibility patch.

So there is still more to do on the OpenStack & Ceph side. If you are a new
user, consider using Ceph only as the backend for Glance and Cinder for now.
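
(For a new user, pointing Glance and Cinder at Ceph mostly comes down to
settings along these lines; the pool names, cephx users and secret UUID below
are only examples, not the only possible layout:)

  # glance-api.conf (assumes a pool "images" and a cephx user "glance")
  default_store = rbd
  rbd_store_user = glance
  rbd_store_pool = images

  # cinder.conf (assumes a pool "volumes" and a cephx user "cinder")
  volume_driver = cinder.volume.drivers.rbd.RBDDriver
  rbd_pool = volumes
  rbd_user = cinder
  rbd_secret_uuid = <the libvirt secret UUID for the cinder key>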



On Sun, Nov 3, 2013 at 3:01 AM, Dinu Vlad  wrote:

> I don't know of any guide besides the official install docs from
> grizzly/havana, but I'm running openstack grizzly on top of rbd storage
> using glance & cinder and it makes (almost) no use of
> /var/lib/nova/instances. Live migrations also work. The only files there
> should be "config.xml" and "console" - otherwise, live-migrations won't
> work OR the path should be a mounted shared storage (NFS, GlusterFS etc).
>
> Nova-compute stores "disk*" files under that path in the following cases:
> - when one starts an instance only by using "--image " argument
> to nova-boot, without a pre-created cinder volume and without the
> "--block-device-mapping" argument
> - when one uses a "config disk" for bootstrapping instances
> - when one configures a "swap" disk in the flavor used to start the
> instance
>
>
> On Nov 2, 2013, at 2:32 AM, Gaylord Holder  wrote:
>
> >
> http://www.sebastien-han.fr/blog/2013/06/03/ceph-integration-in-openstack-grizzly-update-and-roadmap-for-havana/
> >
> > suggests it is possible to run openstack instances (not only images) off
> of RBDs in grizzly and havana (which I'm running), and to use RBDs in lieu
> of a shared file system.
> >
> > I've followed
> >
> > http://ceph.com/docs/next/rbd/libvirt/
> >
> > but I can only get boot-from-volume to work.  Instances still are being
> housed in /var/lib/nova/instances, making live-migration a non-starter.
> >
> > Is there a better guide for running openstack instances out of RBDs, or
> is it just not ready yet?
> >
> > Thanks,
> >
> > -Gaylord



-- 

Best Regards,

Wheat


Re: [ceph-users] Ceph nodes info

2013-11-02 Thread Joao Eduardo Luis

On 10/31/2013 01:09 AM, Raghavendra Lad wrote:
> Hi,
>
> How to install a Ceph node?
>
> Please let me know the install steps for preparing a node on Ubuntu 12.04 LTS.
>
> Do we need a separate server / admin node, and what needs to be installed
> for Ceph? I would integrate with OpenStack Grizzly.
>
> Regards,
> R

http://ceph.com/docs/master/start/
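
(For quick reference, the quick-start there boils down to something like the
following, run from an admin node; the hostnames and device paths below are
placeholders, not a recommendation:)

  ceph-deploy new ceph-node1
  ceph-deploy install ceph-node1 ceph-node2 ceph-node3
  ceph-deploy mon create ceph-node1
  ceph-deploy gatherkeys ceph-node1
  ceph-deploy osd create ceph-node2:/dev/sdb ceph-node3:/dev/sdb
  ceph-deploy admin ceph-node1 ceph-node2 ceph-node3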



--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com


Re: [ceph-users] ceph cluster performance

2013-11-02 Thread Dinu Vlad
Any other options or ideas? 

Thanks,
Dinu 


On Oct 31, 2013, at 6:35 PM, Dinu Vlad  wrote:

> 
> I tested the OSD performance from a single node. For this purpose I deployed 
> a new cluster (using ceph-deploy, as before) on fresh, repartitioned drives. 
> I created a single pool with 1800 PGs. I ran rados bench both on the OSD 
> server itself and on a remote one. The cluster configuration stayed at the 
> defaults, with the same xfs mount & mkfs.xfs additions as before. 
> 
> With a single host, the pgs were "stuck unclean" (active only, not 
> active+clean):
> 
> # ceph -s
>  cluster ffd16afa-6348-4877-b6bc-d7f9d82a4062
>   health HEALTH_WARN 1800 pgs stuck unclean
>   monmap e1: 3 mons at 
> {cephmon1=10.4.0.250:6789/0,cephmon2=10.4.0.251:6789/0,cephmon3=10.4.0.252:6789/0},
>  election epoch 4, quorum 0,1,2 cephmon1,cephmon2,cephmon3
>   osdmap e101: 18 osds: 18 up, 18 in
>    pgmap v1055: 1800 pgs: 1800 active; 0 bytes data, 732 MB used, 16758 GB / 
> 16759 GB avail
>   mdsmap e1: 0/0/1 up
> 
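> (That is expected with a single host: the default CRUSH rule wants each
> replica on a different host, so the PGs can never reach active+clean.
> For a throwaway one-node test, a possible workaround, just a sketch, is to
> let CRUSH choose leaves by OSD instead of by host at cluster creation time:
> 
>   [global]
>   osd crush chooseleaf type = 0
> 
> or simply drop the pool's replication to 1 with
> "ceph osd pool set <pool> size 1".)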
> 
> Test results: 
> Local test, 1 process, 16 threads: 241.7 MB/s
> Local test, 8 processes, 128 threads: 374.8 MB/s
> Remote test, 1 process, 16 threads: 231.8 MB/s
> Remote test, 8 processes, 128 threads: 366.1 MB/s
> 
> Maybe it's just me, but it seems on the low side too. 
> 
> Thanks,
> Dinu
> 
> 
> On Oct 30, 2013, at 8:59 PM, Mark Nelson  wrote:
> 
>> On 10/30/2013 01:51 PM, Dinu Vlad wrote:
>>> Mark,
>>> 
>>> The SSDs are 
>>> http://www.seagate.com/internal-hard-drives/enterprise-hard-drives/ssd/enterprise-sata-ssd/?sku=ST240FN0021
>>>  and the HDDs are 
>>> http://www.seagate.com/internal-hard-drives/enterprise-hard-drives/hdd/constellation/?sku=ST91000640SS.
>>> 
>>> The chassis is a "SiliconMechanics C602" - but I don't have the exact model. 
>>> It's based on Supermicro, has 24 slots front and 2 in the back and a SAS 
>>> expander.
>>> 
>>> I did a fio test (raw partitions, 4M blocksize, ioqueue maxed out according 
>>> to what the driver reports in dmesg). Here are the results (filtered):
>>> 
>>> Sequential:
>>> Run status group 0 (all jobs):
>>>  WRITE: io=176952MB, aggrb=2879.0MB/s, minb=106306KB/s, maxb=191165KB/s, 
>>> mint=60444msec, maxt=61463msec
>>> 
>>> Individually, the HDDs had best:worst 103:109 MB/s while the SSDs gave 
>>> 153:189 MB/s
>> 
>> Ok, that looks like what I'd expect to see given the controller being used.  
>> SSDs are probably limited by total aggregate throughput.
>> 
>>> 
>>> Random:
>>> Run status group 0 (all jobs):
>>>  WRITE: io=106868MB, aggrb=1727.2MB/s, minb=67674KB/s, maxb=106493KB/s, 
>>> mint=60404msec, maxt=61875msec
>>> 
>>> Individually (best:worst) HDD 71:73 MB/s, SSD 68:101 MB/s (with only one 
>>> out of 6 doing 101)
>>> 
>>> This is on just one of the osd servers.
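>>> 
>>> (A sequential pass of the kind described above can be reproduced with
>>> something like the following; the device name, queue depth and runtime
>>> are placeholders:
>>> 
>>>   fio --name=seqwrite --filename=/dev/sdX --rw=write --bs=4M \
>>>       --direct=1 --ioengine=libaio --iodepth=64 \
>>>       --runtime=60 --time_based --group_reporting
>>> 
>>> with --rw=randwrite for the random pass.)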
>> 
>> Were the ceph tests against one OSD server or across all servers?  It might be 
>> worth trying tests against a single server with no replication using 
>> multiple rados bench instances and just seeing what happens.
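>> 
>> (As a rough sketch of that, assuming pools bench1..bench4 already exist:
>> 
>>   for p in bench1 bench2 bench3 bench4; do
>>       rados -p $p bench 60 write -t 16 &
>>   done
>>   wait
>> )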
>> 
>>> 
>>> Thanks,
>>> Dinu
>>> 
>>> 
>>> On Oct 30, 2013, at 6:38 PM, Mark Nelson  wrote:
>>> 
 On 10/30/2013 09:05 AM, Dinu Vlad wrote:
> Hello,
> 
> I've been doing some tests on a newly installed ceph cluster:
> 
> # ceph osd pool create bench1 2048 2048
> # ceph osd pool create bench2 2048 2048
> # rbd -p bench1 create test
> # rbd -p bench1 bench-write test --io-pattern rand
> elapsed:   483  ops:   396579  ops/sec:   820.23  bytes/sec: 2220781.36
> 
> # rados -p bench2 bench 300 write --show-time
> # (run 1)
> Total writes made:  20665
> Write size: 4194304
> Bandwidth (MB/sec): 274.923
> 
> Stddev Bandwidth:   96.3316
> Max bandwidth (MB/sec): 748
> Min bandwidth (MB/sec): 0
> Average Latency:0.23273
> Stddev Latency: 0.262043
> Max latency:1.69475
> Min latency:0.057293
> 
> These results seem to be quite poor for the configuration:
> 
> MON: dual-cpu Xeon E5-2407 2.2 GHz, 48 GB RAM, 2xSSD for OS
> OSD: dual-cpu Xeon E5-2620 2.0 GHz, 64 GB RAM, 2xSSD for OS (on-board 
> controller), 18 HDD 1TB 7.2K rpm SAS for OSD drives and 6 SSDs (SATA) for 
> journal, attached to a LSI 9207-8i controller.
> All servers have dual 10GE network cards, connected to a pair of 
> dedicated switches. Each SSD has 3 10 GB partitions for journals.
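> 
> (For reference, with ceph-deploy the journal partition is normally given
> per OSD, e.g.
> 
>   ceph-deploy osd create osd-host:/dev/sdc:/dev/sdt1
> 
> where /dev/sdc is a data disk and /dev/sdt1 one of the 10 GB SSD
> partitions; the device names here are only placeholders.)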
 
 Agreed, you should see much higher throughput with that kind of storage 
 setup.  What brand/model SSDs are these?  Also, what brand and model of 
 chassis?  With 24 drives and 8 SSDs I could push 2GB/s (no replication 
 though) with a couple of concurrent rados bench processes going on our 
 SC847A chassis, so ~550MB/s aggregate throughput for 18 drives and 6 SSDs 
 is definitely on the low side.
 
 I'm actually not too familiar with what the RBD benchmarking commands are 
 doing behind the scenes.  Typically I've tested fio 

Re: [ceph-users] Openstack Instances and RBDs

2013-11-02 Thread Dinu Vlad
I don't know of any guide besides the official install docs from 
grizzly/havana, but I'm running openstack grizzly on top of rbd storage using 
glance & cinder and it makes (almost) no use of /var/lib/nova/instances. Live 
migrations also work. The only files there should be "config.xml" and "console" 
- otherwise, live-migrations won't work OR the path should be a mounted shared 
storage (NFS, GlusterFS etc).  

Nova-compute stores "disk*" files under that path in the following cases:
- when one starts an instance only by using "--image " argument to 
nova-boot, without a pre-created cinder volume and without the 
"--block-device-mapping" argument
- when one uses a "config disk" for bootstrapping instances
- when one configures a "swap" disk in the flavor used to start the instance 
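
(So, to keep an instance entirely on RBD with this setup, the usual pattern is
to create a cinder volume from the image and boot from that volume; the IDs,
names and sizes below are placeholders:)

  cinder create --image-id <image-uuid> --display-name vm1-root 20
  nova boot --flavor m1.small \
      --block-device-mapping vda=<volume-uuid>:::0 \
      vm1

(Pick a flavor without swap and don't request a config disk, per the list
above, and nothing should land under /var/lib/nova/instances besides the
"config.xml" and "console" files.)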


On Nov 2, 2013, at 2:32 AM, Gaylord Holder  wrote:

> http://www.sebastien-han.fr/blog/2013/06/03/ceph-integration-in-openstack-grizzly-update-and-roadmap-for-havana/
> 
> suggests it is possible to run openstack instances (not only images) off of 
> RBDs in grizzly and havana (which I'm running), and to use RBDs in lieu of a 
> shared file system.
> 
> I've followed
> 
> http://ceph.com/docs/next/rbd/libvirt/
> 
> but I can only get boot-from-volume to work.  Instances still are being 
> housed in /var/lib/nova/instances, making live-migration a non-starter.
> 
> Is there a better guide for running openstack instances out of RBDs, or is it 
> just not ready yet?
> 
> Thanks,
> 
> -Gaylord


Re: [ceph-users] Very frustrated with Ceph!

2013-11-02 Thread Alfredo Deza
On Fri, Nov 1, 2013 at 11:12 PM, Sage Weil  wrote:
> On Sat, 2 Nov 2013, Trivedi, Narendra wrote:
>>
>> Hi Sage,
>>
>> I believe I issued a "ceph-deploy install..." from the admin node as per the
>> documentation and that was almost ok as per the output of the command below
>> except sometimes there's an error followed by an "OK" message (see the
>> highlighted item in red below). I eventually ran into some permission
>> issues, but it seems things went okay:

Maybe what can be confusing here is that ceph-deploy interprets stderr
as ERROR logging level. Unfortunately, some tools will output normal
informative data to stderr when they are clearly not errors.

stdout, on the other hand, is interpreted by ceph-deploy as DEBUG
level, so you will see logging at that level too.

There is no way for ceph-deploy to tell whether you are seeing actual
errors (the remote tool really is reporting a failure) or whether that tool
simply chose stderr for information that should have gone to stdout.



>
> Hmm, the below output makes it look like it was successfully installed on
> node1 node2 and node3.  Can you confirm that /etc/ceph exists on all three
> of those hosts?
>
> Oh, looking back at your original message, it looks like you are trying to
> create OSDs on /tmp/osd*.  I would create directories like /ceph/osd0,
> /ceph/osd1, or similar.  I believe you need to create the directories
> beforehand, too.  (In a normal deployment, you are either feeding ceph raw
> disks (/dev/XXX) or an existing mount point on a dedicated disk you
> already configured and mounted.)
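>
> (As a sketch, that would be something like the following; the path is an
> example only:
>
>   ssh ceph-node2-osd0-centos-6-4 sudo mkdir -p /ceph/osd0
>   ceph-deploy osd prepare ceph-node2-osd0-centos-6-4:/ceph/osd0
>   ceph-deploy osd activate ceph-node2-osd0-centos-6-4:/ceph/osd0
> )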
>
> sage
>
>
>  >
>>
>>
>> [ceph@ceph-admin-node-centos-6-4 my-cluster]$ ceph-deploy install
>> ceph-node1-mon-centos-6-4 ceph-node2-osd0-centos-6-4
>> ceph-node3-osd1-centos-6-4
>>
>> [ceph_deploy.cli][INFO  ] Invoked (1.3): /usr/bin/ceph-deploy install
>> ceph-node1-mon-centos-6-4 ceph-node2-osd0-centos-6-4
>> ceph-node3-osd1-centos-6-4
>>
>> [ceph_deploy.install][DEBUG ] Installing stable version dumpling on cluster
>> ceph hosts ceph-node1-mon-centos-6-4 ceph-node2-osd0-centos-6-4
>> ceph-node3-osd1-centos-6-4
>>
>> [ceph_deploy.install][DEBUG ] Detecting platform for host
>> ceph-node1-mon-centos-6-4 ...
>>
>> [ceph-node1-mon-centos-6-4][DEBUG ] connected to host:
>> ceph-node1-mon-centos-6-4
>>
>> [ceph-node1-mon-centos-6-4][DEBUG ] detect platform information from remote
>> host
>>
>> [ceph-node1-mon-centos-6-4][DEBUG ] detect machine type
>>
>> [ceph_deploy.install][INFO  ] Distro info: CentOS 6.4 Final
>>
>> [ceph-node1-mon-centos-6-4][INFO  ] installing ceph on
>> ceph-node1-mon-centos-6-4
>>
>> [ceph-node1-mon-centos-6-4][INFO  ] adding EPEL repository
>>
>> [ceph-node1-mon-centos-6-4][INFO  ] Running command: sudo wget
>> http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
>>
>> [ceph-node1-mon-centos-6-4][ERROR ] --2013-11-01 19:51:20--
>> http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
>>
>> [ceph-node1-mon-centos-6-4][ERROR ] Connecting to 10.12.132.208:8080...
>> connected.
>>
>> [ceph-node1-mon-centos-6-4][ERROR ] Proxy request sent, awaiting response...
>> 200 OK
>>
>> [ceph-node1-mon-centos-6-4][ERROR ] Length: 14540 (14K) [application/x-rpm]
>>
>> [ceph-node1-mon-centos-6-4][ERROR ] Saving to:
>> `epel-release-6-8.noarch.rpm.2'
>>
>> [ceph-node1-mon-centos-6-4][ERROR ]
>>
>> [ceph-node1-mon-centos-6-4][ERROR ]  0K ..
>>    100% 4.79M=0.003s
>>
>> [ceph-node1-mon-centos-6-4][ERROR ]
>>
>> [ceph-node1-mon-centos-6-4][ERROR ] Last-modified header invalid --
>> time-stamp ignored.
>>
>> [ceph-node1-mon-centos-6-4][ERROR ] 2013-11-01 19:52:20 (4.79 MB/s) -
>> `epel-release-6-8.noarch.rpm.2' saved [14540/14540]
>>
>> [ceph-node1-mon-centos-6-4][ERROR ]
>>
>> [ceph-node1-mon-centos-6-4][INFO  ] Running command: sudo rpm -Uvh
>> --replacepkgs epel-release-6*.rpm
>>
>> [ceph-node1-mon-centos-6-4][DEBUG ] Preparing...
>> ##
>>
>> [ceph-node1-mon-centos-6-4][DEBUG ] epel-release
>> ##
>>
>> [ceph-node1-mon-centos-6-4][INFO  ] Running command: sudo rpm --import
>> https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
>>
>> [ceph-node1-mon-centos-6-4][INFO  ] Running command: sudo rpm -Uvh
>> --replacepkgs
>> http://ceph.com/rpm-dumpling/el6/noarch/ceph-release-1-0.el6.noarch.rpm
>>
>> [ceph-node1-mon-centos-6-4][DEBUG ] Retrieving
>> http://ceph.com/rpm-dumpling/el6/noarch/ceph-release-1-0.el6.noarch.rpm
>>
>> [ceph-node1-mon-centos-6-4][DEBUG ] Preparing...
>> ##
>>
>> [ceph-node1-mon-centos-6-4][DEBUG ] ceph-release
>> ##
>>
>> [ceph-node1-mon-centos-6-4][INFO  ] Running command: sudo yum -y -q install
>> ceph
>>
>> [ceph-node1-mon-centos-6-4][DEBUG ] Package ceph-0.67.4-0.el6.x86_64 already
>> installed and latest version
>>
>> [ceph-node1-mon-centos-6-4][

Re: [ceph-users] Expanding ceph cluster by adding more OSDs

2013-11-02 Thread Guang
Hi Kyle,
Thanks for your response. Though I haven't tested it, my gut feeling is the 
same: changing the PG number may result in re-shuffling of the data.

In terms of the strategy you mentioned for expanding a cluster, I have a few 
questions:
  1. By adding a LITTLE more weight each time, my understanding is that the 
goal is to reduce the load on the OSD being added, is that right? If so, can 
we use the recovery throttle settings to achieve the same goal? (See the 
sketch right after this list.)
  2. If I want to expand the cluster by 30% of its capacity every quarter, 
this approach could take a long time to bring the new capacity in. Is my 
understanding correct?
  3. Is there a tool that automates this, or will I need to monitor closely 
and repeatedly dump the CRUSH map, edit it, and push it back?
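
(By "throttle settings" in question 1 I mean the recovery/backfill knobs,
something like the following; the values are only examples:)

  [osd]
  osd max backfills = 1
  osd recovery max active = 1
  osd recovery op priority = 1

  # or injected at runtime:
  # ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'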

I am testing a scenario of adding one OSD at a time (I have 330 OSDs in 
total), with the default weight. A couple of observations: 1) recovery starts 
out quick (several hundred MB/s) and then slows down to around 10 MB/s; 
2) it impacts the online traffic quite a lot (from my observation, mainly on 
the PGs that are recovering).

I have searched for best practices on expanding a cluster, with no luck so 
far. Would anybody like to share their experience? Thanks very much.

Thanks,
Guang

Date: Thu, 10 Oct 2013 05:15:27 -0700
From: Kyle Bader 
To: "ceph-users@lists.ceph.com" 
Subject: Re: [ceph-users] Expanding ceph cluster by adding more OSDs

I've contracted and expanded clusters by up to a rack of 216 OSDs - 18
nodes, 12 drives each.  New disks are configured with a CRUSH weight of 0
and I slowly add weight (0.1 to 0.01 increments), wait for the cluster to
become active+clean and then add more weight. I was expanding after
contraction, so my PG count didn't need to be corrected; I tend to be
liberal and opt for more PGs.  If I hadn't contracted the cluster prior to
expanding it I would probably add PGs after all the new OSDs have finished
being weighted into the cluster.
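
(As a sketch of that loop; the OSD id, host bucket and increments below are
placeholders:)

  # add the new OSD with zero CRUSH weight, then nudge it up in small steps
  ceph osd crush add osd.216 0 host=new-node
  ceph osd crush reweight osd.216 0.05
  # wait for the cluster to go active+clean, then repeat:
  ceph osd crush reweight osd.216 0.10
  # ...and so on up to the disk's target weight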


On Wed, Oct 9, 2013 at 8:55 PM, Michael Lowe wrote:

> I had those same questions, I think the answer I got was that it was
> better to have too few pg's than to have overloaded osd's.  So add osd's
> then add pg's.  I don't know the best increments to grow in, probably
> depends largely on the hardware in your osd's.
> 
> Sent from my iPad
> 
>> On Oct 9, 2013, at 11:34 PM, Guang  wrote:
>> 
>> Thanks Mike. I get your point.
>> 
>> There are still a few things confusing me:
>> 1) We expand a Ceph cluster by adding more OSDs, which triggers
>> re-balancing of PGs across the old & new OSDs and will likely break the
>> optimal PG count for the cluster.
>> 2) We can add more PGs, which triggers re-balancing of objects across
>> old & new PGs.
>> 
>> So:
>> 1) What is the recommended way to expand the cluster by adding OSDs (and
>> potentially adding PGs)? Should we do both at the same time?
>> 2) What is the recommended way to scale a cluster from, say, 1 PB to 2 PB?
>> Should we grow it gradually (1.1 PB, then 1.2 PB, ...) or move to 2 PB
>> directly?
>> 
>> Thanks,
>> Guang
>> 
>>> On Oct 10, 2013, at 11:10 AM, Michael Lowe wrote:
>>> 
>>> There used to be; I can't find it right now.  Something like 'ceph osd
>>> pool set <pool> pg_num <num>' and then 'ceph osd pool set <pool> pgp_num
>>> <num>' to actually move your data into the new PGs.  I successfully did
>>> it several months ago, when bobtail was current.
>>> 
>>> Sent from my iPad
>>> 
 On Oct 9, 2013, at 10:30 PM, Guang  wrote:
 
 Thanks Mike.
 
 Is there any documentation for that?
 
 Thanks,
 Guang
 
> On Oct 9, 2013, at 9:58 PM, Mike Lowe wrote:
> 
> You can add PGs,  the process is called splitting.  I don't think PG
> merging, the reduction in the number of PGs, is ready yet.
> 
>> On Oct 8, 2013, at 11:58 PM, Guang  wrote:
>> 
>> Hi ceph-users,
>> Ceph recommends that the number of PGs for a pool be (100 * OSDs) / Replicas.
>> Per my understanding, the PG count of a pool stays fixed even as we scale
>> the cluster out / in by adding / removing OSDs. Does that mean that if we
>> double the number of OSDs, the PG count for the pool is no longer optimal,
>> with no chance to correct it?
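>> 
>> (For example, with my 330 OSDs and 3 replicas that formula gives
>> 100 * 330 / 3 = 11000 PGs, which would typically be rounded up to the
>> next power of two, i.e. 16384.)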
>> 
>> 
>> Thanks,
>> Guang
> 




Re: [ceph-users] Multicast

2013-11-02 Thread james



> It would be an interesting exercise though. Depending on network layout
> (no cluster network) the client could multicast to all replicas and
> potentially reduce latency by half. I suspect that the client
> participating in the replication goes against the internal workings of
> ceph though and would be a major rework...


Having the client participate directly goes much further than I was 
thinking (I was targeting more the server-side LAN).


I'd like to know how small block writes are replicated - maybe I should 
analyse the traffic and find out.
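
(Probably the quickest way is just to watch the inter-OSD traffic while
writing a small object; the interface name and port range below are
assumptions to adjust for the local setup:)

  sudo tcpdump -i eth1 -nn 'tcp portrange 6800-7100' &
  rados -p rbd put test-obj /etc/hostname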



Re: [ceph-users] Multicast

2013-11-02 Thread James Harper
> 
> Hi All
> 
> I was wondering whether multicast could be used for the replication
> traffic?  It just seemed that the outbound network bandwidth from the
> source could be halved.
> 

Right now I think ceph traffic is all TCP, which doesn't do multicast.

You'd either need to make ceph use UDP and handle packet loss, congestion, and 
all the other things that TCP handles, or use a reliable multicast protocol 
like PGM (assuming that met all the required criteria).

It would be an interesting exercise though. Depending on network layout (no 
cluster network) the client could multicast to all replicas and potentially 
reduce latency by half. I suspect that the client participating in the 
replication goes against the internal workings of ceph though and would be a 
major rework... 

James


[ceph-users] Multicast

2013-11-02 Thread james

Hi All

I was wondering whether multicast could be used for the replication 
traffic?  It just seemed that the outbound network bandwidth from the 
source could be halved.


Cheers