Re: [ceph-users] crush chooseleaf vs. choose

2014-01-06 Thread Dietmar Maurer
 'ceph osd crush tunables optimal'
 
 or adjust an offline map file via the crushtool command line (more
 annoying) and retest; I suspect that is the problem.
 
 http://ceph.com/docs/master/rados/operations/crush-map/#tunables
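
For reference, the offline route mentioned above looks roughly like this. The flag names and the values for the "optimal"/bobtail-era profile are as I recall them for this release, so double-check against crushtool --help before applying anything:

   ceph osd getcrushmap -o my.map
   crushtool -i my.map \
       --set-choose-local-tries 0 --set-choose-local-fallback-tries 0 \
       --set-choose-total-tries 50 --set-chooseleaf-descend-once 1 \
       -o my.tuned.map
   crushtool --test -i my.tuned.map --rule 0 --num-rep 3 --show-utilization
   # and only if the result looks sane: ceph osd setcrushmap -i my.tuned.map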

That solves the bug with weight 0, thanks.

But I still get the following distribution:

  device 0: 423
  device 1: 453
  device 2: 430
  device 3: 455
  device 4: 657
  device 5: 654

The hosts with only one OSD get too much data.

 On Fri, 3 Jan 2014, Dietmar Maurer wrote:
 
   In both cases, you only get 2 replicas on the remaining 2 hosts.
 
  OK, I was able to reproduce this with crushtool.
 
   The difference is if you have 4 hosts with 2 osds.  In the choose
   case, you have some fraction of the data that chose the down host in
   the first step (most of the attempts, actually!) and then couldn't
   find a usable osd, leaving you with only 2
 
  This is also reproducible.
 
   replicas.  With chooseleaf that doesn't happen.
  
   The other difference is if you have one of the two OSDs on the host marked
 out.
   In the choose case, the remaining OSD will get allocated 2x the
   data; in the chooseleaf case, usage will remain proportional with
   the rest of the cluster and the data from the out OSD will be
   distributed across other OSDs (at least when there are more than 3 hosts!).
 
  I see, but the data distribution does not seem optimal in that case.
 
  For example using this crush map:
 
  # types
  type 0 osd
  type 1 host
  type 2 rack
  type 3 row
  type 4 room
  type 5 datacenter
  type 6 root
 
  # buckets
  host prox-ceph-1 {
  id -2   # do not change unnecessarily
  # weight 7.260
  alg straw
  hash 0  # rjenkins1
  item osd.0 weight 3.630
  item osd.1 weight 3.630
  }
  host prox-ceph-2 {
  id -3   # do not change unnecessarily
  # weight 7.260
  alg straw
  hash 0  # rjenkins1
  item osd.2 weight 3.630
  item osd.3 weight 3.630
  }
  host prox-ceph-3 {
  id -4   # do not change unnecessarily
  # weight 3.630
  alg straw
  hash 0  # rjenkins1
  item osd.4 weight 3.630
  }
 
  host prox-ceph-4 {
  id -5   # do not change unnecessarily
  # weight 3.630
  alg straw
  hash 0  # rjenkins1
  item osd.5 weight 3.630
  }
 
  root default {
  id -1   # do not change unnecessarily
  # weight 21.780
  alg straw
  hash 0  # rjenkins1
  item prox-ceph-1 weight 7.260   # 2 OSDs
  item prox-ceph-2 weight 7.260   # 2 OSDs
  item prox-ceph-3 weight 3.630   # 1 OSD
  item prox-ceph-4 weight 3.630   # 1 OSD
  }
 
  # rules
  rule data {
  ruleset 0
  type replicated
  min_size 1
  max_size 10
  step take default
  step chooseleaf firstn 0 type host
  step emit
  }
  # end crush map
 
  crushtool shows the following utilization:
 
  # crushtool --test -i my.map --rule 0 --num-rep 3 --show-utilization
device 0: 423
device 1: 452
device 2: 429
device 3: 452
device 4: 661
device 5: 655
 
  Any explanation for that?  Maybe related to the small number of devices?
 
 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] crush chooseleaf vs. choose

2014-01-06 Thread Sage Weil
On Mon, 6 Jan 2014, Dietmar Maurer wrote:
  'ceph osd crush tunables optimal'
  
  or adjust an offline map file via the crushtool command line (more
  annoying) and retest; I suspect that is the problem.
  
  http://ceph.com/docs/master/rados/operations/crush-map/#tunables
 
 That solves the bug with weight 0, thanks.
 
 But I still get the following distribution:
 
   device 0: 423
   device 1: 453
   device 2: 430
   device 3: 455
   device 4: 657
   device 5: 654
 
 The hosts with only one OSD get too much data.

I think this is just fundamentally a problem with distributing 3 replicas 
over only 4 hosts.  Every piece of data in the system needs to include 
either host 3 or 4 (and thus device 4 or 5) in order to have 3 replicas 
(on separate hosts).  Add more hosts or disks and the distribution will 
even out.
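
To put numbers on that (a back-of-the-envelope check, assuming the 1024-sample
default of crushtool --test): a strictly weight-proportional split would already
require prox-ceph-1 and prox-ceph-2 to show up in every single PG, which a
randomized placement cannot guarantee, so every PG that skips one of them has to
land on host 3 or 4 instead:

   awk 'BEGIN {
       total = 7.26 + 7.26 + 3.63 + 3.63      # 21.78, the root weight
       placements = 3 * 1024                  # 3 replicas x 1024 PGs
       printf "ideal for prox-ceph-1/2: %.0f each (the hard ceiling of one replica per PG)\n", placements * 7.26 / total
       printf "ideal for prox-ceph-3/4: %.0f each (observed: ~655)\n", placements * 3.63 / total
   }'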

sage


 
  On Fri, 3 Jan 2014, Dietmar Maurer wrote:
  
In both cases, you only get 2 replicas on the remaining 2 hosts.
  
   OK, I was able to reproduce this with crushtool.
  
The difference is if you have 4 hosts with 2 osds.  In the choose
case, you have some fraction of the data that chose the down host in
the first step (most of the attempts, actually!) and then couldn't
find a usable osd, leaving you with only 2
  
   This is also reproducible.
  
replicas.  With chooseleaf that doesn't happen.
   
The other difference is if you have one of the two OSDs on the host 
marked
  out.
In the choose case, the remaining OSD will get allocated 2x the
data; in the chooseleaf case, usage will remain proportional with
the rest of the cluster and the data from the out OSD will be
distributed across other OSDs (at least when there are more than 3 hosts!).
  
   I see, but the data distribution does not seem optimal in that case.
  
   For example using this crush map:
  
   # types
   type 0 osd
   type 1 host
   type 2 rack
   type 3 row
   type 4 room
   type 5 datacenter
   type 6 root
  
   # buckets
   host prox-ceph-1 {
 id -2   # do not change unnecessarily
 # weight 7.260
 alg straw
 hash 0  # rjenkins1
 item osd.0 weight 3.630
 item osd.1 weight 3.630
   }
   host prox-ceph-2 {
 id -3   # do not change unnecessarily
 # weight 7.260
 alg straw
 hash 0  # rjenkins1
 item osd.2 weight 3.630
 item osd.3 weight 3.630
   }
   host prox-ceph-3 {
 id -4   # do not change unnecessarily
 # weight 3.630
 alg straw
 hash 0  # rjenkins1
 item osd.4 weight 3.630
   }
  
   host prox-ceph-4 {
 id -5   # do not change unnecessarily
 # weight 3.630
 alg straw
 hash 0  # rjenkins1
 item osd.5 weight 3.630
   }
  
   root default {
 id -1   # do not change unnecessarily
 # weight 21.780
 alg straw
 hash 0  # rjenkins1
 item prox-ceph-1 weight 7.260   # 2 OSDs
 item prox-ceph-2 weight 7.260   # 2 OSDs
 item prox-ceph-3 weight 3.630   # 1 OSD
 item prox-ceph-4 weight 3.630   # 1 OSD
   }
  
   # rules
   rule data {
 ruleset 0
 type replicated
 min_size 1
 max_size 10
 step take default
 step chooseleaf firstn 0 type host
 step emit
   }
   # end crush map
  
   crushtool shows the following utilization:
  
   # crushtool --test -i my.map --rule 0 --num-rep 3 --show-utilization
 device 0:   423
 device 1:   452
 device 2:   429
 device 3:   452
 device 4:   661
 device 5:   655
  
   Any explanation for that?  Maybe related to the small number of devices?
  
  
 
 
 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] crush chooseleaf vs. choose

2014-01-06 Thread Dietmar Maurer
  The hosts with only one OSD get too much data.
 
 I think this is just fundamentally a problem with distributing 3 replicas 
 over only 4
 hosts.  Every piece of data in the system needs to include either host 3 or 4 
 (and
 thus device 4 or 5) in order to have 3 replicas (on separate hosts).  Add more
 hosts or disks and the distribution will even out.

I also thought that, but the same thing also happens with more hosts.



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph@HOME: the domestication of a wild cephalopod

2014-01-06 Thread Loic Dachary
Hi,

The Ceph User Committee is proud to present its first use case :-) 

http://ceph.com/use-cases/cephhome-the-domestication-of-a-wild-cephalopod/

Many thanks to Alexandre Oliva for this inspiring story, Nathan Regola and 
Aaron Ten Clay for editing and proofreading and Patrick McGarry for wordpress 
wizardry. 

If you know of a Ceph use case, please tell us about it. We will discuss over 
the phone or via email and write it down for publication at 
http://ceph.com/use-cases

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] [Rados] How to get the scrub progressing ?

2014-01-06 Thread Gregory Farnum
On Mon, Dec 30, 2013 at 11:14 PM, Kuo Hugo tonyt...@gmail.com wrote:

 Hi all,

 I have several question about osd scrub.

 Does the scrub job run in the background automatically? Is it working 
 periodically?

Yes, the OSDs will periodically scrub the PGs they host based on load
and the min/max scrub intervals.
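
For reference, these are the knobs involved; an illustrative [osd] snippet, with
values that are roughly the defaults of this era (check your release's docs):

   [osd]
   osd scrub min interval = 86400     # seconds; don't scrub a PG more often than this
   osd scrub max interval = 604800    # force a scrub at least this often, regardless of load
   osd scrub load threshold = 0.5     # only start new scrubs below this load average
   osd deep scrub interval = 604800   # how often a scrub is promoted to a deep scrub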

 Do I need to trigger the scrub or deep-scrub process myself?

No.

 How can I see the progress of the current scrub?

I don't believe this is reported anywhere in a user-friendly way, but
I think you can get info on it out of the osd admin socket
perfcounters.
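
A hedged sketch of pulling those counters from one OSD's admin socket (default
socket path; which scrub-related counters exist varies by version):

   ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump | python -m json.tool | less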

 How can I estimate when a scrub will be done?

Likewise. ^

Is there some reason you want to follow this instead of just letting
it do its thing?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph osd perf question

2014-01-06 Thread Gregory Farnum
On Fri, Jan 3, 2014 at 2:02 AM, Andrei Mikhailovsky and...@arhont.com wrote:
 Hi guys,

 Could someone explain what's the new perf stats show and if the numbers are
 reasonable on my cluster?

 I am concerned about the high fs_commit_latency, which seems to be above
 150ms for all osds. I've tried to find the documentation on what this
 command actually shows, but couldn't find anything.

 I am using 3TB sas drives with 4 osd journals on each ssd. Are the numbers
 below reasonable for a fairly idle ceph cluster (osd utilisation below 10%
 on average)?

This is a report about flushing data out to the backing-store disk,
and fs_commit_latency is generally going to include a syncfs syscall,
so 150-600 ms is not unreasonable. The fs_apply_latency (for applying
updates to the in-memory filesystem) and the numbers on the journal
are the ones to look at.
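
If it helps, the per-OSD breakdown behind those columns can also be pulled from
the admin socket; a rough sketch, assuming the default socket path and the
filestore counter names journal_latency / apply_latency on this version:

   ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump \
       | python -m json.tool | grep -A 2 -E '"(journal_latency|apply_latency)"'
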
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


 # ceph osd perf
 osdid fs_commit_latency(ms) fs_apply_latency(ms)
  0   192    4
  1   265    4
  2   116    1
  3   125    2
  4   166    1
  5   209    3
  6   184    6
  7   142    2
  8   209    1
  9   166    1
 10   216    1
 11   308    3
 12   150    2
 13   125    1
 14   175    2
 15   142    2
 16   150    4


 when the cluster gets a bit busy (osd utilisation below 50% on average) I
 see:

 # ceph osd perf
 osdid fs_commit_latency(ms) fs_apply_latency(ms)
 0   551   11
 1   284   25
 2   517   41
 3   492   14
 4   625   13
 5   309   26
  6   650    9
 7   517   21
 8   634   25
 9   784   32
 10   392    7
 11   501    8
12   602   12
13   467   14
14   476   36
15   451   11
16   383   21


 Thanks

 Andrei

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Command Prepending None to output on one node (only)

2014-01-06 Thread Gregory Farnum
I have a vague memory of this being something that happened in an
outdated version of the ceph tool. Are you running an older binary on
the node in question?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
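
A quick, hedged way to compare the binaries across the nodes (hostnames taken
from the mon_initial_members line in the config quoted below; adjust as needed):

   for h in fs1 os1 cortex os2; do
       echo "== $h =="
       ssh "$h" 'ceph --version; dpkg -l ceph ceph-common python-ceph 2>/dev/null | grep ^ii'
   done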


On Sat, Jan 4, 2014 at 4:34 PM, Zeb Palmer z...@zebpalmer.com wrote:
 I have a small ceph 0.72.2 cluster built with ceph-deploy and running on
 ubuntu 12.04, this cluster is used as primary storage for my home openstack
 sandbox.

 I'm running into an issue I haven't seen before and have had a heck of a
 time searching for similar issues as None doesn't exactly make a good
 keyword.


 On one node, when I run any ceph command that interacts with the cluster, I
 get the appropriate output, but None is prepended to it.

 root@os2:/etc/ceph# ceph health
 None
 HEALTH_OK


 root@os2:/etc/ceph# ceph
 None
 ceph


 Again, this only happens on one of the four ceph nodes. I've verified conf
 files, keys, perms, versions, etc. match on all nodes, no connectivity
 issues, etc. In fact the ceph cluster is still healthy and working great
 with only one exception. Cinder-Volume also runs on this node and since
 None is also getting prepended to json formatted output, Cinder-Volume
 errors out in _get_mon_addrs() when json decoder chokes on the response from
 ceph.  (I'll probably throw a quick pre-decode band-aid on that method to
 get Cinder back online until I can correct this)

 here's my config sans radosgw... although it hasn't changed recently.

 [global]
 fsid = 02a4abf4-3659-4525-bfe8-f1f5ea024030
 mon_initial_members = fs1,os1,cortex,os2
 mon_host = 10.10.3.8,10.10.3.10,10.10.3.7,10.10.3.20
 auth_supported = cephx
 osd_journal_size = 1024
 filestore_xattr_use_omap = true
 public_network = 10.10.3.0/24
 cluster_network = 10.10.150.0/24


 I've tried everything I can think of, hoping someone here can point out what
 I'm missing.

 Thanks
 zeb


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] building librados static library librados.a

2014-01-06 Thread david hong
Hi ceph-users team,

I'm a junior systems developer.

I'm developing some applications using librados (librados only rather than
the whole Ceph package) from Ceph and it turns out the work of building the
librados-only package from the huge Ceph source code would be enormous.

 All I want is just a static library, librados.a. As far as I know, there
are no options in the configure script or Makefile to build only the static
lib.
As far as I know, I need the object files (.o) to build librados.a. There are
only four object files in the src/librados/ dir, namely librados.o,
RadosClient.o, IoCtxImpl.o and snap_set_diff.o, but these 4 files are not enough
(I built an archive from them and it failed to work).

It would be great if you guys could give me some directions here. I'm
looking forward to your response.
Thanks and have a nice day !
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Command Prepending None to output on one node (only)

2014-01-06 Thread Zeb Palmer
I've (re)confirmed that all nodes are the same build.

# ceph --version



ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)

ubuntu package version 0.72.2-1precise

I was discussing this with my engineers this morning and a couple of them
vaguely recalled that we had run into this on an earlier version of ceph
when we were testing it, but no one could recall the circumstances or
resolution. In fact, they thought that I had fixed it. :)

Since this is my home sandbox cluster I can easily rebuild that node if
need be, but I wanted to see if anyone could point me toward a better
solution so I don't run into this again.

thanks.





On Mon, Jan 6, 2014 at 10:07 AM, Gregory Farnum g...@inktank.com wrote:

 I have a vague memory of this being something that happened in an
 outdated version of the ceph tool. Are you running an older binary on
 the node in question?
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com


 On Sat, Jan 4, 2014 at 4:34 PM, Zeb Palmer z...@zebpalmer.com wrote:
  I have a small ceph 0.72.2 cluster built with ceph-deploy and running on
  ubuntu 12.04, this cluster is used as primary storage for my home
 openstack
  sandbox.
 
  I'm running into an issue I haven't seen before and have had a heck of a
  time searching for similar issues as None doesn't exactly make a good
  keyword.
 
 
  On one node, when I run any ceph command that interacts with the
 cluster, I
  get the appropriate output, but None is prepended to it.
 
  root@os2:/etc/ceph# ceph health
  None
  HEALTH_OK
 
 
  root@os2:/etc/ceph# ceph
  None
  ceph
 
 
  Again, this only happens on one of the four ceph nodes. I've verified
 conf
  files, keys, perms, versions, etc. match on all nodes, no connectivity
  issues, etc. In fact the ceph cluster is still healthy and working great
  with only one exception. Cinder-Volume also runs on this node and since
  None is also getting prepended to json formatted output, Cinder-Volume
  errors out in _get_mon_addrs() when json decoder chokes on the response
 from
  ceph.  (I'll probably throw a quick pre-decode band-aid on that method to
  get Cinder back online until I can correct this)
 
  here's my config sans radosgw... although it hasn't changed recently.
 
  [global]
  fsid = 02a4abf4-3659-4525-bfe8-f1f5ea024030
  mon_initial_members = fs1,os1,cortex,os2
  mon_host = 10.10.3.8,10.10.3.10,10.10.3.7,10.10.3.20
  auth_supported = cephx
  osd_journal_size = 1024
  filestore_xattr_use_omap = true
  public_network = 10.10.3.0/24
  cluster_network = 10.10.150.0/24
 
 
  I've tried everything I can think of, hoping someone here can point out
 what
  I'm missing.
 
  Thanks
  zeb
 
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw swift java jars

2014-01-06 Thread raj kumar
Hi, could find all necessary jars required to run the java program.  Is
there any place to get all jars for both swift and s3? Thanks.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] building librados static library librados.a

2014-01-06 Thread Noah Watkins
The default configuration for a Ceph build should produce a static
rados library. If you actually want to build _only_ librados, that
might require a bit of automake tweaking.

nwatkins@kyoto:~$ ls -l projects/ceph_install/lib/
total 691396
-rw-r--r-- 1 nwatkins nwatkins 219465940 Jan  6 09:56 librados.a
-rwxr-xr-x 1 nwatkins nwatkins  1067 Jan  6 09:56 librados.la
lrwxrwxrwx 1 nwatkins nwatkins        17 Jan  6 09:56 librados.so -> librados.so.2.0.0
lrwxrwxrwx 1 nwatkins nwatkins        17 Jan  6 09:56 librados.so.2 -> librados.so.2.0.0
-rwxr-xr-x 1 nwatkins nwatkins  89043452 Jan  6 09:56 librados.so.2.0.0
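
In other words, a plain autotools build already leaves the archive behind;
roughly, from a clean checkout of that era (illustrative, paths may differ):

   ./autogen.sh
   ./configure                   # static libs are on by default in an autotools build
   make
   ls -l src/.libs/librados.a    # libtool keeps the uninstalled archive here
   make install                  # installs it under $prefix/lib, as shown above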

On Mon, Jan 6, 2014 at 3:45 AM, david hong davidhong1...@gmail.com wrote:
 Hi ceph-users team,

 I'm a junior systems developer.

 I'm developing some applications using librados (librados only rather than
 the whole Ceph package) from Ceph and it turns out the work of building the
 librados-only package from the huge Ceph source code would be enormous.

  All I want is just a static library, librados.a. As far as I know, there are
 no options in the configure script or Makefile to build only the static lib.
 As far as I know, I need the object files (.o) to build librados.a. There are
 only four object files in the src/librados/ dir, namely librados.o,
 RadosClient.o, IoCtxImpl.o and snap_set_diff.o, but these 4 files are not enough (I
 built an archive from them and it failed to work).

 It would be great if you guys could give me some directions here. I'm
 looking forward for your response.
 Thanks and have a nice day !


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS files not appearing in DF (or rados ls)

2014-01-06 Thread Gregory Farnum
On Thu, Jan 2, 2014 at 2:18 PM, Alex Pearson a...@apics.co.uk wrote:
 Hi All,
 Victory!  Found the issue, it was a mistake on my part, however it does raise 
 another question...

 The issue was:
 root@osh1:~# ceph --cluster apics auth list
 installed auth entries:
 SNIP
 client.cuckoo
 key: AQBjTblS4AFAARAAZyumzFyk2JS8d9AjutRoTQ==
 caps: [mon] allow r
 caps: [osd] allow rwx pool=staging, allow rwx pool=media2


 When I recreated the pool I changed from 'media2' to 'media3' - so there 
 wasn't any authorization to the pool.  I've corrected this (see below), then 
 REMOUNTED the filesystem on the client (it didn't work until I'd done this)

 root@osh1:~# ceph --cluster apics auth caps client.cuckoo osd 'allow rwx 
 pool=media3, allow rwx pool=staging' mon 'allow r'
 Ref: http://www.sebastien-han.fr/blog/2013/07/26/ceph-update-cephx-keys/


 The BIG QUESTION though... The data was being stored - I verified this by MD5 
 summing the data after it was written.  But it wasn't being accounted for 
 anywhere, and the permissions system looks to have failed.  This looks like a 
 big security hole; surely a permission-denied error should have occurred 
 here?  Also, the data was being stored, but didn't appear in any DF commands, 
 and couldn't be seen using 'ceph --cluster <name> ls -p <pool name>'?

You aren't doing quite what you think here. What's actually happened
is that the data was locally buffered for writing out, and when you
read it (for the md5 sum) it was looking at that in-memory state. The
CephFS client was then trying to flush that dirty file data out to the
OSDs, and getting EPERM back. This is a pretty tricky situation for us
to handle, and our (lack of a) solution right now is definitely not
great. Unfortunately it's also about all that the POSIX spec lets us
do — if you try and do a flush or a sync you would get back an error
code, but short of that we have no mechanism for communicating to the
user that they can't write to the place they're trying to write to.
We've toyed with some sort of pre-emptive check that the user can
write to the location their file is stored in (and returning an error
on open if they can't), but it's actually quite a hard problem and
hasn't gotten any serious attention yet.
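
For what it's worth, a small illustration of that last point, with a
hypothetical mount point and path:

   # conv=fsync makes dd call fsync() before exiting, so a flush the OSDs reject
   # should come back as an error here instead of failing silently in the background.
   dd if=/dev/zero of=/mnt/cephfs/media3/testfile bs=4M count=1 conv=fsync
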
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How can I set the warning level?

2014-01-06 Thread Gregory Farnum
On Wed, Dec 25, 2013 at 6:13 PM, vernon1...@126.com vernon1...@126.com wrote:
 Hello,  my Mon's always HEALTH_WARN, and I run ceph health detail, it show
 me like this:

 HEALTH_WARN
 mon.2 addr 192.168.0.7:6789/0 has 30% avail disk space -- low disk space!

 I want to know how to set this warning level. I want it to give out the
 alarm only when the remaining space is no more than 10%.

There's not any option to disable specific ceph health warnings. You
can change the threshold at which that warning occurs with the mon
data avail [warn|crit] config options, though (they default to 30 and 5).
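
For example (option names as above; an injectargs change does not survive a
monitor restart):

   # persistently, in ceph.conf on the monitor hosts:
   [mon]
   mon data avail warn = 10

   # or on the fly, for the monitor that is currently warning:
   ceph tell mon.2 injectargs '--mon-data-avail-warn 10'
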
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cannot see recovery statistics + pgs stuck unclean

2014-01-06 Thread Gregory Farnum
[Hrm, this email was in my spam folder.]

At a quick glance, you're probably running into some issues because
you've got two racks of very different weights. Things will probably
get better if you enable the optimal crush tunables; check out the
docs on that and see if you can switch to them.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Fri, Dec 27, 2013 at 3:58 AM, Sreejith Vijayendran
sreejith.vijayend...@inmobi.com wrote:
 Hello,

 We have a 3 node cluster set up with OSDs created on all the 3 nodes. The
 replication was set to 2.
 1) We are in the testing phase and tried to bring down all the OSDs on a
 particular node to test the migration of PGs to other OSDs.
 But the PGs are not getting replicated on other OSDs and the status of the
 replication also was not clear at all.
 Below is the ceph status at that point:

 ===
 sreejith@sb1001:/var/run/ceph$ sudo ceph status
 cluster 9b48b60c-bebe-4714-8a61-91ca5b388a17
  health HEALTH_WARN 885 pgs degraded; 885 pgs stuck unclean; recovery
 59/232 objects degraded (25.431%); 22/60 in osds are down
  monmap e2: 3 mons at
 {sb1001=10.2.4.90:6789/0,sb1002=10.2.4.202:6789/0,sb1004=10.2.4.203:6789/0},
 election epoch 22, quorum 0,1,2 sb1001,sb1002,sb1004
  osdmap e378: 68 osds: 38 up, 60 in
   pgmap v4490: 1564 pgs, 25 pools, 1320 MB data, 116 objects
 8415 MB used, 109 TB / 109 TB avail
 59/232 objects degraded (25.431%)
  679 active+clean
  862 active+degraded
   23 active+degraded+remapped
 ===

 We waited for around 4-5 hours and the degraded percentage only improved marginally,
 from 27% at the start to 25.431%.

 2) We then tweaked some OSD values to speed up the recovery,
 namely (osd_recovery_threads, osd_recovery_max_active,
 osd_recovery_max_chunk, osd_max_backfills, osd_backfill_retry_interval, etc.),
 as we were only concerned about getting the OSDs rebalanced for now. But
 this didn't improve at all overnight.

 3) We then manually started all the OSDs on that specific node and the
 status came back up.
 But then we could see that there were 23 PGs stuck unclean and in
 the 'active+remapped' state:

 =
 sreejith@sb1001:~$ sudo ceph status
 [sudo] password for sreejith:
 cluster 9b48b60c-bebe-4714-8a61-91ca5b388a17
  health HEALTH_WARN 23 pgs stuck unclean
  monmap e2: 3 mons at
 {sb1001=10.2.4.90:6789/0,sb1002=10.2.4.202:6789/0,sb1004=10.2.4.203:6789/0},
 election epoch 22, quorum 0,1,2 sb1001,sb1002,sb1004
  osdmap e382: 68 osds: 61 up, 61 in
   pgmap v4931: 1564 pgs, 25 pools, 1320 MB data, 116 objects
 7931 MB used, 110 TB / 110 TB avail
 1541 active+clean
   23 active+remapped
 =

 'ceph pg dump_stuck unclean' showed that all the stuck PGs were on 4 OSDs and
 no other PGs were on those same OSDs.
 So:
 4) We took those OSDs out of the cluster using 'ceph osd out {id}'. Then
 the unclean PG count increased to 52. Even after marking the OSDs back 'in',
 the situation didn't improve.

 =
 root@sb1001:/home/sreejith# ceph health detail
 HEALTH_WARN 52 pgs stuck unclean
 pg 9.63 is stuck unclean since forever, current state active+remapped, last
 acting [47,7]
 pg 11.61 is stuck unclean since forever, current state active+remapped, last
 acting [47,7]
 pg 10.62 is stuck unclean since forever, current state active+remapped, last
 acting [47,7]
 pg 13.5f is stuck unclean since forever, current state active+remapped, last
 acting [47,7]
 pg 15.5d is stuck unclean since forever, current state active+remapped, last
 acting [47,7]
 pg 14.5e is stuck unclean since forever, current state active+remapped, last
 acting [47,7]
 pg 9.47 is stuck unclean for 530.594604, current state active+remapped, last
 acting [66,43]
 pg 7.49 is stuck unclean for 530.594593, current state active+remapped, last
 acting [66,43]
 pg 5.4b is stuck unclean for 530.594481, current state active+remapped, last
 acting [66,43]
 pg 3.4d is stuck unclean for 530.594449, current state active+remapped, last
 acting [66,43]
 pg 11.45 is stuck unclean for 530.594635, current state active+remapped,
 last acting [66,43]
 pg 13.43 is stuck unclean for 530.594654, current state active+remapped,
 last acting [66,43]
 pg 15.41 is stuck unclean for 530.594695, current state active+remapped,
 last acting [66,43]
 pg 6.4a is stuck unclean for 530.594366, current state active+remapped, last
 acting [66,43]
 pg 10.46 is stuck unclean for 530.594387, current state active+remapped,
 last acting [66,43]
 pg 14.42 is stuck unclean for 530.594422, current state active+remapped,
 last acting [66,43]
 pg 4.4c is stuck unclean for 530.594341, current state active+remapped, last
 acting [66,43]
 pg 12.44 is stuck unclean for 530.594361, current state active+remapped,
 last acting [66,43]
 pg 8.48 is stuck unclean for 530.594294, current state 

Re: [ceph-users] What's the status of feature: S3 object versioning?

2014-01-06 Thread Gregory Farnum
On Thu, Jan 2, 2014 at 12:40 AM, Ray Lv ra...@yahoo-inc.com wrote:
 Hi there,

 Noted that there is a Blueprint item about S3 object versioning in radosgw
 for Firefly at
 http://wiki.ceph.com/Planning/Blueprints/Firefly/rgw%3A_object_versioning
 And Sage has announced the v0.74 release for Firefly. Do you guys know the
 status of this feature?

It was a community blueprint and got some discussion
(http://wiki.ceph.com/Planning/CDS/CDS_Firefly links to
http://www.youtube.com/watch?v=DWK5RrNRhHU&feature=share&t=1h30m00s
and http://pad.ceph.com/p/cdsfirefly-object-versioning), but nobody
was able to contribute developers to it.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw package - missing deps on Ubuntu 13.04

2014-01-06 Thread LaSalle, Jurvis


On 1/2/14, 1:42 PM, Sage Weil s...@inktank.com wrote:

The precise version has a few annoying (though rare)
bugs, and more importantly does not support caching properly.  For
clusters of any size this can become a performance problem, particularly
when the cluster is stressed (lots of OSDs catching up on OSDMaps).

What's this about poor ceph performance on ubuntu 12.04?  As an LTS
release, I was leaning towards using it for a production cluster.

Thanks,
JL
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] repair incosistent pg using emperor

2014-01-06 Thread David Zafman

Did the inconsistent flag eventually get cleared?  It might have been you 
didn’t wait long enough for the repair to get through the pg.

David Zafman
Senior Developer
http://www.inktank.com
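
One way to check that, rather than relying on the logs:

   ceph pg 6.29f query | grep -i scrub_stamp   # last_scrub_stamp / last_deep_scrub_stamp should advance
   ceph health detail | grep 6.29f             # the inconsistent flag should drop off once the repair completes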




On Dec 28, 2013, at 12:29 PM, Corin Langosch corin.lango...@netskin.com wrote:

 Hi Sage,
 
 Am 28.12.2013 19:18, schrieb Sage Weil:
 
  ceph pg scrub 6.29f
 
 ...and see if it comes back with errors or not.  If it doesn't, you
 can
  What do you mean by 'comes back with errors or not'?
 
 ~# ceph pg scrub 6.29f
 instructing pg 6.29f on osd.8 to scrub
 
  But the logs don't show any scrubbing.  In fact the command doesn't seem
 to do anything at all
 
 
  ceph pg repair 6.29f
 
 to clear the inconsistent flag.
 
 ~# ceph pg repair 6.29f
 instructing pg 6.29f on osd.8 to repair
 
 Again, nothing in the logs. It seems the commands are completely ignored?
 
 I already restarted osd 8 two times and tried again, no change...
 
 Corin
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Current state of OpenStack/Ceph rbd live migration?

2014-01-06 Thread Haomai Wang
On Tue, Jan 7, 2014 at 6:13 AM, Jeff Bachtel
jbach...@bericotechnologies.com wrote:
 I just wanted to get a quick sanity check (and ammunition for updating from
 Grizzly to Havana).

 Per
 https://blueprints.launchpad.net/nova/+spec/bring-rbd-support-libvirt-images-type
 it seems that explicit support for rbd image types has been brought into
 OpenStack/Havana. Does this correspond to live-migration working properly
 yet in Nova?

It doesn't support live-migration now.  But if you want a quick fix,
you can look at
https://review.openstack.org/#/c/56527/. It's provided by my team and we have used it for
months.

The ideal live-migration patch may be late merging to master; we are
struggling with
refactoring tasks now.


 For background, the nova libvirt driver in Grizzly did not grok how to live
 migrate for rbd (specifically the need to copy instance folders around, see
 http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-March/000536.html
 ). I'm just curious if this situation is rectified?

 Thanks,
 Jeff

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 
Best Regards,

Wheat
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to deploy ceph with a Debian version other than stable (Hello James Page ^o^)

2014-01-06 Thread Christian Balzer

Hello,

I previously created a test cluster using the Argonaut packages available
in Debian testing aka Jessie (atm). 
Since it was pointed out to me that I ought to play with something more
recent, I bumped the machines to sid, which has 0.72.2 packages natively. 

The sid packages do not include ceph-deploy, so I tried mkcephfs, which I
was familiar with from the older version.
It warned about being deprecated, but seemed to create all the correct
config data, keyrings, etc.

However when starting ceph after the install it fired up the monitor OK
but failed at the first OSD with:
---
=== osd.2 === 
Mounting xfs on irt03:/var/lib/ceph/osd/ceph-2
Error ENOENT: osd.2 does not exist.  create it before updating the crush map
failed: 'timeout 10 /usr/bin/ceph --name=osd.2 --keyring=/etc/ceph/keyring.osd.2 osd crush create-or-move -- 2 0.36 root=default host=irt03'
---

Guess something wasn't created properly after all or not at the place
where it was expected to be. 
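
One thing that may be worth checking before giving up on the mkcephfs route is
whether the OSD ids were ever registered with the monitors, since that is what
the error complains about. A hedged sketch:

   ceph osd create    # run once per OSD; allocates and prints the next free id (0, 1, 2, ...)
   ceph osd tree      # osd.0..osd.2 should now be listed, so the init script's
                      # 'osd crush create-or-move' step has an entry to move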

So I grabbed the latest ceph-deploy, ceph-deploy_1.3.4-1~bpo70+1_all.deb,
and tried that, but as I feared from its name this is expecting nothing
newer than Wheezy:
---
[ceph_deploy.cli][INFO  ] Invoked (1.3.4): /usr/bin/ceph-deploy mon create irt03
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts irt03
[ceph_deploy.mon][DEBUG ] detecting platform for host irt03 ...
[irt03][DEBUG ] connected to host: irt03 
[irt03][DEBUG ] detect platform information from remote host
[ceph_deploy][ERROR ] UnsupportedPlatform: Platform is not supported: debian  
jessie/sid
---

Other than a manual deploy (which I tried before and failed, probably
because the documentation for that is not quite in touch with version
reality/variety and at points makes assumptions/omits steps), what's left
to try?

Is this the expected state of affairs, as in:
1. The current Debian Sid package can't create a new cluster by itself
2. ceph-deploy from ceph.com isn't available for Jessie or sid


Regards,

Christian
-- 
Christian BalzerNetwork/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] crush chooseleaf vs. choose

2014-01-06 Thread Dietmar Maurer
  I think this is just fundamentally a problem with distributing 3
  replicas over only 4 hosts.  Every piece of data in the system needs
  to include either host 3 or 4 (and thus device 4 or 5) in order to
  have 3 replicas (on separate hosts).  Add more hosts or disks and the
 distribution will even out.
 
 I also thought that, but the same thing also happens with more hosts.

OK, I have done more tests now, and it seems that the distribution is quite
good if num_hosts >= 2*num_repl
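
A quick way to probe that on the same map is to hold num_hosts at 4 and sweep
the replica count: with num-rep 2 the rule of thumb holds (4 >= 2*2), with
num-rep 3 it does not:

   for r in 2 3; do
       echo "== num-rep $r =="
       crushtool --test -i my.map --rule 0 --num-rep "$r" --show-utilization
   done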


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw swift java jars

2014-01-06 Thread raj kumar
I meant I could not find the required jar files to run the java swift program.


On Mon, Jan 6, 2014 at 11:35 PM, raj kumar rajkumar600...@gmail.com wrote:

 Hi, could find all necessary jars required to run the java program.  Is
 there any place to get all jars for both swift and s3? Thanks.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw swift java jars

2014-01-06 Thread Wido den Hollander

On 01/07/2014 08:15 AM, raj kumar wrote:

I meant could not find required jar files to run java swift program.



I don't think anybody has a clue what you mean.

Ceph is completely written in C++, so there are no Java JARs. The only 
pieces of Java are the CephFS JNI integration and the RADOS Java bindings.


If you have a problem with a Swift tool that requires Java JARs I 
recommend you ask the maintainers of that project.


Wido



On Mon, Jan 6, 2014 at 11:35 PM, raj kumar rajkumar600...@gmail.com wrote:

Hi, could find all necessary jars required to run the java program.
  Is there any place to get all jars for both swift and s3? Thanks.




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com