[ceph-users] Config parameters for system tuning

2017-06-20 Thread Maged Mokhtar

Hi,

1) I am trying to set some of the following config values, which seem to 
be present in most config examples relating to performance tuning:

journal_queue_max_ops
journal_queue_max_bytes
filestore_queue_committing_max_bytes
filestore_queue_committing_max_ops

I am using 10.2.7 but am not able to set these parameters either via the 
conf file or via injection; also, ceph --show-config does not list them. 
Have they been deprecated, and should they be ignored?
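
For reference, a rough sketch of the two ways being attempted (values purely 
illustrative):

  [osd]
  journal queue max ops = 3000
  journal queue max bytes = 1048576000

and at runtime:

  ceph tell osd.* injectargs '--journal_queue_max_ops 3000'
  ceph daemon osd.0 config show | grep journal_queue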


2) For osd_op_threads, I have seen some examples (not the official docs) 
fixing this to the number of CPU cores. Is this the best recommendation, 
or could we use more threads than cores?


Cheers
Maged Mokhtar
PetaSAN


Re: [ceph-users] Ceph packages for Debian Stretch?

2017-06-20 Thread Alfredo Deza
On Mon, Jun 19, 2017 at 8:25 PM, Christian Balzer  wrote:
>
> Hello,
>
> can we have the status, projected release date of the Ceph packages for
> Debian Stretch?

We don't yet have a projected release date.

The current status is that this has not been prioritized. I anticipate
that this will not be hard to accommodate in our repositories, but
it will require quite a bit of effort to add across all of our tooling.

In case anyone would like to help us out before the next stable
release, these are the places that would need to be updated for "stretch":

https://github.com/ceph/ceph-build/tree/master/ceph-build
https://github.com/ceph/chacra

"grepping" for "jessie" should indicate every spot that might need to
be updated.
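
Roughly, something along these lines (clone paths illustrative):

  git clone https://github.com/ceph/ceph-build
  git clone https://github.com/ceph/chacra
  grep -rn "jessie" ceph-build/ chacra/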

I am happy to review and answer questions to get these changes in!


>
> Christian
> --
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com   Rakuten Communications


Re: [ceph-users] Erasure Coding: Wrong content of data and coding chunks?

2017-06-20 Thread Jonas Jaszkowic

> On 20.06.2017 at 16:06, David Turner wrote:
> 
> Ceph is a large scale storage system. You're hoping that it is going to care 
> about and split files that are 9 bytes in size. Do this same test with a 4MB 
> file and see how it splits up the content of the file.
> 
> 

Makes sense. I was just hoping to reproduce the behavior depicted in the figure 
at 
http://docs.ceph.com/docs/master/rados/operations/erasure-code/#creating-a-sample-erasure-coded-pool
with the exact same values. Thanks for the help!

> 
> On Tue, Jun 20, 2017, 6:48 AM Jonas Jaszkowic wrote:
> I am currently evaluating erasure coding in Ceph. I wanted to know where my 
> data and coding chunks are located, so I 
> followed the example at 
> http://docs.ceph.com/docs/master/rados/operations/erasure-code/#creating-a-sample-erasure-coded-pool
> and set up an erasure coded pool with k=3 data chunks and m=2 coding chunks. I 
> stored an object named 'NYAN' with content
> 'ABCDEFGHI' in the pool.
> 
> The output of ceph osd map ecpool NYAN is the following, which seems correct:
> 
> osdmap e97 pool 'ecpool' (6) object 'NYAN' -> pg 6.bf243b9 (6.39) -> up 
> ([3,1,0,2,4], p3) acting ([3,1,0,2,4], p3)
> 
> But when I have a look at the chunks stored on the corresponding OSDs, I see 
> three chunks containing the whole content of the original file (padded with 
> zeros to a size of 4.0K)
> and two chunks containing nothing but zeros. I do not understand this 
> behavior. According to the link above: "The NYAN object will be divided in 
> three (K=3) and two additional chunks will be created (M=2).", but what I 
> experience is that the file is replicated three times in its entirety, and what 
> appears to be the coding chunks (i.e. holding parity information) are objects 
> containing nothing but zeros. Am I doing something wrong here? 
> 
> Any help is appreciated!
> 
> Attached is the output on each OSD node with the path to the chunk and its 
> content as hexdump:
> 
> osd.0
> path: 
> /var/lib/ceph/osd/ceph-0/current/6.39s2_head/NYAN__head_0BF243B9__6__2
> md5sum: 1666ba51af756693678da9efc443ef44  
> /var/lib/ceph/osd/ceph-0/current/6.39s2_head/NYAN__head_0BF243B9__6__2
> filesize: 4.0K
> /var/lib/ceph/osd/ceph-0/current/6.39s2_head/NYAN__head_0BF243B9__6__2
> hexdump:   00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  
> ||
> *
> 0560
> 
> osd.1
> path: 
> /var/lib/ceph/osd/ceph-1/current/6.39s1_head/NYAN__head_0BF243B9__6__1
> md5sum: 1666ba51af756693678da9efc443ef44  
> /var/lib/ceph/osd/ceph-1/current/6.39s1_head/NYAN__head_0BF243B9__6__1
> filesize: 4.0K
> /var/lib/ceph/osd/ceph-1/current/6.39s1_head/NYAN__head_0BF243B9__6__1
> hexdump:   00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  
> ||
> *
> 0560
> 
> osd.2
> path: 
> /var/lib/ceph/osd/ceph-2/current/6.39s3_head/NYAN__head_0BF243B9__6__3
> md5sum: ff6a7f77674e23fd7e3a0c11d7b36ed4  
> /var/lib/ceph/osd/ceph-2/current/6.39s3_head/NYAN__head_0BF243B9__6__3
> filesize: 4.0K
> /var/lib/ceph/osd/ceph-2/current/6.39s3_head/NYAN__head_0BF243B9__6__3
> hexdump:   41 42 43 44 45 46 47 48  49 0a 00 00 00 00 00 00  
> |ABCDEFGHI...|
> 0010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ||
> *
> 0560
> 
> osd.3
> path: 
> /var/lib/ceph/osd/ceph-3/current/6.39s0_head/NYAN__head_0BF243B9__6__0
> md5sum: ff6a7f77674e23fd7e3a0c11d7b36ed4  
> /var/lib/ceph/osd/ceph-3/current/6.39s0_head/NYAN__head_0BF243B9__6__0
> filesize: 4.0K
> /var/lib/ceph/osd/ceph-3/current/6.39s0_head/NYAN__head_0BF243B9__6__0
> hexdump:   41 42 43 44 45 46 47 48  49 0a 00 00 00 00 00 00  
> |ABCDEFGHI...|
> 0010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ||
> *
> 0560
> 
> osd.4
> path: 
> /var/lib/ceph/osd/ceph-4/current/6.39s4_head/NYAN__head_0BF243B9__6__4
> md5sum: ff6a7f77674e23fd7e3a0c11d7b36ed4  
> /var/lib/ceph/osd/ceph-4/current/6.39s4_head/NYAN__head_0BF243B9__6__4
> filesize: 4.0K
> /var/lib/ceph/osd/ceph-4/current/6.39s4_head/NYAN__head_0BF243B9__6__4
> hexdump:   41 42 43 44 45 46 47 48  49 0a 00 00 00 00 00 00  
> |ABCDEFGHI...|
> 0010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ||
> *
> 0560
> 
> 
> The erasure code profile used:
> 
> jerasure-per-chunk-alignment=false
> k=3
> m=2
> plugin=jerasure
> ruleset-failure-domain=host
> 

Re: [ceph-users] cephfs-data-scan pg_files missing

2017-06-20 Thread John Spray
On Tue, Jun 20, 2017 at 4:06 PM, Mazzystr  wrote:
>
> I'm on Red Hat Storage 2.2 (ceph-10.2.7-0.el7.x86_64) and I see this...
> # cephfs-data-scan
> Usage:
>   cephfs-data-scan init [--force-init]
>   cephfs-data-scan scan_extents [--force-pool] <data pool name>
>   cephfs-data-scan scan_inodes [--force-pool] [--force-corrupt] <data pool name>
>
> --force-corrupt: overwrite apparently corrupt structures
> --force-init: write root inodes even if they exist
> --force-pool: use data pool even if it is not in FSMap
>
>   cephfs-data-scan scan_frags [--force-corrupt]
>
>   cephfs-data-scan tmap_upgrade 
>
>   --conf/-c FILE        read configuration from the given configuration file
>   --id/-i ID            set ID portion of my name
>   --name/-n TYPE.ID     set name
>   --cluster NAME        set cluster name (default: ceph)
>   --setuser USER        set uid to user or uid (and gid to user's gid)
>   --setgroup GROUP      set gid to group or gid
>   --version             show version and quit
>
>
> Anyone know where the "cephfs-data-scan pg_files   [...]" command
> shown in the docs went?

The docs you're looking at are from the master branch of Ceph, i.e.
the forthcoming 12.2.x series -- pg_files is a new command that isn't
present in 10.2.x.
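
For reference, once on a release that has it, the invocation looks roughly
like this (the path and PG ids are purely illustrative):

  cephfs-data-scan pg_files /path/in/cephfs 1.3f4 2.2a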

John

>
> Thanks,
> /Chris Callegari
>


[ceph-users] Recovering rgw index pool with large omap size

2017-06-20 Thread Sam Wouters
Hi list,

we need to recover an index pool distributed over 4 SSD-based OSDs. We
needed to kick out one of the OSDs because it was blocking all rgw access
due to leveldb compacting. Since then we've restarted the OSD with
"leveldb compact on mount = true" and the noup flag set, running the leveldb
compaction offline, but the index PGs are now running in degraded mode.

Goal is to make the recovery as fast as possible during a small
maintenance window and/or with minimal client impact.

Cluster is running jewel 10.2.7 (recently upgraded from hammer) and has
ongoing backfill operations (from changing the tunables).
We have some buckets with a large number of objects in them. Bucket index
re-sharding would be needed, but we don't have the opportunity to do
that right now.

Plan so far (a rough command sketch follows below):
* set global I/O scheduling priority to 7 (lowest)
* set index-pool OSD specifics:
- set recovery prio to highest (63)
- set client prio to lowest (1)
- increase recovery threads to 2
- set disk thread prio to highest (0)
- limit omap entries per chunk for recovery to 32k (64k seems to give
timeouts)
* unset the noup flag to let the misbehaving OSD kick in and start recovery
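
A rough sketch of how the per-OSD knobs above might be applied (OSD ids,
values and some option names are assumptions against a jewel-era build;
verify before use):

  # all OSDs: favour recovery over client I/O
  ceph tell osd.* injectargs '--osd_recovery_op_priority 63 --osd_client_op_priority 1'

  # index-pool OSDs only (ids illustrative); osd_recovery_threads may need
  # ceph.conf plus a restart rather than injection, and the omap-entries-per-chunk
  # limit (osd_recovery_max_omap_entries_per_chunk on newer builds) may not exist
  # under that name on 10.2.7
  ceph tell osd.10 injectargs '--osd_disk_thread_ioprio_class be --osd_disk_thread_ioprio_priority 0'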

Any further ideas, experience or remarks would be very much appreciated...

r,
Sam



[ceph-users] cephfs-data-scan pg_files missing

2017-06-20 Thread Mazzystr
I'm on Red Hat Storage 2.2 (ceph-10.2.7-0.el7.x86_64) and I see this...
# cephfs-data-scan
Usage:
  cephfs-data-scan init [--force-init]
  cephfs-data-scan scan_extents [--force-pool] 
  cephfs-data-scan scan_inodes [--force-pool] [--force-corrupt] 

--force-corrupt: overwrite apparently corrupt structures
--force-init: write root inodes even if they exist
--force-pool: use data pool even if it is not in FSMap

  cephfs-data-scan scan_frags [--force-corrupt]

  cephfs-data-scan tmap_upgrade 

  --conf/-c FILE        read configuration from the given configuration file
  --id/-i ID            set ID portion of my name
  --name/-n TYPE.ID     set name
  --cluster NAME        set cluster name (default: ceph)
  --setuser USER        set uid to user or uid (and gid to user's gid)
  --setgroup GROUP      set gid to group or gid
  --version             show version and quit


Anyone know where the "cephfs-data-scan pg_files   [...]" command
shown in the docs went?

Thanks,
/Chris Callegari


Re: [ceph-users] Prioritise recovery on specific PGs/OSDs?

2017-06-20 Thread David Turner
Setting an osd to 0.0 in the crush map will tell all PGs to move off of the
osd. It's much the same as removing the osd from the cluster, except it
allows the osd to help move the data that it has and prevents having
degraded PGs and objects while you do it. The limit to weighting osds to
0.0 is how full your cluster and remaining osds will be when the 0.0 osds
are empty.
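
A sketch of the reweighting step (osd id illustrative):

  ceph osd crush reweight osd.12 0.0
  ceph osd df tree   # watch the reweighted osd drain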

On Tue, Jun 20, 2017, 10:29 AM Peter Maloney <
peter.malo...@brockmann-consult.de> wrote:

> these settings are on a specific OSD:
>
> osd recovery max active = 1
> osd max backfills = 1
>
>
> I don't know if it will behave as you expect if you set 0... (I tested
> setting 0 which didn't complain, but is 0 actually 0 or unlimited or an
> error?)
>
> Maybe you could parse the ceph pg dump, then look at the pgs that list
> your special osds, then set all of the listed osds (not just special ones)
> config to 1 and the rest 0. But this will not prioritize specific pgs... or
> even specific osds, and maybe it'll end up being all osds.
>
> To further add to your criteria, you could select ones where the direction
> of movement is how you want it... like if up (where CRUSH wants the data
> after recovery is done) says [1,2,3] and acting (where it is now, even
> partial pgs I think) says [1,2,7] and you want to empty 7, then you have to
> set the numbers non-zero for osd 3 and 7, but maybe not 1 or 2 (although
> these could be read as part of recovery).
>
> I'm sure it's doomed to fail, but you can try it out on a test cluster.
>
> My guess is it will either not accept 0 like you expect, or it will only
> be a small fraction of your osds that you can set to 0.
>
>
>
> On 06/20/17 14:44, Richard Hesketh wrote:
>
> Is there a way, either by individual PG or by OSD, I can prioritise 
> backfill/recovery on a set of PGs which are currently particularly important 
> to me?
>
> For context, I am replacing disks in a 5-node Jewel cluster, on a 
> node-by-node basis - mark out the OSDs on a node, wait for them to clear, 
> replace OSDs, bring up and in, mark out the OSDs on the next set, etc. I've 
> done my first node, but the significant CRUSH map changes means most of my 
> data is moving. I only currently care about the PGs on my next set of OSDs to 
> replace - the other remapped PGs I don't care about settling because they're 
> only going to end up moving around again after I do the next set of disks. I 
> do want the PGs specifically on the OSDs I am about to replace to backfill 
> because I don't want to compromise data integrity by downing them while they 
> host active PGs. If I could specifically prioritise the backfill on those 
> PGs/OSDs, I could get on with replacing disks without worrying about causing 
> degraded PGs.
>
> I'm in a situation right now where there is merely a couple of dozen PGs on 
> the disks I want to replace, which are all remapped and waiting to backfill - 
> but there are 2200 other PGs also waiting to backfill because they've moved 
> around too, and it's extremely frustrating to be sat waiting to see when the 
> ones I care about will finally be handled so I can get on with replacing 
> those disks.
>
> Rich
>
>
>
>
>
> --
>
> 
> Peter Maloney
> Brockmann Consult
> Max-Planck-Str. 2
> 21502 Geesthacht
> Germany
> Tel: +49 4152 889 300
> Fax: +49 4152 889 333
> E-mail: peter.malo...@brockmann-consult.de
> Internet: http://www.brockmann-consult.de
> 
>


Re: [ceph-users] Prioritise recovery on specific PGs/OSDs?

2017-06-20 Thread Peter Maloney
these settings are on a specific OSD:
> osd recovery max active = 1
> osd max backfills = 1

I don't know if it will behave as you expect if you set 0... (I tested
setting 0 which didn't complain, but is 0 actually 0 or unlimited or an
error?)

Maybe you could parse the ceph pg dump, then look at the pgs that list
your special osds, then set all of the listed osds (not just special
ones) config to 1 and the rest 0. But this will not prioritize specific
pgs... or even specific osds, and maybe it'll end up being all osds.

To further add to your criteria, you could select ones where the
direction of movement is how you want it... like if up (where CRUSH
wants the data after recovery is done) says [1,2,3] and acting (where it
is now, even partial pgs I think) says [1,2,7] and you want to empty 7,
then you have to set the numbers non-zero for osd 3 and 7, but maybe not
1 or 2 (although these could be read as part of recovery).
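
A rough sketch of that kind of filtering (untested, and the column layout of
ceph pg dump varies between releases; osd 7 is illustrative):

  # PGs whose up/acting sets include osd 7
  ceph pg dump pgs_brief 2>/dev/null | grep -E '\[([0-9]+,)*7(,[0-9]+)*\]'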

I'm sure it's doomed to fail, but you can try it out on a test cluster.

My guess is it will either not accept 0 like you expect, or it will only
be a small fraction of your osds that you can set to 0.


On 06/20/17 14:44, Richard Hesketh wrote:
> Is there a way, either by individual PG or by OSD, I can prioritise 
> backfill/recovery on a set of PGs which are currently particularly important 
> to me?
>
> For context, I am replacing disks in a 5-node Jewel cluster, on a 
> node-by-node basis - mark out the OSDs on a node, wait for them to clear, 
> replace OSDs, bring up and in, mark out the OSDs on the next set, etc. I've 
> done my first node, but the significant CRUSH map changes means most of my 
> data is moving. I only currently care about the PGs on my next set of OSDs to 
> replace - the other remapped PGs I don't care about settling because they're 
> only going to end up moving around again after I do the next set of disks. I 
> do want the PGs specifically on the OSDs I am about to replace to backfill 
> because I don't want to compromise data integrity by downing them while they 
> host active PGs. If I could specifically prioritise the backfill on those 
> PGs/OSDs, I could get on with replacing disks without worrying about causing 
> degraded PGs.
>
> I'm in a situation right now where there is merely a couple of dozen PGs on 
> the disks I want to replace, which are all remapped and waiting to backfill - 
> but there are 2200 other PGs also waiting to backfill because they've moved 
> around too, and it's extremely frustrating to be sat waiting to see when the 
> ones I care about will finally be handled so I can get on with replacing 
> those disks.
>
> Rich
>
>
>


-- 


Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.malo...@brockmann-consult.de
Internet: http://www.brockmann-consult.de




Re: [ceph-users] Erasure Coding: Wrong content of data and coding chunks?

2017-06-20 Thread David Turner
Ceph is a large scale storage system. You're hoping that it is going to
care about and split files that are 9 bytes in size. Do this same test with
a 4MB file and see how it splits up the content of the file.
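
Something along these lines, reusing the ecpool from the quoted message below
(object and file names illustrative):

  dd if=/dev/urandom of=/tmp/big.bin bs=4M count=1
  rados --pool ecpool put NYAN-4M /tmp/big.bin
  ceph osd map ecpool NYAN-4M
  # then inspect the per-OSD chunk sizes and md5sums as before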

On Tue, Jun 20, 2017, 6:48 AM Jonas Jaszkowic 
wrote:

> I am currently evaluating erasure coding in Ceph. I wanted to know where
> my data and coding chunks are located, so I
> followed the example at
> http://docs.ceph.com/docs/master/rados/operations/erasure-code/#creating-a-sample-erasure-coded-pool
>
> and set up an erasure coded pool with k=3 data chunks and m=2 coding
> chunks. I stored an object named 'NYAN' with content
> 'ABCDEFGHI' in the pool.
>
> The output of ceph osd map ecpool NYAN is the following, which seems correct:
>
> osdmap e97 pool 'ecpool' (6) object 'NYAN' -> pg 6.bf243b9 (6.39) -> up
> ([3,1,0,2,4], p3) acting ([3,1,0,2,4], p3)
>
> But when I have a look at the chunks stored on the corresponding OSDs, I
> see three chunks containing the *whole* content of the original file
> (padded with zeros to a size of 4.0K)
> and two chunks containing nothing but zeros. I do not understand this
> behavior. According to the link above: "The NYAN object will be divided in
> three (K=3) and two additional chunks will be created (M=2).", but what I
> experience is that the file is replicated three times *in its entirety*, and
> what appears to be the coding chunks (i.e. holding parity information) are
> objects containing *nothing but zeros*. Am I doing something wrong here?
>
> Any help is appreciated!
>
> Attached is the output on each OSD node with the path to the chunk and its
> content as hexdump:
>
> osd.0
> path:
> /var/lib/ceph/osd/ceph-0/current/6.39s2_head/NYAN__head_0BF243B9__6__2
> md5sum: 1666ba51af756693678da9efc443ef44
>  
> /var/lib/ceph/osd/ceph-0/current/6.39s2_head/NYAN__head_0BF243B9__6__2
> filesize: 4.0K
> /var/lib/ceph/osd/ceph-0/current/6.39s2_head/NYAN__head_0BF243B9__6__2
> hexdump:   00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>  ||
> *
> 0560
>
> osd.1
> path:
> /var/lib/ceph/osd/ceph-1/current/6.39s1_head/NYAN__head_0BF243B9__6__1
> md5sum: 1666ba51af756693678da9efc443ef44
>  
> /var/lib/ceph/osd/ceph-1/current/6.39s1_head/NYAN__head_0BF243B9__6__1
> filesize: 4.0K
> /var/lib/ceph/osd/ceph-1/current/6.39s1_head/NYAN__head_0BF243B9__6__1
> hexdump:   00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>  ||
> *
> 0560
>
> osd.2
> path:
> /var/lib/ceph/osd/ceph-2/current/6.39s3_head/NYAN__head_0BF243B9__6__3
> md5sum: ff6a7f77674e23fd7e3a0c11d7b36ed4
>  
> /var/lib/ceph/osd/ceph-2/current/6.39s3_head/NYAN__head_0BF243B9__6__3
> filesize: 4.0K
> /var/lib/ceph/osd/ceph-2/current/6.39s3_head/NYAN__head_0BF243B9__6__3
> hexdump:   41 42 43 44 45 46 47 48  49 0a 00 00 00 00 00 00
>  |ABCDEFGHI...|
> 0010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>  ||
> *
> 0560
>
> osd.3
> path:
> /var/lib/ceph/osd/ceph-3/current/6.39s0_head/NYAN__head_0BF243B9__6__0
> md5sum: ff6a7f77674e23fd7e3a0c11d7b36ed4
>  
> /var/lib/ceph/osd/ceph-3/current/6.39s0_head/NYAN__head_0BF243B9__6__0
> filesize: 4.0K
> /var/lib/ceph/osd/ceph-3/current/6.39s0_head/NYAN__head_0BF243B9__6__0
> hexdump:   41 42 43 44 45 46 47 48  49 0a 00 00 00 00 00 00
>  |ABCDEFGHI...|
> 0010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>  ||
> *
> 0560
>
> osd.4
> path:
> /var/lib/ceph/osd/ceph-4/current/6.39s4_head/NYAN__head_0BF243B9__6__4
> md5sum: ff6a7f77674e23fd7e3a0c11d7b36ed4
>  
> /var/lib/ceph/osd/ceph-4/current/6.39s4_head/NYAN__head_0BF243B9__6__4
> filesize: 4.0K
> /var/lib/ceph/osd/ceph-4/current/6.39s4_head/NYAN__head_0BF243B9__6__4
> hexdump:   41 42 43 44 45 46 47 48  49 0a 00 00 00 00 00 00
>  |ABCDEFGHI...|
> 0010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
>  ||
> *
> 0560
>
>
> The erasure code profile used:
>
> jerasure-per-chunk-alignment=false
> k=3
> m=2
> plugin=jerasure
> ruleset-failure-domain=host
> ruleset-root=default
> technique=reed_sol_van
> w=8
>
>


Re: [ceph-users] Prioritise recovery on specific PGs/OSDs?

2017-06-20 Thread David Turner
If you're planning to remove the next set of disks, I would recommend
weighting them to 0.0 in the crush map if you have the room for it. The
process at this point would be weighting the next set to 0.0 when you add
the previous set back in. That way when you finish removing the next set
there is no additional data movement until you add them back in. Also, you
can tell when they are done because they'll be empty.

To increase the likelihood that a specific osd finishes backfilling sooner,
you can increase osd_max_backfills on it.
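
For example (osd id and value illustrative):

  ceph tell osd.12 injectargs '--osd_max_backfills 4'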

On Tue, Jun 20, 2017, 9:48 AM Logan Kuhn  wrote:

> Is there a way to prioritize specific pools during recovery?  I know there
> are issues open for it, but I wasn't aware it was implemented yet...
>
> Regards,
> Logan
>
> - On Jun 20, 2017, at 8:20 AM, Sam Wouters  wrote:
>
> Hi,
>
> Are they all in the same pool? Otherwise you could prioritize pool
> recovery.
> If not, maybe you can play with the osd max backfills number, no idea if
> it accepts a value of 0 to actually disable it for specific OSDs.
>
> r,
> Sam
>
> On 20-06-17 14:44, Richard Hesketh wrote:
>
> Is there a way, either by individual PG or by OSD, I can prioritise 
> backfill/recovery on a set of PGs which are currently particularly important 
> to me?
>
> For context, I am replacing disks in a 5-node Jewel cluster, on a 
> node-by-node basis - mark out the OSDs on a node, wait for them to clear, 
> replace OSDs, bring up and in, mark out the OSDs on the next set, etc. I've 
> done my first node, but the significant CRUSH map changes means most of my 
> data is moving. I only currently care about the PGs on my next set of OSDs to 
> replace - the other remapped PGs I don't care about settling because they're 
> only going to end up moving around again after I do the next set of disks. I 
> do want the PGs specifically on the OSDs I am about to replace to backfill 
> because I don't want to compromise data integrity by downing them while they 
> host active PGs. If I could specifically prioritise the backfill on those 
> PGs/OSDs, I could get on with replacing disks without worrying about causing 
> degraded PGs.
>
> I'm in a situation right now where there is merely a couple of dozen PGs on 
> the disks I want to replace, which are all remapped and waiting to backfill - 
> but there are 2200 other PGs also waiting to backfill because they've moved 
> around too, and it's extremely frustrating to be sat waiting to see when the 
> ones I care about will finally be handled so I can get on with replacing 
> those disks.
>
> Rich
>
>
>
>


Re: [ceph-users] Prioritise recovery on specific PGs/OSDs?

2017-06-20 Thread Sam Wouters
Yes, I don't know exactly in which release it was introduced, but in the
latest jewel and beyond there is:


Please use the pool-level options recovery_priority and recovery_op_priority
to enable the pool-level recovery priority feature:

# ceph osd pool set default.rgw.buckets.index recovery_priority 5
# ceph osd pool set default.rgw.buckets.index recovery_op_priority 5

A recovery value of 5 will help because the default is 3 in the jewel release;
use the command below to check whether both options are set properly.

> Is there a way to prioritize specific pools during recovery?  I know
> there are issues open for it, but I wasn't aware it was implemented yet...
>
> Regards,
> Logan
>
> - On Jun 20, 2017, at 8:20 AM, Sam Wouters  wrote:
>
> Hi,
>
> Are they all in the same pool? Otherwise you could prioritize pool
> recovery.
> If not, maybe you can play with the osd max backfills number, no
> idea if it accepts a value of 0 to actually disable it for
> specific OSDs.
>
> r,
> Sam
>
> On 20-06-17 14:44, Richard Hesketh wrote:
>
> Is there a way, either by individual PG or by OSD, I can prioritise 
> backfill/recovery on a set of PGs which are currently particularly important 
> to me?
>
> For context, I am replacing disks in a 5-node Jewel cluster, on a 
> node-by-node basis - mark out the OSDs on a node, wait for them to clear, 
> replace OSDs, bring up and in, mark out the OSDs on the next set, etc. I've 
> done my first node, but the significant CRUSH map changes means most of my 
> data is moving. I only currently care about the PGs on my next set of OSDs to 
> replace - the other remapped PGs I don't care about settling because they're 
> only going to end up moving around again after I do the next set of disks. I 
> do want the PGs specifically on the OSDs I am about to replace to backfill 
> because I don't want to compromise data integrity by downing them while they 
> host active PGs. If I could specifically prioritise the backfill on those 
> PGs/OSDs, I could get on with replacing disks without worrying about causing 
> degraded PGs.
>
> I'm in a situation right now where there is merely a couple of dozen 
> PGs on the disks I want to replace, which are all remapped and waiting to 
> backfill - but there are 2200 other PGs also waiting to backfill because 
> they've moved around too, and it's extremely frustrating to be sat waiting to 
> see when the ones I care about will finally be handled so I can get on with 
> replacing those disks.
>
> Rich
>
>
>


Re: [ceph-users] Prioritise recovery on specific PGs/OSDs?

2017-06-20 Thread Logan Kuhn
Is there a way to prioritize specific pools during recovery? I know there are 
issues open for it, but I wasn't aware it was implemented yet... 

Regards, 
Logan 

- On Jun 20, 2017, at 8:20 AM, Sam Wouters  wrote: 

| Hi,

| Are they all in the same pool? Otherwise you could prioritize pool recovery.
| If not, maybe you can play with the osd max backfills number, no idea if it
| accepts a value of 0 to actually disable it for specific OSDs.

| r,
| Sam

| On 20-06-17 14:44, Richard Hesketh wrote:

|| Is there a way, either by individual PG or by OSD, I can prioritise
|| backfill/recovery on a set of PGs which are currently particularly important to
|| me?

|| For context, I am replacing disks in a 5-node Jewel cluster, on a node-by-node
|| basis - mark out the OSDs on a node, wait for them to clear, replace OSDs,
|| bring up and in, mark out the OSDs on the next set, etc. I've done my first
|| node, but the significant CRUSH map changes means most of my data is moving. I
|| only currently care about the PGs on my next set of OSDs to replace - the other
|| remapped PGs I don't care about settling because they're only going to end up
|| moving around again after I do the next set of disks. I do want the PGs
|| specifically on the OSDs I am about to replace to backfill because I don't want
|| to compromise data integrity by downing them while they host active PGs. If I
|| could specifically prioritise the backfill on those PGs/OSDs, I could get on
|| with replacing disks without worrying about causing degraded PGs.

|| I'm in a situation right now where there is merely a couple of dozen PGs on the
|| disks I want to replace, which are all remapped and waiting to backfill - but
|| there are 2200 other PGs also waiting to backfill because they've moved around
|| too, and it's extremely frustrating to be sat waiting to see when the ones I
|| care about will finally be handled so I can get on with replacing those disks.

|| Rich



Re: [ceph-users] Prioritise recovery on specific PGs/OSDs?

2017-06-20 Thread Sam Wouters
Hi,

Are they all in the same pool? Otherwise you could prioritize pool recovery.
If not, maybe you can play with the osd max backfills number, no idea if
it accepts a value of 0 to actually disable it for specific OSDs.

r,
Sam

On 20-06-17 14:44, Richard Hesketh wrote:
> Is there a way, either by individual PG or by OSD, I can prioritise 
> backfill/recovery on a set of PGs which are currently particularly important 
> to me?
>
> For context, I am replacing disks in a 5-node Jewel cluster, on a 
> node-by-node basis - mark out the OSDs on a node, wait for them to clear, 
> replace OSDs, bring up and in, mark out the OSDs on the next set, etc. I've 
> done my first node, but the significant CRUSH map changes means most of my 
> data is moving. I only currently care about the PGs on my next set of OSDs to 
> replace - the other remapped PGs I don't care about settling because they're 
> only going to end up moving around again after I do the next set of disks. I 
> do want the PGs specifically on the OSDs I am about to replace to backfill 
> because I don't want to compromise data integrity by downing them while they 
> host active PGs. If I could specifically prioritise the backfill on those 
> PGs/OSDs, I could get on with replacing disks without worrying about causing 
> degraded PGs.
>
> I'm in a situation right now where there is merely a couple of dozen PGs on 
> the disks I want to replace, which are all remapped and waiting to backfill - 
> but there are 2200 other PGs also waiting to backfill because they've moved 
> around too, and it's extremely frustrating to be sat waiting to see when the 
> ones I care about will finally be handled so I can get on with replacing 
> those disks.
>
> Rich
>
>
>


[ceph-users] Prioritise recovery on specific PGs/OSDs?

2017-06-20 Thread Richard Hesketh
Is there a way, either by individual PG or by OSD, I can prioritise 
backfill/recovery on a set of PGs which are currently particularly important to 
me?

For context, I am replacing disks in a 5-node Jewel cluster, on a node-by-node 
basis - mark out the OSDs on a node, wait for them to clear, replace OSDs, 
bring up and in, mark out the OSDs on the next set, etc. I've done my first 
node, but the significant CRUSH map changes means most of my data is moving. I 
only currently care about the PGs on my next set of OSDs to replace - the other 
remapped PGs I don't care about settling because they're only going to end up 
moving around again after I do the next set of disks. I do want the PGs 
specifically on the OSDs I am about to replace to backfill because I don't want 
to compromise data integrity by downing them while they host active PGs. If I 
could specifically prioritise the backfill on those PGs/OSDs, I could get on 
with replacing disks without worrying about causing degraded PGs.

I'm in a situation right now where there is merely a couple of dozen PGs on the 
disks I want to replace, which are all remapped and waiting to backfill - but 
there are 2200 other PGs also waiting to backfill because they've moved around 
too, and it's extremely frustrating to be sat waiting to see when the ones I 
care about will finally be handled so I can get on with replacing those disks.

Rich





[ceph-users] Erasure Coding: Wrong content of data and coding chunks?

2017-06-20 Thread Jonas Jaszkowic
I am currently evaluating erasure coding in Ceph. I wanted to know where my 
data and coding chunks are located, so I 
followed the example at 
http://docs.ceph.com/docs/master/rados/operations/erasure-code/#creating-a-sample-erasure-coded-pool
and set up an erasure coded pool with k=3 data chunks and m=2 coding chunks. I 
stored an object named 'NYAN' with content
'ABCDEFGHI' in the pool.
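
For reference, this corresponds roughly to the command sequence on the linked
page (profile and pool names assumed):

  ceph osd erasure-code-profile set myprofile k=3 m=2 ruleset-failure-domain=host
  ceph osd pool create ecpool 12 12 erasure myprofile
  echo ABCDEFGHI | rados --pool ecpool put NYAN -
  ceph osd map ecpool NYAN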

The output of ceph osd map ecpool NYAN is the following, which seems correct:

osdmap e97 pool 'ecpool' (6) object 'NYAN' -> pg 6.bf243b9 (6.39) -> up 
([3,1,0,2,4], p3) acting ([3,1,0,2,4], p3)

But when I have a look at the chunks stored on the corresponding OSDs, I see 
three chunks containing the whole content of the original file (padded with 
zeros to a size of 4.0K)
and two chunks containing nothing but zeros. I do not understand this behavior. 
According to the link above: "The NYAN object will be divided in three (K=3) 
and two additional chunks will be created (M=2).", but what I experience is 
that the file is replicated three times in its entirety, and what appears to be 
the coding chunks (i.e. holding parity information) are objects containing 
nothing but zeros. Am I doing something wrong here? 

Any help is appreciated!

Attached is the output on each OSD node with the path to the chunk and its 
content as hexdump:

osd.0
path: 
/var/lib/ceph/osd/ceph-0/current/6.39s2_head/NYAN__head_0BF243B9__6__2
md5sum: 1666ba51af756693678da9efc443ef44  
/var/lib/ceph/osd/ceph-0/current/6.39s2_head/NYAN__head_0BF243B9__6__2
filesize: 4.0K  
/var/lib/ceph/osd/ceph-0/current/6.39s2_head/NYAN__head_0BF243B9__6__2
hexdump:   00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  
||
*
0560

osd.1
path: 
/var/lib/ceph/osd/ceph-1/current/6.39s1_head/NYAN__head_0BF243B9__6__1
md5sum: 1666ba51af756693678da9efc443ef44  
/var/lib/ceph/osd/ceph-1/current/6.39s1_head/NYAN__head_0BF243B9__6__1
filesize: 4.0K  
/var/lib/ceph/osd/ceph-1/current/6.39s1_head/NYAN__head_0BF243B9__6__1
hexdump:   00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  
||
*
0560

osd.2
path: 
/var/lib/ceph/osd/ceph-2/current/6.39s3_head/NYAN__head_0BF243B9__6__3
md5sum: ff6a7f77674e23fd7e3a0c11d7b36ed4  
/var/lib/ceph/osd/ceph-2/current/6.39s3_head/NYAN__head_0BF243B9__6__3
filesize: 4.0K  
/var/lib/ceph/osd/ceph-2/current/6.39s3_head/NYAN__head_0BF243B9__6__3
hexdump:   41 42 43 44 45 46 47 48  49 0a 00 00 00 00 00 00  
|ABCDEFGHI...|
0010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ||
*
0560

osd.3
path: 
/var/lib/ceph/osd/ceph-3/current/6.39s0_head/NYAN__head_0BF243B9__6__0
md5sum: ff6a7f77674e23fd7e3a0c11d7b36ed4  
/var/lib/ceph/osd/ceph-3/current/6.39s0_head/NYAN__head_0BF243B9__6__0
filesize: 4.0K  
/var/lib/ceph/osd/ceph-3/current/6.39s0_head/NYAN__head_0BF243B9__6__0
hexdump:   41 42 43 44 45 46 47 48  49 0a 00 00 00 00 00 00  
|ABCDEFGHI...|
0010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ||
*
0560

osd.4
path: 
/var/lib/ceph/osd/ceph-4/current/6.39s4_head/NYAN__head_0BF243B9__6__4
md5sum: ff6a7f77674e23fd7e3a0c11d7b36ed4  
/var/lib/ceph/osd/ceph-4/current/6.39s4_head/NYAN__head_0BF243B9__6__4
filesize: 4.0K  
/var/lib/ceph/osd/ceph-4/current/6.39s4_head/NYAN__head_0BF243B9__6__4
hexdump:   41 42 43 44 45 46 47 48  49 0a 00 00 00 00 00 00  
|ABCDEFGHI...|
0010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ||
*
0560


The erasure code profile used:

jerasure-per-chunk-alignment=false
k=3
m=2
plugin=jerasure
ruleset-failure-domain=host
ruleset-root=default
technique=reed_sol_van
w=8




Re: [ceph-users] CephFS | flapping OSD locked up NFS

2017-06-20 Thread John Spray
On Tue, Jun 20, 2017 at 11:13 AM, David  wrote:
> Hi John
>
> I've had nfs-ganesha testing on the to do list for a while, I think I might
> move it closer to the top!  I'll certainly report back with the results.
>
> I'd still be interested to hear any kernel nfs experiences/tips, my
> understanding is nfs is included in the ceph testing suite so there is an
> expectation people will want to use it.

It is indeed part of the automated tests, although the coverage (in
the "knfs" suite) is fairly light, and does not do any thrashing to
simulate failures the way we do on the main cephfs tests.

John

>
> Thanks,
> David
>
>
> On 19 Jun 2017 3:56 p.m., "John Petrini"  wrote:
>>
>> Hi David,
>>
>> While I have no personal experience with this; from what I've been told,
>> if you're going to export cephfs over NFS it's recommended that you use a
>> userspace implementation of NFS (like nfs-ganesha) rather than
>> nfs-kernel-server. This may be the source of your issues and might be worth
>> testing. I'd be interested to hear the results if you do.
>>
>> ___
>>
>> John Petrini
>>
>>


Re: [ceph-users] CephFS | flapping OSD locked up NFS

2017-06-20 Thread David
Hi John

I've had nfs-ganesha testing on the to do list for a while, I think I might
move it closer to the top!  I'll certainly report back with the results.

I'd still be interested to hear any kernel nfs experiences/tips, my
understanding is nfs is included in the ceph testing suite so there is an
expectation people will want to use it.
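
For anyone trying the nfs-ganesha route, a minimal CephFS export sketch
(purely illustrative, not something tested in this thread):

  EXPORT {
      Export_Id = 1;
      Path = "/";
      Pseudo = "/cephfs";
      Access_Type = RW;
      FSAL {
          Name = CEPH;
      }
  }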

Thanks,
David


On 19 Jun 2017 3:56 p.m., "John Petrini"  wrote:

> Hi David,
>
> While I have no personal experience with this; from what I've been told,
> if you're going to export cephfs over NFS it's recommended that you use a
> userspace implementation of NFS (like nfs-ganesha) rather than
> nfs-kernel-server. This may be the source of your issues and might be worth
> testing. I'd be interested to hear the results if you do.
>
> ___
>
> John Petrini
>


Re: [ceph-users] Erasure Coding: Determine location of data and coding chunks

2017-06-20 Thread Jonas Jaszkowic
Thank you! I already knew about the ceph osd map command, but I am not sure how 
to interpret the output. For example, on the described
erasure coded pool, the output is:

osdmap e30 pool 'ecpool' (1) object 'sample-obj' -> pg 1.fa0b8566 (1.66) -> up 
([1,4,2,0,3], p1) acting ([1,4,2,0,3], p1)

This basically tells me that the object 'sample-obj' is located on pg 
1.fa0b8566 (1.66) and has its chunks stored on OSDs [1,4,2,0,3].

But I cannot tell where the data chunks and where the coding chunks are 
stored, can I? Or is the order of the OSDs in the 
[1,4,2,0,3] list relevant? I am missing some documentation on how to 
interpret the output, I guess.
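
(One way to cross-check, based on the on-disk paths shown in the other
erasure-coding thread in this digest, assuming filestore: the shard index
appears as the sN suffix of the PG directory and object name on each OSD.)

  # on each OSD host; PG id illustrative
  ls -d /var/lib/ceph/osd/ceph-*/current/1.66s*_head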

- Jonas

> On 19.06.2017 at 23:45, Marko Sluga wrote:
> 
> Hi Jonas,
> 
> ceph osd map [poolname] [objectname] 
> 
> should provide you with more information about where the object and chunks 
> are stored on the cluster.
> 
> Regards,
> 
> Marko Sluga
> Independent Trainer
> 
> 
> W: http://markocloud.com 
> T: +1 (647) 546-4365
> 
> L + M Consulting Inc.
> Ste 212, 2121 Lake Shore Blvd W
> M8E 4E9, Etobicoke, ON
> 
> 
>  On Mon, 19 Jun 2017 14:56:57 -0400 Jonas Jaszkowic 
>  wrote 
> 
> Hello all, I have a simple question: 
> 
> I have an erasure coded pool with k = 2 data chunks and m = 3 coding chunks, 
> how can I determine the location of the data and coding chunks? Given an 
> object A 
> that is stored on n = k + m different OSDs I want to find out where (i.e. on 
> which OSDs) 
> the data chunks are stored and where the coding chunks are stored. 
> 
> Thank you! 
> 
> - Jonas 


Re: [ceph-users] FW: radosgw: stale/leaked bucket index entries

2017-06-20 Thread Pavan Rallabhandi
Hi Orit,

No, we do not use multi-site.

Thanks,
-Pavan.

From: Orit Wasserman 
Date: Tuesday, 20 June 2017 at 12:49 PM
To: Pavan Rallabhandi 
Cc: "ceph-users@lists.ceph.com" 
Subject: EXT: Re: [ceph-users] FW: radosgw: stale/leaked bucket index entries

Hi Pavan, 

On Tue, Jun 20, 2017 at 8:29 AM, Pavan Rallabhandi 
 wrote:
Trying one more time with ceph-users

On 19/06/17, 11:07 PM, "Pavan Rallabhandi"  wrote:

    On many of our clusters running Jewel (10.2.5+), I am running into a strange 
problem of having stale bucket index entries left over for (some of the) 
objects deleted. Though it is not reproducible at will, it has been pretty 
consistent of late, and I am clueless at this point about the possible reasons 
for it to happen.

    The symptoms are that the actual delete operation of an object is reported 
successful in the RGW logs, but a bucket list on the container would still show 
the deleted object. An attempt to download/stat the object appropriately 
results in a failure. No failures are seen in the respective OSDs where the 
bucket index object is located. And rebuilding the bucket index by running 
'radosgw-admin bucket check --fix' would fix the issue.
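
    For reference, a sketch of that rebuild step (bucket name illustrative):

    radosgw-admin bucket check --bucket=mybucket
    radosgw-admin bucket check --fix --check-objects --bucket=mybucket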

    Though I could simulate the problem by instrumenting the code not to invoke 
`complete_del` on the bucket index op 
(https://github.com/ceph/ceph/blob/master/src/rgw/rgw_rados.cc#L8793), that 
call always seems to be made unless there is a cascading error from the actual 
delete operation of the object, which doesn't seem to be the case here.

    I wanted to know the possible reasons why the bucket index would be left 
in such limbo; any pointers would be much appreciated. FWIW, we are not 
sharding the buckets, and very recently I've seen this happen with buckets 
having fewer than 10 objects, and we are using swift for all the operations.

Do you use multisite? 

Regards,
Orit
 
    Thanks,
    -Pavan.




Re: [ceph-users] FW: radosgw: stale/leaked bucket index entries

2017-06-20 Thread Orit Wasserman
Hi Pavan,

On Tue, Jun 20, 2017 at 8:29 AM, Pavan Rallabhandi <
prallabha...@walmartlabs.com> wrote:

> Trying one more time with ceph-users
>
> On 19/06/17, 11:07 PM, "Pavan Rallabhandi" 
> wrote:
>
> On many of our clusters running Jewel (10.2.5+), I am running into a
> strange problem of having stale bucket index entries left over for (some of
> the) objects deleted. Though it is not reproducible at will, it has been
> pretty consistent of late, and I am clueless at this point about the
> possible reasons for it to happen.
>
> The symptoms are that the actual delete operation of an object is
> reported successful in the RGW logs, but a bucket list on the container
> would still show the deleted object. An attempt to download/stat the
> object appropriately results in a failure. No failures are seen in the
> respective OSDs where the bucket index object is located. And rebuilding
> the bucket index by running 'radosgw-admin bucket check --fix' would fix the
> issue.
>
> Though I could simulate the problem by instrumenting the code not to
> invoke `complete_del` on the bucket index op
> (https://github.com/ceph/ceph/blob/master/src/rgw/rgw_rados.cc#L8793),
> that call always seems to be made unless there is a cascading error from
> the actual delete operation of the object, which doesn't seem to be the
> case here.
>
> I wanted to know the possible reasons why the bucket index would be
> left in such limbo; any pointers would be much appreciated. FWIW, we are
> not sharding the buckets, and very recently I've seen this happen with
> buckets having fewer than 10 objects, and we are using swift for all the
> operations.
>
>
Do you use multisite?

Regards,
Orit


> Thanks,
> -Pavan.
>
>
>


[ceph-users] RadosGW not working after upgrade to Hammer

2017-06-20 Thread Gerson Jamal
Hi everyone,

I upgraded Ceph from Firefly to Hammer and everything looked OK during the
upgrade, but after that RadosGW is not working: I can list all buckets but I
can't list the objects inside the buckets, and I receive the following error:

format=json 400 Bad Request   []{"Code":"InvalidArgument"}

On Radosgw log I got the following error:

2017-06-17 01:37:25.325505 7f0108801700 10 ver=v1 first= req=
2017-06-17 01:37:25.325508 7f0108801700 10 s->object= s->bucket=
2017-06-17 01:37:25.325513 7f0108801700  2 req 21:0.49:swift:GET
/swift/v1/::getting op
2017-06-17 01:37:25.325516 7f0108801700  2 req 21:0.53:swift:GET
/swift/v1/:list_buckets:authorizing
2017-06-17 01:37:25.325529 7f0108801700 10 swift_user=sysmonitor:xx
2017-06-17 01:37:25.325541 7f0108801700 20 build_token
token=15007379736d6f6e69746f723a6c69676874686f7573652c7d7d41da54e3ba35bd45595a0df912
2017-06-17 01:37:25.325572 7f0108801700  2 req 21:0.000108:swift:GET
/swift/v1/:list_buckets:reading permissions
2017-06-17 01:37:25.325580 7f0108801700  2 req 21:0.000116:swift:GET
/swift/v1/:list_buckets:init op
2017-06-17 01:37:25.325582 7f0108801700  2 req 21:0.000119:swift:GET
/swift/v1/:list_buckets:verifying op mask
2017-06-17 01:37:25.325584 7f0108801700 20 required_mask= 1 user.op_mask=7
2017-06-17 01:37:25.325586 7f0108801700  2 req 21:0.000122:swift:GET
/swift/v1/:list_buckets:verifying op permissions
2017-06-17 01:37:25.325588 7f0108801700  2 req 21:0.000125:swift:GET
/swift/v1/:list_buckets:verifying op params
2017-06-17 01:37:25.325590 7f0108801700  2 req 21:0.000127:swift:GET
/swift/v1/:list_buckets:executing
2017-06-17 01:37:25.328258 7f0108801700 20 reading from
.rgw:.bucket.meta.CHECK_CEPH:default.4576.17572
2017-06-17 01:37:25.328284 7f0108801700 20 get_obj_state:
rctx=0x7f01087ff250 obj=.rgw:.bucket.meta.CHECK_CEPH:default.4576.17572
state=0x7f05641389c0 s->prefetch_data=0
2017-06-17 01:37:25.328294 7f0108801700 10 cache get:
name=.rgw+.bucket.meta.CHECK_CEPH:default.4576.17572 : hit
2017-06-17 01:37:25.328304 7f0108801700 20 get_obj_state: s->obj_tag was
set empty
2017-06-17 01:37:25.328308 7f0108801700 10 cache get:
name=.rgw+.bucket.meta.CHECK_CEPH:default.4576.17572 : hit
2017-06-17 01:37:25.330351 7f0108801700  0 ERROR: could not get stats for
buckets
2017-06-17 01:37:25.330378 7f0108801700 10 WARNING: failed on
rgw_get_user_buckets uid=sysmonitor
2017-06-17 01:37:25.330407 7f0108801700  2 req 21:0.004943:swift:GET
/swift/v1/:list_buckets:http status=400
2017-06-17 01:37:25.330412 7f0108801700  1 == req done
req=0x7f053023c0a0 http_status=400 ==
2017-06-17 01:37:25.330418 7f0108801700 20 process_request() returned -22
2017-06-17 01:37:28.470724 7f05837fe700  2
RGWDataChangesLog::ChangesRenewThread: start



Can anyone help me?

-- 
Regards,

Gerson Razaque Jamal