Re: active+remapped after removing an OSD via 'ceph osd out'

2014-08-27 Thread Dominik Mostowiec
Hi,
After setting chooseleaf_descend_once=0 and migrating about 20% of the PGs, ceph is HEALTH_OK.
Unfortunately, the optimal value for chooseleaf_descend_once is 1 :-(
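
For reference, a sketch of how a tunable like this can be toggled by round-tripping
the CRUSH map (the file names below are only examples):

  ceph osd getcrushmap -o /tmp/crush.bin         # export the binary CRUSH map
  crushtool -d /tmp/crush.bin -o /tmp/crush.txt  # decompile to editable text
  # edit /tmp/crush.txt: tunable chooseleaf_descend_once 0   (or back to 1)
  crushtool -c /tmp/crush.txt -o /tmp/crush.new  # recompile
  ceph osd setcrushmap -i /tmp/crush.new         # inject the new map

As seen above, such a tunable change triggers a large rebalance (here roughly 20% of
the PGs moved).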

--
Regards
Dominik


2014-08-21 15:59 GMT+02:00 Dominik Mostowiec dominikmostow...@gmail.com:
 Hi,
 I have 2 PGs stuck in the active+remapped state.

 ceph health detail
 HEALTH_WARN 2 pgs stuck unclean; recovery 24/348041229 degraded (0.000%)
 pg 3.1a07 is stuck unclean for 29239.046024, current state
 active+remapped, last acting [167,80,145]
 pg 3.154a is stuck unclean for 29239.039777, current state
 active+remapped, last acting [377,224,292]
 recovery 24/348041229 degraded (0.000%)

 This happened when I ran 'ceph osd reweight-by-utilization 102'.

 What could be wrong?

 ceph -v - ceph version 0.67.10 (9d446bd416c52cd785ccf048ca67737ceafcdd7f)

 Tunables:
 ceph osd crush dump | tail -n 4
   tunables: { choose_local_tries: 0,
   choose_local_fallback_tries: 0,
   choose_total_tries: 60,
   chooseleaf_descend_once: 1}}

 Cluster:
 6 racks X 3 hosts X 22 OSDs. (396 osds: 396 up, 396 in)

 crushtool -i ../crush2  --min-x 0 --num-rep 3  --max-x 10624 --test 
 --show-bad-mappings
 is clean (no bad mappings reported).

 When 'ceph osd reweight' is 1.0 for all OSDs everything is OK, but then I have nearfull OSDs.
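
 (A quick way to review the per-OSD override weights and to reset one of them; the
 osd id below is only an example:

   ceph osd tree            # the last column shows the 'reweight' override values
   ceph osd reweight 167 1  # restore osd.167 to full override weight

 Resetting all override weights back to 1.0 would of course bring back the nearfull
 OSDs.)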

 There are no missing OSDs in the crushmap:
 grep device /tmp/crush.txt | grep -v osd
 # devices

 ceph osd dump | grep -i pool
 pool 0 'data' rep size 3 min_size 1 crush_ruleset 0 object_hash
 rjenkins pg_num 64 pgp_num 64 last_change 28459 owner 0
 crash_replay_interval 45
 pool 1 'metadata' rep size 3 min_size 1 crush_ruleset 1 object_hash
 rjenkins pg_num 64 pgp_num 64 last_change 28460 owner 0
 pool 2 'rbd' rep size 3 min_size 1 crush_ruleset 2 object_hash
 rjenkins pg_num 64 pgp_num 64 last_change 28461 owner 0
 pool 3 '.rgw.buckets' rep size 3 min_size 1 crush_ruleset 0
 object_hash rjenkins pg_num 8192 pgp_num 8192 last_change 73711 owner
 0
 pool 4 '.log' rep size 3 min_size 1 crush_ruleset 0 object_hash
 rjenkins pg_num 2048 pgp_num 2048 last_change 90517 owner 0
 pool 5 '.rgw' rep size 3 min_size 1 crush_ruleset 0 object_hash
 rjenkins pg_num 128 pgp_num 128 last_change 72467 owner 0
 pool 6 '.users.uid' rep size 3 min_size 1 crush_ruleset 0 object_hash
 rjenkins pg_num 8 pgp_num 8 last_change 28465 owner 0
 pool 7 '.users' rep size 3 min_size 1 crush_ruleset 0 object_hash
 rjenkins pg_num 8 pgp_num 8 last_change 28466 owner 0
 pool 8 '.usage' rep size 2 min_size 1 crush_ruleset 0 object_hash
 rjenkins pg_num 8 pgp_num 8 last_change 28467 owner
 18446744073709551615
 pool 9 '.intent-log' rep size 3 min_size 1 crush_ruleset 0 object_hash
 rjenkins pg_num 8 pgp_num 8 last_change 28468 owner
 18446744073709551615
 pool 10 '.rgw.control' rep size 3 min_size 1 crush_ruleset 0
 object_hash rjenkins pg_num 8 pgp_num 8 last_change 33485 owner
 18446744073709551615
 pool 11 '.rgw.gc' rep size 3 min_size 1 crush_ruleset 0 object_hash
 rjenkins pg_num 8 pgp_num 8 last_change 33487 owner
 18446744073709551615
 pool 12 '.rgw.root' rep size 2 min_size 1 crush_ruleset 0 object_hash
 rjenkins pg_num 8 pgp_num 8 last_change 44540 owner 0
 pool 13 '' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins
 pg_num 8 pgp_num 8 last_change 46912 owner 0

 ceph pg 3.1a07 query
 { state: active+remapped,
   epoch: 181721,
   up: [
 167,
 80],
   acting: [
 167,
 80,
 145],
   info: { pgid: 3.1a07,
   last_update: 181719'94809,
   last_complete: 181719'94809,
   log_tail: 159997'91808,
   last_backfill: MAX,
   purged_snaps: [],
   history: { epoch_created: 4,
   last_epoch_started: 179611,
   last_epoch_clean: 179611,
   last_epoch_split: 11522,
   same_up_since: 179610,
   same_interval_since: 179610,
   same_primary_since: 179610,
   last_scrub: 160655'94695,
   last_scrub_stamp: 2014-08-19 04:16:20.308318,
   last_deep_scrub: 158290'91157,
   last_deep_scrub_stamp: 2014-08-12 05:15:25.557591,
   last_clean_scrub_stamp: 2014-08-19 04:16:20.308318},
   stats: { version: 181719'94809,
   reported_seq: 995830,
   reported_epoch: 181721,
   state: active+remapped,
   last_fresh: 2014-08-21 14:53:14.050284,
   last_change: 2014-08-21 09:42:07.473356,
   last_active: 2014-08-21 14:53:14.050284,
   last_clean: 2014-08-21 07:38:51.366084,
   last_became_active: 2013-10-25 13:59:36.125019,
   last_unstale: 2014-08-21 14:53:14.050284,
   mapping_epoch: 179606,
   log_start: 159997'91808,
   ondisk_log_start: 159997'91808,
   created: 4,
   last_epoch_clean: 179611,
   parent: 0.0,
   parent_split_bits: 0,
   last_scrub: 160655'94695,
   last_scrub_stamp: 2014-08-19 04:16:20.308318,
   last_deep_scrub: 158290'91157,
   last_deep_scrub_stamp: 2014-08-12 05:15:25.557591,
   last_clean_scrub_stamp: 2014-08-19 04:16

Re: active+remapped after removing an OSD via 'ceph osd out'

2014-08-21 Thread Dominik Mostowiec
,
  num_objects_unfound: 0,
  num_read: 645471,
  num_read_kb: 16973620,
  num_write: 111416,
  num_write_kb: 2459459,
  num_scrub_errors: 0,
  num_shallow_scrub_errors: 0,
  num_deep_scrub_errors: 0,
  num_objects_recovered: 48440,
  num_bytes_recovered: 10006953676,
  num_keys_recovered: 0},
  stat_cat_sum: {},
  up: [
167,
80],
  acting: [
167,
80,
145]},
  empty: 0,
  dne: 0,
  incomplete: 0,
  last_epoch_started: 179611},
  recovery_state: [
{ name: Started\/Primary\/Active,
  enter_time: 2014-08-21 09:42:07.473030,
  might_have_unfound: [],
  recovery_progress: { backfill_target: -1,
  waiting_on_backfill: 0,
  backfill_pos: 0\/\/0\/\/-1,
  backfill_info: { begin: 0\/\/0\/\/-1,
  end: 0\/\/0\/\/-1,
  objects: []},
  peer_backfill_info: { begin: 0\/\/0\/\/-1,
  end: 0\/\/0\/\/-1,
  objects: []},
  backfills_in_flight: [],
  pull_from_peer: [],
  pushing: []},
  scrub: { scrubber.epoch_start: 0,
  scrubber.active: 0,
  scrubber.block_writes: 0,
  scrubber.finalizing: 0,
  scrubber.waiting_on: 0,
  scrubber.waiting_on_whom: []}},
{ name: Started,
  enter_time: 2014-08-21 09:42:06.410951}]}

--
Regards
Dominik

2014-08-18 23:27 GMT+02:00 Dominik Mostowiec dominikmostow...@gmail.com:
 After replacing the broken disk and running 'ceph osd in' on it, the cluster shows:
 ceph health detail
 HEALTH_WARN 2 pgs stuck unclean; recovery 60/346857819 degraded (0.000%)
 pg 3.884 is stuck unclean for 570722.873270, current state
 active+remapped, last acting [143,261,314]
 pg 3.154a is stuck unclean for 577659.917066, current state
 active+remapped, last acting [85,224,64]
 recovery 60/346857819 degraded (0.000%)

 What could be wrong?
 Is it possible this was caused by 'ceph osd reweight-by-utilization'?
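
 (A quick way to list only the stuck PGs together with their up/acting sets, in case
 the list grows beyond what 'ceph health detail' shows comfortably:

   ceph pg dump_stuck unclean
 )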

 More info:
 ceph -v
 ceph version 0.67.9 (ba340a97c3dafc9155023da8d515eecc675c619a)

 Enabled tunables:
 # begin crush map
 tunable choose_local_tries 0
 tunable choose_local_fallback_tries 0
 tunable choose_total_tries 50
 tunable chooseleaf_descend_once 1

 df osd:
 143 - 78%
 261 - 78%
 314 - 80%

 85 - 76%
 224 - 76%
 64 - 75%

 ceph osd dump | grep -i pool
 pool 0 'data' rep size 3 min_size 1 crush_ruleset 0 object_hash
 rjenkins pg_num 64 pgp_num 64 last_change 28459 owner 0
 crash_replay_interval 45
 pool 1 'metadata' rep size 3 min_size 1 crush_ruleset 1 object_hash
 rjenkins pg_num 64 pgp_num 64 last_change 28460 owner 0
 pool 2 'rbd' rep size 3 min_size 1 crush_ruleset 2 object_hash
 rjenkins pg_num 64 pgp_num 64 last_change 28461 owner 0
 pool 3 '.rgw.buckets' rep size 3 min_size 1 crush_ruleset 0
 object_hash rjenkins pg_num 8192 pgp_num 8192 last_change 73711 owner
 0
 pool 4 '.log' rep size 3 min_size 1 crush_ruleset 0 object_hash
 rjenkins pg_num 2048 pgp_num 2048 last_change 90517 owner 0
 pool 5 '.rgw' rep size 3 min_size 1 crush_ruleset 0 object_hash
 rjenkins pg_num 128 pgp_num 128 last_change 72467 owner 0
 pool 6 '.users.uid' rep size 3 min_size 1 crush_ruleset 0 object_hash
 rjenkins pg_num 8 pgp_num 8 last_change 28465 owner 0
 pool 7 '.users' rep size 3 min_size 1 crush_ruleset 0 object_hash
 rjenkins pg_num 8 pgp_num 8 last_change 28466 owner 0
 pool 8 '.usage' rep size 2 min_size 1 crush_ruleset 0 object_hash
 rjenkins pg_num 8 pgp_num 8 last_change 28467 owner
 18446744073709551615
 pool 9 '.intent-log' rep size 3 min_size 1 crush_ruleset 0 object_hash
 rjenkins pg_num 8 pgp_num 8 last_change 28468 owner
 18446744073709551615
 pool 10 '.rgw.control' rep size 3 min_size 1 crush_ruleset 0
 object_hash rjenkins pg_num 8 pgp_num 8 last_change 33485 owner
 18446744073709551615
 pool 11 '.rgw.gc' rep size 3 min_size 1 crush_ruleset 0 object_hash
 rjenkins pg_num 8 pgp_num 8 last_change 33487 owner
 18446744073709551615
 pool 12 '.rgw.root' rep size 2 min_size 1 crush_ruleset 0 object_hash
 rjenkins pg_num 8 pgp_num 8 last_change 44540 owner 0
 pool 13 '' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins
 pg_num 8 pgp_num 8 last_change 46912 owner 0

 ceph pg 3.884 query
 { state: active+remapped,
   epoch: 160655,
   up: [
 143],
   acting: [
 143,
 261,
 314],
   info: { pgid: 3.884,
   last_update: 160655'111533,
   last_complete: 160655'111533,
   log_tail: 159997'108532,
   last_backfill: MAX,
   purged_snaps: [],
   history: { epoch_created: 4,
   last_epoch_started: 160261,
   last_epoch_clean: 160261,
   last_epoch_split: 11488,
   same_up_since: 160252,
   same_interval_since: 160260

Re: [ceph-users] poor data distribution

2014-03-24 Thread Dominik Mostowiec
Hi,
 FWIW the tunable that fixes this was just merged today but won't
 appear in a release for another 3 weeks or so.
Is this the vary_r tunable?

Can I use this in production?
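
(For reference: once a release that includes it is running, such a tunable is set the
same way as the chooseleaf tunables shown elsewhere in this digest, i.e. decompile the
CRUSH map, add a line like

  tunable chooseleaf_vary_r 1

then recompile and inject it. This is only a sketch of the expected procedure, and,
like the other tunable changes discussed here, it will trigger data movement.)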

--
Regards
Dominik


2014-02-12 3:24 GMT+01:00 Sage Weil s...@inktank.com:
 On Wed, 12 Feb 2014, Dominik Mostowiec wrote:
 Hi,
 Does this problem (with stuck active+remapped PGs after
 reweight-by-utilization) affect all ceph configurations or only
 specific ones?
 If specific: what is the reason in my case? Is this caused by the crush
 configuration (cluster architecture, crush tunables, ...), cluster
 size, architecture design mistakes, or something else?

 It seems to just be the particular structure of your map.  In your case
 you have a few different racks (or hosts? I forget) at the upper level of
 the hierarchy and then a handful of devices in the leaves that are marked
 out or reweighted down.  With that combination CRUSH runs out of placement
 choices at the upper level and keeps trying the same values in the lower
 level.  FWIW the tunable that fixes this was just merged today but won't
 appear in a release for another 3 weeks or so.

 Second question.
 The distribution of PGs across OSDs is better for large clusters (where pg_num is
 higher). Is it possible (for small clusters) to change the crush
 distribution algorithm to something more linear? (I realize that it would be less
 efficient.)

 It's really related to the ratio of pg_num to total OSDs, not the absolute
 number.  For small clusters it is probably more tolerable to have a larger
 pg_num count though because many of the costs normally associated with
 that (e.g., more peers) run up against the total host count before they
 start to matter.

 Again, I think the right answer here is picking a good pg to osd ratio and
 using reweight-by-utilization (which will be fixed soon).

 sage



 --
 Regards
 Dominik

 2014-02-06 21:31 GMT+01:00 Dominik Mostowiec dominikmostow...@gmail.com:
  Great!
  Thanks for Your help.
 
  --
  Regards
  Dominik
 
  2014-02-06 21:10 GMT+01:00 Sage Weil s...@inktank.com:
  On Thu, 6 Feb 2014, Dominik Mostowiec wrote:
  Hi,
  Thanks !!
  Can You suggest any workaround for now?
 
  You can adjust the crush weights on the overfull nodes slightly.  You'd
  need to do it by hand, but that will do the trick.  For example,
 
ceph osd crush reweight osd.123 .96
 
  (if the current weight is 1.0).
 
  sage
 
 
  --
  Regards
  Dominik
 
 
  2014-02-06 18:39 GMT+01:00 Sage Weil s...@inktank.com:
   Hi,
  
   Just an update here.  Another user saw this and after playing with it I
   identified a problem with CRUSH.  There is a branch outstanding
   (wip-crush) that is pending review, but it's not a quick fix because of
   compatibility issues.
  
   sage
  
  
   On Thu, 6 Feb 2014, Dominik Mostowiec wrote:
  
   Hi,
    Maybe this info can help to find what is wrong.
   For one PG (3.1e4a) which is active+remapped:
   { state: active+remapped,
 epoch: 96050,
 up: [
   119,
   69],
 acting: [
   119,
   69,
   7],
   Logs:
   On osd.7:
   2014-02-04 09:45:54.966913 7fa618afe700  1 osd.7 pg_epoch: 94460
   pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=93486
   n=6718 ec=4 les/c 93486/93486 94460/94460/92233) [119,69] r=-1
   lpr=94460 pi=92546-94459/5 lcod 94459'207003 inactive NOTIFY]
   stateStart: transitioning to Stray
   2014-02-04 09:45:55.781278 7fa6172fb700  1 osd.7 pg_epoch: 94461
   pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=93486
   n=6718 ec=4 les/c 93486/93486 94460/94461/92233)
   [119,69]/[119,69,7,142] r=2 lpr=94461 pi=92546-94460/6 lcod
   94459'207003 remapped NOTIFY] stateStart: transitioning to Stray
   2014-02-04 09:49:01.124510 7fa618afe700  1 osd.7 pg_epoch: 94495
   pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=94462
   n=6718 ec=4 les/c 94462/94494 94460/94495/92233) [119,69]/[119,69,7]
   r=2 lpr=94495 pi=92546-94494/7 lcod 94459'207003 remapped]
   stateStart: transitioning to Stray
  
   On osd.119:
   2014-02-04 09:45:54.981707 7f37f07c5700  1 osd.119 pg_epoch: 94460
   pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=93486
   n=6718 ec=4 les/c 93486/93486 94460/94460/92233) [119,69] r=0
   lpr=94460 pi=93485-94459/1 mlcod 0'0 inactive] stateStart:
   transitioning to Primary
   2014-02-04 09:45:55.805712 7f37ecfbe700  1 osd.119 pg_epoch: 94461
   pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=93486
   n=6718 ec=4 les/c 93486/93486 94460/94461/92233)
   [119,69]/[119,69,7,142] r=0 lpr=94461 pi=93485-94460/2 mlcod 0'0
   remapped] stateStart: transitioning to Primary
   2014-02-04 09:45:56.794015 7f37edfc0700  0 log [INF] : 3.1e4a
   restarting backfill on osd.69 from (0'0,0'0] MAX to 94459'207004
   2014-02-04 09:49:01.156627 7f37ef7c3700  1 osd.119 pg_epoch: 94495
   pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=94462
   n=6718 ec=4 les/c 94462/94494 94460/94495/92233) [119,69]/[119,69,7

Re: [ceph-users] poor data distribution

2014-02-11 Thread Dominik Mostowiec
Hi,
Does this problem (with stuck active+remapped PGs after
reweight-by-utilization) affect all ceph configurations or only
specific ones?
If specific: what is the reason in my case? Is this caused by the crush
configuration (cluster architecture, crush tunables, ...), cluster
size, architecture design mistakes, or something else?

Second question.
The distribution of PGs across OSDs is better for large clusters (where pg_num is
higher). Is it possible (for small clusters) to change the crush
distribution algorithm to something more linear? (I realize that it would be less
efficient.)

--
Regards
Dominik

2014-02-06 21:31 GMT+01:00 Dominik Mostowiec dominikmostow...@gmail.com:
 Great!
 Thanks for Your help.

 --
 Regards
 Dominik

 2014-02-06 21:10 GMT+01:00 Sage Weil s...@inktank.com:
 On Thu, 6 Feb 2014, Dominik Mostowiec wrote:
 Hi,
 Thanks !!
 Can You suggest any workaround for now?

 You can adjust the crush weights on the overfull nodes slightly.  You'd
 need to do it by hand, but that will do the trick.  For example,

   ceph osd crush reweight osd.123 .96

 (if the current weight is 1.0).

 sage


 --
 Regards
 Dominik


 2014-02-06 18:39 GMT+01:00 Sage Weil s...@inktank.com:
  Hi,
 
  Just an update here.  Another user saw this and after playing with it I
  identified a problem with CRUSH.  There is a branch outstanding
  (wip-crush) that is pending review, but it's not a quick fix because of
  compatibility issues.
 
  sage
 
 
  On Thu, 6 Feb 2014, Dominik Mostowiec wrote:
 
  Hi,
   Maybe this info can help to find what is wrong.
  For one PG (3.1e4a) which is active+remapped:
  { state: active+remapped,
epoch: 96050,
up: [
  119,
  69],
acting: [
  119,
  69,
  7],
  Logs:
  On osd.7:
  2014-02-04 09:45:54.966913 7fa618afe700  1 osd.7 pg_epoch: 94460
  pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=93486
  n=6718 ec=4 les/c 93486/93486 94460/94460/92233) [119,69] r=-1
  lpr=94460 pi=92546-94459/5 lcod 94459'207003 inactive NOTIFY]
  stateStart: transitioning to Stray
  2014-02-04 09:45:55.781278 7fa6172fb700  1 osd.7 pg_epoch: 94461
  pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=93486
  n=6718 ec=4 les/c 93486/93486 94460/94461/92233)
  [119,69]/[119,69,7,142] r=2 lpr=94461 pi=92546-94460/6 lcod
  94459'207003 remapped NOTIFY] stateStart: transitioning to Stray
  2014-02-04 09:49:01.124510 7fa618afe700  1 osd.7 pg_epoch: 94495
  pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=94462
  n=6718 ec=4 les/c 94462/94494 94460/94495/92233) [119,69]/[119,69,7]
  r=2 lpr=94495 pi=92546-94494/7 lcod 94459'207003 remapped]
  stateStart: transitioning to Stray
 
  On osd.119:
  2014-02-04 09:45:54.981707 7f37f07c5700  1 osd.119 pg_epoch: 94460
  pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=93486
  n=6718 ec=4 les/c 93486/93486 94460/94460/92233) [119,69] r=0
  lpr=94460 pi=93485-94459/1 mlcod 0'0 inactive] stateStart:
  transitioning to Primary
  2014-02-04 09:45:55.805712 7f37ecfbe700  1 osd.119 pg_epoch: 94461
  pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=93486
  n=6718 ec=4 les/c 93486/93486 94460/94461/92233)
  [119,69]/[119,69,7,142] r=0 lpr=94461 pi=93485-94460/2 mlcod 0'0
  remapped] stateStart: transitioning to Primary
  2014-02-04 09:45:56.794015 7f37edfc0700  0 log [INF] : 3.1e4a
  restarting backfill on osd.69 from (0'0,0'0] MAX to 94459'207004
  2014-02-04 09:49:01.156627 7f37ef7c3700  1 osd.119 pg_epoch: 94495
  pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=94462
  n=6718 ec=4 les/c 94462/94494 94460/94495/92233) [119,69]/[119,69,7]
  r=0 lpr=94495 pi=94461-94494/1 mlcod 0'0 remapped] stateStart:
  transitioning to Primary
 
  On osd.69:
  2014-02-04 09:45:56.845695 7f2231372700  1 osd.69 pg_epoch: 94462
  pg[3.1e4a( empty local-les=0 n=0 ec=4 les/c 93486/93486
  94460/94461/92233) [119,69]/[119,69,7,142] r=1 lpr=94462
  pi=93485-94460/2 inactive] stateStart: transitioning to Stray
  2014-02-04 09:49:01.153695 7f2229b63700  1 osd.69 pg_epoch: 94495
  pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=94462
  n=6718 ec=4 les/c 94462/94494 94460/94495/92233) [119,69]/[119,69,7]
  r=1 lpr=94495 pi=93485-94494/3 remapped] stateStart: transitioning
  to Stray
 
   pg query recovery state:
recovery_state: [
  { name: Started\/Primary\/Active,
enter_time: 2014-02-04 09:49:02.070724,
might_have_unfound: [],
recovery_progress: { backfill_target: -1,
waiting_on_backfill: 0,
backfill_pos: 0\/\/0\/\/-1,
backfill_info: { begin: 0\/\/0\/\/-1,
end: 0\/\/0\/\/-1,
objects: []},
peer_backfill_info: { begin: 0\/\/0\/\/-1,
end: 0\/\/0\/\/-1,
objects: []},
backfills_in_flight: [],
pull_from_peer: [],
pushing: []},
scrub

Re: [ceph-users] poor data distribution

2014-02-06 Thread Dominik Mostowiec
Hi,
Maybe this info can help to find what is wrong.
For one PG (3.1e4a) which is active+remapped:
{ state: active+remapped,
  epoch: 96050,
  up: [
119,
69],
  acting: [
119,
69,
7],
Logs:
On osd.7:
2014-02-04 09:45:54.966913 7fa618afe700  1 osd.7 pg_epoch: 94460
pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=93486
n=6718 ec=4 les/c 93486/93486 94460/94460/92233) [119,69] r=-1
lpr=94460 pi=92546-94459/5 lcod 94459'207003 inactive NOTIFY]
stateStart: transitioning to Stray
2014-02-04 09:45:55.781278 7fa6172fb700  1 osd.7 pg_epoch: 94461
pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=93486
n=6718 ec=4 les/c 93486/93486 94460/94461/92233)
[119,69]/[119,69,7,142] r=2 lpr=94461 pi=92546-94460/6 lcod
94459'207003 remapped NOTIFY] stateStart: transitioning to Stray
2014-02-04 09:49:01.124510 7fa618afe700  1 osd.7 pg_epoch: 94495
pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=94462
n=6718 ec=4 les/c 94462/94494 94460/94495/92233) [119,69]/[119,69,7]
r=2 lpr=94495 pi=92546-94494/7 lcod 94459'207003 remapped]
stateStart: transitioning to Stray

On osd.119:
2014-02-04 09:45:54.981707 7f37f07c5700  1 osd.119 pg_epoch: 94460
pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=93486
n=6718 ec=4 les/c 93486/93486 94460/94460/92233) [119,69] r=0
lpr=94460 pi=93485-94459/1 mlcod 0'0 inactive] stateStart:
transitioning to Primary
2014-02-04 09:45:55.805712 7f37ecfbe700  1 osd.119 pg_epoch: 94461
pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=93486
n=6718 ec=4 les/c 93486/93486 94460/94461/92233)
[119,69]/[119,69,7,142] r=0 lpr=94461 pi=93485-94460/2 mlcod 0'0
remapped] stateStart: transitioning to Primary
2014-02-04 09:45:56.794015 7f37edfc0700  0 log [INF] : 3.1e4a
restarting backfill on osd.69 from (0'0,0'0] MAX to 94459'207004
2014-02-04 09:49:01.156627 7f37ef7c3700  1 osd.119 pg_epoch: 94495
pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=94462
n=6718 ec=4 les/c 94462/94494 94460/94495/92233) [119,69]/[119,69,7]
r=0 lpr=94495 pi=94461-94494/1 mlcod 0'0 remapped] stateStart:
transitioning to Primary

On osd.69:
2014-02-04 09:45:56.845695 7f2231372700  1 osd.69 pg_epoch: 94462
pg[3.1e4a( empty local-les=0 n=0 ec=4 les/c 93486/93486
94460/94461/92233) [119,69]/[119,69,7,142] r=1 lpr=94462
pi=93485-94460/2 inactive] stateStart: transitioning to Stray
2014-02-04 09:49:01.153695 7f2229b63700  1 osd.69 pg_epoch: 94495
pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=94462
n=6718 ec=4 les/c 94462/94494 94460/94495/92233) [119,69]/[119,69,7]
r=1 lpr=94495 pi=93485-94494/3 remapped] stateStart: transitioning
to Stray

pg query recovery state:
  recovery_state: [
{ name: Started\/Primary\/Active,
  enter_time: 2014-02-04 09:49:02.070724,
  might_have_unfound: [],
  recovery_progress: { backfill_target: -1,
  waiting_on_backfill: 0,
  backfill_pos: 0\/\/0\/\/-1,
  backfill_info: { begin: 0\/\/0\/\/-1,
  end: 0\/\/0\/\/-1,
  objects: []},
  peer_backfill_info: { begin: 0\/\/0\/\/-1,
  end: 0\/\/0\/\/-1,
  objects: []},
  backfills_in_flight: [],
  pull_from_peer: [],
  pushing: []},
  scrub: { scrubber.epoch_start: 77502,
  scrubber.active: 0,
  scrubber.block_writes: 0,
  scrubber.finalizing: 0,
  scrubber.waiting_on: 0,
  scrubber.waiting_on_whom: []}},
{ name: Started,
  enter_time: 2014-02-04 09:49:01.156626}]}

---
Regards
Dominik

2014-02-04 12:09 GMT+01:00 Dominik Mostowiec dominikmostow...@gmail.com:
 Hi,
 Thanks for Your help !!
 We've run 'ceph osd reweight-by-utilization 105' again.
 The cluster is stuck at 10387 active+clean, 237 active+remapped;
 More info in attachments.

 --
 Regards
 Dominik


 2014-02-04 Sage Weil s...@inktank.com:
 Hi,

 I spent a couple hours looking at your map because it did look like there
 was something wrong.  After some experimentation and adding a bunch of
 improvements to osdmaptool to test the distribution, though, I think
 everything is working as expected.  For pool 3, your map has a standard
 deviation in utilizations of ~8%, and we should expect ~9% for this number
 of PGs.  For all pools, it is slightly higher (~9% vs expected ~8%).
 This is either just in the noise, or slightly confounded by the lack of
 the hashpspool flag on the pools (which slightly amplifies placement
 nonuniformity with multiple pools... not enough that it is worth changing
 anything though).

 The bad news is that that order of standard deviation results in pretty
 wide min/max range of 118 to 202 pgs.  That seems a *bit* higher than what a
 perfectly random placement generates (I'm seeing a spread that is
 usually 50-70 pgs), but I think *that* is where the pool overlap (no
 hashpspool) is rearing its head

Re: [ceph-users] poor data distribution

2014-02-06 Thread Dominik Mostowiec
Hi,
Thanks !!
Can You suggest any workaround for now?

--
Regards
Dominik


2014-02-06 18:39 GMT+01:00 Sage Weil s...@inktank.com:
 Hi,

 Just an update here.  Another user saw this and after playing with it I
 identified a problem with CRUSH.  There is a branch outstanding
 (wip-crush) that is pending review, but it's not a quick fix because of
 compatibility issues.

 sage


 On Thu, 6 Feb 2014, Dominik Mostowiec wrote:

 Hi,
  Maybe this info can help to find what is wrong.
 For one PG (3.1e4a) which is active+remapped:
 { state: active+remapped,
   epoch: 96050,
   up: [
 119,
 69],
   acting: [
 119,
 69,
 7],
 Logs:
 On osd.7:
 2014-02-04 09:45:54.966913 7fa618afe700  1 osd.7 pg_epoch: 94460
 pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=93486
 n=6718 ec=4 les/c 93486/93486 94460/94460/92233) [119,69] r=-1
 lpr=94460 pi=92546-94459/5 lcod 94459'207003 inactive NOTIFY]
 stateStart: transitioning to Stray
 2014-02-04 09:45:55.781278 7fa6172fb700  1 osd.7 pg_epoch: 94461
 pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=93486
 n=6718 ec=4 les/c 93486/93486 94460/94461/92233)
 [119,69]/[119,69,7,142] r=2 lpr=94461 pi=92546-94460/6 lcod
 94459'207003 remapped NOTIFY] stateStart: transitioning to Stray
 2014-02-04 09:49:01.124510 7fa618afe700  1 osd.7 pg_epoch: 94495
 pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=94462
 n=6718 ec=4 les/c 94462/94494 94460/94495/92233) [119,69]/[119,69,7]
 r=2 lpr=94495 pi=92546-94494/7 lcod 94459'207003 remapped]
 stateStart: transitioning to Stray

 On osd.119:
 2014-02-04 09:45:54.981707 7f37f07c5700  1 osd.119 pg_epoch: 94460
 pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=93486
 n=6718 ec=4 les/c 93486/93486 94460/94460/92233) [119,69] r=0
 lpr=94460 pi=93485-94459/1 mlcod 0'0 inactive] stateStart:
 transitioning to Primary
 2014-02-04 09:45:55.805712 7f37ecfbe700  1 osd.119 pg_epoch: 94461
 pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=93486
 n=6718 ec=4 les/c 93486/93486 94460/94461/92233)
 [119,69]/[119,69,7,142] r=0 lpr=94461 pi=93485-94460/2 mlcod 0'0
 remapped] stateStart: transitioning to Primary
 2014-02-04 09:45:56.794015 7f37edfc0700  0 log [INF] : 3.1e4a
 restarting backfill on osd.69 from (0'0,0'0] MAX to 94459'207004
 2014-02-04 09:49:01.156627 7f37ef7c3700  1 osd.119 pg_epoch: 94495
 pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=94462
 n=6718 ec=4 les/c 94462/94494 94460/94495/92233) [119,69]/[119,69,7]
 r=0 lpr=94495 pi=94461-94494/1 mlcod 0'0 remapped] stateStart:
 transitioning to Primary

 On osd.69:
 2014-02-04 09:45:56.845695 7f2231372700  1 osd.69 pg_epoch: 94462
 pg[3.1e4a( empty local-les=0 n=0 ec=4 les/c 93486/93486
 94460/94461/92233) [119,69]/[119,69,7,142] r=1 lpr=94462
 pi=93485-94460/2 inactive] stateStart: transitioning to Stray
 2014-02-04 09:49:01.153695 7f2229b63700  1 osd.69 pg_epoch: 94495
 pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=94462
 n=6718 ec=4 les/c 94462/94494 94460/94495/92233) [119,69]/[119,69,7]
 r=1 lpr=94495 pi=93485-94494/3 remapped] stateStart: transitioning
 to Stray

  pg query recovery state:
   recovery_state: [
 { name: Started\/Primary\/Active,
   enter_time: 2014-02-04 09:49:02.070724,
   might_have_unfound: [],
   recovery_progress: { backfill_target: -1,
   waiting_on_backfill: 0,
   backfill_pos: 0\/\/0\/\/-1,
   backfill_info: { begin: 0\/\/0\/\/-1,
   end: 0\/\/0\/\/-1,
   objects: []},
   peer_backfill_info: { begin: 0\/\/0\/\/-1,
   end: 0\/\/0\/\/-1,
   objects: []},
   backfills_in_flight: [],
   pull_from_peer: [],
   pushing: []},
   scrub: { scrubber.epoch_start: 77502,
   scrubber.active: 0,
   scrubber.block_writes: 0,
   scrubber.finalizing: 0,
   scrubber.waiting_on: 0,
   scrubber.waiting_on_whom: []}},
 { name: Started,
   enter_time: 2014-02-04 09:49:01.156626}]}

 ---
 Regards
 Dominik

 2014-02-04 12:09 GMT+01:00 Dominik Mostowiec dominikmostow...@gmail.com:
  Hi,
  Thanks for Your help !!
  We've run 'ceph osd reweight-by-utilization 105' again.
  The cluster is stuck at 10387 active+clean, 237 active+remapped;
  More info in attachments.
 
  --
  Regards
  Dominik
 
 
  2014-02-04 Sage Weil s...@inktank.com:
  Hi,
 
  I spent a couple hours looking at your map because it did look like there
  was something wrong.  After some experimentation and adding a bunch of
  improvements to osdmaptool to test the distribution, though, I think
  everything is working as expected.  For pool 3, your map has a standard
  deviation in utilizations of ~8%, and we should expect ~9% for this number
  of PGs.  For all pools, it is slightly higher (~9% vs expected ~8%).
  This is either just

Re: [ceph-users] poor data distribution

2014-02-06 Thread Dominik Mostowiec
Great!
Thanks for Your help.

--
Regards
Dominik

2014-02-06 21:10 GMT+01:00 Sage Weil s...@inktank.com:
 On Thu, 6 Feb 2014, Dominik Mostowiec wrote:
 Hi,
 Thanks !!
 Can You suggest any workaround for now?

 You can adjust the crush weights on the overfull nodes slightly.  You'd
 need to do it by hand, but that will do the trick.  For example,

   ceph osd crush reweight osd.123 .96

 (if the current weight is 1.0).

 sage


 --
 Regards
 Dominik


 2014-02-06 18:39 GMT+01:00 Sage Weil s...@inktank.com:
  Hi,
 
  Just an update here.  Another user saw this and after playing with it I
  identified a problem with CRUSH.  There is a branch outstanding
  (wip-crush) that is pending review, but it's not a quick fix because of
  compatibility issues.
 
  sage
 
 
  On Thu, 6 Feb 2014, Dominik Mostowiec wrote:
 
  Hi,
   Maybe this info can help to find what is wrong.
  For one PG (3.1e4a) which is active+remapped:
  { state: active+remapped,
epoch: 96050,
up: [
  119,
  69],
acting: [
  119,
  69,
  7],
  Logs:
  On osd.7:
  2014-02-04 09:45:54.966913 7fa618afe700  1 osd.7 pg_epoch: 94460
  pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=93486
  n=6718 ec=4 les/c 93486/93486 94460/94460/92233) [119,69] r=-1
  lpr=94460 pi=92546-94459/5 lcod 94459'207003 inactive NOTIFY]
  stateStart: transitioning to Stray
  2014-02-04 09:45:55.781278 7fa6172fb700  1 osd.7 pg_epoch: 94461
  pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=93486
  n=6718 ec=4 les/c 93486/93486 94460/94461/92233)
  [119,69]/[119,69,7,142] r=2 lpr=94461 pi=92546-94460/6 lcod
  94459'207003 remapped NOTIFY] stateStart: transitioning to Stray
  2014-02-04 09:49:01.124510 7fa618afe700  1 osd.7 pg_epoch: 94495
  pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=94462
  n=6718 ec=4 les/c 94462/94494 94460/94495/92233) [119,69]/[119,69,7]
  r=2 lpr=94495 pi=92546-94494/7 lcod 94459'207003 remapped]
  stateStart: transitioning to Stray
 
  On osd.119:
  2014-02-04 09:45:54.981707 7f37f07c5700  1 osd.119 pg_epoch: 94460
  pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=93486
  n=6718 ec=4 les/c 93486/93486 94460/94460/92233) [119,69] r=0
  lpr=94460 pi=93485-94459/1 mlcod 0'0 inactive] stateStart:
  transitioning to Primary
  2014-02-04 09:45:55.805712 7f37ecfbe700  1 osd.119 pg_epoch: 94461
  pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=93486
  n=6718 ec=4 les/c 93486/93486 94460/94461/92233)
  [119,69]/[119,69,7,142] r=0 lpr=94461 pi=93485-94460/2 mlcod 0'0
  remapped] stateStart: transitioning to Primary
  2014-02-04 09:45:56.794015 7f37edfc0700  0 log [INF] : 3.1e4a
  restarting backfill on osd.69 from (0'0,0'0] MAX to 94459'207004
  2014-02-04 09:49:01.156627 7f37ef7c3700  1 osd.119 pg_epoch: 94495
  pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=94462
  n=6718 ec=4 les/c 94462/94494 94460/94495/92233) [119,69]/[119,69,7]
  r=0 lpr=94495 pi=94461-94494/1 mlcod 0'0 remapped] stateStart:
  transitioning to Primary
 
  On osd.69:
  2014-02-04 09:45:56.845695 7f2231372700  1 osd.69 pg_epoch: 94462
  pg[3.1e4a( empty local-les=0 n=0 ec=4 les/c 93486/93486
  94460/94461/92233) [119,69]/[119,69,7,142] r=1 lpr=94462
  pi=93485-94460/2 inactive] stateStart: transitioning to Stray
  2014-02-04 09:49:01.153695 7f2229b63700  1 osd.69 pg_epoch: 94495
  pg[3.1e4a( v 94459'207004 (72275'204004,94459'207004] local-les=94462
  n=6718 ec=4 les/c 94462/94494 94460/94495/92233) [119,69]/[119,69,7]
  r=1 lpr=94495 pi=93485-94494/3 remapped] stateStart: transitioning
  to Stray
 
   pg query recovery state:
recovery_state: [
  { name: Started\/Primary\/Active,
enter_time: 2014-02-04 09:49:02.070724,
might_have_unfound: [],
recovery_progress: { backfill_target: -1,
waiting_on_backfill: 0,
backfill_pos: 0\/\/0\/\/-1,
backfill_info: { begin: 0\/\/0\/\/-1,
end: 0\/\/0\/\/-1,
objects: []},
peer_backfill_info: { begin: 0\/\/0\/\/-1,
end: 0\/\/0\/\/-1,
objects: []},
backfills_in_flight: [],
pull_from_peer: [],
pushing: []},
scrub: { scrubber.epoch_start: 77502,
scrubber.active: 0,
scrubber.block_writes: 0,
scrubber.finalizing: 0,
scrubber.waiting_on: 0,
scrubber.waiting_on_whom: []}},
  { name: Started,
enter_time: 2014-02-04 09:49:01.156626}]}
 
  ---
  Regards
  Dominik
 
  2014-02-04 12:09 GMT+01:00 Dominik Mostowiec dominikmostow...@gmail.com:
   Hi,
   Thanks for Your help !!
    We've run 'ceph osd reweight-by-utilization 105' again.
    The cluster is stuck at 10387 active+clean, 237 active+remapped;
   More info in attachments.
  
   --
   Regards
   Dominik
  
  
   2014-02-04 Sage Weil

Re: [ceph-users] poor data distribution

2014-02-04 Thread Dominik Mostowiec
Hi,
Thanks for Your help !!
We've run 'ceph osd reweight-by-utilization 105' again.
The cluster is stuck at 10387 active+clean, 237 active+remapped;
More info in attachments.

--
Regards
Dominik


2014-02-04 Sage Weil s...@inktank.com:
 Hi,

 I spent a couple hours looking at your map because it did look like there
 was something wrong.  After some experimentation and adding a bunch of
 improvements to osdmaptool to test the distribution, though, I think
 everything is working as expected.  For pool 3, your map has a standard
 deviation in utilizations of ~8%, and we should expect ~9% for this number
 of PGs.  For all pools, it is slightly higher (~9% vs expected ~8%).
 This is either just in the noise, or slightly confounded by the lack of
 the hashpspool flag on the pools (which slightly amplifies placement
 nonuniformity with multiple pools... not enough that it is worth changing
 anything though).

 The bad news is that that order of standard deviation results in pretty
 wide min/max range of 118 to 202 pgs.  That seems a *bit* higher than what a
 perfectly random placement generates (I'm seeing a spread that is
 usually 50-70 pgs), but I think *that* is where the pool overlap (no
 hashpspool) is rearing its head; for just pool three the spread of 50 is
 about what is expected.

 Long story short: you have two options.  One is increasing the number of
 PGs.  Note that this helps but has diminishing returns (doubling PGs
 only takes you from ~8% to ~6% standard deviation, quadrupling to ~4%).

 The other is to use reweight-by-utilization.  That is the best approach,
 IMO.  I'm not sure why you were seeing PGs stuck in the remapped state
 after you did that, though, but I'm happy to dig into that too.
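
 (For reference, a sketch of the pg_num route on the big pool; the number below is
 only an example, and pgp_num has to follow pg_num before data actually rebalances:

   ceph osd pool set .rgw.buckets pg_num 16384
   ceph osd pool set .rgw.buckets pgp_num 16384

 Splitting PGs on a pool of this size is a heavyweight operation, and pg_num cannot
 be decreased again afterwards.)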

 BTW, the osdmaptool addition I was using to play with is here:
 https://github.com/ceph/ceph/pull/1178

 sage
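
 (With that addition, the distribution statistics above can be reproduced offline,
 for example (paths are examples, and the option name is taken from that pull request):

   ceph osd getmap -o /tmp/osdmap
   osdmaptool /tmp/osdmap --test-map-pgs --pool 3

 which prints per-OSD PG counts plus min/max/average for the chosen pool.)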


 On Mon, 3 Feb 2014, Dominik Mostowiec wrote:

 In other words,
 1. we've got 3 racks ( 1 replica per rack )
 2. in every rack we have 3 hosts
 3. every host has 22 OSD's
 4. all pg_num's are 2^n for every pool
 5. we enabled crush tunables optimal.
 6. on every machine we disabled 4 unused disks (osd out, osd reweight
 0 and osd rm)

 Pool .rgw.buckets: one OSD has 105 PGs and another one (on the same
 machine) has 144 PGs (37% more!).
 Other pools also have this problem. It's not an efficient placement.

 --
 Regards
 Dominik


 2014-02-02 Dominik Mostowiec dominikmostow...@gmail.com:
  Hi,
  For more info:
crush: http://dysk.onet.pl/link/r4wGK
osd_dump: http://dysk.onet.pl/link/I3YMZ
pg_dump: http://dysk.onet.pl/link/4jkqM
 
  --
  Regards
  Dominik
 
  2014-02-02 Dominik Mostowiec dominikmostow...@gmail.com:
  Hi,
  Hmm,
  I think you mean summing up PGs from different pools on one OSD.
  But for the one pool (.rgw.buckets) where I have almost all of my data, the PG
  count on OSDs is also different.
  For example, 105 vs 144 PGs from pool .rgw.buckets. In the first case it is
  52% disk usage, in the second 74%.
 
  --
  Regards
  Dominik
 
 
  2014-02-02 Sage Weil s...@inktank.com:
  It occurs to me that this (and other unexplained variance reports) could
  easily be the 'hashpspool' flag not being set.  The old behavior had the
  misfeature where consecutive pools' PGs would 'line up' on the same osds,
  so that 1.7 == 2.6 == 3.5 == 4.4 etc would map to the same nodes.  This
  tends to 'amplify' any variance in the placement.  The default is still to
  use the old behavior for compatibility (this will finally change in
  firefly).
 
  You can do
 
   ceph osd pool set poolname hashpspool true
 
  to enable the new placement logic on an existing pool, but be warned that
  this will rebalance *all* of the data in the pool, which can be a very
  heavyweight operation...
 
  sage
 
 
  On Sun, 2 Feb 2014, Dominik Mostowiec wrote:
 
  Hi,
  After scrubbing, almost all PGs have an equal(~) number of objects.
  I found something else.
  On one host, the PG count on OSDs:
  OSD with small(52%) disk usage:
  count, pool
  105 3
   18 4
3 5
 
  Osd with larger(74%) disk usage:
  144 3
   31 4
2 5
 
  Pool 3 is .rgw.buckets (where almost all of the data is).
  Pool 4 is .log, which holds no data.

  Shouldn't the count of PGs be the same per OSD?
  Or maybe the PG hash algorithm is disrupted by a wrong PG count for pool
  '4'? It has 1440 PGs (which is not a power of 2).
 
  ceph osd dump:
  pool 0 'data' rep size 3 min_size 1 crush_ruleset 0 object_hash
  rjenkins pg_num 64 pgp_num 64 last_change 28459 owner 0
  crash_replay_interval 45
  pool 1 'metadata' rep size 3 min_size 1 crush_ruleset 1 object_hash
  rjenkins pg_num 64 pgp_num 64 last_change 28460 owner 0
  pool 2 'rbd' rep size 3 min_size 1 crush_ruleset 2 object_hash
  rjenkins pg_num 64 pgp_num 64 last_change 28461 owner 0
  pool 3 '.rgw.buckets' rep size 3 min_size 1 crush_ruleset 0
  object_hash rjenkins pg_num 8192 pgp_num 8192 last_change 73711 owner
  0
  pool 4 '.log' rep size 3 min_size 1 crush_ruleset 0 object_hash

Re: [ceph-users] many meta files in osd

2014-01-28 Thread Dominik Mostowiec
Hi,
Thanks for Your response.
ceph -v
ceph version 0.67.5 (a60ac9194718083a4b6a225fc17cad6096c69bd1)

grep -i rgw /etc/ceph/ceph.conf  | grep -v socket
rgw_cache_enabled = true
rgw_cache_lru_size = 1
rgw_thread_pool_size = 2048
rgw op thread timeout = 6000
rgw print continue = false
rgw_enable_ops_log = false
debug rgw = 10
rgw dns name = ocdn.eu


On my test cluster (the same version, with a simulation of this case),
the command 'radosgw-admin gc process' didn't help :-(
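
(A way to check whether anything is actually queued for garbage collection; this only
lists entries and does not delete anything:

  radosgw-admin gc list --include-all

If the objects in question never show up in that list, 'radosgw-admin gc process'
alone will not remove them.)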

--
Regards
Dominik


2014-01-27 Gregory Farnum g...@inktank.com:
 Looks like you got lost over the Christmas holidays; sorry!
 I'm not an expert on running rgw but it sounds like garbage collection
 isn't running or something. What version are you on, and have you done
 anything to set it up?
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com


 On Sun, Jan 26, 2014 at 12:59 PM, Dominik Mostowiec
 dominikmostow...@gmail.com wrote:
 Hi,
 Is it safe to remove these files for a deleted bucket, i.e. the objects listed by
 rados -p .rgw ls | grep '.bucket.meta.my_deleted_bucket:'
 via
 rados -p .rgw rm .bucket.meta.my_deleted_bucket:default.4576.1

 I have a problem with inodes being eaten up on disks where there are many such files.

 --
 Regards
 Dominik

 2013-12-10 Dominik Mostowiec dominikmostow...@gmail.com:
 Is there any possibility to remove these meta files (without recreating the cluster)?
 Files names:
 {path}.bucket.meta.test1:default.4110.{sequence number}__head_...

 --
 Regards
 Dominik

 2013/12/8 Dominik Mostowiec dominikmostow...@gmail.com:
 Hi,
 My API app that puts files to s3/ceph checks whether a bucket exists by creating
 that bucket.
 Each bucket-create command adds 2 meta files.

 -
 root@vm-1:/vol0/ceph/osd# find | grep meta | grep test1 | wc -l
 44
 root@vm-1:/vol0/ceph/osd# s3 -u create test1
 Bucket successfully created.
 root@vm-1:/vol0/ceph/osd# find | grep meta | grep test1 | wc -l
 46
 -

 Unfortunately:
 -
 root@vm-1:/vol0/ceph/osd# s3 -u delete test1
 root@vm-1:/vol0/ceph/osd# find | grep meta | grep test1 | wc -l
 46
 -

 Is there some way to remove these meta files from ceph?

 --
 Regards
 Dominik



 --
 Regards
 Dominik



 --
 Regards
 Dominik
 ___
 ceph-users mailing list
 ceph-us...@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Regards
Dominik


Re: many meta files in osd

2014-01-26 Thread Dominik Mostowiec
Hi,
Is it safe to remove these files for a deleted bucket, i.e. the objects listed by
 rados -p .rgw ls | grep '.bucket.meta.my_deleted_bucket:'
via
rados -p .rgw rm .bucket.meta.my_deleted_bucket:default.4576.1

I have a problem with inodes being eaten up on disks where there are many such files.

--
Regards
Dominik

2013-12-10 Dominik Mostowiec dominikmostow...@gmail.com:
 Is there any possibility to remove these meta files (without recreating the cluster)?
 Files names:
 {path}.bucket.meta.test1:default.4110.{sequence number}__head_...

 --
 Regards
 Dominik

 2013/12/8 Dominik Mostowiec dominikmostow...@gmail.com:
 Hi,
 My API app that puts files to s3/ceph checks whether a bucket exists by creating
 that bucket.
 Each bucket-create command adds 2 meta files.

 -
 root@vm-1:/vol0/ceph/osd# find | grep meta | grep test1 | wc -l
 44
 root@vm-1:/vol0/ceph/osd# s3 -u create test1
 Bucket successfully created.
 root@vm-1:/vol0/ceph/osd# find | grep meta | grep test1 | wc -l
 46
 -

 Unfortunately:
 -
 root@vm-1:/vol0/ceph/osd# s3 -u delete test1
 root@vm-1:/vol0/ceph/osd# find | grep meta | grep test1 | wc -l
 46
 -

 Is there some way to remove these meta files from ceph?

 --
 Regards
 Dominik



 --
 Regards
 Dominik



-- 
Regards
Dominik


Re: many meta files in osd

2013-12-09 Thread Dominik Mostowiec
Is there any possibility to remove these meta files (without recreating the cluster)?
Files names:
{path}.bucket.meta.test1:default.4110.{sequence number}__head_...

--
Regards
Dominik

2013/12/8 Dominik Mostowiec dominikmostow...@gmail.com:
 Hi,
 My API app that puts files to s3/ceph checks whether a bucket exists by creating
 that bucket.
 Each bucket-create command adds 2 meta files.

 -
 root@vm-1:/vol0/ceph/osd# find | grep meta | grep test1 | wc -l
 44
 root@vm-1:/vol0/ceph/osd# s3 -u create test1
 Bucket successfully created.
 root@vm-1:/vol0/ceph/osd# find | grep meta | grep test1 | wc -l
 46
 -

 Unfortunately:
 -
 root@vm-1:/vol0/ceph/osd# s3 -u delete test1
 root@vm-1:/vol0/ceph/osd# find | grep meta | grep test1 | wc -l
 46
 -

 Is there some way to remove these meta files from ceph?

 --
 Regards
 Dominik



-- 
Regards
Dominik


many meta files in osd

2013-12-08 Thread Dominik Mostowiec
Hi,
My API app that puts files to s3/ceph checks whether a bucket exists by creating
that bucket.
Each bucket-create command adds 2 meta files.

-
root@vm-1:/vol0/ceph/osd# find | grep meta | grep test1 | wc -l
44
root@vm-1:/vol0/ceph/osd# s3 -u create test1
Bucket successfully created.
root@vm-1:/vol0/ceph/osd# find | grep meta | grep test1 | wc -l
46
-

Unfortunately:
-
root@vm-1:/vol0/ceph/osd# s3 -u delete test1
root@vm-1:/vol0/ceph/osd# find | grep meta | grep test1 | wc -l
46
-

Is there some way to remove these meta files from ceph?
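
(At the RADOS level, the leftover metadata objects for a bucket can be enumerated by
name, using the .rgw pool as in the follow-up messages in this thread:

  rados -p .rgw ls | grep '.bucket.meta.test1:'

Each such object is what shows up as a .bucket.meta.test1:...__head_... file on the
OSD filesystems, once per replica.)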

-- 
Regards
Dominik


Re: [ceph-users] recreate bucket error

2013-12-07 Thread Dominik Mostowiec
 Are you running on latest dumpling?
Yes. It was installed fresh, not upgraded from a previous version.
This is a newly created bucket.
I rebuilt the cluster from scratch.

The user-add command 'radosgw-admin user create ...' shows:
WARNING: cannot read region map

Creating the bucket 'test1' for the first time via 's3 -u create test1' returned an error,
but in the radosgw logs it seems OK:
---
2013-12-07 10:40:04.328277 7f7746ff5700  1 == starting new request
req=0x1f5c7b0 =
2013-12-07 10:40:04.328339 7f7746ff5700  2 req 2:0.63::PUT
/test1/::initializing
2013-12-07 10:40:04.328361 7f7746ff5700 10 meta HTTP_X_AMZ_DATE
2013-12-07 10:40:04.328368 7f7746ff5700 10 x x-amz-date:Sat, 07 Dec
2013 09:40:04 GMT
2013-12-07 10:40:04.328383 7f7746ff5700 10 s-object=NULL s-bucket=test1
2013-12-07 10:40:04.328390 7f7746ff5700  2 req 2:0.000114:s3:PUT
/test1/::getting op
2013-12-07 10:40:04.328394 7f7746ff5700  2 req 2:0.000118:s3:PUT
/test1/:create_bucket:authorizing
2013-12-07 10:40:04.334424 7f7746ff5700 10 get_canon_resource(): dest=
2013-12-07 10:40:04.334433 7f7746ff5700 10 auth_hdr:
PUT



x-amz-date:Sat, 07 Dec 2013 09:40:04 GMT
/test1/
2013-12-07 10:40:04.334476 7f7746ff5700  2 req 2:0.006200:s3:PUT
/test1/:create_bucket:reading permissions
2013-12-07 10:40:04.334481 7f7746ff5700  2 req 2:0.006205:s3:PUT
/test1/:create_bucket:verifying op mask
2013-12-07 10:40:04.334483 7f7746ff5700  2 req 2:0.006207:s3:PUT
/test1/:create_bucket:verifying op permissions
2013-12-07 10:40:04.335391 7f7746ff5700  2 req 2:0.007115:s3:PUT
/test1/:create_bucket:verifying op params
2013-12-07 10:40:04.335402 7f7746ff5700  2 req 2:0.007126:s3:PUT
/test1/:create_bucket:executing
2013-12-07 10:40:05.420459 7f7746ff5700  2 req 2:1.092182:s3:PUT
/test1/:create_bucket:http status=200
2013-12-07 10:40:05.420621 7f7746ff5700  1 == req done
req=0x1f5c7b0 http_status=200 ==
---


Second try:
---
2013-12-07 10:40:06.421876 7f778d95d780 10 allocated request req=0x1f5c710
2013-12-07 10:40:06.421936 7f77317ca700  1 == starting new request
req=0x1f5bc10 =
2013-12-07 10:40:06.422038 7f77317ca700  2 req 3:0.000104::PUT
/test1/::initializing
2013-12-07 10:40:06.422069 7f77317ca700 10 meta HTTP_X_AMZ_DATE
2013-12-07 10:40:06.422080 7f77317ca700 10 x x-amz-date:Sat, 07 Dec
2013 09:40:06 GMT
2013-12-07 10:40:06.422101 7f77317ca700 10 s-object=NULL s-bucket=test1
2013-12-07 10:40:06.422110 7f77317ca700  2 req 3:0.000176:s3:PUT
/test1/::getting op
2013-12-07 10:40:06.422117 7f77317ca700  2 req 3:0.000183:s3:PUT
/test1/:create_bucket:authorizing
2013-12-07 10:40:06.429576 7f77317ca700 10 get_canon_resource(): dest=
2013-12-07 10:40:06.429592 7f77317ca700 10 auth_hdr:
PUT



x-amz-date:Sat, 07 Dec 2013 09:40:06 GMT
/test1/
2013-12-07 10:40:06.429679 7f77317ca700  2 req 3:0.007745:s3:PUT
/test1/:create_bucket:reading permissions
2013-12-07 10:40:06.429690 7f77317ca700  2 req 3:0.007756:s3:PUT
/test1/:create_bucket:verifying op mask
2013-12-07 10:40:06.429693 7f77317ca700  2 req 3:0.007759:s3:PUT
/test1/:create_bucket:verifying op permissions
2013-12-07 10:40:06.430945 7f77317ca700  2 req 3:0.009010:s3:PUT
/test1/:create_bucket:verifying op params
2013-12-07 10:40:06.430964 7f77317ca700  2 req 3:0.009030:s3:PUT
/test1/:create_bucket:executing
2013-12-07 10:40:06.436301 7f77317ca700  0 WARNING: couldn't find acl
header for object, generating default
2013-12-07 10:40:06.451160 7f77317ca700  0 get_bucket_info returned -125
2013-12-07 10:40:06.451188 7f77317ca700  0 WARNING: set_req_state_err
err_no=125 resorting to 500
2013-12-07 10:40:06.451223 7f77317ca700  2 req 3:0.029289:s3:PUT
/test1/:create_bucket:http status=500
2013-12-07 10:40:06.451335 7f77317ca700  1 == req done
req=0x1f5bc10 http_status=500 ==
---

ceph -s
  cluster 92833861-954f-4c66-a72b-5a83090a2b3f
   health HEALTH_OK

ceph -v
ceph version 0.67.4 (ad85b8bfafea6232d64cb7ba76a8b6e8252fa0c7)

I had strange behavior after creating the cluster.
All PGs were on osd.0 and marked as stale+degraded.
Adding more OSDs didn't fix it;
'ceph pg force_create_pg' helped.

After setting up the cluster I changed the crushmap:
I removed osd.0 and added it back to the cluster.

Could that be the reason?


Regards
Dominik

2013/12/6 Yehuda Sadeh yeh...@inktank.com:
 I'm having trouble reproducing this one. Are you running on latest
 dumpling? Does it happen with any newly created bucket, or just with
 buckets that existed before?

 Yehuda

 On Fri, Dec 6, 2013 at 5:07 AM, Dominik Mostowiec
 dominikmostow...@gmail.com wrote:
 Hi,
 On a dumpling cluster upgraded from bobtail, creating the same bucket twice works:

 root@vm-1:/etc/apache2/sites-enabled# s3 -u create testcreate
 Bucket successfully created.
 root@vm-1:/etc/apache2/sites-enabled# s3 -u create testcreate
 Bucket successfully created.

 I installed a new dumpling cluster, and:
 root@s1:/var/log/radosgw# s3 -u create test1
 Bucket successfully created.
 root@s1:/var/log/radosgw# s3 -u create test1

 ERROR: ErrorUnknown

 In radosgw logs:

 2013-12-06 13:59:56.083109 7f162d7c2700  1

Re: [ceph-users] recreate bucket error

2013-12-07 Thread Dominik Mostowiec
  1 -- 10.174.33.11:0/1270294
-- 10.174.33.11:6802/225929 -- osd_op(client.95391.0:125 test1
[getxattrs,stat] 4.bddbf0b9 e192) v4 -- ?+0 0x7ff794008100 con
0xd8b210
2013-12-07 17:32:42.761759 7ffbd96ec700  1 -- 10.174.33.11:0/1270294
== osd.0 10.174.33.11:6802/225929 61  osd_op_reply(125 test1
[getxattrs,stat] ondisk = 0) v4  146+0+91 (167695905 0 2385301879)
0x7ffbb4000a30 con 0xd8b210
2013-12-07 17:32:42.761908 7ff79b1c6700  1 -- 10.174.33.11:0/1270294
-- 10.174.33.11:6802/225929 -- osd_op(client.95391.0:126 test1 [call
version.check_conds,call version.read,read 0~524288] 4.bddbf0b9 e192)
v4 -- ?+0 0x7ff79400c6e0 con 0xd8b210
2013-12-07 17:32:42.762712 7ffbd96ec700  1 -- 10.174.33.11:0/1270294
== osd.0 10.174.33.11:6802/225929 62  osd_op_reply(126 test1
[call,call,read 0~111] ondisk = 0) v4  188+0+159 (3524501340 0
3138215039) 0x7ffbb4000a30 con 0xd8b210
2013-12-07 17:32:42.762848 7ff79b1c6700  1 -- 10.174.33.11:0/1270294
-- 10.174.33.13:6800/296091 -- osd_op(client.95391.0:127
.bucket.meta.test1:default.78189.1 [call version.check_conds,call
version.read,getxattrs,stat] 4.50558ec5 e192) v4 -- ?+0 0x7ff79400c6e0
con 0xd8f710
2013-12-07 17:32:42.764173 7ffbd96ec700  1 -- 10.174.33.11:0/1270294
== osd.24 10.174.33.13:6800/296091 21  osd_op_reply(127
.bucket.meta.test1:default.78189.1 [call,call,getxattrs,stat] ondisk =
-125 (Operation canceled)) v4  259+0+0 (3728706219 0 0)
0x7ffbac002090 con 0xd8f710
2013-12-07 17:32:42.764338 7ff79b1c6700  0 get_bucket_info returned -125
2013-12-07 17:32:42.764368 7ff79b1c6700  0 WARNING: set_req_state_err
err_no=125 resorting to 500
2013-12-07 17:32:42.764431 7ff79b1c6700  2 req 1:0.027995:s3:PUT
/test1/:create_bucket:http status=500
2013-12-07 17:32:42.764561 7ff79b1c6700  1 == req done
req=0xe60860 http_status=500 ==
2013-12-07 17:32:42.794156 7ffbd1ffb700  2
RGWDataChangesLog::ChangesRenewThread: start

---


--
Regards
Dominik

2013/12/7 Yehuda Sadeh yeh...@inktank.com:
 Not sure what could be the reason. Can you set 'debug ms = 1'
 and 'debug rgw = 20'?

 Thanks,
 Yehuda
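
 (For reference, a sketch of how that logging can be enabled: add, in the radosgw
 section of ceph.conf,

   debug ms = 1
   debug rgw = 20

 and restart radosgw for the settings to take effect; the working configuration quoted
 elsewhere in this digest uses the same style of settings, e.g. 'debug rgw = 10'.)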

 On Sat, Dec 7, 2013 at 4:33 AM, Dominik Mostowiec
 dominikmostow...@gmail.com wrote:
  Are you running on latest dumpling?
 Yes. It was installed fresh, not upgraded from a previous version.
 This is a newly created bucket.
 I rebuilt the cluster from scratch.

 The user-add command 'radosgw-admin user create ...' shows:
 WARNING: cannot read region map

 Creating the bucket 'test1' for the first time via 's3 -u create test1' returned an error,
 but in the radosgw logs it seems OK:
 ---
 2013-12-07 10:40:04.328277 7f7746ff5700  1 == starting new request
 req=0x1f5c7b0 =
 2013-12-07 10:40:04.328339 7f7746ff5700  2 req 2:0.63::PUT
 /test1/::initializing
 2013-12-07 10:40:04.328361 7f7746ff5700 10 meta HTTP_X_AMZ_DATE
 2013-12-07 10:40:04.328368 7f7746ff5700 10 x x-amz-date:Sat, 07 Dec
 2013 09:40:04 GMT
 2013-12-07 10:40:04.328383 7f7746ff5700 10 s-object=NULL s-bucket=test1
 2013-12-07 10:40:04.328390 7f7746ff5700  2 req 2:0.000114:s3:PUT
 /test1/::getting op
 2013-12-07 10:40:04.328394 7f7746ff5700  2 req 2:0.000118:s3:PUT
 /test1/:create_bucket:authorizing
 2013-12-07 10:40:04.334424 7f7746ff5700 10 get_canon_resource(): dest=
 2013-12-07 10:40:04.334433 7f7746ff5700 10 auth_hdr:
 PUT



 x-amz-date:Sat, 07 Dec 2013 09:40:04 GMT
 /test1/
 2013-12-07 10:40:04.334476 7f7746ff5700  2 req 2:0.006200:s3:PUT
 /test1/:create_bucket:reading permissions
 2013-12-07 10:40:04.334481 7f7746ff5700  2 req 2:0.006205:s3:PUT
 /test1/:create_bucket:verifying op mask
 2013-12-07 10:40:04.334483 7f7746ff5700  2 req 2:0.006207:s3:PUT
 /test1/:create_bucket:verifying op permissions
 2013-12-07 10:40:04.335391 7f7746ff5700  2 req 2:0.007115:s3:PUT
 /test1/:create_bucket:verifying op params
 2013-12-07 10:40:04.335402 7f7746ff5700  2 req 2:0.007126:s3:PUT
 /test1/:create_bucket:executing
 2013-12-07 10:40:05.420459 7f7746ff5700  2 req 2:1.092182:s3:PUT
 /test1/:create_bucket:http status=200
 2013-12-07 10:40:05.420621 7f7746ff5700  1 == req done
 req=0x1f5c7b0 http_status=200 ==
 ---


 Second try:
 ---
 2013-12-07 10:40:06.421876 7f778d95d780 10 allocated request req=0x1f5c710
 2013-12-07 10:40:06.421936 7f77317ca700  1 == starting new request
 req=0x1f5bc10 =
 2013-12-07 10:40:06.422038 7f77317ca700  2 req 3:0.000104::PUT
 /test1/::initializing
 2013-12-07 10:40:06.422069 7f77317ca700 10 meta HTTP_X_AMZ_DATE
 2013-12-07 10:40:06.422080 7f77317ca700 10 x x-amz-date:Sat, 07 Dec
 2013 09:40:06 GMT
 2013-12-07 10:40:06.422101 7f77317ca700 10 s-object=NULL s-bucket=test1
 2013-12-07 10:40:06.422110 7f77317ca700  2 req 3:0.000176:s3:PUT
 /test1/::getting op
 2013-12-07 10:40:06.422117 7f77317ca700  2 req 3:0.000183:s3:PUT
 /test1/:create_bucket:authorizing
 2013-12-07 10:40:06.429576 7f77317ca700 10 get_canon_resource(): dest=
 2013-12-07 10:40:06.429592 7f77317ca700 10 auth_hdr:
 PUT



 x-amz-date:Sat, 07 Dec 2013 09:40:06 GMT
 /test1/
 2013-12-07 10:40:06.429679 7f77317ca700  2 req

Re: [ceph-users] recreate bucket error

2013-12-07 Thread Dominik Mostowiec
Yes, it is disabled
grep 'cache' /etc/ceph/ceph.conf  | grep rgw
rgw_cache_enabled = false ;rgw cache enabled
rgw_cache_lru_size = 1 ;num of entries in rgw cache

--
Regards
Dominik

2013/12/7 Yehuda Sadeh yeh...@inktank.com:
 Did you disable the cache by any chance (e.g., 'rgw cache enabled = false')?



 On Sat, Dec 7, 2013 at 8:34 AM, Dominik Mostowiec
 dominikmostow...@gmail.com wrote:
 Hi,
 Log:
 -
 2013-12-07 17:32:42.736396 7ffbe36d3780 10 allocated request req=0xe66f40
 2013-12-07 17:32:42.736438 7ff79b1c6700  1 == starting new request
 req=0xe60860 =
 2013-12-07 17:32:42.736590 7ff79b1c6700  2 req 1:0.000153::PUT
 /test1/::initializing
 2013-12-07 17:32:42.736659 7ff79b1c6700 10 meta HTTP_X_AMZ_DATE
 2013-12-07 17:32:42.736686 7ff79b1c6700 10 x x-amz-date:Sat, 07 Dec
 2013 16:32:42 GMT
 2013-12-07 17:32:42.736758 7ff79b1c6700 10 s-object=NULL s-bucket=test1
 2013-12-07 17:32:42.736788 7ff79b1c6700  2 req 1:0.000351:s3:PUT
 /test1/::getting op
 2013-12-07 17:32:42.736800 7ff79b1c6700  2 req 1:0.000364:s3:PUT
 /test1/:create_bucket:authorizing
 2013-12-07 17:32:42.736907 7ff79b1c6700  1 -- 10.174.33.11:0/1270294
 -- 10.174.33.11:6804/161220 -- osd_op(client.95391.0:107
 F7P3E755K6BN5LR3N85U [getxattrs,stat] 7.62531457 e192) v4 -- ?+0 0x7ff
 794005a90 con 0xd908d0
 2013-12-07 17:32:42.738159 7ffbd96ec700  1 -- 10.174.33.11:0/1270294
 == osd.1 10.174.33.11:6804/161220 23  osd_op_reply(107
 F7P3E755K6BN5LR3N85U [getxattrs,stat] ondisk = 0) v4  161+0+20 (
 1539386342 0 1044315090) 0x7ffba40019c0 con 0xd908d0
 2013-12-07 17:32:42.738341 7ff79b1c6700  1 -- 10.174.33.11:0/1270294
 -- 10.174.33.11:6804/161220 -- osd_op(client.95391.0:108
 F7P3E755K6BN5LR3N85U [getxattrs,stat] 7.62531457 e192) v4 -- ?+0 0x7ff
 794005c20 con 0xd908d0
 2013-12-07 17:32:42.739230 7ffbd96ec700  1 -- 10.174.33.11:0/1270294
 == osd.1 10.174.33.11:6804/161220 24  osd_op_reply(108
 F7P3E755K6BN5LR3N85U [getxattrs,stat] ondisk = 0) v4  161+0+20 (
 1539386342 0 1044315090) 0x7ffba40019c0 con 0xd908d0
 2013-12-07 17:32:42.739349 7ff79b1c6700  1 -- 10.174.33.11:0/1270294
 -- 10.174.33.11:6804/161220 -- osd_op(client.95391.0:109
 F7P3E755K6BN5LR3N85U [read 0~524288] 7.62531457 e192) v4 -- ?+0 0x7ff7
 94005a90 con 0xd908d0
 2013-12-07 17:32:42.740172 7ffbd96ec700  1 -- 10.174.33.11:0/1270294
 == osd.1 10.174.33.11:6804/161220 25  osd_op_reply(109
 F7P3E755K6BN5LR3N85U [read 0~5] ondisk = 0) v4  119+0+5 (9359852
 51 0 150087197) 0x7ffba40019c0 con 0xd908d0
 2013-12-07 17:32:42.740310 7ff79b1c6700  1 -- 10.174.33.11:0/1270294
 -- 10.174.33.11:6802/225929 -- osd_op(client.95391.0:110 0
 [getxattrs,stat] 6.f18a3536 e192) v4 -- ?+0 0x7ff794006570 con 0xd8b
 210
 2013-12-07 17:32:42.741208 7ffbd96ec700  1 -- 10.174.33.11:0/1270294
 == osd.0 10.174.33.11:6802/225929 49  osd_op_reply(110 0
 [getxattrs,stat] ondisk = 0) v4  142+0+91 (2507304458 0 349659
 2801) 0x7ffbb4000a60 con 0xd8b210
 2013-12-07 17:32:42.741326 7ff79b1c6700  1 -- 10.174.33.11:0/1270294
 -- 10.174.33.11:6802/225929 -- osd_op(client.95391.0:111 0
 [getxattrs,stat] 6.f18a3536 e192) v4 -- ?+0 0x7ff7940063a0 con 0xd8b
 210
 2013-12-07 17:32:42.742153 7ffbd96ec700  1 -- 10.174.33.11:0/1270294
 == osd.0 10.174.33.11:6802/225929 50  osd_op_reply(111 0
 [getxattrs,stat] ondisk = 0) v4  142+0+91 (2507304458 0 349659
 2801) 0x7ffbb4000a60 con 0xd8b210
 2013-12-07 17:32:42.742268 7ff79b1c6700  1 -- 10.174.33.11:0/1270294
 -- 10.174.33.11:6802/225929 -- osd_op(client.95391.0:112 0 [read
 0~524288] 6.f18a3536 e192) v4 -- ?+0 0x7ff7940063a0 con 0xd8b2
 10
 2013-12-07 17:32:42.743159 7ffbd96ec700  1 -- 10.174.33.11:0/1270294
 == osd.0 10.174.33.11:6802/225929 51  osd_op_reply(112 0 [read
 0~254] ondisk = 0) v4  100+0+254 (3890176355 0 345783013
 2) 0x7ffbb4000a60 con 0xd8b210
 2013-12-07 17:32:42.743315 7ff79b1c6700 10 get_canon_resource(): dest=
 2013-12-07 17:32:42.743326 7ff79b1c6700 10 auth_hdr:
 PUT



 x-amz-date:Sat, 07 Dec 2013 16:32:42 GMT
 /test1/
 2013-12-07 17:32:42.743459 7ff79b1c6700  2 req 1:0.007023:s3:PUT
 /test1/:create_bucket:reading permissions
 2013-12-07 17:32:42.743475 7ff79b1c6700  2 req 1:0.007038:s3:PUT
 /test1/:create_bucket:verifying op mask
 2013-12-07 17:32:42.743479 7ff79b1c6700  2 req 1:0.007043:s3:PUT
 /test1/:create_bucket:verifying op permissions
 2013-12-07 17:32:42.743542 7ff79b1c6700  1 -- 10.174.33.11:0/1270294
 -- 10.174.33.11:6802/225929 -- osd_op(client.95391.0:113 0.buckets
 [omap-get-vals 0~16] 6.f01626b7 e192) v4 -- ?+0 0x7ff7940066
 00 con 0xd8b210
 2013-12-07 17:32:42.744679 7ffbd96ec700  1 -- 10.174.33.11:0/1270294
 == osd.0 10.174.33.11:6802/225929 52  osd_op_reply(113 0.buckets
 [omap-get-vals 0~16] ondisk = 0) v4  108+0+288 (360462
 5994 0 2810901914) 0x7ffbb4000a60 con 0xd8b210
 2013-12-07 17:32:42.744782 7ff79b1c6700  2 req 1:0.008346:s3:PUT
 /test1/:create_bucket:verifying op params
 2013-12-07 17:32:42.744791 7ff79b1c6700  2 req 1

Re: [ceph-users] recreate bucket error

2013-12-07 Thread Dominik Mostowiec
OK, enabling the cache helps :-)
What was wrong?

--
Dominik

2013/12/7 Dominik Mostowiec dominikmostow...@gmail.com:
 Yes, it is disabled
 grep 'cache' /etc/ceph/ceph.conf  | grep rgw
 rgw_cache_enabled = false ;rgw cache enabled
 rgw_cache_lru_size = 1 ;num of entries in rgw cache

 --
 Regards
 Dominik

 2013/12/7 Yehuda Sadeh yeh...@inktank.com:
 Did you disable the cache by any chance (e.g., 'rgw cache enabled = false')?



 On Sat, Dec 7, 2013 at 8:34 AM, Dominik Mostowiec
 dominikmostow...@gmail.com wrote:
 Hi,
 Log:
 -
 2013-12-07 17:32:42.736396 7ffbe36d3780 10 allocated request req=0xe66f40
 2013-12-07 17:32:42.736438 7ff79b1c6700  1 == starting new request
 req=0xe60860 =
 2013-12-07 17:32:42.736590 7ff79b1c6700  2 req 1:0.000153::PUT
 /test1/::initializing
 2013-12-07 17:32:42.736659 7ff79b1c6700 10 meta HTTP_X_AMZ_DATE
 2013-12-07 17:32:42.736686 7ff79b1c6700 10 x x-amz-date:Sat, 07 Dec
 2013 16:32:42 GMT
 2013-12-07 17:32:42.736758 7ff79b1c6700 10 s-object=NULL s-bucket=test1
 2013-12-07 17:32:42.736788 7ff79b1c6700  2 req 1:0.000351:s3:PUT
 /test1/::getting op
 2013-12-07 17:32:42.736800 7ff79b1c6700  2 req 1:0.000364:s3:PUT
 /test1/:create_bucket:authorizing
 2013-12-07 17:32:42.736907 7ff79b1c6700  1 -- 10.174.33.11:0/1270294
 -- 10.174.33.11:6804/161220 -- osd_op(client.95391.0:107
 F7P3E755K6BN5LR3N85U [getxattrs,stat] 7.62531457 e192) v4 -- ?+0 0x7ff
 794005a90 con 0xd908d0
 2013-12-07 17:32:42.738159 7ffbd96ec700  1 -- 10.174.33.11:0/1270294
 == osd.1 10.174.33.11:6804/161220 23  osd_op_reply(107
 F7P3E755K6BN5LR3N85U [getxattrs,stat] ondisk = 0) v4  161+0+20 (
 1539386342 0 1044315090) 0x7ffba40019c0 con 0xd908d0
 2013-12-07 17:32:42.738341 7ff79b1c6700  1 -- 10.174.33.11:0/1270294
 -- 10.174.33.11:6804/161220 -- osd_op(client.95391.0:108
 F7P3E755K6BN5LR3N85U [getxattrs,stat] 7.62531457 e192) v4 -- ?+0 0x7ff
 794005c20 con 0xd908d0
 2013-12-07 17:32:42.739230 7ffbd96ec700  1 -- 10.174.33.11:0/1270294
 == osd.1 10.174.33.11:6804/161220 24  osd_op_reply(108
 F7P3E755K6BN5LR3N85U [getxattrs,stat] ondisk = 0) v4  161+0+20 (
 1539386342 0 1044315090) 0x7ffba40019c0 con 0xd908d0
 2013-12-07 17:32:42.739349 7ff79b1c6700  1 -- 10.174.33.11:0/1270294
 -- 10.174.33.11:6804/161220 -- osd_op(client.95391.0:109
 F7P3E755K6BN5LR3N85U [read 0~524288] 7.62531457 e192) v4 -- ?+0 0x7ff7
 94005a90 con 0xd908d0
 2013-12-07 17:32:42.740172 7ffbd96ec700  1 -- 10.174.33.11:0/1270294
 == osd.1 10.174.33.11:6804/161220 25  osd_op_reply(109
 F7P3E755K6BN5LR3N85U [read 0~5] ondisk = 0) v4  119+0+5 (9359852
 51 0 150087197) 0x7ffba40019c0 con 0xd908d0
 2013-12-07 17:32:42.740310 7ff79b1c6700  1 -- 10.174.33.11:0/1270294
 -- 10.174.33.11:6802/225929 -- osd_op(client.95391.0:110 0
 [getxattrs,stat] 6.f18a3536 e192) v4 -- ?+0 0x7ff794006570 con 0xd8b
 210
 2013-12-07 17:32:42.741208 7ffbd96ec700  1 -- 10.174.33.11:0/1270294
 == osd.0 10.174.33.11:6802/225929 49  osd_op_reply(110 0
 [getxattrs,stat] ondisk = 0) v4  142+0+91 (2507304458 0 349659
 2801) 0x7ffbb4000a60 con 0xd8b210
 2013-12-07 17:32:42.741326 7ff79b1c6700  1 -- 10.174.33.11:0/1270294
 -- 10.174.33.11:6802/225929 -- osd_op(client.95391.0:111 0
 [getxattrs,stat] 6.f18a3536 e192) v4 -- ?+0 0x7ff7940063a0 con 0xd8b
 210
 2013-12-07 17:32:42.742153 7ffbd96ec700  1 -- 10.174.33.11:0/1270294
 == osd.0 10.174.33.11:6802/225929 50  osd_op_reply(111 0
 [getxattrs,stat] ondisk = 0) v4  142+0+91 (2507304458 0 349659
 2801) 0x7ffbb4000a60 con 0xd8b210
 2013-12-07 17:32:42.742268 7ff79b1c6700  1 -- 10.174.33.11:0/1270294
 -- 10.174.33.11:6802/225929 -- osd_op(client.95391.0:112 0 [read
 0~524288] 6.f18a3536 e192) v4 -- ?+0 0x7ff7940063a0 con 0xd8b2
 10
 2013-12-07 17:32:42.743159 7ffbd96ec700  1 -- 10.174.33.11:0/1270294
 == osd.0 10.174.33.11:6802/225929 51  osd_op_reply(112 0 [read
 0~254] ondisk = 0) v4  100+0+254 (3890176355 0 345783013
 2) 0x7ffbb4000a60 con 0xd8b210
 2013-12-07 17:32:42.743315 7ff79b1c6700 10 get_canon_resource(): dest=
 2013-12-07 17:32:42.743326 7ff79b1c6700 10 auth_hdr:
 PUT



 x-amz-date:Sat, 07 Dec 2013 16:32:42 GMT
 /test1/
 2013-12-07 17:32:42.743459 7ff79b1c6700  2 req 1:0.007023:s3:PUT
 /test1/:create_bucket:reading permissions
 2013-12-07 17:32:42.743475 7ff79b1c6700  2 req 1:0.007038:s3:PUT
 /test1/:create_bucket:verifying op mask
 2013-12-07 17:32:42.743479 7ff79b1c6700  2 req 1:0.007043:s3:PUT
 /test1/:create_bucket:verifying op permissions
 2013-12-07 17:32:42.743542 7ff79b1c6700  1 -- 10.174.33.11:0/1270294
 -- 10.174.33.11:6802/225929 -- osd_op(client.95391.0:113 0.buckets
 [omap-get-vals 0~16] 6.f01626b7 e192) v4 -- ?+0 0x7ff7940066
 00 con 0xd8b210
 2013-12-07 17:32:42.744679 7ffbd96ec700  1 -- 10.174.33.11:0/1270294
 == osd.0 10.174.33.11:6802/225929 52  osd_op_reply(113 0.buckets
 [omap-get-vals 0~16] ondisk = 0) v4  108+0+288 (360462
 5994 0 2810901914) 0x7ffbb4000a60 con 0xd8b210
 2013-12-07 17:32:42.744782

Re: [ceph-users] recreate bucket error

2013-12-07 Thread Dominik Mostowiec
Thanks for your help!!

---
Regards
Dominik

On Dec 7, 2013 6:34 PM, Yehuda Sadeh yeh...@inktank.com wrote:

 Sounds like disabling the cache triggers some bug. I'll open a relevant 
 ticket.

 Thanks,
 Yehuda

 On Sat, Dec 7, 2013 at 9:29 AM, Dominik Mostowiec
 dominikmostow...@gmail.com wrote:
  ok, enabling cache helps :-)
  What was wrong ?
 
  --
  Dominik
 
  2013/12/7 Dominik Mostowiec dominikmostow...@gmail.com:
  Yes, it is disabled
  grep 'cache' /etc/ceph/ceph.conf  | grep rgw
  rgw_cache_enabled = false ;rgw cache enabled
  rgw_cache_lru_size = 1 ;num of entries in rgw cache
 
  --
  Regards
  Dominik
 
  2013/12/7 Yehuda Sadeh yeh...@inktank.com:
  Did you disable the cache by any chance (e.g., 'rgw cache enabled = 
  false')?
 
 
 
  On Sat, Dec 7, 2013 at 8:34 AM, Dominik Mostowiec
  dominikmostow...@gmail.com wrote:
  Hi,
  Log:
  -
  2013-12-07 17:32:42.736396 7ffbe36d3780 10 allocated request req=0xe66f40
  2013-12-07 17:32:42.736438 7ff79b1c6700  1 == starting new request
  req=0xe60860 =
  2013-12-07 17:32:42.736590 7ff79b1c6700  2 req 1:0.000153::PUT
  /test1/::initializing
  2013-12-07 17:32:42.736659 7ff79b1c6700 10 meta HTTP_X_AMZ_DATE
  2013-12-07 17:32:42.736686 7ff79b1c6700 10 x x-amz-date:Sat, 07 Dec
  2013 16:32:42 GMT
  2013-12-07 17:32:42.736758 7ff79b1c6700 10 s-object=NULL 
  s-bucket=test1
  2013-12-07 17:32:42.736788 7ff79b1c6700  2 req 1:0.000351:s3:PUT
  /test1/::getting op
  2013-12-07 17:32:42.736800 7ff79b1c6700  2 req 1:0.000364:s3:PUT
  /test1/:create_bucket:authorizing
  2013-12-07 17:32:42.736907 7ff79b1c6700  1 -- 10.174.33.11:0/1270294
  -- 10.174.33.11:6804/161220 -- osd_op(client.95391.0:107
  F7P3E755K6BN5LR3N85U [getxattrs,stat] 7.62531457 e192) v4 -- ?+0 0x7ff
  794005a90 con 0xd908d0
  2013-12-07 17:32:42.738159 7ffbd96ec700  1 -- 10.174.33.11:0/1270294
  == osd.1 10.174.33.11:6804/161220 23  osd_op_reply(107
  F7P3E755K6BN5LR3N85U [getxattrs,stat] ondisk = 0) v4  161+0+20 (
  1539386342 0 1044315090) 0x7ffba40019c0 con 0xd908d0
  2013-12-07 17:32:42.738341 7ff79b1c6700  1 -- 10.174.33.11:0/1270294
  -- 10.174.33.11:6804/161220 -- osd_op(client.95391.0:108
  F7P3E755K6BN5LR3N85U [getxattrs,stat] 7.62531457 e192) v4 -- ?+0 0x7ff
  794005c20 con 0xd908d0
  2013-12-07 17:32:42.739230 7ffbd96ec700  1 -- 10.174.33.11:0/1270294
  == osd.1 10.174.33.11:6804/161220 24  osd_op_reply(108
  F7P3E755K6BN5LR3N85U [getxattrs,stat] ondisk = 0) v4  161+0+20 (
  1539386342 0 1044315090) 0x7ffba40019c0 con 0xd908d0
  2013-12-07 17:32:42.739349 7ff79b1c6700  1 -- 10.174.33.11:0/1270294
  -- 10.174.33.11:6804/161220 -- osd_op(client.95391.0:109
  F7P3E755K6BN5LR3N85U [read 0~524288] 7.62531457 e192) v4 -- ?+0 0x7ff7
  94005a90 con 0xd908d0
  2013-12-07 17:32:42.740172 7ffbd96ec700  1 -- 10.174.33.11:0/1270294
  == osd.1 10.174.33.11:6804/161220 25  osd_op_reply(109
  F7P3E755K6BN5LR3N85U [read 0~5] ondisk = 0) v4  119+0+5 (9359852
  51 0 150087197) 0x7ffba40019c0 con 0xd908d0
  2013-12-07 17:32:42.740310 7ff79b1c6700  1 -- 10.174.33.11:0/1270294
  -- 10.174.33.11:6802/225929 -- osd_op(client.95391.0:110 0
  [getxattrs,stat] 6.f18a3536 e192) v4 -- ?+0 0x7ff794006570 con 0xd8b
  210
  2013-12-07 17:32:42.741208 7ffbd96ec700  1 -- 10.174.33.11:0/1270294
  == osd.0 10.174.33.11:6802/225929 49  osd_op_reply(110 0
  [getxattrs,stat] ondisk = 0) v4  142+0+91 (2507304458 0 349659
  2801) 0x7ffbb4000a60 con 0xd8b210
  2013-12-07 17:32:42.741326 7ff79b1c6700  1 -- 10.174.33.11:0/1270294
  -- 10.174.33.11:6802/225929 -- osd_op(client.95391.0:111 0
  [getxattrs,stat] 6.f18a3536 e192) v4 -- ?+0 0x7ff7940063a0 con 0xd8b
  210
  2013-12-07 17:32:42.742153 7ffbd96ec700  1 -- 10.174.33.11:0/1270294
  == osd.0 10.174.33.11:6802/225929 50  osd_op_reply(111 0
  [getxattrs,stat] ondisk = 0) v4  142+0+91 (2507304458 0 349659
  2801) 0x7ffbb4000a60 con 0xd8b210
  2013-12-07 17:32:42.742268 7ff79b1c6700  1 -- 10.174.33.11:0/1270294
  -- 10.174.33.11:6802/225929 -- osd_op(client.95391.0:112 0 [read
  0~524288] 6.f18a3536 e192) v4 -- ?+0 0x7ff7940063a0 con 0xd8b2
  10
  2013-12-07 17:32:42.743159 7ffbd96ec700  1 -- 10.174.33.11:0/1270294
  == osd.0 10.174.33.11:6802/225929 51  osd_op_reply(112 0 [read
  0~254] ondisk = 0) v4  100+0+254 (3890176355 0 345783013
  2) 0x7ffbb4000a60 con 0xd8b210
  2013-12-07 17:32:42.743315 7ff79b1c6700 10 get_canon_resource(): dest=
  2013-12-07 17:32:42.743326 7ff79b1c6700 10 auth_hdr:
  PUT
 
 
 
  x-amz-date:Sat, 07 Dec 2013 16:32:42 GMT
  /test1/
  2013-12-07 17:32:42.743459 7ff79b1c6700  2 req 1:0.007023:s3:PUT
  /test1/:create_bucket:reading permissions
  2013-12-07 17:32:42.743475 7ff79b1c6700  2 req 1:0.007038:s3:PUT
  /test1/:create_bucket:verifying op mask
  2013-12-07 17:32:42.743479 7ff79b1c6700  2 req 1:0.007043:s3:PUT
  /test1/:create_bucket:verifying op permissions
  2013-12-07 17:32:42.743542 7ff79b1c6700  1 -- 10.174.33.11:0/1270294
  -- 10.174.33.11

recreate bucket error

2013-12-06 Thread Dominik Mostowiec
Hi,
On a dumpling cluster that was upgraded from bobtail, creating the same bucket twice works:

root@vm-1:/etc/apache2/sites-enabled# s3 -u create testcreate
Bucket successfully created.
root@vm-1:/etc/apache2/sites-enabled# s3 -u create testcreate
Bucket successfully created.

On a freshly installed dumpling cluster, however:
root@s1:/var/log/radosgw# s3 -u create test1
Bucket successfully created.
root@s1:/var/log/radosgw# s3 -u create test1

ERROR: ErrorUnknown

In radosgw logs:

2013-12-06 13:59:56.083109 7f162d7c2700  1 == starting new request
req=0xb7d480 =
2013-12-06 13:59:56.083227 7f162d7c2700  2 req 5:0.000119::PUT
/test1/::initializing
2013-12-06 13:59:56.083261 7f162d7c2700 10 meta HTTP_X_AMZ_DATE
2013-12-06 13:59:56.083274 7f162d7c2700 10 x x-amz-date:Fri, 06 Dec
2013 12:59:56 GMT
2013-12-06 13:59:56.083298 7f162d7c2700 10 s-object=NULL s-bucket=test1
2013-12-06 13:59:56.083307 7f162d7c2700  2 req 5:0.000199:s3:PUT
/test1/::getting op
2013-12-06 13:59:56.083315 7f162d7c2700  2 req 5:0.000207:s3:PUT
/test1/:create_bucket:authorizing
2013-12-06 13:59:56.091724 7f162d7c2700 10 get_canon_resource(): dest=
2013-12-06 13:59:56.091742 7f162d7c2700 10 auth_hdr:
PUT



x-amz-date:Fri, 06 Dec 2013 12:59:56 GMT
/test1/
2013-12-06 13:59:56.091836 7f162d7c2700  2 req 5:0.008728:s3:PUT
/test1/:create_bucket:reading permissions
2013-12-06 13:59:56.091848 7f162d7c2700  2 req 5:0.008740:s3:PUT
/test1/:create_bucket:verifying op mask
2013-12-06 13:59:56.091852 7f162d7c2700  2 req 5:0.008744:s3:PUT
/test1/:create_bucket:verifying op permissions
2013-12-06 13:59:56.093858 7f162d7c2700  2 req 5:0.010750:s3:PUT
/test1/:create_bucket:verifying op params
2013-12-06 13:59:56.093882 7f162d7c2700  2 req 5:0.010773:s3:PUT
/test1/:create_bucket:executing
2013-12-06 13:59:56.104819 7f162d7c2700  0 WARNING: couldn't find acl
header for object, generating default
2013-12-06 13:59:56.132625 7f162d7c2700  0 get_bucket_info returned -125
2013-12-06 13:59:56.132656 7f162d7c2700  0 WARNING: set_req_state_err
err_no=125 resorting to 500
2013-12-06 13:59:56.132693 7f162d7c2700  2 req 5:0.049584:s3:PUT
/test1/:create_bucket:http status=500
2013-12-06 13:59:56.132890 7f162d7c2700  1 == req done
req=0xb7d480 http_status=500 ==
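
For reference, the same double-create can be reproduced from boto instead
of the s3 CLI; a minimal sketch, where the endpoint and credentials are
placeholders for the real gateway:

import boto
import boto.s3.connection

conn = boto.connect_s3(
    aws_access_key_id='ACCESS_KEY',          # placeholder
    aws_secret_access_key='SECRET_KEY',      # placeholder
    host='s1.example.com',                   # placeholder radosgw endpoint
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)

conn.create_bucket('test1')   # first create succeeds
conn.create_bucket('test1')   # second create fails with a 500 on this setup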

-- 
Regards
Dominik


Re: radosgw Segmentation fault on obj copy

2013-12-03 Thread Dominik Mostowiec
Thanks.

--
Regards
Dominik

2013/12/3 Yehuda Sadeh yeh...@inktank.com:
 For bobtail at this point yes. You can try the unofficial version with
 that fix off the gitbuilder. Another option is to upgrade everything
 to dumpling.

 Yehuda

 On Mon, Dec 2, 2013 at 10:24 PM, Dominik Mostowiec
 dominikmostow...@gmail.com wrote:
 Thanks
 Workaround, don't use multipart when obj size == 0 ?

 On Dec 3, 2013 6:43 AM, Yehuda Sadeh yeh...@inktank.com wrote:

 I created earlier an issue (6919) and updated it with the relevant
 issue. This has been fixed in dumpling, although I don't remember
 hitting the scenario that you did. Was probably hitting it as part of
 the development work that was done then.
 In any case I created a branch with the relevant fixes in it (wip-6919).

 Thanks,
 Yehuda

 On Mon, Dec 2, 2013 at 8:39 PM, Dominik Mostowiec
 dominikmostow...@gmail.com wrote:
  for another object.
  http://pastebin.com/VkVAYgwn
 
 
  2013/12/3 Yehuda Sadeh yeh...@inktank.com:
  I see. Do you have backtrace for the crash?
 
  On Mon, Dec 2, 2013 at 6:19 PM, Dominik Mostowiec
  dominikmostow...@gmail.com wrote:
  0.56.7
 
  On Monday, 2 December 2013, Yehuda Sadeh wrote:
 
  I'm having trouble reproducing the issue. What version are you using?
 
  Thanks,
  Yehuda
 
  On Mon, Dec 2, 2013 at 2:16 PM, Yehuda Sadeh yeh...@inktank.com
  wrote:
   Actually, I read that differently. It only says that if there's
   more
   than 1 part, all parts except for the last one need to be at least 5M.
   Which
   means that for uploads that are smaller than 5M there should be
   zero
   or one parts.
  
   On Mon, Dec 2, 2013 at 12:54 PM, Dominik Mostowiec
   dominikmostow...@gmail.com wrote:
   You're right.
  
   S3 api doc:
  
   http://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadComplete.html
   Err:EntityTooSmall
   Your proposed upload is smaller than the minimum allowed object
   size.
   Each part must be at least 5 MB in size, except the last part.
  
   Thanks.
  
   This error should be triggered from radosgw also.
  
   --
   Regards
   Dominik
  
   2013/12/2 Yehuda Sadeh yeh...@inktank.com:
   Looks like it. There should be a guard against it (multipart
   upload
   minimum is 5M).
  
   On Mon, Dec 2, 2013 at 12:32 PM, Dominik Mostowiec
   dominikmostow...@gmail.com wrote:
   Yes, this is probably upload empty file.
   This is the problem?
  
   --
   Regards
   Dominik
  
  
   2013/12/2 Yehuda Sadeh yeh...@inktank.com:
   By any chance are you uploading empty objects through the
   multipart
   upload api?
  
   On Mon, Dec 2, 2013 at 12:08 PM, Dominik Mostowiec
   dominikmostow...@gmail.com wrote:
   Hi,
   Another file with the same problems:
  
   2013-12-01 11:37:15.556687 7f7891fd3700  1 == starting new
   request
   req=0x25406d0 =
   2013-12-01 11:37:15.556739 7f7891fd3700  2 req
   1314:0.52initializing
   2013-12-01 11:37:15.556789 7f7891fd3700 10
   s-object=files/192.txt
   s-bucket=testbucket
   2013-12-01 11:37:15.556799 7f7891fd3700  2 req
   1314:0.000112:s3:POST
   /testbucket/files/192.txt::getting op
   2013-12-01 11:37:15.556804 7f7891fd3700  2 req
   1314:0.000118:s3:POST
   /testbucket/files/192.txt:complete_multipart:authorizing
   2013-12-01 11:37:15.560013 7f7891fd3700 10
   get_canon_resource():
  
  
   dest=/testbucket/files/192.txt?uploadId=i92xi2olzDtFAeLXlfU2PFP9CDU87BC
   2013-12-01 11:37:15.560027 7f7891fd3700 10 auth_hdr:
   POST
  
   application/xml
   Sun, 01 Dec 2013 10:37:10 GMT
  
   /testbucket/files/192.txt?uploadId=i92xi2olzDtFAeLXlfU2PFP9CDU87BC
   2013-12-01 11:37:15.560085 7f7891fd3700  2 req
   1314:0.003399:s3:POST
   /testbucket/files/192.txt:complete_multipart:reading
   permissions
   2013-12-01 11:37:15.562356 7f7891fd3700  2 req
   1314:0.005670:s3:POST
   /testbucket/files/192.txt:complete_multipart:verifying op
   permissions
   2013-12-01 11:37:15.562373 7f7891fd3700  5 Searching
   permissions
   for
   uid=0 mask=2
   2013-12-01 11:37:15.562377 7f7891fd3700  5 Found permission:
   15
   2013-12-01 11:37:15.562378 7f7891fd3700 10  uid=0 requested
   perm
   (type)=2, policy perm=2, user_perm_mask=2, acl perm=2
   2013-12-01 11:37:15.562381 7f7891fd3700  2 req
   1314:0.005695:s3:POST
   /testbucket/files/192.txt:complete_multipart:verifying op
   params
   2013-12-01 11:37:15.562384 7f7891fd3700  2 req
   1314:0.005698:s3:POST
   /testbucket/files/192.txt:complete_multipart:executing
   2013-12-01 11:37:15.565461 7f7891fd3700 10 calculated etag:
   d41d8cd98f00b204e9800998ecf8427e-0
   2013-12-01 11:37:15.566718 7f7891fd3700 10 can't clone object
   testbucket:files/192.txt to shadow object, tag/shadow_obj
   haven't
   been
   set
   2013-12-01 11:37:15.566777 7f7891fd3700  0 setting object
   tag=_leyAzxCw7YxpKv8P3v3QGwcsw__9VmP
   2013-12-01 11:37:15.678973 7f7891fd3700  2 req
   1314:0.122286:s3:POST
   /testbucket/files/192.txt:complete_multipart:http status=200
   2013-12-01 11:37:15.679192 7f7891fd3700  1 == req done

radosgw Segmentation fault on obj copy

2013-12-02 Thread Dominik Mostowiec
Hi,
I have a strange problem.
An object copy (0 size) kills radosgw.

Head for this file:
Content-Type: application/octet-stream
Server: Apache/2.2.22 (Ubuntu)
ETag: d41d8cd98f00b204e9800998ecf8427e-0
Last-Modified: 2013-12-01T10:37:15Z

rgw log.
2013-12-02 08:18:59.196651 7f5308ff1700  1 == starting new request
req=0x2be6fa0 =
2013-12-02 08:18:59.196709 7f5308ff1700  2 req 237:0.58initializing
2013-12-02 08:18:59.196752 7f5308ff1700 10 meta HTTP_X_AMZ_ACL=public-read
2013-12-02 08:18:59.196760 7f5308ff1700 10 meta
HTTP_X_AMZ_COPY_SOURCE=/testbucket/testfile.xml
2013-12-02 08:18:59.196766 7f5308ff1700 10 meta
HTTP_X_AMZ_METADATA_DIRECTIVE=COPY
2013-12-02 08:18:59.196771 7f5308ff1700 10 x x-amz-acl:public-read
2013-12-02 08:18:59.196772 7f5308ff1700 10 x
x-amz-copy-source:/testbucket/testfile.xml
2013-12-02 08:18:59.196773 7f5308ff1700 10 x x-amz-metadata-directive:COPY
2013-12-02 08:18:59.196786 7f5308ff1700 10
s-object=/testbucket/new_testfile.ini s-bucket=testbucket
2013-12-02 08:18:59.196792 7f5308ff1700  2 req 237:0.000141:s3:PUT
/testbucket/new_testfile.ini::getting op
2013-12-02 08:18:59.196797 7f5308ff1700  2 req 237:0.000146:s3:PUT
/testbucket/new_testfile.ini:copy_obj:authorizing
2013-12-02 08:18:59.200648 7f5308ff1700 10 get_canon_resource():
dest=/testbucket/new_testfile.ini
2013-12-02 08:18:59.200661 7f5308ff1700 10 auth_hdr:
PUT
1B2M2Y8AsgTpgAmY7PhCfg==
application/octet-stream
Mon, 02 Dec 2013 07:18:55 GMT
x-amz-acl:public-read
x-amz-copy-source:/testbucket/testfile.xml
x-amz-metadata-directive:COPY
/testbucket/new_testfile.ini
2013-12-02 08:18:59.200717 7f5308ff1700  2 req 237:0.004066:s3:PUT
/testbucket/new_testfile.ini:copy_obj:reading permissions
2013-12-02 08:18:59.203330 7f5308ff1700  2 req 237:0.006679:s3:PUT
/testbucket/new_testfile.ini:copy_obj:verifying op permissions
2013-12-02 08:18:59.207627 7f5308ff1700 10 manifest: total_size = 0
2013-12-02 08:18:59.207649 7f5308ff1700  5 Searching permissions for
uid=0 mask=1
2013-12-02 08:18:59.207652 7f5308ff1700  5 Found permission: 15
2013-12-02 08:18:59.207654 7f5308ff1700 10  uid=0 requested perm
(type)=1, policy perm=1, user_perm_mask=15, acl perm=1
2013-12-02 08:18:59.207669 7f5308ff1700  5 Searching permissions for
uid=0 mask=2
2013-12-02 08:18:59.207670 7f5308ff1700  5 Found permission: 15
2013-12-02 08:18:59.207671 7f5308ff1700 10  uid=0 requested perm
(type)=2, policy perm=2, user_perm_mask=15, acl perm=2
2013-12-02 08:18:59.207681 7f5308ff1700  2 req 237:0.011030:s3:PUT
/testbucket/new_testfile.ini:copy_obj:verifying op params
2013-12-02 08:18:59.207686 7f5308ff1700  2 req 237:0.011035:s3:PUT
/testbucket/new_testfile.ini:copy_obj:executing
2013-12-02 08:18:59.207699 7f5308ff1700 10 x x-amz-acl:public-read
2013-12-02 08:18:59.207704 7f5308ff1700 10 x
x-amz-copy-source:/testbucket/testfile.xml
2013-12-02 08:18:59.207709 7f5308ff1700 10 x x-amz-metadata-directive:COPY
2013-12-02 08:18:59.207759 7f5308ff1700  5 Copy object
testbucket(@.rgw.buckets[406250.1]):testfile.ini =
testbucket(@.rgw.buckets[406250.1]):new_testfile.ini
2013-12-02 08:18:59.208903 7f5308ff1700 -1 *** Caught signal
(Segmentation fault) **
 in thread 7f5308ff1700


-- 
Regards
Dominik


Re: radosgw Segmentation fault on obj copy

2013-12-02 Thread Dominik Mostowiec
Hi,
I found that the issue is related to the ETag ending in -0.
Is this a known bug?
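
Until this is sorted out, a client-side guard seems possible: HEAD the
source object first and avoid the server-side copy for anything whose
ETag ends in -0 (re-upload it instead). A rough boto sketch, with the
connection details as placeholders:

import boto
import boto.s3.connection

conn = boto.connect_s3(
    aws_access_key_id='ACCESS_KEY',      # placeholder
    aws_secret_access_key='SECRET_KEY',  # placeholder
    host='radosgw.example.com',          # placeholder
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)

bucket = conn.get_bucket('testbucket')
src = bucket.get_key('testfile.xml')         # HEAD: fetches metadata only
if src.etag.strip('"').endswith('-0'):
    # Finalized as a zero-part multipart upload; copying such an object
    # crashes this radosgw version, so re-upload the (empty) body instead.
    bucket.new_key('new_testfile.ini').set_contents_from_string(
        src.get_contents_as_string())
else:
    bucket.copy_key('new_testfile.ini', 'testbucket', 'testfile.xml')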

--
Regards
Dominik

2013/12/2 Dominik Mostowiec dominikmostow...@gmail.com:
 Hi,
 I have strange problem.
 Obj copy (0 size) killing radosgw.

 Head for this file:
 Content-Type: application/octet-stream
 Server: Apache/2.2.22 (Ubuntu)
 ETag: d41d8cd98f00b204e9800998ecf8427e-0
 Last-Modified: 2013-12-01T10:37:15Z

 rgw log.
 2013-12-02 08:18:59.196651 7f5308ff1700  1 == starting new request
 req=0x2be6fa0 =
 2013-12-02 08:18:59.196709 7f5308ff1700  2 req 237:0.58initializing
 2013-12-02 08:18:59.196752 7f5308ff1700 10 meta HTTP_X_AMZ_ACL=public-read
 2013-12-02 08:18:59.196760 7f5308ff1700 10 meta
 HTTP_X_AMZ_COPY_SOURCE=/testbucket/testfile.xml
 2013-12-02 08:18:59.196766 7f5308ff1700 10 meta
 HTTP_X_AMZ_METADATA_DIRECTIVE=COPY
 2013-12-02 08:18:59.196771 7f5308ff1700 10 x x-amz-acl:public-read
 2013-12-02 08:18:59.196772 7f5308ff1700 10 x
 x-amz-copy-source:/testbucket/testfile.xml
 2013-12-02 08:18:59.196773 7f5308ff1700 10 x x-amz-metadata-directive:COPY
 2013-12-02 08:18:59.196786 7f5308ff1700 10
 s-object=/testbucket/new_testfile.ini s-bucket=testbucket
 2013-12-02 08:18:59.196792 7f5308ff1700  2 req 237:0.000141:s3:PUT
 /testbucket/new_testfile.ini::getting op
 2013-12-02 08:18:59.196797 7f5308ff1700  2 req 237:0.000146:s3:PUT
 /testbucket/new_testfile.ini:copy_obj:authorizing
 2013-12-02 08:18:59.200648 7f5308ff1700 10 get_canon_resource():
 dest=/testbucket/new_testfile.ini
 2013-12-02 08:18:59.200661 7f5308ff1700 10 auth_hdr:
 PUT
 1B2M2Y8AsgTpgAmY7PhCfg==
 application/octet-stream
 Mon, 02 Dec 2013 07:18:55 GMT
 x-amz-acl:public-read
 x-amz-copy-source:/testbucket/testfile.xml
 x-amz-metadata-directive:COPY
 /testbucket/new_testfile.ini
 2013-12-02 08:18:59.200717 7f5308ff1700  2 req 237:0.004066:s3:PUT
 /testbucket/new_testfile.ini:copy_obj:reading permissions
 2013-12-02 08:18:59.203330 7f5308ff1700  2 req 237:0.006679:s3:PUT
 /testbucket/new_testfile.ini:copy_obj:verifying op permissions
 2013-12-02 08:18:59.207627 7f5308ff1700 10 manifest: total_size = 0
 2013-12-02 08:18:59.207649 7f5308ff1700  5 Searching permissions for
 uid=0 mask=1
 2013-12-02 08:18:59.207652 7f5308ff1700  5 Found permission: 15
 2013-12-02 08:18:59.207654 7f5308ff1700 10  uid=0 requested perm
 (type)=1, policy perm=1, user_perm_mask=15, acl perm=1
 2013-12-02 08:18:59.207669 7f5308ff1700  5 Searching permissions for
 uid=0 mask=2
 2013-12-02 08:18:59.207670 7f5308ff1700  5 Found permission: 15
 2013-12-02 08:18:59.207671 7f5308ff1700 10  uid=0 requested perm
 (type)=2, policy perm=2, user_perm_mask=15, acl perm=2
 2013-12-02 08:18:59.207681 7f5308ff1700  2 req 237:0.011030:s3:PUT
 /testbucket/new_testfile.ini:copy_obj:verifying op params
 2013-12-02 08:18:59.207686 7f5308ff1700  2 req 237:0.011035:s3:PUT
 /testbucket/new_testfile.ini:copy_obj:executing
 2013-12-02 08:18:59.207699 7f5308ff1700 10 x x-amz-acl:public-read
 2013-12-02 08:18:59.207704 7f5308ff1700 10 x
 x-amz-copy-source:/testbucket/testfile.xml
 2013-12-02 08:18:59.207709 7f5308ff1700 10 x x-amz-metadata-directive:COPY
 2013-12-02 08:18:59.207759 7f5308ff1700  5 Copy object
 testbucket(@.rgw.buckets[406250.1]):testfile.ini =
 testbucket(@.rgw.buckets[406250.1]):new_testfile.ini
 2013-12-02 08:18:59.208903 7f5308ff1700 -1 *** Caught signal
 (Segmentation fault) **
  in thread 7f5308ff1700


 --
 Regards
 Dominik



-- 
Pozdrawiam
Dominik


Re: [ceph-users] radosgw Segmentation fault on obj copy

2013-12-02 Thread Dominik Mostowiec
Hi,
Another file with the same problems:

2013-12-01 11:37:15.556687 7f7891fd3700  1 == starting new request
req=0x25406d0 =
2013-12-01 11:37:15.556739 7f7891fd3700  2 req 1314:0.52initializing
2013-12-01 11:37:15.556789 7f7891fd3700 10 s-object=files/192.txt
s-bucket=testbucket
2013-12-01 11:37:15.556799 7f7891fd3700  2 req 1314:0.000112:s3:POST
/testbucket/files/192.txt::getting op
2013-12-01 11:37:15.556804 7f7891fd3700  2 req 1314:0.000118:s3:POST
/testbucket/files/192.txt:complete_multipart:authorizing
2013-12-01 11:37:15.560013 7f7891fd3700 10 get_canon_resource():
dest=/testbucket/files/192.txt?uploadId=i92xi2olzDtFAeLXlfU2PFP9CDU87BC
2013-12-01 11:37:15.560027 7f7891fd3700 10 auth_hdr:
POST

application/xml
Sun, 01 Dec 2013 10:37:10 GMT
/testbucket/files/192.txt?uploadId=i92xi2olzDtFAeLXlfU2PFP9CDU87BC
2013-12-01 11:37:15.560085 7f7891fd3700  2 req 1314:0.003399:s3:POST
/testbucket/files/192.txt:complete_multipart:reading permissions
2013-12-01 11:37:15.562356 7f7891fd3700  2 req 1314:0.005670:s3:POST
/testbucket/files/192.txt:complete_multipart:verifying op permissions
2013-12-01 11:37:15.562373 7f7891fd3700  5 Searching permissions for
uid=0 mask=2
2013-12-01 11:37:15.562377 7f7891fd3700  5 Found permission: 15
2013-12-01 11:37:15.562378 7f7891fd3700 10  uid=0 requested perm
(type)=2, policy perm=2, user_perm_mask=2, acl perm=2
2013-12-01 11:37:15.562381 7f7891fd3700  2 req 1314:0.005695:s3:POST
/testbucket/files/192.txt:complete_multipart:verifying op params
2013-12-01 11:37:15.562384 7f7891fd3700  2 req 1314:0.005698:s3:POST
/testbucket/files/192.txt:complete_multipart:executing
2013-12-01 11:37:15.565461 7f7891fd3700 10 calculated etag:
d41d8cd98f00b204e9800998ecf8427e-0
2013-12-01 11:37:15.566718 7f7891fd3700 10 can't clone object
testbucket:files/192.txt to shadow object, tag/shadow_obj haven't been
set
2013-12-01 11:37:15.566777 7f7891fd3700  0 setting object
tag=_leyAzxCw7YxpKv8P3v3QGwcsw__9VmP
2013-12-01 11:37:15.678973 7f7891fd3700  2 req 1314:0.122286:s3:POST
/testbucket/files/192.txt:complete_multipart:http status=200
2013-12-01 11:37:15.679192 7f7891fd3700  1 == req done
req=0x25406d0 http_status=200 ==

Yes, I can read the original object.

--
Regards
Dominik

2013/12/2 Yehuda Sadeh yeh...@inktank.com:
 That's an unknown bug. I have a guess as to how the original object was
 created. Can you read the original object, but only copy fails?

 On Dec 2, 2013 4:53 AM, Dominik Mostowiec dominikmostow...@gmail.com
 wrote:

 Hi,
 I found that issue is related with ETag: -0 (ends -0)
 This is known bug ?

 --
 Regards
 Dominik

 2013/12/2 Dominik Mostowiec dominikmostow...@gmail.com:
  Hi,
  I have strange problem.
  Obj copy (0 size) killing radosgw.
 
  Head for this file:
  Content-Type: application/octet-stream
  Server: Apache/2.2.22 (Ubuntu)
  ETag: d41d8cd98f00b204e9800998ecf8427e-0
  Last-Modified: 2013-12-01T10:37:15Z
 
  rgw log.
  2013-12-02 08:18:59.196651 7f5308ff1700  1 == starting new request
  req=0x2be6fa0 =
  2013-12-02 08:18:59.196709 7f5308ff1700  2 req
  237:0.58initializing
  2013-12-02 08:18:59.196752 7f5308ff1700 10 meta
  HTTP_X_AMZ_ACL=public-read
  2013-12-02 08:18:59.196760 7f5308ff1700 10 meta
  HTTP_X_AMZ_COPY_SOURCE=/testbucket/testfile.xml
  2013-12-02 08:18:59.196766 7f5308ff1700 10 meta
  HTTP_X_AMZ_METADATA_DIRECTIVE=COPY
  2013-12-02 08:18:59.196771 7f5308ff1700 10 x x-amz-acl:public-read
  2013-12-02 08:18:59.196772 7f5308ff1700 10 x
  x-amz-copy-source:/testbucket/testfile.xml
  2013-12-02 08:18:59.196773 7f5308ff1700 10 x
  x-amz-metadata-directive:COPY
  2013-12-02 08:18:59.196786 7f5308ff1700 10
  s-object=/testbucket/new_testfile.ini s-bucket=testbucket
  2013-12-02 08:18:59.196792 7f5308ff1700  2 req 237:0.000141:s3:PUT
  /testbucket/new_testfile.ini::getting op
  2013-12-02 08:18:59.196797 7f5308ff1700  2 req 237:0.000146:s3:PUT
  /testbucket/new_testfile.ini:copy_obj:authorizing
  2013-12-02 08:18:59.200648 7f5308ff1700 10 get_canon_resource():
  dest=/testbucket/new_testfile.ini
  2013-12-02 08:18:59.200661 7f5308ff1700 10 auth_hdr:
  PUT
  1B2M2Y8AsgTpgAmY7PhCfg==
  application/octet-stream
  Mon, 02 Dec 2013 07:18:55 GMT
  x-amz-acl:public-read
  x-amz-copy-source:/testbucket/testfile.xml
  x-amz-metadata-directive:COPY
  /testbucket/new_testfile.ini
  2013-12-02 08:18:59.200717 7f5308ff1700  2 req 237:0.004066:s3:PUT
  /testbucket/new_testfile.ini:copy_obj:reading permissions
  2013-12-02 08:18:59.203330 7f5308ff1700  2 req 237:0.006679:s3:PUT
  /testbucket/new_testfile.ini:copy_obj:verifying op permissions
  2013-12-02 08:18:59.207627 7f5308ff1700 10 manifest: total_size = 0
  2013-12-02 08:18:59.207649 7f5308ff1700  5 Searching permissions for
  uid=0 mask=1
  2013-12-02 08:18:59.207652 7f5308ff1700  5 Found permission: 15
  2013-12-02 08:18:59.207654 7f5308ff1700 10  uid=0 requested perm
  (type)=1, policy perm=1, user_perm_mask=15, acl perm=1
  2013-12-02 08:18:59.207669 7f5308ff1700  5 Searching

Re: [ceph-users] radosgw Segmentation fault on obj copy

2013-12-02 Thread Dominik Mostowiec
Yes, this is probably an upload of an empty file.
Is this the problem?

--
Regards
Dominik


2013/12/2 Yehuda Sadeh yeh...@inktank.com:
 By any chance are you uploading empty objects through the multipart upload 
 api?

 On Mon, Dec 2, 2013 at 12:08 PM, Dominik Mostowiec
 dominikmostow...@gmail.com wrote:
 Hi,
 Another file with the same problems:

 2013-12-01 11:37:15.556687 7f7891fd3700  1 == starting new request
 req=0x25406d0 =
 2013-12-01 11:37:15.556739 7f7891fd3700  2 req 1314:0.52initializing
 2013-12-01 11:37:15.556789 7f7891fd3700 10 s-object=files/192.txt
 s-bucket=testbucket
 2013-12-01 11:37:15.556799 7f7891fd3700  2 req 1314:0.000112:s3:POST
 /testbucket/files/192.txt::getting op
 2013-12-01 11:37:15.556804 7f7891fd3700  2 req 1314:0.000118:s3:POST
 /testbucket/files/192.txt:complete_multipart:authorizing
 2013-12-01 11:37:15.560013 7f7891fd3700 10 get_canon_resource():
 dest=/testbucket/files/192.txt?uploadId=i92xi2olzDtFAeLXlfU2PFP9CDU87BC
 2013-12-01 11:37:15.560027 7f7891fd3700 10 auth_hdr:
 POST

 application/xml
 Sun, 01 Dec 2013 10:37:10 GMT
 /testbucket/files/192.txt?uploadId=i92xi2olzDtFAeLXlfU2PFP9CDU87BC
 2013-12-01 11:37:15.560085 7f7891fd3700  2 req 1314:0.003399:s3:POST
 /testbucket/files/192.txt:complete_multipart:reading permissions
 2013-12-01 11:37:15.562356 7f7891fd3700  2 req 1314:0.005670:s3:POST
 /testbucket/files/192.txt:complete_multipart:verifying op permissions
 2013-12-01 11:37:15.562373 7f7891fd3700  5 Searching permissions for
 uid=0 mask=2
 2013-12-01 11:37:15.562377 7f7891fd3700  5 Found permission: 15
 2013-12-01 11:37:15.562378 7f7891fd3700 10  uid=0 requested perm
 (type)=2, policy perm=2, user_perm_mask=2, acl perm=2
 2013-12-01 11:37:15.562381 7f7891fd3700  2 req 1314:0.005695:s3:POST
 /testbucket/files/192.txt:complete_multipart:verifying op params
 2013-12-01 11:37:15.562384 7f7891fd3700  2 req 1314:0.005698:s3:POST
 /testbucket/files/192.txt:complete_multipart:executing
 2013-12-01 11:37:15.565461 7f7891fd3700 10 calculated etag:
 d41d8cd98f00b204e9800998ecf8427e-0
 2013-12-01 11:37:15.566718 7f7891fd3700 10 can't clone object
 testbucket:files/192.txt to shadow object, tag/shadow_obj haven't been
 set
 2013-12-01 11:37:15.566777 7f7891fd3700  0 setting object
 tag=_leyAzxCw7YxpKv8P3v3QGwcsw__9VmP
 2013-12-01 11:37:15.678973 7f7891fd3700  2 req 1314:0.122286:s3:POST
 /testbucket/files/192.txt:complete_multipart:http status=200
 2013-12-01 11:37:15.679192 7f7891fd3700  1 == req done
 req=0x25406d0 http_status=200 ==

 Yes, I can read oryginal object.

 --
 Regards
 Dominik

 2013/12/2 Yehuda Sadeh yeh...@inktank.com:
 That's unknown bug. I have a guess as to how the original object was
 created. Can you read the original object, but only copy fails?

 On Dec 2, 2013 4:53 AM, Dominik Mostowiec dominikmostow...@gmail.com
 wrote:

 Hi,
 I found that issue is related with ETag: -0 (ends -0)
 This is known bug ?

 --
 Regards
 Dominik

 2013/12/2 Dominik Mostowiec dominikmostow...@gmail.com:
  Hi,
  I have strange problem.
  Obj copy (0 size) killing radosgw.
 
  Head for this file:
  Content-Type: application/octet-stream
  Server: Apache/2.2.22 (Ubuntu)
  ETag: d41d8cd98f00b204e9800998ecf8427e-0
  Last-Modified: 2013-12-01T10:37:15Z
 
  rgw log.
  2013-12-02 08:18:59.196651 7f5308ff1700  1 == starting new request
  req=0x2be6fa0 =
  2013-12-02 08:18:59.196709 7f5308ff1700  2 req
  237:0.58initializing
  2013-12-02 08:18:59.196752 7f5308ff1700 10 meta
  HTTP_X_AMZ_ACL=public-read
  2013-12-02 08:18:59.196760 7f5308ff1700 10 meta
  HTTP_X_AMZ_COPY_SOURCE=/testbucket/testfile.xml
  2013-12-02 08:18:59.196766 7f5308ff1700 10 meta
  HTTP_X_AMZ_METADATA_DIRECTIVE=COPY
  2013-12-02 08:18:59.196771 7f5308ff1700 10 x x-amz-acl:public-read
  2013-12-02 08:18:59.196772 7f5308ff1700 10 x
  x-amz-copy-source:/testbucket/testfile.xml
  2013-12-02 08:18:59.196773 7f5308ff1700 10 x
  x-amz-metadata-directive:COPY
  2013-12-02 08:18:59.196786 7f5308ff1700 10
  s-object=/testbucket/new_testfile.ini s-bucket=testbucket
  2013-12-02 08:18:59.196792 7f5308ff1700  2 req 237:0.000141:s3:PUT
  /testbucket/new_testfile.ini::getting op
  2013-12-02 08:18:59.196797 7f5308ff1700  2 req 237:0.000146:s3:PUT
  /testbucket/new_testfile.ini:copy_obj:authorizing
  2013-12-02 08:18:59.200648 7f5308ff1700 10 get_canon_resource():
  dest=/testbucket/new_testfile.ini
  2013-12-02 08:18:59.200661 7f5308ff1700 10 auth_hdr:
  PUT
  1B2M2Y8AsgTpgAmY7PhCfg==
  application/octet-stream
  Mon, 02 Dec 2013 07:18:55 GMT
  x-amz-acl:public-read
  x-amz-copy-source:/testbucket/testfile.xml
  x-amz-metadata-directive:COPY
  /testbucket/new_testfile.ini
  2013-12-02 08:18:59.200717 7f5308ff1700  2 req 237:0.004066:s3:PUT
  /testbucket/new_testfile.ini:copy_obj:reading permissions
  2013-12-02 08:18:59.203330 7f5308ff1700  2 req 237:0.006679:s3:PUT
  /testbucket/new_testfile.ini:copy_obj:verifying op permissions
  2013-12-02 08:18:59.207627 7f5308ff1700 10 manifest

Re: [ceph-users] radosgw Segmentation fault on obj copy

2013-12-02 Thread Dominik Mostowiec
You're right.

S3 api doc: http://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadComplete.html
Err:EntityTooSmall
Your proposed upload is smaller than the minimum allowed object size.
Each part must be at least 5 MB in size, except the last part.

Thanks.

radosgw should also return this error.
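
On our side the simplest guard is to only go through the multipart API
when the payload is big enough to need it. A rough boto sketch (the 5 MB
threshold comes from the S3 limit above; 'bucket' is a boto Bucket
obtained elsewhere, and 'data' is a byte string):

import math
from io import BytesIO

MIN_PART_SIZE = 5 * 1024 * 1024  # every part except the last must be >= 5 MB

def upload(bucket, key_name, data):
    # Small (or empty) payloads go through a plain PUT; multipart is only
    # legal when there is at least one full-size part.
    if len(data) < MIN_PART_SIZE:
        bucket.new_key(key_name).set_contents_from_string(data)
        return
    mp = bucket.initiate_multipart_upload(key_name)
    try:
        parts = int(math.ceil(len(data) / float(MIN_PART_SIZE)))
        for i in range(parts):
            chunk = data[i * MIN_PART_SIZE:(i + 1) * MIN_PART_SIZE]
            mp.upload_part_from_file(BytesIO(chunk), part_num=i + 1)
        mp.complete_upload()
    except Exception:
        mp.cancel_upload()
        raise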

--
Regards
Dominik

2013/12/2 Yehuda Sadeh yeh...@inktank.com:
 Looks like it. There should be a guard against it (multipart upload
 minimum is 5M).

 On Mon, Dec 2, 2013 at 12:32 PM, Dominik Mostowiec
 dominikmostow...@gmail.com wrote:
 Yes, this is probably upload empty file.
 This is the problem?

 --
 Regards
 Dominik


 2013/12/2 Yehuda Sadeh yeh...@inktank.com:
 By any chance are you uploading empty objects through the multipart upload 
 api?

 On Mon, Dec 2, 2013 at 12:08 PM, Dominik Mostowiec
 dominikmostow...@gmail.com wrote:
 Hi,
 Another file with the same problems:

 2013-12-01 11:37:15.556687 7f7891fd3700  1 == starting new request
 req=0x25406d0 =
 2013-12-01 11:37:15.556739 7f7891fd3700  2 req 
 1314:0.52initializing
 2013-12-01 11:37:15.556789 7f7891fd3700 10 s-object=files/192.txt
 s-bucket=testbucket
 2013-12-01 11:37:15.556799 7f7891fd3700  2 req 1314:0.000112:s3:POST
 /testbucket/files/192.txt::getting op
 2013-12-01 11:37:15.556804 7f7891fd3700  2 req 1314:0.000118:s3:POST
 /testbucket/files/192.txt:complete_multipart:authorizing
 2013-12-01 11:37:15.560013 7f7891fd3700 10 get_canon_resource():
 dest=/testbucket/files/192.txt?uploadId=i92xi2olzDtFAeLXlfU2PFP9CDU87BC
 2013-12-01 11:37:15.560027 7f7891fd3700 10 auth_hdr:
 POST

 application/xml
 Sun, 01 Dec 2013 10:37:10 GMT
 /testbucket/files/192.txt?uploadId=i92xi2olzDtFAeLXlfU2PFP9CDU87BC
 2013-12-01 11:37:15.560085 7f7891fd3700  2 req 1314:0.003399:s3:POST
 /testbucket/files/192.txt:complete_multipart:reading permissions
 2013-12-01 11:37:15.562356 7f7891fd3700  2 req 1314:0.005670:s3:POST
 /testbucket/files/192.txt:complete_multipart:verifying op permissions
 2013-12-01 11:37:15.562373 7f7891fd3700  5 Searching permissions for
 uid=0 mask=2
 2013-12-01 11:37:15.562377 7f7891fd3700  5 Found permission: 15
 2013-12-01 11:37:15.562378 7f7891fd3700 10  uid=0 requested perm
 (type)=2, policy perm=2, user_perm_mask=2, acl perm=2
 2013-12-01 11:37:15.562381 7f7891fd3700  2 req 1314:0.005695:s3:POST
 /testbucket/files/192.txt:complete_multipart:verifying op params
 2013-12-01 11:37:15.562384 7f7891fd3700  2 req 1314:0.005698:s3:POST
 /testbucket/files/192.txt:complete_multipart:executing
 2013-12-01 11:37:15.565461 7f7891fd3700 10 calculated etag:
 d41d8cd98f00b204e9800998ecf8427e-0
 2013-12-01 11:37:15.566718 7f7891fd3700 10 can't clone object
 testbucket:files/192.txt to shadow object, tag/shadow_obj haven't been
 set
 2013-12-01 11:37:15.566777 7f7891fd3700  0 setting object
 tag=_leyAzxCw7YxpKv8P3v3QGwcsw__9VmP
 2013-12-01 11:37:15.678973 7f7891fd3700  2 req 1314:0.122286:s3:POST
 /testbucket/files/192.txt:complete_multipart:http status=200
 2013-12-01 11:37:15.679192 7f7891fd3700  1 == req done
 req=0x25406d0 http_status=200 ==

 Yes, I can read oryginal object.

 --
 Regards
 Dominik

 2013/12/2 Yehuda Sadeh yeh...@inktank.com:
 That's unknown bug. I have a guess as to how the original object was
 created. Can you read the original object, but only copy fails?

 On Dec 2, 2013 4:53 AM, Dominik Mostowiec dominikmostow...@gmail.com
 wrote:

 Hi,
 I found that issue is related with ETag: -0 (ends -0)
 This is known bug ?

 --
 Regards
 Dominik

 2013/12/2 Dominik Mostowiec dominikmostow...@gmail.com:
  Hi,
  I have strange problem.
  Obj copy (0 size) killing radosgw.
 
  Head for this file:
  Content-Type: application/octet-stream
  Server: Apache/2.2.22 (Ubuntu)
  ETag: d41d8cd98f00b204e9800998ecf8427e-0
  Last-Modified: 2013-12-01T10:37:15Z
 
  rgw log.
  2013-12-02 08:18:59.196651 7f5308ff1700  1 == starting new request
  req=0x2be6fa0 =
  2013-12-02 08:18:59.196709 7f5308ff1700  2 req
  237:0.58initializing
  2013-12-02 08:18:59.196752 7f5308ff1700 10 meta
  HTTP_X_AMZ_ACL=public-read
  2013-12-02 08:18:59.196760 7f5308ff1700 10 meta
  HTTP_X_AMZ_COPY_SOURCE=/testbucket/testfile.xml
  2013-12-02 08:18:59.196766 7f5308ff1700 10 meta
  HTTP_X_AMZ_METADATA_DIRECTIVE=COPY
  2013-12-02 08:18:59.196771 7f5308ff1700 10 x x-amz-acl:public-read
  2013-12-02 08:18:59.196772 7f5308ff1700 10 x
  x-amz-copy-source:/testbucket/testfile.xml
  2013-12-02 08:18:59.196773 7f5308ff1700 10 x
  x-amz-metadata-directive:COPY
  2013-12-02 08:18:59.196786 7f5308ff1700 10
  s-object=/testbucket/new_testfile.ini s-bucket=testbucket
  2013-12-02 08:18:59.196792 7f5308ff1700  2 req 237:0.000141:s3:PUT
  /testbucket/new_testfile.ini::getting op
  2013-12-02 08:18:59.196797 7f5308ff1700  2 req 237:0.000146:s3:PUT
  /testbucket/new_testfile.ini:copy_obj:authorizing
  2013-12-02 08:18:59.200648 7f5308ff1700 10 get_canon_resource():
  dest=/testbucket/new_testfile.ini
  2013-12-02 08:18:59.200661 7f5308ff1700 10

Re: radosgw Segmentation fault on obj copy

2013-12-02 Thread Dominik Mostowiec
Yes, but for another object:
http://pastebin.com/VkVAYgwn


2013/12/3 Yehuda Sadeh yeh...@inktank.com:
 I see. Do you have backtrace for the crash?

 On Mon, Dec 2, 2013 at 6:19 PM, Dominik Mostowiec
 dominikmostow...@gmail.com wrote:
 0.56.7

 On Monday, 2 December 2013, Yehuda Sadeh wrote:

 I'm having trouble reproducing the issue. What version are you using?

 Thanks,
 Yehuda

 On Mon, Dec 2, 2013 at 2:16 PM, Yehuda Sadeh yeh...@inktank.com wrote:
  Actually, I read that differently. It only says that if there's more
  than 1 part, all parts except for the last one need to be at least 5M. Which
  means that for uploads that are smaller than 5M there should be zero
  or one parts.
 
  On Mon, Dec 2, 2013 at 12:54 PM, Dominik Mostowiec
  dominikmostow...@gmail.com wrote:
  You're right.
 
  S3 api doc:
  http://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadComplete.html
  Err:EntityTooSmall
  Your proposed upload is smaller than the minimum allowed object size.
  Each part must be at least 5 MB in size, except the last part.
 
  Thanks.
 
  This error should be triggered from radosgw also.
 
  --
  Regards
  Dominik
 
  2013/12/2 Yehuda Sadeh yeh...@inktank.com:
  Looks like it. There should be a guard against it (multipart upload
  minimum is 5M).
 
  On Mon, Dec 2, 2013 at 12:32 PM, Dominik Mostowiec
  dominikmostow...@gmail.com wrote:
  Yes, this is probably upload empty file.
  This is the problem?
 
  --
  Regards
  Dominik
 
 
  2013/12/2 Yehuda Sadeh yeh...@inktank.com:
  By any chance are you uploading empty objects through the multipart
  upload api?
 
  On Mon, Dec 2, 2013 at 12:08 PM, Dominik Mostowiec
  dominikmostow...@gmail.com wrote:
  Hi,
  Another file with the same problems:
 
  2013-12-01 11:37:15.556687 7f7891fd3700  1 == starting new
  request
  req=0x25406d0 =
  2013-12-01 11:37:15.556739 7f7891fd3700  2 req
  1314:0.52initializing
  2013-12-01 11:37:15.556789 7f7891fd3700 10 s-object=files/192.txt
  s-bucket=testbucket
  2013-12-01 11:37:15.556799 7f7891fd3700  2 req
  1314:0.000112:s3:POST
  /testbucket/files/192.txt::getting op
  2013-12-01 11:37:15.556804 7f7891fd3700  2 req
  1314:0.000118:s3:POST
  /testbucket/files/192.txt:complete_multipart:authorizing
  2013-12-01 11:37:15.560013 7f7891fd3700 10 get_canon_resource():
 
  dest=/testbucket/files/192.txt?uploadId=i92xi2olzDtFAeLXlfU2PFP9CDU87BC
  2013-12-01 11:37:15.560027 7f7891fd3700 10 auth_hdr:
  POST
 
  application/xml
  Sun, 01 Dec 2013 10:37:10 GMT
  /testbucket/files/192.txt?uploadId=i92xi2olzDtFAeLXlfU2PFP9CDU87BC
  2013-12-01 11:37:15.560085 7f7891fd3700  2 req
  1314:0.003399:s3:POST
  /testbucket/files/192.txt:complete_multipart:reading permissions
  2013-12-01 11:37:15.562356 7f7891fd3700  2 req
  1314:0.005670:s3:POST
  /testbucket/files/192.txt:complete_multipart:verifying op
  permissions
  2013-12-01 11:37:15.562373 7f7891fd3700  5 Searching permissions
  for
  uid=0 mask=2
  2013-12-01 11:37:15.562377 7f7891fd3700  5 Found permission: 15
  2013-12-01 11:37:15.562378 7f7891fd3700 10  uid=0 requested perm
  (type)=2, policy perm=2, user_perm_mask=2, acl perm=2
  2013-12-01 11:37:15.562381 7f7891fd3700  2 req
  1314:0.005695:s3:POST
  /testbucket/files/192.txt:complete_multipart:verifying op params
  2013-12-01 11:37:15.562384 7f7891fd3700  2 req
  1314:0.005698:s3:POST
  /testbucket/files/192.txt:complete_multipart:executing
  2013-12-01 11:37:15.565461 7f7891fd3700 10 calculated etag:
  d41d8cd98f00b204e9800998ecf8427e-0
  2013-12-01 11:37:15.566718 7f7891fd3700 10 can't clone object
  testbucket:files/192.txt to shadow object, tag/shadow_obj haven't
  been
  set
  2013-12-01 11:37:15.566777 7f7891fd3700  0 setting object
  tag=_leyAzxCw7YxpKv8P3v3QGwcsw__9VmP
  2013-12-01 11:37:15.678973 7f7891fd3700  2 req
  1314:0.122286:s3:POST
  /testbucket/files/192.txt:complete_multipart:http status=200
  2013-12-01 11:37:15.679192 7f7891fd3700  1 == req done
  req=0x25406d0 http_status=200 ==
 
  Yes, I can read oryginal object.
 
  --
  Regards
  Dominik
 
  2013/12/2 Yehuda Sadeh yeh...@inktank.com:
  That's unknown bug. I have a guess as to how the original object
  was
  created. Can you read the original object, but only copy fails?
 
  On Dec 2, 2013 4:53 AM, Dominik Mostowiec
  dominikmostow...@gmail.com
  wrote:
 
  Hi,
  I found that issue is related with ETag: -0 (ends -0)
  This is known bug ?
 
  --
  Regards
  Dominik
 
  2013/12/2 Dominik Mostowiec dominikmostow...@gmail.com:
   Hi,
   I have strange problem.
   Obj copy (0 size) killing radosgw.
  
   Head for this file:
   Content-Type: application/octet-stream
   Server: Apache/2.2.22 (Ubuntu)
   ETag: d41d8cd98f00b204e9800998ecf8427e-0
   Last-Modified: 2013-12-01T10:37:15Z
  
   rgw log.
   2013-12-02 08:18:59.196651 7f5308ff1700  1 == starting new
   request
   req=0x2be6fa0 =
   2013-12-02 08:18:59.196709 7f5308ff1700  2 req
   237:0.58initializing
   2013-12-02 08:18:59.196752 7f5308ff1700 10

os recommendations

2013-11-26 Thread Dominik Mostowiec
Hi,
I found in doc: http://ceph.com/docs/master/start/os-recommendations/
Putting multiple ceph-osd daemons using XFS or ext4 on the same host
will not perform as well as they could.

For now recommended filesystem is XFS.
Does this mean that for the best performance the setup should be 1 OSD per host?

-- 
Regards
Dominik


[radosgw] increase avg get time after sharding

2013-11-26 Thread Dominik Mostowiec
Hi,
We have 2 clusters with copy of objects.
On one of them we split all large buckets (the largest had 17 million objects)
into 256 buckets each (shards), and we added 3 extra servers (6-9).
Old bucket was created in ceph argonaut.
Now we have dumpling.
After this operation the average GET time almost doubled:
https://www.dropbox.com/s/lbrpk6ias6r4459/avg_get_time.PNG
I found that 'get_initial_lat' also increased.
The sum of this parameter across all servers in the cluster:
https://www.dropbox.com/s/l90a43vf9ivw639/get_initial_lat_sum.png

Do you have an idea what is the reason?
Where can I start to search?
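
For completeness, this is how we read the counter on each gateway: a small
sketch, assuming the admin socket path below and that this build exposes
get_initial_lat as an avgcount/sum pair under the 'rgw' section of perf dump:

import json
import subprocess

ASOK = '/var/run/ceph/ceph-client.radosgw.gateway.asok'   # placeholder path

out = subprocess.check_output(['ceph', '--admin-daemon', ASOK, 'perf', 'dump'])
lat = json.loads(out)['rgw']['get_initial_lat']           # {'avgcount': N, 'sum': seconds}
if lat['avgcount']:
    print('avg get_initial_lat: %.6f s over %d GETs'
          % (float(lat['sum']) / lat['avgcount'], lat['avgcount']))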

-- 
Regards
Dominik


Re: [ceph-users] stopped backfilling process

2013-11-06 Thread Dominik Mostowiec
I hope it will help.

crush: https://www.dropbox.com/s/inrmq3t40om26vf/crush.txt
ceph osd dump: https://www.dropbox.com/s/jsbt7iypyfnnbqm/ceph_osd_dump.txt

--
Regards
Dominik

2013/11/6 yy-nm yxdyours...@gmail.com:
 On 2013/11/5 22:02, Dominik Mostowiec wrote:

 Hi,
 After remove ( ceph osd out X) osd from one server ( 11 osd ) ceph
 starts data migration process.
 It stopped on:
 32424 pgs: 30635 active+clean, 191 active+remapped, 1596
 active+degraded, 2 active+clean+scrubbing;
 degraded (1.718%)

 All osd with reweight==1 are UP.

 ceph -v
 ceph version 0.56.7 (14f23ab86b0058a8651895b3dc972a29459f3a33)

 health details:
 https://www.dropbox.com/s/149zvee2ump1418/health_details.txt

 pg active+degraded query:
 https://www.dropbox.com/s/46emswxd7s8xce1/pg_11.39_query.txt
 pg active+remapped query:
 https://www.dropbox.com/s/wij4uqh8qoz60fd/pg_16.2172_query.txt

 Please help - how can we fix it?

 Can you show your decoded crushmap, and the output of 'ceph osd dump'?





-- 
Pozdrawiam
Dominik


Re: stopped backfilling process

2013-11-05 Thread Dominik Mostowiec
Hi,
This is an S3/Ceph cluster; .rgw.buckets has 3 copies of the data.
Many PGs are only on 2 OSDs and are marked as 'degraded'.
Can scrubbing fix the degraded objects?

I haven't set tunables in the crushmap - maybe this would help (is it safe?).
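
Before touching anything I would first check what the cluster currently
runs with; a small sketch that prints the crush tunables, assuming the
ceph CLI on this box accepts --format=json for 'osd crush dump' and that
the dump includes a tunables section:

import json
import subprocess

out = subprocess.check_output(['ceph', 'osd', 'crush', 'dump', '--format=json'])
print(json.dumps(json.loads(out).get('tunables', {}), indent=2))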

--
Regards
Dominik



2013/11/5 Dominik Mostowiec dominikmostow...@gmail.com:
 Hi,
 After remove ( ceph osd out X) osd from one server ( 11 osd ) ceph
 starts data migration process.
 It stopped on:
 32424 pgs: 30635 active+clean, 191 active+remapped, 1596
 active+degraded, 2 active+clean+scrubbing;
 degraded (1.718%)

 All osd with reweight==1 are UP.

 ceph -v
 ceph version 0.56.7 (14f23ab86b0058a8651895b3dc972a29459f3a33)

 health details:
 https://www.dropbox.com/s/149zvee2ump1418/health_details.txt

 pg active+degraded query:
 https://www.dropbox.com/s/46emswxd7s8xce1/pg_11.39_query.txt
 pg active+remapped query:
 https://www.dropbox.com/s/wij4uqh8qoz60fd/pg_16.2172_query.txt

 Please help - how can we fix it?

 --
 Pozdrawiam
 Dominik



-- 
Pozdrawiam
Dominik


radosgw - complete_multipart errors

2013-10-31 Thread Dominik Mostowiec
Hi,
I have a strange radosgw error:

==
2013-10-26 21:18:29.844676 7f637beaf700  0 setting object
tag=_ZPeVs7d6W8GjU8qKr4dsilbGeo6NOgw
2013-10-26 21:18:30.049588 7f637beaf700  0 WARNING: set_req_state_err
err_no=125 resorting to 500
2013-10-26 21:18:30.049738 7f637beaf700  2 req 61655:0.224186:s3:POST
/testbucket/files/image%20%286%29.jpeg:complete_multipart:http
status=500
2013-10-26 21:18:30.049975 7f637beaf700  1 == req done
req=0x11013d0 http_status=500 ==

It's similar to: http://tracker.ceph.com/issues/5439
Is this the same bug?
My ceph version:
ceph version 0.56.6 (95a0bda7f007a33b0dc7adf4b330778fa1e5d70c)

Bug 5439 is marked as fixed.
Knowing the fix commit (72d4351ea5a470051e428ffc5531acfc7d1c7b6f), how can
I check which Ceph versions contain it?
Is it fixed in the latest bobtail release, 0.56.7?

---
Regards
Dominik


Re: issues when bucket index deep-scrubbing

2013-10-21 Thread Dominik Mostowiec
Hi,
Thanks for your response.

 That is definitely the obvious next step, but it's a non-trivial
 amount of work and hasn't yet been started on by anybody. This is
 probably a good subject for a CDS blueprint!
But we want to split our big bucket into smaller ones - we want to
shard it on the client side, before radosgw.
Do you think this is a good way to work around this problem
(big index issues)?
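
For the record, the sharding we have in mind is purely client-side: hash
the object name and spread keys over a fixed set of buckets so that every
bucket index stays small. A rough sketch of the mapping (the bucket prefix
is a placeholder; 256 shards shown as an example):

import hashlib

NUM_SHARDS = 256
BUCKET_PREFIX = 'mybucket-'   # placeholder

def shard_bucket(object_name):
    # Deterministic name -> bucket mapping: the same object name always
    # lands in the same shard, so reads need no lookup table.
    digest = hashlib.md5(object_name.encode('utf-8')).hexdigest()
    return '%s%02x' % (BUCKET_PREFIX, int(digest[:4], 16) % NUM_SHARDS)

# e.g. shard_bucket('files/192.txt') returns one of mybucket-00 .. mybucket-ff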

Regards
Dominik



2013/10/18 Gregory Farnum g...@inktank.com:
 On Fri, Oct 18, 2013 at 4:01 AM, Dominik Mostowiec
 dominikmostow...@gmail.com wrote:
 Hi,
 I plan to shard my largest bucket because of issues of deep-scrubbing
 (when PG which index for this bucket is stored on is deep-scrubbed, it
 appears many slow requests and OSD grows in memory - after latest
 scrub it grows up to 9G).

 I trying to found why large bucket index make issues when it is scrubbed.
 On test cluster:
 radosgw-admin bucket stats --bucket=test1-XX
 { bucket: test1-XX,
   pool: .rgw.buckets,
   index_pool: .rgw.buckets,
   id: default.4211.2,
 ...

 I guess index is in object .dir.default.4211.2. (pool: .rgw.buckets)

 rados -p .rgw.buckets get .dir.default.4211.2 -
 empty

 But:
 rados -p .rgw.buckets listomapkeys .dir.default.4211.2
 test_file_2.txt
 test_file_2_11.txt
 test_file_3.txt
 test_file_4.txt
 test_file_5.txt

 I guess that list of files are stored in leveldb not in one large file.
 'omap' files are stored in {osd_dir}/current/omap/, the largest file
 that i found in this directory (on production) have 8.8M.

 I'm a little confused.

 How list of files (for bucket) is stored?

 The index is stored as a bunch of omap entries in a single object.

 If list of objects in bucket is splitted on many small files in
 leveldb that large bucket (with many files) should not cause larger
 latency in PUT new object.

 That's not quite how it works. Leveldb has a custom storage format in
 which it stores sets of keys based on both time of update and the
 value of the key, so the size of the individual files in its directory
 has no correlation to the number or size of any given set of entries.

 Scrubbing also should not be a problem i think ...

 The problem you're running into is that scrubbing is done on an
 object-by-object basis, and so the OSD is reading all of the keys
 associated with that object out of leveldb, and processing them, at
 once. This number can be very much larger than the 8MB file you've
 found in the leveldb directory, as discussed above.

 What you think about using a sharding to split big buckets into the
 smalest one to avoid the problems with big indexes?

 That is definitely the obvious next step, but it's a non-trivial
 amount of work and hasn't yet been started on by anybody. This is
 probably a good subject for a CDS blueprint!
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com



-- 
Pozdrawiam
Dominik


Re: issues when bucket index deep-scrubbing

2013-10-21 Thread Dominik Mostowiec
 You shouldn't run into any issues except the scrubbing on a large index 
 object.
Great !!

 There's not a great way to get around that right now; sorry. :(
Ok.

Thanks for your help.

--
Regards
Dominik

2013/10/21 Gregory Farnum g...@inktank.com:
 You shouldn't run into any issues except the scrubbing on a large
 index object. There's not a great way to get around that right now;
 sorry. :(
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com


 On Mon, Oct 21, 2013 at 1:44 PM, Dominik Mostowiec
 dominikmostow...@gmail.com wrote:
 Hi,
 Thanks, for now I'm sure what to do.

 Is there maybe another way (apart from turning off deep-scrubbing) to
 avoid the issues caused by large indexes?

 Now we have ~15M objects in the largest bucket.
 In the short term (after sharding) we want to put 100M more objects there.
 Are there any other limitations in ceph that can affect us?

 --
 Regards
 Dominik


 2013/10/21 Gregory Farnum g...@inktank.com:
 On Mon, Oct 21, 2013 at 2:26 AM, Dominik Mostowiec
 dominikmostow...@gmail.com wrote:
 Hi,
 Thanks for your response.

 That is definitely the obvious next step, but it's a non-trivial
 amount of work and hasn't yet been started on by anybody. This is
 probably a good subject for a CDS blueprint!
 But we want to split our big bucket into the smallest ones. We want to
 shard it before radosgw.
 Do you think this is a good idea to make workaround of this problem
 (big index issues)?

 Oh, yes, this is a good workaround.
 Sorry, I misread your initial post and thought you were discussing
 sharding the bucket index itself, rather than sharding across buckets
 in the application. :)
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com
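
A minimal sketch of that application-side workaround, for illustration only: the bucket prefix, the md5-based split and the 256-bucket count are assumptions (the md5-prefix scheme is the one proposed in the "bucket count limit" thread further down):

import hashlib

BUCKET_PREFIX = "files-"  # hypothetical naming convention, not from radosgw

def bucket_for(key):
    # Deterministically map an object key to one of 256 buckets using the
    # first two hex characters of its md5, so no single index grows unbounded.
    shard = hashlib.md5(key.encode("utf-8")).hexdigest()[:2]
    return BUCKET_PREFIX + shard

# The application PUTs/GETs each key in bucket_for(key) instead of one big bucket.
print(bucket_for("test_file_2.txt"))

The trade-off is that listing the whole namespace now means listing all 256 buckets, and the mapping has to stay stable once objects have been written.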



 Regards
 Dominik



 2013/10/18 Gregory Farnum g...@inktank.com:
 On Fri, Oct 18, 2013 at 4:01 AM, Dominik Mostowiec
 dominikmostow...@gmail.com wrote:
 Hi,
 I plan to shard my largest bucket because of deep-scrubbing issues
 (when the PG that holds the index for this bucket is deep-scrubbed,
 many slow requests appear and the OSD grows in memory - after the
 latest scrub it grew to 9G).

 I am trying to find out why a large bucket index causes issues when it is scrubbed.
 On test cluster:
 radosgw-admin bucket stats --bucket=test1-XX
 { bucket: test1-XX,
   pool: .rgw.buckets,
   index_pool: .rgw.buckets,
   id: default.4211.2,
 ...

 I guess the index is in the object .dir.default.4211.2 (pool: .rgw.buckets).

 rados -p .rgw.buckets get .dir.default.4211.2 -
 empty

 But:
 rados -p .rgw.buckets listomapkeys .dir.default.4211.2
 test_file_2.txt
 test_file_2_11.txt
 test_file_3.txt
 test_file_4.txt
 test_file_5.txt

 I guess the list of files is stored in leveldb, not in one large file.
 The 'omap' files are stored in {osd_dir}/current/omap/; the largest file
 I found in this directory (on production) is 8.8 MB.

 I'm a little confused.

 How is the list of files (for a bucket) stored?

 The index is stored as a bunch of omap entries in a single object.

 If the list of objects in a bucket is split across many small files in
 leveldb, then a large bucket (with many files) should not cause higher
 latency when PUTting a new object.

 That's not quite how it works. Leveldb has a custom storage format in
 which it stores sets of keys based on both time of update and the
 value of the key, so the size of the individual files in its directory
 has no correlation to the number or size of any given set of entries.

 Scrubbing also should not be a problem, I think...

 The problem you're running into is that scrubbing is done on an
 object-by-object basis, and so the OSD is reading all of the keys
 associated with that object out of leveldb, and processing them, at
 once. This number can be very much larger than the 8MB file you've
 found in the leveldb directory, as discussed above.

 What do you think about using sharding to split big buckets into
 smaller ones, to avoid the problems with big indexes?

 That is definitely the obvious next step, but it's a non-trivial
 amount of work and hasn't yet been started on by anybody. This is
 probably a good subject for a CDS blueprint!
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com



 --
 Regards
 Dominik



 --
 Regards
 Dominik



-- 
Regards
Dominik


issues when bucket index deep-scrubbing

2013-10-18 Thread Dominik Mostowiec
Hi,
I plan to shard my largest bucket because of deep-scrubbing issues
(when the PG that holds the index for this bucket is deep-scrubbed,
many slow requests appear and the OSD grows in memory - after the
latest scrub it grew to 9G).

I am trying to find out why a large bucket index causes issues when it is scrubbed.
On test cluster:
radosgw-admin bucket stats --bucket=test1-XX
{ bucket: test1-XX,
  pool: .rgw.buckets,
  index_pool: .rgw.buckets,
  id: default.4211.2,
...

I guess the index is in the object .dir.default.4211.2 (pool: .rgw.buckets).

rados -p .rgw.buckets get .dir.default.4211.2 -
empty

But:
rados -p .rgw.buckets listomapkeys .dir.default.4211.2
test_file_2.txt
test_file_2_11.txt
test_file_3.txt
test_file_4.txt
test_file_5.txt

I guess the list of files is stored in leveldb, not in one large file.
The 'omap' files are stored in {osd_dir}/current/omap/; the largest file
I found in this directory (on production) is 8.8 MB.

I'm a little confused.

How is the list of files (for a bucket) stored?
If the list of objects in a bucket is split across many small files in
leveldb, then a large bucket (with many files) should not cause higher
latency when PUTting a new object.
Scrubbing also should not be a problem, I think...

What do you think about using sharding to split big buckets into
smaller ones, to avoid the problems with big indexes?

--
Regards
Dominik


osd down after server failure

2013-10-14 Thread Dominik Mostowiec
Hi,
I had a server failure that started with a single disk failure:
Oct 14 03:25:04 s3-10-177-64-6 kernel: [1027237.023986] sd 4:2:26:0:
[sdaa] Unhandled error code
Oct 14 03:25:04 s3-10-177-64-6 kernel: [1027237.023990] sd 4:2:26:0:
[sdaa]  Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Oct 14 03:25:04 s3-10-177-64-6 kernel: [1027237.023995] sd 4:2:26:0:
[sdaa] CDB: Read(10): 28 00 00 00 00 d0 00 00 10 00
Oct 14 03:25:04 s3-10-177-64-6 kernel: [1027237.024005] end_request:
I/O error, dev sdaa, sector 208
Oct 14 03:25:04 s3-10-177-64-6 kernel: [1027237.024744] XFS (sdaa):
metadata I/O error: block 0xd0 (xfs_trans_read_buf) error 5 buf
count 8192
Oct 14 03:25:04 s3-10-177-64-6 kernel: [1027237.025879] XFS (sdaa):
xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
Oct 14 03:25:28 s3-10-177-64-6 kernel: [1027260.820288] XFS (sdaa):
metadata I/O error: block 0xd0 (xfs_trans_read_buf) error 5 buf
count 8192
Oct 14 03:25:28 s3-10-177-64-6 kernel: [1027260.821194] XFS (sdaa):
xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
Oct 14 03:25:32 s3-10-177-64-6 kernel: [1027264.667851] XFS (sdaa):
metadata I/O error: block 0xd0 (xfs_trans_read_buf) error 5 buf
count 8192

This caused the server to become unresponsive.

After the server restart, 3 of its 26 OSDs are down.
The ceph-osd log after setting 'debug osd = 10' and restarting shows:

2013-10-14 06:21:23.141936 7fdeb4872700 -1 osd.47 43203 *** Got signal
Terminated ***
2013-10-14 06:21:23.142141 7fdeb4872700 -1 osd.47 43203  pausing thread pools
2013-10-14 06:21:23.142146 7fdeb4872700 -1 osd.47 43203  flushing io
2013-10-14 06:21:25.406187 7f02690f9780  0
filestore(/vol0/data/osd.47) mount FIEMAP ioctl is supported and
appears to work
2013-10-14 06:21:25.406204 7f02690f9780  0
filestore(/vol0/data/osd.47) mount FIEMAP ioctl is disabled via
'filestore fiemap' config option
2013-10-14 06:21:25.406557 7f02690f9780  0
filestore(/vol0/data/osd.47) mount did NOT detect btrfs
2013-10-14 06:21:25.412617 7f02690f9780  0
filestore(/vol0/data/osd.47) mount syncfs(2) syscall fully supported
(by glibc and kernel)
2013-10-14 06:21:25.412831 7f02690f9780  0
filestore(/vol0/data/osd.47) mount found snaps 
2013-10-14 06:21:25.415798 7f02690f9780  0
filestore(/vol0/data/osd.47) mount: enabling WRITEAHEAD journal mode:
btrfs not detected
2013-10-14 06:21:26.078377 7f02690f9780  2 osd.47 0 mounting
/vol0/data/osd.47 /vol0/data/osd.47/journal
2013-10-14 06:21:26.080872 7f02690f9780  0
filestore(/vol0/data/osd.47) mount FIEMAP ioctl is supported and
appears to work
2013-10-14 06:21:26.080885 7f02690f9780  0
filestore(/vol0/data/osd.47) mount FIEMAP ioctl is disabled via
'filestore fiemap' config option
2013-10-14 06:21:26.081289 7f02690f9780  0
filestore(/vol0/data/osd.47) mount did NOT detect btrfs
2013-10-14 06:21:26.087524 7f02690f9780  0
filestore(/vol0/data/osd.47) mount syncfs(2) syscall fully supported
(by glibc and kernel)
2013-10-14 06:21:26.087582 7f02690f9780  0
filestore(/vol0/data/osd.47) mount found snaps 
2013-10-14 06:21:26.089614 7f02690f9780  0
filestore(/vol0/data/osd.47) mount: enabling WRITEAHEAD journal mode:
btrfs not detected
2013-10-14 06:21:26.726676 7f02690f9780  2 osd.47 0 boot
2013-10-14 06:21:26.726773 7f02690f9780 10 osd.47 0 read_superblock
sb(16773c25-5054-4451-bf9f-efc1f7f21b89 osd.47
63cf7d70-99cb-0ab1-4006-002f e43203 [41261,43203]
lci=[43194,43203])
2013-10-14 06:21:26.726862 7f02690f9780 10 osd.47 0 add_map_bl 43203 82622 bytes
2013-10-14 06:21:26.727184 7f02690f9780 10 osd.47 43203 load_pgs
2013-10-14 06:21:26.727643 7f02690f9780 10 osd.47 43203 load_pgs
ignoring unrecognized meta
2013-10-14 06:21:26.727681 7f02690f9780 10 osd.47 43203 load_pgs
3.df1_TEMP clearing temp

osd.47 is still down; I marked it out of the cluster.
47  1   osd.47  down0

How can I check what is wrong?

ceph -v
ceph version 0.56.6 (95a0bda7f007a33b0dc7adf4b330778fa1e5d70c)

-- 
Pozdrawiam
Dominik


Re: osd down after server failure

2013-10-14 Thread Dominik Mostowiec
Hi,
I have found something.
After the restart the time on the server was wrong (+2 hours) before ntp fixed it.
I restarted these 3 OSDs - it did not help.
Is it possible that ceph banned these OSDs? Or could starting with the
wrong time have broken their filestore?

--
Regards
Dominik


2013/10/14 Dominik Mostowiec dominikmostow...@gmail.com:
 Hi,
 I had a server failure that started with a single disk failure:
 Oct 14 03:25:04 s3-10-177-64-6 kernel: [1027237.023986] sd 4:2:26:0:
 [sdaa] Unhandled error code
 Oct 14 03:25:04 s3-10-177-64-6 kernel: [1027237.023990] sd 4:2:26:0:
 [sdaa]  Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
 Oct 14 03:25:04 s3-10-177-64-6 kernel: [1027237.023995] sd 4:2:26:0:
 [sdaa] CDB: Read(10): 28 00 00 00 00 d0 00 00 10 00
 Oct 14 03:25:04 s3-10-177-64-6 kernel: [1027237.024005] end_request:
 I/O error, dev sdaa, sector 208
 Oct 14 03:25:04 s3-10-177-64-6 kernel: [1027237.024744] XFS (sdaa):
 metadata I/O error: block 0xd0 (xfs_trans_read_buf) error 5 buf
 count 8192
 Oct 14 03:25:04 s3-10-177-64-6 kernel: [1027237.025879] XFS (sdaa):
 xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
 Oct 14 03:25:28 s3-10-177-64-6 kernel: [1027260.820288] XFS (sdaa):
 metadata I/O error: block 0xd0 (xfs_trans_read_buf) error 5 buf
 count 8192
 Oct 14 03:25:28 s3-10-177-64-6 kernel: [1027260.821194] XFS (sdaa):
 xfs_imap_to_bp: xfs_trans_read_buf() returned error 5.
 Oct 14 03:25:32 s3-10-177-64-6 kernel: [1027264.667851] XFS (sdaa):
 metadata I/O error: block 0xd0 (xfs_trans_read_buf) error 5 buf
 count 8192

 This caused the server to become unresponsive.

 After the server restart, 3 of its 26 OSDs are down.
 The ceph-osd log after setting 'debug osd = 10' and restarting shows:

 2013-10-14 06:21:23.141936 7fdeb4872700 -1 osd.47 43203 *** Got signal
 Terminated ***
 2013-10-14 06:21:23.142141 7fdeb4872700 -1 osd.47 43203  pausing thread pools
 2013-10-14 06:21:23.142146 7fdeb4872700 -1 osd.47 43203  flushing io
 2013-10-14 06:21:25.406187 7f02690f9780  0
 filestore(/vol0/data/osd.47) mount FIEMAP ioctl is supported and
 appears to work
 2013-10-14 06:21:25.406204 7f02690f9780  0
 filestore(/vol0/data/osd.47) mount FIEMAP ioctl is disabled via
 'filestore fiemap' config option
 2013-10-14 06:21:25.406557 7f02690f9780  0
 filestore(/vol0/data/osd.47) mount did NOT detect btrfs
 2013-10-14 06:21:25.412617 7f02690f9780  0
 filestore(/vol0/data/osd.47) mount syncfs(2) syscall fully supported
 (by glibc and kernel)
 2013-10-14 06:21:25.412831 7f02690f9780  0
 filestore(/vol0/data/osd.47) mount found snaps 
 2013-10-14 06:21:25.415798 7f02690f9780  0
 filestore(/vol0/data/osd.47) mount: enabling WRITEAHEAD journal mode:
 btrfs not detected
 2013-10-14 06:21:26.078377 7f02690f9780  2 osd.47 0 mounting
 /vol0/data/osd.47 /vol0/data/osd.47/journal
 2013-10-14 06:21:26.080872 7f02690f9780  0
 filestore(/vol0/data/osd.47) mount FIEMAP ioctl is supported and
 appears to work
 2013-10-14 06:21:26.080885 7f02690f9780  0
 filestore(/vol0/data/osd.47) mount FIEMAP ioctl is disabled via
 'filestore fiemap' config option
 2013-10-14 06:21:26.081289 7f02690f9780  0
 filestore(/vol0/data/osd.47) mount did NOT detect btrfs
 2013-10-14 06:21:26.087524 7f02690f9780  0
 filestore(/vol0/data/osd.47) mount syncfs(2) syscall fully supported
 (by glibc and kernel)
 2013-10-14 06:21:26.087582 7f02690f9780  0
 filestore(/vol0/data/osd.47) mount found snaps 
 2013-10-14 06:21:26.089614 7f02690f9780  0
 filestore(/vol0/data/osd.47) mount: enabling WRITEAHEAD journal mode:
 btrfs not detected
 2013-10-14 06:21:26.726676 7f02690f9780  2 osd.47 0 boot
 2013-10-14 06:21:26.726773 7f02690f9780 10 osd.47 0 read_superblock
 sb(16773c25-5054-4451-bf9f-efc1f7f21b89 osd.47
 63cf7d70-99cb-0ab1-4006-002f e43203 [41261,43203]
 lci=[43194,43203])
 2013-10-14 06:21:26.726862 7f02690f9780 10 osd.47 0 add_map_bl 43203 82622 
 bytes
 2013-10-14 06:21:26.727184 7f02690f9780 10 osd.47 43203 load_pgs
 2013-10-14 06:21:26.727643 7f02690f9780 10 osd.47 43203 load_pgs
 ignoring unrecognized meta
 2013-10-14 06:21:26.727681 7f02690f9780 10 osd.47 43203 load_pgs
 3.df1_TEMP clearing temp

 osd.47 is still down; I marked it out of the cluster.
 47  1   osd.47  down0

 How can I check what is wrong?

 ceph -v
 ceph version 0.56.6 (95a0bda7f007a33b0dc7adf4b330778fa1e5d70c)

 --
 Regards
 Dominik



-- 
Regards
Dominik


upgrade from bobtail to dumpling

2013-10-07 Thread Dominik Mostowiec
Hi,
Is it possible to safely upgrade directly from bobtail (0.56.6) to
dumpling (latest)?
Are there any instructions?

-- 
Regards
Dominik


many report failed after mon election

2013-09-12 Thread Dominik Mostowiec
Hi,
Today I had some issues with the ceph cluster.
After a new mon election, many OSDs were marked failed.
Some time later the OSDs booted and, I think, recovered, because many slow requests appeared.
The cluster came back after about 20 minutes.

cluster:
ceph version 0.56.6
6 servers x 26 osd

2013-09-12 07:11:40.920384 mon.1 10.177.64.5:6789/0 353 : [INF] mon.3
calling new monitor election
2013-09-12 07:12:40.992532 mon.3 10.177.64.7:6789/0 364 : [INF] mon.4
calling new monitor election
2013-09-12 07:12:41.024954 mon.4 10.177.64.8:6789/0 360 : [INF] mon.2
calling new monitor election
2013-09-12 07:13:02.782203 mon.2 10.177.64.6:6789/0 336 : [INF] mon.1
calling new monitor election
2013-09-12 07:13:02.783778 mon.3 10.177.64.7:6789/0 366 : [INF] mon.4
calling new monitor election
2013-09-12 07:13:10.852842 mon.3 10.177.64.7:6789/0 367 : [INF] mon.4
calling new monitor election
2013-09-12 16:17:09.484277 mon.4 10.177.64.8:6789/0 363 : [INF] mon.2
calling new monitor election
2013-09-12 16:17:09.497337 mon.3 10.177.64.7:6789/0 368 : [INF] mon.4
calling new monitor election
2013-09-12 16:17:09.523787 mon.0 10.177.64.4:6789/0 4369021 : [INF]
mon.0 calling new monitor election
2013-09-12 16:17:14.525282 mon.0 10.177.64.4:6789/0 4369022 : [INF]
mon.0@0 won leader election with quorum 0,1,2,3,4
...
2013-09-12 16:17:14.689555 mon.0 10.177.64.4:6789/0 4369027 : [DBG]
osd.130 10.177.64.9:6801/1401 reported failed by osd.121
10.177.64.7:6909/29496
2013-09-12 16:17:14.689584 mon.0 10.177.64.4:6789/0 4369028 : [DBG]
osd.131 10.177.64.9:6810/2435 reported failed by osd.121
10.177.64.7:6909/29496
2013-09-12 16:17:14.689600 mon.0 10.177.64.4:6789/0 4369029 : [DBG]
osd.132 10.177.64.9:6846/2885 reported failed by osd.121
10.177.64.7:6909/29496
2013-09-12 16:17:14.689615 mon.0 10.177.64.4:6789/0 4369030 : [DBG]
osd.134 10.177.64.9:6855/3223 reported failed by osd.121
10.177.64.7:6909/29496
2013-09-12 16:17:14.689630 mon.0 10.177.64.4:6789/0 4369031 : [DBG]
osd.136 10.177.64.9:6865/3559 reported failed by osd.121
10.177.64.7:6909/29496
2013-09-12 16:17:14.689645 mon.0 10.177.64.4:6789/0 4369032 : [DBG]
osd.141 10.177.64.9:6904/4259 reported failed by osd.121
10.177.64.7:6909/29496

-- 
Regards
Dominik


ceph s3 allowed characters

2013-08-30 Thread Dominik Mostowiec
Hi,
I got an error (400) from radosgw for this request:
2013-08-30 08:09:19.396812 7f3b307c0700  2 req 3070:0.000150::POST
/dysk/files/test.test%40op.pl/DOMIWENT%202013/DW%202013_03_27/PROJEKTY%202012/ZB%20KROL/Szko%C5%82a%20%C5%81aziska%20ZB%20KROL/sala-%A3aziska_Dolne_PB-0_went_15_11_06%20Layout1%20%283%29.pdf::http
status=400
2013-08-30 08:09:34.851892 7f3b55ffb700 10
s-object=files/test.t...@op.pl/DOMIWENT 2013/Damian
DW/dw/Specyfikacja istotnych warunkF3w zamF3wienia.doc
s-bucket=dysk

What is the allowed range of characters in a URL in radosgw?

-- 
Regards
Dominik


Re: [ceph-users] ceph s3 allowed characters

2013-08-30 Thread Dominik Mostowiec
(echo -n 'GET 
/dysk/files/test.test%40op.pl/DOMIWENT%202013/Damian%20DW/dw/Specyfikacja%20istotnych%20warunk%F3w%20zam%F3wienia.doc
HTTP/1.0'; printf "\r\n\r\n") | nc localhost 88
HTTP/1.1 400 Bad Request
Date: Fri, 30 Aug 2013 14:10:07 GMT
Server: Apache/2.2.22 (Ubuntu)
Accept-Ranges: bytes
Content-Length: 83
Connection: close
Content-Type: application/xml

<?xml version="1.0" encoding="UTF-8"?><Error><Code>InvalidObjectName</Code></Error>

Full log from radosgw for another failing request:

2013-08-30 14:32:52.166321 7f42e77d6700  1 == starting new request
req=0x12cff20 =
2013-08-30 14:32:52.166385 7f42e77d6700  2 req 33246:0.65initializing
2013-08-30 14:32:52.166410 7f42e77d6700 10 meta HTTP_X_AMZ_ACL=public-read
2013-08-30 14:32:52.166419 7f42e77d6700 10 x x-amz-acl:public-read
2013-08-30 14:32:52.166497 7f42e77d6700 10
s-object=files/test.t...@op.pl/DOMIWENT 2013/DW 2013_03_27/PROJEKTY
2012/ZB KROL/Szkoła Łaziska ZB
KROL/sala-A3aziska_Dolne_PB-0_went_15_11_06 Layou
t1 (4).pdf s-bucket=dysk
2013-08-30 14:32:52.166563 7f42e77d6700  2 req 33246:0.000242::POST
/dysk/files/test.test%40op.pl/DOMIWENT%202013/DW%202013_03_27/PROJEKTY%202012/ZB%20KROL/Szko%C5%82a%20%C5%81aziska%20ZB%20KROL
/sala-%A3aziska_Dolne_PB-0_went_15_11_06%20Layout1%20%284%29.pdf::http
status=400
2013-08-30 14:32:52.166653 7f42e77d6700  1 == req done
req=0x12cff20 http_status=400 ==
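
The failing keys contain %A3, which is the Latin-1 byte for 'ł' rather than its UTF-8 encoding, so the object name radosgw receives is not valid UTF-8; that appears to be what triggers InvalidObjectName here (the %F3 / 'ó' in the other request is the same issue). A minimal client-side sketch, assuming the application builds the request paths itself, that percent-encodes keys as UTF-8 before sending them:

from urllib.parse import quote

def object_path(bucket, key):
    # quote() percent-encodes non-ASCII as UTF-8 by default, so 'ł' becomes
    # %C5%82 instead of the Latin-1 %A3 that radosgw rejects.
    return "/%s/%s" % (bucket, quote(key, safe="/"))

print(object_path("dysk", "files/test.test@op.pl/Szkoła Łaziska/sala.pdf"))
# -> /dysk/files/test.test%40op.pl/Szko%C5%82a%20%C5%81aziska/sala.pdf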

--
Dominik

2013/8/30 Alfredo Deza alfredo.d...@inktank.com:



 On Fri, Aug 30, 2013 at 9:52 AM, Dominik Mostowiec
 dominikmostow...@gmail.com wrote:

 Hi,
 I got an error (400) from radosgw for this request:
 2013-08-30 08:09:19.396812 7f3b307c0700  2 req 3070:0.000150::POST

 /dysk/files/test.test%40op.pl/DOMIWENT%202013/DW%202013_03_27/PROJEKTY%202012/ZB%20KROL/Szko%C5%82a%20%C5%81aziska%20ZB%20KROL/sala-%A3aziska_Dolne_PB-0_went_15_11_06%20Layout1%20%283%29.pdf::http
 status=400
 2013-08-30 08:09:34.851892 7f3b55ffb700 10
 s-object=files/test.t...@op.pl/DOMIWENT 2013/Damian
 DW/dw/Specyfikacja istotnych warunkF3w zamF3wienia.doc
 s-bucket=dysk

 What is the allowed range of characters in a URL in radosgw?


 Can you post the full HTTP headers for the response?

 The output you are pasting is not entirely clear to me, is that a single log
 line for the whole request? Maybe it is just the formatting that
 is throwing me off.


 --
 Regards
 Dominik





-- 
Regards
Dominik


bucket count limit

2013-08-22 Thread Dominik Mostowiec
Hi,
I am thinking about sharding S3 buckets in the ceph cluster: creating one
bucket per XX (256 buckets) or even one bucket per XXX (4096 buckets),
where XX/XXX are characters taken from the md5 of the object URL.
Could this be a problem (performance, or some limits)?
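
Rough arithmetic for the proposed split, assuming the ~15M existing plus ~100M planned objects mentioned in the deep-scrubbing thread above and a reasonably even md5 distribution:

# Expected objects per bucket for the two proposed sharding schemes.
total_objects = 15000000 + 100000000  # ~15M existing + ~100M planned

for buckets in (256, 4096):
    print("%4d buckets -> ~%d objects per bucket" % (buckets, total_objects // buckets))
# prints ~449k per bucket for 256 buckets, ~28k for 4096

Either way each index stays far below the ~15M entries that make scrubbing painful today.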

--
Regards
Dominik


Re: bucket count limit

2013-08-22 Thread Dominik Mostowiec
I'm sorry for the spam :-(

--
Dominik

2013/8/22 Dominik Mostowiec dominikmostow...@gmail.com:
 Hi,
 I am thinking about sharding S3 buckets in the ceph cluster: creating one
 bucket per XX (256 buckets) or even one bucket per XXX (4096 buckets),
 where XX/XXX are characters taken from the md5 of the object URL.
 Could this be a problem (performance, or some limits)?

 --
 Regards
 Dominik



-- 
Regards
Dominik


rgw bucket index

2013-07-21 Thread Dominik Mostowiec
Hi,
The rgw bucket index is stored in a single object (which causes performance issues on a single OSD).
Is sharding, or another change to increase performance, on the roadmap?


--
Regards
Dominik


optimal values for osd threads

2013-07-19 Thread Dominik Mostowiec
Hi,
My config:
osd op threads = 8
osd disk threads = 4
osd recovery threads = 1
osd recovery max active = 1
osd recovery op priority = 10
osd client op priority = 100
osd max backfills = 1

I set these to maximize client operation priority and to slow down backfill
operations (client first !! :-) ).
At one point the OSD holding the rgw index died; after the restart the cluster
got stuck at 25 active+recovery_wait, 1 active+recovering.

Please help me choose optimal values for the OSD recovery threads and
priorities on a ceph cluster optimized for S3.

Cluster:
   12 servers x 12 osd
   3 mons, 144 osds, 32424 pgs
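
For what it is worth, the recovery/backfill settings above can also be changed on a running cluster without restarting OSDs; a sketch, assuming an admin keyring on this host and that 'ceph tell ... injectargs' accepts these options on this version:

import subprocess

# Push the recovery/backfill settings from the config above to every
# running OSD (144 OSDs, as listed) without restarting them.
ARGS = "--osd_max_backfills 1 --osd_recovery_max_active 1 --osd_recovery_op_priority 10"

for osd_id in range(144):
    subprocess.check_call(["ceph", "tell", "osd.%d" % osd_id, "injectargs", ARGS])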

--
Regards
Dominik


Re: two osd stuck in peering after starting osd to recover

2013-06-05 Thread Dominik Mostowiec
Hi,
I have the same problem again.
Do you have any ideas?

--
Regards
Dominik

2013/5/23 Dominik Mostowiec dominikmostow...@gmail.com:
 Hi,
 I changed a disk after a failure (osd.155).
 When osd.155 started on the empty disk and recovery began, two OSDs got stuck
 in peering (108 and 71).
 Logs are in the attachment.
 Restarting them (osd 108, 71) helped.
 ceph -v
 ceph version 0.56.6 (95a0bda7f007a33b0dc7adf4b330778fa1e5d70c)

 setup:

 6 servers x 26 osd
 6 x mons
 journal and data on the same disk

 Regards

 Dominik



-- 
Regards
Dominik