Re: [ceph-users] Inconsistents + FAILED assert(recovery_info.oi.legacy_snaps.size())

2019-10-30 Thread Brad Hubbard
Excuse the top-posting.

When looking at the logs it helps to filter by the actual thread that crashed.

$ grep 7f08af3b6700 ceph-osd.27.log.last.error.txt|tail -15
 -1001> 2019-10-30 12:55:41.498823 7f08af3b6700  1 --
129.20.199.93:6803/977508 --> 129.20.199.7:0/2975967502 --
osd_op_reply(283046730 rbd_data.384d296b8b4567.0f99
[set-alloc-hint object_size 4194304 write_size 4194304,write
3145728~4096] v194345'6696469 uv6696469 ondisk = 0) v8 --
0x5598ed521440 con 0
  -651> 2019-10-30 12:55:42.211634 7f08af3b6700  5
write_log_and_missing with: dirty_to: 0'0, dirty_from:
4294967295'18446744073709551615, writeout_from:
4294967295'18446744073709551615, trimmed: , trimmed_dups: ,
clear_divergent_priors: 0
  -565> 2019-10-30 12:55:42.775786 7f08af3b6700  1 --
129.20.177.3:6802/977508 --> 129.20.177.2:6823/3002168 --
MOSDScrubReserve(5.2d8 REJECT e194345) v1 -- 0x5598ed7e4000 con 0
  -457> 2019-10-30 12:55:43.390134 7f08af3b6700  5
write_log_and_missing with: dirty_to: 0'0, dirty_from:
4294967295'18446744073709551615, writeout_from: 194345'4406723,
trimmed: , trimmed_dups: , clear_divergent_priors: 0
  -435> 2019-10-30 12:55:43.850768 7f08af3b6700  5
write_log_and_missing with: dirty_to: 0'0, dirty_from:
4294967295'18446744073709551615, writeout_from: 194345'1735861,
trimmed: , trimmed_dups: , clear_divergent_priors: 0
  -335> 2019-10-30 12:55:44.637635 7f08af3b6700  5
write_log_and_missing with: dirty_to: 0'0, dirty_from:
4294967295'18446744073709551615, writeout_from: 194345'7602452,
trimmed: , trimmed_dups: , clear_divergent_priors: 0
  -325> 2019-10-30 12:55:44.682357 7f08af3b6700  1 --
129.20.177.3:6802/977508 --> 129.20.177.1:6802/3802 --
osd_repop(client.108792126.1:283046901 6.369 e194345/194339
6:96f81e66:::rbd_data.384d296b8b4567.0f99:head v
194345'6696470) v2 -- 0x5598ee591600 con 0
  -324> 2019-10-30 12:55:44.682450 7f08af3b6700  1 --
129.20.177.3:6802/977508 --> 129.20.177.2:6821/6004637 --
osd_repop(client.108792126.1:283046901 6.369 e194345/194339
6:96f81e66:::rbd_data.384d296b8b4567.0f99:head v
194345'6696470) v2 -- 0x5598cf2ad600 con 0
  -323> 2019-10-30 12:55:44.682510 7f08af3b6700  5
write_log_and_missing with: dirty_to: 0'0, dirty_from:
4294967295'18446744073709551615, writeout_from: 194345'6696470,
trimmed: , trimmed_dups: , clear_divergent_priors: 0
   -20> 2019-10-30 12:55:46.366704 7f08af3b6700  1 --
129.20.177.3:6802/977508 --> 129.20.177.2:6806/1848108 --
pg_scan(digest 2.1d9
2:9b97b661:::rb.0.a7bb39.238e1f29.00107c9b:head-MAX e
194345/194345) v2 -- 0x5598efc0bb80 con 0
 0> 2019-10-30 12:55:46.496423 7f08af3b6700 -1
/build/ceph-12.2.12/src/osd/PrimaryLogPG.cc: In function 'virtual void
PrimaryLogPG::on_local_recover(const hobject_t&, const
ObjectRecoveryInfo&, ObjectContextRef, bool,
ObjectStore::Transaction*)' thread 7f08af3b6700 time 2019-10-30
12:55:46.487842
2019-10-30 12:55:46.557930 7f08af3b6700 -1 *** Caught signal (Aborted) **
 in thread 7f08af3b6700 thread_name:tp_osd_tp
 0> 2019-10-30 12:55:46.557930 7f08af3b6700 -1 *** Caught signal
(Aborted) **
 in thread 7f08af3b6700 thread_name:tp_osd_tp

Since PrimaryLogPG::on_local_recover() prints the object id on entry at
debug level 10, I'd suggest gathering a log with a higher 'debug_osd'
level (20 should be plenty) to be sure which object is causing the
issue. A minimal way to do that on just this OSD is sketched below.
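
(A sketch only: osd.27 is assumed from the log file name above, and the
setting should be turned back down afterwards.)

$ ceph tell osd.27 injectargs '--debug_osd 20/20'
# reproduce the recovery crash, then collect /var/log/ceph/ceph-osd.27.log
# note: injectargs does not survive an OSD restart; if the daemon keeps
# crashing, set debug_osd = 20/20 in ceph.conf for that OSD instead
$ ceph tell osd.27 injectargs '--debug_osd 1/5'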

  334 void PrimaryLogPG::on_local_recover(
  335   const hobject_t &hoid,
  336   const ObjectRecoveryInfo &_recovery_info,
  337   ObjectContextRef obc,
  338   bool is_delete,
  339   ObjectStore::Transaction *t
  340   )
  341 {
  342   dout(10) << __func__ << ": " << hoid << dendl;

On Wed, Oct 30, 2019 at 11:43 PM Jérémy Gardais wrote:
>
> The "best" health i was able to get was :
> HEALTH_ERR norecover flag(s) set; 1733/37482459 objects misplaced (0.005%); 5 
> scrub errors; Possible data damage: 2 pgs inconsistent; Degraded data 
> redundancy: 7461/37482459 objects degraded (0.020%), 24 pgs degraded, 2 pgs 
> undersized
> OSDMAP_FLAGS norecover flag(s) set
> OBJECT_MISPLACED 1733/37482459 objects misplaced (0.005%)
> OSD_SCRUB_ERRORS 5 scrub errors
> PG_DAMAGED Possible data damage: 2 pgs inconsistent
> pg 2.2ba is active+clean+inconsistent, acting [42,29,30]
> pg 2.2bb is active+clean+inconsistent, acting [25,42,18]
> PG_DEGRADED Degraded data redundancy: 7461/37482459 objects degraded 
> (0.020%), 24 pgs degraded, 2 pgs undersized
> pg 2.3e is active+recovery_wait+degraded, acting [27,31,5]
> pg 2.9d is active+recovery_wait+degraded, acting [27,22,37]
> pg 2.a3 is active+recovery_wait+degraded, acting [27,30,35]
> pg 2.136 is active+recovery_wait+degraded, acting [27,18,22]
> pg 2.150 is active+recovery_wait+degraded, acting [27,19,35]
> pg 2.15e is active+recovery_wait+degraded, acting [27,11,36]
> pg 2.1d9 is stuck undersized for 14023.243179, current state 
> active+undersized+degraded+remapped+backfill_wait, last acting [25,30]
> pg 2.20f is 

Re: [ceph-users] cephfs 1 large omap objects

2019-10-30 Thread Patrick Donnelly
On Wed, Oct 30, 2019 at 9:28 AM Jake Grimmett  wrote:
>
> Hi Zheng,
>
> Many thanks for your helpful post, I've done the following:
>
> 1) set the threshold to 1024 * 1024:
>
> # ceph config set osd \
> osd_deep_scrub_large_omap_object_key_threshold 1048576
>
> 2) deep scrubbed all of the pgs on the two OSD that reported "Large omap
> object found." - these were all in pool 1, which has just four osd.
>
>
> Result: After 30 minutes, all deep-scrubs completed, and all "large omap
> objects" warnings disappeared.
>
> ...should we be worried about the size of these OMAP objects?

No. There are only a few of these objects, and they haven't caused
problems in any other cluster so far.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D



Re: [ceph-users] Ceph pg in inactive state

2019-10-30 Thread soumya tr
Thanks 潘东元 for the response.

The creation of a new pool works, and all the PGs corresponding to that
pool have active+clean state.

When I initially set up the 3-node ceph cluster using juju charms (the
replication count per object was set to 3), there were issues with the
ceph-osd services. So I had to delete the units and re-add them (I did
all of them together, which must have caused problems with rebalancing).
I assume the PGs stuck in the inactive state point to the 3 old OSDs that
were deleted.
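
(A quick sketch of how to check where one of them currently maps; the pg
id is just an example taken from the health output quoted below:)

$ ceph pg map 1.0
$ ceph osd tree    # confirms which OSD ids still exist in the cluster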

I assume I will have to create all the pools again. But my concern is about
the default pools.

---
pool 1 'default.rgw.buckets.data' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 16 pgp_num 16 last_change 15 flags hashpspool
stripe_width 0 application rgw
pool 2 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 2 pgp_num 2 last_change 19 flags hashpspool
stripe_width 0 application rgw
pool 3 'default.rgw.data.root' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 2 pgp_num 2 last_change 23 flags hashpspool
stripe_width 0 application rgw
pool 4 'default.rgw.gc' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 2 pgp_num 2 last_change 27 flags hashpspool
stripe_width 0 application rgw
pool 5 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 2 pgp_num 2 last_change 31 flags hashpspool
stripe_width 0 application rgw
pool 6 'default.rgw.intent-log' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 2 pgp_num 2 last_change 35 flags hashpspool
stripe_width 0 application rgw
pool 7 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 2 pgp_num 2 last_change 39 flags hashpspool
stripe_width 0 application rgw
pool 8 'default.rgw.usage' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 2 pgp_num 2 last_change 43 flags hashpspool
stripe_width 0 application rgw
pool 9 'default.rgw.users.keys' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 2 pgp_num 2 last_change 47 flags hashpspool
stripe_width 0 application rgw
pool 10 'default.rgw.users.email' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 2 pgp_num 2 last_change 51 flags hashpspool
stripe_width 0 application rgw
pool 11 'default.rgw.users.swift' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 2 pgp_num 2 last_change 55 flags hashpspool
stripe_width 0 application rgw
pool 12 'default.rgw.users.uid' replicated size 3 min_size 2 crush_rule 0
object_hash rjenkins pg_num 2 pgp_num 2 last_change 59 flags hashpspool
stripe_width 0 application rgw
pool 13 'default.rgw.buckets.extra' replicated size 3 min_size 2 crush_rule
0 object_hash rjenkins pg_num 2 pgp_num 2 last_change 63 flags hashpspool
stripe_width 0 application rgw
pool 14 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_rule
0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 67 flags hashpspool
stripe_width 0 application rgw
---

Can you please confirm whether recreating them with the ceph/rados CLI
will break anything? Something along the lines of the sketch below is
what I have in mind.
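
(A minimal sketch, reusing the name and pg_num from the dump above; the
status quoted below shows 0 objects, so I assume nothing is lost by
recreating empty pools:)

$ ceph osd pool create default.rgw.control 2 2 replicated
$ ceph osd pool application enable default.rgw.control rgw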





On Wed, Oct 30, 2019 at 4:56 PM 潘东元  wrote:

> your pg acting set is empty, and the cluster reports it does not have
> the pg, which indicates the pg does not have a primary osd.
> What was your cluster status when you created the pool?
>
> Wido den Hollander  于2019年10月30日周三 下午1:30写道:
> >
> >
> >
> > On 10/30/19 3:04 AM, soumya tr wrote:
> > > Hi all,
> > >
> > > I have a 3 node ceph cluster setup using juju charms. ceph health shows
> > > having inactive pgs.
> > >
> > > ---
> > > /# ceph status
> > >   cluster:
> > > id: 0e36956e-ef64-11e9-b472-00163e6e01e8
> > > health: HEALTH_WARN
> > > Reduced data availability: 114 pgs inactive
> > >
> > >   services:
> > > mon: 3 daemons, quorum
> > > juju-06c3e9-0-lxd-0,juju-06c3e9-2-lxd-0,juju-06c3e9-1-lxd-0
> > > mgr: juju-06c3e9-0-lxd-0(active), standbys: juju-06c3e9-1-lxd-0,
> > > juju-06c3e9-2-lxd-0
> > > osd: 3 osds: 3 up, 3 in
> > >
> > >   data:
> > > pools:   18 pools, 114 pgs
> > > objects: 0  objects, 0 B
> > > usage:   3.0 GiB used, 34 TiB / 34 TiB avail
> > > pgs: 100.000% pgs unknown
> > >  114 unknown/
> > > ---
> > >
> > > *PG health as well shows the PGs are in inactive state*
> > >
> > > ---
> > > /# ceph health detail
> > > HEALTH_WARN Reduced data availability: 114 pgs inactive
> > > PG_AVAILABILITY Reduced data availability: 114 pgs inactive
> > > pg 1.0 is stuck inactive for 1454.593774, current state unknown,
> > > last acting []
> > > pg 1.1 is stuck inactive for 1454.593774, current state unknown,
> > > last acting []
> > > pg 1.2 is stuck inactive for 1454.593774, current state unknown,
> > > last acting []
> > > pg 1.3 is stuck inactive for 1454.593774, current state unknown,
> > > last 

Re: [ceph-users] Ceph pg in inactive state

2019-10-30 Thread soumya tr
Thanks, Wido for the update.

Yeah, I have already tried restarting ceph-mgr, but it didn't help.
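
(What I tried was roughly the following, together with a check that the
mgr is actually active; the systemd unit name depends on the deployment
and is only an example here:)

$ systemctl restart ceph-mgr@juju-06c3e9-0-lxd-0
$ ceph mgr dump | grep -E '"active_name"|"available"'
$ ceph -s    # pgs should leave 'unknown' once the mgr receives fresh stats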

On Wed, Oct 30, 2019 at 4:30 PM Wido den Hollander  wrote:

>
>
> On 10/30/19 3:04 AM, soumya tr wrote:
> > Hi all,
> >
> > I have a 3 node ceph cluster setup using juju charms. ceph health shows
> > having inactive pgs.
> >
> > ---
> > /# ceph status
> >   cluster:
> > id: 0e36956e-ef64-11e9-b472-00163e6e01e8
> > health: HEALTH_WARN
> > Reduced data availability: 114 pgs inactive
> >
> >   services:
> > mon: 3 daemons, quorum
> > juju-06c3e9-0-lxd-0,juju-06c3e9-2-lxd-0,juju-06c3e9-1-lxd-0
> > mgr: juju-06c3e9-0-lxd-0(active), standbys: juju-06c3e9-1-lxd-0,
> > juju-06c3e9-2-lxd-0
> > osd: 3 osds: 3 up, 3 in
> >
> >   data:
> > pools:   18 pools, 114 pgs
> > objects: 0  objects, 0 B
> > usage:   3.0 GiB used, 34 TiB / 34 TiB avail
> > pgs: 100.000% pgs unknown
> >  114 unknown/
> > ---
> >
> > *PG health as well shows the PGs are in inactive state*
> >
> > ---
> > /# ceph health detail
> > HEALTH_WARN Reduced data availability: 114 pgs inactive
> > PG_AVAILABILITY Reduced data availability: 114 pgs inactive
> > pg 1.0 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 1.1 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 1.2 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 1.3 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 1.4 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 1.5 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 1.6 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 1.7 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 1.8 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 1.9 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 1.a is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 2.0 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 2.1 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 3.0 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 3.1 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 4.0 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 4.1 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 5.0 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 5.1 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 6.0 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 6.1 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 7.0 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 7.1 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 8.0 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 8.1 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 9.0 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 9.1 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 10.1 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 11.0 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 17.10 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 17.11 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 17.12 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 17.13 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 17.14 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 17.15 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 17.16 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 17.17 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 17.18 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 17.19 is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 17.1a is stuck inactive for 1454.593774, current state unknown,
> > last acting []
> > pg 18.10 is stuck inactive for 

[ceph-users] Using multisite to migrate data between bucket data pools.

2019-10-30 Thread David Turner
This is a tangent on Paul Emmerich's response to "[ceph-users] Correct
Migration Workflow Replicated -> Erasure Code". I've tried Paul's method
before to migrate between 2 data pools. However, I ran into some issues.

The first issue seems like a bug in RGW: the RGW for the new zone was
able to pull data directly from the data pool of the original zone once
the metadata had been synced. Because the metadata said the object
existed, the new zone's RGW went ahead and read it from the pool backing
the other zone. I worked around that somewhat by using cephx to restrict
which pools each RGW user can access, but that turns the error into
permission denied instead of file not found. This happens on buckets that
are set not to replicate as well as on buckets that failed to sync
properly. It seems like a bit of a security concern, but not a common
situation at all.
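
(The cephx change was of roughly this shape; the entity and pool names
here are made up for illustration, and a real RGW key needs access to all
of its zone's pools:)

$ ceph auth caps client.rgw.zone2 \
    mon 'allow rw' \
    osd 'allow rwx pool=zone2.rgw.buckets.data, allow rwx pool=zone2.rgw.buckets.index, allow rwx pool=zone2.rgw.log'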

The second issue, I think, has to do with corrupt index objects in my
index pool. Some of the buckets I don't need any more, so I tried to
delete them for simplicity, but the command failed. I have set those
aside for now and can simply mark the ones I no longer need as not
replicating at the bucket level. That works for most things, but there
are a few buckets I do need to migrate, and when I enable replication on
them the data sync between zones gets stuck. Does anyone have any ideas
on how to clean up the bucket indexes so these operations become
possible?
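
(The only stock tooling I can think of for the index side is
radosgw-admin's bucket check, sketched below with a placeholder bucket
name; I don't know yet whether it helps here:)

$ radosgw-admin bucket check --bucket=<bucket-name>
$ radosgw-admin bucket check --bucket=<bucket-name> --check-objects --fix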

At this point I've disabled multisite and cleared up the new zone so I can
run operations on these buckets without dealing with multisite and
replication. I've tried a few things and can get some additional
information on my specific errors tomorrow at work.


-- Forwarded message -
From: Paul Emmerich 
Date: Wed, Oct 30, 2019 at 4:32 AM
Subject: [ceph-users] Re: Correct Migration Workflow Replicated -> Erasure
Code
To: Konstantin Shalygin 
Cc: Mac Wynkoop , ceph-users 


We've solved this off-list (because I already got access to the cluster)

For the list:

Copying at the rados level is possible, but it requires shutting down
radosgw to get a consistent copy. That wasn't feasible here due to the
size and the performance impact.
Instead we've added a second zone, whose placement maps to an EC pool,
to the zonegroup, and it is currently copying over the data. We'll then
make the second zone master and default and ultimately delete the first
one.
This allows for a migration without downtime.
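
(Very roughly, the second-zone setup looks like the sketch below; the
pool, zone, and profile names, the endpoint, and the keys are
placeholders rather than the commands actually run here:)

$ ceph osd pool create zone2.rgw.buckets.data 1024 1024 erasure <ec-profile>
$ radosgw-admin zone create --rgw-zonegroup=default --rgw-zone=zone2 \
      --endpoints=http://<rgw2-host>:8080 --access-key=<key> --secret=<secret>
$ radosgw-admin zone placement modify --rgw-zone=zone2 \
      --placement-id=default-placement --data-pool=zone2.rgw.buckets.data
$ radosgw-admin period update --commit
# then start a radosgw instance for zone2 and let the data sync run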

Another possibility would be using a Transition lifecycle rule, but
that's not ideal because it doesn't actually change the bucket.

I don't think it would be too complicated to add a native bucket
migration mechanism that works similar to "bucket rewrite" (which is
intended for something similar but different).

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


Re: [ceph-users] cephfs 1 large omap objects

2019-10-30 Thread Jake Grimmett
Hi Zheng,

Many thanks for your helpful post, I've done the following:

1) set the threshold to 1024 * 1024:

# ceph config set osd \
osd_deep_scrub_large_omap_object_key_threshold 1048576

2) deep scrubbed all of the pgs on the two OSDs that reported "Large omap
object found." - these were all in pool 1, which has just four OSDs.


Result: After 30 minutes, all deep-scrubs completed, and all "large omap
objects" warnings disappeared.

...should we be worried about the size of these OMAP objects?

again many thanks,

Jake

On 10/30/19 3:15 AM, Yan, Zheng wrote:
> see https://tracker.ceph.com/issues/42515.  just ignore the warning for now
> 
> On Mon, Oct 7, 2019 at 7:50 AM Nigel Williams
>  wrote:
>>
>> Out of the blue this popped up (on an otherwise healthy cluster):
>>
>> HEALTH_WARN 1 large omap objects
>> LARGE_OMAP_OBJECTS 1 large omap objects
>> 1 large objects found in pool 'cephfs_metadata'
>> Search the cluster log for 'Large omap object found' for more details.
>>
>> "Search the cluster log" is somewhat opaque, there are logs for many 
>> daemons, what is a "cluster" log? In the ML history some found it in the OSD 
>> logs?
>>
>> Another post suggested removing lost+found, but using cephfs-shell I don't 
>> see one at the top-level, is there another way to disable this "feature"?
>>
>> thanks.
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


--
Jake Grimmett
MRC Laboratory of Molecular Biology
Francis Crick Avenue,
Cambridge CB2 0QH, UK.



Re: [ceph-users] Inconsistents + FAILED assert(recovery_info.oi.legacy_snaps.size())

2019-10-30 Thread Jérémy Gardais
The "best" health i was able to get was :
HEALTH_ERR norecover flag(s) set; 1733/37482459 objects misplaced (0.005%); 5 
scrub errors; Possible data damage: 2 pgs inconsistent; Degraded data 
redundancy: 7461/37482459 objects degraded (0.020%), 24 pgs degraded, 2 pgs 
undersized
OSDMAP_FLAGS norecover flag(s) set
OBJECT_MISPLACED 1733/37482459 objects misplaced (0.005%)
OSD_SCRUB_ERRORS 5 scrub errors
PG_DAMAGED Possible data damage: 2 pgs inconsistent
pg 2.2ba is active+clean+inconsistent, acting [42,29,30]
pg 2.2bb is active+clean+inconsistent, acting [25,42,18]
PG_DEGRADED Degraded data redundancy: 7461/37482459 objects degraded (0.020%), 
24 pgs degraded, 2 pgs undersized
pg 2.3e is active+recovery_wait+degraded, acting [27,31,5]
pg 2.9d is active+recovery_wait+degraded, acting [27,22,37]
pg 2.a3 is active+recovery_wait+degraded, acting [27,30,35]
pg 2.136 is active+recovery_wait+degraded, acting [27,18,22]
pg 2.150 is active+recovery_wait+degraded, acting [27,19,35]
pg 2.15e is active+recovery_wait+degraded, acting [27,11,36]
pg 2.1d9 is stuck undersized for 14023.243179, current state 
active+undersized+degraded+remapped+backfill_wait, last acting [25,30]
pg 2.20f is active+recovery_wait+degraded, acting [27,30,2]
pg 2.2a1 is active+recovery_wait+degraded, acting [27,18,35]
pg 2.2b7 is active+recovery_wait+degraded, acting [27,18,36]
pg 2.386 is active+recovery_wait+degraded, acting [27,42,17]
pg 2.391 is active+recovery_wait+degraded, acting [27,15,36]
pg 2.448 is stuck undersized for 51520.798900, current state 
active+recovery_wait+undersized+degraded+remapped, last acting [27,38]
pg 2.456 is active+recovery_wait+degraded, acting [27,5,43]
pg 2.45a is active+recovery_wait+degraded, acting [27,43,36]
pg 2.45f is active+recovery_wait+degraded, acting [27,16,36]
pg 2.46c is active+recovery_wait+degraded, acting [27,30,38]
pg 2.4bf is active+recovery_wait+degraded, acting [27,39,18]
pg 2.522 is active+recovery_wait+degraded, acting [27,17,3]
pg 2.535 is active+recovery_wait+degraded, acting [27,29,36]
pg 2.55a is active+recovery_wait+degraded, acting [27,29,18]
pg 5.23f is active+recovery_wait+degraded, acting [27,39,18]
pg 5.356 is active+recovery_wait+degraded, acting [27,36,15]
pg 5.4a6 is active+recovery_wait+degraded, acting [29,40,30]


After that, the flapping started again:
2019-10-30 12:55:46.772593 mon.r730xd1 [INF] osd.38 failed 
(root=default,datacenter=IPR,room=11B,rack=baie2,host=r740xd1) (connection 
refused reported by osd.22)
2019-10-30 12:55:46.850239 mon.r730xd1 [INF] osd.27 failed 
(root=default,datacenter=IPR,room=11B,rack=baie2,host=r730xd3) (connection 
refused reported by osd.19)
2019-10-30 12:55:56.714029 mon.r730xd1 [WRN] Health check update: 2 osds down 
(OSD_DOWN)


Setting "norecover" flag allow these 2 OSDs to recover up state and
limit the flapping states and many backfills.
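
(For the record, that is just the usual flag toggle, to be reverted once
things settle:)

$ ceph osd set norecover
# ... later, once the OSDs are stable again ...
$ ceph osd unset norecover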


In both the osd.27 and osd.38 logs, I still find these entries before a
FAILED assert:
-2> 2019-10-30 12:52:31.999571 7fb5c125b700  5 -- 129.20.177.3:6802/870834 
>> 129.20.177.3:6808/810999 conn(0x564c31d31000 :6802 
s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=97 cs=7 l=0). rx osd.25 seq 
2600 0x564c3a02e1c0 MOSDPGPush(2.1d9 194334/194298 
[PushOp(2:9b97b818:::rbd_data.0c16b76b8b4567.0001426e:5926, version: 
127481'7241006, data_included: [], data_size: 0, omap_header_size: 0, 
omap_entries_size: 0, attrset_size: 1, recovery_info: 
ObjectRecoveryInfo(2:9b97b818:::rbd_data.0c16b76b8b4567.0001426e:5926@127481'7241006,
 size: 4194304, copy_subset: [], clone_subset: {}, snapset: 0=[]:[]), 
after_progress: ObjectRecoveryProgress(!first, data_recovered_to:0, 
data_complete:true, omap_recovered_to:, omap_complete:true, error:false), 
before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, 
data_complete:false, omap_recovered_to:, omap_complete:false, error:false))]) v3
-1> 2019-10-30 12:52:31.999633 7fb5c125b700  1 -- 129.20.177.3:6802/870834 
<== osd.25 129.20.177.3:6808/810999 2600  MOSDPGPush(2.1d9 194334/194298 
[PushOp(2:9b97b818:::rbd_data.0c16b76b8b4567.0001426e:5926, version: 
127481'7241006, data_included: [], data_size: 0, omap_header_size: 0, 
omap_entries_size: 0, attrset_size: 1, recovery_info: 
ObjectRecoveryInfo(2:9b97b818:::rbd_data.0c16b76b8b4567.0001426e:5926@127481'7241006,
 size: 4194304, copy_subset: [], clone_subset: {}, snapset: 0=[]:[]), 
after_progress: ObjectRecoveryProgress(!first, data_recovered_to:0, 
data_complete:true, omap_recovered_to:, omap_complete:true, error:false), 
before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, 
data_complete:false, omap_recovered_to:, omap_complete:false, error:false))]) 
v3  909+0+0 (4137937376 0 0) 0x564c3a02e1c0 con 0x564c31d31000
0> 2019-10-30 12:52:32.008605 7fb59b03d700 -1 
/build/ceph-12.2.12/src/osd/PrimaryLogPG.cc: In function 'virtual void 

Re: [ceph-users] ceph-ansible / block-db block-wal

2019-10-30 Thread Lars Täuber
I don't use ansible anymore. But this was my config for the host onode1:

./host_vars/onode2.yml:

lvm_volumes:
  - data: /dev/sdb
db: '1'
db_vg: host-2-db
  - data: /dev/sdc
db: '2'
db_vg: host-2-db
  - data: /dev/sde
db: '3'
db_vg: host-2-db
  - data: /dev/sdf
db: '4'
db_vg: host-2-db
…

one config file per host. The LVs were created by hand on a PV on a
RAID1 over two SSDs.
The hosts had empty slots for HDDs to be bought later, so I had to
"partition" the PV by hand, because ansible would otherwise use the whole
RAID1 for just the HDDs present.

It is said that only certain sizes of DB & WAL partitions are sensible.
I now use 58GiB LVs.
The remaining space in the RAID1 is used for a faster OSD.
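
The by-hand part was nothing more than the usual LVM commands on the
RAID device; roughly (the md device name is only an example, the VG/LV
names match the config above):

$ pvcreate /dev/md0
$ vgcreate host-2-db /dev/md0
$ lvcreate -L 58G -n 1 host-2-db    # one ~58 GiB LV per OSD; '1' matches db: '1' above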


Lars


Wed, 30 Oct 2019 10:02:23 +
CUZA Frédéric  ==> "ceph-users@lists.ceph.com" 
 :
> Hi Everyone,
> 
> Does anyone know how to indicate block-db and block-wal to device on ansible ?
> In ceph-deploy it is quite easy :
> ceph-deploy osd create osd_host08 --data /dev/sdl --block-db /dev/sdm12 
> --block-wal /dev/sdn12 -bluestore
> 
> On my data nodes I have 12 HDDs and 2 SSDs I use those SSDs for block-db and 
> block-wal.
> How to indicate for each osd which partition to use ?
> 
> And finally, how do you handle the deployment if you have multiple data nodes 
> setup ?
> SSDs on sdm and sdn on one host and SSDs on sda and sdb on another ?
> 
> Thank you for your help.
> 
> Regards,


-- 
Informationstechnologie
Berlin-Brandenburgische Akademie der Wissenschaften
Jägerstraße 22-23  10117 Berlin
Tel.: +49 30 20370-352   http://www.bbaw.de


Re: [ceph-users] Inconsistents + FAILED assert(recovery_info.oi.legacy_snaps.size())

2019-10-30 Thread Jérémy Gardais
Thus spake Brad Hubbard (bhubb...@redhat.com) on Wednesday 30 October 2019 at
12:50:50:
> Maybe you should set nodown and noout while you do these maneuvers?
> That will minimise peering and recovery (data movement).

As the commands don't take too long, I just had a few slow requests before
the OSD was back online. Thanks for the nodown|noout tip.

> > snapid 22772 from osd.29 and osd.42 :
> > ceph-objectstore-tool --pgid 2.2ba --data-path /var/lib/ceph/osd/ceph-29/ 
> > '["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.425f","key":"","snapid":22772,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]'
> >  remove
> > ceph-objectstore-tool --pgid 2.2ba --data-path /var/lib/ceph/osd/ceph-42/ 
> > '["2.2ba",{"oid":"rbd_data.b4537a2ae8944a.425f","key":"","snapid":22772,"hash":719609530,"max":0,"pool":2,"namespace":"","max":0}]'
> >  remove
>
> That looks right.

Done, preceded by some dump, get-attrs, … commands. Not sure they were
really needed, but I wanted to be cautious ^^

The PG still looks inconsistent. I asked for a deep-scrub of 2.2ba and am
still waiting. `list-inconsistent-obj` and `list-inconsistent-snapset`
return "No scrub information available for pg 2.2ba" for the moment.

I also tried to handle pg 2.371 with:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-27/ 
'["2.371",{"oid":"rbd_data.0c16b76b8b4567.000420bb","key":"","snapid":22822,"hash":3394498417,"max":0,"pool":2,"namespace":"","max":0}]'
 remove

This one doesn't look inconsistent anymore, but I also asked for a
deep-scrub.


> You should probably try and work out what caused the issue and take
> steps to minimise the likelihood of a recurrence. This is not expected
> behaviour in a correctly configured and stable environment.

Yes… I'll wait a little to see what happens with these commands first,
and keep an eye on the cluster health and logs…


--
Gardais Jérémy
Institut de Physique de Rennes
Université Rennes 1
Téléphone: 02-23-23-68-60
Mail & bonnes pratiques: http://fr.wikipedia.org/wiki/Nétiquette
---


[ceph-users] ceph-ansible / block-db block-wal

2019-10-30 Thread CUZA Frédéric
Hi Everyone,

Does anyone know how to specify the block-db and block-wal devices with
ceph-ansible?
In ceph-deploy it is quite easy:
ceph-deploy osd create osd_host08 --data /dev/sdl --block-db /dev/sdm12
--block-wal /dev/sdn12 --bluestore

On my data nodes I have 12 HDDs and 2 SSDs; I use those SSDs for
block-db and block-wal.
How do I indicate which partition each OSD should use?

And finally, how do you handle the deployment when the data nodes are
set up differently?
SSDs on sdm and sdn on one host and SSDs on sda and sdb on another?

Thank you for your help.

Regards,


Re: [ceph-users] CephFS client hanging and cache issues

2019-10-30 Thread Bob Farrell
Thanks a lot, and sorry for the spam, I should have checked! We are on
18.04; the kernel is upgrading right now, so if you don't hear back from
me then it is fixed.

Thanks for the amazing support!

On Wed, 30 Oct 2019, 09:54 Lars Täuber,  wrote:

> Hi.
>
> Sounds like you use kernel clients with kernels from canonical/ubuntu.
> Two kernels have a bug:
> 4.15.0-66
> and
> 5.0.0-32
>
> Updated kernels are said to have fixes.
> Older kernels also work:
> 4.15.0-65
> and
> 5.0.0-31
>
>
> Lars
>
> Wed, 30 Oct 2019 09:42:16 +
> Bob Farrell  ==> ceph-users 
> :
> > Hi. We are experiencing a CephFS client issue on one of our servers.
> >
> > ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus
> > (stable)
> >
> > Trying to access, `umount`, or `umount -f` a mounted CephFS volumes
> causes
> > my shell to hang indefinitely.
> >
> > After a reboot I can remount the volumes cleanly but they drop out after
> <
> > 1 hour of use.
> >
> > I see this log entry multiple times when I reboot the server:
> > ```
> > cache_from_obj: Wrong slab cache. inode_cache but object is from
> > ceph_inode_info
> > ```
> > The machine then reboots after approx. 30 minutes.
> >
> > All other Ceph/CephFS clients and servers seem perfectly happy. CephFS
> > cluster is HEALTH_OK.
> >
> > Any help appreciated. If I can provide any further details please let me
> > know.
> >
> > Thanks in advance,
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


Re: [ceph-users] CephFS client hanging and cache issues

2019-10-30 Thread Lars Täuber
Hi.

Sounds like you use kernel clients with kernels from canonical/ubuntu.
Two kernels have a bug:
4.15.0-66
and
5.0.0-32

Updated kernels are said to have fixes.
Older kernels also work: 
4.15.0-65
and
5.0.0-31


Lars

Wed, 30 Oct 2019 09:42:16 +
Bob Farrell  ==> ceph-users  :
> Hi. We are experiencing a CephFS client issue on one of our servers.
> 
> ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus
> (stable)
> 
> Trying to access, `umount`, or `umount -f` a mounted CephFS volumes causes
> my shell to hang indefinitely.
> 
> After a reboot I can remount the volumes cleanly but they drop out after <
> 1 hour of use.
> 
> I see this log entry multiple times when I reboot the server:
> ```
> cache_from_obj: Wrong slab cache. inode_cache but object is from
> ceph_inode_info
> ```
> The machine then reboots after approx. 30 minutes.
> 
> All other Ceph/CephFS clients and servers seem perfectly happy. CephFS
> cluster is HEALTH_OK.
> 
> Any help appreciated. If I can provide any further details please let me
> know.
> 
> Thanks in advance,


Re: [ceph-users] CephFS client hanging and cache issues

2019-10-30 Thread Paul Emmerich
Kernel bug due to a bad backport, see recent posts here.


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Wed, Oct 30, 2019 at 10:42 AM Bob Farrell  wrote:
>
> Hi. We are experiencing a CephFS client issue on one of our servers.
>
> ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus 
> (stable)
>
> Trying to access, `umount`, or `umount -f` a mounted CephFS volumes causes my 
> shell to hang indefinitely.
>
> After a reboot I can remount the volumes cleanly but they drop out after < 1 
> hour of use.
>
> I see this log entry multiple times when I reboot the server:
> ```
> cache_from_obj: Wrong slab cache. inode_cache but object is from 
> ceph_inode_info
> ```
> The machine then reboots after approx. 30 minutes.
>
> All other Ceph/CephFS clients and servers seem perfectly happy. CephFS 
> cluster is HEALTH_OK.
>
> Any help appreciated. If I can provide any further details please let me know.
>
> Thanks in advance,
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CephFS client hanging and cache issues

2019-10-30 Thread Bob Farrell
Hi. We are experiencing a CephFS client issue on one of our servers.

ceph version 14.2.0 (3a54b2b6d167d4a2a19e003a705696d4fe619afc) nautilus
(stable)

Trying to access, `umount`, or `umount -f` a mounted CephFS volumes causes
my shell to hang indefinitely.

After a reboot I can remount the volumes cleanly but they drop out after <
1 hour of use.

I see this log entry multiple times when I reboot the server:
```
cache_from_obj: Wrong slab cache. inode_cache but object is from
ceph_inode_info
```
The machine then reboots after approx. 30 minutes.

All other Ceph/CephFS clients and servers seem perfectly happy. CephFS
cluster is HEALTH_OK.

Any help appreciated. If I can provide any further details please let me
know.

Thanks in advance,


Re: [ceph-users] changing set-require-min-compat-client will cause hiccup?

2019-10-30 Thread Konstantin Shalygin

Hi, I need to change set-require-min-compat-client to use upmap mode for
the PG balancer. Will this cause a disconnect of all clients? We're
talking CephFS and RBD images for VMs.
Or is it safe to switch that live?

Is safe.
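
Roughly, the switch plus enabling the balancer is just the sketch below.
Note that set-require-min-compat-client refuses to proceed if any
connected client is older than the requested release, unless you force it.

$ ceph osd set-require-min-compat-client luminous
$ ceph balancer mode upmap
$ ceph balancer on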



k



[ceph-users] changing set-require-min-compat-client will cause hiccup?

2019-10-30 Thread Philippe D'Anjou
Hi, I need to change set-require-min-compat-client to use upmap mode for
the PG balancer. Will this cause a disconnect of all clients? We're
talking CephFS and RBD images for VMs.
Or is it safe to switch that live?
thanks


[ceph-users] very high ram usage by OSDs on Nautilus

2019-10-30 Thread Philippe D'Anjou
Yes, you were right: somehow an unusually high memory target was set,
and I'm not sure where it came from. I have set it back to normal now,
which should fix it.
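
(For anyone hitting the same thing, the setting in question is presumably
osd_memory_target; checking and resetting it looks roughly like this:)

$ ceph config get osd.0 osd_memory_target
$ ceph config set osd osd_memory_target 4294967296    # back to the ~4 GiB default
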
Thanks