Re: [ceph-users] PG auto repair with BlueStore

2018-11-15 Thread Mark Schouten

Which, as a user, is very surprising to me too.
--

Mark Schouten  | Tuxis Internet Engineering
KvK: 61527076  | http://www.tuxis.nl/
T: 0318 200208 | i...@tuxis.nl
 



- Original Message -


From: Wido den Hollander (w...@42on.com)
Date: 16-11-2018 08:25
To: Mark Schouten (m...@tuxis.nl)
Cc: Ceph Users (ceph-us...@ceph.com)
Subject: Re: [ceph-users] PG auto repair with BlueStore


On 11/15/18 7:45 PM, Mark Schouten wrote:
> As a user, I’m very surprised that this isn’t a default setting. 
> 

That is because you can also have FileStore OSDs in a cluster, on which
such an auto-repair is not safe.

Wido

> Mark Schouten
> 
>> On 15 Nov 2018 at 18:40, Wido den Hollander wrote the
>> following:
>>
>> Hi,
>>
>> This question is actually still outstanding. Is there any good reason to
>> keep auto repair for scrub errors disabled with BlueStore?
>>
>> I couldn't think of a reason when using size=3 and min_size=2, so just
>> wondering.
>>
>> Thanks!
>>
>> Wido
>>
>>> On 8/24/18 8:55 AM, Wido den Hollander wrote:
>>> Hi,
>>>
>>> osd_scrub_auto_repair still defaults to false and I was wondering how we
>>> think about enabling this feature by default.
>>>
>>> Would we say it's safe to enable this with BlueStore?
>>>
>>> Wido


Re: [ceph-users] PG auto repair with BlueStore

2018-11-15 Thread Wido den Hollander


On 11/15/18 7:51 PM, koukou73gr wrote:
> Are there any means to notify the administrator that an auto-repair has
> taken place?

I don't think so. You'll see the cluster go to HEALTH_ERR for a while
before it turns to HEALTH_OK again after the PG has been repaired.

You would have to search the cluster logs to find out that an auto repair
took place on a Placement Group.
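
For example, something along these lines should surface them (a sketch; it
assumes the default central log location on a monitor node):

# search the central cluster log on a monitor host
grep -i 'repair' /var/log/ceph/ceph.log
# or ask the monitors for the most recent cluster log entries
ceph log last 1000 | grep -i 'repair'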

Wido

> 
> -K.
> 
> 
> On 2018-11-15 20:45, Mark Schouten wrote:
>> As a user, I’m very surprised that this isn’t a default setting.
>>
>> Mark Schouten
>>
>>> On 15 Nov 2018 at 18:40, Wido den Hollander wrote the
>>> following:
>>>
>>> Hi,
>>>
>>> This question is actually still outstanding. Is there any good reason to
>>> keep auto repair for scrub errors disabled with BlueStore?
>>>
>>> I couldn't think of a reason when using size=3 and min_size=2, so just
>>> wondering.
>>>
>>> Thanks!
>>>
>>> Wido
>>>
 On 8/24/18 8:55 AM, Wido den Hollander wrote:
 Hi,

 osd_scrub_auto_repair still defaults to false and I was wondering
 how we
 think about enabling this feature by default.

 Would we say it's safe to enable this with BlueStore?

 Wido


Re: [ceph-users] PG auto repair with BlueStore

2018-11-15 Thread Wido den Hollander


On 11/15/18 7:45 PM, Mark Schouten wrote:
> As a user, I’m very surprised that this isn’t a default setting. 
> 

That is because you can also have FileStore OSDs in a cluster, on which
such an auto-repair is not safe.
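
Whether a cluster still has FileStore OSDs can be checked from the OSD
metadata, for example (a sketch):

# count OSDs per object store backend
ceph osd metadata | jq -r '.[].osd_objectstore' | sort | uniq -c
# list any remaining FileStore OSDs by id
ceph osd metadata | jq -r '.[] | select(.osd_objectstore == "filestore") | "osd.\(.id)"'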

Wido

> Mark Schouten
> 
>> On 15 Nov 2018 at 18:40, Wido den Hollander wrote the
>> following:
>>
>> Hi,
>>
>> This question is actually still outstanding. Is there any good reason to
>> keep auto repair for scrub errors disabled with BlueStore?
>>
>> I couldn't think of a reason when using size=3 and min_size=2, so just
>> wondering.
>>
>> Thanks!
>>
>> Wido
>>
>>> On 8/24/18 8:55 AM, Wido den Hollander wrote:
>>> Hi,
>>>
>>> osd_scrub_auto_repair still defaults to false and I was wondering how we
>>> think about enabling this feature by default.
>>>
>>> Would we say it's safe to enable this with BlueStore?
>>>
>>> Wido


[ceph-users] RE: Migration osds to Bluestore on Ubuntu 14.04 Trusty

2018-11-15 Thread Klimenko, Roman
Ok, thx, I'll try ceph-disk.

From: Alfredo Deza
Sent: 15 November 2018, 20:16
To: Klimenko, Roman
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Migration osds to Bluestore on Ubuntu 14.04 Trusty

On Thu, Nov 15, 2018 at 8:57 AM Klimenko, Roman  wrote:
>
> Hi everyone!
>
> As I noticed, ceph-volume lacks Ubuntu Trusty compatibility  
> https://tracker.ceph.com/issues/23496
>
> So, I can't follow this instruction 
> http://docs.ceph.com/docs/mimic/rados/operations/bluestore-migration/
>
> Do I have any other option to migrate my Filestore osds (Luminous 12.2.9)  to 
> Bluestore?
>
> P.S This is a test environment, so I can try anything

You could just use ceph-disk, but the way ceph-volume does BlueStore
is more robust. I would try really hard to upgrade the OS so that you
can rely on ceph-volume.



Re: [ceph-users] read performance, separate client CRUSH maps or limit osd read access from each client

2018-11-15 Thread Vlad Kopylov
Exactly. But write operations should go to all nodes.

v

On Wed, Nov 14, 2018 at 9:52 PM Konstantin Shalygin  wrote:

> On 11/15/18 9:31 AM, Vlad Kopylov wrote:
> > Thanks Konstantin, I already tried accessing it in different ways and
> > best I got is bulk renamed files and other non presentable data.
> >
> > Maybe to solve this I can create overlapping osd pools?
> > Like one pool includes all 3 osd for replication, and 3 more include
> > one osd at each site with same blocks?
> >
>
> As far as I understand, you need something like this:
>
>
> vm1 io -> building1 osds only
>
> vm2 io -> building2 osds only
>
> vm3 io -> building3 osds only
>
>
> Right?
>
>
>
> k
>
>


Re: [ceph-users] Librbd performance VS KRBD performance

2018-11-15 Thread Jason Dillaman
On Thu, Nov 15, 2018 at 2:30 PM 赵赵贺东  wrote:
>
> I tested in the 12-OSD cluster, changing objecter_inflight_op_bytes from 100MB to
> 300MB; performance does not seem to change noticeably.
> But from the beginning, librbd performed better in the 12-OSD cluster,
> so this test seems less meaningful to me.
>  In a small cluster(12 osds), 4m seq write performance for Librbd VS KRBD 
>  is about 0.89 : 1 (177MB/s : 198MB/s ).
>  In a big cluster (72 osds), 4m seq write performance for Librbd VS KRBD 
>  is about  0.38: 1 (420MB/s : 1080MB/s).
>
>
> Our problem is librbd's bad performance in the big cluster (72 OSDs).
> But I cannot test with 72 OSDs right now; some other tests are running.
> I will test with 72 OSDs when our cluster is ready.
>
> It is a little hard to understand that objecter_inflight_op_bytes=100MB works
> well in a 12-OSD cluster but poorly in a 72-OSD cluster.
> Does objecter_inflight_op_bytes have no effect on krbd, and only affect
> librbd?

Correct -- the "ceph.conf" config settings are for user-space tooling
only. Given the fact that you are writing full 4MiB objects in your
test, any user-space performance degradation is probably going to be
in the librados layer and below. That 100 MiB limit setting will block
the IO path while it waits for in-flight IO to complete. You also
might be just hitting the default throughput of the lower-level
messenger code, so perhaps you need to throw more threads at it
(ms_async_op_threads / ms_async_max_op_threads) or change its
throttles (ms_dispatch_throttle_bytes). Also, depending on your
cluster and krbd versions, perhaps the OSDs are telling your clients
to back-off but only librados is responding to it. You should also
take into account the validity of your test case -- does it really
match your expected workload that you are trying to optimize against?
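
For example, a client-side override to experiment with might look something
like this (a sketch for the fio rbd engine / librbd client; the values are
purely illustrative, not recommendations):

[client]
    # raise the objecter throttles that gate librados in-flight IO
    objecter_inflight_op_bytes = 1073741824
    objecter_inflight_ops = 10240
    # messenger-level knobs mentioned above
    ms_async_op_threads = 5
    ms_dispatch_throttle_bytes = 1073741824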

> Thanks.
>
>
>
> > On 15 November 2018, at 15:50, 赵赵贺东 wrote:
> >
> > Thank you for your suggestion.
> > It really gives me a lot of inspiration.
> >
> >
> > I will test per your suggestion, and browse through src/common/config_opts.h
> > to see if I can find some performance-related configs.
> >
> > But our OSD node hardware itself is very poor; that is the truth we have
> > to face.
> > There are two OSDs per ARM board, with 2 GB of memory and 2*10TB HDDs on board, so one
> > OSD has 1 GB of memory to support a 10TB HDD; we must try to make the cluster
> > work as well as we can.
> >
> >
> > Thanks.
> >
> >> On 15 November 2018, at 14:08, Jason Dillaman wrote:
> >>
> >> Attempting to send 256 concurrent 4MiB writes via librbd will pretty
> >> quickly hit the default "objecter_inflight_op_bytes = 100 MiB" limit,
> >> which will drastically slow (stall) librados. I would recommend
> >> re-testing librbd w/ a much higher throttle override.
> >> On Thu, Nov 15, 2018 at 11:34 AM 赵赵贺东  wrote:
> >>>
> >>> Thank you for your attention.
> >>>
> >>> Our test are in run in physical machine environments.
> >>>
> >>> Fio for KRBD:
> >>> [seq-write]
> >>> description="seq-write"
> >>> direct=1
> >>> ioengine=libaio
> >>> filename=/dev/rbd0
> >>> numjobs=1
> >>> iodepth=256
> >>> group_reporting
> >>> rw=write
> >>> bs=4M
> >>> size=10T
> >>> runtime=180
> >>>
> >>> */dev/rbd0 mapped by rbd_pool/image2, so KRBD & librbd fio test use the 
> >>> same image.
> >>>
> >>> Fio for librbd:
> >>> [global]
> >>> direct=1
> >>> numjobs=1
> >>> ioengine=rbd
> >>> clientname=admin
> >>> pool=rbd_pool
> >>> rbdname=image2
> >>> invalidate=0# mandatory
> >>> rw=write
> >>> bs=4M
> >>> size=10T
> >>> runtime=180
> >>>
> >>> [rbd_iodepth32]
> >>> iodepth=256
> >>>
> >>>
> >>> Image info:
> >>> rbd image 'image2':
> >>> size 50TiB in 13107200 objects
> >>> order 22 (4MiB objects)
> >>> data_pool: ec_rbd_pool
> >>> block_name_prefix: rbd_data.8.148bb6b8b4567
> >>> format: 2
> >>> features: layering, data-pool
> >>> flags:
> >>> create_timestamp: Wed Nov 14 09:21:18 2018
> >>>
> >>> * data_pool is a EC pool
> >>>
> >>> Pool info:
> >>> pool 8 'rbd_pool' replicated size 2 min_size 1 crush_rule 0 object_hash 
> >>> rjenkins pg_num 256 pgp_num 256 last_change 82627 flags hashpspool 
> >>> stripe_width 0 application rbd
> >>> pool 9 'ec_rbd_pool' erasure size 6 min_size 5 crush_rule 4 object_hash 
> >>> rjenkins pg_num 256 pgp_num 256 last_change 82649 flags 
> >>> hashpspool,ec_overwrites stripe_width 16384 application rbd
> >>>
> >>>
> >>> RBD cache: Off (because I think with tcmu the RBD cache is forced off,
> >>> and our cluster will export disks via iSCSI in the future.)
> >>>
> >>>
> >>> Thanks!
> >>>
> >>>
> >>> On 15 November 2018, at 13:22, Gregory Farnum wrote:
> >>>
> >>> You'll need to provide more data about how your test is configured and 
> >>> run for us to have a good idea. IIRC librbd is often faster than krbd 
> >>> because it can support newer features and things, but krbd may have less 
> >>> overhead and is not dependent on the VM's driver configuration in QEMU...
> >>>
> >>> On Thu, Nov 15, 2018 at 8:22 AM 赵赵贺东  wrote:
> 
>  Hi cephers,
> 

Re: [ceph-users] pg 17.36 is active+clean+inconsistent head expected clone 1 missing?

2018-11-15 Thread Frank Yu
Try restarting osd.29, then use pg repair. If this doesn't work, or the error
appears again after a while, scan the HDD used for osd.29; there may be bad
sectors on the disk, in which case just replace it with a new one.
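
Roughly something like this (a sketch; the device behind osd.29 is a
placeholder you would need to look up yourself):

systemctl restart ceph-osd@29
ceph pg repair 17.36
# check the backing drive for reallocated/pending sectors
smartctl -a /dev/sdX | egrep -i 'reallocated|pending|uncorrect'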



On Thu, Nov 15, 2018 at 5:00 PM Marc Roos  wrote:

>
> Forgot, these are bluestore osds
>
>
>
> -Original Message-
> From: Marc Roos
> Sent: Thursday, 15 November 2018 9:59
> To: ceph-users
> Subject: [ceph-users] pg 17.36 is active+clean+inconsistent head
> expected clone 1 missing?
>
>
>
> I thought I would give it another try and ask again here, since there is
> another related thread at the moment. I have been having this error for a year or so.
>
> This I of course already tried:
> ceph pg deep-scrub 17.36
> ceph pg repair 17.36
>
>
> [@c01 ~]# rados list-inconsistent-obj 17.36
> {"epoch":24363,"inconsistents":[]}
>
>
> [@c01 ~]# ceph pg map 17.36
> osdmap e24380 pg 17.36 (17.36) -> up [29,12,6] acting [29,12,6]
>
>
> [@c04 ceph]# zgrep ERR ceph-osd.29.log*gz
> ceph-osd.29.log-20181114.gz:2018-11-13 14:19:55.766604 7f25a05b1700 -1
> log_channel(cluster) log [ERR] : deep-scrub 17.36
> 17:6ca1f70a:::rbd_data.1f114174b0dc51.0974:head expected
> clone 17:6ca1f70a:::rbd_data.1f114174b0dc51.0974:4 1 missing
> ceph-osd.29.log-20181114.gz:2018-11-13 14:24:55.943454 7f25a05b1700 -1
> log_channel(cluster) log [ERR] : 17.36 deep-scrub 1 errors
>
>


-- 
Regards
Frank Yu


Re: [ceph-users] PG auto repair with BlueStore

2018-11-15 Thread Matthew Vernon

Hi,

[apropos auto-repair for scrub settings]

On 15/11/2018 18:45, Mark Schouten wrote:

As a user, I’m very surprised that this isn’t a default setting.


We've been too cowardly to do it so far; even on a large cluster the 
occasional ceph pg repair hasn't taken up too much admin time, and the 
fact it isn't enabled by default has put us off. This sometimes helps us 
spot OSD drives "on the way out" that haven't actually failed yet, but 
I'd be in favour of auto-repair iff we're confident it's safe (to be 
fair, ceph pg repair is the first port of call anyway, so it's not clear 
what we gain by having a human type it).
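
For reference, the knobs in question look roughly like this (option names as
of Luminous; a sketch, not a recommendation):

[osd]
    osd_scrub_auto_repair = true
    # only auto-repair scrubs that found at most this many errors (5 is the default)
    osd_scrub_auto_repair_num_errors = 5

or at runtime:

ceph tell osd.* injectargs '--osd_scrub_auto_repair=true'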


Regards,

Matthew


--
The Wellcome Sanger Institute is operated by Genome Research 
Limited, a charity registered in England with number 1021457 and a 
company registered in England with number 2742969, whose registered 
office is 215 Euston Road, London, NW1 2BE. 


Re: [ceph-users] PG auto repair with BlueStore

2018-11-15 Thread koukou73gr
Are there any means to notify the administrator that an auto-repair has 
taken place?


-K.


On 2018-11-15 20:45, Mark Schouten wrote:

As a user, I’m very surprised that this isn’t a default setting.

Mark Schouten


On 15 Nov 2018 at 18:40, Wido den Hollander wrote the
following:

Hi,

This question is actually still outstanding. Is there any good reason to
keep auto repair for scrub errors disabled with BlueStore?

I couldn't think of a reason when using size=3 and min_size=2, so just
wondering.

Thanks!

Wido


On 8/24/18 8:55 AM, Wido den Hollander wrote:
Hi,

osd_scrub_auto_repair still defaults to false and I was wondering how we
think about enabling this feature by default.

Would we say it's safe to enable this with BlueStore?

Wido


Re: [ceph-users] PG auto repair with BlueStore

2018-11-15 Thread Mark Schouten
As a user, I’m very surprised that this isn’t a default setting. 

Mark Schouten

> On 15 Nov 2018 at 18:40, Wido den Hollander wrote the
> following:
> 
> Hi,
> 
> This question is actually still outstanding. Is there any good reason to
> keep auto repair for scrub errors disabled with BlueStore?
> 
> I couldn't think of a reason when using size=3 and min_size=2, so just
> wondering.
> 
> Thanks!
> 
> Wido
> 
>> On 8/24/18 8:55 AM, Wido den Hollander wrote:
>> Hi,
>> 
>> osd_scrub_auto_repair still defaults to false and I was wondering how we
>> think about enabling this feature by default.
>> 
>> Would we say it's safe to enable this with BlueStore?
>> 
>> Wido


Re: [ceph-users] PG auto repair with BlueStore

2018-11-15 Thread Wido den Hollander
Hi,

This question is actually still outstanding. Is there any good reason to
keep auto repair for scrub errors disabled with BlueStore?

I couldn't think of a reason when using size=3 and min_size=2, so just
wondering.

Thanks!

Wido

On 8/24/18 8:55 AM, Wido den Hollander wrote:
> Hi,
> 
> osd_scrub_auto_repair still defaults to false and I was wondering how we
> think about enabling this feature by default.
> 
> Would we say it's safe to enable this with BlueStore?
> 
> Wido


Re: [ceph-users] Migration osds to Bluestore on Ubuntu 14.04 Trusty

2018-11-15 Thread Alfredo Deza
On Thu, Nov 15, 2018 at 8:57 AM Klimenko, Roman  wrote:
>
> Hi everyone!
>
> As I noticed, ceph-volume lacks Ubuntu Trusty compatibility  
> https://tracker.ceph.com/issues/23496
>
> So, I can't follow this instruction 
> http://docs.ceph.com/docs/mimic/rados/operations/bluestore-migration/
>
> Do I have any other option to migrate my Filestore osds (Luminous 12.2.9)  to 
> Bluestore?
>
> P.S This is a test environment, so I can try anything

You could just use ceph-disk, but the way ceph-volume does BlueStore
is more robust. I would try really hard to upgrade the OS so that you
can rely on ceph-volume.
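
If you do go the ceph-disk route, the Luminous-era invocation looks roughly
like this (a sketch; /dev/sdb is a placeholder and will be wiped):

# prepare a whole device as a BlueStore OSD (data, db and wal colocated)
ceph-disk prepare --bluestore /dev/sdb
# usually triggered automatically by udev, but can be run by hand
ceph-disk activate /dev/sdb1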



[ceph-users] RBD-mirror high cpu usage?

2018-11-15 Thread Magnus Grönlund
Hi,

I'm trying to set up one-way rbd-mirroring for a Ceph cluster used by an
OpenStack cloud, but the rbd-mirror is unable to "catch up" with the
changes. However, it appears to me that this is not due to the Ceph clusters
or the network, but due to the server running the rbd-mirror process running
out of CPU.

Is a high cpu load to be expected or is it a symptom of something else?
Or in other words, what can I check/do to get the mirroring working? 

# rbd mirror pool status nova
health: WARNING
images: 596 total
572 starting_replay
24 replaying

top - 13:31:36 up 79 days,  5:31,  1 user,  load average: 32.27, 26.82,
25.33
Tasks: 360 total,  17 running, 182 sleeping,   0 stopped,   0 zombie
%Cpu(s):  8.9 us, 70.0 sy,  0.0 ni, 18.5 id,  0.0 wa,  0.0 hi,  2.7 si,
0.0 st
KiB Mem : 13205185+total, 12862490+free,   579508 used,  2847444 buff/cache
KiB Swap:0 total,0 free,0 used. 12948856+avail Mem
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
2336553 ceph  20   0   17.1g 178160  20344 S 417.2  0.1  21:50.61
rbd-mirror
2312698 root  20   0   0  0  0 I  70.2  0.0  70:11.51
kworker/12:2
2312851 root  20   0   0  0  0 R  69.2  0.0  62:29.69
kworker/24:1
2324627 root  20   0   0  0  0 I  68.4  0.0  40:36.77
kworker/14:1
2235817 root  20   0   0  0  0 I  68.0  0.0 469:14.08
kworker/8:0
2241720 root  20   0   0  0  0 R  67.3  0.0 437:46.51
kworker/9:1
2306648 root  20   0   0  0  0 R  66.9  0.0 109:27.44
kworker/25:0
2324625 root  20   0   0  0  0 R  66.9  0.0  40:37.53
kworker/13:1
2336318 root  20   0   0  0  0 R  66.7  0.0  14:51.96
kworker/27:3
2324643 root  20   0   0  0  0 I  66.5  0.0  36:21.46
kworker/15:2
2294989 root  20   0   0  0  0 I  66.3  0.0 134:09.89
kworker/11:1
2324626 root  20   0   0  0  0 I  66.3  0.0  39:44.14
kworker/28:2
2324019 root  20   0   0  0  0 I  65.3  0.0  44:51.80
kworker/26:1
2235814 root  20   0   0  0  0 R  65.1  0.0 459:14.70
kworker/29:2
2294174 root  20   0   0  0  0 I  64.5  0.0 220:58.50
kworker/30:1
2324355 root  20   0   0  0  0 R  63.3  0.0  45:04.29
kworker/10:1
2263800 root  20   0   0  0  0 R  62.9  0.0 353:38.48
kworker/31:1
2270765 root  20   0   0  0  0 R  60.2  0.0 294:46.34
kworker/0:0
2294798 root  20   0   0  0  0 R  59.8  0.0 148:48.23
kworker/1:2
2307128 root  20   0   0  0  0 R  59.8  0.0  86:15.45
kworker/6:2
2307129 root  20   0   0  0  0 I  59.6  0.0  85:29.66
kworker/5:0
2294826 root  20   0   0  0  0 R  58.2  0.0 138:53.56
kworker/7:3
2294575 root  20   0   0  0  0 I  57.8  0.0 155:03.74
kworker/2:3
2294310 root  20   0   0  0  0 I  57.2  0.0 176:10.92
kworker/4:2
2295000 root  20   0   0  0  0 I  57.2  0.0 132:47.28
kworker/3:2
2307060 root  20   0   0  0  0 I  56.6  0.0  87:46.59
kworker/23:2
2294931 root  20   0   0  0  0 I  56.4  0.0 133:31.47
kworker/17:2
2318659 root  20   0   0  0  0 I  56.2  0.0  55:01.78
kworker/16:2
2336304 root  20   0   0  0  0 I  56.0  0.0  11:45.92
kworker/21:2
2306947 root  20   0   0  0  0 R  55.6  0.0  90:45.31
kworker/22:2
2270628 root  20   0   0  0  0 I  53.8  0.0 273:43.31
kworker/19:3
2294797 root  20   0   0  0  0 R  52.3  0.0 141:13.67
kworker/18:0
2330537 root  20   0   0  0  0 R  52.3  0.0  25:33.25
kworker/20:2

The main cluster has 12 nodes with 120 OSDs and the backup cluster has 6
nodes with 60 OSDs (but roughly the same amount of storage), the rbd-mirror
runs on a separate server with 2* E5-2650v2 cpus and 128GB memory.
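
In case it helps, this is roughly what I can check on the rbd-mirror host (a
sketch; the admin-socket path is my assumption based on the default naming):

# per-image replay state
rbd mirror pool status --verbose nova
# rbd-mirror internal counters via its admin socket
ceph daemon /var/run/ceph/ceph-client.rbd-mirror.*.asok perf dump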

Best regards
/Magnus


[ceph-users] Migration osds to Bluestore on Ubuntu 14.04 Trusty

2018-11-15 Thread Klimenko, Roman
Hi everyone!

As I noticed, ceph-volume lacks Ubuntu Trusty compatibility  
https://tracker.ceph.com/issues/23496

So, I can't follow this instruction 
http://docs.ceph.com/docs/mimic/rados/operations/bluestore-migration/

Do I have any other option to migrate my Filestore osds (Luminous 12.2.9)  to 
Bluestore?

P.S This is a test environment, so I can try anything



Re: [ceph-users] cephfs nfs-ganesha rados_cluster

2018-11-15 Thread Steven Vacaroaia
Thanks, Jeff, for taking the trouble to respond and for your willingness to help.

Here are some questions:

- Apparently rados_cluster is gone in 2.8; there are "fs" and "fs_ng" now.
  However, I was not able to find a config depicting their usage.
  Would you be able to share your working one?

- How would one interpret the output of ganesha-rados-grace?
  (What do NE and E mean, and what actions should one take when they
  appear?)

- How would one check whether active/active is working properly (i.e. both
  NFS servers are being used)?

I was able to get active/passive working using rados_ng and Pacemaker.
Is there anything Pacemaker-specific that has to be done to get
active-active working (assuming, of course, that Ganesha is configured
properly)?
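
For context, the rados_cluster recovery block I have been experimenting with
looks roughly like this (pool, userid and nodeid are placeholder values of
mine, not anything from your setup):

RADOS_KV {
    ceph_conf = "/etc/ceph/ceph.conf";
    userid = "ganesha";
    pool = "nfs-ganesha";
    nodeid = "nfs1";
}

NFSv4 {
    RecoveryBackend = rados_cluster;
    Minor_Versions = 1,2;
}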

many thanks
Steven

On Thu, 15 Nov 2018 at 06:53, Jeff Layton  wrote:

> > Hi,
> >
> > I've been trying to setup an active active ( or even active passive) NFS
> share for a while without any success
> >
> > Using Mimic 13.2.2 and nfs-ganesha 2.8 with rados_cluster as recovery
> mechanism
> >
> > I focused on corosync/pacemaker as a HA controlling software but I would
> not mind using anything else
> >
> > Has any of you managed to get this working ?
> > If yes, could you please provide some detail / instructions / resources
> / configuration ?
>
> I've gotten it working, but I wrote most of the code so that shouldn't
> be too surprising. The docs are still pretty sketchy at this point, but
> most of the info is distilled into the sample config file and the
> ganesha-rados-grace manpage.
>
> Writing a real howto is on my to-do list but I'm not sure when I'll get
> to it. If you have specific questions, I'm happy to try and answer them
> though.
>
> --
> Jeff Layton 
>
>


[ceph-users] Removing orphaned radosgw bucket indexes from pool

2018-11-15 Thread Wido den Hollander
Hi,

Recently we've seen multiple messages on the mailing lists about people
seeing HEALTH_WARN due to large OMAP objects on their cluster. This is
because, starting with 12.2.6, OSDs warn about this.

I've got multiple people asking me the same questions and I've done some
digging around.

Somebody on the ML wrote this script:

for bucket in `radosgw-admin metadata list bucket | jq -r '.[]' | sort`; do
  actual_id=`radosgw-admin bucket stats --bucket=${bucket} | jq -r '.id'`
  for instance in `radosgw-admin metadata list bucket.instance | jq -r
'.[]' | grep ${bucket}: | cut -d ':' -f 2`
  do
if [ "$actual_id" != "$instance" ]
then
  radosgw-admin bi purge --bucket=${bucket} --bucket-id=${instance}
  radosgw-admin metadata rm bucket.instance:${bucket}:${instance}
fi
  done
done

That partially works, but it does not handle 'orphaned' objects in the index pool.

So I wrote my own script [0]:

#!/bin/bash
INDEX_POOL=$1

if [ -z "$INDEX_POOL" ]; then
echo "Usage: $0 "
exit 1
fi

INDEXES=$(mktemp)
METADATA=$(mktemp)

trap "rm -f ${INDEXES} ${METADATA}" EXIT

radosgw-admin metadata list bucket.instance|jq -r '.[]' > ${METADATA}
rados -p ${INDEX_POOL} ls > $INDEXES

for OBJECT in $(cat ${INDEXES}); do
MARKER=$(echo ${OBJECT}|cut -d '.' -f 3,4,5)
grep ${MARKER} ${METADATA} > /dev/null
if [ "$?" -ne 0 ]; then
echo $OBJECT
fi
done
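
It can be invoked, for example, like this (the script name is arbitrary;
.rgw.buckets.index is just the usual default index pool):

$ bash find-orphaned-indexes.sh .rgw.buckets.index > orphan-objects.txt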

It does not remove anything, but for example, it returns these objects:

.dir.eb32b1ca-807a-4867-aea5-ff43ef7647c6.10406917.5752
.dir.eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6162
.dir.eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6186

The output of:

$ radosgw-admin metadata list|jq -r '.[]'

Does not contain:
- eb32b1ca-807a-4867-aea5-ff43ef7647c6.10406917.5752
- eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6162
- eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6186

So for me these objects do not seem to be tied to any bucket and seem to
be leftovers which were not cleaned up.

For example, I see these objects tied to a bucket:

- eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6160
- eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6188
- eb32b1ca-807a-4867-aea5-ff43ef7647c6.10289105.6167

But notice the difference: 6160, 6188, 6167, but not 6162 nor 6186

Before I remove these objects I want to verify with other users if they
see the same and if my thinking is correct.

Wido

[0]: https://gist.github.com/wido/6650e66b09770ef02df89636891bef04



[ceph-users] rbd bench error

2018-11-15 Thread ST Wong (ITSC)
Hi,

We're trying to test RBD on a small Ceph cluster running on VMs (8 OSDs, 3 mon+mgr),
using rbd bench on 2 RBD images from 2 pools with different replication settings:

For pool 4copy:

---
rule 4copy_rule {
id 1
type replicated
min_size 2
max_size 10
step take default
step choose firstn 2 type datacenter
step chooseleaf firstn 2 type host
step emit
}

For pool 2copy:

rule 2copy_rule {
id 2
type replicated
min_size 1
max_size 10
step take default
step choose firstn 2 type datacenter
step chooseleaf firstn 1 type host
step emit
}
---

The rbd bench run completed 'normally' on the 2copy pool image, but we got an error for
the 4copy one:

- cut here 
# rbd bench --io-type rw --io-total 1073741824 4copy/foo
bench  type readwrite read:write=50:50 io_size 4096 io_threads 16 bytes 
1073741824 pattern sequential
  SEC   OPS   OPS/SEC   BYTES/SEC
1  5232   5311.74  21756891.99

[snipped]

   33  27584  531.19  2175742.58
   34  27920  560.05  2293957.89
   35  28272  462.05  1892551.92
   36  28624  359.65  1473143.60
   37  28672  319.51  1308725.83
   38  28736  227.08  930138.44
   39  28800  175.16  717452.27
   70  28832  15.88  65061.98
2018-11-12 14:40:36.182 7f5893fff700 -1 librbd::ImageWatcher: 0x7f5880002f40 
image watch failed: 140018215563856, (107) Transport endpoint is not connected
2018-11-12 14:40:36.182 7f5893fff700 -1 librbd::Watcher: 0x7f5880002f40 
handle_error: handle=140018215563856: (107) Transport endpoint is not connected
   74 28848  5.85  23980.14
   75 28864  5.04  20646.14
   76 28896  4.20  17193.38
   77 28944  3.79  15509.03
   78  31984  413.70  1694525.89
   79 36880   1809.82  7413040.55
   80 38000   1980.06  8110329.05
- cut here 

Could it be a performance issue since it's running on VMs?
We would like to know how to get more information for troubleshooting.
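
So far the only ideas we have are along these lines (a rough sketch):

ceph health detail          # look for slow/blocked requests during the run
ceph osd perf               # per-OSD commit/apply latency
rbd status 4copy/foo        # check which clients still hold a watch on the image
dmesg | tail                # kernel-side errors, if krbd is involved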

Thanks a lot.
Best Regards,
/stwong



Re: [ceph-users] Packages for debian in Ceph repo

2018-11-15 Thread Kevin Olbrich
I now had the time to test, and after installing this package, uploads to
RBD are working perfectly.
Thank you very much for sharing this!
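
For anyone finding this later, what I ended up running looks roughly like this
(Debian stretch package names; the pool and image names are just examples):

apt-get install qemu-utils qemu-block-extra
qemu-img convert -f qcow2 -O raw disk.qcow2 rbd:images/disk01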

Kevin

Am Mi., 7. Nov. 2018 um 15:36 Uhr schrieb Kevin Olbrich :

> Am Mi., 7. Nov. 2018 um 07:40 Uhr schrieb Nicolas Huillard <
> nhuill...@dolomede.fr>:
>
>>
>> > It lists rbd but still fails with the exact same error.
>>
>> I stumbled upon the exact same error, and since there was no answer
>> anywhere, I figured it was a very simple problem: don't forget to
>> install the qemu-block-extra package (Debian stretch) along with qemu-
>> utils which contains the qemu-img command.
>> This command is actually compiled with rbd support (hence the output
>> above), but needs this extra package to pull in the actual support code and
>> dependencies...
>>
>
> I have not been able to test this yet but this package was indeed missing
> on my system!
> Thank you for this hint!
>
>
>> --
>> Nicolas Huillard
>>
>


Re: [ceph-users] Placement Groups undersized after adding OSDs

2018-11-15 Thread Wido den Hollander


On 11/15/18 4:37 AM, Gregory Farnum wrote:
> This is weird. Can you capture the pg query for one of them and narrow
> down in which epoch it “lost” the previous replica and see if there’s
> any evidence of why?

So I checked it further and dug deeper into the logs and found this on
osd.1982:

2018-11-14 15:03:04.261689 7fde7b525700  0 log_channel(cluster) log
[WRN] : Monitor daemon marked osd.1982 down, but it is still running
2018-11-14 15:03:04.261713 7fde7b525700  0 log_channel(cluster) log
[DBG] : map e647120 wrongly marked me down at e647120

After searching further (Zabbix graphs) it seems that this machine had a
spike in CPU load around that time which probably caused it to be marked
as down.

As OSD 1982 was involved with these PGs, they are now in an undersized+degraded
state.

Recovery didn't start, but Ceph chose to wait for the backfill to
happen, as the PG needed to be vacated from this OSD.

The side-effect is that it took 14 hours before these PGs started to
backfill.

I would say that a PG which is in undersized+degraded should get the
highest possible priority to be repaired asap.
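
In the meantime the affected PGs can at least be pushed to the front of the
queue by hand, e.g. (a sketch using two of the PGs above; these commands exist
since Luminous):

ceph pg force-recovery 11.3b54 11.362f
ceph pg force-backfill 11.3b54 11.362f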

Wido

> On Wed, Nov 14, 2018 at 8:09 PM Wido den Hollander wrote:
> 
> Hi,
> 
> I'm in the middle of expanding a Ceph cluster and while having 'ceph -s'
> open I suddenly saw a bunch of Placement Groups go undersized.
> 
> My first hint was that one or more OSDs have failed, but none did.
> 
> So I checked and I saw these Placement Groups undersized:
> 
> 11.3b54 active+undersized+degraded+remapped+backfill_wait
> [1795,639,1422]       1795       [1795,639]           1795
> 11.362f active+undersized+degraded+remapped+backfill_wait
> [1431,1134,2217]       1431      [1134,1468]           1134
> 11.3e31 active+undersized+degraded+remapped+backfill_wait
> [1451,1391,1906]       1451      [1906,2053]           1906
> 11.50c  active+undersized+degraded+remapped+backfill_wait
> [1867,1455,1348]       1867      [1867,2036]           1867
> 11.421e   active+undersized+degraded+remapped+backfilling
> [280,117,1421]        280        [280,117]            280
> 11.700  active+undersized+degraded+remapped+backfill_wait
> [2212,1422,2087]       2212      [2055,2087]           2055
> 11.735    active+undersized+degraded+remapped+backfilling
> [772,1832,1433]        772       [772,1832]            772
> 11.d5a  active+undersized+degraded+remapped+backfill_wait
> [423,1709,1441]        423       [423,1709]            423
> 11.a95  active+undersized+degraded+remapped+backfill_wait
> [1433,1180,978]       1433       [978,1180]            978
> 11.a67  active+undersized+degraded+remapped+backfill_wait
> [1154,1463,2151]       1154      [1154,2151]           1154
> 11.10ca active+undersized+degraded+remapped+backfill_wait
> [2012,486,1457]       2012       [2012,486]           2012
> 11.2439 active+undersized+degraded+remapped+backfill_wait
> [910,1457,1193]        910       [910,1193]            910
> 11.2f7e active+undersized+degraded+remapped+backfill_wait
> [1423,1356,2098]       1423      [1356,2098]           1356
> 
> After searching I found that OSDs
> 1422,1431,1451,1455,1421,1422,1433,1441,1433,1463,1457,1457 and 1423 are
> all running on the same (newly) added host.
> 
> I checked:
> - The host did not reboot
> - The OSDs did not restart
> 
> The OSDs are up_thru since map 646724 which is from 11:05 this morning
> (4,5 hours ago), which is about the same time when these were added.
> 
> So these PGs are currently running on *2* replicas while they should be
> running on *3*.
> 
> We just added 8 nodes with 24 disks each to the cluster, but none of the
> existing OSDs were touched.
> 
> When looking at PG 11.3b54 I see that 1422 is a backfill target:
> 
> $ ceph pg 11.3b54 query|jq '.recovery_state'
> 
> The 'enter time' for this is about 30 minutes ago and that's about the
> same time this has happened.
> 
> 'might_have_unfound' tells me OSD 1982 which is in the same rack as 1422
> (CRUSH replicates over racks), but that OSD is also online.
> 
> It's up_thru = 647122 and that's from about 30 minutes ago. That
> ceph-osd process is however running since September and seems to be
> functioning fine.
> 
> This confuses me as during such an expansion I know that normally a PG
> would map to size+1 until the backfill finishes.
> 
> The cluster is running Luminous 12.2.8 on CentOS 7.5.
> 
> Any ideas on what this could be?
> 
> Wido

Re: [ceph-users] Librbd performance VS KRBD performance

2018-11-15 Thread 赵赵贺东
I tested in the 12-OSD cluster, changing objecter_inflight_op_bytes from 100MB to
300MB; performance does not seem to change noticeably.
But from the beginning, librbd performed better in the 12-OSD cluster,
so this test seems less meaningful to me.
 In a small cluster(12 osds), 4m seq write performance for Librbd VS KRBD 
 is about 0.89 : 1 (177MB/s : 198MB/s ).
 In a big cluster (72 osds), 4m seq write performance for Librbd VS KRBD is 
 about  0.38: 1 (420MB/s : 1080MB/s).


Our problem is librbd's bad performance in the big cluster (72 OSDs).
But I cannot test with 72 OSDs right now; some other tests are running.
I will test with 72 OSDs when our cluster is ready.

It is a little hard to understand that objecter_inflight_op_bytes=100MB works
well in a 12-OSD cluster but poorly in a 72-OSD cluster.
Does objecter_inflight_op_bytes have no effect on krbd, and only affect
librbd?

Thanks.



> On 15 November 2018, at 15:50, 赵赵贺东 wrote:
> 
> Thank you for your suggestion.
> It really gives me a lot of inspiration.
> 
> 
> I will test per your suggestion, and browse through src/common/config_opts.h
> to see if I can find some performance-related configs.
> 
> But our OSD node hardware itself is very poor; that is the truth we have to
> face.
> There are two OSDs per ARM board, with 2 GB of memory and 2*10TB HDDs on board, so one
> OSD has 1 GB of memory to support a 10TB HDD; we must try to make the cluster
> work as well as we can.
> 
> 
> Thanks.
> 
>> On 15 November 2018, at 14:08, Jason Dillaman wrote:
>> 
>> Attempting to send 256 concurrent 4MiB writes via librbd will pretty
>> quickly hit the default "objecter_inflight_op_bytes = 100 MiB" limit,
>> which will drastically slow (stall) librados. I would recommend
>> re-testing librbd w/ a much higher throttle override.
>> On Thu, Nov 15, 2018 at 11:34 AM 赵赵贺东  wrote:
>>> 
>>> Thank you for your attention.
>>> 
>>> Our test are in run in physical machine environments.
>>> 
>>> Fio for KRBD:
>>> [seq-write]
>>> description="seq-write"
>>> direct=1
>>> ioengine=libaio
>>> filename=/dev/rbd0
>>> numjobs=1
>>> iodepth=256
>>> group_reporting
>>> rw=write
>>> bs=4M
>>> size=10T
>>> runtime=180
>>> 
>>> */dev/rbd0 mapped by rbd_pool/image2, so KRBD & librbd fio test use the 
>>> same image.
>>> 
>>> Fio for librbd:
>>> [global]
>>> direct=1
>>> numjobs=1
>>> ioengine=rbd
>>> clientname=admin
>>> pool=rbd_pool
>>> rbdname=image2
>>> invalidate=0# mandatory
>>> rw=write
>>> bs=4M
>>> size=10T
>>> runtime=180
>>> 
>>> [rbd_iodepth32]
>>> iodepth=256
>>> 
>>> 
>>> Image info:
>>> rbd image 'image2':
>>> size 50TiB in 13107200 objects
>>> order 22 (4MiB objects)
>>> data_pool: ec_rbd_pool
>>> block_name_prefix: rbd_data.8.148bb6b8b4567
>>> format: 2
>>> features: layering, data-pool
>>> flags:
>>> create_timestamp: Wed Nov 14 09:21:18 2018
>>> 
>>> * data_pool is a EC pool
>>> 
>>> Pool info:
>>> pool 8 'rbd_pool' replicated size 2 min_size 1 crush_rule 0 object_hash 
>>> rjenkins pg_num 256 pgp_num 256 last_change 82627 flags hashpspool 
>>> stripe_width 0 application rbd
>>> pool 9 'ec_rbd_pool' erasure size 6 min_size 5 crush_rule 4 object_hash 
>>> rjenkins pg_num 256 pgp_num 256 last_change 82649 flags 
>>> hashpspool,ec_overwrites stripe_width 16384 application rbd
>>> 
>>> 
>>> RBD cache: Off (because I think with tcmu the RBD cache is forced off, and
>>> our cluster will export disks via iSCSI in the future.)
>>> 
>>> 
>>> Thanks!
>>> 
>>> 
>>> On 15 November 2018, at 13:22, Gregory Farnum wrote:
>>> 
>>> You'll need to provide more data about how your test is configured and run 
>>> for us to have a good idea. IIRC librbd is often faster than krbd because 
>>> it can support newer features and things, but krbd may have less overhead 
>>> and is not dependent on the VM's driver configuration in QEMU...
>>> 
>>> On Thu, Nov 15, 2018 at 8:22 AM 赵赵贺东  wrote:
 
 Hi cephers,
 
 
 All our cluster OSDs are deployed on armhf.
 Could someone say what a reasonable performance ratio is
 for librbd vs. KRBD?
 Or a reasonable performance-loss range when we use librbd compared to KRBD?
 I googled a lot, but I could not find a solid criterion.
 In fact, it has confused me for a long time.
 
 About our tests:
 In a small cluster(12 osds), 4m seq write performance for Librbd VS KRBD 
 is about 0.89 : 1 (177MB/s : 198MB/s ).
 In a big cluster (72 osds), 4m seq write performance for Librbd VS KRBD is 
 about  0.38: 1 (420MB/s : 1080MB/s).
 
 We expect that even as OSD numbers increase, librbd performance can keep being
 close to KRBD.

 PS: librbd performance was tested both with the fio rbd engine & iSCSI
 (tcmu+librbd).
 
 Thanks.
 
 
 
 

Re: [ceph-users] pg 17.36 is active+clean+inconsistent head expected clone 1 missing?

2018-11-15 Thread Marc Roos
 
Forgot, these are bluestore osds



-Original Message-
From: Marc Roos 
Sent: Thursday, 15 November 2018 9:59
To: ceph-users
Subject: [ceph-users] pg 17.36 is active+clean+inconsistent head 
expected clone 1 missing?



I thought I would give it another try and ask again here, since there is
another related thread at the moment. I have been having this error for a year or so.

This I of course already tried:
ceph pg deep-scrub 17.36
ceph pg repair 17.36


[@c01 ~]# rados list-inconsistent-obj 17.36 
{"epoch":24363,"inconsistents":[]}


[@c01 ~]# ceph pg map 17.36
osdmap e24380 pg 17.36 (17.36) -> up [29,12,6] acting [29,12,6]


[@c04 ceph]# zgrep ERR ceph-osd.29.log*gz
ceph-osd.29.log-20181114.gz:2018-11-13 14:19:55.766604 7f25a05b1700 -1
log_channel(cluster) log [ERR] : deep-scrub 17.36 
17:6ca1f70a:::rbd_data.1f114174b0dc51.0974:head expected 
clone 17:6ca1f70a:::rbd_data.1f114174b0dc51.0974:4 1 missing
ceph-osd.29.log-20181114.gz:2018-11-13 14:24:55.943454 7f25a05b1700 -1
log_channel(cluster) log [ERR] : 17.36 deep-scrub 1 errors




[ceph-users] pg 17.36 is active+clean+inconsistent head expected clone 1 missing?

2018-11-15 Thread Marc Roos



I thought I would give it another try and ask again here, since there is
another related thread at the moment. I have been having this error for a year or so.

This I of course already tried:
ceph pg deep-scrub 17.36
ceph pg repair 17.36


[@c01 ~]# rados list-inconsistent-obj 17.36
{"epoch":24363,"inconsistents":[]}


[@c01 ~]# ceph pg map 17.36
osdmap e24380 pg 17.36 (17.36) -> up [29,12,6] acting [29,12,6]


[@c04 ceph]# zgrep ERR ceph-osd.29.log*gz
ceph-osd.29.log-20181114.gz:2018-11-13 14:19:55.766604 7f25a05b1700 -1 
log_channel(cluster) log [ERR] : deep-scrub 17.36 
17:6ca1f70a:::rbd_data.1f114174b0dc51.0974:head expected 
clone 17:6ca1f70a:::rbd_data.1f114174b0dc51.0974:4 1 missing
ceph-osd.29.log-20181114.gz:2018-11-13 14:24:55.943454 7f25a05b1700 -1 
log_channel(cluster) log [ERR] : 17.36 deep-scrub 1 errors
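

A possible next step for digging further would be something like this (a
sketch; <pool> and the object name are placeholders for the values from the
log above):

rados list-inconsistent-snapset 17.36
rados -p <pool> listsnaps <rbd_data object name from the log>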




Re: [ceph-users] MDS hangs in "heartbeat_map" deadlock

2018-11-15 Thread Stefan Kooman
Quoting Stefan Kooman (ste...@bit.nl):
> Quoting Patrick Donnelly (pdonn...@redhat.com):
> > Thanks for the detailed notes. It looks like the MDS is stuck
> > somewhere it's not even outputting any log messages. If possible, it'd
> > be helpful to get a coredump (e.g. by sending SIGQUIT to the MDS) or,
> > if you're comfortable with gdb, a backtrace of any threads that look
> > suspicious (e.g. not waiting on a futex) including `info threads`.

Today the issue reappeared (after being absent for ~ 3 weeks). This time
the standby MDS could take over and would not get into a deadlock
itself. We made gdb traces again, which you can find over here:

https://8n1.org/14011/d444
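
(For reference, a full thread backtrace like the ones above can be captured
non-interactively with something along these lines; the exact invocation we
used may have differed slightly:)

gdb -p "$(pidof ceph-mds)" -batch -ex 'info threads' -ex 'thread apply all bt' > mds-backtrace.txt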

It would be great if someone could figure out what's causing this issue.

Thanks,

Stefan

-- 
| BIT BV  http://www.bit.nl/Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl