[ceph-users] Problem with Stale+Peering PGs

2018-02-20 Thread Rudolf Kasper

Hi,

we currently have the problem that our Ceph cluster has 5 PGs that are 
stale+peering.

The cluster has 16 OSDs across 4 hosts.

How can I tell Ceph that these PGs are not there anymore?


ceph pg 1.20a mark_unfound_lost delete
Error ENOENT: i don't have pgid 1.20a


ceph pg 1.20a mark_unfound_lost revert
Error ENOENT: i don't have pgid 1.20a


ceph pg dump_stuck stale
ok
PG_STAT STATE UP   UP_PRIMARY ACTING ACTING_PRIMARY
1.327   stale+peering [12] 12   [12] 12
1.3a8   stale+peering [12] 12   [12] 12
1.38f   stale+peering [12] 12   [12] 12
1.20a   stale+peering [12] 12   [12] 12
1.288   stale+peering [12] 12   [12] 12


OSD.12 was removed and is back in the cluster with a new drive.
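
A rough sketch of the usual last resort when the data on those PGs is accepted 
as lost, since their only copy lived on the recreated OSD - the force-create 
command is named differently across releases (ceph pg force_create_pg on older 
ones, ceph osd force-create-pg on newer ones), so treat this as an assumption 
to check against your version:

# declare the old contents of the recreated OSD lost (irreversible)
ceph osd lost 12 --yes-i-really-mean-it

# recreate each stale PG as an empty PG
for pg in 1.327 1.3a8 1.38f 1.20a 1.288; do
    ceph pg force_create_pg $pg
done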

kind regards
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Automated Failover of CephFS Clients

2018-02-20 Thread John Petrini
I just wanted to add that even if you only provide one monitor IP the
client will learn about the other monitors on mount so failover will still
work. This only presents a problem when you try to remount or reboot a
client while the monitor it's using is unavailable.
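
A minimal sketch of passing several monitors up front, so a remount or reboot 
still works when one of them happens to be down (hostnames and paths are 
placeholders):

# ceph-fuse with an explicit monitor list
ceph-fuse -m mon1:6789,mon2:6789,mon3:6789 /mnt/cephfs

# kernel client equivalent
mount -t ceph mon1:6789,mon2:6789,mon3:6789:/ /mnt/cephfs \
  -o name=admin,secretfile=/etc/ceph/admin.secret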
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous: Help with Bluestore WAL

2018-02-20 Thread Linh Vu
Yeah that is the expected behaviour.


From: ceph-users  on behalf of Balakumar 
Munusawmy 
Sent: Wednesday, 21 February 2018 1:41:36 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Luminous: Help with Bluestore WAL


Hi,

We were recently testing Luminous with BlueStore. We have a 6-node cluster 
with 12 HDDs and 1 SSD each; we used ceph-volume with LVM to create all the OSDs 
and attached an SSD WAL (LVM). We created 12 individual 10 GB LVs on the single 
SSD, one for each WAL, so all the OSD WALs are on a single SSD. The problem is 
that if we pull the SSD out, it brings down all 12 OSDs on that node. Is that 
expected behavior, or are we missing some configuration?





Thanks and Regards,
Balakumar Munusawmy
Mobile:+19255771645
Skype: 
bala.munusa...@latticeworkinc.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Automated Failover of CephFS Clients

2018-02-20 Thread Linh Vu
You're welcome :)


From: Paul Kunicki 
Sent: Wednesday, 21 February 2018 1:16:32 PM
To: Linh Vu
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Automated Failover of CephFS Clients

Thanks for the hint Linh. I had neglected to read up on mount.fuse.ceph here: 
http://docs.ceph.com/docs/master/man/8/mount.fuse.ceph/

I am trying this right now.

Thanks again.




*   Paul Kunicki
*   Systems Manager
*   SproutLoud Media Networks, LLC.
*   954-476-6211 ext. 144
*   pkuni...@sproutloud.com
*   www.sproutloud.com





On Tue, Feb 20, 2018 at 8:35 PM, Linh Vu <v...@unimelb.edu.au> wrote:

Why are you mounting with a single monitor? What is your mount command or 
/etc/fstab? Ceph-fuse should use the available mons you have on the client's 
/etc/ceph/ceph.conf.


e.g our /etc/fstab entry:


none    /home   fuse.ceph   _netdev,ceph.id=myclusterid,ceph.client_mountpoint=/home,nonempty,defaults  0   0


From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of Paul Kunicki <pkuni...@sproutloud.com>
Sent: Wednesday, 21 February 2018 10:23:37 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Automated Failover of CephFS Clients

We currently have multiple CephFS fuse clients mounting the same filesystem 
from a single monitor even though our cluster has several monitors. I would 
like to automate the fail over from one monitor to another. Is this possible 
and where should I be looking for guidance on accomplishing this in 
production? I would like to avoid involving NFS if possible and Pacemaker seems 
to be overkill but we can go that route if that is what is in fact needed?

We are currently at 12.2.2 on Centos 7.4.

Thanks.



*   Paul Kunicki
*   Systems Manager
*   SproutLoud Media Networks, LLC.
*   954-476-6211 ext. 144
*   pkuni...@sproutloud.com
*   www.sproutloud.com



Re: [ceph-users] Help with Bluestore WAL

2018-02-20 Thread Konstantin Shalygin

Hi,
 We were recently testing Luminous with BlueStore. We have a 6-node cluster 
with 12 HDDs and 1 SSD each; we used ceph-volume with LVM to create all the OSDs 
and attached an SSD WAL (LVM). We created 12 individual 10 GB LVs on the single 
SSD, one for each WAL, so all the OSD WALs are on a single SSD. The problem is 
that if we pull the SSD out, it brings down all 12 OSDs on that node. Is that 
expected behavior, or are we missing some configuration?



Yes, you should plan your failure domain, i.e. what will happen to 
your cluster if one backend SSD suddenly dies.


You should also plan for mass failures of your SSDs/NVMes, so as a rule of thumb 
don't overload your flash backend with OSDs. The recommendation is ~4 OSDs per 
SSD/NVMe.
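
A rough sketch of what such a layout could look like - four DB/WAL slices per 
flash device instead of twelve on one SSD, so losing one device takes out four 
OSDs rather than all twelve (volume group, LV and device names are invented for 
illustration):

# one VG per SSD/NVMe; carve four DB LVs out of it (sizes are examples)
vgcreate ceph-db-0 /dev/nvme0n1
for i in 0 1 2 3; do lvcreate -L 60G -n db-$i ceph-db-0; done

# each HDD-backed OSD gets its DB (and with it the WAL) on one of those slices
ceph-volume lvm create --bluestore --data /dev/sdb --block.db ceph-db-0/db-0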




k

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Luminous: Help with Bluestore WAL

2018-02-20 Thread Balakumar Munusawmy
Hi,
We were recently testing Luminous with BlueStore. We have a 6-node cluster 
with 12 HDDs and 1 SSD each; we used ceph-volume with LVM to create all the OSDs 
and attached an SSD WAL (LVM). We created 12 individual 10 GB LVs on the single 
SSD, one for each WAL, so all the OSD WALs are on a single SSD. The problem is 
that if we pull the SSD out, it brings down all 12 OSDs on that node. Is that 
expected behavior, or are we missing some configuration?


Thanks and Regards,
Balakumar Munusawmy
Mobile:+19255771645
Skype: 
bala.munusa...@latticeworkinc.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous : performance degrade while read operations (ceph-volume)

2018-02-20 Thread nokia ceph
Hi Alfredo Deza,

I understand the difference between lvm and simple, however we see the issue
on both; was it introduced in Luminous, since we use the same Ceph config and
workload from the client? The graphs I attached in the previous mail are from
a ceph-volume lvm OSD.

In this case, does it occupy 2x only inside Ceph? If we consider only the
LVM-based system, is this high IOPS caused by the dm-cache created for each
OSD?

Meanwhile I will update some graphs to show this once I have them.

Thanks,
Muthu

On Tuesday, February 20, 2018, Alfredo Deza  wrote:

>
>
> On Mon, Feb 19, 2018 at 9:29 PM, nokia ceph 
> wrote:
>
>> Hi Alfredo Deza,
>>
>> We have a 5-node platform with LVM OSDs created from scratch and another
>> 5-node platform migrated from Kraken, which is ceph-volume simple. Both have
>> the same issue. Both platforms have only HDDs for OSDs.
>>
>> We also noticed about 2x more disk IOPS compared to Kraken, which causes
>> lower read performance. During RocksDB compaction the situation is worse.
>>
>>
>> Meanwhile we are building another platform creating OSDs using ceph-disk
>> and will analyse that.
>>
>
> If you have two platforms, one with `simple` and the other one with `lvm`
> experiencing the same, then something else must be at fault here.
>
> The `simple` setup in ceph-volume basically keeps everything as it was
> before, it just captures details of what devices were being used so OSDs
> can be started. There is no interaction from ceph-volume
> in there that could cause something like this.
>
>
>
>> Thanks,
>> Muthu
>>
>>
>>
>> On Tuesday, February 20, 2018, Alfredo Deza  wrote:
>>
>>>
>>>
>>> On Mon, Feb 19, 2018 at 2:01 PM, nokia ceph 
>>> wrote:
>>>
 Hi All,

 We have 5 node clusters with EC 4+1 and use bluestore since last year
 from Kraken.
 Recently we migrated all our platforms to luminous 12.2.2 and finally
 all OSDs migrated to ceph-volume simple type and on few platforms installed
 ceph using ceph-volume .

 Now we see two times more read traffic compared to client traffic on the
 migrated platform and on newly created platforms. This was not the case in
 older releases, where the ceph status read B/W was the same as the client read
 traffic.

 Some network graphs :

 *Client network interface* towards ceph public interface : shows
 *4.3Gbps* read


 [image: Inline image 2]

 *Ceph Node Public interface* : Each node around 960Mbps * 5 node =*
 4.6 Gbps *- this matches.
 [image: Inline image 3]

 Ceph status output : show  1032 MB/s =* 8.06 Gbps*

 cn6.chn6us1c1.cdn ~# ceph status
   cluster:
 id: abda22db-3658-4d33-9681-e3ff10690f88
 health: HEALTH_OK

   services:
 mon: 5 daemons, quorum cn6,cn7,cn8,cn9,cn10
 mgr: cn6(active), standbys: cn7, cn9, cn10, cn8
 osd: 340 osds: 340 up, 340 in

   data:
 pools:   1 pools, 8192 pgs
 objects: 270M objects, 426 TB
 usage:   581 TB used, 655 TB / 1237 TB avail
 pgs: 8160 active+clean
  32   active+clean+scrubbing

   io:
 client:   *1032 MB/s rd*, 168 MB/s wr, 1908 op/s rd, 1594 op/s wr


 For write operations we don't see this issue; client traffic and the reported
 B/W match.
 Is this expected behavior in Luminous and ceph-volume lvm, or a bug?
 A wrong calculation in the ceph status read B/W?

>>>
>>> You mentioned `ceph-volume simple` but here you say lvm. With LVM
>>> ceph-volume will create the OSDs from scratch, while "simple" will keep
>>> whatever OSD was created before.
>>>
>>> Have you created the OSDs from scratch with ceph-volume? or is it just
>>> using "simple" , managing a previously deployed OSD?
>>>

 Please provide your feedback.

 Thanks,
 Muthu



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


>>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Automated Failover of CephFS Clients

2018-02-20 Thread Paul Kunicki
Thanks for the hint Linh. I had neglected to read up on mount.fuse.ceph
here: http://docs.ceph.com/docs/master/man/8/mount.fuse.ceph/

I am trying this right now.

Thanks again.




   -
  - *Paul Kunicki*
 - *Systems Manager*
 - SproutLoud Media Networks, LLC.
 - 954-476-6211 ext. 144
 - pkuni...@sproutloud.com
 - www.sproutloud.com




On Tue, Feb 20, 2018 at 8:35 PM, Linh Vu  wrote:

> Why are you mounting with a single monitor? What is your mount command or
> /etc/fstab? Ceph-fuse should use the available mons you have on the
> client's /etc/ceph/ceph.conf.
>
>
> e.g our /etc/fstab entry:
>
>
> none    /home   fuse.ceph   _netdev,ceph.id=myclusterid,ceph.client_mountpoint=/home,nonempty,defaults  0   0
> --
> *From:* ceph-users  on behalf of Paul
> Kunicki 
> *Sent:* Wednesday, 21 February 2018 10:23:37 AM
> *To:* ceph-users@lists.ceph.com
> *Subject:* [ceph-users] Automated Failover of CephFS Clients
>
> We currently have multiple CephFS fuse clients mounting the same
> filesystem from a single monitor even though our cluster has several
> monitors. I would like to automate the fail over from one monitor to
> another. Is this possible and where should I be looking for guidance on
> accomplishing this in production? I would like to avoid involving NFS if
> possible and Pacemaker seems to be overkill but we can go that route if
> that is what is in fact needed?
>
> We are currently at 12.2.2 on Centos 7.4.
>
> Thanks.
>
>
>
>-
>   - *Paul Kunicki*
>  - *Systems Manager*
>  - SproutLoud Media Networks, LLC.
>  - 954-476-6211 ext. 144
>  - pkuni...@sproutloud.com
>  - www.sproutloud.com
>
>
> 
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Help with Bluestore WAL

2018-02-20 Thread Balakumar Munusawmy
Hi,
We were recently testing Luminous with BlueStore. We have a 6-node cluster 
with 12 HDDs and 1 SSD each; we used ceph-volume with LVM to create all the OSDs 
and attached an SSD WAL (LVM). We created 12 individual 10 GB LVs on the single 
SSD, one for each WAL, so all the OSD WALs are on a single SSD. The problem is 
that if we pull the SSD out, it brings down all 12 OSDs on that node. Is that 
expected behavior, or are we missing some configuration?


Thanks and Regards,
Balakumar Munusawmy
Mobile:+19255771645
Skype: 
bala.munusa...@latticeworkinc.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-volume activation

2018-02-20 Thread Oliver Freyermuth
Many thanks for your replies! 

Am 21.02.2018 um 02:20 schrieb Alfredo Deza:
> On Tue, Feb 20, 2018 at 5:56 PM, Oliver Freyermuth
>  wrote:
>> Dear Cephalopodians,
>>
>> with the release of ceph-deploy we are thinking about migrating our 
>> Bluestore-OSDs (currently created with ceph-disk via old ceph-deploy)
>> to be created via ceph-volume (with LVM).
> 
> When you say migrating, do you mean creating them again from scratch
> or making ceph-volume take over the previously created OSDs
> (ceph-volume can do both)

I would recreate from scratch to switch to LVM, we have a k=4 m=2 EC-pool with 
6 hosts, so I can just take down a full host and recreate. 
But good to know both would work! 

> 
>>
>> I note two major changes:
>> 1. It seems the block.db partitions have to be created beforehand, manually.
>>With ceph-disk, one should not do that - or manually set the correct 
>> PARTTYPE ID.
>>Will ceph-volume take care of setting the PARTTYPE on existing partitions 
>> for block.db now?
>>Is it not necessary anymore?
>>Is the config option bluestore_block_db_size now also obsoleted?
> 
> Right, ceph-volume will not create any partitions for you, so no, it
> will not take care of setting PARTTYPE either. If your setup requires
> a block.db, then this must be
> created beforehand and then passed onto ceph-volume. The one
> requirement if it is a partition is to have a PARTUUID. For logical
> volumes it can just work as-is. This is
> explained in detail at
> http://docs.ceph.com/docs/master/ceph-volume/lvm/prepare/#bluestore
> 
> PARTUUID information for ceph-volume at:
> http://docs.ceph.com/docs/master/ceph-volume/lvm/prepare/#partitioning

Ok. 
So do I understand correctly that the PARTTYPE setting (i.e. those magic 
numbers as found e.g. in ceph-disk sources in PTYPE:
https://github.com/ceph/ceph/blob/master/src/ceph-disk/ceph_disk/main.py#L62 )
is not needed anymore for the block.db partitions, since it was effectively 
only there
to have udev work?

I remember from ceph-disk that if I created the block.db partition beforehand 
and without setting the magic PARTTYPE,
it would become unhappy. 
ceph-volume and the systemd activation path should not care at all if I 
understand this correctly. 

So in short, to create a new OSD, steps for me would be:
- Create block.db partition (and don't care about PARTTYPE). 
  I do only have to make sure it has a PARTUUID. 
- ceph-volume lvm create --bluestore --block.db /dev/sdag1 --data /dev/sda
  (or the same via ceph-deploy)
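
A rough sketch of those two steps (device names taken from the example above; 
the partition size and name are arbitrary):

# 1. create the block.db partition - a GPT partition gets a PARTUUID automatically
sgdisk --new=1:0:+30G --change-name=1:'osd-block-db' /dev/sdag

# 2. hand it to ceph-volume together with the data device
ceph-volume lvm create --bluestore --block.db /dev/sdag1 --data /dev/sda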


>>
>> 2. Activation does not work via udev anymore, which solves some racy things.
>>
>> This second major change makes me curious: How does activation work now?
>> In the past, I could reinstall the full OS, install ceph packages, trigger 
>> udev / reboot and all OSDs would come back,
>> without storing any state / activating any services in the OS.
> 
> Activation works via systemd. This is explained in detail here
> http://docs.ceph.com/docs/master/ceph-volume/lvm/activate
> 
> Nothing with `ceph-volume lvm` requires udev for discovery. If you
> need to re-install the OS and recover your OSDs all you need to do is
> to
> re-activate them. You would need to know what the ID and UUID of the OSDs is.
> 
> If you don't have that information handy, you can run:
> 
> ceph-volume lvm list
> 
> And all the information will be available. This will persist even on
> system re-installs

Understood - so indeed the manual step would be to run "list" and then activate 
the OSDs one-by-one
to re-create the service files. 
More cumbersome than letting udev do its thing, but it certainly gives more 
control,
so it seems preferable. 

Are there plans to have something like
"ceph-volume discover-and-activate" 
which would effectively do something like:
ceph-volume list and activate all OSDs which are re-discovered from LVM 
metadata? 

This would largely simplify OS reinstalls (otherwise I'll likely write a small 
shell script to do exactly that),
and as far as I understand, activating an already activated OSD should be 
harmless (it should only re-enable
an already enabled service file). 
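
Roughly what I have in mind for such a script - a sketch that assumes 
ceph-volume lvm list --format json keys its output by OSD id and carries 
ceph.osd_id / ceph.osd_fsid tags, and that jq is available:

#!/bin/bash
# re-activate every OSD discoverable from LVM metadata, e.g. after an OS reinstall
ceph-volume lvm list --format json |
  jq -r 'to_entries[] | .value[0].tags | "\(."ceph.osd_id") \(."ceph.osd_fsid")"' |
  while read -r id fsid; do
      echo "activating osd.$id ($fsid)"
      ceph-volume lvm activate "$id" "$fsid"
  done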

> 
>>
>> Does this still work?
>> Or is there a manual step needed to restore the ceph-osd@ID-UUID services 
>> which at first glance appear to store state (namely ID and UUID)?
> 
> The manual step would be to call activate as described here
> http://docs.ceph.com/docs/master/ceph-volume/lvm/activate/#new-osds
>>
>> If that's the case:
>> - What is this magic manual step?
> 
> Linked above
> 
>> - Is it still possible to flip two disks within the same OSD host without 
>> issues?
> 
> What do you mean by "flip" ?

Sorry, I was unclear on this. I meant exchanging two harddrives with each other 
within a single OSD host,
e.g. /dev/sda => /dev/sdc and /dev/sdc => /dev/sda (for controller weirdness or 
whatever reason). 
If I understand correctly, this should not be a problem at all, since OSD-ID 
and PARTUUID are unaffected by that
(as you write, LVM meta

Re: [ceph-users] Automated Failover of CephFS Clients

2018-02-20 Thread Linh Vu
Why are you mounting with a single monitor? What is your mount command or 
/etc/fstab? Ceph-fuse should use the available mons you have on the client's 
/etc/ceph/ceph.conf.


e.g our /etc/fstab entry:


none    /home   fuse.ceph   _netdev,ceph.id=myclusterid,ceph.client_mountpoint=/home,nonempty,defaults  0   0


From: ceph-users  on behalf of Paul Kunicki 

Sent: Wednesday, 21 February 2018 10:23:37 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Automated Failover of CephFS Clients

We currently have multiple CephFS fuse clients mounting the same filesystem 
from a single monitor even though our cluster has several monitors. I would 
like to automate the fail over from one monitor to another. Is this possible 
and where should I be looking for guidance on accomplishing this in 
production? I would like to avoid involving NFS if possible and Pacemaker seems 
to be overkill but we can go that route if that is what is in fact needed?

We are currently at 12.2.2 on Centos 7.4.

Thanks.



*   Paul Kunicki
*   Systems Manager
*   SproutLoud Media Networks, LLC.
*   954-476-6211 ext. 144
*   pkuni...@sproutloud.com
*   www.sproutloud.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Migrating to new pools

2018-02-20 Thread Rafael Lopez
>
> There is also work-in-progress for online
> image migration [1] that will allow you to keep using the image while
> it's being migrated to a new destination image.


Hi Jason,

Is there any recommended method/workaround for live rbd migration in
luminous? eg. snapshot/copy or export/import then export/import-diff?
We are looking at options for moving large RBDs (100T) to a new pool with
minimal downtime.
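
The export/import route we had in mind is roughly the generic 
export-diff/import-diff dance (pool, image and snapshot names are placeholders; 
only a sketch, not something we have validated for 100T images):

# initial bulk copy while the source stays in use
rbd snap create oldpool/bigimage@mig1
rbd create newpool/bigimage --size <same-size>
rbd export-diff oldpool/bigimage@mig1 - | rbd import-diff - newpool/bigimage

# repeat incrementally to shrink the final delta
rbd snap create oldpool/bigimage@mig2
rbd export-diff --from-snap mig1 oldpool/bigimage@mig2 - | rbd import-diff - newpool/bigimage

# cutover: stop clients, take a final snap, ship the last diff, point clients at newpool/bigimage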

I was thinking we might be able to configure/hack rbd mirroring to mirror
to a pool on the same cluster but I gather from the OP and your post that
this is not really possible?

-- 
*Rafael Lopez*
Research Devops Engineer
Monash University eResearch Centre
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-volume activation

2018-02-20 Thread Alfredo Deza
On Tue, Feb 20, 2018 at 5:56 PM, Oliver Freyermuth
 wrote:
> Dear Cephalopodians,
>
> with the release of ceph-deploy we are thinking about migrating our 
> Bluestore-OSDs (currently created with ceph-disk via old ceph-deploy)
> to be created via ceph-volume (with LVM).

When you say migrating, do you mean creating them again from scratch
or making ceph-volume take over the previously created OSDs
(ceph-volume can do both)

>
> I note two major changes:
> 1. It seems the block.db partitions have to be created beforehand, manually.
>With ceph-disk, one should not do that - or manually set the correct 
> PARTTYPE ID.
>Will ceph-volume take care of setting the PARTTYPE on existing partitions 
> for block.db now?
>Is it not necessary anymore?
>Is the config option bluestore_block_db_size now also obsoleted?

Right, ceph-volume will not create any partitions for you, so no, it
will not take care of setting PARTTYPE either. If your setup requires
a block.db, then this must be
created beforehand and then passed onto ceph-volume. The one
requirement if it is a partition is to have a PARTUUID. For logical
volumes it can just work as-is. This is
explained in detail at
http://docs.ceph.com/docs/master/ceph-volume/lvm/prepare/#bluestore

PARTUUID information for ceph-volume at:
http://docs.ceph.com/docs/master/ceph-volume/lvm/prepare/#partitioning
>
> 2. Activation does not work via udev anymore, which solves some racy things.
>
> This second major change makes me curious: How does activation work now?
> In the past, I could reinstall the full OS, install ceph packages, trigger 
> udev / reboot and all OSDs would come back,
> without storing any state / activating any services in the OS.

Activation works via systemd. This is explained in detail here
http://docs.ceph.com/docs/master/ceph-volume/lvm/activate

Nothing with `ceph-volume lvm` requires udev for discovery. If you
need to re-install the OS and recover your OSDs all you need to do is
to
re-activate them. You would need to know what the ID and UUID of the OSDs is.

If you don't have that information handy, you can run:

ceph-volume lvm list

And all the information will be available. This will persist even on
system re-installs
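
For example, roughly (the OSD id and fsid below are placeholders):

# discover OSDs from LVM metadata after the reinstall
ceph-volume lvm list

# then re-activate each one by the id and fsid reported there
ceph-volume lvm activate 0 <osd-fsid>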

>
> Does this still work?
> Or is there a manual step needed to restore the ceph-osd@ID-UUID services 
> which at first glance appear to store state (namely ID and UUID)?

The manual step would be to call activate as described here
http://docs.ceph.com/docs/master/ceph-volume/lvm/activate/#new-osds
>
> If that's the case:
> - What is this magic manual step?

Linked above

> - Is it still possible to flip two disks within the same OSD host without 
> issues?

What do you mean by "flip" ?

>   I would guess so, since the services would detect the disk in the 
> ceph-volume trigger phase.
> - Is it still possible to take a disk from one OSD host, and put it in 
> another one, or does this now need a manual interaction?
>   With ceph-disk / udev, it did not, since udev triggered disk activation and 
> then the service was created at runtime.

It is technically possible, the lvm part of it was built with this in
mind. The LVM metadata will persist with the device, so this is not a
problem. Just manual activation would be needed.

>
> Many thanks for your help and cheers,
> Oliver
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Automated Failover of CephFS Clients

2018-02-20 Thread Paul Kunicki
We currently have multiple CephFS fuse clients mounting the same filesystem
from a single monitor even though our cluster has several monitors. I would
like to automate the fail over from one monitor to another. Is this
possible and where should I be looking for guidance on accomplishing this
in production? I would like to avoid involving NFS if possible and
Pacemaker seems to be overkill but we can go that route if that is what is
in fact needed?

We are currently at 12.2.2 on Centos 7.4.

Thanks.



   -
  - *Paul Kunicki*
 - *Systems Manager*
 - SproutLoud Media Networks, LLC.
 - 954-476-6211 ext. 144
 - pkuni...@sproutloud.com
 - www.sproutloud.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Bluestore performance question

2018-02-20 Thread Oliver Freyermuth
Answering the first RDMA question myself... 

Am 18.02.2018 um 16:45 schrieb Oliver Freyermuth:
> This leaves me with two questions:
> - Is it safe to use RDMA with 12.2.2 already? Reading through this mail 
> archive, 
>   I grasped it may lead to memory exhaustion and in any case needs some hacks 
> to the systemd service files. 

I tried that on our cluster and while I had a running cluster for a few 
minutes, I ran into many random disconnects,
mons and mgrs disconnecting, osds vanishing, no client being able to connect... 
I got the very same issues described here:
https://tracker.ceph.com/issues/22944
I'm also on CentOS 7.4, with ConnectX-3 cards, but I was not using a modern 
Mellanox OFED, only the stack that came with CentOS 7.4. 

Hence, I reverted to IPoIB.
However, I got a significant performance improvement (> 2x) by switching 
to mode "connected" and MTU 65520 instead of mode "datagram" and MTU 2044 as 
outlined e.g. here:
https://wiki.gentoo.org/wiki/InfiniBand#Performance_tuning
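
Roughly, assuming ib0 is the IPoIB interface (and making it persistent via the 
distro's network configuration afterwards):

# switch IPoIB from datagram to connected mode and raise the MTU
echo connected > /sys/class/net/ib0/mode
ip link set dev ib0 mtu 65520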

Total throughput in iperf (send + recv) is now about 30 GBit/s. 

Even though this is not "perfect" (the hard drives are a bit bored...), it's 
sufficient for our use case and runs very stably. I'll try some sysctl tuning in 
the next few days. 

> - Is it already clear whether RDMA will be part of 12.2.3? 
> 
> Also, of course the final question from the last mail:
> "Why is data moved in a k=4 m=2 EC-pool with 6 hosts and failure domain 
> "host" after failure of one host?"
> is still open. 
> 
> Many thanks already, this helped a lot to understand things better!
> 
> Cheers,
> Oliver
> 




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-volume activation

2018-02-20 Thread Oliver Freyermuth
Dear Cephalopodians,

with the release of ceph-deploy we are thinking about migrating our 
Bluestore-OSDs (currently created with ceph-disk via old ceph-deploy)
to be created via ceph-volume (with LVM). 

I note two major changes:
1. It seems the block.db partitions have to be created beforehand, manually. 
   With ceph-disk, one should not do that - or manually set the correct 
PARTTYPE ID. 
   Will ceph-volume take care of setting the PARTTYPE on existing partitions 
for block.db now? 
   Is it not necessary anymore? 
   Is the config option bluestore_block_db_size now also obsoleted? 

2. Activation does not work via udev anymore, which solves some racy things. 

This second major change makes me curious: How does activation work now? 
In the past, I could reinstall the full OS, install ceph packages, trigger udev 
/ reboot and all OSDs would come back,
without storing any state / activating any services in the OS. 

Does this still work?
Or is there a manual step needed to restore the ceph-osd@ID-UUID services which 
at first glance appear to store state (namely ID and UUID)? 

If that's the case:
- What is this magic manual step? 
- Is it still possible to flip two disks within the same OSD host without 
issues? 
  I would guess so, since the services would detect the disk in the ceph-volume 
trigger phase. 
- Is it still possible to take a disk from one OSD host, and put it in another 
one, or does this now need a manual interaction? 
  With ceph-disk / udev, it did not, since udev triggered disk activation and 
then the service was created at runtime. 

Many thanks for your help and cheers,
Oliver



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rgw bucket inaccessible - appears to be using incorrect index pool?

2018-02-20 Thread Graham Allan

On 02/19/2018 09:49 PM, Robin H. Johnson wrote:


When I read the bucket instance metadata back again, it still reads
"placement_rule": "" so I wonder if the bucket_info change is really
taking effect.

So it never showed the new placement_rule if you did a get after the
put?


I think not. It's odd; it returned an empty list once, then reverted to 
producing a file not found error. Hard to explain or understand that!



A quick debug session seems to show it still querying the wrong pool
(100) for the index, so it seems that my attempt to update the
bucket_info is either failing or incorrect!

Did you run a local build w/ the linked patch? I think that would have
more effect than


I did just build a local copy of 12.2.2 with the patch - and it does 
seem to fix it.


Thanks!

Graham
--
Graham Allan
Minnesota Supercomputing Institute - g...@umn.edu
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Help rebalancing OSD usage, Luminus 1.2.2

2018-02-20 Thread Bryan Banister
Hi David [resending with a smaller message size],

I tried setting the OSDs down and that does clear the blocked requests 
momentarily, but they just return to the same state.  Not sure how to 
proceed here, but one thought was just to do a full cold restart of the entire 
cluster.  We have disabled our backups so the cluster is effectively down.  Any 
recommendations on next steps?

This also seems like a pretty serious issue, given that making this change has 
effectively broken the cluster.  Perhaps Ceph should not allow you to increase 
the number of PGs so drastically or at least make you put in a 
‘--yes-i-really-mean-it’ flag?

Or perhaps just some warnings on the docs.ceph.com placement groups page 
(http://docs.ceph.com/docs/master/rados/operations/placement-groups/ ) and the 
ceph command man page?

It would be good to help others avoid this pitfall.

Thanks again,
-Bryan

From: David Turner [mailto:drakonst...@gmail.com]
Sent: Friday, February 16, 2018 3:21 PM
To: Bryan Banister <bbanis...@jumptrading.com>
Cc: Bryan Stillwell <bstillw...@godaddy.com>; Janne Johansson <icepic...@gmail.com>; Ceph Users <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] Help rebalancing OSD usage, Luminus 1.2.2

Note: External Email

That sounds like a good next step.  Start with OSDs involved in the longest 
blocked requests.  Wait a couple minutes after the osd marks itself back up and 
continue through them.  Hopefully things will start clearing up so that you 
don't need to mark all of them down.  There are usually only a couple of OSDs 
holding everything up.
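
A rough sketch of that loop, using the acting primaries from the stuck-PG list 
quoted below (OSD numbers taken from that output; the pause is arbitrary):

# mark the suspect OSDs down one at a time; each should reassert itself shortly after
for osd in 104 110 130 32 37 45 59 62 84 94 96; do
    ceph osd down $osd
    sleep 120   # give it a couple of minutes before moving on to the next one
done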

On Fri, Feb 16, 2018 at 4:15 PM Bryan Banister <bbanis...@jumptrading.com> wrote:
Thanks David,

Taking the list of all OSDs that are stuck shows that a little over 50% of 
all OSDs are in this condition.  There isn’t any discernible pattern that I can 
find, and they are spread across the three servers.  All of the OSDs are online 
as far as the service is concerned.

I have also taken all PGs that were reported in the health detail output and 
looked for any that report “peering_blocked_by”, but none do, so I can’t tell if 
any OSD is actually blocking the peering operation.

As suggested, I got a report of all peering PGs:
[root@carf-ceph-osd01 ~]# ceph health detail | grep "pg " | grep peering | sort 
-k13
pg 14.fe0 is stuck peering since forever, current state peering, last 
acting [104,94,108]
pg 14.fe0 is stuck unclean since forever, current state peering, last 
acting [104,94,108]
pg 14.fbc is stuck peering since forever, current state peering, last 
acting [110,91,0]
pg 14.fd1 is stuck peering since forever, current state peering, last 
acting [130,62,111]
pg 14.fd1 is stuck unclean since forever, current state peering, last 
acting [130,62,111]
pg 14.fed is stuck peering since forever, current state peering, last 
acting [32,33,82]
pg 14.fed is stuck unclean since forever, current state peering, last 
acting [32,33,82]
pg 14.fee is stuck peering since forever, current state peering, last 
acting [37,96,68]
pg 14.fee is stuck unclean since forever, current state peering, last 
acting [37,96,68]
pg 14.fe8 is stuck peering since forever, current state peering, last 
acting [45,31,107]
pg 14.fe8 is stuck unclean since forever, current state peering, last 
acting [45,31,107]
pg 14.fc1 is stuck peering since forever, current state peering, last 
acting [59,124,39]
pg 14.ff2 is stuck peering since forever, current state peering, last 
acting [62,117,7]
pg 14.ff2 is stuck unclean since forever, current state peering, last 
acting [62,117,7]
pg 14.fe4 is stuck peering since forever, current state peering, last 
acting [84,55,92]
pg 14.fe4 is stuck unclean since forever, current state peering, last 
acting [84,55,92]
pg 14.fb0 is stuck peering since forever, current state peering, last 
acting [94,30,38]
pg 14.ffc is stuck peering since forever, current state peering, last 
acting [96,53,70]
pg 14.ffc is stuck unclean since forever, current state peering, last 
acting [96,53,70]

Some share common OSDs, but some OSDs are listed only once.

Should I try just marking OSDs with stuck requests down to see if that will 
re-assert them?

Thanks!!
-Bryan

From: David Turner [mailto:drakonst...@gmail.com]
Sent: Friday, February 16, 2018 2:51 PM

To: Bryan Banister <bbanis...@jumptrading.com>
Cc: Bryan Stillwell <bstillw...@godaddy.com>; Janne Johansson <icepic...@gmail.com>; Ceph Users <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] Help rebalancing OSD usage, Luminus 1.2.2

Note: External Email

I'll answer the questions I definitely know the answer to first, and then we'll continue 
from there.  If an OSD is blocking peering but is online, when you mark it as 
down in the cluster it receives a message in its log saying it was wrongly 
marked d

[ceph-users] ceph-deploy ver 2 - [ceph_deploy.gatherkeys][WARNING] No mon key found in host

2018-02-20 Thread Steven Vacaroaia
Hi,

I have decided to redeploy my test cluster using the latest ceph-deploy and
Luminous.

I cannot get past the ceph-deploy mon create-initial stage due to

[ceph_deploy.gatherkeys][WARNING] No mon key found in host

Any help will be appreciated

ceph-deploy --version
2.0.0

[cephuser@ceph prodceph]$ ls -al
total 140
drwxrwxr-x  2 cephuser cephuser72 Feb 20 16:04 .
drwx-- 15 cephuser cephuser  4096 Feb 20 13:29 ..
-rw-rw-r--  1 cephuser cephuser  1367 Feb 20 16:19 ceph.conf
-rw-rw-r--  1 cephuser cephuser 94602 Feb 20 16:22 ceph-deploy-ceph.log
-rw---  1 cephuser cephuser73 Feb 20 16:04 ceph.mon.keyring

[mon01][INFO  ] monitor: mon.mon01 is running
[mon01][INFO  ] Running command: sudo ceph --cluster=ceph --admin-daemon
/var/run/ceph/ceph-mon.mon01.asok mon_status
[ceph_deploy.mon][INFO  ] processing monitor mon.mon01
[mon01][DEBUG ] connection detected need for sudo
[mon01][DEBUG ] connected to host: mon01
[mon01][DEBUG ] detect platform information from remote host
[mon01][DEBUG ] detect machine type
[mon01][DEBUG ] find the location of an executable
[mon01][INFO  ] Running command: sudo ceph --cluster=ceph --admin-daemon
/var/run/ceph/ceph-mon.mon01.asok mon_status
[ceph_deploy.mon][INFO  ] mon.mon01 monitor has reached quorum!
[ceph_deploy.mon][INFO  ] all initial monitors are running and have formed
quorum
[ceph_deploy.mon][INFO  ] Running gatherkeys...
[ceph_deploy.gatherkeys][INFO  ] Storing keys in temp directory
/tmp/tmpdvkaM7
[mon01][DEBUG ] connection detected need for sudo
[mon01][DEBUG ] connected to host: mon01
[mon01][DEBUG ] detect platform information from remote host
[mon01][DEBUG ] detect machine type
[mon01][DEBUG ] get remote short hostname
[mon01][DEBUG ] fetch remote file
[ceph_deploy.gatherkeys][WARNIN] No mon key found in host: mon01
[ceph_deploy.gatherkeys][ERROR ] Failed to connect to host:mon01
[ceph_deploy.gatherkeys][INFO  ] Destroy temp directory /tmp/tmpdvkaM7
[ceph_deploy][ERROR ] RuntimeError: Failed to connect any mon
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-deploy 2.0.0

2018-02-20 Thread Alfredo Deza
A fully backwards incompatible release of ceph-deploy was completed in
early January [0] which removed ceph-disk as a backend to create OSDs
in favor of ceph-volume.

The backwards incompatible change means that the API for creating OSDs
has changed [1], and also that it now relies on Ceph versions that are
Luminous or newer (e.g. dev releases).
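
For example, OSD creation now goes through ceph-volume and looks roughly like 
this (host, device and LV names are placeholders):

# raw device
ceph-deploy osd create --data /dev/sdb osdnode1

# or an existing logical volume, given as vg/lv
ceph-deploy osd create --data ceph-vg/osd-lv osdnode1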

The last version that supported ceph-disk is 1.5.39. Since ceph-deploy
gets included in community Ceph repos, these will get updated with
Ceph releases. The upcoming Luminous release will include version
2.0.0 of ceph-deploy.

Our repositories do not support multi-package versions, so it is
currently not possible to specify a previous version of ceph-deploy
even though we include them in all Ceph repos. As a workaround,
ceph-deploy is also released to the Python package Index [2] and can
be installed from there.
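
For example, to stay on the last ceph-disk capable release:

# install the older ceph-deploy from PyPI instead of the repo package
pip install ceph-deploy==1.5.39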

We want to keep the easy ceph-deploy deployment experience, so please
follow bugs as soon as you find them at
http://tracker.ceph.com/projects/ceph-deploy/issues/new

[0] http://docs.ceph.com/ceph-deploy/docs/changelog.html#id1
[1] http://docs.ceph.com/ceph-deploy/docs/index.html#deploying-osds
[2] https://pypi.python.org/pypi/ceph-deploy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Missing clones

2018-02-20 Thread Eugen Block

Alright, good luck!
The results would be interesting. :-)


Zitat von Karsten Becker :


Hi Eugen,

yes, I also see the rbd_data.<number> changing. This may have been caused by me
deleting snapshots and trying to move VMs over to another pool which
is not affected.

Currently I'm trying to move the Finance VM, which is a very old VM
that was created as one of the first VMs and is still alive (as the
only one of this age). Maybe it's really a problem of "old" VM formats,
as mentioned in the links somebody sent, where snapshots had wrong/old
bits that a new Ceph could not understand anymore.

We'll see... the VM is large and currently copying... if the error gets
also copied, the VM format/age is the cause. If not, ... hm...   :-D

Nevertheless thank you for your help!
Karsten




On 20.02.2018 15:47, Eugen Block wrote:

I'm not quite sure how to interpret this, but there are different
objects referenced. From the first log output you pasted:


2018-02-19 11:00:23.183695 osd.29 [ERR] repair 10.7b9
10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:head expected
clone 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:64e 1
missing


From the failed PG import the logs mention two different objects:


Write #10:9de96eca:::rbd_data.f5b8603d1b58ba.1d82:head#

snapset 0=[]:{}
Write #10:9de973fe:::rbd_data.966489238e1f29.250b:18#

And your last log output has another two different objects:


Write #10:9df3943b:::rbd_data.e57feb238e1f29.0003c2e1:head#

snapset 0=[]:{}
Write #10:9df399dd:::rbd_data.4401c7238e1f29.050d:19#


So in total we're seeing five different rbd_data objects here:

 - rbd_data.2313975238e1f29
 - rbd_data.f5b8603d1b58ba
 - rbd_data.966489238e1f29
 - rbd_data.e57feb238e1f29
 - rbd_data.4401c7238e1f29

This doesn't make too much sense to me, yet. Which ones belong to
your corrupted VM? Do you have a backup of the VM in case the repair fails?


Zitat von Karsten Becker :


Nope:


Write #10:9df3943b:::rbd_data.e57feb238e1f29.0003c2e1:head#
snapset 0=[]:{}
Write #10:9df399dd:::rbd_data.4401c7238e1f29.050d:19#
Write #10:9df399dd:::rbd_data.4401c7238e1f29.050d:23#
Write #10:9df399dd:::rbd_data.4401c7238e1f29.050d:head#
snapset 612=[23,22,15]:{19=[15],23=[23,22]}
/home/builder/source/ceph-12.2.2/src/osd/SnapMapper.cc: In function
'void SnapMapper::add_oid(const hobject_t&, const
std::set&,
MapCacher::Transaction,
ceph::buffer::list>*)' thread 7fd45147a400 time 2018-02-20
13:56:20.672430
/home/builder/source/ceph-12.2.2/src/osd/SnapMapper.cc: 246: FAILED
assert(r == -2)
 ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf)
luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x102) [0x7fd4478c68f2]
 2: (SnapMapper::add_oid(hobject_t const&, std::set, std::allocator > const&,
MapCacher::Transaction, std::allocator >,
ceph::buffer::list>*)+0x8e9) [0x556930765fe9]
 3: (get_attrs(ObjectStore*, coll_t, ghobject_t,
ObjectStore::Transaction*, ceph::buffer::list&, OSDriver&,
SnapMapper&)+0xafb) [0x5569304ca01b]
 4: (ObjectStoreTool::get_object(ObjectStore*, coll_t,
ceph::buffer::list&, OSDMap&, bool*, ObjectStore::Sequencer&)+0x738)
[0x5569304caae8]
 5: (ObjectStoreTool::do_import(ObjectStore*, OSDSuperblock&, bool,
std::__cxx11::basic_string,
std::allocator >, ObjectStore::Sequencer&)+0x1135)
[0x5569304d12f5]
 6: (main()+0x3909) [0x556930432349]
 7: (__libc_start_main()+0xf1) [0x7fd444d252b1]
 8: (_start()+0x2a) [0x5569304ba01a]
 NOTE: a copy of the executable, or `objdump -rdS ` is
needed to interpret this.
*** Caught signal (Aborted) **
 in thread 7fd45147a400 thread_name:ceph-objectstor
 ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf)
luminous (stable)
 1: (()+0x913f14) [0x556930ae1f14]
 2: (()+0x110c0) [0x7fd44619e0c0]
 3: (gsignal()+0xcf) [0x7fd444d37fcf]
 4: (abort()+0x16a) [0x7fd444d393fa]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x28e) [0x7fd4478c6a7e]
 6: (SnapMapper::add_oid(hobject_t const&, std::set, std::allocator > const&,
MapCacher::Transaction, std::allocator >,
ceph::buffer::list>*)+0x8e9) [0x556930765fe9]
 7: (get_attrs(ObjectStore*, coll_t, ghobject_t,
ObjectStore::Transaction*, ceph::buffer::list&, OSDriver&,
SnapMapper&)+0xafb) [0x5569304ca01b]
 8: (ObjectStoreTool::get_object(ObjectStore*, coll_t,
ceph::buffer::list&, OSDMap&, bool*, ObjectStore::Sequencer&)+0x738)
[0x5569304caae8]
 9: (ObjectStoreTool::do_import(ObjectStore*, OSDSuperblock&, bool,
std::__cxx11::basic_string,
std::allocator >, ObjectStore::Sequencer&)+0x1135)
[0x5569304d12f5]
 10: (main()+0x3909) [0x556930432349]
 11: (__libc_start_main()+0xf1) [0x7fd444d252b1]
 12: (_start()+0x2a) [0x5569304ba01a]
Aborted




What I also do not understand: If I take your approach of finding out
what is stored in the PG, I get no match with my PG ID anymore.

If I take the approach of "rbd info" which was posted by Mykola Golub, I
get a match

Re: [ceph-users] Missing clones

2018-02-20 Thread Karsten Becker
Hi Eugen,

yes, I also see the rbd_data.<number> changing. This may have been caused by me
deleting snapshots and trying to move VMs over to another pool which
is not affected.

Currently I'm trying to move the Finance VM, which is a very old VM
that was created as one of the first VMs and is still alive (as the
only one of this age). Maybe it's really a problem of "old" VM formats,
as mentioned in the links somebody sent, where snapshots had wrong/old
bits that a new Ceph could not understand anymore.

We'll see... the VM is large and currently copying... if the error gets
also copied, the VM format/age is the cause. If not, ... hm...   :-D

Nevertheless thank you for your help!
Karsten




On 20.02.2018 15:47, Eugen Block wrote:
> I'm not quite sure how to interpret this, but there are different
> objects referenced. From the first log output you pasted:
> 
>> 2018-02-19 11:00:23.183695 osd.29 [ERR] repair 10.7b9
>> 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:head expected
>> clone 10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:64e 1
>> missing
> 
> From the failed PG import the logs mention two different objects:
> 
>> Write #10:9de96eca:::rbd_data.f5b8603d1b58ba.1d82:head#
> snapset 0=[]:{}
> Write #10:9de973fe:::rbd_data.966489238e1f29.250b:18#
> 
> And your last log output has another two different objects:
> 
>> Write #10:9df3943b:::rbd_data.e57feb238e1f29.0003c2e1:head#
> snapset 0=[]:{}
> Write #10:9df399dd:::rbd_data.4401c7238e1f29.050d:19#
> 
> 
> So in total we're seeing five different rbd_data objects here:
> 
>  - rbd_data.2313975238e1f29
>  - rbd_data.f5b8603d1b58ba
>  - rbd_data.966489238e1f29
>  - rbd_data.e57feb238e1f29
>  - rbd_data.4401c7238e1f29
> 
> This doesn't make too much sense to me, yet. Which ones belong to
> your corrupted VM? Do you have a backup of the VM in case the repair fails?
> 
> 
> Zitat von Karsten Becker :
> 
>> Nope:
>>
>>> Write #10:9df3943b:::rbd_data.e57feb238e1f29.0003c2e1:head#
>>> snapset 0=[]:{}
>>> Write #10:9df399dd:::rbd_data.4401c7238e1f29.050d:19#
>>> Write #10:9df399dd:::rbd_data.4401c7238e1f29.050d:23#
>>> Write #10:9df399dd:::rbd_data.4401c7238e1f29.050d:head#
>>> snapset 612=[23,22,15]:{19=[15],23=[23,22]}
>>> /home/builder/source/ceph-12.2.2/src/osd/SnapMapper.cc: In function
>>> 'void SnapMapper::add_oid(const hobject_t&, const
>>> std::set&,
>>> MapCacher::Transaction,
>>> ceph::buffer::list>*)' thread 7fd45147a400 time 2018-02-20
>>> 13:56:20.672430
>>> /home/builder/source/ceph-12.2.2/src/osd/SnapMapper.cc: 246: FAILED
>>> assert(r == -2)
>>>  ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf)
>>> luminous (stable)
>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> const*)+0x102) [0x7fd4478c68f2]
>>>  2: (SnapMapper::add_oid(hobject_t const&, std::set>> std::less, std::allocator > const&,
>>> MapCacher::Transaction>> std::char_traits, std::allocator >,
>>> ceph::buffer::list>*)+0x8e9) [0x556930765fe9]
>>>  3: (get_attrs(ObjectStore*, coll_t, ghobject_t,
>>> ObjectStore::Transaction*, ceph::buffer::list&, OSDriver&,
>>> SnapMapper&)+0xafb) [0x5569304ca01b]
>>>  4: (ObjectStoreTool::get_object(ObjectStore*, coll_t,
>>> ceph::buffer::list&, OSDMap&, bool*, ObjectStore::Sequencer&)+0x738)
>>> [0x5569304caae8]
>>>  5: (ObjectStoreTool::do_import(ObjectStore*, OSDSuperblock&, bool,
>>> std::__cxx11::basic_string,
>>> std::allocator >, ObjectStore::Sequencer&)+0x1135)
>>> [0x5569304d12f5]
>>>  6: (main()+0x3909) [0x556930432349]
>>>  7: (__libc_start_main()+0xf1) [0x7fd444d252b1]
>>>  8: (_start()+0x2a) [0x5569304ba01a]
>>>  NOTE: a copy of the executable, or `objdump -rdS ` is
>>> needed to interpret this.
>>> *** Caught signal (Aborted) **
>>>  in thread 7fd45147a400 thread_name:ceph-objectstor
>>>  ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf)
>>> luminous (stable)
>>>  1: (()+0x913f14) [0x556930ae1f14]
>>>  2: (()+0x110c0) [0x7fd44619e0c0]
>>>  3: (gsignal()+0xcf) [0x7fd444d37fcf]
>>>  4: (abort()+0x16a) [0x7fd444d393fa]
>>>  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> const*)+0x28e) [0x7fd4478c6a7e]
>>>  6: (SnapMapper::add_oid(hobject_t const&, std::set>> std::less, std::allocator > const&,
>>> MapCacher::Transaction>> std::char_traits, std::allocator >,
>>> ceph::buffer::list>*)+0x8e9) [0x556930765fe9]
>>>  7: (get_attrs(ObjectStore*, coll_t, ghobject_t,
>>> ObjectStore::Transaction*, ceph::buffer::list&, OSDriver&,
>>> SnapMapper&)+0xafb) [0x5569304ca01b]
>>>  8: (ObjectStoreTool::get_object(ObjectStore*, coll_t,
>>> ceph::buffer::list&, OSDMap&, bool*, ObjectStore::Sequencer&)+0x738)
>>> [0x5569304caae8]
>>>  9: (ObjectStoreTool::do_import(ObjectStore*, OSDSuperblock&, bool,
>>> std::__cxx11::basic_string,
>>> std::allocator >, ObjectStore::Sequencer&)+0x1135)
>>> [0x5569304d12f5]
>>>  10: (main()+0x3909) [0x556930432349]
>>>  11: (__libc_start_main()+0xf1

Re: [ceph-users] Missing clones

2018-02-20 Thread Eugen Block
I'm not quite sure how to interpret this, but there are different  
objects referenced. From the first log output you pasted:


2018-02-19 11:00:23.183695 osd.29 [ERR] repair 10.7b9  
10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:head  
expected clone  
10:9defb021:::rbd_data.2313975238e1f29.0002cbb5:64e 1 missing


From the failed PG import the logs mention two different objects:


Write #10:9de96eca:::rbd_data.f5b8603d1b58ba.1d82:head#

snapset 0=[]:{}
Write #10:9de973fe:::rbd_data.966489238e1f29.250b:18#

And your last log output has another two different objects:


Write #10:9df3943b:::rbd_data.e57feb238e1f29.0003c2e1:head#

snapset 0=[]:{}
Write #10:9df399dd:::rbd_data.4401c7238e1f29.050d:19#


So in total we're seeing five different rbd_data objects here:

 - rbd_data.2313975238e1f29
 - rbd_data.f5b8603d1b58ba
 - rbd_data.966489238e1f29
 - rbd_data.e57feb238e1f29
 - rbd_data.4401c7238e1f29

This doesn't make too much sense to me, yet. Which ones belong 
to your corrupted VM? Do you have a backup of the VM in case the 
repair fails?



Zitat von Karsten Becker :


Nope:


Write #10:9df3943b:::rbd_data.e57feb238e1f29.0003c2e1:head#
snapset 0=[]:{}
Write #10:9df399dd:::rbd_data.4401c7238e1f29.050d:19#
Write #10:9df399dd:::rbd_data.4401c7238e1f29.050d:23#
Write #10:9df399dd:::rbd_data.4401c7238e1f29.050d:head#
snapset 612=[23,22,15]:{19=[15],23=[23,22]}
/home/builder/source/ceph-12.2.2/src/osd/SnapMapper.cc: In function  
'void SnapMapper::add_oid(const hobject_t&, const  
std::set&,  
MapCacher::Transaction,  
ceph::buffer::list>*)' thread 7fd45147a400 time 2018-02-20  
13:56:20.672430
/home/builder/source/ceph-12.2.2/src/osd/SnapMapper.cc: 246: FAILED  
assert(r == -2)
 ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf)  
luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char  
const*)+0x102) [0x7fd4478c68f2]
 2: (SnapMapper::add_oid(hobject_t const&, std::setstd::less, std::allocator > const&,  
MapCacher::Transactionstd::char_traits, std::allocator >,  
ceph::buffer::list>*)+0x8e9) [0x556930765fe9]
 3: (get_attrs(ObjectStore*, coll_t, ghobject_t,  
ObjectStore::Transaction*, ceph::buffer::list&, OSDriver&,  
SnapMapper&)+0xafb) [0x5569304ca01b]
 4: (ObjectStoreTool::get_object(ObjectStore*, coll_t,  
ceph::buffer::list&, OSDMap&, bool*,  
ObjectStore::Sequencer&)+0x738) [0x5569304caae8]
 5: (ObjectStoreTool::do_import(ObjectStore*, OSDSuperblock&, bool,  
std::__cxx11::basic_string,  
std::allocator >, ObjectStore::Sequencer&)+0x1135)  
[0x5569304d12f5]

 6: (main()+0x3909) [0x556930432349]
 7: (__libc_start_main()+0xf1) [0x7fd444d252b1]
 8: (_start()+0x2a) [0x5569304ba01a]
 NOTE: a copy of the executable, or `objdump -rdS ` is  
needed to interpret this.

*** Caught signal (Aborted) **
 in thread 7fd45147a400 thread_name:ceph-objectstor
 ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf)  
luminous (stable)

 1: (()+0x913f14) [0x556930ae1f14]
 2: (()+0x110c0) [0x7fd44619e0c0]
 3: (gsignal()+0xcf) [0x7fd444d37fcf]
 4: (abort()+0x16a) [0x7fd444d393fa]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char  
const*)+0x28e) [0x7fd4478c6a7e]
 6: (SnapMapper::add_oid(hobject_t const&, std::setstd::less, std::allocator > const&,  
MapCacher::Transactionstd::char_traits, std::allocator >,  
ceph::buffer::list>*)+0x8e9) [0x556930765fe9]
 7: (get_attrs(ObjectStore*, coll_t, ghobject_t,  
ObjectStore::Transaction*, ceph::buffer::list&, OSDriver&,  
SnapMapper&)+0xafb) [0x5569304ca01b]
 8: (ObjectStoreTool::get_object(ObjectStore*, coll_t,  
ceph::buffer::list&, OSDMap&, bool*,  
ObjectStore::Sequencer&)+0x738) [0x5569304caae8]
 9: (ObjectStoreTool::do_import(ObjectStore*, OSDSuperblock&, bool,  
std::__cxx11::basic_string,  
std::allocator >, ObjectStore::Sequencer&)+0x1135)  
[0x5569304d12f5]

 10: (main()+0x3909) [0x556930432349]
 11: (__libc_start_main()+0xf1) [0x7fd444d252b1]
 12: (_start()+0x2a) [0x5569304ba01a]
Aborted




What I also do not understand: If I take your approach of finding out
what is stored in the PG, I get no match with my PG ID anymore.

If I take the "rbd info" approach posted by Mykola Golub, I do get a
match - unfortunately it is the most important VM on our system, the one
which holds our finance software.

Best
Karsten









On 20.02.2018 09:16, Eugen Block wrote:

And does the re-import of the PG work? From the logs I assumed that the
snapshot(s) prevented a successful import, but now that they are deleted
it could work.


Zitat von Karsten Becker :


Hi Eugen,

hmmm, that should be :


rbd -p cpVirtualMachines list | while read LINE; do osdmaptool
--test-map-object $LINE --pool 10 osdmap 2>&1; rbd snap ls
cpVirtualMachines/$LINE | grep -v SNAPID | awk '{ print $2 }' | while
read LINE2; do echo "$LINE"; osdmaptool --test-map-object $LINE2
--pool 10 osdmap 2>&1; done; done | less


It's a Proxmox system.

Re: [ceph-users] mon service failed to start

2018-02-20 Thread Behnam Loghmani
Hi Caspar,

I checked the filesystem and there aren't any errors on it.
The disk is an SSD; it doesn't have any attribute related to wear level in
smartctl, and the filesystem is mounted with default options and no discard.

my ceph structure on this node is like this:

it has osd,mon,rgw services
1 SSD for OS and WAL/DB
2 HDD

OSDs are created by ceph-volume lvm.

the whole SSD is on 1 vg.
OS is on root lv
OSD.1 DB is on db-a
OSD.1 WAL is on wal-a
OSD.2 DB is on db-b
OSD.2 WAL is on wal-b

output of lvs:

  data-a data-a -wi-a-
  data-b data-b -wi-a-
  db-a   vg0    -wi-a-
  db-b   vg0    -wi-a-
  root   vg0    -wi-ao
  wal-a  vg0    -wi-a-
  wal-b  vg0    -wi-a-
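
For reference, a layout like that is typically built along these lines (only a
sketch - the LV sizes are assumptions, the VG/LV names follow the listing above):

# carve DB and WAL LVs for the first OSD out of the SSD volume group
lvcreate -L 30G -n db-a  vg0
lvcreate -L 1G  -n wal-a vg0
# create the bluestore OSD; the data LV lives in its own VG on the HDD
ceph-volume lvm create --bluestore --data data-a/data-a \
    --block.db vg0/db-a --block.wal vg0/wal-a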

After heavy writes through the radosgw, OSD.1 and OSD.2 stopped with a
"block checksum mismatch" error.
Now the MON and OSD services on this node have stopped working with this error.
I think my issue is related to this bug:
http://tracker.ceph.com/issues/22102

I ran
#ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-1 --deep 1
but it returns the same error:

*** Caught signal (Aborted) **
 in thread 7fbf6c923d00 thread_name:ceph-bluestore-
2018-02-20 16:44:30.128787 7fbf6c923d00 -1 abort: Corruption: block
checksum mismatch
 ceph version 12.2.2 (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous
(stable)
 1: (()+0x3eb0b1) [0x55f779e6e0b1]
 2: (()+0xf5e0) [0x7fbf61ae15e0]
 3: (gsignal()+0x37) [0x7fbf604d31f7]
 4: (abort()+0x148) [0x7fbf604d48e8]
 5: (RocksDBStore::get(std::string const&, char const*, unsigned long,
ceph::buffer::list*)+0x1ce) [0x55f779d2b5ce]
 6: (BlueStore::Collection::get_onode(ghobject_t const&, bool)+0x545)
[0x55f779cd8f75]
 7: (BlueStore::_fsck(bool, bool)+0x1bb5) [0x55f779cf1a75]
 8: (main()+0xde0) [0x55f779baab90]
 9: (__libc_start_main()+0xf5) [0x7fbf604bfc05]
 10: (()+0x1bc59f) [0x55f779c3f59f]
2018-02-20 16:44:30.131334 7fbf6c923d00 -1 *** Caught signal (Aborted) **
 in thread 7fbf6c923d00 thread_name:ceph-bluestore-

 ceph version 12.2.2 (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous
(stable)
 1: (()+0x3eb0b1) [0x55f779e6e0b1]
 2: (()+0xf5e0) [0x7fbf61ae15e0]
 3: (gsignal()+0x37) [0x7fbf604d31f7]
 4: (abort()+0x148) [0x7fbf604d48e8]
 5: (RocksDBStore::get(std::string const&, char const*, unsigned long,
ceph::buffer::list*)+0x1ce) [0x55f779d2b5ce]
 6: (BlueStore::Collection::get_onode(ghobject_t const&, bool)+0x545)
[0x55f779cd8f75]
 7: (BlueStore::_fsck(bool, bool)+0x1bb5) [0x55f779cf1a75]
 8: (main()+0xde0) [0x55f779baab90]
 9: (__libc_start_main()+0xf5) [0x7fbf604bfc05]
 10: (()+0x1bc59f) [0x55f779c3f59f]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed
to interpret this.

-1> 2018-02-20 16:44:30.128787 7fbf6c923d00 -1 abort: Corruption: block
checksum mismatch
 0> 2018-02-20 16:44:30.131334 7fbf6c923d00 -1 *** Caught signal
(Aborted) **
 in thread 7fbf6c923d00 thread_name:ceph-bluestore-

 ceph version 12.2.2 (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous
(stable)
 1: (()+0x3eb0b1) [0x55f779e6e0b1]
 2: (()+0xf5e0) [0x7fbf61ae15e0]
 3: (gsignal()+0x37) [0x7fbf604d31f7]
 4: (abort()+0x148) [0x7fbf604d48e8]
 5: (RocksDBStore::get(std::string const&, char const*, unsigned long,
ceph::buffer::list*)+0x1ce) [0x55f779d2b5ce]
 6: (BlueStore::Collection::get_onode(ghobject_t const&, bool)+0x545)
[0x55f779cd8f75]
 7: (BlueStore::_fsck(bool, bool)+0x1bb5) [0x55f779cf1a75]
 8: (main()+0xde0) [0x55f779baab90]
 9: (__libc_start_main()+0xf5) [0x7fbf604bfc05]
 10: (()+0x1bc59f) [0x55f779c3f59f]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed
to interpret this.



Could you please help me recover this node, or find a way to prove that the
SSD disk is the problem?
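
For what it's worth, a few read-only checks that could help build evidence of a
failing SSD (a sketch - /dev/sda is only an assumed device name for the SSD
holding the OS and WAL/DB):

# SMART health, error counters and (if exposed) media wear indicators
smartctl -a /dev/sda
smartctl -l error /dev/sda
# kernel-level I/O errors logged against the device
dmesg | grep -iE 'sda|i/o error'
# non-destructive read-only surface scan (do NOT add -w, that would overwrite data)
badblocks -sv -b 4096 /dev/sda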

Best regards,
Behnam Loghmani




On Mon, Feb 19, 2018 at 1:35 PM, Caspar Smit  wrote:

> Hi Behnam,
>
> I would firstly recommend running a filesystem check on the monitor disk
> first to see if there are any inconsistencies.
>
> Is the disk where the monitor is running on a spinning disk or SSD?
>
> If SSD you should check the Wear level stats through smartctl.
> Maybe trim (discard) enabled on the filesystem mount? (discard could cause
> problems/corruption in combination with certain SSD firmwares)
>
> Caspar
>
> 2018-02-16 23:03 GMT+01:00 Behnam Loghmani :
>
>> I checked the disk that monitor is on it with smartctl and it didn't
>> return any error and it doesn't have any Current_Pending_Sector.
>> Do you recommend any disk checks to make sure that this disk has problem
>> and then I can send the report to the provider for replacing the disk
>>
>> On Sat, Feb 17, 2018 at 1:09 AM, Gregory Farnum 
>> wrote:
>>
>>> The disk that the monitor is on...there isn't anything for you to
>>> configure about a monitor WAL though so I'm not sure how that enters into
>>> it?
>>>
>>> On Fri, Feb 16, 2018 at 12:46 PM Behnam Loghmani <
>>> behnam.loghm...@gmail.com> wrote:
>>>
 Thanks for your reply

 Do you mean, that's the problem with the disk I use for WAL and DB?

 On Fri, Feb 16, 2018 at 11:33 PM, Gregory Farnum

Re: [ceph-users] Missing clones

2018-02-20 Thread Karsten Becker
Nope:

> Write #10:9df3943b:::rbd_data.e57feb238e1f29.0003c2e1:head#
> snapset 0=[]:{}
> Write #10:9df399dd:::rbd_data.4401c7238e1f29.050d:19#
> Write #10:9df399dd:::rbd_data.4401c7238e1f29.050d:23#
> Write #10:9df399dd:::rbd_data.4401c7238e1f29.050d:head#
> snapset 612=[23,22,15]:{19=[15],23=[23,22]}
> /home/builder/source/ceph-12.2.2/src/osd/SnapMapper.cc: In function 'void 
> SnapMapper::add_oid(const hobject_t&, const std::set&, 
> MapCacher::Transaction, 
> ceph::buffer::list>*)' thread 7fd45147a400 time 2018-02-20 13:56:20.672430
> /home/builder/source/ceph-12.2.2/src/osd/SnapMapper.cc: 246: FAILED assert(r 
> == -2)
>  ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous 
> (stable)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
> const*)+0x102) [0x7fd4478c68f2]
>  2: (SnapMapper::add_oid(hobject_t const&, std::set std::less, std::allocator > const&, 
> MapCacher::Transaction std::char_traits, std::allocator >, ceph::buffer::list>*)+0x8e9) 
> [0x556930765fe9]
>  3: (get_attrs(ObjectStore*, coll_t, ghobject_t, ObjectStore::Transaction*, 
> ceph::buffer::list&, OSDriver&, SnapMapper&)+0xafb) [0x5569304ca01b]
>  4: (ObjectStoreTool::get_object(ObjectStore*, coll_t, ceph::buffer::list&, 
> OSDMap&, bool*, ObjectStore::Sequencer&)+0x738) [0x5569304caae8]
>  5: (ObjectStoreTool::do_import(ObjectStore*, OSDSuperblock&, bool, 
> std::__cxx11::basic_string, std::allocator 
> >, ObjectStore::Sequencer&)+0x1135) [0x5569304d12f5]
>  6: (main()+0x3909) [0x556930432349]
>  7: (__libc_start_main()+0xf1) [0x7fd444d252b1]
>  8: (_start()+0x2a) [0x5569304ba01a]
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
> interpret this.
> *** Caught signal (Aborted) **
>  in thread 7fd45147a400 thread_name:ceph-objectstor
>  ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous 
> (stable)
>  1: (()+0x913f14) [0x556930ae1f14]
>  2: (()+0x110c0) [0x7fd44619e0c0]
>  3: (gsignal()+0xcf) [0x7fd444d37fcf]
>  4: (abort()+0x16a) [0x7fd444d393fa]
>  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
> const*)+0x28e) [0x7fd4478c6a7e]
>  6: (SnapMapper::add_oid(hobject_t const&, std::set std::less, std::allocator > const&, 
> MapCacher::Transaction std::char_traits, std::allocator >, ceph::buffer::list>*)+0x8e9) 
> [0x556930765fe9]
>  7: (get_attrs(ObjectStore*, coll_t, ghobject_t, ObjectStore::Transaction*, 
> ceph::buffer::list&, OSDriver&, SnapMapper&)+0xafb) [0x5569304ca01b]
>  8: (ObjectStoreTool::get_object(ObjectStore*, coll_t, ceph::buffer::list&, 
> OSDMap&, bool*, ObjectStore::Sequencer&)+0x738) [0x5569304caae8]
>  9: (ObjectStoreTool::do_import(ObjectStore*, OSDSuperblock&, bool, 
> std::__cxx11::basic_string, std::allocator 
> >, ObjectStore::Sequencer&)+0x1135) [0x5569304d12f5]
>  10: (main()+0x3909) [0x556930432349]
>  11: (__libc_start_main()+0xf1) [0x7fd444d252b1]
>  12: (_start()+0x2a) [0x5569304ba01a]
> Aborted



What I also do not understand: If I take your approach of finding out
what is stored in the PG, I get no match with my PG ID anymore.

If I take the "rbd info" approach posted by Mykola Golub, I do get a
match - unfortunately it is the most important VM on our system, the one
which holds our finance software.
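
For the record, a sketch of commands that might help narrow the inconsistency
down before the next repair attempt (the PG id 10.7b9 comes from the scrub
errors earlier in the thread):

# show which objects/snapsets the scrub flagged in that PG
rados list-inconsistent-obj 10.7b9 --format=json-pretty
rados list-inconsistent-snapset 10.7b9 --format=json-pretty
# then retry a repair and watch the primary OSD log
ceph pg repair 10.7b9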

Best
Karsten









On 20.02.2018 09:16, Eugen Block wrote:
> And does the re-import of the PG work? From the logs I assumed that the
> snapshot(s) prevented a successful import, but now that they are deleted
> it could work.
> 
> 
> Zitat von Karsten Becker :
> 
>> Hi Eugen,
>>
>> hmmm, that should be :
>>
>>> rbd -p cpVirtualMachines list | while read LINE; do osdmaptool
>>> --test-map-object $LINE --pool 10 osdmap 2>&1; rbd snap ls
>>> cpVirtualMachines/$LINE | grep -v SNAPID | awk '{ print $2 }' | while
>>> read LINE2; do echo "$LINE"; osdmaptool --test-map-object $LINE2
>>> --pool 10 osdmap 2>&1; done; done | less
>>
>> It's a Proxmox system. There were only two snapshots on the PG, which I
>> deleted now. Now nothing gets displayed on the PG... is that possible? A
>> repair still fails unfortunately...
>>
>> Best & thank you for the hint!
>> Karsten
>>
>>
>>
>> On 19.02.2018 22:42, Eugen Block wrote:
 BTW - how can I find out, which RBDs are affected by this problem.
 Maybe
 a copy/remove of the affected RBDs could help? But how to find out to
 which RBDs this PG belongs to?
>>>
>>> Depending on how many PGs your cluster/pool has, you could dump your
>>> osdmap and then run the osdmaptool [1] for every rbd object in your pool
>>> and grep for the affected PG. That would be quick for a few objects, I
>>> guess:
>>>
>>> ceph1:~ # ceph osd getmap -o /tmp/osdmap
>>>
>>> ceph1:~ # osdmaptool --test-map-object image1 --pool 5 /tmp/osdmap
>>> osdmaptool: osdmap file '/tmp/osdmap'
>>>  object 'image1' -> 5.2 -> [0]
>>>
>>> ceph1:~ # osdmaptool --test-map-object image2 --pool 5 /tmp/osdmap
>>> osdmaptool: osdmap file '/tmp/osdmap'
 object 'image2' -> 5.f -> [0]

Re: [ceph-users] Luminous : performance degrade while read operations (ceph-volume)

2018-02-20 Thread Alfredo Deza
On Mon, Feb 19, 2018 at 9:29 PM, nokia ceph 
wrote:

> Hi Alfredo Deza,
>
> We have a 5 node platform with LVM OSDs created from scratch and another 5
> node platform migrated from Kraken which uses ceph-volume simple. Both have
> the same issue. Both platforms have only HDDs for OSDs.
>
> We also noticed roughly twice the disk IOPS compared to Kraken, which causes
> lower read performance. During RocksDB compaction the situation is worse.
>
>
> Meanwhile we are building another platform with OSDs created using ceph-disk
> and will analyse that.
>

If you have two platforms, one with `simple` and the other one with `lvm`
experiencing the same, then something else must be at fault here.

The `simple` setup in ceph-volume basically keeps everything as it was
before, it just captures details of what devices were being used so OSDs
can be started. There is no interaction from ceph-volume
in there that could cause something like this.
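
For reference, this is roughly all the `simple` sub-command does (a sketch; the
OSD id/path is only an example, and the id/fsid for activation come from the
scan output):

# capture the metadata of an already-deployed OSD into a JSON file under /etc/ceph/osd/
ceph-volume simple scan /var/lib/ceph/osd/ceph-0
# enable systemd units from that JSON so the OSD comes back after reboot
ceph-volume simple activate 0 <osd-fsid>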



> Thanks,
> Muthu
>
>
>
> On Tuesday, February 20, 2018, Alfredo Deza  wrote:
>
>>
>>
>> On Mon, Feb 19, 2018 at 2:01 PM, nokia ceph 
>> wrote:
>>
>>> Hi All,
>>>
>>> We have 5 node clusters with EC 4+1 and use bluestore since last year
>>> from Kraken.
>>> Recently we migrated all our platforms to luminous 12.2.2 and finally
>>> all OSDs migrated to ceph-volume simple type and on few platforms installed
>>> ceph using ceph-volume .
>>>
>>> Now we see twice as much read traffic compared to client traffic on the
>>> migrated platform and the newly created platforms. This was not the case in
>>> older releases, where the ceph status read B/W was the same as the client
>>> read traffic.
>>>
>>> Some network graphs :
>>>
>>> *Client network interface* towards ceph public interface : shows
>>> *4.3Gbps* read
>>>
>>>
>>> [image: Inline image 2]
>>>
>>> *Ceph Node Public interface* : Each node around 960Mbps * 5 node =* 4.6
>>> Gbps *- this matches.
>>> [image: Inline image 3]
>>>
>>> Ceph status output : show  1032 MB/s =* 8.06 Gbps*
>>>
>>> cn6.chn6us1c1.cdn ~# ceph status
>>>   cluster:
>>> id: abda22db-3658-4d33-9681-e3ff10690f88
>>> health: HEALTH_OK
>>>
>>>   services:
>>> mon: 5 daemons, quorum cn6,cn7,cn8,cn9,cn10
>>> mgr: cn6(active), standbys: cn7, cn9, cn10, cn8
>>> osd: 340 osds: 340 up, 340 in
>>>
>>>   data:
>>> pools:   1 pools, 8192 pgs
>>> objects: 270M objects, 426 TB
>>> usage:   581 TB used, 655 TB / 1237 TB avail
>>> pgs: 8160 active+clean
>>>  32   active+clean+scrubbing
>>>
>>>   io:
>>> client:   *1032 MB/s rd*, 168 MB/s wr, 1908 op/s rd, 1594 op/s wr
>>>
>>>
>>> For write operations we don't see this issue; the client traffic matches.
>>> Is this expected behavior in Luminous with ceph-volume lvm, or a bug?
>>> Or a wrong calculation of the read B/W in ceph status?
>>>
>>
>> You mentioned `ceph-volume simple` but here you say lvm. With LVM
>> ceph-volume will create the OSDs from scratch, while "simple" will keep
>> whatever OSD was created before.
>>
>> Have you created the OSDs from scratch with ceph-volume? or is it just
>> using "simple" , managing a previously deployed OSD?
>>
>>>
>>> Please provide your feedback.
>>>
>>> Thanks,
>>> Muthu
>>>
>>>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph df: Raw used vs. used vs. actual bytes in cephfs

2018-02-20 Thread Igor Fedotov
Another space "leak" might be due BlueStore misbehavior that takes DB 
partition(s) space into account when calculating total store size. And 
all this space is immediately marked as used even for an empty store. So 
if you have 3 OSD with 10 Gb DB device each you unconditionally get 30 
Gb used space in the report.


Plus additional 1Gb (with default settings) per each OSD as BlueStore 
unconditionally locks that space at block device for BlueFS usage.


Also it might allocate (and hence report as used) even more space at 
block device for BlueFS if DB partition isn't enough. You should inspect 
OSD performance counters under "bluefs" section to check that amount.
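
A sketch of how those counters can be read via the admin socket (osd.0 is just
an example id):

# dump the perf counters of one OSD and look at the bluefs section
ceph daemon osd.0 perf dump | grep -A 20 '"bluefs"'
# db_used_bytes and slow_used_bytes show how much DB / main-device space BlueFS occupies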



Also please note that for 64K allocation
On 2/20/2018 10:33 AM, Flemming Frandsen wrote:

I didn't know about ceph df detail, that's quite useful, thanks.

I was thinking that the problem had to do with some sort of internal
fragmentation, because the filesystem in question does have millions
(2.9 M or thereabouts) of files. However, even if 4k is lost for each
file, that only amounts to about 23 GB of raw space lost, and I have
3276 GB of raw space unaccounted for.


I've researched the min alloc option a bit and even though no 
documentation seems to exist, I've found that the default is 64k for 
hdd, but even if the lost space per file is 64k and that's mirrored, I 
can only account for 371 GB, so that doesn't really help a great deal.
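
For what it's worth, the value an OSD is currently configured with can be
checked through its admin socket (osd.0 is just an example; note that the value
actually baked into an existing OSD is fixed at mkfs time):

# the generic option (0 means "use the hdd/ssd specific default")
ceph daemon osd.0 config get bluestore_min_alloc_size
ceph daemon osd.0 config get bluestore_min_alloc_size_hdd
ceph daemon osd.0 config get bluestore_min_alloc_size_ssd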


I have set up an experimental cluster with "bluestore min alloc size = 
4096" and so far I've been unable to make it lose space like the first 
cluster.



I'm very worried that ceph is unusable because of this issue.



On 19/02/18 19:38, Pavel Shub wrote:

Could you be running into block size (minimum allocation unit)
overhead? The default bluestore min alloc size is 64k for hdd and 16k for ssd.
This is exacerbated if you have tons of small files. I tend to see
this when the "ceph df detail" sum of raw used in pools is less than the
global raw bytes used.

On Mon, Feb 19, 2018 at 2:09 AM, Flemming Frandsen
 wrote:

Each OSD lives on a separate HDD in bluestore with the journals on 2GB
partitions on a shared SSD.


On 16/02/18 21:08, Gregory Farnum wrote:

What does the cluster deployment look like? Usually this happens 
when you’re
sharing disks with the OS, or have co-located file journals or 
something.

On Fri, Feb 16, 2018 at 4:02 AM Flemming Frandsen
 wrote:

I'm trying out cephfs and I'm in the process of copying over some
real-world data to see what happens.

I have created a number of cephfs file systems, the only one I've
started working on is the one called jenkins specifically the one 
named

jenkins which lives in fs_jenkins_data and fs_jenkins_metadata.

According to ceph df I have about 1387 GB of data in all of the pools,
while the raw used space is 5918 GB, which gives a ratio of about 
4.3, I
would have expected a ratio around 2 as the pool size has been set 
to 2.



Can anyone explain where half my space has been squandered?

  > ceph df
GLOBAL:
    SIZE  AVAIL RAW USED %RAW USED
    8382G 2463G    5918G     70.61
POOLS:
    NAME                     ID USED    %USED MAX AVAIL OBJECTS
    .rgw.root                1     1113     0      258G        4
    default.rgw.control      2        0     0      258G        8
    default.rgw.meta         3        0     0      258G        0
    default.rgw.log          4        0     0      258G      207
    fs_docker-nexus_data     5   66120M 11.09      258G    22655
    fs_docker-nexus_metadata 6   39463k     0      258G     2376
    fs_meta_data             7      330     0      258G        4
    fs_meta_metadata         8     567k     0      258G       22
    fs_jenkins_data          9    1321G 71.84      258G 28576278
    fs_jenkins_metadata      10  52178k     0      258G  2285493
    fs_nexus_data            11       0     0      258G        0
    fs_nexus_metadata        12    4181     0      258G       21

--
   Regards Flemming Frandsen - Stibo Systems - DK - STEP Release 
Manager

   Please use rele...@stibo.com for all Release Management requests

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
  Regards Flemming Frandsen - Stibo Systems - DK - STEP Release Manager
  Please use rele...@stibo.com for all Release Management requests


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] bluestore min alloc size vs. wasted space

2018-02-20 Thread Igor Fedotov



On 2/20/2018 11:57 AM, Flemming Frandsen wrote:
I have set up a little ceph installation and added about 80k files of 
various sizes, then I added 1M files of 1 byte each totalling 1 MB, to 
see what kind of overhead is incurred per object.


The overhead for adding 1M objects seems to be 12252M/1000000 =
0.012252M or 122 kB per file, which is a bit high, but in line with a
min allocation size of 64 kB.

0.012M = 12Kb, not 122.



My ceph.conf file contained this line from when I initially deployed 
the cluster:

    bluestore min alloc size = 4096

So your min alloc size setting for the cluster in question is 4K, not
64K, right?

And pool replication factor is 3, isn't it?
Then one can probably explain additional 12Gb of raw space as:
1M objects * min_alloc_size * replication_factor = 1E6 * 4096 * 3 = 12GB


How do I set the min alloc size if not in the ceph.conf file?

Is it possible to change bluestore min alloc size for an existing 
cluster? How?
This is a per-OSD setting that can't be altered after OSD deployment. So
you should either redeploy the cluster completely, or redeploy the OSDs one by
one if you want to preserve your data.
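
A rough sketch of the one-by-one redeploy (osd.0 and /dev/sdb are placeholders;
wait for the cluster to be healthy again between steps so data is fully
recovered each time):

# drain and remove one OSD
ceph osd out 0
# wait for the rebalance to finish, then:
systemctl stop ceph-osd@0
ceph osd purge 0 --yes-i-really-mean-it
# with the desired "bluestore min alloc size" already in ceph.conf on this host,
# wipe and recreate the OSD so the new value is applied at mkfs time
ceph-volume lvm zap /dev/sdb
ceph-volume lvm create --bluestore --data /dev/sdb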



Even at this level of overhead I'm nowhere near the 1129 kB per
file that was lost with the real data.



GLOBAL:
    SIZE AVAIL RAW USED %RAW USED OBJECTS
    273G  253G   19906M      7.12   81059
POOLS:
    NAME                ID QUOTA OBJECTS QUOTA BYTES USED   %USED MAX AVAIL OBJECTS DIRTY READ  WRITE RAW USED
    .rgw.root           1  N/A           N/A           1113     0      120G       4     4   108      4     2226
    default.rgw.control 2  N/A           N/A              0     0      120G       8     8     0      0        0
    default.rgw.meta    3  N/A           N/A              0     0      120G       0     0     0      0        0
    default.rgw.log     4  N/A           N/A              0     0      120G     207   207 54085  36014        0
    fs1_data            5  N/A           N/A          7890M  3.11      120G   80001 80001     0   715k   15781M
    fs1_metadata        6  N/A           N/A         40951k  0.02      120G     839   839   682   103k   81902k


Overhead per object: (19586M-15781M) / 81059 = 0.046M = 46 kB per object



Added 1M files of 1 byte each totalling 1 MB:


GLOBAL:
    SIZE AVAIL RAW USED %RAW USED OBJECTS
    273G  241G   32158M     11.50   1056k
POOLS:
    NAME                ID QUOTA OBJECTS QUOTA BYTES USED   %USED MAX AVAIL OBJECTS DIRTY READ  WRITE RAW USED
    .rgw.root           1  N/A           N/A           1113     0      114G       4     4   108      4     2226
    default.rgw.control 2  N/A           N/A              0     0      114G       8     8     0      0        0
    default.rgw.meta    3  N/A           N/A              0     0      114G       0     0     0      0        0
    default.rgw.log     4  N/A           N/A              0     0      114G     207   207 56374  37540        0
    fs1_data            5  N/A           N/A          7891M  3.27      114G 1080001 1054k  287k  3645k   15783M
    fs1_metadata        6  N/A           N/A         29854k  0.01      114G    1837  1837  5739   118k   59708k


Delta:
   fs1_data: +2M raw space as expected
   fs1_metadata: -22M raw space, because who the fuck knows?
   RAW USED: +12252M



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] bluestore min alloc size vs. wasted space

2018-02-20 Thread Flemming Frandsen
I have set up a little ceph installation and added about 80k files of 
various sizes, then I added 1M files of 1 byte each totalling 1 MB, to 
see what kind of overhead is incurred per object.


The overhead for adding 1M objects seems to be 12252M/1000000 =
0.012252M or 122 kB per file, which is a bit high, but in line with a
min allocation size of 64 kB.



My ceph.conf file contained this line from when I initially deployed the 
cluster:

bluestore min alloc size = 4096

How do I set the min alloc size if not in the ceph.conf file?

Is it possible to change bluestore min alloc size for an existing 
cluster? How?



Even at this level of overhead I'm nowhere near the 1129 kB per file
that was lost with the real data.



GLOBAL:
    SIZE AVAIL RAW USED %RAW USED OBJECTS
    273G  253G   19906M      7.12   81059
POOLS:
    NAME                ID QUOTA OBJECTS QUOTA BYTES USED   %USED MAX AVAIL OBJECTS DIRTY READ  WRITE RAW USED
    .rgw.root           1  N/A           N/A           1113     0      120G       4     4   108      4     2226
    default.rgw.control 2  N/A           N/A              0     0      120G       8     8     0      0        0
    default.rgw.meta    3  N/A           N/A              0     0      120G       0     0     0      0        0
    default.rgw.log     4  N/A           N/A              0     0      120G     207   207 54085  36014        0
    fs1_data            5  N/A           N/A          7890M  3.11      120G   80001 80001     0   715k   15781M
    fs1_metadata        6  N/A           N/A         40951k  0.02      120G     839   839   682   103k   81902k


Overhead per object: (19586M-15781M) / 81059 = 0.046M = 46 kB per object



Added 1M files of 1 byte each totalling 1 MB:


GLOBAL:
    SIZE AVAIL RAW USED %RAW USED OBJECTS
    273G  241G   32158M     11.50   1056k
POOLS:
    NAME                ID QUOTA OBJECTS QUOTA BYTES USED   %USED MAX AVAIL OBJECTS DIRTY READ  WRITE RAW USED
    .rgw.root           1  N/A           N/A           1113     0      114G       4     4   108      4     2226
    default.rgw.control 2  N/A           N/A              0     0      114G       8     8     0      0        0
    default.rgw.meta    3  N/A           N/A              0     0      114G       0     0     0      0        0
    default.rgw.log     4  N/A           N/A              0     0      114G     207   207 56374  37540        0
    fs1_data            5  N/A           N/A          7891M  3.27      114G 1080001 1054k  287k  3645k   15783M
    fs1_metadata        6  N/A           N/A         29854k  0.01      114G    1837  1837  5739   118k   59708k


Delta:
   fs1_data: +2M raw space as expected
   fs1_metadata: -22M raw space, because who the fuck knows?
   RAW USED: +12252M

--
 Regards Flemming Frandsen - Stibo Systems - DK - STEP Release Manager
 Please use rele...@stibo.com for all Release Management requests

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] "Cannot get stat of OSD" in ceph.mgr.log upon enabling influx plugin

2018-02-20 Thread knawnd
Ben, first of all thanks a lot for such a quick reply! I appreciate the explanation and the info on
things to check!
I am new to all of this, including InfluxDB, which is why I used the wrong influx CLI commands to check
whether data is actually coming in. The
https://docs.influxdata.com/influxdb/v1.4/query_language/schema_exploration/ page helped me figure
it out. So data is coming from the ceph mgr node to InfluxDB if the ssl and verify_ssl options are set to
false. But it seems that with a self-signed certificate, ssl=true and verify_ssl=false, data is not filling
up the InfluxDB database. I will apply your debugging suggestions to find the source of that
problem, but that is a different story.
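
In case it helps someone else, this is roughly how the incoming data can be
checked with the 1.x influx CLI (the host, user, password and database name
"ceph" are assumptions - use whatever is configured for the mgr module):

influx -ssl -unsafeSsl -host influx.example.org -username ceph -password 'secret' \
       -database ceph -execute 'SHOW MEASUREMENTS'
# then inspect one of the measurements listed above, e.g.:
# influx ... -execute 'SELECT * FROM "<measurement>" ORDER BY time DESC LIMIT 5'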


Thanks again for your prompt and informative reply!

Benjeman Meekhof wrote on 19/02/18 18:07:

The 'cannot stat' messages are normal at startup, we see them also in
our working setup with mgr influx module.  Maybe they could be fixed
by delaying the module startup,  or having it check for some other
'all good' status but I haven't looked into it.  You should only be
seeing them when the mgr initially loads.

As far as not getting data, if the self-test works and outputs metrics
then the module is reading metrics ok from the mgr.  A few things you
could try:

- Check that the user you setup has rights to the destination
database, or admin rights to create database if you did not create and
setup beforehand
- Increase mgr debug and see if anything is showing up:  ceph tell
mgr.* injectargs '--debug_mgr 20'(this will be a lot of logging,
be sure to reset to 1/5 default)
- Check that your influx server is getting the traffic:   ' tcpdump -i
eth1 port 8086 and src host.example '

thanks,
Ben

On Mon, Feb 19, 2018 at 9:36 AM,   wrote:

Forgot to mention that influx self-test produces reasonable output too
(a long JSON list with some metrics and timestamps), and there are the following
lines in the mgr log:

2018-02-19 17:35:04.208858 7f33a50ec700  1 mgr.server reply handle_command
(0) Success
2018-02-19 17:35:04.245285 7f33a50ec700  0 log_channel(audit) log [DBG] :
from='client.344950 :0/3773014505' entity='client.admin'
cmd=[{"prefix": "influx self-test"}]: dispatch
2018-02-19 17:35:04.245314 7f33a50ec700  1 mgr.server handle_command
pyc_prefix: 'balancer status'
2018-02-19 17:35:04.245319 7f33a50ec700  1 mgr.server handle_command
pyc_prefix: 'balancer mode'
2018-02-19 17:35:04.245323 7f33a50ec700  1 mgr.server handle_command
pyc_prefix: 'balancer on'
2018-02-19 17:35:04.245327 7f33a50ec700  1 mgr.server handle_command
pyc_prefix: 'balancer off'
2018-02-19 17:35:04.245331 7f33a50ec700  1 mgr.server handle_command
pyc_prefix: 'balancer eval'
2018-02-19 17:35:04.245335 7f33a50ec700  1 mgr.server handle_command
pyc_prefix: 'balancer eval-verbose'
2018-02-19 17:35:04.245339 7f33a50ec700  1 mgr.server handle_command
pyc_prefix: 'balancer optimize'
2018-02-19 17:35:04.245343 7f33a50ec700  1 mgr.server handle_command
pyc_prefix: 'balancer show'
2018-02-19 17:35:04.245347 7f33a50ec700  1 mgr.server handle_command
pyc_prefix: 'balancer rm'
2018-02-19 17:35:04.245351 7f33a50ec700  1 mgr.server handle_command
pyc_prefix: 'balancer reset'
2018-02-19 17:35:04.245354 7f33a50ec700  1 mgr.server handle_command
pyc_prefix: 'balancer dump'
2018-02-19 17:35:04.245358 7f33a50ec700  1 mgr.server handle_command
pyc_prefix: 'balancer execute'
2018-02-19 17:35:04.245363 7f33a50ec700  1 mgr.server handle_command
pyc_prefix: 'influx self-test'
2018-02-19 17:35:04.402782 7f33a58ed700  1 mgr.server reply handle_command
(0) Success Self-test OK

kna...@gmail.com wrote on 19/02/18 17:27:


Dear Ceph users,

I am trying to enable influx plugin for ceph following
http://docs.ceph.com/docs/master/mgr/influx/ but no data comes to influxdb
DB. As soon as 'ceph mgr module enable influx' command is executed on one of
ceph mgr node (running on CentOS 7.4.1708) there are the following messages
in /var/log/ceph/ceph-mgr..log:

2018-02-19 17:11:05.947122 7f33c9b43600  0 ceph version 12.2.2
(cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous (stable), process
(unknown), pid 96425
2018-02-19 17:11:05.947737 7f33c9b43600  0 pidfile_write: ignore empty
--pid-file
2018-02-19 17:11:05.986676 7f33c9b43600  1 mgr send_beacon standby
2018-02-19 17:11:06.003029 7f33c0e2a700  1 mgr init Loading python module
'balancer'
2018-02-19 17:11:06.031293 7f33c0e2a700  1 mgr init Loading python module
'dashboard'
2018-02-19 17:11:06.119328 7f33c0e2a700  1 mgr init Loading python module
'influx'
2018-02-19 17:11:06.220394 7f33c0e2a700  1 mgr init Loading python module
'restful'
2018-02-19 17:11:06.398380 7f33c0e2a700  1 mgr init Loading python module
'status'
2018-02-19 17:11:06.919109 7f33c0e2a700  1 mgr handle_mgr_map Activating!
2018-02-19 17:11:06.919454 7f33c0e2a700  1 mgr handle_mgr_map I am now
activating
2018-02-19 17:11:06.952174 7f33a58ed700  1 mgr load Constructed class from
module: balancer
2018-02-19 17:11:06.953259 7f33a58ed700  1 mgr load Constructed class from
module: dashboard

Re: [ceph-users] Missing clones

2018-02-20 Thread Eugen Block
And does the re-import of the PG work? From the logs I assumed that  
the snapshot(s) prevented a successful import, but now that they are  
deleted it could work.
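
For reference, the export/import round trip looks roughly like this (a sketch -
the PG id 10.7b9 comes from the scrub errors earlier in the thread, the OSD ids
29 and 31 are placeholders for the source and target OSD, and both OSDs must be
stopped while the tool runs):

# export the intact copy of the PG from the stopped source OSD
systemctl stop ceph-osd@29
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-29 \
    --pgid 10.7b9 --op export --file /root/pg-10.7b9.export
# import it into the stopped target OSD
# (if a broken copy of the PG already exists there, it may need '--op remove' first)
systemctl stop ceph-osd@31
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-31 \
    --pgid 10.7b9 --op import --file /root/pg-10.7b9.export
systemctl start ceph-osd@29 ceph-osd@31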



Zitat von Karsten Becker :


Hi Eugen,

hmmm, that should be :

rbd -p cpVirtualMachines list | while read LINE; do osdmaptool  
--test-map-object $LINE --pool 10 osdmap 2>&1; rbd snap ls  
cpVirtualMachines/$LINE | grep -v SNAPID | awk '{ print $2 }' |  
while read LINE2; do echo "$LINE"; osdmaptool --test-map-object  
$LINE2 --pool 10 osdmap 2>&1; done; done | less


It's a Proxmox system. There were only two snapshots on the PG, which I
deleted now. Now nothing gets displayed on the PG... is that possible? A
repair still fails unfortunately...

Best & thank you for the hint!
Karsten



On 19.02.2018 22:42, Eugen Block wrote:

BTW - how can I find out, which RBDs are affected by this problem. Maybe
a copy/remove of the affected RBDs could help? But how to find out to
which RBDs this PG belongs to?


Depending on how many PGs your cluster/pool has, you could dump your
osdmap and then run the osdmaptool [1] for every rbd object in your pool
and grep for the affected PG. That would be quick for a few objects, I
guess:

ceph1:~ # ceph osd getmap -o /tmp/osdmap

ceph1:~ # osdmaptool --test-map-object image1 --pool 5 /tmp/osdmap
osdmaptool: osdmap file '/tmp/osdmap'
 object 'image1' -> 5.2 -> [0]

ceph1:~ # osdmaptool --test-map-object image2 --pool 5 /tmp/osdmap
osdmaptool: osdmap file '/tmp/osdmap'
 object 'image2' -> 5.f -> [0]


[1]
https://www.hastexo.com/resources/hints-and-kinks/which-osd-stores-specific-rados-object/


Zitat von Karsten Becker :


BTW - how can I find out, which RBDs are affected by this problem. Maybe
a copy/remove of the affected RBDs could help? But how to find out to
which RBDs this PG belongs to?

Best
Karsten

On 19.02.2018 19:26, Karsten Becker wrote:

Hi.

Thank you for the tip. I just tried... but unfortunately the import
aborts:


Write #10:9de96eca:::rbd_data.f5b8603d1b58ba.1d82:head#
snapset 0=[]:{}
Write #10:9de973fe:::rbd_data.966489238e1f29.250b:18#
Write #10:9de973fe:::rbd_data.966489238e1f29.250b:24#
Write #10:9de973fe:::rbd_data.966489238e1f29.250b:head#
snapset 628=[24,21,17]:{18=[17],24=[24,21]}
/home/builder/source/ceph-12.2.2/src/osd/SnapMapper.cc: In function
'void SnapMapper::add_oid(const hobject_t&, const
std::set&,
MapCacher::Transaction,
ceph::buffer::list>*)' thread 7facba7de400 time 2018-02-19
19:24:18.917515
/home/builder/source/ceph-12.2.2/src/osd/SnapMapper.cc: 246: FAILED
assert(r == -2)
 ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf)
luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x102) [0x7facb0c2a8f2]
 2: (SnapMapper::add_oid(hobject_t const&, std::set, std::allocator > const&,
MapCacher::Transaction, std::allocator >,
ceph::buffer::list>*)+0x8e9) [0x55eef3894fe9]
 3: (get_attrs(ObjectStore*, coll_t, ghobject_t,
ObjectStore::Transaction*, ceph::buffer::list&, OSDriver&,
SnapMapper&)+0xafb) [0x55eef35f901b]
 4: (ObjectStoreTool::get_object(ObjectStore*, coll_t,
ceph::buffer::list&, OSDMap&, bool*, ObjectStore::Sequencer&)+0x738)
[0x55eef35f9ae8]
 5: (ObjectStoreTool::do_import(ObjectStore*, OSDSuperblock&, bool,
std::__cxx11::basic_string,
std::allocator >, ObjectStore::Sequencer&)+0x1135)
[0x55eef36002f5]
 6: (main()+0x3909) [0x55eef3561349]
 7: (__libc_start_main()+0xf1) [0x7facae0892b1]
 8: (_start()+0x2a) [0x55eef35e901a]
 NOTE: a copy of the executable, or `objdump -rdS ` is
needed to interpret this.
*** Caught signal (Aborted) **
 in thread 7facba7de400 thread_name:ceph-objectstor
 ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf)
luminous (stable)
 1: (()+0x913f14) [0x55eef3c10f14]
 2: (()+0x110c0) [0x7facaf5020c0]
 3: (gsignal()+0xcf) [0x7facae09bfcf]
 4: (abort()+0x16a) [0x7facae09d3fa]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x28e) [0x7facb0c2aa7e]
 6: (SnapMapper::add_oid(hobject_t const&, std::set, std::allocator > const&,
MapCacher::Transaction, std::allocator >,
ceph::buffer::list>*)+0x8e9) [0x55eef3894fe9]
 7: (get_attrs(ObjectStore*, coll_t, ghobject_t,
ObjectStore::Transaction*, ceph::buffer::list&, OSDriver&,
SnapMapper&)+0xafb) [0x55eef35f901b]
 8: (ObjectStoreTool::get_object(ObjectStore*, coll_t,
ceph::buffer::list&, OSDMap&, bool*, ObjectStore::Sequencer&)+0x738)
[0x55eef35f9ae8]
 9: (ObjectStoreTool::do_import(ObjectStore*, OSDSuperblock&, bool,
std::__cxx11::basic_string,
std::allocator >, ObjectStore::Sequencer&)+0x1135)
[0x55eef36002f5]
 10: (main()+0x3909) [0x55eef3561349]
 11: (__libc_start_main()+0xf1) [0x7facae0892b1]
 12: (_start()+0x2a) [0x55eef35e901a]
Aborted


Best
Karsten

On 19.02.2018 17:09, Eugen Block wrote:

Could [1] be of interest?
Exporting the intact PG and importing it back to the rescpective OSD
sounds promising.

[1]
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-July/019673.h